Not another software testing framework, please¶
Note: testkraut is still in its infancy – some of what is written here could still be an anticipation of the near future.
This is a framework for software testing. That being said, testkraut tries to minimize the overlap with the scopes of unit testing, regression testing, and continuous integration testing. Instead, it aims to complement these kinds of testing, and is able to re-use them, or can be integrated with them.
In a nutshell testkraut helps to facilitate statistical analysis of test results. In particular, it focuses on two main scenarios:
- Comparing results of a single (test) implementation across different or changing computational environments (think: different operating systems, different hardware, or the same machine before and after a software upgrade).
- Comparing results of different (test) implementations generating similar output from identical input (think: performance of various signal detection algorithms).
While such things can be done using other available tools as well, testkraut aims to provide a lightweight (hence portable), yet comprehensive description of a test run. Such a description allows for decoupling test result generation and analysis – opening up the opportunity to “crowd-source” software testing efforts, and aggregate results beyond the scope of a single project, lab, company, or site.
At this point you probably want to Get started (quickly).

Wanna help?¶
If you think it would be worthwhile to contribute to this project, your input would be highly appreciated. Please report issues, send feature requests, and open pull requests without hesitation!
License¶
All code is licensed under the terms of the MIT license, or some equally liberal alternative license. Please see the COPYING file in the source distribution for more detailed information.
Documentation¶
Get started (quickly)¶
This should not take much time. testkraut contains no compiled code. It should run with Python 2.6 (or later) – although Python 3.x hasn’t been tested (yet). If you are running Python 2.6 you should install the argparse package, otherwise you won’t have much fun. Here is a list of things that make life more interesting:
- NumPy
- not strictly required, but strongly recommended. There should be no need to have any particular version.
- SciPy
- will improve the test result reporting – any reasonably recent version should do
- libmagic
- helps to provide more meaningful information on file types
- python-colorama
- for more beautiful console output – but monochrome beings don’t need it
Download ...¶
testkraut is available from PyPI, hence it can be installed with easy_install or pip – the usual way. pip seems to be a little saner than the other one, so we’ll use this:
% pip install testkraut
This should download and install the latest version. Depending on where you are installing you might want to call sudo for additional force. pip will tell you where it installed the main testkraut script. Depending on your setup you may want to add this location to your PATH environment variable.
... and run¶
Now we’re ready to run our first test. The demo test requires FSL to be installed and configured to run (a properly set FSLDIR variable and so on...).
The main testkraut script supports a number of commands that are used to prepare and run tests. A comprehensive listing is available from the help output:
% testkraut --help
To run the demo test, we need to obtain the required test data first. This is done by telling testkraut to cache all required files locally:
% testkraut cachefiles demo
It will download an anatomical image from a webserver. However, since the image is the MNI152 template head that comes with FSL, you can also use an existing local file to populate the cache – please explore the options for this command.
Now we are ready to run:
% testkraut execute demo
If FSL is functional, this command will run for a few seconds and create a subdirectory testbeds/demo with the test in/output and a comprehensive description of the test run in JSON format:
% ls testbeds/demo
brain_mask.nii.gz brain.nii.gz head.nii.gz spec.json
That is it – for now...
Prototypes of a testkraut user¶
The concerned scientist¶
This scientist came up with a sophisticated data analysis pipeline, consisting of many pieces of software from different vendors. It appears to work correctly (for now). But this scientist is afraid to upgrade any software on the machine, because it might break the pipeline. Rigorous tests would have helped, but “there was no time”. testkraut can help to (semi-automatically) assess the longitudinal stability of analysis results.
The thoughtful software developer¶
For any individual software developer or project it is almost impossible to confirm proper functioning of their software on all possible computing environments. testkraut can help generate informative performance reports that can be sent back to a developer and offer a more comprehensive assessment of cross-platform performance.
The careful “downstream”¶
A packager for a software distribution needs to apply a patch to some software to improve its integration into the distribution environment. Of course, such a patch should not have a negative impact on the behavior of the software. testkraut can help to make a comparative assessment to alert the packager if something starts to behave in unexpected ways.
The SPEC¶
A test specification (or SPEC) is both primary input and output data for a test case. As input data, a SPEC defines test components, file dependencies, expected test output, and whatever else is necessary to describe a test case. As test output, a SPEC is an annotated version of the input SPEC, with detailed descriptions of various properties of observed test components and results. A SPEC is a text file in JSON format.
Path specifications for files can make use of environment variables which get expanded appropriately. The special variable TESTKRAUT_TESTBED_PATH can be used to reference the directory in which a test is executed.
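For illustration, a path in a hypothetical dependency entry (see the dependencies section below) could reference the testbed like this; the file name is invented:

"location": "$TESTKRAUT_TESTBED_PATH/helper_script.sh"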
The following sections provide a summary of all SPEC components.
authors¶
A JSON object, where keys are email addresses of authors of a SPEC and corresponding values are the authors’ names.
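A minimal hypothetical example (name and address are invented):

"authors": {
    "jane.doe@example.com": "Jane Doe"
}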
dependencies¶
A JSON object where keys are common names for dependencies of a test case. Values are JSON objects with fields described in the following subsections.
location¶
Where to find the respective dependency (e.g. an executable path or a Python module namespace). For executables this may contain absolute paths and/or environment variables which will be expanded to their actual values during processing. Such variables should be listed in the environment section.
type¶
Could be an executable or a python_mod.
optional¶
A JSON boolean indicating whether an executable is optional (true), or required (false; default). Optional executables are useful for writing tests that need to accommodate changes in the implementation of the to-be-tested software.
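Putting the previous fields together, a hypothetical dependency entry could look like this; the tool name, environment variable, and path are invented:

"dependencies": {
    "mytool": {
        "type": "executable",
        "location": "$MYTOOLDIR/bin/mytool",
        "optional": true
    }
}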
version_cmd¶
A JSON string specifying a command that will be executed to determine the version of an executable; the result is added as value to the version field of the corresponding entry for this executable in the entities section. If output to stderr is found, it will be used as version. If no stderr output is found, the output to stdout will be used.
Alternatively, this may be a JSON array with exactly two values, where the first value is, again, the command, and the second value is a regular expression used to extract matching content from the output of this command. Output channels are evaluated in the same order as above (first stderr, and if no match is found, stdout).
version_file¶
A JSON string specifying a file name. The content of this file will be added as value to the version field of the corresponding entry for this executable in the entities section.
Alternatively, this may be a JSON array with exactly two values, where the first value is, again, a file name, and the second value is a regular expression used to extract matching content from this file as a version.
Example¶
"executables": {
"$FSLDIR/bin/bet": {
"version_cmd": [
"$FSLDIR/bin/bet2",
"BET \\(Brain Extraction Tool\\) v(\\S+) -"
]
},
"$FSLDIR/bin/bet2": {
"version_file": "$FSLDIR/etc/fslversion"
}
description¶
A JSON string with a verbal description of the test case. The description should contain information on the nature of the test, any input data files, and where to obtain them (if necessary).
This section is identical in input SPEC and corresponding output SPEC.
entities¶
A JSON object, where keys are unique identifiers (JSON string), and values are JSON objects. Identifiers are unique but identical for identical entities, even across systems (e.g. the file sha1sum). All items in this section describe entities of relevance in the context of a test run – required executables, their shared library dependencies, script interpreters, operating system packages providing them, and so on. There are various categories of values in this section that can be distinguished by their type field value, and which are described in the following subsections.
This section only exists in output SPECs.
type: binary¶
This entity represents a compiled executable. The following fields are supported:
- path (JSON string): Executable path as specified in the input SPEC.
- provider (JSON string): Identifier/key of an operating system package entry in the entities section.
- realpath (JSON string): Absolute path to the binary, with all variables expanded and all symlinks resolved.
- sha1sum (JSON string): SHA1 hash of the binary file. This is identical to the item key.
- shlibdeps (JSON array): Identifiers/keys of shared library dependency entries in the entities section.
- version (JSON string): Version output generated from the version_cmd or version_file settings in the input SPEC for the corresponding executable.
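A sketch of a corresponding binary entity entry, using the bet2 executable from the example above; all values in angle brackets are placeholders, not real output:

"<sha1sum of the bet2 binary>": {
    "type": "binary",
    "path": "$FSLDIR/bin/bet2",
    "realpath": "<absolute path with variables expanded and symlinks resolved>",
    "sha1sum": "<sha1sum of the bet2 binary>",
    "provider": "<key of a package entry in this section>",
    "shlibdeps": ["<key of a library entry in this section>"],
    "version": "<output of version_cmd or content of version_file>"
}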
type: deb or rpm¶
This entity represents a DEB or RPM package. The following fields are supported:
- arch (JSON string): Identifier for the hardware architecture this package has been compiled for.
- name (JSON string): Name of the package.
- sha1sum (JSON string): SHA1 hash for the package.
- vendor (JSON string): Name of the package vendor.
- version (JSON string): Package version string.
type: library¶
This entity represents a shared library. The types and meaning of the supported fields are identical to binary-type entities, except that there is no version field.
type: script¶
This entity represents an interpreted script. The types and meaning of the supported fields are identical to binary-type entities, except that there is no shlibdeps field, but instead:
- interpreter (JSON string): Identifier/key for the script interpreter entry in the entities section.
environment¶
A JSON object, where keys represent names of variables in the system environment. If the corresponding value is a string, the respective variable will be set to this value prior to test execution. If the value is null, any existing variable of that name will be unset. If the value is true, the presence of this variable is required and its value is recorded in the protocol. If the value is false, the variable is not required and its (optional) value is recorded.
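A hypothetical environment section illustrating the four kinds of values (the variable names other than FSLDIR are invented):

"environment": {
    "FSLDIR": true,
    "LC_ALL": "C",
    "MYTOOL_DEBUG": null,
    "MYTOOL_EXTRA_OPTS": false
}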
comparisons¶
yet to be determined
id¶
A JSON string with an ID that uniquely identifies the test case. In a test library the test case needs to be stored in a directory whose name is equal to this ID, while the SPEC is stored in a file named spec.json inside this directory. While not strictly required, it is preferred that this ID is “human-readable” and carries a reasonable amount of semantic information. For example: fsl-mcflirt is a test that is concerned with the MCFLIRT component of the FSL suite.
This section is identical in input SPEC and corresponding output SPEC.
inputs¶
A JSON object, where keys represent IDs of required inputs for a test case. Corresponding values are, again, JSON objects with a mandatory type field. The value of type is a JSON string identifying the type of input. Currently only type file is supported. For a file-type input the following additional fields should be present:
- sha1sum (JSON string): SHA1 hash that uniquely identifies the input file.
- tags (JSON array): Optional list of JSON strings with tags categorizing the input (see tags).
- url (JSON string): URL where the respective file can be downloaded.
- value (JSON string): Name of the input file.
Example¶
"inputs": {
"head.nii.gz": {
"sha1sum": "41d817176ceb99ac051d8bd066b500f3fb89be89",
"type": "file",
"value": "head.nii.gz"
}
}
outputs¶
This section is very similar to the inputs section, and may contain similar information in matching fields with identical semantics. In contrast to inputs, this section can be substantially extended in the output SPEC. For example, output files may not have a SHA1 hash specified in the input SPEC, but a SHA1 hash for the actually observed output file will be stored in the output’s sha1sum field. Most importantly, for any output file whose tags match one or more of the configured fingerprint generators, a fingerprints field will be added to the JSON object for the corresponding output file.
fingerprints¶
The value of this field is a JSON object where keys are names of fingerprint generators, and values should be JSON objects with a custom structure that is specific to the particular type of fingerprint. All fingerprints should contain a version field (JSON number; integer) that associates any given fingerprint with the implementation of the generator that created it.
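A hypothetical fingerprints entry for an output file, assuming a generator called file_size like the one sketched in the fingerprinting section below (the size value is invented; that generator records its version in a __version__ field):

"fingerprints": {
    "file_size": {
        "__version__": 0,
        "size": 1003520
    }
}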
processes¶
A JSON object describing causal relationships among test components. Keys are arbitrary process IDs. Values are JSON objects with fields described in the following subsections.
This section is currently not modified or extended during a test run.
- argv (JSON array): argv-style command specification for a process. For example: ["$FSLDIR/bin/bet", "head.nii.gz", "brain", "-m"]
- executable (JSON string): ID/key of the associated executable from the executables section.
- generates (JSON array): IDs/keys of output files (from the outputs section) created by this process.
- started_by (JSON string): ID/key of the process (from the same section) that started this process.
- uses (JSON array): IDs/keys of input files (from the inputs section) required by this process.
Example¶
"0": {
"argv": [
"$FSLDIR/bin/bet2",
"head",
"brain",
"-m"
],
"executable": "$FSLDIR/bin/bet2",
"generates": [
"brain.nii.gz",
"brain_mask.nii.gz"
],
"started_by": 1,
"uses": [
"head.nii.gz"
]
},
system¶
A JSON object listing various properties of the computational environment a test was run in. This section is added by the test runner and only exists in output SPECs.
tests¶
A JSON array of JSON objects describing the actual test cases. All (sub-)test cases are executed in order of appearance in the array, in the same test bed, using the same environment. Multiple sub-tests can be used to split tests into sub-parts to improve error reporting, while minimizing test SPEC overhead. However, output fingerprinting is only done once after all sub-tests have completed successfully.
For each JSON object describing a sub-test, the mandatory type field identifies the kind of test case and the possible content of this section changes accordingly. Supported scenarios are described in the following subsections.
For any test type, a test can be marked as an expected failure by adding a field shouldfail and setting its value to true.
An optional field id can be used to assign a meaningful identifier to a sub-test that is used in the test protocol. If no id is given, a sub-test’s index in the tests array is used as identifier.
type: shell¶
The test case is a shell command. The command is specified in a text field code, such as:
"code": "$FSLDIR/bin/bet head.nii.gz brain -m"
In the output SPEC of a test run this section is amended with the following fields:
- exitcode (JSON number; integer): Exit code for the executed command.
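For illustration, a complete tests section with a single shell sub-test could look like this, reusing the bet command from above (the id value is invented):

"tests": [
    {
        "id": "brain extraction",
        "type": "shell",
        "code": "$FSLDIR/bin/bet head.nii.gz brain -m"
    }
]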
type: python¶
Explain me
type: nipype¶
Explain me
version¶
A JSON number (integer) value indicating the version of a SPEC. This version must be incremented whenever a change to the SPEC is made.
This section is identical in input SPEC and corresponding output SPEC.
Output fingerprinting¶
Modules with fingerprint generators:
- fingerprints
- fingerprints.base
Writing a custom fingerprinting function¶
Writing a custom fingerprint implementation for a particular kind of output is pretty straightforward. Start by creating a function with the following interface:
def fp_my_fingerprint(fname, fpinfo, tags):
    pass
The variable fname will contain the filename/path of the output for which a fingerprint shall be created. fpinfo is an empty dictionary to which the content of the fingerprint needs to be added. A test runner will add this dictionary to the fingerprints section of the respective output file in the SPEC. The name of the fingerprinting function itself will be used as key for this fingerprint element in that section. Any fp_ prefix, as in the example above, will be stripped from the name. Finally, tags is a sequence of Output tags that categorize a file and can be used to adjust the content of a fingerprint accordingly.
Any fingerprinting function must add a __version__ tag to the fingerprint. The version must be incremented whenever the fingerprint implementation changes, to make longitudinal comparisons of test results more accurate.
There is no need to return any value – all content needs to be added to the fpinfo dictionary.
A complete implementation of a fingerprinting function that stores the size of an input file could look like this:
>>> import os
>>> def fp_file_size(fname, fpinfo, tags):
...     fpinfo['__version__'] = 0
...     fpinfo['size'] = os.path.getsize(fname)
>>> #
>>> # test it
>>> #
>>> from testkraut.fingerprints import proc_fingerprint
>>> fingerprints = {}
>>> proc_fingerprint(fp_file_size, fingerprints, 'COPYING')
>>> 'file_size' in fingerprints
True
>>> 'size' in fingerprints['file_size']
True
There is no need to catch exceptions inside fingerprinting functions. The test runner will catch any exception, and everything that has been stored in the fingerprint content dictionary up to when the exception occurred will be preserved. The exception itself will be logged in a __exception__ field.
To enable the new fingerprinting function, add it to any appropriate tag in the fingerprints section of the configuration file:
[fingerprints]
want size = myownpackage.somemodule.fp_file_size
With this configuration this fingerprint will be generated for any output that is tagged want size. It is required that the function is “importable” from the specified location.
Output tags¶
This glossary lists all known tags that can be used to label test outputs. According to the assigned tags, appropriate fingerprinting or evaluation methods are automatically applied to the output data.
- 3D image, 4D image
- a sub-category of volumetric image with a particular number of axes
- columns
- columns of a matrix or an array should be described individually
- nifti1 format
- a file in any variant of the NIfTI1 format
- numeric values
- a file containing an array/matrix of numeric values
- rows
- rows of a matrix or an array should be described individually
- store
- as much of the file content as possible should be kept verbatim in a fingerprint
- text file
- a file with text-only, i.e. non-binary content
- table
- a file with data table layout (if a text format, column names are in first line; uniform but arbitrary delimiter)
- tscores
- values from a Student’s t-distribution
- volumetric image
- a multi-dimensional (three or more) image
- whitespace-separated fields
- data in a structured text format where individual fields are separated by any white-space character(s)
- zscores
- standardized values indicating how many standard deviations an original value is above or below the mean
Terminology¶
- JSON array
- An ordered sequence of values, comma-separated and enclosed in square brackets; the values do not need to be of the same type (for more information see the “JSON” Wikipedia entry section on data types)
- JSON boolean
- Boolean value: true or false (for more information see the “JSON” Wikipedia entry section on data types)
- JSON number
- Double precision floating-point format in JavaScript (for more information see the “JSON” Wikipedia entry section on data types)
- JSON object
- an unordered collection of key:value pairs with the ‘:’ character separating the key and the value, comma-separated and enclosed in curly braces; the keys must be strings and should be distinct from each other (for more information see the “JSON” Wikipedia entry section on data types)
- JSON string
- Double-quoted Unicode, with backslash escaping (for more information see the “JSON” Wikipedia entry section on data types)
Frequently Asked Questions¶
- Why this name?
- The original aim for this project was to achieve “crowd-sourcing” of software testing efforts. “kraut” is obviously almost a semi-homonym of “crowd”, while at the same time indicating that this software spent its infancy at the Institute of Psychology Magdeburg, Germany.
Design¶
Goal¶
This aims to be a tool for testing real-world (integrated) software environments with heterogeneous components from different vendors. It does not try to be
- a unit test framework (you better pick one for your programming language of choice)
- a continuous integration testing framework (take a look at Jenkins or buildbot)
- a test framework for individual pieces of software (although that could work)
Instead, this tool is targeting the evaluation of fully deployed software on “production” systems. It aims at verifying proper functioning (or unchanged behavior) of software systems comprised of components that were not specifically designed or verified to work with each other.
Objectives¶
- Gather comprehensive information about the software environment
- Integrate test cases written in arbitrary languages or toolkits with minimal overhead
- Make it possible to easily deploy the system on users’ machines to verify their environments
Dump of discussion with Satra¶
- NiPyPE will do the provenance, incl. the gathering of system/env information
- anything missing in this domain needs to be added to nipype
- a test is a json file
- test code is stored directly in the json file
- tests will typically be nipype workflows
- individual tests will not depend on other tests (although a test runner could resolve data dependencies with outputs from other tests)
- a test definition specifies: test inputs, test dependencies (e.g. software), and an (optional) evaluative statement
Dump of discussion with Alex¶
- A test fails or passes
- Evaluation assesses the quality of the test results (but doesn’t necessarily let a test fail)
- Dashboard-level evaluation will provide highly aggregated analysis (e.g. distributions of evaluation metrics)
- Threshold levels for evaluation might need to be pulled from the dashboard
- Compare test output spec to actual content of the testbed after a test run
- Write a little tool to check a test spec for comprehensive usage of all test output in evaluations
Generate test descriptions¶
cde -o /tmp/betcde/ bet head.nii.gz brain
find /tmp/betcde/cde-root -executable -a -type f -a ! -name '*.cde' -a ! -name '*.so'
"depends": [
    {
        "type": "executable",
        "path": "$FSLDIR/bin/remove_ext",
        "dpkg": "fsl-5.0"
    }
]
via Tomoyo (Just an idea)¶
Tomoyo is a lightweight and easy-to-use MAC (Mandatory Access Control) system for Linux, available in the stock Linux kernel with tools shipped in Debian. In learning mode it can easily collect provenance information on what executables/libraries were used by a particular parent process, what files were accessed, environment variables, etc.
- Pros:
- should have virtually no run-time impact
- Cons:
- might require admin privileges to get into learning mode and harvest result information
On SPECs¶
A SPEC is a tree of nested dicts – except for the leaves of the tree. That implies that no list can be used anywhere inside the tree!