[TIP] Result protocol / data content

Mark Sienkiewicz sienkiew at stsci.edu
Mon Apr 13 12:12:50 PDT 2009

Jesse Noller wrote:
>> Maybe we should work out what this DSL "talks about" independent of
>> the serialization, and consider the serialization/formatting
>> separately? Or should we hash out first if it is going to be Latin-1
>> or UTF-8 or ??
> That's what my goal is/was: define the fields in the file. It just so
> happens my example is somewhat valid YAML.

I'm with you here.  It is _far_ more important to define the content of 
each record than to choose a wire format.

I think the result record is a way to transmit data to your database.  
That is true whether your database is objects in memory, dbm files, sql 
based, etc -- the record format has nothing to do with how the data is 
stored or accessed.

My #1 rule of databases:

#1. You don't put data into databases.  You get data out of databases.  
Inserting the data is just a derived requirement.

The important implication is this: To make an effective design, you look 
at the final use of the data and work backward to the data sources.  
Obviously, we all think about this implicitly, but we need to be 
explicit about it from time to time.

With that in mind, I am going to skip right to the report record, but it 
might help to point out what the data is used for.

Here is what I think when I read this record description.  Am I getting 
it right?

> job_id: UUID
> recv_time: FLOAT
> build_id: STR
> machine_id: STR (uuid?)
> execution_line: STR

Just from looking at these names, I assume:

- job_id is an arbitrary number.  Presumably, you have some external 
system that can do something with this number?

- recv_time is ? - time that the request was delivered to the test 
runner.  (distinct from the time that the test runner actually started 
running tests.)

- build_id is ? - a version identifier for the subsystem being tested or 
a version identifier of the test suite?

- machine_id identifies the physical hardware that ran this group of 
tests; if you divide the test run among multiple machines, must you have 
multiple result files?  Or do you include multiple instances of the 
top-level record, one for each machine?  I'm going to expand pandokia's 
knowledge from just "host" to both "host" (the name of the machine) and 
something like "execution environment" (a name for the actual 
environment the tests run in).  There may be multiple execution 
environments per host; I have not worked out all the semantics yet.

- execution_line - command that runs the test set?
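To make my assumptions concrete, here is a sketch of the top-level record as a Python dict.  The field names come from your proposal; every value and comment is just my guess at the meaning, for you to correct:

```python
import time
import uuid

# A sketch of the top-level execution record, using my assumed meanings.
# The names come from the proposal; the values are purely illustrative.
execution_record = {
    "job_id": str(uuid.uuid4()),   # arbitrary id some external system understands
    "recv_time": time.time(),      # when the request reached the test runner
    "build_id": "r1234",           # version of the thing tested (or of the suite?)
    "machine_id": "host-01",       # physical hardware that ran this group of tests
    "execution_line": "python runtests.py --all",  # command that runs the test set
}
```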

We name our test runs.  There should be a place for something like that 
here.  The name of a test run is not necessarily unique, since a test 
run "release_candidate_1" might run in many different environments. 

I think that maybe job_id is a little like the test run name that I am 
thinking of, but it is an arbitrary number that has no particular 
meaning to the human looking at the test reports.  That is, I could run 
a set of tests with the same UUID on each of several machines.  Is that 
right?

All of this is just information for the user to look at later.  
Effectively, this first part is a record about the test execution that 
applies to all the test results that follow.

> run_stats:
>    total_tests: INT
>    total_pass: INT
>    total_fail: INT
>    total_other: INT

Why are these totals here?  What does the reader do if these totals do 
not match the data that follows?
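The most a reader can do is treat the totals as a cross-check.  A sketch of what that could look like, assuming the results have already been parsed into a list of dicts with a "result" field (what to actually *do* with a mismatch is still an open question):

```python
def check_totals(run_stats, results):
    """Compare the declared run_stats totals against the result records.

    Returns a list of human-readable discrepancies; an empty list means
    the totals match the data that follows.
    """
    counts = {"pass": 0, "fail": 0, "other": 0}
    for r in results:
        outcome = r.get("result")
        counts[outcome if outcome in ("pass", "fail") else "other"] += 1

    problems = []
    if run_stats.get("total_tests") != len(results):
        problems.append("total_tests=%r but %d records present"
                        % (run_stats.get("total_tests"), len(results)))
    for key, label in (("total_pass", "pass"), ("total_fail", "fail"),
                       ("total_other", "other")):
        if run_stats.get(key) != counts[label]:
            problems.append("%s=%r but counted %d"
                            % (key, run_stats.get(key), counts[label]))
    return problems
```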

>    start_time: FLOAT
>    stop_time: FLOAT
>    total_time: FLOAT

I assume these are wall clock times, as a UTC time_t.

Are you thinking that maybe ( stop_time - start_time ) != total_time?  I 
didn't think of that because it doesn't happen in any of my scenarios.  
I can see how it might happen to some people, though.

I don't collect the total time of the entire test suite.  In my system, 
what most people would call a test suite will run as several 
mostly-unrelated processes, with the results from all the subsets 
aggregated at the end.  There are lots of ways I could use 
start/stop/total time, but I don't really care enough.

I suggest that all times be optional.
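If all times are optional, a consumer needs a fallback order anyway.  One possible sketch (my own convention, not anything from the proposal): prefer an explicit total_time, fall back to stop - start, and otherwise report nothing:

```python
def elapsed_seconds(record):
    """Best-effort elapsed time for a record in which all times are optional.

    Prefer an explicit total_time (it may legitimately differ from
    stop - start); fall back to the difference; otherwise return None.
    """
    if record.get("total_time") is not None:
        return record["total_time"]
    start, stop = record.get("start_time"), record.get("stop_time")
    if start is not None and stop is not None:
        return stop - start
    return None
```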

The times and stats are just for the user to look at.  Maybe you have a 
way to compare yesterday's times to today's to see trends. 

> test_cases:
>    test:
>        id: STR
>        result: STR
>        result_id: STR
>        start: FLOAT (time.time())
>        stop: FLOAT (time.time())
>        total_time: FLOAT (seconds)
>        additional:
>            coverage_info: Big str
>            stdout: Big Str
>            stderr: Big str

What is the significance of nesting test inside test_cases?  Does the 
name "test_cases" mean something specific?

"id" is the name of the test.  "id" is unique across what scope?  In 
pandokia, we called this the test_name, and they are arranged in a 
hierarchy.  We use "/" or "." to separate levels.  e.g. if there is a 
test named cluster/subsystem_a/test1 - the reporting system knows that 
every name that starts cluster/subsystem_a/ is related for display 
purposes.  A particular installation may not make use of this, but the 
naming convention should be part of the protocol.  We used "/" because 
part of the hierarchy might come from a directory/filename and part 
might be a module.class.function name detected by nose.  On Windows, 
I'll probably just turn \ into /.
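The hierarchy convention is cheap to implement on the reporting side.  A minimal sketch of what I mean, with the Windows separator normalized and prefix matching used to find related tests:

```python
def normalize_test_name(name):
    """Turn Windows-style separators into the "/" hierarchy convention."""
    return name.replace("\\", "/")

def same_group(name, prefix):
    """True if test `name` lives under the hierarchy node `prefix`.

    e.g. every name starting "cluster/subsystem_a/" is related for
    display purposes.
    """
    return normalize_test_name(name).startswith(prefix.rstrip("/") + "/")
```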

result is pass/fail/error/whatever (I see your reply under the subject 
"Result protocol / pass-fail-error").

What is result_id ?  How is it different from result?

Again, maybe ( total_time != stop - start ), times optional.

coverage_info is something about which lines of code, which branches 
were taken, etc?  This is probably the most complicated part of the spec 
here, unless we just define it as a blob to be passed downstream.  If it 
is a blob, I suggest also adding coverage_info_type so that a downstream 
processor can know how to handle coverage_info from various sources.  We 
would define names for known types and have a convention for 
user-defined types.
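Here is a sketch of what I mean by the blob-plus-type approach.  The type name "coverage.py/4" is invented for illustration; the point is only that a downstream processor dispatches on coverage_info_type and passes unknown blobs through untouched:

```python
# Hypothetical dispatch on a coverage_info_type tag.  The registry and the
# type names are invented for illustration, not part of any proposal.
COVERAGE_HANDLERS = {}

def coverage_handler(type_name):
    """Register a handler function for one coverage_info_type."""
    def register(fn):
        COVERAGE_HANDLERS[type_name] = fn
        return fn
    return register

@coverage_handler("coverage.py/4")
def handle_coveragepy(blob):
    # A real handler would parse the blob; this one just labels it.
    return "coverage.py data, %d bytes" % len(blob)

def process_coverage(record):
    """Pass the blob to a known handler, or hand it downstream untouched."""
    handler = COVERAGE_HANDLERS.get(record.get("coverage_info_type"))
    if handler is None:
        return record.get("coverage_info")   # opaque blob, pass through
    return handler(record["coverage_info"])
```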

Should stdout/stderr really be separated?  Sometimes it is nice to see 
where in the test run a particular error occurred.  Sometimes that just 
turns into a mess.

It takes a few pages to describe what you can do with a test_result 
record, so I won't go into it here.

> extended:
>    requestor_id: INT
>    artifacts_uri: <your path here>
>    artifacts_chksum: MD5
>    configuration_file: (the config file passed in in my case)

"extended" is not connected to a specific test?  In that case, why is it 
separate from the set of fields at the top?

What is a requestor_id ?  I might expect the name of the person who ran 
the test, but you list an INT here.

artifacts_uri is an arbitrary description of where I can find various 
files left over after the test ran.  artifacts_chksum is a checksum of 
those files (in some deterministic order), not of the URI itself.  I 
would expect this to be attached to a specific test.
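"Some deterministic order" needs pinning down if two producers are ever to agree on a checksum.  One guess at a workable rule, sketched here, is to hash each file's relative path followed by its content, walking paths in sorted order:

```python
import hashlib
import os

def artifacts_checksum(root):
    """MD5 over the artifact files under `root`, in a deterministic order.

    Hashes relative path then content for each file, sorted by path, so
    two identical artifact trees produce the same digest.  This is one
    guess at what "some deterministic order" could mean, not a spec.
    """
    digest = hashlib.md5()
    paths = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()                      # stable traversal order
        for name in filenames:
            paths.append(os.path.join(dirpath, name))
    for path in sorted(paths):
        rel = os.path.relpath(path, root).replace(os.sep, "/")
        digest.update(rel.encode("utf-8"))
        with open(path, "rb") as f:
            digest.update(f.read())
    return digest.hexdigest()
```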

You have a single configuration file.  Is this the name of the file or 
the content of the file?  Is this the configuration of the test system, 
or of the items being tested?

Vicki came up with "test configuration attributes" to store information 
gathered by examining the environment.  We don't have a single 
configuration file, and we potentially have lots of things we might 
gather information about.  I don't plan for the reporting system to do 
anything more than pass the information to the user, though.

Some things that I would add to this record format:

- I run the same test on the same software in multiple test 
environments.  (Linux vs Solaris, Python 2.5 vs 2.6, etc)  I need a way 
to identify a set of results from the same test in all the different 
environments.  In pandokia, I called this test_run, which is a name that 
I explicitly assign to a group of related tests.  I suspect this is kind 
of like the job_id, except that I would not expect a field named 
"job_id" that is a "UUID" to have the same value on multiple machines.

- I want a single reporting system that processes data from largely 
unrelated projects.  To do that, the report file must identify which 
project it came from. 

- I have a "location" which is vaguely defined as "a string that 
contains information to help a human find this test definition".  This 
is kind of like artifacts_uri.

I think a lot of the fields you define could be optional.  I only 
explicitly mentioned the times, but I think lots of the others could 
too.  In fact, we could probably do better by listing the fields that 
are _not_ optional.  What do you think?
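To illustrate the "list only the required fields" idea: the particular choice of required fields below is my guess, offered for discussion, and everything else would be treated as optional extra information:

```python
# One possible split: name only the fields that are _not_ optional.
# This particular choice of required fields is my guess, for discussion.
REQUIRED_FIELDS = {"job_id", "machine_id"}

def missing_required(record):
    """Return the required fields absent from a record (empty set = valid)."""
    present = set(k for k, v in record.items() if v is not None)
    return REQUIRED_FIELDS - present
```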

So, how well have I understood your report record?

Mark S.
