[TIP] Result protocol / data content

Mon Apr 13 13:07:49 PDT 2009

On Mon, Apr 13, 2009 at 3:12 PM, Mark Sienkiewicz <sienkiew at stsci.edu> wrote:

> With that in mind, I am going to skip right to the report record, but it
> might help to point out what the data is used for.
>

> Here is what I think when I read this record description.  Am I getting it
> right?
>
>> job_id: UUID
>> recv_time: FLOAT
>> build_id: STR
>> machine_id: STR (uuid?)
>> execution_line: STR
>>
>
> Just from looking at these names, I assume:
>
> - job_id is an arbitrary number.  Presumably, you have some external system
> that can do something with this number?
>

Yes, the reason I put this here is so that a job can be "named". I
suggested UUID (the client provides this for the executor, or the
executor makes one up, either way). Obviously a None value (null) can
be provided by those who don't need it. Personally, I've found that
having a unique ID for the job is insanely helpful.
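
A minimal sketch of that idea, assuming the executor is written in
Python. The helper name make_job_id is made up for illustration and is
not part of any agreed protocol: the executor keeps a client-supplied
ID if there is one, and otherwise makes one up.

```python
import uuid

def make_job_id(client_supplied=None):
    """Return the client's job ID if given, else a fresh random UUID string.

    make_job_id is a hypothetical helper name, used only for this sketch.
    """
    if client_supplied is not None:
        return str(client_supplied)
    return str(uuid.uuid4())

# The executor makes one up when the client did not name the job.
job_id = make_job_id()
```

A site that does not need job IDs at all would skip this and simply
store None in the record.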

> - recv_time is ? - time that the request was delivered to the test runner.
>  (distinct from the time that the test runner actually started running
> tests.)
>

Correct

> - build_id is ? - a version identifier for the subsystem being tested or a
> version identifier of the test suite?
>

This is the ID/Version number of the System-Under-Test. I don't know
that we need a version for the test suite, although that's not a bad
idea. In my case it would be the svn revision the tests are pulled
from.

> - machine_id identifies the physical hardware that ran this group of tests;
> if you divide the test run among multiple machines, you must have multiple
> result files?  Or do you include multiple instances of the top-level record,
> one for each machine?  I'm going to expand pandokia's knowledge from "host"
> to "host" for the name of the machine and something like "execution
> environment" which is a name for the actual environment it runs in.  There
> may be multiple execution environments per host; I have not worked out all
> the semantics yet.

The machine_id is the ID of the host that executed this *set* of
tests. If you have an executor which then spreads the tests wider than
itself, then each sub-executor would pass its results to the caller,
and the caller would do the aggregation. Make sense?
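
Roughly, the caller-side aggregation could look like the sketch below.
The record shape (a dict with "machine_id" and a "test_cases" dict
keyed by test id) and the function name aggregate are assumptions made
for illustration; nothing here is settled protocol.

```python
def aggregate(sub_results):
    """Flatten per-machine result records into one list of test entries.

    Each record is assumed to look like:
        {"machine_id": "host-a", "test_cases": {test_id: {...}}}
    Every entry is tagged with the machine that ran it.
    """
    combined = []
    for record in sub_results:
        for test_id, case in record["test_cases"].items():
            entry = dict(case)  # copy so the input records stay untouched
            entry["id"] = test_id
            entry["machine_id"] = record["machine_id"]
            combined.append(entry)
    return combined

results = aggregate([
    {"machine_id": "host-a", "test_cases": {"mod.Suite.test_one": {"result": "PASS"}}},
    {"machine_id": "host-b", "test_cases": {"mod.Suite.test_two": {"result": "FAIL"}}},
])
```

Tagging each entry with its machine_id keeps the per-host information
available after the results have been rolled up.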

> - execution_line - command that runs the test set?

This I was on the fence about, but yes. I'm OK with dropping it, as
it's an artifact of the site deployment.

> We name our test runs.  There should be a place for something like that
> here.  The name of a test run is not necessarily unique, since a test run
> "release_candidate_1" might run in many different environments.
> I think that maybe job_id is a little like the test name that I am thinking
> of, but it is an arbitrary number that has no particular meaning to the
> human looking at the test reports.  That is, I could run a set of tests with
> the same UUID on each of several machines.  Is that right?

I think my putting in UUID is a matter of preference. We can simplify
it and call it a STR, or add job_name? In my case, it's globally
unique to the run. UUIDs are *very* unique.

> All of this is just information for the user to look at later.  Effectively,
> this first part is a record about the test execution that applies to all the
> test results that follow.

Yes

>> run_stats:
>>   total_tests: INT
>>   total_pass: INT
>>   total_fail: INT
>>   total_other: INT
>>
>
> Why are these totals here?  What does the reader do if these totals do not
> match the data that follows?

These should always match the data below: you calculate them from the
items below. This is another "nice to have" for human parsing.
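
As a sketch of "calculate them from the items below", assuming
test_cases is a dict keyed by test id and that results are the strings
"PASS" and "FAIL" plus whatever else a site uses (the exact result
codes are an assumption here, as is the function name):

```python
from collections import Counter

def compute_run_stats(test_cases):
    """Derive the run_stats totals from the per-test results."""
    counts = Counter(case["result"] for case in test_cases.values())
    total = sum(counts.values())
    passed = counts.get("PASS", 0)
    failed = counts.get("FAIL", 0)
    return {
        "total_tests": total,
        "total_pass": passed,
        "total_fail": failed,
        # Anything that is neither PASS nor FAIL lands in "other".
        "total_other": total - passed - failed,
    }

stats = compute_run_stats({
    "mod.Suite.test_one": {"result": "PASS"},
    "mod.Suite.test_two": {"result": "FAIL"},
    "mod.Suite.test_three": {"result": "ERROR"},
})
```

Because the totals are computed from the test entries, they cannot
disagree with them unless the file is edited afterwards.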

>>   start_time: FLOAT
>>   stop_time: FLOAT
>>   total_time: FLOAT
>>
>
> wall clock time as a UTC time_t.
> Are you thinking that maybe ( stop_time - start_time ) != total_time?  I
> didn't think of that because it doesn't happen in any of my scenarios.  I
> can see how it might happen to some people, though.

I suggested dropping total_time later, and I am thinking those values
can be in whatever form someone likes. In my case, they're calls to
time.time(), which returns floating point seconds; that makes sense as
it should be easy to parse/output.
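
For illustration, recording the two timestamps with time.time() might
look like this. The wrapper name timed_run and the callable it takes
are hypothetical, used only to keep the sketch self-contained.

```python
import time

def timed_run(run_tests):
    """Call run_tests() and return wall-clock start/stop as float seconds."""
    start_time = time.time()
    run_tests()
    stop_time = time.time()
    return {"start_time": start_time, "stop_time": stop_time}

# A do-nothing callable stands in for the real test run.
times = timed_run(lambda: None)
elapsed = times["stop_time"] - times["start_time"]
```

With both timestamps stored, a consumer can work out the elapsed time
itself, which is one reason a separate total_time field is droppable.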

> I don't collect the total time of the entire test suite.  In my system, what
> most people would call a test suite will run as several mostly-unrelated
> processes, with the results from all the subsets aggregated at the end.
>  There are lots of ways I could use start/stop/total time, but I don't
> really care enough.

Yeah, I can see that. I like to track the time a suite of tests takes
"overall" so I can spot strong degradation.

> I suggest that all times be optional.

I would argue that this is the expansive list: technically, most of
this could be optional depending on the site deployment. I think this
is going to be a case where we outline "the perfect amount of
information", but people can insert None values where applicable.

> The times and stats are just for the user to look at.  Maybe you have a way
> to compare yesterday's times to today's to see trends.

I do; I can look at results from months ago to see how long execution
of a given suite took. Provided the tests have not changed, increased
time can be a "rollup" sign of system degradation.
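
A sketch of that kind of comparison, assuming each stored run keeps
start_time and stop_time as float seconds. The function name slowdown
is made up for this example, and what ratio counts as "degradation" is
left to the site.

```python
def slowdown(previous, current):
    """Return current duration divided by previous duration.

    Both arguments are dicts with "start_time" and "stop_time" in seconds.
    Returns None when the previous duration is not positive, since no
    meaningful ratio exists in that case.
    """
    prev = previous["stop_time"] - previous["start_time"]
    cur = current["stop_time"] - current["start_time"]
    if prev <= 0:
        return None
    return cur / prev

ratio = slowdown(
    {"start_time": 0.0, "stop_time": 10.0},     # older run: 10 seconds
    {"start_time": 100.0, "stop_time": 115.0},  # newer run: 15 seconds
)
```

Here the newer run took 1.5 times as long as the older one.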

>> test_cases:
>>   test:
>>       id: STR
>>       result: STR
>>       result_id: STR
>>       start: FLOAT (time.time())
>>       stop: FLOAT (time.time())
>>       total_time: FLOAT (seconds)
>>       additional:
>>           coverage_info: Big str
>>           stdout: Big Str
>>           stderr: Big str
>>
>
> What is the significance of nesting test inside test_cases?  Does the name
> "test_cases" mean something specific?

test_cases is an indicator of "what tests were run for this job" - you
could call it "tests" too, I just don't like having "tests" and then
"test" - each execution can comprise multiple tests executing. Or one.

> "id" is the name of the test.  "id" is unique across what scope?  In
> pandokia, we called this the test_name, and they are arranged in a
> hierarchy.  We use "/" or "." to separate levels.  e.g. if there is a test
> named cluster/subsystem_a/test1 - the reporting system knows that every name
> that starts cluster/subsystem_a/ is related for display purposes.  A
> particular installation may not make use of this, but the naming convention
> should be part of the protocol.  We used "/" because part of the hierarchy
> might come from a directory/filename and part might be a
> module.class.function name detected by nose.  On Windows, I'll probably just
> turn \ into /.

They must be unique enough to fit "dictionary" rules. For me, it's
module.TestSuite.test_case_name; users can use whatever names apply to
their deployment. They can pick, but the names must be unique. This is
a drawback I can see, but then again, I would change the name of a
test case to reflect alterations in behavior.
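
To make the "dictionary" rule concrete, here is a small sketch that
indexes tests by id and refuses duplicates. The function name
index_by_id and the list-of-dicts input are assumptions for the
example.

```python
def index_by_id(tests):
    """Build {id: test} from a list of test dicts; reject repeated ids."""
    indexed = {}
    for test in tests:
        test_id = test["id"]
        if test_id in indexed:
            raise ValueError("duplicate test id: %r" % (test_id,))
        indexed[test_id] = test
    return indexed

by_id = index_by_id([
    {"id": "module.TestSuite.test_a", "result": "PASS"},
    {"id": "module.TestSuite.test_b", "result": "FAIL"},
])
```

Raising on a repeat is one choice; a more forgiving reader could warn
and keep the last entry instead.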

> result is pass/fail/error/whatever (I see your reply under the subject
> "Result protocol / pass-fail-error")

Yes

> What is result_id ?  How is it different from result?

Ignore that, we dropped it. I wanted it to be a numeric repr of the
PASS|FAIL|etc codes. I like numbers.

> Again, maybe ( total_time != stop - start ), times optional.

I think start and stop are key, but users can ignore them or set them
to None.

> coverage_info is something about which lines of code, which branches were
> taken, etc?  This is probably the most complicated part of the spec here,
> unless we just define it as a blob to be passed downstream.  If it is a
> blob, I suggest also adding coverage_info_type so that a downstream
> processor can know how to handle coverage_info from various sources.  We
> would define names for known types and have a convention for user-defined
> types.

Note the "additional" header was meant to indicate "add your
information here!": you can add anything you want that your downstream
consumer can parse. Same with "extended".
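
One way a consumer could treat "additional" as free-form, sketched
under the assumption that it arrives as a dict (or is missing/None).
The function name and the list of "known" keys are illustrative only.

```python
def read_additional(test, known_keys=("stdout", "stderr", "coverage_info")):
    """Split a test's "additional" block into known and site-specific keys."""
    additional = test.get("additional") or {}
    known = {k: additional[k] for k in known_keys if k in additional}
    extra = {k: v for k, v in additional.items() if k not in known_keys}
    return known, extra

known, extra = read_additional({
    "id": "module.TestSuite.test_a",
    "additional": {"stdout": "ok\n", "svn_uri": "placeholder"},
})
```

The consumer handles the keys it understands and passes the rest
through untouched, so sites can add fields without breaking anyone.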

> Should stdout/stderr really be separated?  Sometimes it is nice to see where
> in the test run a particular error occurred.  Sometimes that just turns in
> to a mess.

See above.

> It takes a few pages to describe what you can do with a test_result record,
> so I won't go in to it here.
>
>> extended:
>>   requestor_id: INT
>>   artifacts_uri: <your path here>
>>   artifacts_chksum: MD5
>>   configuration_file: (the config file passed in in my case)
>>
>
> "extended" is not connected to a specific test?  In that case, why is it
> separate from the set of fields at the top?

Extended means "your info here" - I gave some examples from my deployment.

> What is a requestor_id ?  I might expect the name of the person who ran the
> test, but you list an INT here.

Typo, should be str

...snip... - Extended is optional, so is advanced

> Vicki came up with "test configuration attributes" to store information
> gathered by examining the environment.  We don't have a single configuration
> file, and we potentially have lots of things we might gather information
> about.  I don't plan for the reporting system to do anything more than pass
> the information to the user, though.

Yeah, me too, but I'd add that to extended and additional in my deployment.

>
> Some things that I would add to this record format:
>
> - I run the same test on the same software in multiple test environments.
>  (Linux vs Solaris, Python 2.5 vs 2.6, etc)  I need a way to identify a set
> of results from the same test in all the different environments.  In
> pandokia, I called this test_run, which is a name that I explicitly assign
> to a group of related tests.  I suspect this is kind of like the job_id,
> except that I would not expect a field named "job_id" that is a "UUID" to
> have the same value on multiple machines.

Let's change UUID to String; it can be an arbitrary name that works
for the user.

> - I want a single reporting system that processes data from largely
> unrelated projects.  To do that, the report file must identify which project
> it came from.

project_id?

> - I have a "location" which is vaguely defined as "a string that contains
> information to help a human find this test definition".  This is kind of
> like artifacts_uri

I'd say that goes in the "additional" field of the test; in my case
it's an SVN URI.

> I think a lot of the fields you define could be optional.  I only explicitly
> mentioned the times, but I think lots of the others could too.  In fact, we
> could probably do better by listing the fields that are _not_ optional.
>  What do you think?

That argument could go both ways. In my case, I'd define *a lot more*,
but I could see someone else defining less than 50%. That's fine as
long as the parser is resilient to this, for example by doing
intelligent merging with what it considers the "complete" record:

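pseudo code:

import json
complete_record = json.load(open("path to protocol template"))
this_result = json.loads("some result JSON")
our_record = dict(complete_record)  # copy first: dict.update() returns None
our_record.update(this_result)      # fields in the result override the template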
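
A slightly fuller sketch of the same merge idea, handling nested
sections such as run_stats. The TEMPLATE contents and the function name
merge are assumptions picked for illustration; a real template would
list every field of the "complete" record.

```python
import copy
import json

# Hypothetical, heavily trimmed stand-in for the "complete" record.
TEMPLATE = {
    "job_id": None,
    "build_id": None,
    "run_stats": {"total_tests": None, "start_time": None, "stop_time": None},
    "test_cases": {},
}

def merge(template, result):
    """Overlay result on a copy of template, recursing into nested dicts."""
    merged = copy.deepcopy(template)
    for key, value in result.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged

this_result = json.loads('{"build_id": "r1234", "run_stats": {"total_tests": 3}}')
our_record = merge(TEMPLATE, this_result)
```

Fields the sender left out stay at None, so the consumer always sees
the full set of keys regardless of how much a site chose to report.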

jesse