[TIP] Result protocol / data content
Mark Sienkiewicz
sienkiew at stsci.edu
Mon Apr 13 12:12:50 PDT 2009
Jesse Noller wrote:
>
>> Maybe we should work out what this DSL "talks about" independent of
>> the serialization, and consider the serialization/formatting
>> separately? Or should we hash out first if it is going to be Latin-1
>> or UTF-8 or ??
>>
>
> That's what my goal is/was: define the fields in the file. It just so
> happens my example is somewhat valid YAML.
>
I'm with you here. It is _far_ more important to define the content of
each record than to choose a wire format.
I think the result record is a way to transmit data to your database.
This is true whether your database is objects in memory, dbm files, SQL
based, etc. -- the point stands regardless of how the data is stored or
accessed.
My #1 rule of databases:
#1. You don't put data into databases. You get data out of databases.
Inserting the data is just a derived requirement.
The important implication is this: To make an effective design, you look
at the final use of the data and work backward to the data sources.
Obviously, we all think about this implicitly, but we need to be
explicit about it from time to time.
With that in mind, I am going to skip right to the report record, but it
might help to point out what the data is used for.
Here is what I think when I read this record description. Am I getting
it right?
> job_id: UUID
> recv_time: FLOAT
> build_id: STR
> machine_id: STR (uuid?)
> execution_line: STR
>
Just from looking at these names, I assume:
- job_id is an arbitrary number. Presumably, you have some external
system that can do something with this number?
- recv_time is ? - time that the request was delivered to the test
runner. (distinct from the time that the test runner actually started
running tests.)
- build_id is ? - a version identifier for the subsystem being tested or
a version identifier of the test suite?
- machine_id identifies the physical hardware that ran this group of
tests; if you divide the test run among multiple machines, you must have
multiple result files? Or do you include multiple instances of the
top-level record, one for each machine? I'm going to expand pandokia's
knowledge from just "host" to both "host", the name of the machine, and
something like "execution environment", a name for the actual
environment the tests run in. There may be multiple execution
environments per host; I have not worked out all the semantics yet.
- execution_line - command that runs the test set?
We name our test runs. There should be a place for something like that
here. The name of a test run is not necessarily unique, since a test
run "release_candidate_1" might run in many different environments.
I think that maybe job_id is a little like the test name that I am
thinking of, but it is an arbitrary number that has no particular
meaning to the human looking at the test reports. That is, I could run
a set of tests with the same UUID on each of several machines. Is that
right?
All of this is just information for the user to look at later.
Effectively, this first part is a record about the test execution that
applies to all the test results that follow.
> run_stats:
> total_tests: INT
> total_pass: INT
> total_fail: INT
> total_other: INT
>
Why are these totals here? What does the reader do if these totals do
not match the data that follows?
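One answer is for the reader to treat run_stats as advisory and recompute the totals from the individual records. A minimal sketch, assuming the field names from Jesse's proposed record (which are themselves still up for discussion):

```python
# Recompute run totals from the individual test records and compare
# them against the run_stats block. Field names ("result", "run_stats",
# "test_cases", etc.) follow the proposed record and are assumptions.

def check_run_stats(report):
    tests = report.get("test_cases", [])
    counts = {"pass": 0, "fail": 0, "other": 0}
    for t in tests:
        result = t.get("result", "other")
        counts[result if result in counts else "other"] += 1
    stats = report.get("run_stats", {})
    expected = {
        "total_tests": len(tests),
        "total_pass": counts["pass"],
        "total_fail": counts["fail"],
        "total_other": counts["other"],
    }
    # Return (field, claimed, recomputed) for every disagreement.
    return [(f, stats[f], v) for f, v in expected.items()
            if f in stats and stats[f] != v]

# A report whose totals disagree with the data that follows:
report = {
    "run_stats": {"total_tests": 3, "total_pass": 3,
                  "total_fail": 0, "total_other": 0},
    "test_cases": [
        {"id": "a", "result": "pass"},
        {"id": "b", "result": "fail"},
        {"id": "c", "result": "pass"},
    ],
}
problems = check_run_stats(report)
```

Whether a mismatch is a warning or a hard error would itself need to be part of the spec.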
> start_time: FLOAT
> stop_time: FLOAT
> total_time: FLOAT
>
wall clock time as a UTC time_t.
Are you thinking that maybe ( stop_time - start_time ) != total_time? I
didn't think of that because it doesn't happen in any of my scenarios.
I can see how it might happen to some people, though.
I don't collect the total time of the entire test suite. In my system,
what most people would call a test suite will run as several
mostly-unrelated processes, with the results from all the subsets
aggregated at the end. There are lots of ways I could use
start/stop/total time, but I don't really care enough.
I suggest that all times be optional.
The times and stats are just for the user to look at. Maybe you have a
way to compare yesterday's times to today's to see trends.
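If the times are optional, a trend comparison just has to skip tests that lack them. A sketch under that assumption (record layout again taken from the proposal):

```python
# Compare per-test total_time between two runs to spot trends.
# Times are optional, so a test missing total_time in either run is
# silently skipped rather than treated as an error.

def time_deltas(yesterday, today):
    old = {t["id"]: t.get("total_time") for t in yesterday}
    deltas = {}
    for t in today:
        new_time = t.get("total_time")
        old_time = old.get(t["id"])
        if old_time is not None and new_time is not None:
            deltas[t["id"]] = new_time - old_time
    return deltas

yesterday = [{"id": "t1", "total_time": 2.0},
             {"id": "t2"}]                       # no time recorded
today = [{"id": "t1", "total_time": 3.5},
         {"id": "t2", "total_time": 1.0},
         {"id": "t3", "total_time": 0.5}]        # new test, no baseline
deltas = time_deltas(yesterday, today)
```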
> test_cases:
> test:
> id: STR
> result: STR
> result_id: STR
> start: FLOAT (time.time())
> stop: FLOAT (time.time())
> total_time: FLOAT (seconds)
> additional:
> coverage_info: Big str
> stdout: Big Str
> stderr: Big str
>
What is the significance of nesting test inside test_cases? Does the
name "test_cases" mean something specific?
"id" is the name of the test. "id" is unique across what scope? In
pandokia, we called this the test_name, and they are arranged in a
hierarchy. We use "/" or "." to separate levels. e.g. if there is a
test named cluster/subsystem_a/test1 - the reporting system knows that
every name that starts cluster/subsystem_a/ is related for display
purposes. A particular installation may not make use of this, but the
naming convention should be part of the protocol. We used "/" because
part of the hierarchy might come from a directory/filename and part
might be a module.class.function name detected by nose. On Windows,
I'll probably just turn \ into /.
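The naming convention above can be sketched in a few lines: normalize the separators, then let the reporting side group every test under a shared prefix. The test names here are made up for illustration:

```python
# Hierarchical test names: "/" separates levels, Windows "\" is
# normalized to "/", and the reporting system can relate every test
# under a given prefix for display purposes.

def normalize(test_name):
    return test_name.replace("\\", "/")

def group_prefixes(test_names):
    """Map each non-leaf prefix to the tests underneath it."""
    groups = {}
    for name in map(normalize, test_names):
        parts = name.split("/")
        for depth in range(1, len(parts)):
            prefix = "/".join(parts[:depth])
            groups.setdefault(prefix, []).append(name)
    return groups

names = [
    "cluster/subsystem_a/test1",
    "cluster\\subsystem_a\\test2",   # a Windows-style name
    "cluster/subsystem_b/test1",
]
groups = group_prefixes(names)
```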
result is pass/fail/error/whatever (I see your reply under the subject
"Result protocol / pass-fail-error")
What is result_id ? How is it different from result?
Again, maybe ( total_time != stop - start ), times optional.
coverage_info is something about which lines of code, which branches
were taken, etc? This is probably the most complicated part of the spec
here, unless we just define it as a blob to be passed downstream. If it
is a blob, I suggest also adding coverage_info_type so that a downstream
processor can know how to handle coverage_info from various sources. We
would define names for known types and have a convention for
user-defined types.
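The blob-plus-type idea might look like this on the consuming side. The type name and handler here are hypothetical, not part of any spec:

```python
# Downstream dispatch on a companion coverage_info_type field, treating
# coverage_info itself as an opaque blob. The "figleaf-pickle" name and
# the handler are made-up examples.

HANDLERS = {}

def coverage_handler(type_name):
    def register(func):
        HANDLERS[type_name] = func
        return func
    return register

@coverage_handler("figleaf-pickle")
def handle_figleaf(blob):
    return "figleaf data, %d bytes" % len(blob)

def process_coverage(record):
    ctype = record.get("coverage_info_type")
    blob = record.get("coverage_info", "")
    handler = HANDLERS.get(ctype)
    if handler is None:
        # Unknown type: pass the blob through untouched.
        return "unknown coverage type %r: passed through" % ctype
    return handler(blob)

summary = process_coverage({"coverage_info_type": "figleaf-pickle",
                            "coverage_info": "xxxx"})
```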
Should stdout/stderr really be separated? Sometimes it is nice to see
where in the test run a particular error occurred. Sometimes that just
turns into a mess.
It takes a few pages to describe what you can do with a test_result
record, so I won't go into it here.
> extended:
> requestor_id: INT
> artifacts_uri: <your path here>
> artifacts_chksum: MD5
> configuration_file: (the config file passed in in my case)
>
"extended" is not connected to a specific test? In that case, why is it
separate from the set of fields at the top?
What is a requestor_id ? I might expect the name of the person who ran
the test, but you list an INT here.
artifacts_uri is an arbitrary description of where I can find various
files left over after the test ran. artifacts_chksum is a checksum of
those files (in some deterministic order), not of the URI itself. I
would expect this to be attached to a specific test.
You have a single configuration file. Is this the name of the file or
the content of the file? Is this the configuration of the test system,
or of the items being tested?
Vicki came up with "test configuration attributes" to store information
gathered by examining the environment. We don't have a single
configuration file, and we potentially have lots of things we might
gather information about. I don't plan for the reporting system to do
anything more than pass the information to the user, though.
Some things that I would add to this record format:
- I run the same test on the same software in multiple test
environments. (Linux vs Solaris, Python 2.5 vs 2.6, etc) I need a way
to identify a set of results from the same test in all the different
environments. In pandokia, I called this test_run, which is a name that
I explicitly assign to a group of related tests. I suspect this is kind
of like the job_id, except that I would not expect a field named
"job_id" that is a "UUID" to have the same value on multiple machines.
- I want a single reporting system that processes data from largely
unrelated projects. To do that, the report file must identify which
project it came from.
- I have a "location" which is vaguely defined as "a string that
contains information to help a human find this test definition". This
is kind of like artifacts_uri.
I think a lot of the fields you define could be optional. I only
explicitly mentioned the times, but I think lots of the others could
too. In fact, we could probably do better by listing the fields that
are _not_ optional. What do you think?
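Listing only the required fields makes validation trivial. A sketch, where the particular required set is an assumption (choosing it is exactly the open question):

```python
# Validate a report against a small required-field list; everything
# else is optional. Which fields belong in these lists is the open
# design question; this set is only an example.

REQUIRED_TOP = ["machine_id", "test_cases"]
REQUIRED_PER_TEST = ["id", "result"]

def missing_fields(report):
    missing = [f for f in REQUIRED_TOP if f not in report]
    for i, test in enumerate(report.get("test_cases", [])):
        missing.extend("test_cases[%d].%s" % (i, f)
                       for f in REQUIRED_PER_TEST if f not in test)
    return missing

report = {"machine_id": "host-1",
          "test_cases": [{"id": "t1", "result": "pass"},
                         {"id": "t2"}]}   # missing result
problems = missing_fields(report)
```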
So, how well have I understood your report record?
Mark S.
More information about the testing-in-python mailing list