[TIP] Result protocol / parsers, wire format

Mon Apr 13 09:58:02 PDT 2009

On Mon, Apr 13, 2009 at 12:33 PM, Mark Sienkiewicz <sienkiew at stsci.edu> wrote:

> I chose it for these reasons:
>
> 1. Extremely simple to parse.  Compare with XML, where your first step is to
> spend 2 days trying to figure out how to use the XML library.

As opposed to writing a new parser any time you try to go and use it
someplace else. Nothing is ever easy to parse, which is why so much
work goes into XML/JSON/etc libraries - they tend to take into account
edge cases/errors no one expects.

Besides, I'm not trying to sandbag pandokia's format - I realize it
works for your use case. All I've been trying to say is "use something
that's out there already, and skip that work".

XML, JSON and YAML are also *standards*; I don't really want to
make another markup language :(

> 3. If you have a partial file written by a crashed test runner, you can
> append another file, and you can still read everything except the incomplete
> record at the end of the corrupted file.

The same can be said of supporting multiple YAML documents (...
format) or individual JSON objects written to a log.
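
A minimal sketch of the JSON-objects-in-a-log case, assuming one JSON
object per line and assuming whatever appends the second file starts on
a fresh line (both are illustrative choices, not something this thread
has settled). The function name and record fields are hypothetical. A
line that fails to parse, such as the half-written record a crashed
runner leaves behind, is skipped and reported; everything else loads.

```python
import json


def read_records(lines):
    """Parse one JSON object per line; report lines that do not parse."""
    records, bad_lines = [], []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
        except ValueError:  # json.JSONDecodeError is a ValueError subclass
            bad_lines.append(lineno)
            continue
        records.append(obj)
    return records, bad_lines


# Hypothetical log: the first runner died mid-record on line 2, then a
# second runner's output was appended starting on a new line.
log = (
    '{"test": "a", "status": "pass"}\n'
    '{"test": "b", "sta\n'
    '{"test": "c", "status": "fail"}\n'
)

records, bad_lines = read_records(log.splitlines())
```

Only the truncated record is lost; its line number ends up in
bad_lines so the reader can warn about it.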

> 4. It can contain field names that are chosen at run time by the test.

Look at my proposal: I have a field marked "additional" to allow for
arbitrary fields. I know there's a need for this (I do this myself).

> Currently, it only contains text fields except for start_time and end_time.

I'm a big fan of being able to include real types in the file, so that
less work has to go into the emitter/parser.
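
A rough illustration of real types, using Python's standard json
module. Only start_time, end_time and additional are names from this
thread; every other field name and all of the values are made up for
the example. Numbers, booleans and nulls come back from the parser as
native values, so nothing has to be converted from strings by hand.

```python
import json

record = {
    "name": "test_example",        # hypothetical field
    "status": "pass",              # hypothetical field
    "start_time": 1239641882.25,   # float seconds, not a string
    "end_time": 1239641883.75,
    "skipped": False,              # hypothetical field
    # Arbitrary fields chosen at run time live under "additional".
    "additional": {"retries": 2, "host": None},
}

# Round-trip through the wire format and use the values directly.
decoded = json.loads(json.dumps(record))
duration = decoded["end_time"] - decoded["start_time"]
```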

> For my purposes, JSON and YAML are a lot like pandokia's format, except:
>
> - braces before/after each record, quotes around strings
> - poor/no error recovery
> - nested records (which I do not need)
> - data types (though the only data type in JSON that I used is "string")

They're not the same thing! They're not even close, and they were
meant for different things. As for error recovery: it's possible to
get a malformed document in *any* format. I'm sure it's safe to say
the pandokia parser could get a malformed document, or at least a
highly corrupted one, and have issues with it.

> I looked at yaml.org and followed a few links there to find out about YAML.
>  My initial impression is YAML has these drawbacks:
>
> - "Warning: It is not safe to call yaml.load with any data received from an
> untrusted source!" (http://pyyaml.org/wiki/PyYAMLDocumentation)  This
> differs from JSON, which is always safe to load.

YAML, specifically PyYAML, has some additional things (which I like,
others might hate) which allow a degree of Python object creation on
load. For example, I can make it create datetime.datetime objects, and
other things, by defining them in the YAML file.

JSON does not support this, but given that JSON is (sort of) a subset
of YAML, I'm probably going to support loading either one. Sometimes I
want JSON for safety, sometimes I want YAML - that's a deployment
question in my mind.

However, the spec we're discussing should stick to pure JSON syntax,
and not pull in all of the magic YAML stuff I like.
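
One way to picture "pure JSON" is a check that a record holds only
values plain JSON can represent. This is a sketch under that
assumption, not part of any proposal; the function name is
hypothetical.

```python
import datetime
import math


def is_json_compatible(value):
    """Return True if value uses only types plain JSON can represent."""
    if value is None or isinstance(value, (str, bool, int)):
        return True
    if isinstance(value, float):
        # NaN and Infinity are not valid JSON numbers.
        return math.isfinite(value)
    if isinstance(value, list):
        return all(is_json_compatible(v) for v in value)
    if isinstance(value, dict):
        # JSON object keys must be strings.
        return all(
            isinstance(k, str) and is_json_compatible(v)
            for k, v in value.items()
        )
    return False


ok = is_json_compatible({"status": "pass", "times": [1.0, 2.5]})
not_ok = is_json_compatible({"when": datetime.datetime(2009, 4, 13)})
```

A datetime object fails the check, so a timestamp would have to be
written as a number or a string in the record itself.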

> - YAML seems to have more of a learning curve associated with it.  At least,
> that is my impression from looking over the yaml and json web sites.  (Maybe
> the yaml documentation just isn't as clear.)

YAML, in the normal case, is stupid simple. Only when you need the
advanced stuff do you need to know the magic. Sort of like a
programming language I know.

> If YAML has reasonable error recovery, it might be worth the learning curve.
>  Otherwise, JSON might be more appropriate, just for the simplicity.  Of
> course, YAML parsers can read JSON, so if we emit JSON, then a YAML
> application can read it.

See above: in a parser I might build out, I would support JSON mode or
YAML mode. The two are close enough to one another that you don't have
to worry in many cases.
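
A sketch of what a two-mode loader could look like. The function name
and the mode argument are hypothetical. YAML mode assumes the
third-party PyYAML package is installed and uses yaml.safe_load, which
restricts loading to standard YAML tags rather than arbitrary Python
objects; the import is deferred so JSON mode has no extra dependency.

```python
import json


def load_result(text, mode="json"):
    """Parse one result document in JSON mode or YAML mode."""
    if mode == "json":
        return json.loads(text)
    if mode == "yaml":
        import yaml  # PyYAML; only needed when YAML mode is requested
        return yaml.safe_load(text)
    raise ValueError("unknown mode: %r" % (mode,))


result = load_result('{"status": "pass", "count": 3}')
```

Which mode to run in stays a deployment decision; the caller picks it
and the record layout is the same either way.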

However, the spec should stick to JSON-compatible fields, as the
advanced YAML stuff doesn't have a place in this.

jesse