[TIP] Announce: Pygora 0.0.1
Ronny Pfannschmidt
Ronny.Pfannschmidt at gmx.de
Sun Oct 17 08:36:30 PDT 2010
On Sat, 2010-10-16 at 13:33 -0400, Alfredo Deza wrote:
>
>
> On Sat, Oct 16, 2010 at 1:12 PM, Ronny Pfannschmidt
> <Ronny.Pfannschmidt at gmx.de> wrote:
>
> On Sat, 2010-10-16 at 12:11 -0400, Alfredo Deza wrote:
> >
> >
> > On Fri, Oct 15, 2010 at 8:23 PM, Ronny Pfannschmidt
> > <Ronny.Pfannschmidt at gmx.de> wrote:
> >
> > On Sun, 2010-02-28 at 16:38 -0500, Alfredo Deza wrote:
> > > After Ned Batchelder's talk in Python I spent a few minutes talking
> > > about how nice it is to have as much (or even more!) test code lines
> > > than source code lines.
> > >
> > > I have put together a small script that will count the lines of code
> > > in a given directory and will compare them to your test code lines
> > > with a nice output (e.g. percentage, total lines, lines per file
> > > etc...).
> > >
> > > Ideally, this could become a plugin for Nose (or py.test), but for
> > > now, it will create an executable symlink so you can run it anywhere.
> > >
> > > The project page: http://code.google.com/p/pygora/
> > >
> > > I am using distribute to package Pygora, you can install it by:
> > >
> > > sudo easy_install pygora
> > >
> > > And, if you were not wondering already, the name comes from a *goat*:
> > > Pygora Goat is a cross between the Pygmy Goat and the Angora Goat that
> > > produces three distinct kinds of fleece and has the smaller size of
> > > the Pygmy.
> > > Source: Wikipedia
> > >
> > >
> > > Pygora could do better in telling what lines to count with a nice
> > > regex, but this first release will get you an idea, enhancements will
> > > follow.
> >
> >
> > you might want to take a look at py.countloc which is shipped by py
> > (and will be in pycmd in future)
> >
> > py.countloc provides way better numbers (imho)
> >
> >
> >
> >
> > Since I had py.test installed I tried out py.countloc, and I have a
> > few comments...
> >
> >
> > The idea behind pygora is to have a comparison ratio, to know where
> > your test line numbers are related to those of your source code, this
> > is not provided by py.countloc
> >
> >
> > In the latest pygora version (you have actually replied to the first
> > announcement dated in February)
>
> my mistake, i was in the middle of working through the backlog of the
> ml, and mindlessly replied to the first mail instead of finding the
> latest one
>
>
> no big deal :)
>
> > speed was improved a lot, and in fact after benchmarking both tools
> > pygora takes almost half the time to go through a path and get the
> > metrics.
> >
> >
> > For example, in a directory with around 1.5 million lines of code
> > these are the times I got (best out of 3 runs):
> >
> >
> > py.countloc 4.04s user 0.50s system 99% cpu 4.572 total
> >
> >
> > pygora 2.52s user 0.66s system 97% cpu 3.253 total
>
> interesting timings, i think py.countloc was only optimized for accurate
> numbers, not raw speed, i'll take a look at that once my todo is sorted
> out
>
>
> Well, I think the speed improvements are a by-product of refactoring
> the previous version.
>
>
>
> >
> >
> > There is also no way to tell py.countloc to avoid printing *every*
> > single file it is counting. This is especially annoying for 1.5m lines
> > of code, also, no way of filtering or excluding certain files.
> >
> >
> > All of those options are available with pygora (e.g. verbose option,
> > filtering etc...) although implementing them in py.countloc shouldn't
> > be a hard thing to do.
> >
> >
> > Having said that, I find that the source line detection in py.countloc
> > seems a bit more accurate. I initially had trouble in deciding for
> > example if docstrings should count as part of the source code or not
> > (I ended up including them).
>
> that's why i wanted to point you at it after trying an easy_installed
> pygora
> >
> >
> > In my specific case, I like something fast that reports what I need.
> > If I want a ratio to be able to compare, this output seems too much
> > for me:
> >
> >
> > number of testfiles 794
> > number of non-empty testlines 158805
> > number of files 6906
> > number of non-empty lines 1342471
> >
> >
> > Compared to:
> >
> >
> > Test lines = 214669
> > Source lines = 1241449
> > Total lines = 1456118
> >
> >
> > Your test code is 17.29% of your source code
> >
>
> i guess i'm more interested in the other stats,
> usually the ratio of test lines vs tested code lines
> isn't that interesting (i tend to have more test lines than code lines)
> what's more interesting to me is the ratio of code/branch coverage
>
>
> you mention "ratio" but I just don't see any ratio with py.countloc,
> unless I am missing something...
I declared it uninteresting,
to be honest *I* consider it a false/misleading metric
(just about as wrong as lines of code for programmer productivity)
although it's occasionally interesting to know just for deciding where to
go in terms of code/test size vs features
>
>
> I'm also confused here about "code/branch coverage". What do you mean?
> this is *not* a coverage tool, it just counts lines and compares them.
sorry, I didn't mean to drag in the tooling types,
just express what *I* consider the
more important metrics around code and tests
(obviously they are solved by a different kind of tool)
currently *I* consider the code line/branch/state coverage
of a test-suite the most important metric,
(assuming correctness)
code size is secondary, although as usual the smaller the better
regards Ronny