[TIP] Which is the proper way to test an "unordered list"

Jorge Vargas jorge.vargas at gmail.com
Fri May 18 06:24:03 PDT 2012


Thank you all for the answers.

This got me thinking about my dataset
- The order is irrelevant, however it's good to have.
- There should be only one of each type but it is not really enforced.
Most checks here are in the for of "value in states" so if someone
makes the mistake of adding the same state twice it doesn't really
affect things.

So a set seems to be the proper construction here. That said I could
stick with a list as there is no real need of the overhead of sets (I
know it's little) but it's really irrelevant to keep it as the only
case where the list will contain the same value twice is by programmer
error. So i believe the sorted(list) solution makes the most sense. It
will catch errors as a state not being "declared" (that's what the
decorator does), catch if the same state is added twice but will not
really care about the order.

On Thu, May 17, 2012 at 7:16 PM, Andrew Dalke <dalke at dalkescientific.com> wrote:
> On May 17, 2012, at 3:30 AM, Jorge Vargas wrote:
>> My question is which will be the proper way to test for this when the
>> "order of the list" is not valid?
>
> The right answer depends on what the data set contains. Others have
> already suggested:
>
>  1) sorting; but that assumes that the objects can be compared
>      by inequality
>
>  2) converting to a set, but that assumes that objects are hashable,
>      and that duplicates can be ignored
>
> You can fix #2 by
>
>  3) use a collections.Counter; this still requires hashable objects
>       but doesn't have the count problem
>
>
> The only generic solution is:
>
>  4) Do the O(n*m) test of checking each item from one list to see
>       if it occurs in the other, removing matches as they are found
>
> Most people are uncomfortable with potentially quadratic worst-case
> times, but with only a couple of states, this isn't a problem.
>
>
> In your case, you can probably use the sets solution without a problem,
> especially if your "next" implementation verifies that there are no
> duplicates in the list. Or as Herman's commented, perhaps your
> implementation is more appropriate with a set in the first place?
>
> Cheers,
>
>                                Andrew
>                                dalke at dalkescientific.com
>
>
>
> _______________________________________________
> testing-in-python mailing list
> testing-in-python at lists.idyll.org
> http://lists.idyll.org/listinfo/testing-in-python



More information about the testing-in-python mailing list