[socal-piggies] Treaps vs heaps for best m of n problems

Mon Jan 25 16:38:51 PST 2010

Darius Bacon wrote:
>> From: Dan Stromberg <strombrg at gmail.com>
>>    2. For the other application (not yet on the web) named gprog, I
>>       believe the sometimes useful trick of storing a minheap of the m
>>       largest values wouldn't work because you often need to evict old,
>>       weak values and also often need to get a copy of the strongest
>>       value - that is, you need both the least and the greatest to
>>       obtain a safe blocksize choice and a new blocksize addition). 
>>       gprog is a sort of bandwidth meter for pipes - a GUI one that uses
>>       a cache oblivious algorithm to avoid becoming a bottleneck... So
>>       it forms lots of guesses as to good transfer size and keeps around
>>       the best m guesses out of n (where n is the number of block size
>>       guesses tried) for later   It also benefits from being able to
>>       detect collisions between the randomly generated current transfer
>>       size, and a previously-guessed transfer size.
>>     
>
> Ah, so I was missing context -- sorry. It's neat how the treap can
> handle this well while still relatively simple to code.
>
> I wonder how it compares to a dedicated double-ended-priority-queue
> structure like
> http://www.diku.dk/forskning/performance-engineering/Jesper/heaplab/heapsurvey_html/node11.html
> but you might have covered that too and I really should have come to
> the meeting! I get the impression it'd use less space by omitting the
> random priorities, and run updates in comparable time, but
> deterministically and with O(1) for min and max; OTOH it's probably
> not worth the bother coding up when a balanced-tree structure like
> treaps has more uses -- like detecting collisions in this case.
>
>   
>> I've today added a third mode to "highest" that uses heapq (not yet on 
>> the web); a pair of performance graphs should complete within a few 
>> days.  /Initial/ indications are that, at least for varied n, heapq 
>> yields an implementation that's in between the insort and treap 
>> implementations in performance, though closer to the insort times than 
>> the treap times.
>>     
>
> Interesting.
>
> Darius
>
> _______________________________________________
> socal-piggies mailing list
> socal-piggies at lists.idyll.org
> http://lists.idyll.org/listinfo/socal-piggies
>   
I've finally got the performance comparison done:

http://stromberg.dnsalias.org/~dstromberg/highest/

I believe that a double ended priority queue implemented as a minmaxheap 
would give very nice lookups of the best and worst values, but O(n) 
lookups of arbitrary values.  Also, looking up arbitrary values would 
seem to defeat the abstraction.

It's quite possible that the heap would perform closer to the treap if I 
were storing instances of a class in the treap too.  As it is now, I'm 
storing tuples in the treap and instances of a class in the heap.  The 
thing is, the treap's additional flexibility makes it more viable to use 
a less flexible (and faster) tuple instead of a class.