[socal-piggies] Treaps vs heaps for best m of n problems
Darius Bacon
darius at accesscom.com
Sat Jan 16 12:12:35 PST 2010
> From: Dan Stromberg <strombrg at gmail.com>
>
> It was suggested that a heap or priority queue might work well for what
> I was using treaps for.
>
> Reminder: n is the number of files examined, m is the number of the
> largest files you're interested in, and m is often much smaller than n.
>
> IINM, a heap would allow a nice O(n*log(n)) algorithm, but wouldn't work
> that well for an O(n*log(m)) - because I believe that finding and
> evicting the smallest value in a max heap is O(n). In "highest", the
> smallest value is evicted after an insertion, to keep the datastructure
> from becoming larger than it needs to be - we can do this, because we
> know we only care about the 1000 (or whatever) largest values.
I'm likely misunderstanding since I missed the meeting, but:
Isn't the idea to keep a *min*-heap of the up-to-m largest values seen
so far? On considering each of the n inputs, you compare to the
smallest of the m largest values so far and replace it if the current
input is larger. That's O(n*log(m)) time with a good heap type like
heapq.
Darius
More information about the socal-piggies
mailing list