[ged] Journal Club paper for next week

Jason Pell jason.pell at gmail.com
Tue Sep 13 07:46:57 PDT 2011


Also, can someone do me a favor and bring one of the adapters for the
projector so that I can connect my MacBook Air to it? I don't have
one...

Thanks,
Jason

2011/9/12 Jason Pell <jason.pell at gmail.com>:
> I have room 2235 reserved every other week starting tomorrow at 1pm.
> See you then!
>
> Jason
>
> 2011/9/6 Jason Pell <jason.pell at gmail.com>:
>> Hi everyone,
>>
>> Next week, I will be presenting the paper "Efficient construction of
>> an assembly string graph using the FM-index" by Jared Simpson (author
>> of ABySS assembler) and Richard Durbin (well-known bioinformatician).
>> Abstract and link below.
>>
>> Motivation: Sequence assembly is a difficult problem whose importance
>> has grown again recently as the cost of sequencing has dramatically
>> dropped. Most new sequence assembly software has started by building a
>> de Bruijn graph, avoiding the overlap-based methods used previously
>> because of the computational cost and complexity of these with very
>> large numbers of short reads. Here, we show how to use suffix
>> array-based methods that have formed the basis of recent very fast
>> sequence mapping algorithms to find overlaps and generate assembly
>> string graphs asymptotically faster than previously described
>> algorithms.
>>
>> Results: Standard overlap assembly methods have time complexity O(N2),
>> where N is the sum of the lengths of the reads. We use the
>> Ferragina-Manzini index (FM-index) derived from the Burrows-Wheeler
>> transform to find overlaps of length at least τ among a set of reads.
>> As well as an approach that finds all overlaps then implements
>> transitive reduction to produce a string graph, we show how to output
>> directly only the irreducible overlaps, significantly shrinking memory
>> requirements and reducing compute time to O(N), independent of depth.
>> Overlap-based assembly methods naturally handle mixed length read
>> sets, including capillary reads or long reads promised by the third
>> generation sequencing technologies. The algorithms we present here
>> pave the way for overlap-based assembly approaches to be developed
>> that scale to whole vertebrate genome de novo assembly.
>>
>> http://bioinformatics.oxfordjournals.org/content/26/12/i367.abstract
>>
>> Jason
>>
>



More information about the ged-jclub mailing list