[khmer] partitioning pipeline output, fastq

Jens-Konrad Preem jpreem at ut.ee
Fri May 24 07:47:13 PDT 2013


On 05/24/2013 05:40 PM, Jordan Fish wrote:
> One thing I'd recommend is to do your mate-pair merging -before-
> diginorm and partitioning.  Feed the reads that merge successfully into
> diginorm first, in line with the "put your best data in first" principle.
>
> Jordan
>
> On Fri, May 24, 2013 at 9:06 AM, C. Titus Brown <ctb at msu.edu> wrote:
>> Correct!
>>
>> ---
>> C. Titus Brown, ctb at msu.edu
>>
>> On May 24, 2013, at 8:59, Jens-Konrad Preem <jpreem at ut.ee> wrote:
>>
>>> Hi,
>>> This is similar to my question about the filter-below-abund.py output, which has already been solved. Thanks!
>>> The input and output of the partitioning pipeline, as described in your guide and in the example of partitioning large datasets on your website, are FASTA-formatted files. The next step for the partitioned data would be assembly. I am thinking of pre-assembling the mate pairs with FLASH* before the full assembly with SOAPdenovo2 or Velvet. The input files for FLASH are FASTQ.
>>>
>>> Do I understand correctly that nothing happens to the sequences themselves during partitioning, and that they are just binned/sorted into groups/partitions?
>>> In that case, it should be no problem for me to take the quality scores from the filter-below-abund.py FASTQ output (the sibling of the filter-below-abund.py FASTA output :D) and just attach them to the partitioned sequences?
>>>
>>> Jens
>>>
>>>
>>>
>>> * They seem to imply that genome assembly further down the line would be remarkably improved, at least in the case of SOAPdenovo. Maybe that is not the case for Velvet, the assembler you have suggested?
>>> Magoč, T., & Salzberg, S. L. (2011). FLASH: fast length adjustment of short reads to improve genome assemblies. Bioinformatics (Oxford, England), 27(21), 2957–63. doi:10.1093/bioinformatics/btr507
>>>
>>> --
>>> Jens-Konrad Preem, MSc., University of Tartu
>>>
>>>
>>> _______________________________________________
>>> khmer mailing list
>>> khmer at lists.idyll.org
>>> http://lists.idyll.org/listinfo/khmer
That is a good idea. So what do you think of the following pipeline?
1. quality control (I was thinking of Musket*)
2. merging pairs (FLASH)
3. diginorm and partitioning as per "partitioning large datasets" (feeding
   in both the merged reads and the unmerged ones)
4. assembly (considering SOAPdenovo2 or Velvet)
Jens
*http://musket.sourceforge.net/homepage.htm#latest
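Jordan's "best data first" advice matters because digital normalization is order-dependent: whether a read is kept depends on the reads that came before it. The toy sketch below is not khmer's implementation. It assumes the keep rule of khmer's normalize-by-median.py (keep a read only if the median abundance of its k-mers, counted over the reads kept so far, is below a cutoff) and swaps khmer's probabilistic counting table for exact counts in a Python `Counter`. The names `kmers` and `toy_diginorm`, and the example reads, are hypothetical.

```python
from collections import Counter
from statistics import median
from typing import Iterable, List


def kmers(seq: str, k: int) -> List[str]:
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def toy_diginorm(reads: Iterable[str], k: int, cutoff: int) -> List[str]:
    """Toy digital normalization (hypothetical helper): keep a read only if
    the median count of its k-mers, over the reads kept so far, is below
    `cutoff`. Exact counts stand in for khmer's probabilistic table."""
    counts: Counter = Counter()
    kept: List[str] = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # shorter than k: nothing to score, so this sketch skips it
        if median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)  # only kept reads contribute to the counts
    return kept
```

With `k=4` and `cutoff=1`, feeding the merged read `ACGTACGGTTCA` first keeps it and discards its two overlapping mates `ACGTACGG` and `CGGTTCA`; feeding the mates first keeps both short reads and discards the longer merged read.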

-- 
Jens-Konrad Preem, MSc., University of Tartu
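Since partitioning only bins reads without altering them, quality scores can be recovered by matching read names between the FASTQ and the partitioned FASTA. A minimal sketch, assuming every partitioned read appears under the same name in the FASTQ, that FASTQ records are four lines each, and that any partition annotation on the FASTA name line follows the read name after whitespace (an assumption about khmer's output; check your own files). The function names are hypothetical, and the sequence check guards the "nothing happens to the sequences" assumption.

```python
from typing import Iterable, Iterator, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line
        qual = next(it).rstrip("\n")
        # Assumption: the read name is the first whitespace-delimited token.
        yield header[1:].split()[0], seq, qual


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence), joining wrapped sequence lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def reattach(fastq: Iterable[str], fasta: Iterable[str]) -> Iterator[str]:
    """Yield FASTQ records for the partitioned reads (hypothetical helper)."""
    # Holds every FASTQ read in memory; fine for a sketch, heavy for big data.
    quals = {name: (seq, qual) for name, seq, qual in read_fastq(fastq)}
    for header, seq in read_fasta(fasta):
        name = header.split()[0]  # ignore any trailing partition annotation
        orig_seq, qual = quals[name]  # KeyError if the read is missing
        if orig_seq != seq:
            raise ValueError(f"sequence for {name} differs between files")
        # Keep the full FASTA header so the partition annotation survives.
        yield f"@{header}\n{seq}\n+\n{qual}\n"
```

With file handles `fq`, `fa` and `out` opened on your own files, `out.writelines(reattach(fq, fa))` would write the partitioned reads as FASTQ. For a large dataset, sorting both files by read name or using an on-disk index would avoid loading the whole FASTQ into memory.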