[khmer] partitioning pipeline output, fastq

Jens-Konrad Preem jpreem at ut.ee
Fri May 24 05:59:58 PDT 2013


Hi,
this is similar to my question about the filter-below-abund.py output, which
has already been solved. Thanks!
The input and output of the partitioning pipeline, as described in your
Guide and in the example of partitioning large data on your website, are
fasta-formatted files. The next step for the partitioned data would be
assembly. I am thinking of pre-assembling (merging) the mate pairs with
FLASH* before the full assembly with SOAPdenovo2 or Velvet. The input
files for FLASH are fastq.

Do I understand correctly that nothing happens to the sequences
themselves during partitioning, and that they are just binned/sorted
into groups/partitions?
If so, it should be no problem for me to take the quality scores from
the fastq output of filter-below-abund.py (the sibling of its fasta
output :D) and simply attach them to the partitioned sequences?
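If the sequences really are only binned, re-attaching qualities amounts to a lookup by read name. Below is a minimal Python sketch under three assumptions that should be checked against the actual files: (1) each partitioned fasta header is the original read name followed by whitespace (assumed a tab) and the partition ID; (2) read names are unique; (3) the fastq is the filter-below-abund.py output the partitions were built from. The names `read_fastq`, `read_fasta` and `reattach_qualities` are hypothetical helpers, not khmer functions. The sketch also refuses to continue if a sequence differs from its fastq original, which tests the "nothing happens to the sequences" assumption directly.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it, "").rstrip("\n")
        plus = next(it, "")
        qual = next(it, "").rstrip("\n")
        if not plus.startswith("+") or len(qual) != len(seq):
            raise ValueError(f"malformed FASTQ record: {header.strip()}")
        # Key on the first whitespace-delimited token of the header.
        yield header[1:].split()[0], seq, qual


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (full header, sequence), joining multi-line sequences."""
    name, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def reattach_qualities(fastq_lines: Iterable[str],
                       fasta_lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTQ records for the partitioned reads, in partition order."""
    # Index every read by name; memory grows with the size of the FASTQ.
    quals: Dict[str, Tuple[str, str]] = {
        name: (seq, qual) for name, seq, qual in read_fastq(fastq_lines)
    }
    for header, seq in read_fasta(fasta_lines):
        # ASSUMPTION: the partition ID follows the read name after
        # whitespace, so the first token is the original read name.
        name = header.split()[0]
        if name not in quals:
            raise KeyError(f"{name} not found in the FASTQ input")
        orig_seq, qual = quals[name]
        if orig_seq.upper() != seq.upper():
            raise ValueError(f"sequence of {name} changed during partitioning")
        # Keep the full header so the partition ID is not lost.
        yield f"@{header}\n{seq}\n+\n{qual}\n"
```

With real files one could pass open file handles and write the result with `out.writelines(reattach_qualities(fq_handle, part_handle))`. Holding all reads in a dict is the simplest approach; for very large data sets an on-disk index of the fastq would keep memory use down.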

Jens



* The FLASH authors seem to imply that genome assembly further down the
line would be remarkably improved, at least in the case of SOAPdenovo.
Maybe that is not the case for Velvet, the assembler you have suggested?
Magoč, T., & Salzberg, S. L. (2011). FLASH: fast length adjustment of 
short reads to improve genome assemblies. Bioinformatics (Oxford, 
England), 27(21), 2957–63. doi:10.1093/bioinformatics/btr507

-- 
Jens-Konrad Preem, MSc., University of Tartu
