[khmer] filter-below-abund.py fastq scores from previous file

Jens-Konrad Preem jpreem at ut.ee
Thu Apr 11 00:05:07 PDT 2013


I eagerly await such functionality, but right now I'm in a bit of a 
hurry. If I find some free time I may decipher the "below-abund" and 
screed code and make my own changes to it; if so, I'll post them. 
For now I've written a Perl script (slowish, though) that gathers the 
scores from keep and assigns them to the names/seqs of keep.below. If 
the sequences are trimmed, then I'm of course in trouble again. Are they 
trimmed in some specific way, from the ends or something? If so, it 
might not be too hard to add trimming to my script so that the scores 
correspond to the sequences.
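
The score-restoring step described above could be sketched roughly as 
follows in Python. This is only an illustration under stated assumptions, 
not how khmer or the Perl script actually does it: the function names 
(read_fastq, read_fasta, restore_qualities) are hypothetical, record names 
are assumed to be identical in both files, the FASTQ is assumed to use 
four lines per record, and each trimmed sequence is assumed to be a 
contiguous, case-identical piece of its original read. The whole FASTQ is 
held in memory, which may not suit very large files.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header[1:].strip(), seq, qual


def read_fasta(handle):
    """Yield (name, sequence) from a FASTA stream; sequences may span lines."""
    name, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def restore_qualities(fastq_handle, fasta_handle, out_handle):
    """Write FASTQ records for the FASTA sequences, reusing original scores.

    Assumption: each (possibly trimmed) FASTA sequence occurs verbatim inside
    the original read of the same name. Records that cannot be matched are
    skipped rather than guessed at.
    """
    originals = {n: (s, q) for n, s, q in read_fastq(fastq_handle)}
    for name, trimmed in read_fasta(fasta_handle):
        if name not in originals:
            continue
        seq, qual = originals[name]
        start = seq.find(trimmed)  # 0 if only the 3' end was trimmed
        if start < 0:
            continue
        scores = qual[start:start + len(trimmed)]
        out_handle.write("@%s\n%s\n+\n%s\n" % (name, trimmed, scores))
```

With real files, the three handles would be ordinary open() file objects 
for the original FASTQ, the filtered FASTA, and the new FASTQ output.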

In case I fail in my task, do you have any suggestions for paired-end 
assemblers that don't take quality scores? I've tried Cope, but for some 
reason it produced very few alignments (compared to Flash, for 
example). I'd like to merge the normalized (and hopefully partitioned) 
pairs before assembly with the likes of Velvet or SOAPdenovo. If all 
goes sideways, what are your thoughts on a) assembly without merging 
pairs, or b) merging pairs (with Flash etc.) before any modification by 
khmer (normalisation, partitioning)?

Jens-Konrad

On 04/11/2013 03:08 AM, Eric McDonald wrote:
> I believe that the 'filter-below-abund.py' script trims sequences. So, 
> your Perl script may need to also truncate the quality scores line 
> down to the length of the trimmed sequences.
>
> By the way, we are working on getting the scripts to output FASTQ if 
> they receive FASTQ inputs, but that functionality is not ready yet. 
> You're definitely not the only person interested in that 
> functionality. ;-)
>
> Hope that helps,
>   Eric
>
>
>
> On Wed, Apr 10, 2013 at 8:13 AM, Jens-Konrad Preem <jpreem at ut.ee 
> <mailto:jpreem at ut.ee>> wrote:
>
>     Hi,
>     I have just a quick question. Filter-below-abund takes a fastq
>     file and outputs a fasta file.
>     Can I make use of a Perl script that would take the names from the
>     resulting file and add the quality scores from the previous file?
>     As I understand nothing happens to the names or sequences - some
>     of them just get culled.
>
>     I want to try out the normalized data with some paired end reads
>     assemblers that use quality scores/fastq files.
>     I think it's easier for me to write such a Perl script than to modify
>     filter-below-abund.py to output fastq.
>
>     Not much of a python guy - though it seems that there shouldn't be
>     too much work on replacing screed.fasta with screed.fastq etc.,
>     but I find it is quite often easier to write a few lines than to
>     parse what someone else wrote and why and then try to modify it :D.
>
>     Jens-Konrad Preem, MSc., University of Tartu
>
>
>
>
>     _______________________________________________
>     khmer mailing list
>     khmer at lists.idyll.org <mailto:khmer at lists.idyll.org>
>     http://lists.idyll.org/listinfo/khmer
>
>
>
>
> -- 
> Eric McDonald
> HPC/Cloud Software Engineer
>   for the Institute for Cyber-Enabled Research (iCER)
>   and the Laboratory for Genomics, Evolution, and Development (GED)
> Michigan State University
> P: 517-355-8733

-- 
Jens-Konrad Preem, MSc, University of Tartu
