[khmer] abundance-dist for paired end data

Fri Oct 18 06:34:22 PDT 2013

Interestingly I ran into the same problem today.

I was also trying to use abundance-dist.py for two pairs of paired-end files.

As described by Nacho, load-into-counting had no problem taking multiple files. However, when I tried to do the same with abundance-dist by running something like:

$abundance-dist.py -z test.kh pair1_1.fq.gz pair1_2.fq.gz pair2_1.fq.gz pair2_2.fq.gz test.hist

It complained that: 
usage: abundance-dist.py [-h] [-z] [-s] hashname datafile histout
abundance-dist.py: error: unrecognized arguments: pair2_1.fq.gz pair2_2.fq.gz test.hist

Then I grouped the fq files:
$abundance-dist.py -z test.kh 'pair1_1.fq.gz pair1_2.fq.gz pair2_1.fq.gz pair2_2.fq.gz' test.hist
It ran for a second and aborted with the following output:

hashtable from test.kh
K: 19
HT sizes: [4000000007, 4000000009, 4000000019, 4000000063]
outputting to test.hist
preparing hist...
python: parsers.cc:308: FastqGzParser::FastqGzParser(const std::string&): Assertion `current_read.name[0] == '@'' failed.
Aborted

I checked my fq files and the reads do begin with '@'. Any idea what is actually going on?
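
One thing that may be worth ruling out: when the four names are wrapped in quotes, the shell hands them to the script as a single argument, so abundance-dist.py is asked to open one file whose name contains spaces. If no such file exists, the parser may have nothing to read, which is one possible (unconfirmed) explanation for the assertion. Below is a minimal sketch for checking this outside of khmer; the helper name first_char_status is hypothetical and is not part of khmer.

```python
import gzip
import os


def first_char_status(path):
    """Return 'missing', 'empty', or the first character of a gzipped file.

    Hypothetical helper for illustration only; not part of khmer.
    """
    if not os.path.isfile(path):
        return "missing"
    with gzip.open(path, "rt") as handle:
        first = handle.read(1)
    return first if first else "empty"


if __name__ == "__main__":
    # The quoted form reaches the script as ONE argument, spaces included.
    quoted = "pair1_1.fq.gz pair1_2.fq.gz pair2_1.fq.gz pair2_2.fq.gz"
    # Check the quoted string as a single name, then each file on its own.
    for name in [quoted] + quoted.split():
        print(repr(name), "->", first_char_status(name))
```

If the individual files report '@' while the quoted name reports 'missing', the FASTQ files themselves are probably fine and the problem is only in how the argument was passed.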

Thanks ahead and happy Friday!

Cheers,
Huan
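
P.S. For anyone landing on this thread later: the interleaving workaround suggested in the replies quoted below could be sketched roughly as follows. This is only an illustration under a few assumptions: the files are gzipped FASTQ with plain four-line records, the mates are in the same order in both files, and the function names (read_fastq, interleave) and the output name all_pairs.fq.gz are made up for this example rather than taken from khmer.

```python
import gzip
import os
from itertools import zip_longest


def read_fastq(handle):
    """Yield FASTQ records as 4-line tuples (assumes unwrapped 4-line records)."""
    while True:
        lines = [handle.readline() for _ in range(4)]
        if not lines[0]:
            return  # clean end of file
        if not lines[3]:
            raise ValueError("truncated FASTQ record")
        # Normalize line endings so records never run together on output.
        yield tuple(line.rstrip("\r\n") + "\n" for line in lines)


def interleave(left_path, right_path, out_handle):
    """Write alternating records from two gzipped mate files; return pair count."""
    pairs = 0
    with gzip.open(left_path, "rt") as left, gzip.open(right_path, "rt") as right:
        for rec1, rec2 in zip_longest(read_fastq(left), read_fastq(right)):
            if rec1 is None or rec2 is None:
                raise ValueError("mate files have different numbers of reads")
            out_handle.writelines(rec1)
            out_handle.writelines(rec2)
            pairs += 1
    return pairs


if __name__ == "__main__":
    # File names from the command above; the output name is an assumption.
    mates = [("pair1_1.fq.gz", "pair1_2.fq.gz"),
             ("pair2_1.fq.gz", "pair2_2.fq.gz")]
    if all(os.path.exists(path) for pair in mates for path in pair):
        with gzip.open("all_pairs.fq.gz", "wt") as out:
            for left_path, right_path in mates:
                print(left_path, right_path, interleave(left_path, right_path, out))
```

The combined file can then be given as the single datafile argument, following the usage line printed above:

$ abundance-dist.py -z test.kh all_pairs.fq.gz test.hist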

On 10/18/13, "C. Titus Brown" wrote:
> There is no good way at the moment to give multiple sequence files to
> abundance-dist, sorry :(. You might be able to combine sample.hist files
> afterwards... hmm. But for now, I think interleaving is the best option,
> as Chris says.
> 
> best,
> --titus
> 
> On Wed, Oct 16, 2013 at 09:02:50PM +0000, Fields, Christopher J wrote:
> > Personally, I interleave the reads and run them together for all the analyses, then split them later (after filtering, etc). But it would be interesting to hear of other options?
> > 
> > chris
> > 
> > On Oct 16, 2013, at 3:54 PM, Nacho Caballero <nachocab at gmail.com> wrote:
> > 
> > 
> > I'm following the STAMPS tutorial <http://ged.msu.edu/angus/2013-hmp-assembly-webinar/exploring-stamps-data.html> and I created the bloom filter for my two paired end files
> > 
> > python load-into-counting.py -x 1e8 -k 20 sample.kh sample_1.fastq sample_2.fastq
> > 
> > Now I'm trying to generate the distribution, but there is no -p option, how should I do it?
> > 
> > python abundance-dist.py sample.kh ??.fastq sample.hist
> > 
> > _______________________________________________
> > khmer mailing list
> > khmer at lists.idyll.org
> > http://lists.idyll.org/listinfo/khmer
> > 
> 
> 
> -- 
> C. Titus Brown, ctb at msu.edu