[khmer] abundance-dist for paired end data

Fri Oct 18 07:19:47 PDT 2013

As mentioned earlier in this thread, abundance-dist.py can currently accept only one sequence file. "Grouping" the fq files into a single quoted argument, as you did, will not work either.

You can combine your fastq files into one and run abundance-dist.py, or you can run abundance-dist.py for each fastq file and combine the resulting .hist files.
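
For the first option, here is a minimal Python sketch. It is not part of khmer; the function name combine_fastq and the output name combined.fq are placeholders made up for illustration. It concatenates gzipped or plain FASTQ files into one uncompressed file, assuming the ".gz" suffix reliably marks the compressed inputs:

```python
import gzip


def combine_fastq(inputs, output):
    """Concatenate FASTQ files (gzipped or plain) into one plain FASTQ file.

    Hypothetical helper, not a khmer function.
    """
    with open(output, "wb") as out:
        for path in inputs:
            # Assumption: files ending in ".gz" are gzip-compressed.
            opener = gzip.open if path.endswith(".gz") else open
            last = b"\n"
            with opener(path, "rb") as src:
                # Copy in 1 MiB chunks so large read files are not
                # loaded into memory all at once.
                for chunk in iter(lambda: src.read(1 << 20), b""):
                    out.write(chunk)
                    last = chunk[-1:]
            # If a file lacks a trailing newline, add one so the next
            # file's first '@' header starts on its own line.
            if last != b"\n":
                out.write(b"\n")
```

After something like combine_fastq(["pair1_1.fq.gz", "pair1_2.fq.gz", "pair2_1.fq.gz", "pair2_2.fq.gz"], "combined.fq"), the single combined file can be passed as the one datafile argument:

$ abundance-dist.py -z test.kh combined.fq test.hist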

Hope this helps.

--
Qingpeng Zhang, qingpeng at msu.edu

On Oct 18, 2013, at 9:34 AM, Huan Fan <hfan22 at wisc.edu> wrote:

> Interestingly I ran into the same problem today.
> 
> I was also trying to use abundance-dist.py for two pairs of paired-end files.
> 
> As described by Nacho, load-into-counting had no problem taking multiple files. However, when I tried to do the same with abundance-dist by running something like:
> 
> $ abundance-dist.py -z test.kh pair1_1.fq.gz pair1_2.fq.gz pair2_1.fq.gz pair2_2.fq.gz test.hist
> 
> It complained that:
> usage: abundance-dist.py [-h] [-z] [-s] hashname datafile histout
> abundance-dist.py: error: unrecognized arguments: pair2_1.fq.gz pair2_2.fq.gz test.hist
> 
> Then I grouped the fq files into a single quoted argument:
> $ abundance-dist.py -z test.kh 'pair1_1.fq.gz pair1_2.fq.gz pair2_1.fq.gz pair2_2.fq.gz' test.hist
> It ran for a second and aborted with the following output:
> 
> hashtable from test.kh
> K: 19
> HT sizes: [4000000007, 4000000009, 4000000019, 4000000063]
> outputting to test.hist
> preparing hist...
> python: parsers.cc:308: FastqGzParser::FastqGzParser(const std::string&): Assertion `current_read.name[0] == '@'' failed.
> Aborted
> 
> I checked my fq files and the reads do begin with '@'. Any idea what is actually going on?
> 
> Thanks in advance and happy Friday!
> 
> Cheers,
> Huan
> 
> On 10/18/13, "C. Titus Brown" wrote:
>> There is no good way at the moment to give multiple sequence files to
>> abundance-dist, sorry :(. You might be able to combine sample.hist files
>> afterwards... hmm. But for now, I think interleaving is the best option,
>> as Chris says.
>> 
>> best,
>> --titus
>> 
>> On Wed, Oct 16, 2013 at 09:02:50PM +0000, Fields, Christopher J wrote:
>>> Personally, I interleave the reads and run them together for all the analyses, then split them later (after filtering, etc). But it would be interesting to hear of other options?
>>> 
>>> chris
>>> 
>>> On Oct 16, 2013, at 3:54 PM, Nacho Caballero <nachocab at gmail.com> wrote:
>>> 
>>> 
>>> I'm following the STAMPS tutorial <http://ged.msu.edu/angus/2013-hmp-assembly-webinar/exploring-stamps-data.html> and I created the bloom filter for my two paired-end files:
>>> 
>>> python load-into-counting.py -x 1e8 -k 20 sample.kh sample_1.fastq sample_2.fastq
>>> 
>>> Now I'm trying to generate the distribution, but there is no -p option. How should I do it?
>>> 
>>> python abundance-dist.py sample.kh ??.fastq sample.hist
>>> 
>>> _______________________________________________
>>> khmer mailing list
>>> khmer at lists.idyll.org
>>> http://lists.idyll.org/listinfo/khmer
>>> 
>> 
>> 
>> -- 
>> C. Titus Brown, ctb at msu.edu
>> 