[khmer] abundance-dist for paired end data
Huan Fan
hfan22 at wisc.edu
Fri Oct 18 06:34:22 PDT 2013
Interestingly, I ran into the same problem today.
I was also trying to use abundance-dist.py on two pairs of paired-end files.
As Nacho described, load-into-counting had no problem taking multiple files. However, when I tried to do the same with abundance-dist by running something like:
$abundance-dist.py -z test.kh pair1_1.fq.gz pair1_2.fq.gz pair2_1.fq.gz pair2_2.fq.gz test.hist
It complained that:
usage: abundance-dist.py [-h] [-z] [-s] hashname datafile histout
abundance-dist.py: error: unrecognized arguments: pair2_1.fq.gz pair2_2.fq.gz test.hist
Then I grouped the fq files:
$abundance-dist.py -z test.kh 'pair1_1.fq.gz pair1_2.fq.gz pair2_1.fq.gz pair2_2.fq.gz' test.hist
It ran for a second and then aborted with the following output:
hashtable from test.kh
K: 19
HT sizes: [4000000007, 4000000009, 4000000019, 4000000063]
outputting to test.hist
preparing hist...
python: parsers.cc:308: FastqGzParser::FastqGzParser(const std::string&): Assertion `current_read.name[0] == '@'' failed.
Aborted
I checked my fq files and the reads do begin with '@'. Any idea what is actually going on?
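
A guess at the cause, not verified against the khmer source: the usage line shows a single datafile argument, so the quoted string is probably treated as one file name that does not exist, and the parser fails on the first (empty) read. Following the interleaving suggestion quoted below, a minimal Python sketch for merging several pairs into one gzipped file could look like this. The function names and the output file name are hypothetical, and it assumes plain four-line FASTQ records with mates in the same order in both files:

```python
import gzip
from itertools import islice, zip_longest


def read_fastq(handle):
    """Yield FASTQ records as lists of four lines (assumes unwrapped records)."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) != 4 or not record[0].startswith("@"):
            raise ValueError("malformed FASTQ record near: %r" % record[:1])
        yield record


def write_record(out, record):
    """Write one record, making sure every line ends with a newline."""
    for line in record:
        out.write(line.rstrip("\r\n") + "\n")


def interleave_pairs(pairs, out_path):
    """Interleave each (left, right) pair of gzipped FASTQ files into out_path.

    Returns the number of read pairs written.
    """
    n_pairs = 0
    with gzip.open(out_path, "wt") as out:
        for left_path, right_path in pairs:
            with gzip.open(left_path, "rt") as left, \
                    gzip.open(right_path, "rt") as right:
                for r1, r2 in zip_longest(read_fastq(left), read_fastq(right)):
                    if r1 is None or r2 is None:
                        raise ValueError(
                            "%s and %s have different numbers of reads"
                            % (left_path, right_path))
                    write_record(out, r1)
                    write_record(out, r2)
                    n_pairs += 1
    return n_pairs
```

Calling interleave_pairs([("pair1_1.fq.gz", "pair1_2.fq.gz"), ("pair2_1.fq.gz", "pair2_2.fq.gz")], "all_interleaved.fq.gz") would then give a single file to pass as the datafile argument, e.g. abundance-dist.py -z test.kh all_interleaved.fq.gz test.hist. I have not confirmed that this avoids the assertion above.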
Thanks in advance, and happy Friday!
Cheers,
Huan
On 10/18/13, "C. Titus Brown" wrote:
> There is no good way at the moment to give multiple sequence files to
> abundance-dist, sorry :(. You might be able to combine sample.hist files
> afterwards... hmm. But for now, I think interleaving is the best option,
> as Chris says.
>
> best,
> --titus
>
> On Wed, Oct 16, 2013 at 09:02:50PM +0000, Fields, Christopher J wrote:
> > Personally, I interleave the reads and run them together for all the analyses, then split them later (after filtering, etc). But it would be interesting to hear of other options?
> >
> > chris
> >
> > On Oct 16, 2013, at 3:54 PM, Nacho Caballero <nachocab at gmail.com> wrote:
> >
> >
> > I'm following the STAMPS tutorial <http://ged.msu.edu/angus/2013-hmp-assembly-webinar/exploring-stamps-data.html> and I created the bloom filter for my two paired-end files
> >
> > python load-into-counting.py -x 1e8 -k 20 sample.kh sample_1.fastq sample_2.fastq
> >
> > Now I'm trying to generate the distribution, but there is no -p option. How should I do it?
> >
> > python abundance-dist.py sample.kh ??.fastq sample.hist
> >
> > _______________________________________________
> > khmer mailing list
> > khmer at lists.idyll.org
> > http://lists.idyll.org/listinfo/khmer
> >
>
>
> --
> C. Titus Brown, ctb at msu.edu