[khmer] InvalidFASTQFileFormat

Michael R. Crusoe mcrusoe at msu.edu
Thu Jan 9 12:29:53 PST 2014


That's great to hear.
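
For anyone who finds this thread later: one way to confirm that a FASTQ file is itself well formed, independent of khmer, is a small standalone check. What follows is only a minimal sketch, not khmer's own parser. The function name `find_fastq_problems` and the accepted alphabet (A, C, G, T, N) are assumptions made for illustration, and it assumes plain four-line records with no wrapped sequences. If it reports nothing for the record named in the error message, the file is probably fine and the threading bug is the more likely cause.

```python
from itertools import zip_longest

# Assumption: plain DNA letters plus N; khmer's own accepted alphabet may differ.
VALID_BASES = frozenset("ACGTN")


def find_fastq_problems(lines):
    """Return (line_number, message) pairs for malformed four-line FASTQ records."""
    stripped = (line.rstrip("\r\n") for line in lines)
    problems = []
    # Group the stream into records of four lines; a short final group is padded with None.
    for index, (header, seq, plus, qual) in enumerate(zip_longest(*[stripped] * 4)):
        first = index * 4 + 1  # 1-based line number of this record's header
        if qual is None:
            problems.append((first, "truncated record"))
            break
        if not header.startswith("@"):
            problems.append((first, "header does not start with '@'"))
        bad = sorted(set(seq.upper()) - VALID_BASES)
        if not seq:
            problems.append((first + 1, "empty sequence"))
        elif bad:
            problems.append((first + 1, "illegal sequence letters: " + "".join(bad)))
        if not plus.startswith("+"):
            problems.append((first + 2, "separator does not start with '+'"))
        if len(qual) != len(seq):
            problems.append((first + 3, "quality length differs from sequence length"))
    return problems


# Hypothetical usage on the file from this thread:
# with open("Sample_2024-1F_1.trimmed.fastq") as handle:
#     print(find_fastq_problems(handle)[:10])
```

The check streams the file four lines at a time, so it should cope with large inputs. Line numbers in the result are 1-based, so they can be compared directly with `sed -n` output.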


On Thu, Jan 9, 2014 at 3:12 PM, Shaolin Wang
<sw4ed at eservices.virginia.edu> wrote:

> Yes, that's true: after disabling threading it works. Thanks.
>
> Best
> Shaolin
>
>
> On Thu, Jan 9, 2014 at 3:10 PM, Michael R. Crusoe <mcrusoe at msu.edu> wrote:
>
>> Hello Dr. Wang,
>>
>> As I don't have your complete sequence file, I am not able to reproduce
>> your error. However, it does appear to be a known bug. We are tracking it
>> in https://github.com/ged-lab/khmer/issues/249
>>
>> As a workaround you can disable threading with "-T 1" when you run into
>> this error.
>>
>> My apologies for this.
>>
>>
>>  On Wed, Jan 8, 2014 at 11:15 AM, Shaolin Wang <
>> sw4ed at eservices.virginia.edu> wrote:
>>
>>>  I am new to this; please help. I get an InvalidFASTQFileFormat error
>>> even though the sequence looks fine. Even after I remove that sequence,
>>> the error still appears, and it is always reported on line 16181.
>>>
>>>  [17:27:40]sw4ed@hpcserver:~/Meta/SoilMeta> load-into-counting.py -T 4
>>> -k 20  -N 4 -x 4e9 Sample_2024-1F_1.kh Sample_2024-1F_1.trimmed.fastq
>>>
>>> PARAMETERS:
>>>  - kmer size =    20            (-k)
>>>  - n hashes =     4             (-N)
>>>  - min hashsize = 4e+09         (-x)
>>>
>>> Estimated memory usage is 1.6e+10 bytes (n_hashes x min_hashsize)
>>> --------
>>> Saving hashtable to Sample_2024-1F_1.kh
>>> Loading kmers from sequences in ['Sample_2024-1F_1.trimmed.fastq']
>>> making hashtable
>>> consuming input Sample_2024-1F_1.trimmed.fastq
>>> terminate called after throwing an instance of
>>> 'khmer::read_parsers::InvalidFASTQFileFormat'
>>>   what():  InvalidFASTQFileFormat: illegal sequence letters:
>>> @HISEQ700708:147:D278GACXX:2:1101:6816:3815 2:N:0:ATCACG
>>> Aborted (core dumped)
>>>
>>> [17:44:12]sw4ed@hpcserver:~/Meta/SoilMeta> sed -n '16181,16188p'
>>> Sample_2024-1F_1.trimmed.fastq
>>> @HISEQ700708:147:D278GACXX:2:1101:6816:3815 2:N:0:ATCACG
>>> ATTGTCTGCGCGTTACGATATTATCAAGAATCGCGACTGGTTATGGTCTCTTACTGCTACAACACTGAACACTAAGACCAAATATGCTAACATTGGCAAC
>>> +
>>> CCCFFFFFHHHHHIJJJJIJJJJJJJJJJJJJJJJJJJJJIJIIIJIHHHHHFFFFFFFFEEEDDDDDDDDDDCDDDDDDDDDDEEDEDDDDDDDDDDDD
>>> @HISEQ700708:147:D278GACXX:2:1101:6919:3932 1:N:0:ATCACG
>>> TATTCAGGAAAACCTGCCGCAGACGCTTGGGGTCGCAGGAGATTTCCGGAATGTCTTCGTCATTGTCGATATAGTCCAGTCGAATACCGTCCTGGGCCAG
>>> +
>>> @C@FFFDFFHGFHIGIIJIIIJJJJGIJIJJIFEFHIIGGDHFHHHHFF<ACAEDDCDD=?A?DD@CBDDBDDCC@CDCDD@B?C@CCB<@DD>C?BDDB
>>>
>>>
>>> --
>>> Shaolin Wang, Ph.D
>>> Research Scientist
>>> Department of Psychiatry & Neurobiology Science
>>> University of Virginia
>>> 1670 Discovery Drive, Suite 110
>>> Charlottesville, VA 22911
>>>  Phone: 434-982-0243
>>> Fax:434-973-7031
>>> E-mail: swang at virginia.edu <sw4ed at eservices.virginia.edu>
>>>
>>> _______________________________________________
>>> khmer mailing list
>>> khmer at lists.idyll.org
>>> http://lists.idyll.org/listinfo/khmer
>>>
>>>
>>
>>
>> --
>> Michael R. Crusoe: Software Engineer and Bioinformatician
>> mcrusoe at msu.edu
>>  @ the Genomics, Evolution, and Development lab; Michigan State University
>> http://ged.msu.edu/     http://orcid.org/0000-0002-2961-9670
>> @biocrusoe <http://twitter.com/biocrusoe>
>>
>
>
>
> --
> Shaolin Wang, Ph.D
> Research Scientist
> Department of Psychiatry & Neurobiology Science
> University of Virginia
> 450 Ray C. Hunt Dr. G170
> Charlottesville, VA 22903
>
> Phone: 434-982-0243
> Fax:434-973-7031
> E-mail: swang at virginia.edu <sw4ed at eservices.virginia.edu>
>



-- 
Michael R. Crusoe: Software Engineer and Bioinformatician  mcrusoe at msu.edu
 @ the Genomics, Evolution, and Development lab; Michigan State University
http://ged.msu.edu/     http://orcid.org/0000-0002-2961-9670
@biocrusoe <http://twitter.com/biocrusoe>