[khmer] Extract-paired-reads.py after partitioning

Thu Nov 19 00:51:33 PST 2015

Dear Khmer mailing list

I am testing khmer and ran into a problem when extracting the paired reads after partitioning. In the original files, the two reads of a pair share the same name:

@SRR492065.83910 HWI-EAS385_0095_FC:2:2:2631:16174 length=100

After interleaving the names become:
@SRR492065.83910 HWI-EAS385_0095_FC:2:2:2631:16174 length=100/1
@SRR492065.83910 HWI-EAS385_0095_FC:2:2:2631:16174 length=100/2

So far there is no problem. When I run do-partition.py after normalization, the .part files contain these reads, but they are now annotated, so the names change to:
@SRR492065.83910 HWI-EAS385_0095_FC:2:2:2631:16174 length=100/1	58132
@SRR492065.83910 HWI-EAS385_0095_FC:2:2:2631:16174 length=100/2	58132
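
To make the change concrete, here is a minimal Python sketch of how I read the annotated names. It assumes the partition ID is appended after a tab, which is what the .part files above appear to show. The helper name split_partition_annotation is my own and is not part of khmer.

```python
def split_partition_annotation(name):
    """Split 'readname<TAB>partition_id' into (readname, partition_id).

    Assumption: the partition ID is a trailing, tab-separated integer.
    Returns (name, None) when no such suffix is present.
    """
    base, sep, suffix = name.rpartition("\t")
    if sep and suffix.isdigit():
        return base, int(suffix)
    return name, None


annotated = ("SRR492065.83910 HWI-EAS385_0095_FC:2:2:2631:16174 "
             "length=100/1\t58132")
base, partition_id = split_partition_annotation(annotated)
print(repr(base), partition_id)
```

With the annotation split off, the base name ends in '/1' again, as it did right after interleaving.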

After extracting the partitions these names stay the same, and trying to extract the paired reads from the group files fails with the error: "raise Exception("no paired reads!? check file formats...")". A quick look at the code tells me that one of the checks check_is_pair performs is whether the read names end with '/1' and '/2', which is no longer the case after partitioning because the partition ID now follows the suffix.
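
The sketch below shows why I think the check fails. It is a simplified stand-in for the suffix test, not khmer's actual check_is_pair, and both function names are hypothetical. It also assumes the annotation is tab-separated, as the names above suggest.

```python
def names_look_paired(name1, name2):
    """Simplified suffix test (NOT khmer's check_is_pair):
    names must end in /1 and /2 and be identical otherwise."""
    return (name1.endswith("/1") and name2.endswith("/2")
            and name1[:-2] == name2[:-2])


def strip_annotation(name):
    """Drop everything from the first tab onward (assumed partition ID)."""
    return name.split("\t", 1)[0]


prefix = "SRR492065.83910 HWI-EAS385_0095_FC:2:2:2631:16174 length=100"
read1 = prefix + "/1\t58132"
read2 = prefix + "/2\t58132"

# Annotated names no longer end in /1 and /2, so the test fails.
paired_before = names_look_paired(read1, read2)

# After removing the annotation the suffixes are visible again.
paired_after = names_look_paired(strip_annotation(read1),
                                 strip_annotation(read2))
print(paired_before, paired_after)
```

If this reading is right, stripping the annotation from the names in the group files before extracting pairs might serve as a workaround, though I would prefer an option in khmer itself.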

Is this a bug in the code or am I overlooking an option that can prevent this error from occurring?

I am running khmer version 2.0+36.g799039f and screed 0.9

Best regards
Raf

Scientific Institute of Public Health, Brussels, Belgium
Wetenschappelijk Instituut Volksgezondheid, Brussel, België
Institut scientifique de Santé publique, Bruxelles, Belgique
Visit our website: http://www.wiv-isp.be 