<div dir="ltr">Yes.<div><br></div><div style>Default k = 32.</div><div style>Full sequence length is 75. 75 - 32 = 43.</div><div style>Trimmed sequence length is 37. 37 -32 = 5.</div><div style>To get a median of &gt;0, you need a trimmed sequence length of at least 53, because 43 / 2 = 21, 21 - 5 = 16, 37 + 16 = 53.</div>

<div style><br></div><div style>For example, try this as your sequence in test1.fa:</div><div style>  AAGACACCCTCACCCCTAGCTGCGCGAGGCCCTCTCCCCTGGGTAGAGGGTCA</div><div style><br></div><div style>Nothing wrong with the code - just your math skills, Adina. :-)</div>

<div style><br></div><div style>Hope that helps,</div><div style>  Eric</div><div style><br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Wed, Mar 13, 2013 at 12:03 AM, Adina Chuang Howe <span dir="ltr">&lt;<a href="mailto:adina.chuang@gmail.com" target="_blank">adina.chuang@gmail.com</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">It is definitely reproducible.  And I have at least one sequence that can be identified that is causing this.<div><br></div>

<div>Here&#39;s what I&#39;m seeing:  (these are on the same HPC scratch space as before)</div><div>

<br></div><div>test1.fa contains only (37 bp):</div><div><br></div><div><div>&gt;SRR172902.316476<span style="white-space:pre-wrap">        </span>42391</div><div>AAGACACCCTCACCCCTAGCTGCGCGAGGCCCTCTCC</div></div>

<div><br></div><div>SRR172902.316476.single.fq contains the untrimmed same:</div><div><br></div><div><div>@SRR172902.316476 USI-EAS376:1:5:311:233 length=75</div><div>AAGACACCCTCACCCCTAGCTGCGCGAGGCCCTCTCCCCTGGGTAGAGGGTCAAACAGCGCAAGGCAACAGATCG</div>


<div>+SRR172902.316476 USI-EAS376:1:5:311:233 length=75</div><div>BBBBBABBB@BBBBBB&gt;B&gt;B=AA&gt;B?A=4ABA;B&gt;&gt;AAA79;;068799;2;====&gt;3&gt;9&gt;7922=;739#####</div><div><br></div><div class="gmail_quote">python sweep-reads3.py test1.fa SRR172902.316476.single.fq </div>


<div class="gmail_quote"><br></div><div class="gmail_quote">results in an empty test1.fa.sweep</div><div class="gmail_quote"><br></div><div class="gmail_quote">But...</div><div class="gmail_quote"><br></div><div class="gmail_quote">


python sweep-reads3.py SRR172902.316476.single.fq SRR172902.316476.single.fq</div><div class="gmail_quote"><br></div><div class="gmail_quote">Results in</div><div class="gmail_quote"><br></div><div class="gmail_quote"><div class="gmail_quote">


==&gt; SRR172902.316476.single.fq.sweep3 &lt;==</div><div class="gmail_quote">&gt;SRR172902.316476</div><div class="gmail_quote">AAGACACCCTCACCCCTAGCTGCGCGAGGCCCTCTCCCCTGGGTAGAGGGTCAAACAGCGCAAGGCAACAGATCG</div></div><div class="gmail_quote">


<br></div><div class="gmail_quote">Any clue?</div><div><div class="h5"><div class="gmail_quote"><br></div><div class="gmail_quote"><br></div><div class="gmail_quote">On Sun, Mar 10, 2013 at 11:33 PM, C. Titus Brown <span dir="ltr">&lt;<a href="mailto:ctb@msu.edu" target="_blank">ctb@msu.edu</a>&gt;</span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div>On Thu, Mar 07, 2013 at 02:10:17PM -0500, Adina Chuang Howe wrote:<br>

&gt; Possible bug in sweep-reads...I&#39;m not recovering the partitioned reads<br>

&gt; from the original dataset.<br>

&gt;<br>

&gt; First observed this when I looked at lotsa partitions and trying to<br>

&gt; recover swept reads - some swept files would show up empty:<br>

&gt;<br>

&gt; command:<br>

&gt; python sweep-reads3.py -N 4 -k 32 -x 1e9<br>

&gt; /mnt/research/gpgc/hmp-mock-partitions/001264-files/no-sweep-pids/pid*fa<br>

&gt; /mnt/research/gpgc/hmp-mock-partitions/SRR-combined.fastq<br>

&gt;<br>

&gt; troubleshooting:<br>

&gt; Then I looked at just one partition:<br>

&gt; on HPC:  /mnt/scratch/howead/test<br>

&gt; python sweep-reads3.py pid-42391.fa SRR-combined.fastq<br>

&gt;<br>

&gt; And resulting sweepfile is empty.<br>

&gt;<br>

&gt; If I run:<br>

&gt; python sweep-reads3.py pid-42391.fa pid-42391.fa<br>

&gt;<br>

&gt; Behavior is correct.<br>

<br>

</div></div>Very weird.  I don&#39;t see anything obviously wrong in the script, which<br>

just means it&#39;s a subtle and deep bug.<br>

<br>

Two questions --<br>

<br>

 - is it reproducible, i.e. do you get the same results every time you<br>

   run it? (please yes)<br>

<br>

 - can you break it down to a smaller failure point than with SRR-combined,<br>

   e.g. maybe a few hundred k reads?<br>

<div><div><br>

thanks,<br>

--titus<br>

--<br>

C. Titus Brown, <a href="mailto:ctb@msu.edu" target="_blank">ctb@msu.edu</a><br>

</div></div></blockquote></div><br></div></div></div>

</blockquote></div><br><br clear="all"><div><br></div>-- <br><div dir="ltr"><div>Eric McDonald</div><div>HPC/Cloud Software Engineer</div><div>  for the Institute for Cyber-Enabled Research (iCER)</div><div>  and the Laboratory for Genomics, Evolution, and Development (GED)</div>

<div>Michigan State University</div><div>P: 517-355-8733</div></div>

</div>