[khmer] slice-reads-by-coverage.py on PE data question

Fields, Christopher J cjfields at illinois.edu
Fri Jun 5 09:07:20 PDT 2015


We’re going the EC (error correction) route now (in our spare time) but will definitely try diginorm.  Thx!

chris

> On Jun 5, 2015, at 10:32 AM, C. Titus Brown <ctbrown at ucdavis.edu> wrote:
> 
> My suggestion: try diginorm and/or error correction.
> 
> --titus
> 
> On Fri, Jun 05, 2015 at 03:30:37PM +0000, Fields, Christopher J wrote:
>> Cool, that does seem like a better path!  Thanks Titus.
>> 
>> We have been planning on trying something like this on a particularly thorny plant genome assembly that's been on our back-burner for a bit.  We've been nicknaming it "Nessie", primarily b/c the kmer distribution resembles fuzzy pics of the Loch Ness monster (has at least three significant peaks).  Probably a nasty combination of being highly heterozygous and having large-scale genome duplications; it's supposed to be diploid but then again we know how that sometimes turns out w/ plants.
>> 
>> chris
>> 
>>> On Jun 5, 2015, at 9:44 AM, C. Titus Brown <ctbrown at ucdavis.edu> wrote:
>>> 
>>> Diane, Chris --
>>> 
>>> Hmm, the following could work if you don't mind losing orphaned reads:
>>> 
>>> interleave-reads.py => interleaved reads.
>>> 
>>> slice-reads-by-coverage.py => "broken paired" reads, where pairs remain
>>>   next to each other but there are lots of orphans.
>>> 
>>> extract-paired-reads.py => separate into still-paired (.pe) and orphaned (.se)
>>>   reads.
>>> 
>>> If you want to always retain the pair if either has the right coverage, that
>>> would require modifications to the script or a more complex workflow.
>>> Modifying the script is probably a good idea, but we may not have time to do
>>> so in the next week or three.
>>> 
>>> Diane, how about this - see if you can get the workflow above to work and
>>> give decent results (I would suggest plotting the coverage distribution of
>>> the .pe file as one way to evaluate), and if not, we can do the script
>>> modification for you.
>>> 
>>> --titus
>>> 
>>> On Fri, Jun 05, 2015 at 02:14:11PM +0000, Fields, Christopher J wrote:
>>>> I have used split-paired-reads.py for this purpose when normalizing PE reads; I assume it should work the same here.
>>>> 
>>>> chris
>>>> 
>>>> On Jun 5, 2015, at 8:38 AM, Diane Hatziioanou <dianehioanou at gmail.com> wrote:
>>>> 
>>>> Hello all again,
>>>> 
>>>> I have a question.
>>>> I want to use slice-reads-by-coverage.py, but I've got PE data which I would like to keep as PE data. Can slice-reads-by-coverage.py deal with interleaved PE data and keep it PE, or manage it in another format? Or am I asking for too much, and would I have to use single ends and try pairing them back after it's done?
>>>> 
>>>> Thanks,
>>>> Diane
>>>> 
>>>> --
>>>> Dr Diane Hatziioanou
>>>> Greek Mobile: (+30)6909403373
>>>> UK Mobile: (+44)7779516625
>>>> www.linkedin.com/in/dhatziioanou/
>>>> https://twitter.com/DianeHIoanou
>>>> 
>>>> _______________________________________________
>>>> khmer mailing list
>>>> khmer at lists.idyll.org
>>>> http://lists.idyll.org/listinfo/khmer
>>>> 
>>> 
>>> 
>>> 
>>> -- 
>>> C. Titus Brown, ctbrown at ucdavis.edu
>> 
> 
> -- 
> C. Titus Brown, ctbrown at ucdavis.edu
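
For readers following the thread: the last step of the workflow Titus outlines (separating "broken paired" reads into still-paired and orphaned reads) can be illustrated with a minimal Python sketch. This is not khmer's implementation of extract-paired-reads.py. The function name split_broken_pairs, the in-memory (name, sequence) tuples, and the assumption that mates are named with /1 and /2 suffixes are all hypothetical simplifications chosen for illustration.

```python
def split_broken_pairs(records):
    """Split interleaved reads with missing mates into pairs and orphans.

    `records` is a list of (name, sequence) tuples in interleaved order.
    ASSUMPTION: mates share a name and end in "/1" and "/2"; real data
    may use other naming schemes.
    """
    paired, orphans = [], []
    i = 0
    while i < len(records):
        name = records[i][0]
        has_next = i + 1 < len(records)
        # A pair is a "/1" read immediately followed by its "/2" mate.
        if has_next and name.endswith("/1") and records[i + 1][0] == name[:-2] + "/2":
            paired.append((records[i], records[i + 1]))
            i += 2
        else:
            # The mate was dropped upstream (e.g. by coverage slicing).
            orphans.append(records[i])
            i += 1
    return paired, orphans


# Toy "broken paired" input: r2 lost its /2 mate, r3 lost its /1 mate.
reads = [
    ("r1/1", "ACGT"), ("r1/2", "TTGA"),
    ("r2/1", "GGCA"),
    ("r3/2", "CCAT"),
    ("r4/1", "ATAT"), ("r4/2", "GCGC"),
]
paired, orphans = split_broken_pairs(reads)
print(len(paired), "pairs;", [name for name, _ in orphans], "orphaned")
```

The sketch relies on pairs remaining next to each other in the sliced output, which is the property Titus describes for the "broken paired" reads; if read order were not preserved, a name-indexed lookup would be needed instead.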


