[protocols] [khmer] calc-best-assembly.py

Michael R. Crusoe mcrusoe at msu.edu
Wed Jan 8 13:52:27 PST 2014


Okay. So does it create the grouplist.txt file?
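
One way to check is to run the first loop of the script on its own. The sketch below assumes the files follow the kak.groupNNNN.pe.fa.gz pattern shown in the listing quoted below. The function name make_grouplist and its two arguments are made up for illustration and are not part of khmer. It uses seq rather than the {0..1000} brace expansion so that it also runs under plain sh.

```shell
# Hypothetical helper (not part of khmer): print every group ID whose
# paired-end file exists, one ID per line.
#   $1 = filename prefix, e.g. kak.group
#   $2 = filename suffix, e.g. .pe.fa.gz
make_grouplist() {
    for i in $(seq 0 1000); do
        # Zero-pad the group number to four digits, e.g. kak.group0007
        groupid=$(printf '%s%04d' "$1" "$i")
        # Only report groups whose file is actually present
        if [ -e "${groupid}$2" ]; then
            echo "$groupid"
        fi
    done
}

make_grouplist kak.group .pe.fa.gz > grouplist.txt
```

If grouplist.txt comes out empty, the suffix passed in does not match the files on disk.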


On Wed, Jan 8, 2014 at 4:49 PM, Yiseul Kim <kimyise2 at msu.edu> wrote:

> Here you go.
>
> kak.group0000.nodn.pe.fa.gz
> kak.group0000.pe.fa.gz
> kak.group0001.nodn.pe.fa.gz
> kak.group0001.pe.fa.gz
> .
> .
> .
> kak.group0028.nodn.pe.fa.gz
> kak.group0028.pe.fa.gz
>
> Regards,
> Yiseul
>
>
> On Wed, Jan 8, 2014 at 4:45 PM, Michael R. Crusoe <mcrusoe at msu.edu> wrote:
>
>> What is the output of `ls kak.group*.pe.fa.gz`?
>>
>>
>>
>> On Wed, Jan 8, 2014 at 4:42 PM, Yiseul Kim <kimyise2 at msu.edu> wrote:
>>
>>> I had already updated the script to reflect the change in naming scheme.
>>> Sorry that I did not mention this earlier.
>>>
>>> Regards,
>>> Yiseul
>>>
>>>
>>> On Wed, Jan 8, 2014 at 4:36 PM, Michael R. Crusoe <mcrusoe at msu.edu> wrote:
>>>
>>>> Okay, then you need to modify the fourth line of your script to reflect
>>>> the change you made to the naming scheme:
>>>>
>>>> "     if [ -e ${groupid}.pe.fa.gz ]; then"
>>>>
>>>>
>>>> On Wed, Jan 8, 2014 at 4:15 PM, Yiseul Kim <kimyise2 at msu.edu> wrote:
>>>>
>>>>> Yes, the velvet assembly files are named that way, except with "fa"
>>>>> instead of "fq" to avoid confusion, since the files are in FASTA format.
>>>>> I am on campus and can stop by your office if you would rather not email
>>>>> back and forth. Thanks for your help!
>>>>>
>>>>> Regards,
>>>>> Yiseul
>>>>>
>>>>>
>>>>> On Wed, Jan 8, 2014 at 3:54 PM, Michael R. Crusoe <mcrusoe at msu.edu> wrote:
>>>>>
>>>>>>
>>>>>> On Sat, Jan 4, 2014 at 10:26 AM, Yiseul Kim <kimyise2 at msu.edu> wrote:
>>>>>>
>>>>>>> Hi Michael,
>>>>>>>
>>>>>>> I am sorry for my late reply. Thanks for your help again!
>>>>>>>
>>>>>>> Yes, assemstats3.py worked.
>>>>>>>
>>>>>>> I am running all of this in a directory named assembly created on
>>>>>>> HPCC.
>>>>>>>
>>>>>>> When I ran the command you asked, the output says "cat:
>>>>>>> grouplist.txt: No such file or directory".
>>>>>>>
>>>>>>
>>>>>>
>>>>>> Are your velvet assembly files named in the format
>>>>>> "kak.groupNNNN.pe.fq.gz", where NNNN is a zero-padded four-digit number
>>>>>> between 0 and 1,000 inclusive?
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Regards,
>>>>>>> Yiseul
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jan 3, 2014 at 6:05 PM, Michael R. Crusoe <mcrusoe at msu.edu> wrote:
>>>>>>>
>>>>>>>> Does the invocation of assemstats3.py work?
>>>>>>>>
>>>>>>>> This is being run in /mnt/assembly, yes?
>>>>>>>>
>>>>>>>> What is the output of this command?
>>>>>>>>
>>>>>>>> for group in $(cat grouplist.txt); do echo '$group.*velvet.*.d/contigs.fa'; ls $group.*velvet.*.d/contigs.fa; done
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Jan 3, 2014 at 5:11 PM, Yiseul Kim <kimyise2 at msu.edu> wrote:
>>>>>>>>
>>>>>>>>> Thanks for your help in advance!
>>>>>>>>>
>>>>>>>>> Basically, I am following the Kalamazoo metagenomic assembly
>>>>>>>>> protocol with my viral metagenomic dataset. In the assembly step, the
>>>>>>>>> protocol runs the dataset through three different assemblers, but I want
>>>>>>>>> to test with velvet only. I am not an expert at writing scripts and am
>>>>>>>>> trying to modify the one below to use only the velvet output by deleting
>>>>>>>>> the underlined part. When I ran it, the error message said "too few
>>>>>>>>> arguments". Could you help me modify the script to work with only one
>>>>>>>>> assembler? Please let me know if I am not making myself clear.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> for i in {0..1000};
>>>>>>>>> do
>>>>>>>>>      groupid=$(printf kak.group%04d $i);
>>>>>>>>>      if [ -e ${groupid}.pe.fq.gz ]; then
>>>>>>>>>         echo $groupid
>>>>>>>>>      fi
>>>>>>>>> done > grouplist.txt
>>>>>>>>>
>>>>>>>>> for group in $(cat grouplist.txt)
>>>>>>>>> do
>>>>>>>>>    python /usr/local/share/khmer/sandbox/calc-best-assembly.py -q $group.{*velvet.*.d/contigs.fa*,*idba.d/scaffold.fa,*spades.d/contigs.fasta*} -o $group.best.fa
>>>>>>>>> done > best-assemblies.txt
>>>>>>>>>
>>>>>>>>> python /usr/local/share/khmer/sandbox/multi-rename.py testasm *.best.fa > final-assembly.fa
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Yiseul
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Jan 3, 2014 at 4:57 PM, Michael R. Crusoe <mcrusoe at msu.edu> wrote:
>>>>>>>>>
>>>>>>>>>> Please :-)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Jan 3, 2014 at 4:40 PM, Yiseul Kim <kimyise2 at msu.edu> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Michael,
>>>>>>>>>>>
>>>>>>>>>>> Can I ask one more question for you?
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Yiseul
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jan 3, 2014 at 4:16 PM, Michael R. Crusoe <
>>>>>>>>>>> mcrusoe at msu.edu> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> You are welcome!
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jan 3, 2014 at 4:14 PM, Yiseul Kim <kimyise2 at msu.edu> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Yiseul
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Jan 3, 2014 at 4:11 PM, Michael R. Crusoe <
>>>>>>>>>>>>> mcrusoe at msu.edu> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> It is in the protocols-v0.8.3 branch of khmer:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> https://github.com/ged-lab/khmer/blob/protocols-v0.8.3/sandbox/calc-best-assembly.py
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Install instructions are at:
>>>>>>>>>>>>>> https://khmer-protocols.readthedocs.org/en/v0.8.3/metagenomics/1-quality.html#install-software
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Jan 3, 2014 at 4:07 PM, Yiseul Kim <kimyise2 at msu.edu> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Could someone help me with finding the location of
>>>>>>>>>>>>>>> *calc-best-assembly.py*? The newly released metagenomic
>>>>>>>>>>>>>>> assembly protocol (
>>>>>>>>>>>>>>> https://khmer-protocols.readthedocs.org/en/v0.8.3/metagenomics/4-assemble.html)
>>>>>>>>>>>>>>> says it is located under /khmer/sandbox but I am not able to find it. Any
>>>>>>>>>>>>>>> help would be appreciated.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>> Yiseul
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>> --
>>>> Michael R. Crusoe: Software Engineer and Bioinformatician
>>>> mcrusoe at msu.edu
>>>>  @ the Genomics, Evolution, and Development lab; Michigan State
>>>> University
>>>> http://ged.msu.edu/     http://orcid.org/0000-0002-2961-9670
>>>> @biocrusoe <http://twitter.com/biocrusoe>
>>>>
>>>
>>>
>>
>>
>
>
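
For the velvet-only question in the quoted thread, one possible shape for the second loop is sketched here. It assumes calc-best-assembly.py accepts a single group's velvet contig files after -q, exactly as in the quoted script with the idba and spades patterns dropped; that has not been verified against the script itself. The function name best_velvet and the CALC_BEST variable are made up for illustration.

```shell
# Path taken from the quoted script. CALC_BEST is a hypothetical override
# so the loop can be tried out without khmer installed.
CALC_BEST=${CALC_BEST:-python /usr/local/share/khmer/sandbox/calc-best-assembly.py}

# Hypothetical helper: run calc-best-assembly.py on velvet output only.
#   $1 = file with one group ID per line, e.g. grouplist.txt
best_velvet() {
    for group in $(cat "$1"); do
        # Skip groups with no velvet contigs rather than passing an
        # unmatched glob pattern to the script.
        if ls "$group".*velvet.*.d/contigs.fa* > /dev/null 2>&1; then
            # $CALC_BEST is left unquoted so "python <path>" splits into words.
            $CALC_BEST -q "$group".*velvet.*.d/contigs.fa* -o "$group.best.fa"
        else
            echo "no velvet contigs found for $group" >&2
        fi
    done
}
```

It would be called the same way as the original loop:

    best_velvet grouplist.txt > best-assemblies.txt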


-- 
Michael R. Crusoe: Software Engineer and Bioinformatician  mcrusoe at msu.edu
 @ the Genomics, Evolution, and Development lab; Michigan State University
http://ged.msu.edu/     http://orcid.org/0000-0002-2961-9670
@biocrusoe<http://twitter.com/biocrusoe>