[protocols] [khmer] calc-best-assembly.py

Michael R. Crusoe mcrusoe at msu.edu
Wed Jan 8 12:54:21 PST 2014


On Sat, Jan 4, 2014 at 10:26 AM, Yiseul Kim <kimyise2 at msu.edu> wrote:

> Hi Michael,
>
> I am sorry for my late reply. Thanks for your help again!
>
> Yes, assemstats3.py worked.
>
> I am running all of this in a directory named assembly created on HPCC.
>
> When I ran the command you asked for, the output said "cat: grouplist.txt:
> No such file or directory".
>


Are your velvet assembly files named in the format "kak.groupNNNN.pe.fq.gz",
where NNNN is a four-digit, zero-padded number from 0000 to 1000 inclusive?
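
If grouplist.txt is missing, the first loop of the protocol step (the one
that writes grouplist.txt) has probably not been run in this directory; the
redirect creates the file even when no group matches. Here is a minimal
sketch for rebuilding it, assuming the files are named kak.groupNNNN.pe.fq.gz
as in that loop. The function name make_grouplist and the scratch directory
with empty placeholder files are illustrative only and not part of the
protocol. The while loop stands in for {0..1000} so the sketch also runs
under plain sh.

```shell
# Sketch: rebuild grouplist.txt from the kak.groupNNNN.pe.fq.gz files.
# make_grouplist is a hypothetical helper, not part of khmer.

make_grouplist() {
    # $1: directory that holds the kak.groupNNNN.pe.fq.gz files
    dir=$1
    i=0
    while [ "$i" -le 1000 ]; do
        groupid=$(printf 'kak.group%04d' "$i")
        if [ -e "$dir/$groupid.pe.fq.gz" ]; then
            echo "$groupid"
        fi
        i=$((i + 1))
    done
}

# Demo in a scratch directory with empty placeholder files, so no real
# data is touched.
scratch=$(mktemp -d)
touch "$scratch/kak.group0000.pe.fq.gz" "$scratch/kak.group0007.pe.fq.gz"
make_grouplist "$scratch" > "$scratch/grouplist.txt"
cat "$scratch/grouplist.txt"
```

On real data, running "make_grouplist . > grouplist.txt" in the assembly
directory should give one group id per line.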


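For a velvet-only run, one option is to drop the brace list entirely and pass
just the velvet glob. A minimal sketch under these assumptions: the velvet
output directories match $group.*velvet.*.d as in the protocol command, the
helper velvet_contigs and the placeholder directory names are hypothetical,
and the command is printed rather than executed so the sketch runs without
khmer installed. One thing to watch for when trimming the original line: bash
leaves a brace list with a single entry and no comma, such as
{*velvet.*.d/contigs.fa*}, unexpanded.

```shell
# Sketch: select only the velvet contigs for each group.
# velvet_contigs is a hypothetical helper, not part of khmer.

velvet_contigs() {
    # $1: group id, e.g. kak.group0000
    # Prints each matching contigs file on its own line; prints nothing
    # when the glob matches no existing file.
    for f in "$1".*velvet.*.d/contigs.fa*; do
        if [ -e "$f" ]; then
            echo "$f"
        fi
    done
}

# Demo in a scratch directory with empty placeholder files (names assumed).
scratch=$(mktemp -d)
cd "$scratch" || exit 1
mkdir kak.group0000.velvet.31.d kak.group0000.velvet.33.d kak.group0001.idba.d
touch kak.group0000.velvet.31.d/contigs.fa kak.group0000.velvet.33.d/contigs.fa
touch kak.group0001.idba.d/scaffold.fa
printf '%s\n' kak.group0000 kak.group0001 > grouplist.txt

for group in $(cat grouplist.txt); do
    files=$(velvet_contigs "$group")
    if [ -z "$files" ]; then
        echo "no velvet contigs for $group, skipping" >&2
        continue
    fi
    # Printed, not executed: remove the leading "echo" to run it for real.
    # $files is left unquoted on purpose so each path becomes one argument.
    echo python /usr/local/share/khmer/sandbox/calc-best-assembly.py -q $files -o "$group.best.fa"
done > commands.txt
cat commands.txt
```

Skipping groups with no velvet output keeps the script from being called
with no input files at all.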

>
> Regards,
> Yiseul
>
>
> On Fri, Jan 3, 2014 at 6:05 PM, Michael R. Crusoe <mcrusoe at msu.edu> wrote:
>
>> Does the invocation of assemstats3.py work?
>>
>> This is being run in /mnt/assembly, yes?
>>
>> What is the output of this command?
>>
>> for group in $(cat grouplist.txt); do echo '$group.*velvet.*.d/contigs.fa'; ls $group.*velvet.*.d/contigs.fa; done
>>
>>
>> On Fri, Jan 3, 2014 at 5:11 PM, Yiseul Kim <kimyise2 at msu.edu> wrote:
>>
>>> Thanks for your help in advance!
>>>
>>> Basically, I am following the Kalamazoo metagenomic assembly protocol
>>> with my viral metagenomic dataset. In the assembly step, the protocol runs
>>> the dataset through three different assemblers, but I want to test with
>>> velvet only. I am not an expert at writing scripts, and I am trying to
>>> modify the one below to use only the velvet output by deleting the
>>> underlined part. When I ran it, the error message said "too few
>>> arguments". Could you help me modify the script to work with just one
>>> assembler? Please let me know if I am not making myself clear.
>>>
>>>
>>>
>>> for i in {0..1000};
>>> do
>>>      groupid=$(printf kak.group%04d $i);
>>>      if [ -e ${groupid}.pe.fq.gz ]; then
>>>         echo $groupid
>>>      fi
>>> done > grouplist.txt
>>>
>>> for group in $(cat grouplist.txt)
>>> do
>>>    python /usr/local/share/khmer/sandbox/calc-best-assembly.py -q $group.{*velvet.*.d/contigs.fa*,*idba.d/scaffold.fa,*spades.d/contigs.fasta*} -o $group.best.fa
>>> done > best-assemblies.txt
>>>
>>> python /usr/local/share/khmer/sandbox/multi-rename.py testasm *.best.fa > final-assembly.fa
>>>
>>>
>>> Regards,
>>> Yiseul
>>>
>>>
>>> On Fri, Jan 3, 2014 at 4:57 PM, Michael R. Crusoe <mcrusoe at msu.edu> wrote:
>>>
>>>> Please :-)
>>>>
>>>>
>>>> On Fri, Jan 3, 2014 at 4:40 PM, Yiseul Kim <kimyise2 at msu.edu> wrote:
>>>>
>>>>> Hi Michael,
>>>>>
>>>>> May I ask you one more question?
>>>>>
>>>>> Regards,
>>>>> Yiseul
>>>>>
>>>>>
>>>>> On Fri, Jan 3, 2014 at 4:16 PM, Michael R. Crusoe <mcrusoe at msu.edu> wrote:
>>>>>
>>>>>> You are welcome!
>>>>>>
>>>>>>
>>>>>> On Fri, Jan 3, 2014 at 4:14 PM, Yiseul Kim <kimyise2 at msu.edu> wrote:
>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> Regards,
>>>>>>> Yiseul
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jan 3, 2014 at 4:11 PM, Michael R. Crusoe <mcrusoe at msu.edu> wrote:
>>>>>>>
>>>>>>>> It is in the protocols-v0.8.3 branch of khmer:
>>>>>>>>
>>>>>>>>
>>>>>>>> https://github.com/ged-lab/khmer/blob/protocols-v0.8.3/sandbox/calc-best-assembly.py
>>>>>>>>
>>>>>>>> Install instructions are at:
>>>>>>>> https://khmer-protocols.readthedocs.org/en/v0.8.3/metagenomics/1-quality.html#install-software
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Jan 3, 2014 at 4:07 PM, Yiseul Kim <kimyise2 at msu.edu> wrote:
>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> Could someone help me with finding the location of
>>>>>>>>> *calc-best-assembly.py*? The newly released metagenomic assembly
>>>>>>>>> protocol (
>>>>>>>>> https://khmer-protocols.readthedocs.org/en/v0.8.3/metagenomics/4-assemble.html)
>>>>>>>>> says it is located under /khmer/sandbox but I am not able to find it. Any
>>>>>>>>> help would be appreciated.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Yiseul
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> khmer mailing list
>>>>>>>>> khmer at lists.idyll.org
>>>>>>>>> http://lists.idyll.org/listinfo/khmer
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Michael R. Crusoe: Software Engineer and Bioinformatician
>>>>>>>> mcrusoe at msu.edu
>>>>>>>>  @ the Genomics, Evolution, and Development lab; Michigan State
>>>>>>>> University
>>>>>>>> http://ged.msu.edu/     http://orcid.org/0000-0002-2961-9670
>>>>>>>> @biocrusoe <http://twitter.com/biocrusoe>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>
>


-- 
Michael R. Crusoe: Software Engineer and Bioinformatician  mcrusoe at msu.edu
 @ the Genomics, Evolution, and Development lab; Michigan State University
http://ged.msu.edu/     http://orcid.org/0000-0002-2961-9670
@biocrusoe<http://twitter.com/biocrusoe>