[khmer] khmer on 454 metagenomics data

Fri Apr 4 07:49:42 PDT 2014

Hi,

We want to analyse 454 metagenomics data (about 570,000 reads of ~700 nt
per sample).
My questions are:
1/ Given that khmer is mostly oriented toward short Illumina reads, is it
a mistake to apply it to our long 454 reads?
2/ Is there an actual benefit in feeding FASTQ files to khmer (we
currently have separate .qual files, but that can be changed), or does
it only consider the sequence data? I.e., are FASTA files sufficient?
How do you decide which data needs normalization first and which data
can go straight to artifact removal? In the Iowa corn example, you do not
start with a normalize/filter pass; why is that?
3/ So, should the pipeline for our data look like this:
DIGINORM (normalize-by-median -- filter-abund -- normalize-by-median)
-- ARTIFACT REMOVAL (load-graph -- partition-graph, etc.)
  or
is the DIGINORM step unnecessary in our case?
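
Regarding question 2: if FASTQ input turns out to matter, our separate
.qual files could be merged with the FASTA files beforehand. Below is a
minimal sketch of how that might be done, assuming standard 454-style
FASTA/QUAL pairs with records in the same order and Phred+33 encoded
output. The function names are hypothetical and not part of khmer.

```python
import io


def read_records(handle):
    """Yield (name, lines) for each '>'-headed record in a FASTA-like file."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, chunks
            # Keep only the read identifier (first token of the header).
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, chunks


def fasta_qual_to_fastq(fasta_handle, qual_handle, out_handle):
    """Merge paired FASTA and QUAL records into FASTQ; return the read count.

    Assumption: both files list the same reads in the same order.
    Note: zip() stops at the shorter input, so a truncated file is not detected.
    """
    count = 0
    pairs = zip(read_records(fasta_handle), read_records(qual_handle))
    for (seq_name, seq_lines), (qual_name, qual_lines) in pairs:
        if seq_name != qual_name:
            raise ValueError(f"record mismatch: {seq_name} vs {qual_name}")
        seq = "".join(seq_lines)
        scores = [int(s) for s in " ".join(qual_lines).split()]
        if len(scores) != len(seq):
            raise ValueError(f"length mismatch for read {seq_name}")
        # Phred+33 encoding, capped at 93 so characters stay printable ASCII.
        quals = "".join(chr(min(q, 93) + 33) for q in scores)
        out_handle.write(f"@{seq_name}\n{seq}\n+\n{quals}\n")
        count += 1
    return count


if __name__ == "__main__":
    # Tiny in-memory demonstration with one made-up read.
    fasta = io.StringIO(">read1 length=4\nACGT\n")
    qual = io.StringIO(">read1 length=4\n40 30 20 10\n")
    out = io.StringIO()
    fasta_qual_to_fastq(fasta, qual, out)
    print(out.getvalue(), end="")
```

With real data, the three handles would be ordinary files opened with
open(); the in-memory example is only there to show the record layout.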

Thanks for your help,

and thanks for your great work on khmer 1.0!

Cheers from Bordeaux,

Alexis
-- 
CBiB - Université de Bordeaux <http://www.u-bordeaux.fr>

Dr Alexis Groppi <mailto:alexis.groppi at u-bordeaux2.fr>
Deputy Director of CBiB - Project Officer for the CGFB
146, rue Léo Saignat - Case 68 - 33076 Bordeaux Cedex
T. +33 5 57 57 12 18
P. +33 6 35 95 04 87
www.cbib.u-bordeaux2.fr <http://www.cbib.u-bordeaux2.fr>
