[metagenomics-jclub] MetaJ Club Wed 10 AM June 8

Adina Chuang Howe adina.chuang at gmail.com
Thu Jun 2 13:21:38 PDT 2011


On the agenda this Wednesday at 10 AM in PSB 271: we are starting our MSU
Research Round Robin with two 20-minute presentations on local MSU research:

Aaron Garoutte (Tiedje lab) speaking about using metagenomic sequencing to
study microbes associated with bioenergy crops, and Adina (me) speaking about
our work on soil metagenomic assembly "done right".  Abstracts below.
-----------------------
Aaron Garoutte:

Plant-associated microbes play an important role in plant health and
production, and thus have a potential role in the productivity and
sustainability of crops used for biofuels. We studied the plant-microbe
dynamic by surveying the microbial communities of two biofuel crops,
switchgrass and Miscanthus, in two US locations, Michigan and Wisconsin,
using shotgun sequencing and targeted 16S rRNA pyrotags. DNA was
extracted from rhizosphere and adjacent bulk soil and sequenced using the
Roche 454 FLX Titanium and Illumina Genome Analyzer sequencing platforms.
Ribosomal gene reads were processed using the Ribosomal Database Project
(RDP) Pyrosequencing Pipeline, and assembled shotgun sequences were annotated
using the CAMERA and MG-RAST annotation pipelines.

Significant grouping of metagenome assemblies and pyrotag sequences was
observed by associated plant type.  This pattern is seen throughout various
taxonomic levels in the pyrotag data as well as at the operational taxonomic
unit (OTU) level.  A significant perMANOVA result is observed when samples
are grouped according to sampling location (Michigan vs Wisconsin).
Ordination of the samples by principal components supports the perMANOVA
results.  At the genus and OTU level, samples separate by plant and location,
while at the functional level (COGs), samples separate only by plant.
These results suggest that although there is variation in species
composition between the Wisconsin and Michigan soils, the functions carried
out by the rhizosphere community are conserved.
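
For anyone who has not met perMANOVA before, here is a minimal sketch of the
idea in Python.  It is not the analysis that was actually run: the
Bray-Curtis dissimilarity is an assumption (the abstract does not name the
distance measure), and the function names and the number of permutations are
purely illustrative.

```python
import random
from itertools import combinations


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(x + y for x, y in zip(a, b))
    return num / den if den else 0.0


def pseudo_f(dist, labels):
    """PERMANOVA pseudo-F statistic from a distance matrix and group labels.

    Assumes at least two groups and some within-group variation.
    """
    n = len(labels)
    groups = sorted(set(labels))
    # Total sum of squares from all pairwise squared distances.
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    # Within-group sum of squares, one term per group.
    ss_within = 0.0
    for g in groups:
        idx = [i for i, lab in enumerate(labels) if lab == g]
        ss_within += sum(dist[i][j] ** 2 for i, j in combinations(idx, 2)) / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (len(groups) - 1)) / (ss_within / (n - len(groups)))


def permanova(samples, labels, permutations=999, seed=0):
    """Return (pseudo-F, permutation p-value) for the given grouping."""
    n = len(samples)
    dist = [[bray_curtis(samples[i], samples[j]) for j in range(n)] for i in range(n)]
    observed = pseudo_f(dist, labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)  # break any real link between sample and label
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)
```

Calling `permanova(samples, labels)` with one abundance vector per sample and
one label per sample (for example a location or a plant type) returns the
pseudo-F and a permutation p-value; a small p-value means the grouping
explains more of the between-sample distance than random relabelling does.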

This work will allow us to further explore genes in our metagenomic
sequences that are involved in plant growth promotion and in carbon,
nitrogen, and phosphorus cycling.  Combined with site-specific environmental
metadata, these data can be used to explore the effects of gene suite and
habitat on plant-microbe-soil relationships.

Adina:

Metagenomic sequencing of complex communities using short-read sequencing
technologies presents both challenges and opportunities.  The decreasing
costs of short-read sequencing technologies have created unprecedented
opportunities to deeply sequence complex communities; however, the short
read length does not permit gene-centric analysis.  The assembly of these
short reads into larger contigs is required for effective gene analysis.  The
major bottleneck for applying current assembly algorithms to large
metagenomic datasets is the large volume of sequencing data and
corresponding computational power required to assemble these reads.
             Soil arguably has the most genetically diverse microbial
composition and hence is most in need of tools for metagenome analysis.
Currently, we have over 500 Gb of Illumina sequencing from Iowa cultivated and
prairie soils.  In order to make the assembly of this data possible, we have
developed several k-mer-based approaches ranging from abundance filtering to
data partitioning.
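
To make "abundance filtering" concrete, here is a minimal sketch in Python.
It assumes the filter drops reads containing any k-mer seen fewer than a
threshold number of times; the direction of the filter, the threshold, and
the function names are my assumptions for illustration, and it uses an exact
counter rather than anything memory-efficient.  Reverse complements are
ignored for brevity.

```python
from collections import Counter


def kmers(seq, k):
    """All overlapping k-mers of a sequence, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def filter_by_abundance(reads, k, min_count):
    """Keep reads whose k-mers all occur at least min_count times overall."""
    counts = Counter(km for read in reads for km in kmers(read, k))
    kept = []
    for read in reads:
        kms = kmers(read, k)
        # Reads shorter than k have no k-mers and are dropped.
        if kms and all(counts[km] >= min_count for km in kms):
            kept.append(read)
    return kept
```

For example, `filter_by_abundance(["ACGTAC", "ACGTAC", "TTTTGG"], k=4,
min_count=2)` keeps the two identical reads and drops the third, whose
k-mers each occur only once.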
             Our methods use multiple hash tables to store a memory-efficient
representation of the k-mer graph.  Hash functions are used to identify
k-mer presence and connectivity within a dataset and subsequently to subdivide
the dataset into disconnected subsets of reads.  Partitioning the data
allows us to scale assembly both by reducing the amount of memory needed for
the entire assembly and by allowing parallel assembly of all partitions.
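
A rough sketch of the partitioning idea in Python follows.  This is not our
actual implementation: the table sizes, class and function names are
hypothetical, reads are assumed to contain only A, C, G and T, and the
union-find bookkeeping uses an ordinary dictionary, so only the presence
table itself is compact.  A false positive in the presence table can merge
two partitions but never split one.

```python
from collections import defaultdict

ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_to_int(kmer):
    """Pack a k-mer into an integer, two bits per base."""
    value = 0
    for base in kmer:
        value = value * 4 + ENCODE[base]
    return value


class KmerPresence:
    """Bloom-filter-like set: one byte table per (hypothetical) prime size."""

    def __init__(self, table_sizes=(1009, 1013, 1019)):
        self.tables = [bytearray(size) for size in table_sizes]

    def add(self, kmer):
        h = kmer_to_int(kmer)
        for table in self.tables:
            table[h % len(table)] = 1

    def __contains__(self, kmer):
        h = kmer_to_int(kmer)
        return all(table[h % len(table)] for table in self.tables)


def partition_reads(reads, k):
    """Group reads whose k-mers are connected in the k-mer graph."""
    present = KmerPresence()
    for read in reads:
        for i in range(len(read) - k + 1):
            present.add(read[i:i + k])

    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    # Join each k-mer to every right-hand neighbor the table reports present.
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            find(kmer)
            for base in "ACGT":
                neighbor = kmer[1:] + base
                if neighbor in present:
                    union(kmer, neighbor)

    # Reads shorter than k have no k-mers and are left out.
    partitions = defaultdict(list)
    for read in reads:
        if len(read) >= k:
            partitions[find(read[:k])].append(read)
    return list(partitions.values())
```

Each returned list is a disconnected subset of reads that could be handed to
an assembler on its own, which is what makes parallel assembly possible.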
             In applying this approach to a subset of the soil metagenome,
we found the largest partitioned subset of reads to be a single large
connected graph with high local connectivity.   Further investigation into
these highly connected k-mers showed them to be preferentially located at
the 3’ end of reads, suggesting the presence of sequencing artifacts.  After
removing these highly connected k-mers and partitioning the 50 Gb dataset,
we assembled the resulting partitions using the Velvet assembler in less than
68 GB of memory.  The partitioned assembly and the assembly of the original
(non-partitioned) dataset differed by less than 5% in total length (83.5 Mbp
vs 87.6 Mbp, with 53,501 vs 62,108 contigs, respectively).  Our approach for
partitioning large
datasets works well for local and global assembly problems, scales to
commodity hardware, and has a freely available implementation.
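
The check for highly connected k-mers and where they fall within reads can
be sketched as below.  This is an illustration only: it uses an exact Python
set instead of our hash tables, and the degree threshold and function names
are hypothetical.

```python
from collections import Counter


def kmer_degree(kmer, kmer_set):
    """Number of distinct graph neighbors of a k-mer that occur in kmer_set."""
    neighbors = {kmer[1:] + b for b in "ACGT"} | {b + kmer[:-1] for b in "ACGT"}
    neighbors.discard(kmer)  # ignore self-loops such as AAAA -> AAAA
    return sum(1 for n in neighbors if n in kmer_set)


def high_degree_positions(reads, k, min_degree):
    """Count, per read offset, how often a high-degree k-mer starts there."""
    kmer_set = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    positions = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            if kmer_degree(read[i:i + k], kmer_set) >= min_degree:
                positions[i] += 1
    return positions
```

A histogram of the returned counts that piles up at the largest offsets would
correspond to the 3' bias described above.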