On the agenda this Wednesday at 10 AM in PSB 271: we are starting our MSU Research Round Robin with two 20-minute presentations on local MSU research:<br><br>Aaron Garoutte (Tiedje lab) speaking about using metagenomic sequencing to study plant-associated microbes of bioenergy crops, and Adina (me) speaking about our work on soil metagenomic assembly "done right". Abstracts below.<br>
-----------------------<br><font size="2">Aaron Garoutte:<br></font><p style="margin: 0in 0in 0.0001pt; text-indent: 0.5in;"><font size="2"><span style="color: black;">Plant-associated
microbes play an important role in plant health and production and
thus may affect the productivity and sustainability of
crops used for biofuels. We studied plant-microbe dynamics by
surveying the microbial communities of two biofuel crops, switchgrass
and Miscanthus, at two US locations (Michigan and Wisconsin), using whole-genome
shotgun sequencing and targeted 16S rRNA pyrotag sequencing. DNA was
extracted from rhizosphere and adjacent bulk soil and sequenced using
the Roche 454 FLX Titanium and Illumina Genome Analyzer sequencing
platforms. Ribosomal gene reads were processed using the Ribosomal
Database Project Pyrosequencing Pipeline (RDP) and assembled shotgun
sequences were annotated using the CAMERA and MG-RAST annotation
pipelines.</span></font></p><p style="margin: 0in 0in 0.0001pt; text-indent: 0.5in;"><font size="2"><span style="color: black;">Significant
grouping of metagenome assemblies and pyrotag sequences by associated plant type was
observed. This pattern holds across taxonomic levels in the pyrotag data as well as at the
operational taxonomic unit (OTU) level. A significant PERMANOVA result is also observed
when samples are grouped by sampling location (Michigan vs.
Wisconsin). Ordination of the samples by principal components supports
the PERMANOVA results. At the genus and OTU levels, samples separate by
plant and location, while at the functional level (COGs), samples
separate only by plant. These results suggest that although
species composition varies between the Wisconsin and
Michigan soils, the functions carried out by the rhizosphere community
are conserved. </span></font></p><p style="margin: 0in 0in 0.0001pt; text-indent: 0.5in;"><font size="2"><span style="color: black;">This
work will allow us to further explore the genes in our metagenomic
sequences that are involved in plant growth promotion and in carbon,
nitrogen, and phosphorus cycling. Combined with site-specific environmental
metadata, these sequences can be used to explore the effects of gene suite and habitat
on plant-microbe-soil relationships. </span></font></p><font size="2"><br>Adina:<br>
</font>
<p class="MsoNormal" style="text-indent: 0.5in;"><font size="2">Metagenomic sequencing of complex
communities using short-read sequencing technologies presents both challenges and
opportunities. The decreasing cost of short-read sequencing has made it possible
to deeply sequence complex communities; however, the reads themselves are too
short for gene-centric analysis. Assembling these short reads into longer contigs is
required for effective gene analysis. The major bottleneck in applying current assembly
algorithms to large metagenomic datasets is the sheer volume of sequence data and the
corresponding computational resources required to assemble these reads.<br></font>
<font size="2"><span> </span><span> </span>Soil
arguably harbors the most genetically diverse microbial communities and hence is
most in need of tools for metagenome analysis. Currently, we have over 500 Gb of Illumina
sequence data from cultivated and prairie soils in Iowa.
In order to make assembly of these data possible, <span>we have
developed several k-mer-based approaches, ranging from abundance filtering to
data partitioning. <br>
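The abstract does not spell out the filtering criterion. As a minimal sketch, assuming that "abundance filtering" means trimming each read at the first k-mer that occurs too rarely to be trusted (a common way to strip sequencing errors), the step could look like the following. The function names, the min_count cutoff, and the use of an exact Counter (rather than the memory-efficient hash tables described in this abstract) are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer across all reads (exact counts, for illustration only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def trim_at_low_abundance(reads, k, min_count):
    """Trim each read just before its first k-mer seen fewer than min_count times.

    Reads left shorter than k bases after trimming are dropped.
    """
    counts = count_kmers(reads, k)
    kept = []
    for read in reads:
        end = len(read)
        for i in range(len(read) - k + 1):
            if counts[read[i:i + k]] >= min_count:
                continue
            # The k-mer starting at i is rare (likely an error): keep only the
            # prefix whose k-mers are all sufficiently abundant.
            end = i + k - 1
            break
        trimmed = read[:end]
        if len(trimmed) >= k:
            kept.append(trimmed)
    return kept


reads = ["ACGTACGTA", "ACGTACGTA", "ACGTACGTC", "GGGGA"]
print(trim_at_low_abundance(reads, k=4, min_count=2))
# ['ACGTACGTA', 'ACGTACGTA', 'ACGTACGT']
```

Here the third read loses its final base because its last 4-mer (CGTC) appears only once, and the fourth read is dropped entirely because none of its 4-mers is repeated.<br>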
</span><span> </span><span> </span>Our
methods use multiple hash tables to store a memory-efficient representation of
the k-mer graph. Hash functions are used to determine k-mer presence and
connectivity within a dataset, which lets us subdivide the dataset into
disconnected subsets of reads. Partitioning the data allows us to scale
assembly both by reducing the amount of memory needed for the entire assembly
and by allowing all partitions to be assembled in parallel.<br></font>
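<font size="2">
The abstract describes this data structure only at a high level. The sketch below assumes a Bloom-filter-like design: each k-mer is packed into an integer and marked in several bit tables of different sizes, and graph edges are never stored but recovered on demand by querying the eight possible neighbors (four successors, four predecessors) that overlap a k-mer by k - 1 bases. Partitions are then found by breadth-first traversal from each read's first k-mer. KmerPresenceTable, partition_reads, and the tiny table sizes are hypothetical names and values, not the authors' implementation; reverse complements are ignored and every read is assumed to be at least k bases long.

```python
from collections import deque

# 2-bit encoding of nucleotides (reads are assumed to contain only A/C/G/T).
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(kmer):
    """Pack a DNA k-mer into an integer, 2 bits per base."""
    value = 0
    for base in kmer:
        value = value * 4 + BASE_CODE[base]
    return value


class KmerPresenceTable:
    """Bloom-filter-style k-mer set built from several bit tables.

    A k-mer counts as present only if its slot is set in every table, so
    false positives are possible but false negatives are not.
    """

    def __init__(self, table_sizes=(1031, 1033, 1039)):
        # Hypothetical sizes; real metagenomes would need far larger tables.
        self.tables = [bytearray(size) for size in table_sizes]

    def add(self, kmer):
        code = encode(kmer)
        for table in self.tables:
            table[code % len(table)] = 1

    def __contains__(self, kmer):
        code = encode(kmer)
        return all(table[code % len(table)] for table in self.tables)

    def neighbors(self, kmer):
        """Yield present k-mers that overlap this one by k - 1 bases."""
        for base in "ACGT":
            for candidate in (kmer[1:] + base, base + kmer[:-1]):
                if candidate in self:
                    yield candidate


def partition_reads(reads, k, table_sizes=(1031, 1033, 1039)):
    """Group reads whose k-mers are connected in the implicit k-mer graph."""
    table = KmerPresenceTable(table_sizes)
    for read in reads:
        for i in range(len(read) - k + 1):
            table.add(read[i:i + k])

    partition_of = {}  # k-mer -> partition id (an exact dict, kept for clarity)
    groups = {}
    for read in reads:
        seed = read[:k]
        if seed not in partition_of:
            # Breadth-first traversal from this read's first k-mer; every k-mer
            # reached belongs to the same connected component.
            pid = len(groups)
            partition_of[seed] = pid
            queue = deque([seed])
            while queue:
                for nbr in table.neighbors(queue.popleft()):
                    if nbr not in partition_of:
                        partition_of[nbr] = pid
                        queue.append(nbr)
        groups.setdefault(partition_of[seed], []).append(read)
    return list(groups.values())


print(partition_reads(["AACCGGTT", "CCGGTTAC", "TGCATGCA"], k=5))
# [['AACCGGTT', 'CCGGTTAC'], ['TGCATGCA']]
```

The first two reads share the 5-mer CCGGT and so land in one partition, while the third read shares no overlapping k-mers with them. Because a k-mer is reported present only when every table agrees, a false positive can add a spurious edge and merge two partitions, but a true edge is never missed, so a genuine connected component is never split.<br></font>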
<font size="2"><span> </span><span> </span>When we
applied this approach to a subset of the soil metagenome, we found that the largest
partition of reads formed a single large connected graph with high
local connectivity. Further investigation showed that these highly connected
k-mers were preferentially located at the 3’ end of reads, suggesting that they
arise from sequencing artifacts. After removing these highly connected k-mers and
partitioning the 50 Gb dataset, we assembled the resulting partitions with the Velvet
assembler using less than 68 Gb of memory. Compared with an assembly of the original
(non-partitioned) dataset, the partitioned assembly differed in total length by less
than 5% (83.5 Mbp vs. 87.6 Mbp) and contained 53,501 vs. 62,108 contigs, respectively.
Our approach to partitioning large datasets works well for local and global assembly
problems, scales to commodity hardware, and has a freely available implementation. </font></p>
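<font size="2">
As a minimal sketch of how such k-mers might be flagged, assuming "highly connected" means a high degree in the k-mer graph (many distinct k-mers overlapping it by k - 1 bases): count each k-mer's neighbors, mark those above a cutoff, and check where they fall within reads. The max_degree cutoff and the function names are hypothetical; the abstract does not state the criterion actually used, and this version keeps an exact set of k-mers rather than the hash tables described in the abstract.

```python
def kmer_degrees(reads, k):
    """Map each k-mer in the reads to its number of distinct graph neighbors."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    degrees = {}
    for kmer in kmers:
        # The 4 possible successors and 4 possible predecessors (k - 1 overlap).
        candidates = {kmer[1:] + b for b in "ACGT"} | {b + kmer[:-1] for b in "ACGT"}
        candidates.discard(kmer)  # ignore self-loops such as AAA -> AAA
        degrees[kmer] = sum(1 for c in candidates if c in kmers)
    return degrees


def find_hub_kmers(reads, k, max_degree):
    """Return k-mers with more than max_degree neighbors and their mean read position.

    Positions are relative: 0.0 is the 5' end of a read, 1.0 the 3' end.
    """
    degrees = kmer_degrees(reads, k)
    hubs = {kmer for kmer, degree in degrees.items() if degree > max_degree}
    positions = [
        i / max(len(read) - k, 1)
        for read in reads
        for i in range(len(read) - k + 1)
        if read[i:i + k] in hubs
    ]
    mean_position = sum(positions) / len(positions) if positions else None
    return hubs, mean_position


reads = ["ACGTTT", "GCATTT", "AGCTTT"]
hubs, mean_position = find_hub_kmers(reads, k=3, max_degree=2)
print(hubs, mean_position)  # {'TTT'} 1.0
```

In this toy input, the shared 3' k-mer TTT joins three otherwise separate reads, so it is the only k-mer above the cutoff and it sits at the very end of every read. Dropping the flagged hubs from the k-mer set before partitioning would separate those reads again.<br></font>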
<br>
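<font size="2">
Taking the non-partitioned assembly as the reference, the relative differences can be computed directly from the reported figures. The total length differs by about 4.7%, within the stated 5%, while the contig count differs by about 13.9%.

```python
def percent_smaller(partitioned, original):
    """Percentage by which the partitioned value falls short of the original."""
    return 100 * (original - partitioned) / original


# Figures reported in the abstract: partitioned vs. non-partitioned assembly.
contigs = percent_smaller(53_501, 62_108)
length = percent_smaller(83.5, 87.6)
print(f"contig count: {contigs:.1f}% smaller")  # contig count: 13.9% smaller
print(f"total length: {length:.1f}% smaller")  # total length: 4.7% smaller
```
<br></font>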