On the agenda this Wednesday at 10 AM in PSB 271: we are starting our MSU Research Round Robin with two 20-minute presentations on local MSU research:<br><br>Aaron Garoutte (Tiedje lab) will speak about using metagenomic sequencing to study microbes associated with bioenergy crops, and Adina (me) will speak about our work on soil metagenomic assembly &quot;done right&quot;.  Abstracts below.<br>

-----------------------<br><font size="2">Aaron Garoutte:<br></font><p style="margin: 0in 0in 0.0001pt; text-indent: 0.5in;"><font size="2"><span style="color: black;">Plant-associated

 microbes play an important role in plant health and production, and 

thus have a potential role in the productivity and sustainability of 

crops used for biofuels. We studied the plant-microbe dynamic by 

surveying the microbial communities of two biofuel crops, switchgrass 

and Miscanthus, in two US locations, Michigan and Wisconsin, using 

whole-genome shotgun sequencing and targeted pyrotags of 16S rRNA. DNA was 

extracted from rhizosphere and adjacent bulk soil and sequenced using 

the Roche 454 FLX Titanium and Illumina Genome Analyzer sequencing 

platforms.  Ribosomal gene reads were processed using the Ribosomal 

Database Project (RDP) Pyrosequencing Pipeline, and assembled shotgun 

sequences were annotated using the CAMERA and MG-RAST annotation 

pipelines.</span></font></p><p style="margin: 0in 0in 0.0001pt; text-indent: 0.5in;"><font size="2"><span style="color: black;">Significant

 grouping of metagenome assemblies and pyrotag sequences was observed 

by associated plant type.  This pattern is seen throughout various 

taxonomic levels in the pyrotag data as well as at the operational 

taxonomic unit (OTU) level.  A significant perMANOVA result is also observed

 when samples are grouped according to sampling location (Michigan vs 

Wisconsin).  Ordination of the samples by principal components supports 

the perMANOVA results.  At the genus and OTU levels, samples separate by 

plant and location, while at the functional level (COGs) samples 

separate only by plant. These results suggest that although 

there is variation in species composition between the Wisconsin and 

Michigan soils, the functions carried out by the rhizosphere community 

are conserved.  </span></font></p><p style="margin: 0in 0in 0.0001pt; text-indent: 0.5in;"><font size="2"><span style="color: black;">This

 work will allow us to further explore the genes in our metagenomic 

sequences that are involved in plant growth promotion and in carbon, 

nitrogen, and phosphorus cycling. Combined with site-specific environmental 

metadata, these can be used to explore the effects of gene suite and habitat 

on plant-microbe-soil relationships. </span></font></p><font size="2"><br>Adina:<br>


</font>


<p class="MsoNormal" style="text-indent: 0.5in;"><font size="2">Metagenomic sequencing of complex

communities using short-read sequencing technologies presents both challenges and

opportunities.<span>  </span>The decreasing

costs of short-read sequencing technologies have created unprecedented

opportunities to deeply sequence complex communities; however, the short read

length does not permit gene-centric analysis.<span>  </span>The assembly of these short reads into larger contigs is

required for effective gene analysis.<span> 

</span>The major bottleneck for applying current assembly algorithms to large

metagenomic datasets is the sheer volume of sequencing data and the corresponding

computational power required to assemble these reads.<br></font>

<font size="2"><span> </span><span>            </span>Soil

arguably has the most genetically diverse microbial composition and hence is

most in need of tools for metagenome analysis.  Currently, we have over 500 Gb of Illumina sequence data from

cultivated and prairie soils in Iowa.  To make the assembly of these data possible, we have

developed several k-mer-based approaches ranging from abundance filtering to

data partitioning. <br>

<span> </span><span>            </span>Our

methods use multiple hash tables to store a memory-efficient representation of

the k-mer graph.<span>  </span>Hash functions

are used to identify k-mer presence and connectivity within a dataset and

subsequently subdivide the dataset into disconnected subsets of reads. <span> </span>Partitioning the data allows us to scale

assembly both by reducing the amount of memory needed for the entire assembly

and by allowing parallel assembly of all partitions.<br></font>

<font size="2"><span> </span><span>            </span>In

applying this approach to a subset of the soil metagenome, we found the largest

partitioned subset of reads to be a single large connected graph with high

local connectivity.<span>   </span>Further

investigation into these highly connected k-mers showed them to be

preferentially located at the 3’ end of reads, suggesting the presence of

sequencing artifacts.<span>  </span>After

removing these highly connected k-mers and partitioning the 50 Gb dataset, we

assembled the resulting partitions using the Velvet assembler in less than 68 Gb of

memory.<span>  </span>The resulting assembly and

the assembly of the original (non-partitioned) dataset differed by less than 5% in total length (83.5 Mbp

vs 87.6 Mbp; 53,501 vs 62,108 contigs, respectively).<span>  </span>Our approach for partitioning large

datasets works well for local and global assembly problems, scales to commodity

hardware, and has a freely available implementation. </font></p>
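<p class="MsoNormal"><font size="2">To make the partitioning idea concrete, here is a toy Python sketch. It is an illustration only, not the implementation described in the abstract. It assumes that two reads belong to the same partition whenever they share at least one k-mer (a k-mer and its reverse complement are treated as the same), and it uses an ordinary exact Python dictionary in place of the memory-efficient hash tables. The names <code>canonical</code>, <code>kmers</code>, and <code>partition_reads</code> are made up for this example.</font></p>

```python
from collections import defaultdict

# Translation table used to complement a DNA string (assumes only A, C, G, T).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmers(read, k):
    """Yield every canonical k-mer of length k found in a read."""
    for i in range(len(read) - k + 1):
        yield canonical(read[i:i + k])


def partition_reads(reads, k):
    """Group reads into disconnected subsets.

    Simplifying assumption: two reads are connected if they share at least
    one canonical k-mer. Connectivity is tracked with a union-find structure.
    """
    parent = list(range(len(reads)))

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Exact dictionary: k-mer -> index of the first read that contained it.
    # A real large-scale tool would need something far more memory-efficient.
    first_seen = {}
    for idx, read in enumerate(reads):
        for kmer in kmers(read, k):
            if kmer in first_seen:
                a, b = find(idx), find(first_seen[kmer])
                if a != b:
                    parent[a] = b  # merge the two partitions
            else:
                first_seen[kmer] = idx

    # Collect reads by the root of their partition, largest partition first.
    groups = defaultdict(list)
    for idx, read in enumerate(reads):
        groups[find(idx)].append(read)
    return sorted(groups.values(), key=len, reverse=True)


if __name__ == "__main__":
    example_reads = ["ACGTACGTAA", "CGTAAGGCTT", "TTTTGGGGCC", "GGGGCCAATT"]
    for number, partition in enumerate(partition_reads(example_reads, k=5)):
        print(number, partition)
```

<p class="MsoNormal"><font size="2">Each partition returned by this sketch shares no k-mers with any other partition, so each one could be handed to an assembler independently. That independence is what makes parallel assembly of the partitions possible.</font></p>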


<br>