[bip] mRNA lengths of all Human/Mouse refseqs (tutorial)

Brandon King kingb at caltech.edu
Wed Jan 16 15:29:26 PST 2008


Hi All,

I recently needed to generate the sequence lengths for all Human and
Mouse RefSeq mRNAs. I figured it was worthwhile to document how I did
it, to try to get in the habit of making tutorials as I work on solving
specific problems.

I used the bhutils.Fasta module (LGPL license) that Joe Roden and I
wrote for the BioHub project. Its purpose was to extract sequence from
chromosome-sized Fasta files w/o using up a lot of memory. It also
needed to handle extracting sequence from Fasta files that contain many
sequences.
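
For anyone who wants to see the general idea without installing
anything, here is a rough sketch in plain Python of computing lengths
while reading one line at a time, so that no whole sequence is ever held
in memory. This is NOT the bhutils.Fasta API; the function name
fasta_lengths and the example records are made up for illustration.

```python
def fasta_lengths(lines):
    """Yield (header, length) for each record in an iterable of FASTA lines.

    Hypothetical helper for illustration only (not part of bhutils).
    Only a running count is kept per record, never the sequence itself.
    """
    header = None
    length = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, length
            header = line[1:]
            length = 0
        elif header is not None:
            length += len(line)
    # Emit the final record.
    if header is not None:
        yield header, length


# Made-up example data; a real run would pass an open file object instead.
example = [">seq1 test", "ACGT", "AC", "", ">seq2", "GGG"]
lengths = dict(fasta_lengths(example))
```

Since an open file is itself an iterable of lines, the same function
should work on a large multi-sequence Fasta file by passing the file
object directly.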

It is also very useful for selectively plucking sequences from multiple
multi-sequence fasta files and combining them into a new multi-sequence
fasta file, and it doesn't take much code to accomplish these tasks. I
will try to make tutorials for these as well.
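
As a rough illustration of what "plucking" sequences can look like, here
is a plain-Python sketch that copies selected records from several fasta
sources into one output. It does not use bhutils; the function name
pluck_records, the rule that a record's ID is the first word of its
header, and the example data are all assumptions made for this sketch.

```python
import io


def pluck_records(handles, wanted_ids, out):
    """Copy records whose ID is in wanted_ids from each handle to out.

    Hypothetical helper for illustration only (not part of bhutils).
    The ID is assumed to be the first whitespace-separated word of the
    header line. Returns the number of records written.
    """
    written = 0
    for handle in handles:
        keep = False
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                keep = bool(fields) and fields[0] in wanted_ids
                if keep:
                    written += 1
            if keep:
                # Make sure every copied line ends with a newline.
                out.write(line if line.endswith("\n") else line + "\n")
    return written


# Made-up in-memory "files"; real use would pass open file objects.
file_a = io.StringIO(">a1 first\nACGT\nAC\n>a2\nTTTT\n")
file_b = io.StringIO(">b1\nGG\n>b2 last\nCCC")
combined = io.StringIO()
count = pluck_records([file_a, file_b], {"a2", "b2"}, combined)
```

Records are written in the order they are encountered, one input after
another, so the output is itself a valid multi-sequence fasta file.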

It might be useful if someone wants to post similar tutorials for how
these would be done with biopython, pygr, or other packages as well.

bhutils is a collection of modules that I found useful and that could
easily be extracted from the BioHub project (making it a much
lighter-weight package): http://woldlab.caltech.edu/html/bhutils/

The tutorial can be found here:
http://bio.scipy.org/wiki/index.php/Multisequence_fasta_sequence_lengths_with_bhutils

-Brandon King



