[bip] mRNA lengths of all Human/Mouse refseqs (tutorial)
Titus Brown
titus at caltech.edu
Thu Jan 17 00:55:03 PST 2008
-> I recently needed to generate the sequence lengths for all Human and
-> Mouse RefSeq mRNAs. I figured it was worthwhile to document how I did
-> it, to get in the habit of making tutorials as I work on solving
-> specific problems.
->
-> I used the bhutils.Fasta module (LGPL license) that Joe Roden and I
-> wrote for the BioHub project. It was designed to extract sequence from
-> chromosome-sized Fasta files without using a lot of memory, and to
-> extract sequence from Fasta files that contain many sequences.
->
-> It is also very useful for selectively plucking sequences from multiple
-> multi-sequence fasta files and combining them into a new multi-sequence
-> fasta file, and it doesn't take much code to accomplish these tasks. I
-> will try to make tutorials for these as well.
->
-> It might be useful if someone wants to post similar tutorials for how
-> these would be done with biopython, pygr, or other packages as well.
->
-> The tutorial can be found here:
-> http://bio.scipy.org/wiki/index.php/Multisequence_fasta_sequence_lengths_with_bhutils
Here's a solution for pygr:
---
#! /usr/bin/env python
import sys
from pygr.seqdb import BlastDB
# open the FASTA file named on the command line & create indices
db = BlastDB(sys.argv[1])
# iterate over keys & print each name with its sequence length
for name in db:
    seq = db[name]
    print('%s\t%d' % (name, len(seq)))
---
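For comparison, here is a rough sketch of the same name/length listing using only the standard library, for anyone without pygr installed. It assumes plain FASTA input (a '>' header line followed by sequence lines) and takes the first whitespace-delimited token of each header as the name. The function name fasta_lengths is made up for this example and is not part of pygr or bhutils. It streams the file line by line, so memory use stays small even for chromosome-sized sequences.

```python
import io


def fasta_lengths(handle):
    """Yield (name, length) for each record in an open FASTA handle.

    Hypothetical helper for illustration; not part of pygr or bhutils.
    """
    name = None
    length = 0
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('>'):
            # New header: emit the previous record, if there was one.
            if name is not None:
                yield name, length
            # Assumption: the name is the first token after '>'.
            fields = line[1:].split()
            name = fields[0] if fields else ''
            length = 0
        elif name is not None:
            # Sequence line: only its length is kept, not the sequence.
            length += len(line)
    # Emit the final record.
    if name is not None:
        yield name, length


# Small in-memory demo; for a real file, pass open(path) instead.
demo = io.StringIO(">seq1 first record\nACGT\nAC\n>seq2\nGGG\n")
for name, length in fasta_lengths(demo):
    print('%s\t%d' % (name, length))
```

Unlike the pygr version, this builds no index, so it only suits a single pass over the file; for repeated lookups by name the indexed approach is the better fit.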