[bip] mRNA lengths of all Human/Mouse refseqs (tutorial)

Titus Brown titus at caltech.edu
Thu Jan 17 00:55:03 PST 2008


-> I recently needed to generate the sequence lengths for all Human and
-> Mouse RefSeq mRNAs. I figured it was worthwhile to document how I did
-> it, to get in the habit of writing tutorials as I work on solving
-> specific problems.
-> 
-> I used the bhutils.Fasta module (LGPL license) that Joe Roden and I
-> wrote for the BioHub project so that we could extract sequence from
-> chromosome-sized FASTA files without using up a lot of memory. It also
-> needed to handle extracting sequence from FASTA files containing many
-> sequences.
-> 
-> It is also very useful for selectively plucking sequences from multiple
-> multi-sequence FASTA files and combining them into a new multi-sequence
-> FASTA file, and it doesn't take much code to accomplish these tasks. I
-> will try to write tutorials for these as well.
-> 
-> It might be useful if someone posted similar tutorials showing how
-> these tasks would be done with biopython, pygr, or other packages.
-> 
-> The tutorial can be found here:
-> http://bio.scipy.org/wiki/index.php/Multisequence_fasta_sequence_lengths_with_bhutils
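The selective plucking described in the quoted message (pulling chosen records out of several multi-sequence FASTA files into one new file) can also be sketched in plain Python. This is not the bhutils.Fasta interface, whose API isn't shown in this thread; pick_records is a hypothetical name. It streams each input line by line, copies a record when the first word of its header (an assumption about how records are named) is in a set of wanted names, and keeps only the first copy if a name occurs more than once.

```python
#!/usr/bin/env python
"""Copy selected records from several FASTA files into one new file.

Plain-Python sketch, not the bhutils.Fasta API: each input is streamed
line by line, so whole sequences are never loaded into memory.
"""
import sys


def pick_records(sources, wanted, out):
    """Write records whose name is in `wanted` from `sources` to `out`.

    sources: iterable of line iterables (e.g. open file objects).
    Assumption: a record's name is the first word of its '>' header.
    Only the first record seen for each name is copied.
    Returns the set of names that were written.
    """
    found = set()
    for lines in sources:
        keep = False  # lines before the first header are dropped
        for line in lines:
            if line.startswith('>'):
                fields = line[1:].split()
                name = fields[0] if fields else ''
                keep = name in wanted and name not in found
                if keep:
                    found.add(name)
            if keep:
                # ensure every written line ends with a newline
                out.write(line if line.endswith('\n') else line + '\n')
    return found


if __name__ == '__main__' and len(sys.argv) > 2:
    # usage: pick.py wanted_names.txt in1.fa [in2.fa ...] > out.fa
    with open(sys.argv[1]) as names:
        wanted = set(line.strip() for line in names if line.strip())
    handles = [open(path) for path in sys.argv[2:]]
    try:
        found = pick_records(handles, wanted, sys.stdout)
    finally:
        for handle in handles:
            handle.close()
    for name in sorted(wanted - found):
        sys.stderr.write('not found: %s\n' % name)
```

For example, `python pick.py wanted_ids.txt human.rna.fa mouse.rna.fa > subset.fa`, where wanted_ids.txt lists one name per line (all file names here are placeholders). Any requested name that was never seen is reported on stderr.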

Here's a solution for pygr:

---

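#! /usr/bin/env python
import sys
from pygr.seqdb import BlastDB

if len(sys.argv) != 2:
    sys.exit('usage: %s seqs.fasta' % sys.argv[0])

# open the FASTA file as a sequence database (pygr creates its
# index files alongside the FASTA file on first use)
db = BlastDB(sys.argv[1])

# iterate over the sequence IDs and print "ID<TAB>length"
for name in db:
    seq = db[name]
    print('%s\t%d' % (name, len(seq)))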

---
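For comparison, the same table can be produced without pygr. The sketch below is plain Python, not the pygr or bhutils API: it streams each FASTA file line by line and keeps only a running count for the current record, so memory use stays small even for chromosome-sized entries. It assumes the record name is the first whitespace-separated word of the '>' header line; fasta_lengths is a hypothetical name.

```python
#!/usr/bin/env python
"""Print "name<TAB>length" for every record in one or more FASTA files.

Plain-Python sketch (no pygr/bhutils): sequences are never held in
memory, only a running residue count for the current record.
"""
import sys


def fasta_lengths(lines):
    """Yield (name, length) for each record in an iterable of lines.

    Assumption: the record name is the first whitespace-separated word
    of the '>' header line. Lines before the first header are ignored.
    """
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith('>'):
            if name is not None:
                yield name, length  # finish the previous record
            fields = line[1:].split()
            name = fields[0] if fields else ''
            length = 0
        elif name is not None:
            length += len(line)  # residues on this line; ends already stripped
    if name is not None:
        yield name, length  # last record in the input


if __name__ == '__main__':
    # usage: fasta_lengths.py file1.fa [file2.fa ...]
    for path in sys.argv[1:]:
        with open(path) as handle:
            for name, length in fasta_lengths(handle):
                sys.stdout.write('%s\t%d\n' % (name, length))
```

Run it as, e.g., `python fasta_lengths.py human.rna.fa mouse.rna.fa > lengths.txt` (script and file names are placeholders). Unlike an indexed sequence database, it rescans the whole file on every run, which is fine for a one-off length table but not for repeated random access by name.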
