[bip] testing for clustered-ness on non-random background

Tue Feb 17 18:28:41 PST 2009

On Tue, Feb 17, 2009 at 11:50:22AM -0800, Brent Pedersen wrote:
-> hi, this isn't a python question per se, but it seems like it might be
-> a good place to ask.
-> so i'd like to take a class of genes on a chromosome and see if they
-> are "clustered".
-> is there a good way to do this given that the genes are _already_
-> clustered/non-randomly distributed
-> along the chromosome due to the centromere, local duplications, etc?
-> i've thought of:
-> + encoding a chromosome as binary with 1 if it's a gene of interest
-> and 0 for any other gene
-> and then taking a moving average and finding peaks that fall outside
-> of 95% limits generated
-> by monte-carlo. this has the problem (or perhaps benefit) that it
-> doesn't account for base pair
-> position, just relative gene position.
-> 
-> + using geospatial measures like moran's I or geary's C--though those
-> are generally 2 dimensional,
-> i think they could be modified to handle distribution along the 1d
-> chromosome. then i could take something
-> like the global geary's C for the genome and compare it to the geary's
-> C for the genes in question.
-> 
-> any literature on this?
-> thanks for any pointers.

Hi, Brent,

people seem to have focused on the idea of generating a statistical
background model (aka null hypothesis) and doing comparisons against
that.  I think you'll probably find that the genome is indeed non-random
but that's just a hunch ;)

I can't tell what you're trying to do in detail, but my first instinct
would be to find a way to compute the density for a variety of different
genes in the genome of interest.  If you can show that your particular
class of genes differs from "comparable" classes, then you might have
something interesting.
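
One way to make that comparison concrete is a resampling test. What follows is only a minimal sketch, not a method taken from the literature: it assumes you have the start position (in bp) of every gene on the chromosome, uses the median gap between neighbouring genes of the class as the clustering statistic, and uses random equal-sized sets of real genes as a stand-in for "comparable" classes. Because the random sets are drawn from actual gene positions, the background clustering of genes in general is built into the null. The function names and the choice of statistic are hypothetical.

```python
import random
from statistics import median


def median_gap(positions):
    """Median distance (bp) between neighbouring genes after sorting."""
    pos = sorted(positions)
    return median(b - a for a, b in zip(pos, pos[1:]))


def clustering_p_value(all_positions, class_positions, n_iter=2000, seed=0):
    """Compare a gene class against random gene sets of the same size.

    all_positions   -- list of start positions for every gene on the chromosome
    class_positions -- start positions of the genes of interest (at least 2)
    Returns (observed median gap, one-sided empirical p-value); a small
    p-value means the class is more tightly spaced than random gene sets.
    """
    rng = random.Random(seed)
    observed = median_gap(class_positions)
    k = len(class_positions)
    hits = 0
    for _ in range(n_iter):
        # Draw k real genes without replacement; this keeps the genome's
        # own non-random gene layout as the background.
        sample = rng.sample(all_positions, k)
        if median_gap(sample) <= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_iter + 1)


if __name__ == "__main__":
    # Toy data (made up): 100 evenly spaced genes, 10 adjacent ones of interest.
    genes = list(range(0, 100000, 1000))
    print(clustering_p_value(genes, genes[40:50]))
```

If real comparable classes are available (say, other gene families of similar size), their median gaps could replace the random draws. Swapping the index of each gene in chromosome order for its bp position gives the "relative gene position" variant instead.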

cheers,
--titus
-- 
C. Titus Brown, ctb at msu.edu