[bip] testing for clustered-ness on non-random background

Tue Feb 17 20:04:35 PST 2009

Hi all,

If you're interested in whether the genes that are significantly up- or
down-regulated in a microarray experiment are clustered in the genome,
could you break all the chromosomes into bins based on cM and map each gene
to its respective bin using a midpoint and/or the position of the first
exon (in bp)? I guess you could use a simple chi-square to determine whether
there are bins that contain clusters of significantly differentially
expressed (DE) genes. However, it wouldn't tell you which bins (or genomic
positions) have a large number (or cluster) of significantly DE genes.
I may be way off base, but perhaps you could use a GLM with a Poisson
distribution:
y = B + e
where y = gene number per bin and B would be the bin? Alternatively,
something like what Titus has suggested, where a null distribution is
generated through permutation and compared to the observed distribution?
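
To make the binning idea a bit more concrete, here is a rough sketch that
combines the chi-square statistic with a permutation null. It is only an
illustration under some assumptions: positions are plain numbers in any one
unit (cM or bp), bins are fixed-width, and the function names, bin size and
toy data are all made up for the example. Shuffling the "DE" label among the
genes that actually exist keeps the uneven background gene density in the
null, and the per-bin p-values are one way to ask which bins stand out.

```python
import random
from collections import Counter

def bin_counts(positions, bin_size):
    """Count genes per fixed-width bin (bin index = position // bin_size)."""
    return Counter(pos // bin_size for pos in positions)

def chi_square(observed, background, n_hits, n_genes):
    """Chi-square statistic of observed DE counts against the counts
    expected from the background gene density of each bin."""
    stat = 0.0
    for b, total in background.items():
        expected = total * n_hits / n_genes
        stat += (observed.get(b, 0) - expected) ** 2 / expected
    return stat

def permutation_test(positions, is_hit, bin_size, n_perm=1000, seed=0):
    """Return (global p-value, {bin: one-sided p-value for enrichment})."""
    n_genes, n_hits = len(positions), sum(is_hit)
    if n_hits == 0:
        raise ValueError("need at least one gene of interest")
    rng = random.Random(seed)
    background = bin_counts(positions, bin_size)
    obs = bin_counts([p for p, h in zip(positions, is_hit) if h], bin_size)
    obs_stat = chi_square(obs, background, n_hits, n_genes)

    ge_stat = 0          # permutations with a statistic >= the observed one
    ge_bin = Counter()   # per bin: permutations with a count >= observed
    for _ in range(n_perm):
        # Draw n_hits genes at random from the real gene positions.
        perm = bin_counts(rng.sample(positions, n_hits), bin_size)
        if chi_square(perm, background, n_hits, n_genes) >= obs_stat:
            ge_stat += 1
        for b in background:
            if perm.get(b, 0) >= obs.get(b, 0):
                ge_bin[b] += 1

    global_p = (ge_stat + 1) / (n_perm + 1)
    bin_p = {b: (ge_bin[b] + 1) / (n_perm + 1) for b in background}
    return global_p, bin_p

# Toy data (hypothetical): a gene-dense bin 0 and a gene-poor bin 5.
positions = list(range(0, 1000, 10)) + list(range(5000, 5100, 10))
is_hit = [i < 2 for i in range(100)] + [True] * 10   # 10 of 12 hits in bin 5
global_p, bin_p = permutation_test(positions, is_hit, bin_size=1000)
print(global_p, bin_p)
```

Note that the per-bin p-values are not corrected for testing many bins, so
with a real genome you would want some multiple-testing adjustment on top.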

Sean

On 2/17/09 9:28 PM, "C. Titus Brown" <ctb at msu.edu> wrote:

> On Tue, Feb 17, 2009 at 11:50:22AM -0800, Brent Pedersen wrote:
> -> hi, this isn't a python question per se, but it seems like it might be
> -> a good place to ask.
> -> so i'd like to take a class of genes on a chromosome and see if they
> -> are "clustered".
> -> is there a good way to do this given that the genes are _already_
> -> clustered/non-randomly distributed
> -> along the chromosome due to the centromere, local duplications, etc?
> -> i've thought of:
> -> + encoding a chromosome as binary with 1 if it's a gene of interest
> -> and 0 for any other gene
> -> and then taking a moving average and finding peaks that fall outside
> -> of 95% limits generated
> -> by monte-carlo. this has the problem (or perhaps benefit) that it
> -> doesn't account for base pair
> -> position, just relative gene position.
> -> 
> -> + using geospatial measures like moran's I or geary's C--though those
> -> are generally 2 dimensional,
> -> i think they could be modified to handle distribution along the 1d
> -> chromosome. then i could take something
> -> like the global geary's C for the genome and compare it to the geary's
> -> C for the genes in question.
> -> 
> -> any literature on this?
> -> thanks for any pointers.
> 
> Hi, Brent,
> 
> people seem to have focused on the idea of generating a statistical
> background model (aka null hypothesis) and doing comparisons against
> that.  I think you'll probably find that the genome is indeed non-random
> but that's just a hunch ;)
> 
> I can't tell what you're trying to do in detail, but my first instinct
> would be to find a way to compute the density for a variety of different
> genes in the genome of interest.  If you can show that your particular
> class of genes differs from "comparable" classes, then you might have
> something interesting.
> 
> cheers,
> --titus
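
For what it's worth, Brent's first idea in the quoted message (encode the
chromosome as 1/0 by gene order, take a moving window, and compare against
Monte Carlo limits) could be sketched roughly as below. This is a minimal
sketch, not anyone's actual method: the function names, window width and toy
data are invented for illustration, and the limit used here is the 95th
percentile of the largest window sum across shuffles rather than a pointwise
limit, which is one way to allow for the fact that many windows are scanned.
Like the original suggestion, it uses relative gene order only and ignores
base-pair position.

```python
import random

def window_sums(flags, width):
    """Number of genes of interest in each window of `width` consecutive genes."""
    total = sum(flags[:width])
    sums = [total]
    for i in range(width, len(flags)):
        total += flags[i] - flags[i - width]   # slide the window by one gene
        sums.append(total)
    return sums

def max_window_threshold(flags, width, n_sim=1000, alpha=0.05, seed=0):
    """Monte Carlo (1 - alpha) limit for the largest window sum when the
    0/1 labels are shuffled among the genes."""
    rng = random.Random(seed)
    shuffled = list(flags)
    maxima = []
    for _ in range(n_sim):
        rng.shuffle(shuffled)
        maxima.append(max(window_sums(shuffled, width)))
    maxima.sort()
    idx = min(n_sim - 1, round((1 - alpha) * n_sim))
    return maxima[idx]

# Toy chromosome (hypothetical): 200 genes in order, a run of 8 genes of
# interest at indices 100..107 plus one every 25 genes elsewhere.
flags = [0] * 200
for i in range(0, 200, 25):
    flags[i] = 1
for i in range(100, 108):
    flags[i] = 1

width = 10
observed = window_sums(flags, width)
threshold = max_window_threshold(flags, width)
# Start indices of windows whose count exceeds the Monte Carlo limit.
peaks = [start for start, s in enumerate(observed) if s > threshold]
print(threshold, peaks)
```

Because the shuffle only permutes labels among genes, local gene density
along the chromosome does not enter the null at all here.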