There’s so much genomic data that we hardly know what to do with it all
Among the things that fall out of our genomic evaluation system are breeding values for individual chromosomes for each trait evaluated. For the past few days I’ve been thinking about how to compare the genetic covariance matrices that we can calculate for each chromosome. This might help us identify markers that affect groups of traits by changing the correlations among them, but which do not have large (QTL-type) effects when compared to other markers. The evolutionary genetics guys have pondered some related issues, and I spent a lot of time reading about Flury’s hierarchy and common principal components and that sort of thing.

Today I decided that it was time to write some code so I could do some calculations, and I (of course) whipped up some Python to do the heavy lifting. The first step was to load the data into a dictionary of dictionaries of dictionaries, although I’m starting to think I should spend some quality time with PyTables. Then I wrote a function that uses NumPy to calculate the mean vector and covariance matrix from the EBV on a given chromosome. And it took about two lines of code to loop over that function to get the means and covariance matrices for all 33 chromosomes plus the whole genome. (I know, cattle have only 30 chromosomes, but we also have to consider the pseudoautosomal region and unassigned markers.)

I expect it’ll take a little time on Monday to get all of the trait data into a single structure (we currently process our traits in groups), and then I can get on to things like the actual principal components analyses, matrix correlations, etc. How cool is that?
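
For anyone who wants to try something similar, here is a minimal sketch in Python with NumPy. It is not the code described above. The nested-dictionary layout (chromosome, then animal, then trait), the trait names, the "genome" key for whole-genome EBV, and the random numbers standing in for EBV are all assumptions made for illustration. The sketch computes a mean vector and covariance matrix per chromosome and loops over chromosomes in one comprehension. It then runs two simple comparisons. The first is an eigendecomposition of each covariance matrix, which is the core of a principal components analysis. The second is a matrix correlation between the off-diagonal elements of each chromosome's matrix and the whole-genome matrix. That matrix correlation is only one simple choice; it is not Flury's hierarchy or a common principal components fit.

```python
import numpy as np


def chromosome_moments(ebv_by_animal, traits):
    """Mean vector and trait-by-trait covariance matrix of the EBV on one chromosome.

    ebv_by_animal maps animal ID -> {trait: EBV}; traits fixes the column order.
    """
    # Rows are animals, columns are traits.
    x = np.array([[animal[t] for t in traits] for animal in ebv_by_animal.values()])
    # rowvar=False: each column (trait) is a variable; np.cov divides by n - 1.
    return x.mean(axis=0), np.cov(x, rowvar=False)


def principal_components(cov):
    """Eigenvalues and eigenvectors (as columns) of a covariance matrix, largest first."""
    # eigh is meant for symmetric matrices and returns eigenvalues in ascending order.
    vals, vecs = np.linalg.eigh(cov)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]


def matrix_correlation(a, b):
    """Pearson correlation between the upper-triangle (off-diagonal) elements of two
    covariance matrices: one simple way to ask whether they share a pattern."""
    iu = np.triu_indices_from(a, k=1)
    return np.corrcoef(a[iu], b[iu])[0, 1]


# Simulated stand-in data: random numbers, NOT real EBV. The layout
# ebv[chromosome][animal_id][trait] and the "genome" key are hypothetical.
rng = np.random.default_rng(42)
traits = ["milk", "fat", "protein", "scs"]  # hypothetical trait names
ebv = {
    chrom: {
        f"animal{i}": dict(zip(traits, rng.normal(size=len(traits))))
        for i in range(200)
    }
    for chrom in ["1", "2", "X", "genome"]
}

# The loop over chromosomes: mean vector and covariance matrix for every key.
moments = {chrom: chromosome_moments(animals, traits) for chrom, animals in ebv.items()}

# Compare each chromosome's covariance matrix with the whole-genome matrix.
genome_cov = moments["genome"][1]
for chrom, (mean, cov) in moments.items():
    eigenvalues, _ = principal_components(cov)
    print(chrom, np.round(eigenvalues, 3), round(matrix_correlation(cov, genome_cov), 3))
```

Using `np.linalg.eigh` rather than `np.linalg.eig` is deliberate: covariance matrices are symmetric, and `eigh` returns real eigenvalues for them. With real data, the matrices for small chromosomes or the unassigned-marker group may be close to singular, so tiny eigenvalues (occasionally slightly negative from rounding) are worth watching for.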

