Tag Archive: Entries Tagged with pypedal
PyPedal Article Published in Python Magazine
The March 2009 issue of Python Magazine is on the newsstands. Guess whose article is mentioned on the cover?
PyPedal Release Candidate 7 Highlights Need for Unit Testing
Many thanks to Matthieu Authier for reporting several bugs, which in turn led to more bugs. The hunt started with broken code in examples/new_amatrix.py, which led to bugs in several subroutines in pyp_nrm, which exposed some small side issues. Notably, pydot 1.0.2 is broken and had to be patched by hand, and I had to make lots of changes to draw_pedigree() in the pyp_graphics module to get it working again. I also added a new example program, new_decompose.py, to demonstrate the use of the routines in pyp_nrm for decomposing A such that A = TDT’, as well as the code for directly forming A-inverse with or without inbreeding.
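To make the decomposition concrete, here is a minimal pure-Python sketch. It is not the pyp_nrm code: the function names and the five-animal toy pedigree are invented for illustration. It assumes animals are numbered 1..n with parents before offspring and that 0 marks an unknown parent. It builds A by the tabular method, forms T and D, and multiplies TDT' back together so the result can be compared with A.

```python
def tabular_a(pedigree):
    """Numerator relationship matrix A by the tabular method.

    pedigree[i] is the (sire, dam) pair of animal i + 1; 0 means unknown.
    Parents must appear before their offspring.
    """
    n = len(pedigree)
    a = [[0.0] * n for _ in range(n)]
    for i, (sire, dam) in enumerate(pedigree):
        # Diagonal element is 1 + F_i; F_i is half the parents' relationship.
        a[i][i] = 1.0 + (0.5 * a[sire - 1][dam - 1] if sire and dam else 0.0)
        for j in range(i):
            value = 0.0
            if sire:
                value += 0.5 * a[j][sire - 1]
            if dam:
                value += 0.5 * a[j][dam - 1]
            a[i][j] = a[j][i] = value
    return a


def decompose(pedigree, a):
    """Return (T, d): T traces gene flow, d holds the diagonal of D."""
    n = len(pedigree)
    t = [[0.0] * n for _ in range(n)]
    d = [1.0] * n
    for i, (sire, dam) in enumerate(pedigree):
        t[i][i] = 1.0
        for parent in (sire, dam):
            if parent:
                # Each known parent passes on half of its own row of T ...
                for j in range(i):
                    t[i][j] += 0.5 * t[parent - 1][j]
                # ... and removes a quarter of its diagonal of A from d_i.
                d[i] -= 0.25 * a[parent - 1][parent - 1]
    return t, d


def rebuild(t, d):
    """Multiply T D T' back together."""
    n = len(d)
    return [[sum(t[i][k] * d[k] * t[j][k] for k in range(n))
             for j in range(n)] for i in range(n)]


# Toy pedigree: 1 and 2 are founders, 3 and 4 are full sibs, 5 is their offspring.
toy_pedigree = [(0, 0), (0, 0), (1, 2), (1, 2), (3, 4)]
A = tabular_a(toy_pedigree)
T, D = decompose(toy_pedigree, A)
A_rebuilt = rebuild(T, D)
```

For this pedigree the diagonal of D is 1 for the two founders and 0.5 for the other three animals, and A_rebuilt matches A element by element.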
These errors, in a way, exposed the tip of an iceberg: there is not nearly enough unit testing in PyPedal. Many of the example programs were written a long time ago (about three years) and not checked systematically since. This means that some programs, such as examples/new_amatrix.py, were still using methods that no longer exist (I’m thinking specifically about the info() method of NewAMatrix objects). Heck, that program still imported Numeric! So it’s a great idea to have examples, but it doesn’t help users when they run the program and everything fails. It makes the whole package look shoddy and undependable, and who wants to trust their work to that kind of code? If Matthieu hadn’t e-mailed me about those problems I’d still be shipping broken code. I hate writing unit tests as much as the next guy, but PyPedal’s grown to the point that it’s too big for me to keep track of, so I need a way to automate testing. So, it’s getting to be unit test time. With that and the work I want to do on the graphics module there’s plenty of work in the queue for version 2.0.1.
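As a starting point, a smoke test that simply runs every example program would have caught the stale new_amatrix.py. The sketch below uses only the standard library; the examples directory name and the assumption that each example can be run from its own directory are guesses about the layout, not a description of how PyPedal is actually tested.

```python
import glob
import os
import subprocess
import sys
import unittest


def run_script(path):
    """Run one script in a fresh interpreter and return its exit status.

    The script is started from its own directory so that relative paths
    to data files (for example pedigree files) still resolve.
    """
    directory = os.path.dirname(path) or '.'
    with open(os.devnull, 'w') as devnull:
        return subprocess.call([sys.executable, os.path.basename(path)],
                               cwd=directory, stdout=devnull, stderr=devnull)


class ExampleSmokeTest(unittest.TestCase):
    # Assumed location of the example programs, relative to where the tests run.
    examples_dir = 'examples'

    def test_every_example_exits_cleanly(self):
        failures = []
        pattern = os.path.join(self.examples_dir, '*.py')
        for path in sorted(glob.glob(pattern)):
            if run_script(path) != 0:
                failures.append(path)
        # Listing the failing scripts makes the report immediately useful.
        self.assertEqual(failures, [])
```

Saved in a test module, this can be run with python -m unittest. It only proves that the examples still execute, not that their numbers are right, so it complements rather than replaces tests that compare results against known values.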
PyPedal Release Candidate 6 Fixes Logging
I broke logging a while ago and didn’t notice it. Now it’s fixed. I’ve also modified pyp_metrics/renumber() so that it now cleans up after itself by deleting ID map files when renumbering is complete; if you need to retain the file you may set the cleanmap parameter to False. I also fixed a few other little bugs and added a couple of new parameters to pyp_metrics/effective_founder_genomes(). I also fixed a broken link to the API documentation for pyp_metrics on the PyPedal website.
PyPedal Release Candidate 4 is Easier to Install
I made some changes to __init__.py and setup.py that make PyPedal installable (is that a word?) using the very Pythonic “python setup.py install” method. I also updated the discussion of installation in the manual. However, in order to get this working reliably I had to remove all of the dependencies from setup.py. This means that you have to make sure you’ve installed all of the dependencies yourself because setuptools won’t do it for you. I had to make this change because in my testing half of the dependencies would either bomb during the compile or be unavailable through PyPI, and I think that I’ve got a pretty common setup here. The easy solution for Windows users in a non-commercial setting is to either use the Enthought Python Distribution or download and run the binary installers (.msi files) for Numpy, matplotlib, etc. *buntu users should be able to install everything out of the basic repositories. I’m not sure about Mac OS X users.
If you have any problems installing PyPedal please let me know so that I can fix them before the final release.
This is a Quick Test of the Google Syntax Highlighter for WordPress Plugin
I read about the Google Syntax Highlighter for WordPress in a blog entry by Jacob Gube describing several purportedly-useful WordPress extensions. The plugin sounds like just the thing to pretty-up some entries I’m working on about PyPedal. Here’s a simple example demonstrating how to use the newly-rewritten database tools in Release Candidate 4 (coming soon to a download site near you):
>>> # Load PyPedal
>>> from PyPedal import pyp_newclasses, pyp_nrm, pyp_db, pyp_reports
>>> # Specify program options -- note that we're accessing an SQLite database
>>> options = {}
>>> options['messages'] = 'verbose'
>>> options['pedfile'] = 'hartlandclark.ped'
>>> options['pedname'] = 'Pedigree from van Noordwijck and Scharloo (1981)'
>>> options['pedformat'] = 'asdb'
>>> options['pedigree_is_renumbered'] = 1
>>> options['database_name'] = 'new_db_test'
>>> options['dbtable_name'] = 'test'
>>> options['database_type'] = 'sqlite'
>>> # Now we load the pedigree
>>> example = pyp_newclasses.loadPedigree(options)
>>> # Muck around with the database
>>> pyp_nrm.inbreeding(example)
>>> # Drop the existing table, if there is one
>>> pyp_db.deleteTable(example)
>>> # Check to see if the table is gone
>>> pyp_db.doesTableExist(example)
>>> # Creating the table
>>> pyp_db.createPedigreeTable(example)
>>> # Populating the table
>>> pyp_db.populatePedigreeTable(example)
>>> # Perform some calculations, in this case computing coefficients of inbreeding
>>> mean_inbreeding = pyp_reports.meanMetricBy(example,metric='fa',byvar='by')
>>> print mean_inbreeding
Which should give you the result:
[INFO]: Logfile hartlandclark.log instantiated.
[INFO]: Preprocessing hartlandclark.ped
[INFO]: Opening pedigree file hartlandclark.ped
[INFO]: Creating pedigree metadata object
[INFO]: Instantiating a new PedigreeMetadata() object...
[INFO]: Naming the Pedigree()...
[INFO]: Assigning a filename...
[INFO]: Attaching a pedigree...
[INFO]: Setting the pedcode...
[INFO]: Counting the number of animals in the pedigree...
[INFO]: Counting and finding unique sires...
[INFO]: Counting and finding unique dams...
[INFO]: Setting renumbered flag...
[INFO]: Counting and finding unique generations...
[INFO]: Counting and finding unique birthyears...
[INFO]: Counting and finding unique founders...
[INFO]: Counting and finding unique herds...
[INFO]: Detaching pedigree...
Metadata for Pedigree from van Noordwijck and Scharloo (1981) (hartlandclark.ped)
Records: 15
Unique Sires: 9
Unique Dams: 5
Unique Gens: 1
Unique Years: 8
Unique Founders: 3
Unique Herds: 1
Pedigree Code: asdb
{1920: 0.0, 1960: 0.015625, 1930: 0.0, 1900: 0.0, 1970: 0.14453125, \
1940: 0.015625, 1910: 0.0, 1950: 0.078125}
Pretty snazzy, eh?
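For readers who want to see what a per-birth-year mean amounts to in database terms, here is a minimal sketch using Python's built-in sqlite3 module. It does not use PyPedal or its database layer; the table name, the column names, and the six toy records are invented for illustration and are neither the schema PyPedal writes nor the pedigree shown above.

```python
import sqlite3

# Toy records: (animal ID, birth year, coefficient of inbreeding).
records = [
    (1, 1900, 0.0),
    (2, 1900, 0.0),
    (3, 1910, 0.0),
    (4, 1910, 0.25),
    (5, 1920, 0.125),
    (6, 1920, 0.375),
]

connection = sqlite3.connect(':memory:')
connection.execute(
    'CREATE TABLE toy_pedigree '
    '(animal INTEGER PRIMARY KEY, birth_year INTEGER, coi REAL)')
connection.executemany('INSERT INTO toy_pedigree VALUES (?, ?, ?)', records)

# One row per birth year, holding the average coefficient of inbreeding.
rows = connection.execute(
    'SELECT birth_year, AVG(coi) FROM toy_pedigree '
    'GROUP BY birth_year ORDER BY birth_year').fetchall()
mean_by_year = dict(rows)
connection.close()
```

Here mean_by_year maps each birth year to its mean coefficient, the same shape as the dictionary printed at the end of the session above.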
An Article on PyPedal May Appear in Python Magazine
This morning I received an e-mail from the Technical Editor of Python Magazine asking if I’d be interested in writing an article about PyPedal. I hope to talk with him soon to get more details and see if we can agree on something that might be a good fit for his publication. It’s not a done deal or anything like that, but I’m still flattered to be asked, and excited about the possibility of spreading the word to a wider audience. Maybe this is how Dustin Puryear of BRLUG fame began to grow The Brand.
PyPedal Release Candidate 3 Features an Inbreeding Bugfix and Refactored Database Handling
The third release candidate of PyPedal 2.0 has been released. It includes a minor bugfix to pyp_nrm/inbreeding_vanraden() and a completely-refactored database backend. After a few days of writing, and a false start with SQLAlchemy, I’ve completely rewritten the database backend using ADOdb for Python. PyPedal can now be used with MySQL, Postgres, and SQLite databases. The manual has also been updated to include a more complete discussion of databases in PyPedal. Notably, there is discussion of how to bend PyPedal to your will if you want to load data from, or save them to, your own databases using formats other than ASDx.
I’ve got no quarrel with SQLAlchemy (it looks very cool), but I could never quite get it to do what I wanted. After working with it for a day or so I concluded that it was just overkill for the few simple things I wanted to do, so I jumped ship to ADOdb for Python. It stood me in good stead many times in my PHP days, and it got the job done handily here, too.
I’m sure I’ll think about more that I want to say later. The dusting and polishing that goes along with preparing a release version is doing PyPedal a lot of good.
PyPedal Release Candidates Feature Fewer Bugs!
One supposes that the title of this entry is not particularly informative because that is, in fact, the point of release candidates: to iron out as many lingering bugs as possible. What do you expect from a guy with hair like mine?
Now that I’ve established that my blood sugar’s off we can get down to brass tacks. Release candidates 1 and 2 have been released, and release candidate 3 is in the works. I won’t repeat everything that’s in the CHANGES.txt file, but I’ll hit the high points.
The big news is that I broke pyp_nrm/inbreeding_vanraden() at some point in the recent past. If I had regression tests I would have noticed this sooner. Maybe. Someone really should get on that. The bug affected only animals with both parents unknown or with unknown sires. The fix was very simple. I also fixed a bug that prevented the super-nifty Full-Sib Speedup from working. Wait, what, you’re not familiar with the Full-Sib Speedup? Let me explain.
The Full-Sib Speedup is a trick to avoid unnecessary calculations when you’ve got a pedigree with lots of full-sibs, such as dogs. The trick is that a list of observed sire-dam combinations is stored. The first time that combination is seen the resulting COI is stored in a dictionary. Any subsequent offspring of that combination receive the COI stored in the dictionary. Instead of calculating the COI for each offspring of the combination it is calculated only once. There are a number of steps avoided with this trick, such as pedigree extraction, reordering, and renumbering. And it now works correctly, which is a bonus.
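The idea can be sketched in a few lines of plain Python. This is not PyPedal's implementation: the function names and the toy litter are made up, the COI is computed with a simple recursive kinship function rather than the real routine, and IDs are assumed to be numbered so that parents come before their offspring.

```python
def kinship(pedigree, a, b):
    """Coefficient of kinship between animals a and b.

    pedigree maps animal ID -> (sire, dam), with None for an unknown parent.
    Parents must have smaller IDs than their offspring.
    """
    if a is None or b is None:
        return 0.0
    if a == b:
        sire, dam = pedigree[a]
        return 0.5 * (1.0 + kinship(pedigree, sire, dam))
    if a < b:
        a, b = b, a  # always recurse through the younger animal
    sire, dam = pedigree[a]
    return 0.5 * (kinship(pedigree, sire, b) + kinship(pedigree, dam, b))


def inbreeding_with_fullsib_cache(pedigree):
    """Return ({animal: COI}, number of COI calculations actually performed)."""
    coi_by_mating = {}  # (sire, dam) -> COI shared by all offspring of that mating
    coi = {}
    calculations = 0
    for animal in sorted(pedigree):
        sire, dam = pedigree[animal]
        if sire is None or dam is None:
            # An unknown parent is treated as unrelated to its mate.
            coi[animal] = 0.0
            continue
        mating = (sire, dam)
        if mating not in coi_by_mating:
            # First offspring of this mating: do the real work once.
            coi_by_mating[mating] = kinship(pedigree, sire, dam)
            calculations += 1
        coi[animal] = coi_by_mating[mating]
    return coi, calculations


# Founders 1 and 2 produce full sibs 3 and 4, which produce a litter of three.
litter_pedigree = {
    1: (None, None), 2: (None, None),
    3: (1, 2), 4: (1, 2),
    5: (3, 4), 6: (3, 4), 7: (3, 4),
}
coi, calculations = inbreeding_with_fullsib_cache(litter_pedigree)
```

All three littermates get a COI of 0.25, and only two calculations are performed for the five animals with known parents: one for the mating of 1 and 2, one for the mating of 3 and 4.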
The minor things include such stuff as lots of small fixes to the NewAMatrix class, fixes to inbreeding and relationship metadata, and added pedigree import/export from database and textstreams.
I also added a couple of chapters to the manual, one on input/output and one on performing calculations with pedigrees. The API documentation has been pulled out of the manual and is available on the website. I’m also now using Doxygen to generate those documents, which is working quite well.
Perhaps feature deletions are not as interesting as feature additions, but I’ve removed the GUI from PyPedal. The code that was hanging around was very old (it predated the current pedigree format system and used static format codes embedded in the pedigree file) and didn’t work very well. The fundamental idea behind PyPedal is that you get a set of tools that allow you to program (or metaprogram?) using pedigrees, and the GUI doesn’t fit in that paradigm. I’m also not comfortable with GUI programming and don’t have much interest in pursuing it. If someone wants to work on a GUI for PyPedal I’ll be happy to provide feedback, but I do not intend to pursue it any further on my own.
PyPedal now calculates ancestral inbreeding coefficients
PyPedal can now calculate ancestral inbreeding coefficients using either the recursion equation of Ballou (1997) or the gene dropping method of Suwanlee et al. (2007). Ancestral inbreeding is the probability of an individual inheriting an allele that has undergone inbreeding in the past at least once, and is of interest in conservation and evolutionary genetics relative to purging of deleterious alleles. Results have been validated against small examples provided in each paper, although it looks as though there may be a typo in the coefficient of inbreeding for individual 23 in Figure 1 of Suwanlee et al. (2007); they report f as 0.633 and I get 0.668, but it’s easy for that kind of thing to happen in typesetting.
Although Suwanlee et al. (2007) make a strong case that Ballou’s (1997) equation needs to be modified because it assumes that “regular” inbreeding and ancestral inbreeding are independent, Ballou’s method has the advantage of being much faster to calculate. If the overestimation observed under Ballou’s method is acceptable then performance-for-accuracy may be an acceptable trade-off. Both methods are provided in pyp_metrics so that the user may select whichever best suits their needs.
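To show why the recursion is cheap, here is a minimal sketch of it as it is usually written: an animal's ancestral inbreeding is the average, over its two parents, of f_a(parent) + (1 - f_a(parent)) * f(parent), with unknown parents contributing zero. This is a reading of the published formula, not the pyp_metrics code; the function name, the toy pedigree, and the precomputed inbreeding coefficients are supplied for illustration, and IDs are assumed to be ordered with parents before offspring.

```python
def ancestral_inbreeding_ballou(pedigree, coi):
    """Ancestral inbreeding from a single pass over an ordered pedigree.

    pedigree maps animal ID -> (sire, dam), with None for an unknown parent;
    parents must have smaller IDs than their offspring.
    coi maps animal ID -> ordinary coefficient of inbreeding.
    """
    f_a = {}
    for animal in sorted(pedigree):
        total = 0.0
        for parent in pedigree[animal]:
            if parent is not None:
                # The parent's own ancestral inbreeding, plus the part of its
                # genome that is inbred now and was not already counted.
                total += f_a[parent] + (1.0 - f_a[parent]) * coi[parent]
        f_a[animal] = 0.5 * total
    return f_a


# Two generations of full-sib mating: 3 x 4 -> 5 and 6, then 5 x 6 -> 7.
sib_pedigree = {
    1: (None, None), 2: (None, None),
    3: (1, 2), 4: (1, 2),
    5: (3, 4), 6: (3, 4),
    7: (5, 6),
}
# Ordinary inbreeding coefficients for this toy pedigree, worked out by hand.
sib_coi = {1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0, 5: 0.25, 6: 0.25, 7: 0.375}
f_a = ancestral_inbreeding_ballou(sib_pedigree, sib_coi)
```

Animals 5 and 6 are inbred themselves but have no inbred ancestors, so their ancestral inbreeding is zero; animal 7 is the first with inbred parents and gets 0.25. Each animal needs only its parents' values, which is why this approach is so much faster than gene dropping.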
Several other small changes have been made to PyPedal. The most important of these is that several of the methods in the class pyp_newclasses/PedigreeMetadata now use Python sets, which were introduced in Python 2.4. As a result, PyPedal will not run on Python versions earlier than 2.4.
PyPedal now calculates partial inbreeding coefficients
After sitting on the “coming soon” list for a couple of years PyPedal can finally calculate coefficients of partial inbreeding! As a side effect, the pyp_nrm/reorder() routine now moves all founders to the beginning of the pedigree. In the past, founders were guaranteed to precede their offspring, but they were not guaranteed to precede other animals with known parents. Here’s the renumbered pedigree as rendered by pyp_graphics/new_draw_pedigree():

Here’s the program I used to test the code and draw the pedigree:
from PyPedal import pyp_newclasses, pyp_metrics, pyp_nrm, pyp_io, pyp_graphics

options = {}
# This is the name of the input file.
options['pedfile'] = 'partial.ped'
# This is a descriptor used in some output.
options['pedname'] = 'Pedigree from Fig. 2 of Gulisija and Crow (2007)'
options['messages'] = 'verbose'
options['renumber'] = 1
options['pedformat'] = 'ASD'
options['assign_sexes'] = 1

if __name__ == "__main__":
    test = pyp_newclasses.loadPedigree(options)
    partial_f = pyp_nrm.partial_inbreeding(test)
    print partial_f
    pyp_graphics.new_draw_pedigree(test, gfilename='partial', gtitle='Figure 2 from Gulisija and Crow (2007)', gorient='p', gname=1)
In addition to writing a file, partial.jpg, that contains the pedigree drawing, the program prints the dictionary of partial inbreeding coefficients:
{1: {3: 0.0, 4: 0.0, 5: 0.0, 6: 0.0, 7: 0.0, 8: 0.0, 9: 0.0625}, 2: {3: 0.0, 4: 0.0, 5: 0.0, 6: 0.25, 7: 0.125, 8: 0.125, 9: 0.21875}, 3: {4: 0.0, 5: 0.0, 6: 0.0, 7: 0.0625, 8: 0.0625, 9: 0.09375}}
The take-home is that animal I has an inbreeding coefficient of 0.375 (data not shown) and coefficients of partial inbreeding to founders J, K, and M of 0.21875, 0.09375, and 0.0625, respectively. As expected, 0.21875 + 0.09375 + 0.0625 = 0.375. QED, right?
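The same check can be done directly on the printed dictionary. The short sketch below copies that dictionary and assumes that animal I is ID 9 after renumbering, which the matching values suggest; the variable names are only illustrative.

```python
# Dictionary printed by the program: {founder ID: {animal ID: partial inbreeding}}.
partial_f = {
    1: {3: 0.0, 4: 0.0, 5: 0.0, 6: 0.0, 7: 0.0, 8: 0.0, 9: 0.0625},
    2: {3: 0.0, 4: 0.0, 5: 0.0, 6: 0.25, 7: 0.125, 8: 0.125, 9: 0.21875},
    3: {4: 0.0, 5: 0.0, 6: 0.0, 7: 0.0625, 8: 0.0625, 9: 0.09375},
}

animal = 9  # assumed to be animal I after renumbering
# Summing an animal's partial coefficients over all founders gives its total inbreeding.
total_inbreeding = sum(by_animal.get(animal, 0.0) for by_animal in partial_f.values())
```

For animal 9 the sum is 0.375, in agreement with the inbreeding coefficient quoted above.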
I’m also close to having the code done to support GEDCOM 5.5 files, a format widely used by the human genealogy community, although support will be provided for only a small subset of the formal specification. It’s going to feel a little hackish to use due to some complicated issues with the way PyPedal converts string IDs to integer IDs, but it’ll work.
