Monday, 6 February 2012

The fun of Statistics or "Lies, damned lies, and statistics"

There is a saying (in English) that there are "Lies, damned lies, and statistics". Over the years this saying has been attributed to many people, the best known of whom are Disraeli (a British politician of the 19th century) and Mark Twain. However, because Google is our friend, I went to check the origin of the quote and found that it pre-dates both of these gentlemen (see: history of the "lies" quote).

The point of the quote is that statistics can be manipulated to support or demolish a particular argument. They can be reported incorrectly by the media for dramatic impact and wilfully misinterpreted for personal or political gain, and they generally tend to put people off wanting to know more about a subject.

However, statistics are also a useful tool in epidemiology. For those who do not know, epidemiology is the study of disease patterns within and between populations over time.
The investigation into the cause of Hoof Wall Separation Syndrome (HWSS) was accelerated by using statistics. It is statistically unlikely that 26 apparently unrelated ponies, living completely different lifestyles in different countries on different continents, would all spontaneously develop the same debilitating hoof condition; there had to be a common denominator.

Back in 2005 there were very few publicly accessible online breeding databases. Since then, of course, sites such as All Breed Pedigree and Sukuposti have grown, and these sites well pre-date the official database of the CPBS. Consequently, back then, anyone who wanted a comprehensive picture of the 'state of the nation' of even the local Connemara Pony gene pool had to build a database from scratch by manually entering information from the various printed stud books and registers into a suitable computer programme.
Papers that arose from building this privately assembled database can now be seen online:

Connemara Pony Bloodlines in New Zealand
An Analysis of the Australian Connemara Pony Population

Both of these papers were written many months before this particular database was used to determine whether there was a common link between the 26 apparently unrelated Connemara ponies with clinical HWSS. The information contained in these papers has no connection whatsoever with the HWSS research. The papers are included because they demonstrate how helpful statistics can be in making breeding decisions. They are also of interest to those people who are (rightly) concerned about increasing genetic bottlenecks, decreasing genetic diversity and the negative effects of 'Founder's Syndrome' in the Connemara Pony, both in their own country and worldwide. It would be great if other countries emulated this work and carried out a similar analysis of their own pony populations.
To do so, one needs an extensive database that includes all of the breeding: for example, the full parentage of the OUTCROSS stallions, not just the Connemara pony side of the pedigree.

What these papers AND the pedigree work undertaken for the HWSS research do prove is that studying standard four-generation pedigrees is not a valid form of analysis in a small, closed genetic population such as the Connemara Pony WORLDWIDE, because a shared ancestor can sit further back than the chart extends. Indeed, even a seven-generation family tree does not disclose all the possible permutations which, through one common ancestor, have resulted in HWSS.
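
To illustrate the point about pedigree depth: each generation back doubles the number of ancestor positions (2 parents, 4 grandparents, and so on), so a four-generation chart has 30 ancestor positions and a seven-generation chart 254, yet a shared ancestor can still lie beyond the last column. The short Python sketch that follows is a hypothetical illustration only, not the research group's database or software. It assumes each animal is stored as a key mapping to its (sire, dam), uses invented pony names, and reports, for every ancestor shared by a group of animals, how many generations you would need to chart to see the link.

```python
from collections import deque

# Hypothetical pedigree data: each animal maps to (sire, dam).
# None marks an unknown parent. All names are invented for illustration.
PEDIGREE = {
    "Pony A": ("Sire A1", "Dam A1"),
    "Sire A1": ("Sire A2", None),
    "Sire A2": ("Sire A3", None),
    "Sire A3": ("Sire A4", None),
    "Sire A4": ("Sire A5", None),
    "Sire A5": ("Sire A6", None),
    "Sire A6": ("Sire A7", None),
    "Sire A7": ("Founder", None),
    "Pony B": ("Sire B1", None),
    "Sire B1": ("Sire B2", None),
    "Sire B2": ("Sire B3", None),
    "Sire B3": ("Sire B4", None),
    "Sire B4": ("Founder", None),
}


def ancestor_slots(generations):
    """Number of ancestor positions in a pedigree chart of the given depth."""
    return sum(2 ** g for g in range(1, generations + 1))


def ancestor_depths(animal, pedigree):
    """Map each ancestor of `animal` to the nearest generation it appears in."""
    depths = {}
    queue = deque([(animal, 0)])
    # Breadth-first search, so the first sighting of an ancestor is the nearest.
    while queue:
        name, gen = queue.popleft()
        for parent in pedigree.get(name, (None, None)):
            if parent is not None and parent not in depths:
                depths[parent] = gen + 1
                queue.append((parent, gen + 1))
    return depths


def common_ancestors(animals, pedigree):
    """For each ancestor shared by all animals, the chart depth needed to see it."""
    maps = [ancestor_depths(a, pedigree) for a in animals]
    shared = set(maps[0]).intersection(*maps[1:])
    # The link is only visible once every animal's chart reaches that ancestor.
    return {anc: max(m[anc] for m in maps) for anc in shared}


print(ancestor_slots(4), ancestor_slots(7))
for ancestor, depth in common_ancestors(["Pony A", "Pony B"], PEDIGREE).items():
    print(f"{ancestor}: generation {depth}; "
          f"in a 4-generation chart: {depth <= 4}; "
          f"in a 7-generation chart: {depth <= 7}")
```

In this invented example the shared "Founder" sits five generations behind Pony B but eight behind Pony A, so neither a four- nor a seven-generation chart would reveal the link, whereas a search over the whole database does.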

The private database being used by the research group now has well in excess of 20,000 individual entries, which originate with Number One in the Irish Stud Book, Cannonball. Work has begun to code each entry for HWSS status (where known). Ultimately the aim is to convert the database into a statistical analysis programme which will then be able to predict the proportion of HWSS carriers within discrete populations. This exercise is of academic interest only, because a screening test will be available well before the data entry is complete. It will nevertheless be interesting to see whether the statistical projections differ significantly from the reality revealed by the screening.

Of course, the screening test will not become available until the second phase of the research is complete, and the second phase cannot begin until the fundraising has reached its target. So here is another plea for money. The funds still required are well within reach; we are so close to achieving the goal, but not quite there yet.