
Analysing pedigrees

  • 28-08-2016 5:53pm
    #1
    Closed Accounts Posts: 4,744 ✭✭✭


    My interest is thoroughbred flat pedigrees. I don’t own a horse or work in the business. Occasionally I have a bet.
    It would be nice to know before a horse is born or before it reaches the racecourse how good it might be. Crazy stuff?
    But people who breed horses are guessing. They breed to what is or may be selling/popular or what they think will perform on the racecourse.

    My pedigree database has almost 400,000 horses input by me over more than twenty years. Other databases contain horse performance data. Is it possible to compare horse pedigrees with performance data and learn? About a month ago I finished plugging holes in my data. The next move was writing programs to compare pedigree with performance. I’ve done a bit of that in the past. I can analyse a pedigree by sight (I think) by looking at six generations and up to twelve generations. Six generations is 126 ancestors while twelve is 8190 (2,4,8,16,32,64 … ).
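The ancestor counts quoted above (126 for six generations, 8190 for twelve) follow from each generation doubling the one before. A minimal sketch of that arithmetic; the function name is just an illustration, not anything from the original program:

```python
def ancestors_in_generations(generations: int) -> int:
    """Total ancestors across the first `generations` generations: 2 + 4 + ... + 2**n."""
    return sum(2 ** g for g in range(1, generations + 1))


print(ancestors_in_generations(6))   # 126
print(ancestors_in_generations(12))  # 8190
```

The same total can be written as 2**(n+1) - 2, which is handy for a quick check by hand.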

    There was a thread on the Betfair forum last week looking for people who might be interested in a syndicate to buy a yearling at the Tattersalls September Sales. I said I would join, others did too, but it was the usual result. No action. While waiting for people to join the syndicate I wrote a computer program to analyse the 533 Tattersalls yearling lots. It took eight days to write the program and knock out the bugs. The program compares horses in the pedigree with each other.
    Just for comparison a few extras were added to the analysis pot: 533 Tattersalls yearlings; 190 Ireland and England 2015 Group 1, 2, 3 winners, a few great horses.
    A second analysis run looked at about 200 offspring of one sire (I had ratings for these).
    The results are interesting. Some refining of the output is needed. More horses analysed will produce more output and I hope better insight.

    I did not anticipate one of the problems. It takes almost one minute to analyse one horse. That might seem quick. I want to analyse 200,000 horses (200,000 minutes = 3333 hours = 139 days). Starting now the results file will be ready on 13th January assuming my PC could keep running 24/7 until then.
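The run-time estimate above is simple arithmetic and can be redone for any per-horse timing. A rough sketch, assuming the horses are processed one after another at a constant speed (the function name and parameters are illustrative only):

```python
def estimated_runtime(horses: int, seconds_per_horse: float) -> tuple[float, float]:
    """Return (hours, days) needed to analyse `horses` records sequentially."""
    total_seconds = horses * seconds_per_horse
    hours = total_seconds / 3600
    return hours, hours / 24


hours, days = estimated_runtime(200_000, 60)
print(f"{hours:.0f} hours, {days:.0f} days")  # 3333 hours, 139 days
```

Plugging in a faster per-horse time shows directly how much any speed-up would shorten the full run.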
    This afternoon I made my first attempt to speed things up. I’m reducing the size of the data files by reducing the size of each horse entry [name + date of birth]. Angrywhitepyjamas (2013) by Manduro (2002) out of Ornellaia (2000) becomes AVEZ LXXP OKSM. This reduces the data to below 1/7 of the original size. I’m hoping this speeds processing. Sprinkling magic powder on the code [AVEZ] will restore that name in the results file.

    I know horses may not run to their pedigree. There is a lot of randomness. Full siblings can be quite different. That is why I will use a very large data sample to try to tune out the randomness.
    Benefits might be: select the best in a race field / annual crop; pick the best lots in a sale; choose the best stallion for a broodmare; test if a new sire will suit his broodmare population.

    What do you think? Any ideas, suggestions, insights?
    (answers should not include the words “box of frogs”.)



Comments

  • Registered Users, Registered Users 2 Posts: 2,702 ✭✭✭tryfix


    You're operating at a rarefied level there, Diomed. I don't pretend to understand your process, but maybe you should do this in stages: work on the last few years of runners and their pedigrees first, then add in the older pedigrees and refine your results over time. There's only so much use in going back over older pedigrees, and it's quite possible that they'd mislead you, as racing patterns and training methods have evolved quite a bit.


  • Closed Accounts Posts: 4,744 ✭✭✭diomed


    The application of numbers to performance is new, perhaps from the 1960s onwards. I’m analysing current horses using their ancestors.
    Obviously if I analyse a horse born in 1990 its 5th generation ancestors could be born in around 1930 (5 x 12 years = 60 years. 1990-60 years = 1930).

    Some people say only the sire and dam matter. If that was true should brothers be exactly alike, and sisters exactly alike i.e. identical twins?
    It might be best to breed to the ancestors, not just the parents.

    The idea is to take each ancestor in a pedigree and compare it with every other ancestor.
    Extract those matches that have something of interest, count them and give them a value.
    List the counts and values in an output file.
    Then load the next horse for analysis.
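The compare-everything-with-everything step above can be sketched roughly as follows. This is only an illustration: the rule for what counts as a match "of interest" is not given in the thread, so `same_horse` below is a made-up stand-in, and the four-letter codes are placeholder data.

```python
from itertools import combinations


def count_matches(ancestors, is_interesting):
    """Compare every ancestor with every other one; count the pairs flagged as interesting."""
    return sum(1 for a, b in combinations(ancestors, 2) if is_interesting(a, b))


def pair_count(n: int) -> int:
    """Number of distinct pairs among n ancestors."""
    return n * (n - 1) // 2


# Hypothetical rule: a pair is interesting when the same horse appears twice in the pedigree.
def same_horse(a, b):
    return a == b


pedigree = ["AAAB", "AAAC", "AAAD", "AAAB", "AAAE", "AAAD"]  # placeholder codes
print(count_matches(pedigree, same_horse))  # 2
print(pair_count(126))    # 7875 pairs in a six-generation pedigree
print(pair_count(8190))   # 33533955 pairs in a twelve-generation pedigree
```

If every pair really is checked, the pair count grows with the square of the pedigree size, which may be part of why a single horse takes so long.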

    Later I go to the output file and split each horse a few ways: number and value of matches; by male/female; by generation; sire side/dam side.
    More matches might be best, but some results suggest balance between matches is better i.e. an excess of one type is a negative.
    I can’t be too firm with opinions yet as I need to analyse in volume first.

    Using code for names is an effort to increase processing speed. Like taking speed off a horse’s back I doubt there will be a straight line benefit.
    The first horse in the database is AAAA, then AAAB, AAAC and so on.
    A set of 26x26x26x26 = 456,976 four-letter codes is used as substitutes for real names (more than enough to cover my 390,000-horse pedigree database).
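One way to generate codes in the order AAAA, AAAB, AAAC is to treat the record number as a base-26 number. A minimal sketch, assuming the first horse has index 0; the function names and the lookup table are illustrative, not the actual FoxPro routine:

```python
import string

LETTERS = string.ascii_uppercase


def index_to_code(index: int, width: int = 4) -> str:
    """Turn a record number into a fixed-width letter code (0 -> AAAA, 1 -> AAAB)."""
    if not 0 <= index < 26 ** width:
        raise ValueError("index does not fit in the code width")
    chars = []
    for _ in range(width):
        index, remainder = divmod(index, 26)
        chars.append(LETTERS[remainder])
    return "".join(reversed(chars))


def code_to_index(code: str) -> int:
    """Reverse of index_to_code."""
    index = 0
    for ch in code:
        index = index * 26 + LETTERS.index(ch)
    return index


# Hypothetical lookup table used to restore names in the results file.
names = ["First Horse (1990)", "Second Horse (1991)", "Third Horse (1992)"]
code_to_name = {index_to_code(i): name for i, name in enumerate(names)}

print([index_to_code(i) for i in range(3)])  # ['AAAA', 'AAAB', 'AAAC']
print(26 ** 4)                               # 456976
print(code_to_name["AAAB"])                  # Second Horse (1991)
```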

    One comforting thing is one of the very best horses ever is among the top rated, and one of the current top horses is also in the top few.

    One collection of filters reduces the 894 analysed horses to only 27 horses.

    The Tattersalls September sales yearlings (533) reduce to 19.
    The others (361) reduce to 8 as follows:
    2015 Group winners Ireland and England (188) reduce to 6 (includes one smasher).
    Famous horses (10) reduce to 2 (actually 1 of the 10 was a famous dud)
    Sons of one sire (163) reduce to 0.

    This "collection of filters" was done on the fly this morning and was just a guess. More output and thought will refine the choices.


  • Posts: 0 [Deleted User]


    diomed wrote: »
    Some people say only the sire and dam matter. If that was true should brothers be exactly alike, and sisters exactly alike i.e. identical twins?
    It might be best to breed to the ancestors, not just the parents.

    Sounds like an interesting project. If way over my head.

    Surely however the sire and the dam ARE all that matters for any given horse? That's where all the genetic material comes from. Of course ancestries can make one sire and one dam a particularly good match but that's a different matter.

    Your argument about full siblings being identical doesn't make any sense - if they share a sire and dam then they are going to share ancestry all the way back?

    The fact that they are not identical is just biology, in each case the genetic material is mixed differently, that is why breeding isn't an exact science.

    Good luck anyway! I guess the next step is to evaluate a crop of yearlings and make some sort of performance prediction and see if it works?


  • Banned (with Prison Access) Posts: 1,279 ✭✭✭kidneyfan


    I don't think that you should necessarily reduce the data files. Have you looked at indexing the database? 400,000 records isn't many.
    One thing I would suggest might be to apply the standard dosage method to your dataset and contrast your own method with that. etc


  • Registered Users, Registered Users 2 Posts: 2,702 ✭✭✭tryfix


    On the question of full brothers being exactly alike, that's the supposedly logical outcome of any attempt to define racehorse potential purely on pedigree analysis. Of course the actual reality of the abilities displayed by full brothers is something else, and in that conundrum lie some useful pointers.

    From my own observations I can see a roughly 2 furlong spread either side between the trip that a horse should be optimum at and what it actually is optimum at in the highest class racing. So Frankel the absolute freak that he was could pull the Galileo x Kind cross down to 8f optimum while his brother Noble Mission was optimum at 10f and could manage 12f just a little less well.

    That natural individual variation from what should be for the optimum trip based on pedigree would explain the differences between full brothers.

    It's great that you're doing your own dosage index because the official one is quite poorly thought out. The international differences between the differing speed and stamina tests that the US, Australia and Europe provide make it next to impossible to accurately produce a dosage figure for how much stamina is in a line which hasn't been tested for stamina at the top level for a few generations.

    I'd be interested to know what your dosage index is predicting as the staying distance for some of the Frankels that are hitting the track, because the official DI score of many of these horses is 12f+ territory but we're not seeing any great staying-on performances from his stock so far.


  • Closed Accounts Posts: 4,744 ✭✭✭diomed


    "Surely however the sire and the dam ARE all that matters for any given horse?"
    Foal A has two parents rated 110 and four grandparents rated 90.
    Foal B has two parents rated 110 and four grandparents rated 130.
    Which foal would you pick, assuming both foals look the same?

    I much prefer a foal with good ancestors to a foal with good parents.
    Good parents are an indicator that the foal may be good, but there are many examples of top horses that were no good at stud: Reference Point; Brigadier Gerard.
    A good horse is imo a product of a sire and dam that have ancestors that produce a pleasing result.
    Someone asked me on boards if a stallion (he named the stallion) was a good match for his mare.
    This stallion was a very good runner (rated 130+).
    When I checked his full crop (almost 200 rated horses) I found that the majority of his foals were rated significantly lower than the dams that produced them.
    The reason was obvious to me when I looked at the pedigree of each foal in turn. His ancestors were incompatible with the broodmare population he was serving.

    "Your argument about full siblings being identical doesn't make any sense - if they share a sire and dam then they are going to share ancestry all the way back?"
    They aren’t identical. I said “should they be alike?” They are similar but different. They share some genetic material. If you look at human brothers they are different.
    Identical pedigrees do not make identical horses. However, closely related horses in a pedigree increase the chance that their superior (hopefully) traits are passed forward.
    Frankel's stud fee is UK£125k. His full brother Noble Mission's fee is $25k (£19.1k). That is a big gamble that Frankel passes on the racing ability.

    "Good luck anyway! I guess the next step is to evaluate a crop of yearlings and make some sort of performance prediction and see if it works?"
    I have plenty of data to check the past.
    It is easier to check twenty years of past data over the winter than to wait for next year's runners to complete their season.
    If I can find reasonable cause and effect then I’ll look forward.

    Forgive me for multiquoting (one of my pet hates on boards.ie)


  • Closed Accounts Posts: 4,744 ✭✭✭diomed


    kidneyfan wrote: »
    I don't think that you should necessarily reduce the data files. Have you looked at indexing the database? 400,000 records isn't many.
    One thing I would suggest might be to apply the standard dosage method to your dataset and contrast your own method with that. etc
    I use indexing extensively. What "slows" the program is the amount of comparison (I'm not saying what with what).
    It does the comparisons for 894 horses, stepping down through the load file one at a time.

    Do you mean Dosage Index? I wrote a program for that in 1993. It is not worth considering. See elsewhere on boards for my comments.


  • Closed Accounts Posts: 4,744 ✭✭✭diomed


    tryfix wrote: »
    I'd be interested to know what your dosage index is predicting as the staying distance for some of the Frankel's that are hitting the track, because the official DI score of many of these horses is 12f+ territory but we're not seeing any great staying on performances from his stock so far.
    My attempt is guessing the quality of a horse, not the likely staying distance.

    For stamina estimation you should check the Equinome C.C, C.T, T.T classification system.
    You can only be sure of a foal's stamina (or lack of it) if both his parents are C.C (the foal will be a C.C), or if both the parents are T.T (the foal will be a T.T), or if one parent is a C.C and the other parent is a T.T (the foal is a C.T).
    If one parent is a C.C and the other a C.T then the foal can be a C.C or a C.T.
    If one parent is a C.T and the other a C.T then the foal can be a C.C, a C.T, or a T.T.
    C.C = sprinter, T.T = stayer
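The rules above are ordinary single-gene inheritance: each parent passes on one of its two letters. A small sketch that lists the possible foal types for any pairing, writing the types without the dots ("CC", "CT", "TT"). The odds function assumes each parent is equally likely to pass on either letter, which is the textbook assumption and not something stated in the thread:

```python
from collections import Counter
from itertools import product


def possible_foal_types(sire: str, dam: str) -> set[str]:
    """Each parent contributes one letter; return every type the foal could be."""
    return {"".join(sorted(pair)) for pair in product(sire, dam)}


def foal_type_odds(sire: str, dam: str) -> dict[str, float]:
    """Chance of each foal type, assuming either letter is passed on with equal probability."""
    counts = Counter("".join(sorted(pair)) for pair in product(sire, dam))
    return {foal_type: n / 4 for foal_type, n in counts.items()}


print(sorted(possible_foal_types("CC", "TT")))  # ['CT']
print(sorted(possible_foal_types("CC", "CT")))  # ['CC', 'CT']
print(sorted(possible_foal_types("CT", "CT")))  # ['CC', 'CT', 'TT']
print(foal_type_odds("CT", "CT"))
```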

    From reading past research papers by Dr Emmeline Hill it seems that all C.C horses are descended from one mare about 300 years ago (the Crab mare?), while there were eleven (?) variations of T.T.
    The paper said that made sense when you remember that races were four mile heats between two horses, and the two often ran a few four mile heats on the same day. Staying was the game back in the day.


  • Closed Accounts Posts: 4,744 ✭✭✭diomed


    There wasn't much improvement from reducing the data field size, about 10% to 53 seconds a record.
    I'm not even sure there was an improvement as I didn't record the timings from earlier.
    I might go for a cut down program that looks at less of the pedigree, identify likely candidates, then run the full version on those.

    The other option is to do what they do on the cycling forum. If someone gets a puncture people recommend a new bike. Perhaps a new PC?


  • Banned (with Prison Access) Posts: 1,279 ✭✭✭kidneyfan


    diomed wrote: »
    There wasn't much improvement from reducing the data field size, about 10% to 53 seconds a record.
    I'm not even sure there was an improvement as I didn't record the timings from earlier.
    I might go for a cut down program that looks at less of the pedigree, identify likely candidates, then run the full version on those.

    The other option is to do what they do on the cycling forum. If someone gets a puncture people recommend a new bike. Perhaps a new PC?
    Perhaps a new database? What are you using currently: Access, FileMaker, FoxPro? If any of the above, think about changing.


  • Registered Users, Registered Users 2 Posts: 261 ✭✭show me the money.1


    Amazing dedication I call it


  • Closed Accounts Posts: 4,744 ✭✭✭diomed


    kidneyfan wrote: »
    Perhaps a new database? What are you using currently: Access, FileMaker, FoxPro? If any of the above, think about changing.

    I'm not enthusiastic about the Microsoft model, a yearly charge for operating system and applications, or going that way.

    I use Microsoft FoxPro, and support for that stopped in 2015.
    The commercial pedigree software package I use to input pedigrees uses .dbf files. Their pedigree software moved to Access files about ten years back. At that time I was using Access in work and found it a bit weak so didn't move. The problem with (their) pedigree software was poor data, and they like to update your data with their data. I have 800 books - stud books, pedigree analysis, form books, so can look up things.

    I fix problems in my data by running small programs in FoxPro, e.g. a female as a sire, an incomplete pedigree (e.g. fewer than 31 horses in the first 4 generations, counting the horse itself: 1+2+4+8+16), missing Bruce Lowe family numbers, duplicate entries (two slightly different names, with same sire and dam).
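Two of those checks (a female recorded as a sire, and an incomplete four-generation pedigree) might look something like this outside FoxPro. It is only a sketch: the record layout, field names and sample horses are invented for illustration.

```python
def find_problems(horses):
    """horses maps name -> {"sex": "m"/"g"/"f", "sire": name or None, "dam": name or None}."""
    problems = []
    for name, record in horses.items():
        sire = horses.get(record.get("sire"))
        if sire is not None and sire["sex"] == "f":
            problems.append((name, "sire is recorded as female"))
        dam = horses.get(record.get("dam"))
        if dam is not None and dam["sex"] != "f":
            problems.append((name, "dam is not recorded as female"))
    return problems


def pedigree_size(horses, name, generations=4):
    """Count the horse plus its known ancestors up to `generations` back (31 when complete)."""
    if name is None or name not in horses:
        return 0
    if generations == 0:
        return 1
    record = horses[name]
    return (1
            + pedigree_size(horses, record.get("sire"), generations - 1)
            + pedigree_size(horses, record.get("dam"), generations - 1))


# Invented sample data: "Dad" has been entered with the wrong sex.
horses_example = {
    "Foal": {"sex": "m", "sire": "Dad", "dam": "Mum"},
    "Dad": {"sex": "f", "sire": None, "dam": None},
    "Mum": {"sex": "f", "sire": None, "dam": None},
}
print(find_problems(horses_example))
print(pedigree_size(horses_example, "Foal") < 31)  # True, so the pedigree is incomplete
```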

    My PC is about seven years old, a gaming type PC. Last night I tested it on a website that ranks PCs. It was in about the 45th percentile for CPU which was ok, but only the 6th for graphics. Graphics are not important. Perhaps I need something with a faster CPU, caches, faster transfers.

    A bit of an update.
    Overnight I let the PC work on about 250 pedigrees for horses born in 1999 that had ratings in my files.
    Yesterday I did a lot of work on the results file, setting up a name field (the horse) and 28 results fields for that horse, and automating.
    The results file groups the numbers in a few ways: pedigree top half/bottom half; male/female; which generation; and fourteen fields of how complex the link between horses.

    Earlier I thought the numbers looked good.
    When I later looked at the numbers of the 250 sample I was disappointed. I used Pearson in Excel for a quick test on the many output fields.
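For anyone wanting to repeat the quick test outside Excel, the Pearson correlation is short to write by hand. A sketch only; the scores and ratings below are made-up numbers, not results from the analysis:

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length with at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return covariance / sqrt(var_x * var_y)


# Made-up example: pedigree scores against racecourse ratings.
scores = [12, 15, 9, 20, 17]
ratings = [85, 92, 70, 110, 95]
print(round(pearson(scores, ratings), 2))
```

A value near +1 means higher scores go with higher ratings, near 0 means no linear relationship, and near -1 means the opposite direction. Running it separately on fillies, colts and geldings would give the split described in the post.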

    Overall it was a bit positive - the higher my number the higher the rating. This morning I looked at it again and decided to split the 250 horses into fillies, colts, geldings.
    (The categorisation of males in my database colts/geldings can be a bit off. A colt can become a gelding.)

    Anyway the 78 fillies came out with a much stronger positive relationship between my numbers and their ratings, the colts mildly positive, and the geldings slightly negative.
    This might be due to the small sample size. It might be that only promising females run, less promising females have a breeding career, and poor male runners are gelded.

    There are other possibilities:
    fillies showing poor ability are culled before they reach the sales/racecourse
    males with both poor and good pedigrees are sent to the sales as they are more saleable (59% of the 1999 dob sample were male (colts and geldings))

    I need to think about the careers of horses:
    This one year example gives an idea of the data.
    The file has 7826 horses (m,g,f) with a 1999 date of birth (dob)
    4640 females (2327 have no rating, presumably used only for breeding)
    1108 males (183 have no rating)
    2078 geldings (1 has no rating, is he an input error?, he has no dam, and is not in the books)

    Careers: females ran 2313 (29.6%); females no rating 2327 (29.7%); males ran 925 (11.8%); males no rating 183 (2.3%); geldings ran 2077 (26.5%); geldings no rating 1 (0.0%) = 7826 (100.0%)
    It might be an idea to compare the racing females with the breeding females.

    A slightly worrying result in the small sample size was the further back in the pedigree the better the relationship between my numbers and the ratings. I hope a bigger test will disprove this. I don't want to analyse more of the pedigree of each horse. It takes 53 seconds each horse now. Going back another generation doubles the data, and doubles the time.

    I have a file of ideas and I'll be working on those too.


  • Registered Users, Registered Users 2 Posts: 9,339 ✭✭✭convert


    You've really undertaken a huge project there, diomed. The amount of detail acquired and entered is just frightening, yet the details must be fascinating.

    Have you had a look at Dr Emmeline Hill's research on equine genomics? It's focusing more on the gene side of things, but it would be interesting to see if just looking at your info can provide similar info.

    http://www.the-scientist.com/?articles.view/articleNo/31893/title/Emmeline-Hill--Genes-for-Speed/

    http://www.ucd.ie/agfood/staff/animalcropscience/academic/dremmelinehill/


  • Registered Users, Registered Users 2 Posts: 16,955 ✭✭✭✭Francie Barrett


    About 3/4 of the horses in the Tattersalls catalogue would likely be out of my league. I think the below two might potentially be interesting at a reasonable price.

    http://www.tattersalls.com/cat/october/2016/283.pdf
    http://www.tattersalls.com/cat/october/2016/279.pdf

    Diomed - can you tell me what your computer says about the Intello progeny? It is potentially an interesting sire.


  • Banned (with Prison Access) Posts: 1,279 ✭✭✭kidneyfan


    Hi Diomed,

    400,000 records is significant but not huge. I would seriously consider a database like MySQL or PostgreSQL. There is a free edition of Microsoft SQL Server available.

    It might be worth your while to look at an application like R.

    I would suggest that you post this question to the software development forum or database design forum (if there is one).

    How much RAM is on your PC?



  • Closed Accounts Posts: 4,744 ✭✭✭diomed


    "Have you had a look at Dr Emmeline Hill's research on equine genomics?"
    I am familiar with her work and the work of Prof Patrick Cunningham.
    Years ago Prof Cunningham analysed all horses in a few editions of the General Stud Book comparing their ratings with their pedigrees and found 1/3 of performance could be explained by pedigree.
    I read in the Bloodstock Breeders Review annuals (30 years ago?) about the Irish Thoroughbred Breeders Association (?) funding the research.

    Her work on the speed gene and the quality test is very useful to owners for racing now, and probably of great benefit in picking mares to use for breeding.
    That research will produce more discoveries in time.
    I wonder if the lots on offer at horse auctions in the future will be the genetically proven duds.

    I don't have a blood sample for each of 200,000 horses, living and dead, or the €500 fee each to get them tested.
    My results (if I get any results) will be guesswork. Her results are facts.

    "About 3/4 of the horses in the Tattersalls catalogue would likely be out of my league. I think the below two might potentially be interesting at a reasonable price.
    Can you tell me what your computer says about the Intello progeny? It is potentially an interesting sire."
    For the past week or so I've been gathering more data on about 180k horses so I won't be stopping this work.
    I never have an opinion on one sire, only an opinion on how his pedigree works with the mare population to which he is bred.
    To expand on that, I wouldn't be a fan of the Frankel stud fee when you can get the same ancestors with his full brother Noble Mission.
    I've started a re-write of the analysis I did a few weeks back. I'll use a 12 generation pedigree instead of a 6 generation. That looks like slower work (12 v 6) but I'll do less checking.

    "400,000 records is significant but not huge. I would seriously consider a database like MySQL or PostgreSQL. There is a free edition of Microsoft SQL Server available.
    It might be worth your while to look at an application like R.
    I would suggest that you post this question to the software development forum or database design forum (if there is one).
    How much RAM is on your PC?"
    I'll speed things up by re-writing to reduce the checking, and by re-using some of the work,
    e.g. for the 1,650 Sadler's Wells offspring in my database I'll just generate the SW pedigree once and the dam pedigrees of the 1,650 foals instead of each pedigree fully each time.
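Re-using the sire's pedigree instead of rebuilding it for every foal is a caching idea. A rough sketch of how that could look, assuming a simple parent table; the table, the horse names and the function are all invented for illustration and say nothing about how the real program is organised:

```python
from functools import lru_cache

# Hypothetical parent table: name -> (sire, dam). Horses missing from it have unknown parents.
PARENTS = {
    "Foal1": ("SireX", "DamA"),
    "Foal2": ("SireX", "DamB"),
    "SireX": ("GS1", "GD1"),
    "DamA": ("GS2", "GD2"),
    "DamB": ("GS3", "GD3"),
}


@lru_cache(maxsize=None)
def ancestors(name: str, generations: int) -> tuple:
    """Return the ancestors of `name`, up to `generations` back, as a tuple."""
    if generations == 0 or name not in PARENTS:
        return ()
    result = []
    for parent in PARENTS[name]:
        if parent is not None:
            result.append(parent)
            # The cached result is reused when the same parent turns up again.
            result.extend(ancestors(parent, generations - 1))
    return tuple(result)


print(ancestors("Foal1", 2))  # ('SireX', 'GS1', 'GD1', 'DamA', 'GS2', 'GD2')
print(ancestors("Foal2", 2))  # SireX's side comes from the cache this time
print(ancestors.cache_info().hits)
```

With 1,650 foals by one sire, his half of the pedigree would be built once and looked up 1,649 times, leaving only the dam side to generate for each foal.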

    I assume you mean ... R version 3.2.3 (2015-12-10) The R Foundation for Statistical Computing ...
    No I have it but have not used it yet. Data collection takes up most of my time.


  • Banned (with Prison Access) Posts: 1,279 ✭✭✭kidneyfan


    diomed wrote: »
    I assume you mean ... R version 3.2.3 (2015-12-10) The R Foundation for Statistical Computing ...
    No I have it but have not used it yet. Data collection takes up most of my time.
    I have 3.2.2 but obviously along the same lines. I use R Studio. I still don't understand why you are having these difficulties.


  • Closed Accounts Posts: 4,744 ✭✭✭diomed


    kidneyfan wrote: »
    I have 3.2.2 but obviously along the same lines. I use R Studio. I still don't understand why you are having these difficulties.
    I'm not having any difficulties.


  • Registered Users, Registered Users 2 Posts: 48 RoyalAcademy2


    This project is doomed to failure I'm afraid.

    Coolmore have the best current band of broodmares mostly dating back to when American blood was reimported back to Ireland in the 70's.

    Godolphin/Shadwell have the best mares money can buy.

    The traditional breeders like the Aga and Ballymacoll have their own lines.

    The new kids on the block will pay outrageous prices for blood (and they will be forced to!)-the Wildenstein draft at Goffs in November will be fireworks I expect.

    1. When did Coolmore last produce a real champion?
    2. Why can't Godolphin breed a stakes horse any more?
    3. What's the Aga's strike-rate for G1 success? One foal in fifty/hundred I expect
    4. How long will the Qataris remain interested? Is Group 1 racing more about the "season" in London than any horse race?

    All the above will get their winners/champions but they will fail miserably with most others.

    For any anoraks here, take a look at the average failed Weld/Abdullah runner - he gets large numbers of them - Prendergast/Sh Hamdan, Bolger home-breds and any one of hundreds of failed Sheikh Mo horses. It is simply a numbers game for which you need dozens of "well bred" mares with a history AKA pedigree AKA money pit.

    The sub-theme of this admirable thread is, presumably, to find value in the sales ring. I would rather be a seller than buyer!


  • Closed Accounts Posts: 4,744 ✭✭✭diomed


    The sub-theme of this admirable thread is, presumably, to find value in the sales ring. I would rather be a seller than buyer!
    I don't have any broodmares and will not be buying.
    The aim is to analyse horses before they race or have raced little and have an opinion on every runner.
    I'll admit I did analyse a sales catalogue, but that was just a useful exercise in getting something together to analyse large numbers at once. My ideas file has about a dozen ideas that I'll be looking at over the winter, and testing against large results files.


  • Banned (with Prison Access) Posts: 1,279 ✭✭✭kidneyfan


    How bizarre to describe such a project as doomed to failure.

    Diomed, it occurs to me that databases work best when they work with sets rather than trying to work procedurally. I am not that familiar with FoxPro, but could reworking some queries in a more set-based manner help?


  • Registered Users, Registered Users 2 Posts: 48 RoyalAcademy2


    I am suggesting that the results will provide generalisations: nicks that will be already known to the teams of experts behind the conglomerations.

    I don't know why diomed would almost apologise for reviewing a catalogue because, as he says, he's neither an owner nor breeder therefore this is the logical starting point.

    His reference to the betfair thread is also extremely naive and I cannot see the point of the exercise - and the Herculean task - unless it has some ultimate reward.

    My point is not to deride but to suggest the results will be predictable yet unusable in any useful, commercial sense


  • Closed Accounts Posts: 4,744 ✭✭✭diomed


    RoyalAcademy2
    Thanks for your review of the results.


  • Registered Users, Registered Users 2 Posts: 48 RoyalAcademy2


    Diomed, your comment suggests I am missing the point and, if I am, I apologise.

    Address one question please: if two full brothers can be Pegasus on the one hand and noddy on the other how can your data predict between one and the other?

    Sailers wells and fairy King came from the same dam yet fairy King was a very good stallion despite no track success.

    Flat racing suggests the dam must have pedigree to have a decent chance of producing a winner whereas it's often complete pot luck over jumps.


  • Registered Users, Registered Users 2 Posts: 2,702 ✭✭✭tryfix


    Diomed, your comment suggests I am missing the point and, if I am, I apologise.

    Address one question please: if two full brothers can be Pegasus on the one hand and noddy on the other how can your data predict between one and the other?

    Sailers wells and fairy King came from the same dam yet fairy King was a very good stallion despite no track success.

    Flat racing suggests the dam must have pedigree to have a decent chance of producing a winner whereas it's often complete pot luck over jumps.

    The purpose of his data mining is to spot pedigrees that are likely to produce high class horses. In the case of Sadler's Wells and his full brother Fairy King, the fact that Fairy King wasn't up to much on the track doesn't mean that the ( Northern Dancer x Fairy Bridge ) mating wasn't a super nick for producing quality racehorses. The subsequent successful stud career of Fairy King confirms that ( Northern Dancer x Fairy Bridge ) was a super nick, and the third full brother from that mating, Tate Gallery, was a Gp1 winning 2yo who also had a successful stallion career.

    All three full Brothers produced different types than the others as Stallions but as Diomed pointed out he's not trying to predict staying power, just likely inherited class. On that basis the Sadler's Wells and brothers story justifies the kind of broad search for class in a pedigree that Diomed is pursuing.


  • Registered Users, Registered Users 2 Posts: 48 RoyalAcademy2


    And my point is that this would be obvious to any detailed study of any individual pedigree.


  • Closed Accounts Posts: 4,744 ✭✭✭diomed


    Diomed, your comment suggests I am missing the point and, if I am, I apologise.
    1 .Address one question please: if two full brothers can be Pegasus on the one hand and noddy on the other how can your data predict between one and the other?
    2. Sailers wells and fairy King came from the same dam yet fairy King was a very good stallion despite no track success.
    3. Flat racing suggests the dam must have pedigree to have a decent chance of producing a winner whereas it's often complete pot luck over jumps.
    1. Do you mean Sadler's Wells?
    I assume you are saying Sadler’s Wells was Pegasus and his full sibling Fairy King was Noddy. Is that correct?
    Sadler’s Wells was not the champion of his year. El Gran Senor was rated 136; Darshaan 133; Sadler’s Wells 132; Chief Singer 131.

    2. There is a reason for Fairy King's lack of success on the track: "made one start at Phoenix Park and suffered a fractured sesamoid." see pedigreequery.com

    Another full sibling, Tate Gallery, was rated 117 as a 2yo and got no rating as a 3yo "last of 15 in the 2000 Guineas ... lacked enthusiasm on reappearance … retired to Coolmore Stud".

    Obviously to me Fairy King was a sprinter (Equinome C.C) and Sadler’s Wells a middle distance type (Equinome C.T). Their dam Fairy Bridge ran twice, won twice, as a 2yo in 1977 at 5f and 6f in the Phoenix Park, and was retired to stud, given 8 stone 12 pounds on the Irish Free Handicap (3rd highest ranked). The prize money she could win as a 3yo paled into insignificance beside her stud value.

    3. In my opinion good jumps runners are slower than good flat horses. If jumps horses were competitive they would compete for prizes like today’s Irish Champion Stakes. Harzand won UK£1.5 million for his English and Irish Derby wins.
    The real prize is stallion fees. Galileo’s fee is said to be €300k at over 100 covers a year = €30 million a year x 15 years stud career = ~ €450 million - the real prize.

    The only jumps horse that I can remember being competitive on the flat was Alderbrook. He won the Group 2 Prix Dollar at Longchamp. He later won and was 2nd in the Champion Hurdle.
    Istabraq won 2 races from 11 runs on the flat, and 23 from 29 over hurdles.

    You cannot make a definitive statement saying the dam must be good to breed a winner. I say there must be class, either in the dam, her dam, or further back.
    My example to confuse you is Sea-Bird (1962). He won the Epsom Derby, the Prix de l’Arc de Triomphe and other top races.
    Timeform rated him 145 and he is often considered the best horse of the 20th century.
    His dam Sicalade won 750 French Francs (about £75). Timeform say she "ended up being sold as butcher’s meat for £100".
    Her dam Marmelade (1949), was a full sibling of Camaree, the 1950 English 1000 Guineas winner.
    Sicalade’s dam Marmelade was a full sibling of a very classy filly.
    While I am discussing siblings I’ll say I don’t take much notice of half-siblings i.e. different sire, same dam (often mentioned/identified in sales catalogues).


    Some general comments:
    You can not predict how good/bad a foal will be. There are over 20,000 genes involved and afaik the scientists have some idea about some of the genes/grouping of genes involved in athletic ability. They can test the foal when the foal is produced. They can not tell before the mating what ability the foal will inherit from the sire and what it will inherit from the dam.


    What I am trying to do is identify the ingredients that might produce a pleasing result by analysing the parents, grandparents, and further back.
    I know some will say only the sire and dam matter. True. If you have brothers, have you ever wondered why you are not identical? Each gets a different mix.

    One benefit from your questions is this afternoon I looked at the best horses by a sire. I found something in those pedigrees and wondered if it was in any other very top horses not by that sire. I found the same characteristic. What is strange is the thing I found was in the 11th generation of those three top horses, in both the sire side and dam side. It is not a simple feature. I’ll keep the instantly recognisable names of those three top horses to myself and put it in my ideas file for further testing.

    I want to analyse the pedigrees of both poor and good horses (horses that are not culled) against race data and work out a few rules.
    Some time ago I checked if group race winners produced group race winners (the sales catalogue put the damline group winners in black type approach). I did not find a worthwhile link (very weak).
    I have a database of all Group 1, 2, 3 races for Ireland, England, France, Germany, Italy, USA (Grade 1 only) from 1900 to 2015. It wasn’t a simple task to fill in the database and the pedigrees.

    Here is some info on USA horses in a book on my desk:
    Number of foals: 399,121
    Starters: 69.4%
    Winners: 46.1%
    Stakes winners: 3.4%
    Graded stakes winners: 0.8%
    Grade 1 winners: 0.2%

    1 in 500 foals in the USA is a grade 1 winner.

    The aim is not to identify the Group 1 / Grade 1 winners by analysing pedigrees, i.e. to pick out the 500/1 shot.
    A more humble target might be to identify the 30.6% who are not good enough to make the racecourse, or the 23.3% who run and cannot win a race (see the figures above).
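
    The percentages in the last few lines follow directly from the book figures quoted above. A minimal sketch of that arithmetic in Python (the variable names are made up for illustration):

```python
# Figures quoted above for USA foals (percentages of all foals).
FOALS = 399_121
STARTERS_PCT = 69.4
WINNERS_PCT = 46.1
GRADE1_PCT = 0.2

never_ran_pct = round(100 - STARTERS_PCT, 1)           # foals that never started
ran_no_win_pct = round(STARTERS_PCT - WINNERS_PCT, 1)  # started but never won
one_in_n = round(100 / GRADE1_PCT)                     # "1 in N" chance of a Grade 1 winner
approx_grade1 = round(FOALS * GRADE1_PCT / 100)        # rough head count of Grade 1 winners

print(never_ran_pct, ran_no_win_pct, one_in_n, approx_grade1)
```

    The head count is only approximate because the 0.2% in the book is itself rounded.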


  • Registered Users, Registered Users 2 Posts: 9,205 ✭✭✭Gringo180


    Today's Leger winner Harbour Law has a real speedy pedigree. Surprised to see such a pedigree produce a stamina laden stayer at the highest level.


  • Registered Users, Registered Users 2 Posts: 2,702 ✭✭✭tryfix


    Gringo180 wrote: »
    Today's Leger winner Harbour Law has a real speedy pedigree. Surprised to see such a pedigree produce a stamina laden stayer at the highest level.

    A DI of 1.29 isn't speedy, but I know what you mean re the dam a sprinter by Pivotal a sire who produces both sprinters and classic middle distance performers.

    There's the highest level and the Highest Level. The tight finish of the Leger was between three nice group class colts rather than between 3 proper Gp 1 horses. The proper Gp 1 horse in the race crashed out leaving a Gp1 in name only behind.


  • Closed Accounts Posts: 4,744 ✭✭✭diomed


    I'm dragging up an old thread.
    For the past few months I've been doing a bit of database programming. The idea is to compare flat (not fences or hurdles) pedigrees to race results.

    At this stage I have not done the comparison, but I have the basics on the pedigree side.
    I have a program that analyses pedigrees. I can load any number of horses, start it, and come back later for the results.
    I'll be tinkering with it a little, adding bits and pieces, putting in timing to find slow spots, and rewriting to speed things.

    What gave me a boost was a poster asked me by PM to look at the Tattersalls February catalogue. It has 488 lots.
    I could look at each lot on screen in my commercial pedigree program (TesioPower) and pick out the best imo.
    Instead I decided to put in some extra effort to finish the database program.

    It analysed the 488 Tattersalls lots in 44.53 seconds or approx 0.09 seconds a horse.
    That was on my slow seven year old PC.
    I bought a new PC in November and that runs things in 32% of the time of the old PC, so it will analyse a horse pedigree in under 0.03 of a second.
    First it gathers the 126 ancestors in the first six generations (2,4,8,16,32,64), then analyses them.
    Fwiw in a sales catalogue they print the first three generations (14 ancestors).
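
    The gathering step described above can be sketched in Python. This is not the program described in the post: the `PARENTS` table, the horse names and the `ancestors` function are all hypothetical. The cache means a sire's pedigree is built once and reused for every foal by that sire, which is the time-saving idea mentioned earlier in the thread.

```python
from functools import lru_cache

# Hypothetical toy data: horse -> (sire, dam). A real database would hold
# hundreds of thousands of rows; horses with unknown parents are simply absent.
PARENTS = {
    "Foal A": ("Sire S", "Dam A"),
    "Foal B": ("Sire S", "Dam B"),
    "Sire S": ("Grandsire", "Granddam"),
    "Dam A": ("Other Sire", "Other Dam"),
}

@lru_cache(maxsize=None)
def ancestors(horse, generations):
    """Return a tuple of (generation, name) for every known ancestor."""
    if generations == 0 or horse not in PARENTS:
        return ()
    found = []
    for parent in PARENTS[horse]:
        found.append((1, parent))
        # Ancestors of the parent sit one generation further back.
        found.extend((g + 1, name) for g, name in ancestors(parent, generations - 1))
    return tuple(found)

print(len(ancestors("Foal A", 6)))
print(len(ancestors("Foal B", 6)))
print(ancestors.cache_info().hits)  # cache hits: Sire S is reused for Foal B
```

    With a complete pedigree the same function would return 126 entries for six generations; the toy data is far shallower.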

    Plan
    Add new features to the six generation analysis (group winners, Derby winners, group winner producer).
    Compare the analysis to race performance.
    Give up if no link found between pedigree and performance :), or make changes.
    My next plan will be to analyse 7, 8, 9, 10, 11, 12 generations. This should be easy as I only have to increase the size of the ancestor database.
    Of course if I go from 6 generations to 7 generations the data doubles.
    A 12 generation analysis is much bigger (and slower) than a 6 generation, 8,190 horses v 126 horses, 65 times the size.
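
    The ancestor counts quoted above come from adding 2 + 4 + 8 + ... for each generation. A short Python check (the function name is made up for illustration):

```python
def ancestor_slots(generations):
    """Ancestor positions in a pedigree: 2 + 4 + ... + 2**generations."""
    return sum(2 ** g for g in range(1, generations + 1))

for n in (3, 6, 7, 12):
    print(n, ancestor_slots(n))

# How much bigger a 12 generation pedigree is than a 6 generation one.
print(ancestor_slots(12) / ancestor_slots(6))
```

    This gives 14 for three generations, 126 for six, 254 for seven and 8,190 for twelve, so each extra generation roughly doubles the work.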

    When running the program a byproduct is it gave some strange results for a few horses, many in the 1800s.
    These are horses with incomplete pedigrees (some of these are half-bred non-thoroughbred).
    I go back, fix the data if I can, and run it again.
    The reason to test the program against so many pedigrees is to test it against complex pedigrees.

    I am very interested in full siblings in pedigrees
    (horses with same sire and dam e.g. sires Sadler's Wells and Fairy King are both by the sire Northern Dancer out of the dam Fairy Bridge).
    And I look for 3/4, 7/8 siblings.
    Examples of recent 3/4 siblings are:
    Frankel; Highland Reel; Intello; Roderic O'Connor; Sir Isaac Newton; Teofilo, all by Galileo out of a Danehill dam.

    Below is a summary decade by decade of full siblings in 6 generation pedigrees from 1800 to now.
    It gives an indication that horses were much more closely inbred in the past, probably because you walked your sire to a local mare.
    (full siblings A and B: one of horse A, and four of horse B is counted as five)
    Horse populations in the early 1800s stayed in the same area (as did humans).
    When trains were invented (1830s) you could travel your mare.
    Motor transport (1890s) made travel even easier.
    Now you can fly the mare or stallion anywhere.
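
    One way to read the counting rule above (one appearance of horse A plus four of horse B counts as five) is sketched below in Python. It is an assumption about the method, not the actual database program, and the six-entry pedigree list is invented for the example. Stockwell and Rataplan were both by The Baron out of Pocahontas.

```python
from collections import Counter, defaultdict

def full_sibling_count(pedigree_names, parents):
    """Count appearances of ancestors that have a full sibling in the same pedigree."""
    appearances = Counter(pedigree_names)
    families = defaultdict(set)
    for name in appearances:
        sire, dam = parents.get(name, (None, None))
        if sire and dam:
            families[(sire, dam)].add(name)  # group distinct horses by (sire, dam)
    total = 0
    for siblings in families.values():
        if len(siblings) >= 2:  # at least two different horses from the same mating
            total += sum(appearances[s] for s in siblings)
    return total

# Invented pedigree extract: Stockwell four times, Rataplan once, Touchstone once.
names = ["Stockwell"] * 4 + ["Rataplan", "Touchstone"]
parents = {
    "Stockwell": ("The Baron", "Pocahontas"),
    "Rataplan": ("The Baron", "Pocahontas"),
    "Touchstone": ("Camel", "Banter"),
}
print(full_sibling_count(names, parents))
```

    A horse duplicated on its own (no brother or sister elsewhere in the pedigree) is not counted by this sketch; that would be inbreeding to one horse rather than to full siblings.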

    In the table below you will see a spike in full siblings in pedigrees in the 1860s and 1870s.
    My guess is brothers Stockwell (1849) ("the emperor of stallions") and Rataplan (1850) are heavily involved.
    Galileo traces back in direct male line to Stockwell, as does almost everything else running today.
    The low full sibling numbers for the 1990s, 2000s, 2010s might be because many of these are low quality running horses, not breeding horses.

    Decade Horses Full Sib Average
    180? 6 23 3.8
    181? 40 121 3.0
    182? 311 863 2.8
    183? 948 2407 2.5
    184? 1285 3642 2.8
    185? 1826 8846 4.8
    186? 2374 19072 8.0
    187? 3145 26993 8.6
    188? 4612 26988 5.9
    189? 5622 24467 4.4
    190? 6543 31694 4.8
    191? 7831 35914 4.6
    192? 9965 30673 3.1
    193? 11835 32875 2.8
    194? 16631 40744 2.4
    195? 25365 38647 1.5
    196? 27277 36406 1.3
    197? 36611 82902 2.3
    198? 50620 128464 2.5
    199? 69801 107813 1.5
    200? 83274 52267 0.6
    201? 15249 6288 0.4

    381171 3.4
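
    A quick Python check of the table, assuming the figures are as typed above. It recomputes the Average column and compares two ways of getting an overall figure: the simple mean of the 22 decade averages, and total full siblings divided by total horses. The 3.4 on the last row appears to match the first of these.

```python
# (decade, horses, full sibling count) copied from the table above.
rows = [
    ("180?", 6, 23), ("181?", 40, 121), ("182?", 311, 863), ("183?", 948, 2407),
    ("184?", 1285, 3642), ("185?", 1826, 8846), ("186?", 2374, 19072),
    ("187?", 3145, 26993), ("188?", 4612, 26988), ("189?", 5622, 24467),
    ("190?", 6543, 31694), ("191?", 7831, 35914), ("192?", 9965, 30673),
    ("193?", 11835, 32875), ("194?", 16631, 40744), ("195?", 25365, 38647),
    ("196?", 27277, 36406), ("197?", 36611, 82902), ("198?", 50620, 128464),
    ("199?", 69801, 107813), ("200?", 83274, 52267), ("201?", 15249, 6288),
]

averages = [sibs / horses for _, horses, sibs in rows]
total_horses = sum(horses for _, horses, _ in rows)
total_sibs = sum(sibs for _, _, sibs in rows)

mean_of_decades = sum(averages) / len(averages)  # every decade weighted equally
pooled = total_sibs / total_horses               # every horse weighted equally

print(total_horses, total_sibs)
print(round(mean_of_decades, 1), round(pooled, 1))
```

    The pooled figure is lower because the recent decades, with many horses and few full siblings, carry most of the weight.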


  • Closed Accounts Posts: 4,744 ✭✭✭diomed


    This week I completed database programs that analysed 159,660 horse pedigrees, and output the results.
    Next is trying to learn from the results whether there are differences between better horses and lesser horses, and whether they are significant.

    The results data is 4116 rows by 40 columns = 164,640 cells.
    37 of the 40 columns are features in each pedigree, 16 from duplicated stallions, 16 from duplicated mares (duplicated mares are rare).
    Very few of the 37 fields from 4116 rows are filled (37x4116 = 152,292). 125,695 cells have a zero result (82.5%). Only 17.5% of cells are filled.

    It is an inbreeding/linebreeding analysis of six generations, counting the number of duplicated horses, groups of duplicated horses, sex of offspring of duplicated horses, siblings in pedigrees and so on.
    If the results are useless I will move on to other ideas.


  • Closed Accounts Posts: 4,744 ✭✭✭diomed


    I don't know much about statistics but I think the Chi Squared test might be the way to see if my results are relevant.
    I've been swotting up on it in the last few days so caution is advised.

    First I split the data into three files: colts, geldings, fillies.
    There are 37,057 colts and that is where I started the tests.
    I split the colts into two groups 10,212 and 26,842 = 37,054 (3 missing?).

    Group A are the lower rated colts, Group B the higher rated colts.
    Of course the highest rated colt in group A will only be a fraction below the lowest rated colt in Group B.
    In hindsight I could have picked a higher point so that the split would be closer to 50/50.

    Actual Frequency Group A Group B Total
    0 4453 11394 15847
    1 3876 10158 14034
    2 1505 4116 5621
    3 317 983 1300
    4 58 171 229
    5 3 20 23
    10212 26842 37054


    Expected Frequency Group A Group B Total
    0 4,367.40 11,479.60 15847
    1 3,867.74 10,166.26 14034
    2 1,549.14 4,071.86 5621
    3 358.28 941.72 1300
    4 63.11 165.89 229
    5 6.34 16.66 23
    10212 26842 37054

    p-value 0.01806328 <--- chitest (actual: expected)


    Chi-Square Terms Group A Group B
    0 1.68 0.64 = (4453-4367.4)^2/4367.4
    1 0.02 0.01 ( differences squared
    2 1.26 0.48 ( to turn them positive
    3 4.76 1.81 ( then divide by expected
    4 0.41 0.16
    5 1.76 0.67

    Chi-Square 13.64 <--- sum above Chi-Squared Terms values
    Degrees of Freedom 5 <--- (data rows -1) * (data columns -1)
    Alpha 0.01 <--- 1% level
    Critical Value 15.09 <--- critical chi-square value (x2 distribution table, 5 degrees of freedom)

    Decision Reject - Group A & Group B are not different at 1% level

    Explanation ( If "Chi-Square" number is bigger than "Critical Value"
    ( then differences are large between the groups
    ( and not caused by chance


    This is from an Excel spreadsheet.
    The actual numbers are at the top.
    Then there is a calculation of the "expected" numbers.
    (4367.40 is 10212 x 15847/37054 and so on, i.e. column total x row total / grand total)
    The Chi-Squared Terms are the differences squared (to turn negative differences positive), divided by the expected number.
    The idea is if the differences are big then Group A and Group B are very different, and the difference is not due to chance.
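
    The spreadsheet steps above can be reproduced in a few lines of Python without any statistics library. This is a sketch, not the spreadsheet itself: the function name is made up, and the two critical values are the standard chi-square table entries for 5 degrees of freedom (11.07 at 5%, 15.09 at 1%).

```python
# Observed counts from the table above: rows are occurrences 0..5,
# columns are (Group A, Group B).
observed = [
    (4453, 11394),
    (3876, 10158),
    (1505, 4116),
    (317, 983),
    (58, 171),
    (3, 20),
]

# Standard chi-square table values for 5 degrees of freedom.
CRITICAL = {0.05: 11.07, 0.01: 15.09}

def chi_square(table):
    """Return the chi-square statistic and degrees of freedom for a count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for r, row in enumerate(table):
        for c, obs in enumerate(row):
            expected = row_totals[r] * col_totals[c] / grand
            stat += (obs - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof

stat, dof = chi_square(observed)
print(round(stat, 2), dof)
for alpha, critical in CRITICAL.items():
    print(alpha, "significant" if stat > critical else "not significant")
```

    The statistic comes out at about 13.64 with 5 degrees of freedom, which sits between the two critical values: significant at the 5% level but not at the 1% level.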

    The "Decision" near the end is a comment saying if Group A and Group B are similar or different.
    What I want is the pedigree factor I am testing to show a difference between the groups.
    I want the Group B to have more of the factor, and that difference to be so large that it is not caused by chance.
    The above test "failed" at the 1% level: the difference between the groups is not large enough to rule out chance at one in a hundred.
    But if I change the Alpha to 0.05 it changes to
    "Accept - Group A and Group B are different, - significant at 5% level"
    or in other words a difference this large would turn up by chance less than one time in twenty if the factor made no difference.

    Then I thought I would see what the average ratings were for the colts with this factor.
    Please remember that there are 37 pedigree factors, and this is just one of the 37 factors, a factor I think might produce better colts.
    The other 36 factors are not yet tested statistically.
    I will try to write a database program to calculate the statistical result at 1% and 5% for colts, geldings, fillies, and for all 37 factors.
    It might be worthwhile to split the data further into 10% chunks from slowest horses to fastest.
    Another possibility is to use a random number to split the horses randomly so that I do not use my opinions to select groups to test.

    What would happen if I calculated the average ratings of the 37,057 colts who have the factor once, twice, three times and upwards?
    Almost half the colts do not have this factor in their first six generations.
    The colts without this factor have an average rating of 74.36.
    It appears that the more occurrences of the factor, the higher the rating.
    These are the ratings listed for the 23 horses with 5 occurrences (20 values, averaging 106.40):
    45, 65, 74, 83, 88, 90, 96, 107, 111, 112, 115, 118, 120, 121, 124, 127, 129, 131, 132, 140
    I should point out that much of the data is of the best horses over the past forty years, and small numbers are unreliable.


    occurrence Count Average Diff
    0 15,847 74.36
    1 14,034 76.32 1.96
    2 5,621 79.12 2.80
    3 1,300 85.05 5.93
    4 229 89.69 4.64
    5 23 106.40 16.71
    6 3 91.00 -15.40

    37,057
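
    The Diff column is just each average minus the one on the row above. A small Python sketch of that, which also shows how few colts sit in the top two rows (the variable names are made up for illustration):

```python
# (occurrences, number of colts, average rating) from the table above.
rows = [
    (0, 15847, 74.36),
    (1, 14034, 76.32),
    (2, 5621, 79.12),
    (3, 1300, 85.05),
    (4, 229, 89.69),
    (5, 23, 106.40),
    (6, 3, 91.00),
]

# Difference between each average and the one before it.
diffs = [round(b[2] - a[2], 2) for a, b in zip(rows, rows[1:])]
total = sum(n for _, n, _ in rows)

# Share of all colts that have five or more occurrences.
share_five_plus = sum(n for occ, n, _ in rows if occ >= 5) / total

print(diffs)
print(total)
print(f"{share_five_plus:.4%}")
```

    The two biggest swings in the Diff column come from the rows with 23 and 3 colts, which is why small numbers are unreliable.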


  • Closed Accounts Posts: 4,744 ✭✭✭diomed


    I posted this (username amadán) on Irish Bloodstock Forums
    http://www.irishbloodstock.com/phpBB2/viewtopic.php?sid=236adfb3b370d928596f1c03e1bd7b3c&t=4342

    This is an old thread so this post might be overlooked.

    In Jan/Feb 2017 I wrote programs to compare a horse's six generation pedigree with its rating.
    To analyse deeper into a pedigree (7,8,9,10,11,12 gens) just needs the program to use a bigger ancestor database (I have those ready).
    The present 6 gen ancestor database is 126 horses (2+4+8+16+32+64).

    Many of these programs were written in earlier years but dusted off and completed after I finished the endless data collection at the end of 2016.
    The pedigree and rating data was collected over 23 years.

    I checked the program results statistically (by writing programs to do chi squared tests).
    The program records 9 sire and 9 dam “factors” in each pedigree, (plus a few other extra factors) and the count of those factors in each pedigree.

    The data was 159,222 horses.
    These were four groups: colts; fillies; geldings; colts & geldings.
    The first tests were for those four sex groups.
    The second tests compared nine groups by racing quality (within the sex groups)
    e.g. compare lowest group with second lowest group, all the way to comparing lowest group to highest group (9 groups is 9 x 8 /2 = 36 comparisons).
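
    The 36 comparisons are every pair that can be made from the nine quality groups. In Python this can be listed with `itertools.combinations` (numbering the groups 1 to 9 is an assumption made for the example):

```python
from itertools import combinations

groups = list(range(1, 10))            # quality groups 1 (lowest) to 9 (highest)
pairs = list(combinations(groups, 2))  # every unordered pair of groups

print(len(pairs))
print(pairs[0], pairs[-1])
```

    Each pair would then get its own chi squared test, repeated for every factor and every sex group.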

    About four of the nine factors are positive (two very), a few neutral, and a couple negative. I need time to review the results files.

    It should be possible to analyse pedigrees in volume and rank horses.
    I recently analysed a sales catalogue of ~450 lots in about 40 seconds.
    This was on my slow PC. My new PC is 3 times faster.
    My new PC will be useful if I want to analyse / test more generations (7,8,9,10,11,12), or do more tests.

    For increases of the count of some factors there is an increase in running rating: 0 count, 1 count, 2 count, and so on. (tested & proved statistically)
    Higher rated groups of horses have more occurrences of the positive factors than lower rated groups. (tested & proved statistically)
    Counts go from 0 to 27, but usually up to about 7.

    Average ratings increase may only be a point or so for an increase in factor occurrence, but this is averaged over tens of thousands of horses.
    But increasing from a count of 0 of a factor, to 1, to 2, to 3, to 4, gives a ratings increase for each jump in factor count.
    (The other eight factors (or 17 factors) might affect the ratings increase.)

    One of the results files is 5,184 lines.
    The Chi squared test "Accepts" or "Rejects" each group comparison at 5% and 1%
    i.e. a 1% Accept means a result this strong would occur less than 1% of the time by chance.

    This gives an idea of the test volumes (from one test).

    Occ ...Group A ....Group B
    0..........4453.......11394
    1..........3876.......10158
    2..........1505........4116
    3...........317..........983
    4............58...........171


  • Closed Accounts Posts: 4,744 ✭✭✭diomed


    I'm staying in the house today waiting for a DHL courier.

    This is an extract from the results, and might be of interest. There are four sex groups and 18+ factors in each, so 72+ pieces of information.

    One piece of information was picked at random (and luckily it is informative)
    Average rating (factor occurrence): 80.7 (0); 81.2 (1); 81.9 (2); 82.7 (3); 83.6 (4); 85.4 (5); 87.2 (6); 91.1 (7); 105.1 (8); 75.5 (9); 76.5 (10)
    Average rating increase cumulatively: n/a (0); +0.5 (1); +1.2 (2); +2.0 (3); +2.9 (4); +4.7 (5); +6.5 (6); +10.4 (7); +24.4 (8); -5.2 (9); -4.2 (10)

    You can see there is an increase in rating for each increase in this factor count, from 0 up to 8. Then there is a massive drop for counts 9 and 10.
    One of the earlier groups has over 25,000 horses so the average is reliable, group 9 has only 4 horses and group 10 has 2 horses. Group 6 has 1,496, group 7 has 313, group 8 has 32 horses.

    Some people say only one generation matters, the sire and dam. Sales catalogues show a 3 generation pedigree. I'm working with 6 generations. Is it time to look at 7, 8, 9, 10, 11, 12 generations?


  • Closed Accounts Posts: 4,744 ✭✭✭diomed


    I have often wondered if it is possible to purchase an average mare, and breed a good horse.
    The idea is to first breed a filly from the average mare, then to breed another horse that might be useful from that filly.
    A test-mating is a theoretical mating on paper (or computer) of a stallion and mare.
    I prepared test-matings on computer of an average horse picked at random, test-mating her with 393 stallions currently at stud in Ireland, England, France, Germany, Italy.
    Using those 393 test-matings I again test-mated the first offspring with the same 393 stallions. I assumed that the results of the first test-matings were all fillies.
    Of course you would not breed twice to the same sire, e.g. breed to Invincible Spirit, and breed that filly with Invincible Spirit.

    The average horse picked was Mary Sea (2000) by Selkirk out of Mary Astor by Groom Dancer.
    She didn’t win in nine starts.
    Her best Racing Post Rating was 71, and Official Rating 60.
    She has four offspring in my data: Bullyseye Babe (rated 53); Elsie Bay (69); Jamaica Grande (64); Sea Tobougie (48).

    I wanted to see if in two steps it was possible to produce a horse with many of the factors mentioned in my earlier post.
    The test sample was 154,842 (393 x 393)+393.
    The result was: 1 horse (9 count of the above mentioned factor); 2 (8 count); 75 (7 count); 754 (6); 5394 (5); 21701 (4); 45713 (3); 53685 (2); 25222 (1); 2295 (0).
    You can see how difficult it is to produce a good horse (theoretically), and this mirrors the reality.
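
    The sample size and the spread of results quoted above can be checked with a few lines of Python. The numbers are copied from the post; the variable names are made up for illustration.

```python
STALLIONS = 393

first_step = STALLIONS               # the mare mated with each stallion
second_step = STALLIONS * STALLIONS  # each first-step filly mated with each stallion
sample = first_step + second_step

# factor count -> number of test-matings with that count, as listed above.
distribution = {9: 1, 8: 2, 7: 75, 6: 754, 5: 5394,
                4: 21701, 3: 45713, 2: 53685, 1: 25222, 0: 2295}

print(sample, sum(distribution.values()))

# Share of the sample at each factor count, highest count first.
for count in sorted(distribution, reverse=True):
    print(count, f"{distribution[count] / sample:.3%}")

seven_plus = sum(n for c, n in distribution.items() if c >= 7)
print(seven_plus)
```

    Only 78 of the 154,842 test-matings reach a count of 7 or more, which is roughly one in two thousand.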

    A strange outcome is it might be possible to produce a horse with an “8 count”.
    One of the offspring of Mary Sea is the filly Sea Tobougie (2007) by Tobougg out of Mary Sea.
    A test-mating of that combination with the current sire Zambezi Sun (fee €3k) gives an “8 count”.
    Sea Tobougie was only rated RPR 47, OR 40, and failed to win in twelve starts.
    In fact any broodmare by Tobougg mated with Zambezi Sun would give a similar (not same) result.

    Is it possible to take a very average filly like Sea Tobougie and breed a good horse? It seems unlikely.
    But a Japanese breeder, Kihachiro Watanabe, bought Irish three-year-old filly, Saddlers Gal (RPR 52, nine starts, no wins, no places, earnings £0).
    He bred from her El Condor Pasa, who was second in the Prix de l’Arc de Triomphe (8 wins, 3 seconds from 11 starts).


  • Registered Users, Registered Users 2 Posts: 16,955 ✭✭✭✭Francie Barrett


    diomed wrote: »
    I have often wondered if it is possible to purchase an average mare, and breed a good horse.
    The idea is to first breed a filly from the average mare, then to breed another horse that might be useful from that filly.
    A test-mating is a theoretical mating on paper (or computer) of a stallion and mare.
    I prepared test-matings on computer of an average horse picked at random, test-mating her with 393 stallions currently at stud in Ireland, England, France, Germany, Italy.
    Using those 393 test-matings I again test-mated the first offspring with the same 393 stallions. I assumed that the results of the first test-matings were all fillies.
    Of course you would not breed twice to the same sire, e.g. breed to Invincible Spirit, and breed that filly with Invincible Spirit.

    The average horse picked was Mary Sea (2000) by Selkirk out of Mary Astor by Groom Dancer.
    She didn’t win in nine starts.
    Her best Racing Post Rating was 71, and Official Rating 60.
    She has four offspring in my data: Bullyseye Babe (rated 53); Elsie Bay (69); Jamaica Grande (64); Sea Tobougie (48).

    I wanted to see if in two steps it was possible to produce a horse with many of the factors mentioned in my earlier post.
    The test sample was 154,842 (393 x 393)+393.
    The result was: 1 horse (9 count of the above mentioned factor); 2 (8 count); 75 (7 count); 754 (6); 5394 (5); 21701 (4); 45713 (3); 53685 (2); 25222 (1); 2295 (0).
    You can see how difficult it is to produce a good horse (theoretically), and this mirrors the reality.

    A strange outcome is it might be possible to produce a horse with an “8 count”.
    One of the offspring of Mary Sea is the filly Sea Tobougie (2007) by Tobougg out of Mary Sea.
    A test-mating of that combination with the current sire Zambezi Sun (fee €3k) gives an “8 count”.
    Sea Tobougie was only rated RPR 47, OR 40, and failed to win in twelve starts.
    In fact any broodmare by Tobougg mated with Zambezi Sun would give a similar (not same) result.

    Is it possible to take a very average filly like Sea Tobougie and breed a good horse? It seems unlikely.
    But a Japanese breeder, Kihachiro Watanabe, bought Irish three-year-old filly, Saddlers Gal (RPR 52, nine starts, no wins, no places, earnings £0).
    He bred from her El Condor Pasa, who was second in the Prix de l’Arc de Triomphe (8 wins, 3 seconds from 11 starts).
    It'd be interesting to do a distribution of the various combinations.

    Good mare/good stallion.
    Average mare/good stallion.
    Good mare/average stallion.
    Average mare/average stallion.

    Maybe you could run a query, get the average ratings of the offspring for each combination?

    I know you've talked about buying a horse, has your data analysis narrowed down the criteria on what you'd be using to select something?


  • Registered Users, Registered Users 2 Posts: 2,702 ✭✭✭tryfix


    diomed wrote: »
    I have often wondered if it is possible to purchase an average mare, and breed a good horse.
    The idea is to first breed a filly from the average mare, then to breed another horse that might be useful from that filly.
    A test-mating is a theoretical mating on paper (or computer) of a stallion and mare.
    I prepared test-matings on computer of an average horse picked at random, test-mating her with 393 stallions
    Is it possible to take a very average filly like Sea Tobougie and breed a good horse? It seems unlikely.
    But a Japanese breeder, Kihachiro Watanabe, bought Irish three-year-old filly, Saddlers Gal (RPR 52, nine starts, no wins, no places, earnings £0).
    He bred from her El Condor Pasa, who was second in the Prix de l’Arc de Triomphe (8 wins, 3 seconds from 11 starts).

    Just on the moderateness of Saddlers Gal. The only thing moderate about her was her performance on the track, she's a blue blood through and through.

    Saddlers Gal is by the mighty Sadler's Wells, out of the mare Glenveagh (Seattle Slew x Lisadell). Glenveagh is a half sister to Gp1 winners Fatherland (National Stakes) and the mighty Yeats (multiple Gp1 winner), both by Sadler's Wells.

    Sending Glenveagh to Sadler's Wells was a matter of sending her to a stallion that had produced 2 Gp1 winners from that nick. Sending Saddlers Gal to Kingmambo was once again repeating the inbreeding to the supremely influential broodmare Special.

    El Condor Pasa was a vindication of brilliant bloodlines, not the result of some random mating.


  • Closed Accounts Posts: 4,744 ✭✭✭diomed


    It'd be interesting to do a distribution of the various combinations.

    Good mare/good stallion.
    Average mare/good stallion.
    Good mare/average stallion.
    Average mare/average stallion.

    Maybe you could run a query, get the average ratings of the offspring for each combination?

    I know you've talked about buying a horse, has your data analysis narrowed down the criteria on what you'd be using to select something?
    “Good” and “average” are not easy to define. I tend to use numbers. I produced foal averages for stallions. People would be surprised at how little the averages differ, only a few points. And of course we do not know how many foals were culled. We may just be seeing the best on the racecourse. It is a business.

    A while back I grouped mares into 5 rating points bands (e.g. 1-5, 6-10, 11-15 up to 140+). The higher the dam rating, the higher the average foal rating. I confess I made errors and would have to run it again. The link between sire rating and foal rating was similar. Most sires are 120+. The dam is the weak part of the pedigree, and the dam’s dam the weakest.
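
    The banding can be sketched in a few lines of Python. This is not the original program: the function names are invented, the sample pairs are made up purely to show the mechanics, and treating ratings above 140 as the "140+" band is an assumption.

```python
from collections import defaultdict


def band_label(rating, width=5, top=140):
    """Label such as '1-5' or '6-10'. Assumes ratings start at 1.

    Ratings above `top` share one open-ended band (an assumption about
    where the '140+' band begins).
    """
    if rating > top:
        return f"{top}+"
    low = ((rating - 1) // width) * width + 1
    return f"{low}-{low + width - 1}"


def average_foal_rating_by_dam_band(pairs):
    """pairs: iterable of (dam rating, foal rating). Returns {band: average}."""
    totals = defaultdict(lambda: [0, 0])  # band -> [sum of foal ratings, count]
    for dam_rating, foal_rating in pairs:
        entry = totals[band_label(dam_rating)]
        entry[0] += foal_rating
        entry[1] += 1
    return {band: total / n for band, (total, n) in totals.items()}


# Made-up (dam rating, foal rating) pairs, for illustration only.
sample = [(3, 40), (5, 50), (6, 55), (10, 65), (142, 110)]
result = average_foal_rating_by_dam_band(sample)
print(result)
```

    The same grouping would work for sire ratings by passing (sire rating, foal rating) pairs instead.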

    The variation in foal ratings for a sire’s lifetime crop is very large, and most of the variation is, imo, due to good/bad pedigrees rather than the dam rating. By bad pedigree I mean little in common between the horses in the sire and in the dam pedigree, i.e. little or no duplications, or male duplications of a sire only. The concept of good sires, good broodmares does not make sense to me. Good horses are the product of good pedigrees that match the ancestors of sire and dam (others may disagree).

    Ratings used might be suspect. They are a combination of well known ratings, the highest gained by the runner as a 3yo or older. Earlier ratings (1960s, 1970s) seem to have been reviewed and lowered, so if you take them from old books you might get 135 for a horse, and if you see the same horse now it might be a 127. And 2yo ratings (free handicaps) were used when no 3yo+ rating was found. An example here is Fairy Bridge, the dam of Sadler’s Wells. She only ran as a 2yo, was rated 124 (actually stones and pounds) in the Irish Free Handicap, and retired to stud.
    The USA Experimental Free Handicap has an upper limit of 126, which might underrate those horses. The introduction of International Classifications helps even out the ratings of the major racing countries. Some countries may have been a bit optimistic in the past, and their ratings useful in promoting local bloodstock.

    Has my analysis narrowed down my criteria?
    (1) avoid numerous duplications of a sire that produces only male offspring e.g. Northern Dancer, especially if these are the only or the majority of duplications in the pedigree.*
    (2) if buying to breed, buy fillies with horses in their 3rd, 4th or 5th generation that are full siblings of horses in the pedigrees of stallions at stud (or ¾, 7/8 siblings). These are not as common as you might think.
    (3) breed the filly on paper with all sires at stud before you buy her (test-matings).

    * I started to record the sire lines for each of the 393 stallions now at stud, and the sire line of the dams of those 393 stallions. After 22 stallions I stopped to do other work. 17 of those 22 stallions were Northern Dancer sire line, and 12 of their dams were Northern Dancer sire line (9 had both). Careful pedigree planning is needed to avoid this.

    The example I gave of El Condor Pasa should be examined. His dam Saddlers Gal has the full siblings Special and Lisadell (both by Forli out of Thong) in her pedigree 3 x 2, too close to make her a good runner, but this close inbreeding often makes the mare a good producer. The Japanese breeder knew what he was doing. He bred her to the sire Kingmambo, who also has Special in his pedigree.
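
    Finding the horses that a sire and dam have in common is the core of a test-mating. A minimal Python sketch follows, using only the handful of parent links needed for this example (taken from this thread and from widely published pedigree records). The names PARENTS, ancestors and duplications are made up for the illustration; the real program works from the full database and does much more.

```python
# name -> (sire, dam). Only the links needed for this example are listed.
PARENTS = {
    "El Condor Pasa": ("Kingmambo", "Saddlers Gal"),
    "Kingmambo": ("Mr. Prospector", "Miesque"),
    "Miesque": ("Nureyev", "Pasadoble"),
    "Nureyev": ("Northern Dancer", "Special"),
    "Saddlers Gal": ("Sadler's Wells", "Glenveagh"),
    "Sadler's Wells": ("Northern Dancer", "Fairy Bridge"),
    "Fairy Bridge": ("Bold Reason", "Special"),
    "Glenveagh": ("Seattle Slew", "Lisadell"),
    "Special": ("Forli", "Thong"),
    "Lisadell": ("Forli", "Thong"),
}


def ancestors(name, depth):
    """Set of all known ancestors of `name` within `depth` generations."""
    found = set()
    frontier = [name]
    for _ in range(depth):
        next_frontier = []
        for horse in frontier:
            for parent in PARENTS.get(horse, ()):
                found.add(parent)
                next_frontier.append(parent)
        frontier = next_frontier
    return found


def duplications(sire, dam, depth=5):
    """Horses appearing on both sides of a test-mating.

    depth=5 on each parent corresponds to six generations of the foal.
    """
    return sorted(ancestors(sire, depth) & ancestors(dam, depth))


print(duplications("Kingmambo", "Saddlers Gal"))
```

    With these links the test-mating of Kingmambo and Saddlers Gal shows Special on both sides, along with her parents Forli and Thong, and Northern Dancer.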

    The next work is to go beyond the basic 6 generation analysis and prepare something that goes deeper into the pedigree, isolating the features that top males have and top females have (identified visually on screen). The analysis so far was a ready reckoner.


  • Closed Accounts Posts: 4,744 ✭✭✭diomed


    Saddlers Gal sold for 22,000 guineas as a yearling in 1990, the lowest price for a Sadler's Wells yearling that year. Then she ran nine times with a best placing 5th of 6. I can't find her sales price but I think she might have been entered in a Tattersalls mare sale without selling. My guess is she sold to her Japanese owner for a lot less than her original sale price.


  • Closed Accounts Posts: 4,744 ✭✭✭diomed


    Just got knocked out of a tournament on PokerStars (5:20 am). Time for a cup of coffee and a donut.

    Found this on the internet a few minutes ago

    Takashi Watanabe's interest in horseracing is driven by his knowledge of pedigrees which resulted in his breeding of El Condor Pasa, the newest Japanese star to shine in Europe. The owner of a trucking and transportation company in Tokyo, Watanabe was introduced to the sport by his father.
    In 1992, he commissioned an agent to attend the Tattersalls December Sales to purchase the mare Saddlers Gal, a daughter of Sadler's Wells, who had failed to win in nine starts in Ireland.
    The mare was withdrawn from the catalogue, but he was so determined to secure her that the representative, Morio Sakurai, was ordered to track her down.
    ''Mr Watanabe asked me to find her,'' said Sakurai. ''I located her on a farm in Ireland and he told me to buy her.''
    The subsequent mating between Saddlers Gal and the top French miler Kingmambo was to produce El Condor Pasa, who Watanabe named after a Simon and Garfunkel hit.
    The colt was placed with Yoshitaka Ninomiya and last year became the first three-year-old to win the Japan Cup, doing so by two and a half lengths, the widest margin ever.
    Ninomiya, a trainer since 1990, has a team of just 10 horses.
    Approaching 50 years of age, he had previously served as assistant to renowned horseman Teruo Hashimoto for 12 years.
    But he has also gained valuable experience for El Condor Pasa's current international programme during spells with Sir Michael Stoute in Newmarket and with US trainer Bruce Headley in California.


  • Closed Accounts Posts: 4,744 ✭✭✭diomed


    I analyse a number of things in pedigrees, including duplications.
    This is an example of a six generation pedigree with duplicated horses in colour.
    Some of the analysis factors I mention would be what you see here.

    Modern pedigrees are much less inbred, often with three or four highlighted horses. Here over twenty are highlighted.
    The Tetrarch, foaled in Co Kildare, raced in England and was unbeaten.
    Note that there is nothing duplicated in the first three generations. Sales catalogues show three generations.
    A picture is worth a thousand words.

    [Attached image: six generation pedigree chart with duplicated horses highlighted in colour]


  • Closed Accounts Posts: 4,744 ✭✭✭diomed


    This is the pedigree of Mary Sea, the average filly I chose for the test-mating experiment mentioned in previous posts.
    Note the few connections between the pedigrees of her sire Selkirk and dam Mary Astor.

    [Attached image: six generation pedigree chart of Mary Sea]


  • Closed Accounts Posts: 4,744 ✭✭✭diomed


    My plan now is to mix it up a bit and work on the ideas in my ideas file.

    I've always thought that the lower down in a pedigree the weaker the pedigree.
    For example, in the above pedigree of Mary Sea (rated 71) the dam line of Mary Astor (86), Djallybrook (105), Hollybrook (rating not found), La Vagabonde (non runner) was not too weak. This area is often a lot weaker than the rest of the pedigree, full of minor sires.

    I'm going to find out, if possible, the average rating of all the 126 positions (2+4+8+16+32+64) in the six generation pedigrees in my files. I'm not sure what good that is, but if I redo it for different ratings bands e.g. (0-20, 20-40, 40-60, 60-80, 80-100, 100-120, 120+) there might be a lesson.
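
    One way to organise this is sketched below in Python. The position numbering is an assumption, not necessarily how my files are laid out: the runner is position 1, the sire of position n is 2n and the dam is 2n+1, so six generations fill positions 2 to 127. The second pedigree in the example is made up; the Mary Sea ratings are the ones quoted in this thread.

```python
from collections import defaultdict

GENERATIONS = 6


def generation(position):
    """Generation of a position: 2-3 are generation 1, 4-7 generation 2, etc."""
    return position.bit_length() - 1


def ancestor_positions(generations=GENERATIONS):
    """Positions 2..127 for six generations, 126 ancestors in all."""
    return range(2, 2 ** (generations + 1))


def bottom_dam_line(generations=GENERATIONS):
    """Dam, dam's dam, and so on: the last position of each generation."""
    return [2 ** (g + 1) - 1 for g in range(1, generations + 1)]


def average_by_position(pedigrees):
    """Return {position: (average rating, number of rated horses)}.

    Each pedigree maps position -> rating, with None meaning unrated.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for pedigree in pedigrees:
        for position, rating in pedigree.items():
            if rating is not None:
                sums[position] += rating
                counts[position] += 1
    return {p: (sums[p] / counts[p], counts[p]) for p in sorted(sums)}


# Selkirk (2), Mary Astor (3), Djallybrook (7), Hollybrook (15, rating not found).
mary_sea = {2: 129, 3: 86, 7: 105, 15: None}
made_up = {2: 120, 3: 60, 7: None}  # hypothetical second pedigree
averages = average_by_position([mary_sea, made_up])
print(averages)
```

    Running the same function on subsets of runners (one subset per ratings band) would give the banded comparison.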


  • Closed Accounts Posts: 4,744 ✭✭✭diomed


    These are the average ratings for each pedigree position for a large number of pedigrees.
    Shown are average rating & number of rated horses averaged for that position.
    What is amazing to me is that the average for sires in the 3rd, 4th, 5th, 6th generation are all over 130.
    The 1st generation at 124.3, and 2nd generation at 129.6 and 127.6 suggest that many of the current horses will not survive in future pedigrees.
    Of course many of the current horses making up the 78.9 average are geldings or colts that will not be used for breeding.
    Lower counts for dams probably reflect: unraced dams; dams of one foal and rating not found / investigated; imported dams from USA and other.
    The lowest average dam ratings are on the bottom dam line as anticipated.

    [Attached image: pedigree chart showing average rating and number of rated horses for each position]


  • Closed Accounts Posts: 4,744 ✭✭✭diomed


    Numbers in above pedigree chart

    Rated horses: 159,230
    Ancestors: 20,213,071
    (159,230 x 127 = 20,222,210)
    (difference of 9,139 is incomplete pedigrees, non-thoroughbreds)

    Rated ancestors: 10,132,337
    Rated sire ancestors: 9,190,434
    Rated dam ancestors: 941,903
    (Unrated sire ancestors: 836,575)
    (Unrated dam ancestors: 9,244,159)

    Sire & dam ancestors born before 1960: 15,720,422 (77.77%)
    (ratings unlikely before 1960)
    Rated sires before 1960: 7,099,930
    Rated dams before 1960: 61,687
    Rated sires 1960 and after: 2,090,504
    Rated dams 1960 and after: 880,216

    Count of Northern Dancer 203,614 (2.03% of male ancestors)
    Count of Hyperion 174,978 (1.75%)
    Count of Native Dancer 246,062 (2.45%)
    Count of Nasrullah 242,636 (2.42%)
    Count of Princequillo 121,707 (1.21%)
    Count of Sadler’s Wells 18,648 (0.19%)
    Count of Galileo 1,446 (0.01%)

    Count of horses rated 130+: 7,650,747 (number of horses rated 130+: 1,845)
    Count of horses rated 120-129: 1,559,099
    Count of horses rated 110-119: 308,594
    Count of horses rated 100-109: 130,723
    (explanation: horses rated 130+ produced many offspring, and their sons and daughters also produced many offspring)


  • Closed Accounts Posts: 4,744 ✭✭✭diomed


    These are a few more ideas that are work in progress.

    PREPOTENT
    You often hear about prepotent horses (showing great effectiveness in transmitting hereditary characteristics to its offspring).
    The term chef-de-race is used for these horses in the Dosage Index calculation. They are super sires.
    The Dosage Index calculation uses chef-de-race sires in the first four generations of a pedigree.
    Some of these sires have sired winners of great races.
    Is there evidence that their influence lasts many generations? I have not seen the analysis.

    I have a big file of the six generation pedigrees of 159,230 horses.
    The file is about 20 million lines (159,230 x 127).
    The idea I have is to give all the 127 horses in a six generation pedigree the same rating.
    I’ll use the example of Mary Sea (2000) above to explain. Her rating is 71.
    I’ll give her sire Selkirk 71, and her dam Mary Astor 71, and all the other horses in her first six generations 71 for this exercise.
    Selkirk is actually 129 and Mary Astor 86.

    Selkirk will get a different rating for every pedigree he appears in as a sire, whatever generation he is in.
    For Mary Sea he gets 71, for Altieri he gets 122, for Emerging (by Mount Nelson) he gets 90 (Selkirk is in the pedigree of Mount Nelson).
    At the end of the work all the Selkirk ratings are averaged, and Selkirk gets a number.
    The average of all the 159,230 horses is about 80.
    If a sire is “prepotent” his average should pop up above all others like a cork in water.
    His influence will have boosted the rating of every pedigree in which he appears.
    This should work for dams too, although they appear much less in pedigrees as they produce fewer offspring.
    Sires and dams that appear in only a few pedigrees will be filtered out of the results.
    I have about half the work done on this, the difficult part.
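
    The averaging step can be sketched as follows in Python. It is only an illustration under stated assumptions: the function name and the min_appearances cut-off are invented, every appearance of an ancestor is counted (whether a horse duplicated within one pedigree should count once or several times is an open choice), and the three pedigrees are cut down to the names mentioned above.

```python
from collections import defaultdict


def prepotency_scores(pedigrees, min_appearances=2):
    """Average runner rating credited to each ancestor.

    pedigrees: iterable of (runner rating, list of ancestor names).
    Ancestors with fewer than min_appearances are filtered out.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for runner_rating, ancestor_names in pedigrees:
        for name in ancestor_names:
            # Every ancestor is given the runner's own rating.
            sums[name] += runner_rating
            counts[name] += 1
    return {
        name: sums[name] / counts[name]
        for name in sums
        if counts[name] >= min_appearances
    }


# Ratings as quoted above; ancestor lists are cut down to a few names.
pedigrees = [
    (71, ["Selkirk", "Mary Astor"]),      # Mary Sea
    (122, ["Selkirk"]),                   # Altieri
    (90, ["Mount Nelson", "Selkirk"]),    # Emerging
]
scores = prepotency_scores(pedigrees)
print(scores)
```

    On these three pedigrees Selkirk averages (71 + 122 + 90) / 3, about 94.3, while Mary Astor and Mount Nelson are filtered out because they appear only once. The number that matters is how far a sire's average sits above the overall average of about 80.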


    AGE
    In this week’s Irish Field trainers and breeders were asked “would they buy the produce of an old mare”. A variation on this is do you believe birth order is important.

    I could easily produce information on these questions, but I think other factors are ignored in these studies.
    I’ve seen a study where the sample was only a few hundred, and the conclusion was the status quo: earlier birth rank is better; younger broodmares are better.

    My guess is the owner of a new broodmare sends his expensive purchase to the best sire he can afford, and probably repeats this for a few years.
    After a few disappointing foals he loses faith in the mare (and loses his money) and sends her to cheaper stallions, and still cheaper stallions until the end of her productive life.
    Have you ever heard of a mare owner starting her out with the cheapest stallion, and sending her to the champion sire in her old age?

    I am interested not in the age of the sire and dam but in the dates of birth in the pedigree.
    A few numbers might throw some light on this.
    These are the average age of all horses (by generation) in my large data sample.
    There are more horses in the 6th generation so I'll say the average age of all is about 10.5 years.
    I'm not using this average age for anything at present. I just thought people might be interested in it.

    Gen Age
    0
    1 11.25
    2 10.02
    3 10.00
    4 10.11
    5 10.34
    6 10.51
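
    The overall figure can be worked out by weighting each generation by its number of pedigree positions. A short Python sketch, assuming the full 2, 4, 8, 16, 32, 64 positions per generation (the actual counts in the data differ slightly because of incomplete pedigrees):

```python
# Average age by generation, from the table above.
avg_age_by_generation = {1: 11.25, 2: 10.02, 3: 10.00, 4: 10.11, 5: 10.34, 6: 10.51}

# Assumed weights: generation g has 2**g positions in a complete pedigree.
weights = {g: 2 ** g for g in avg_age_by_generation}

overall = sum(age * weights[g] for g, age in avg_age_by_generation.items()) / sum(
    weights.values()
)
print(round(overall, 2))
```

    On that weighting the average works out a little under 10.4 years, slightly below the rough figure of 10.5.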



    The table below needs a bit of explaining.
    These eight horses (A B C D E F G H) are four super sires, three disappointing sires, and one unproven recent sire.
    Five of the sires were excellent on the racecourse, but two of these disappointed at stud.
    One of the super sires was a disappointment on the racecourse.

    The numbers are the dates of birth of horses in the same pedigree positions, in the same generation of their pedigrees.
    The first date of birth (DOB) is in the first quarter of the pedigree, the next in the second quarter (1st and 2nd quarters are in the sire pedigree).
    The last two DOBs are in the dam pedigree of these horses.
    I could have listed all the DOBs in that generation but am using four only for demonstration.

                      A     B     C     D     E     F     G     H
    Sire's sire 1/4   1925  1935  1913  1920  1913  1935  1935  1920
    Sire's dam 1/4    1920  1961  1935  1919  1935  1935  1927  1935
    Dam's sire 1/4    1908  1954  1913  1942  1942  1919  1919  1937
    Dam's dam 1/4     1909  1954  1913  1934  1930  1917  1928  1935


    Comments:
    A) Notice the imbalance between the 1925, 1920 on the sire side and the 1908, 1909 on the dam side. This is like the Leaning Tower of Pisa.
    My untested theory is that this imbalance often acts like an outcrossed pedigree, and might be a reason for the high ability.
    But my idea is this imbalance is a negative when that horse goes to stud.
    Half his pedigree doesn't match the pedigree of his mares because it is too old.
    B) Unproven sire as yet. Excellent runner.
    C) A super runner and super sire. Good DOB balance for three of the four ancestors.
    D) A very good runner, but a major disappointment as a sire.
    The sire side of his pedigree is much older, about twenty years.
    E) A high class runner and super sire. One horse is a good bit older (1913 DOB), but three are about the same age.
    The big worry is when the two on the sire side and the two on the dam side differ greatly.
    F) This was a very good juvenile, but a failure as a sire.
    Again note the large gap between the sire side (1935, 1935) and the dam side (1919, 1917).
    G) A poor runner but a sensational sire.
    H) A good runner, not a top runner, and a very successful sire.

    I haven't given the name of the horses as I don't want to offend people.
    This is just an idea from looking at many pedigrees, and trying to figure out why some very good horses didn't make the grade as sires.
    These sires and runners did not get their class just from a few matching dates of birth.
    My point is unmatched DOBs might be a negative.

    It might be a tricky task to program and test this. How do I extract "bad" DOBs?
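
    One possible starting point, sketched in Python: measure how far apart the sire-side DOBs and the dam-side DOBs are, counting zero when the two ranges touch or overlap. The function name and the 10 year threshold are arbitrary assumptions for the illustration, and only the four DOBs per horse from the table are used; a real test would use every DOB in the generation.

```python
def side_separation(sire_side, dam_side):
    """Years between the DOB ranges of the two sides; 0 if they touch or overlap."""
    return max(
        min(sire_side) - max(dam_side),  # sire side entirely later than dam side
        min(dam_side) - max(sire_side),  # dam side entirely later than sire side
        0,
    )


# (sire-side DOBs, dam-side DOBs) for horses A to H, from the table above.
horses = {
    "A": ((1925, 1920), (1908, 1909)),
    "B": ((1935, 1961), (1954, 1954)),
    "C": ((1913, 1935), (1913, 1913)),
    "D": ((1920, 1919), (1942, 1934)),
    "E": ((1913, 1935), (1942, 1930)),
    "F": ((1935, 1935), (1919, 1917)),
    "G": ((1935, 1927), (1919, 1928)),
    "H": ((1920, 1935), (1937, 1935)),
}

THRESHOLD_YEARS = 10  # arbitrary cut-off, chosen only for this illustration

flagged = [
    name
    for name, (sire_side, dam_side) in horses.items()
    if side_separation(sire_side, dam_side) >= THRESHOLD_YEARS
]
print(flagged)
```

    With this cut-off only A, D and F are flagged, the three horses where the comments above point to a gap between the sire side and the dam side. Eight horses prove nothing, so this is a way to pose the question to the full data, not a result.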


  • Registered Users, Registered Users 2 Posts: 2,484 ✭✭✭Peintre Celebre


    Diomed re a good horse from a bad mare. Hyperion's grand dam won a seller


  • Closed Accounts Posts: 4,744 ✭✭✭diomed


    Diomed re a good horse from a bad mare. Hyperion's grand dam won a seller
    Good horses have run in sellers (and bad ones).

    Virago (1851) ran in a seller as a two year old and lost.
    In John Porter's autobiography he says " I have in my time seen many great fillies, but I regard Virago as perhaps the greatest of them all".
    Virago won the English 1000 Guineas as a 3yo.
    John Porter trained 21 English Classic winners.

    "The filly's participation in the event (a 2yo selling race) was a colossal piece of bluff, the purpose of which was to deceive those whose duty it was to frame the big handicaps of the following spring. .... "entered to be sold for £80, a bit of bunkum ... Day's head man accompanied the filly to the starting-post, ostensibly with a view to ensuring her getting well away ..... Goater appeared to be taken by surprise when the starter dropped his flag, and Virago was "left" a long way behind the others. She of course finished "nowhere" as intended. .... She was not among the first three, though she could have carried eleven stone and won. She could not have been bought for £5,000."

    I have many horses in my data with the runner rated over 100 and its dam in the 20s. Often a dig into the low rating shows a horse who ran well but disappointed many times later and ended the year with a low rating (that I've used) although it ran at a higher rating during the year. The most unusual case was a rating of 3 as a three year old. The horse finished its two year old racing with a 99 rating, but finished last (?) every time as a 3yo and was retired.

    I use very large volumes of data in an attempt to even out the questionable ratings.


  • Registered Users, Registered Users 2 Posts: 8,609 ✭✭✭Mooooo


    Totally out of my field here, but in dairy breeding genomics is really taking hold. Would each foal not be genomically tested before sale, and is there an index on which those figures are judged in terms of performance and heritable traits? Ireland has EBI, the economic breeding index, for cows, which has subindexes given different weightings for milk, fertility, etc. NZ has something similar with different weightings called the BW, breeding worth. I dunno what data is available for horses in terms of performance, accuracy etc, but is there not something similar for horses, and if so how does it fit in with your figures?

