Analysing pedigrees

  • 28-08-2016 5:53pm
    #1
    Closed Accounts Posts: 4,744 ✭✭✭diomed


    My interest is thoroughbred flat pedigrees. I don’t own a horse or work in the business. Occasionally I have a bet.
    It would be nice to know before a horse is born or before it reaches the racecourse how good it might be. Crazy stuff?
    But people who breed horses are guessing. They breed to what is or may be selling/popular or what they think will perform on the racecourse.

    My pedigree database has almost 400,000 horses input by me over more than twenty years. Other databases contain horse performance data. Is it possible to compare horse pedigrees with performance data and learn? About a month ago I finished plugging holes in my data. The next move was writing programs to compare pedigree with performance. I’ve done a bit of that in the past. I can analyse a pedigree by sight (I think) by looking at six generations and up to twelve generations. Six generations is 126 ancestors while twelve is 8190 (2,4,8,16,32,64 … ).
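    For anyone checking the ancestor counts quoted above, a minimal Python sketch (the function name is only for illustration and is not part of the program described in this post):

```python
def ancestors_in(generations: int) -> int:
    """Total ancestors over the given number of generations: 2 + 4 + ... + 2**n."""
    return sum(2 ** g for g in range(1, generations + 1))


print(ancestors_in(6))   # 126
print(ancestors_in(12))  # 8190
```

    Each extra generation roughly doubles the total, which is why twelve generations is so much bigger than six.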

    There was a thread on the Betfair forum last week looking for people who might be interested in a syndicate to buy a yearling at the Tattersalls September Sales. I said I would join, others did too, but it was the usual result. No action. While waiting for people to join the syndicate I wrote a computer program to analyse the 533 Tattersalls yearling lots. It took eight days to write the program and knock out the bugs. The program compares horses in the pedigree with each other.
    Just for comparison a few extras were added to the analysis pot: 533 Tattersalls yearlings; 190 Ireland and England 2015 Group 1, 2, 3 winners, a few great horses.
    A second analysis run looked at about 200 offspring of one sire (I had ratings for these).
    The results are interesting. Some refining of the output is needed. More horses analysed will produce more output and I hope better insight.

    I did not anticipate one of the problems. It takes almost one minute to analyse one horse. That might seem quick. I want to analyse 200,000 horses (200,000 minutes = 3333 hours = 139 days). Starting now the results file will be ready on 13th January assuming my PC could keep running 24/7 until then.
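    The run-time estimate above is simple arithmetic. A small sketch, assuming a flat 60 seconds per horse (the post says "almost one minute") and no pauses:

```python
HORSES = 200_000
SECONDS_PER_HORSE = 60  # assumption: "almost one minute", rounded up


def run_time(horses: int, seconds_each: float) -> tuple:
    """Return (hours, days) needed to process every horse one after another."""
    total_seconds = horses * seconds_each
    hours = total_seconds / 3600
    days = hours / 24
    return hours, days


hours, days = run_time(HORSES, SECONDS_PER_HORSE)
print(f"{hours:.0f} hours, about {days:.0f} days")
```

    Changing SECONDS_PER_HORSE shows how much a given speed-up is worth over the whole run.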
    This afternoon I made my first attempt to speed things. I’m reducing the size of the data files by reducing the size of each horse [name + date of birth]. Angrywhitepyjamas (2013) by Manduro (2002) out of Ornellaia (2000) becomes AVEZ LXXP OKSM. This reduces the data to below 1/7 of the original size. I’m hoping this speeds processing. Sprinkling magic powder on the code [AVEZ] will restore that name in the results file.

    I know horses may not run to their pedigree. There is a lot of randomness. Full siblings can be quite different. That is why I will use a very large data sample to try to tune out the randomness.
    Benefits might be: select the best in a race field / annual crop; pick the best lots in a sale; choose the best stallion for a broodmare; test if a new sire will suit his broodmare population.

    What do you think? Any ideas, suggestions, insights?
    (answers should not include the words “box of frogs”.)



Comments

  • Registered Users Posts: 2,702 ✭✭✭tryfix


    You're operating at a rarefied level there, Diomed. I don't pretend to understand your process, but maybe you should do this in stages: work on the last few years of runners and their pedigrees first, then add in the older pedigrees and refine your results over time. There's only so much use in going back over older pedigrees, and it's quite possible that they'd mislead you, as racing patterns and training methods have evolved quite a bit.


  • Closed Accounts Posts: 4,744 ✭✭✭diomed


    The application of numbers to performance is new, perhaps from the 1960s onwards. I’m analysing current horses using their ancestors.
    Obviously if I analyse a horse born in 1990 its 5th generation ancestors could be born in around 1930 (5 x 12 years = 60 years. 1990-60 years = 1930).

    Some people say only the sire and dam matter. If that was true should brothers be exactly alike, and sisters exactly alike i.e. identical twins?
    It might be best to breed to the ancestors, not just the parents.

    The idea is to take each ancestor in a pedigree and compare it with every other ancestor.
    Extract those matches that have something of interest, count them and give them a value.
    List the counts and values in an output file.
    Then load the next horse for analysis.
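    The four steps above can be sketched in Python. This is only an illustration: the post does not say what counts as a match, so the rule used here (the same ancestor appearing in two places) and the value given to it are assumptions, as are all the names in the toy pedigree.

```python
from itertools import combinations


def pair_count(n_ancestors: int) -> int:
    """Number of distinct pairs when every ancestor is compared with every other."""
    return n_ancestors * (n_ancestors - 1) // 2


def find_matches(pedigree: dict) -> list:
    """pedigree maps a position ('S' = sire, 'SD' = sire's dam, ...) to a name."""
    matches = []
    for (pos_a, name_a), (pos_b, name_b) in combinations(pedigree.items(), 2):
        if name_a == name_b:  # assumed rule: the same horse appears twice
            # Assumed value: closer duplicates (shorter positions) score higher.
            value = 1 / (len(pos_a) + len(pos_b))
            matches.append((name_a, pos_a, pos_b, value))
    return matches


# Hypothetical two-generation pedigree with one duplicated grandsire.
toy = {
    "S": "Sire A", "D": "Dam B",
    "SS": "Grandsire X", "SD": "Mare P",
    "DS": "Grandsire X", "DD": "Mare Q",
}
matches = find_matches(toy)
print(matches)
print(pair_count(126), pair_count(8190))
```

    If every ancestor really is compared with every other, the number of pairs grows with the square of the pedigree size, which would go some way to explaining long run times.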

    Later I go to the output file and split each horse a few ways: number and value of matches; by male/female; by generation; sire side/dam side.
    More matches might be best, but some results suggest balance between matches is better i.e. an excess of one type is a negative.
    I can’t be too firm with opinions yet as I need to analyse in volume first.

    Using code for names is an effort to increase processing speed. Like taking speed off a horse’s back I doubt there will be a straight line benefit.
    The first horse in the database is AAAA, then AAAB, AAAC and so on.
    A 26x26x26x26 = 456,976 set of 4-letter codes is used as substitutes for real names (more than enough to cover my 390,000-horse pedigree database).
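    The coding scheme is base-26 counting with letters. A minimal sketch, assuming record numbers start at zero (so record 0 is AAAA); the function names are made up for illustration:

```python
import string

LETTERS = string.ascii_uppercase  # 'A' to 'Z'
CODE_LENGTH = 4


def to_code(index: int) -> str:
    """Map a zero-based record number to a four-letter code (0 -> 'AAAA')."""
    if not 0 <= index < 26 ** CODE_LENGTH:
        raise ValueError("index out of range for a 4-letter code")
    chars = []
    for _ in range(CODE_LENGTH):
        index, remainder = divmod(index, 26)
        chars.append(LETTERS[remainder])
    return "".join(reversed(chars))


def from_code(code: str) -> int:
    """Inverse of to_code: turn a four-letter code back into its record number."""
    index = 0
    for ch in code:
        index = index * 26 + LETTERS.index(ch)
    return index


print(to_code(0), to_code(1), to_code(2))  # AAAA AAAB AAAC
print(26 ** CODE_LENGTH)                   # 456976
```

    Restoring a real name in the results file is then a lookup from the record number returned by from_code.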

    One comforting thing is one of the very best horses ever is among the top rated, and one of the current top horses is also in the top few.

    One collection of filters reduces the 894 analysed horses to only 27 horses.

    The Tattersalls September sales yearlings (533) reduce to 19.
    The other (361) reduce to 8 as follows:
    2015 Group winners Ireland and England (188) reduce to 6 (includes one smasher).
    Famous horses (10) reduce to 2 (actually 1 of the 10 was a famous dud)
    Sons of one sire (163) reduce to 0.

    This "collection of filters" was done on the fly this morning and was just a guess. More output and thought will refine the choices.


  • Posts: 0 Lincoln Ashy Cub


    diomed wrote: »
    Some people say only the sire and dam matter. If that was true should brothers be exactly alike, and sisters exactly alike i.e. identical twins?
    It might be best to breed to the ancestors, not just the parents.

    Sounds like an interesting project. If way over my head.

    Surely however the sire and the dam ARE all that matters for any given horse? That's where all the genetic material comes from. Of course ancestries can make one sire and one dam a particularly good match but that's a different matter.

    Your argument about full siblings being identical doesn't make any sense - if they share a sire and dam then they are going to share ancestry all the way back?

    The fact that they are not identical is just biology, in each case the genetic material is mixed differently, that is why breeding isn't an exact science.

    Good luck anyway! I guess the next step is to evaluate a crop of yearlings and make some sort of performance prediction and see if it works?


  • Banned (with Prison Access) Posts: 1,279 ✭✭✭kidneyfan


    I don't think that you should necessarily reduce the data files. Have you looked at indexing the database? 400,000 records isn't many.
    One thing I would suggest might be to apply the standard dosage method to your dataset and contrast your own method with that. etc
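    To show what indexing means in practice, here is a small sketch using SQLite from Python. SQLite, the table layout and the column names are all assumptions chosen for illustration; they are not taken from the database being discussed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE horse (id INTEGER PRIMARY KEY, name TEXT, yob INTEGER, "
    "sire_id INTEGER, dam_id INTEGER)"
)
conn.executemany(
    "INSERT INTO horse VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Sire A", 2002, None, None),
        (2, "Dam B", 2000, None, None),
        (3, "Foal C", 2013, 1, 2),
        (4, "Foal D", 2014, 1, 2),
    ],
)

# Indexes on the parent columns let "all offspring of X" avoid a full table scan.
conn.execute("CREATE INDEX idx_horse_sire ON horse (sire_id)")
conn.execute("CREATE INDEX idx_horse_dam ON horse (dam_id)")

offspring = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM horse WHERE sire_id = ? ORDER BY yob", (1,)
    )
]
print(offspring)

# Ask SQLite how it intends to run the lookup.
for row in conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM horse WHERE sire_id = ?", (1,)
):
    print(row)
```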


  • Registered Users Posts: 2,702 ✭✭✭tryfix


    On the question of full brothers being exactly alike, that's the supposedly logical outcome of any attempt to define racehorse potential purely on pedigree analysis. Of course the reality of the abilities displayed by full brothers is something else, and in that conundrum lie some useful pointers.

    From my own observations I can see a roughly 2 furlong spread either side between the trip that a horse should be optimum at and what it actually is optimum at in the highest class racing. So Frankel the absolute freak that he was could pull the Galileo x Kind cross down to 8f optimum while his brother Noble Mission was optimum at 10f and could manage 12f just a little less well.

    That natural individual variation from what should be for the optimum trip based on pedigree would explain the differences between full brothers.

    It's great that you're doing your own dosage index because the official one is quite poorly thought out. The international differences between the speed and stamina tests that the US, Australia and Europe provide make it next to impossible to accurately produce a dosage figure for how much stamina is in a line which hasn't been tested for stamina at the top level for a few generations.

    I'd be interested to know what your dosage index is predicting as the staying distance for some of the Frankels that are hitting the track, because the official DI score of many of these horses is 12f+ territory but we're not seeing any great staying-on performances from his stock so far.


  • Closed Accounts Posts: 4,744 ✭✭✭diomed


    Lincoln Ashy Cub wrote: »
    Surely however the sire and the dam ARE all that matters for any given horse?

    Foal A has two parents rated 110 and four grandparents rated 90.
    Foal B has two parents rated 110 and four grandparents rated 130.
    Which foal would you pick assuming both foals look the same?

    I much prefer a foal with good ancestors instead of a foal with good parents.
    Good parents are an indicator that the foal may be good, but there are many examples of top horses that were no good at stud: Reference Point; Brigadier Gerard.
    A good horse is imo a product of a sire and dam that have ancestors that produce a pleasing result.
    Someone asked me on boards if a stallion (he named the stallion) was a good match for his mare.
    This stallion was a very good runner (rated 130+).
    When I checked his full crop (almost 200 rated horses) I found that the majority of his foals were rated significantly lower than the dams that produced them.
    The reason was obvious to me when I looked at the pedigree of each foal in turn. His ancestors were incompatible with the broodmare population he was serving.

    Lincoln Ashy Cub wrote: »
    Your argument about full siblings being identical doesn't make any sense - if they share a sire and dam then they are going to share ancestry all the way back?

    They aren’t identical. I said “should they be alike?” They are similar but different. They share some genetic material. If you look at human brothers they are different.
    Identical pedigrees do not make identical horses. However, closely related horses in a pedigree increase the chance that their superior (hopefully) traits are passed forward.
    Frankel's stud fee is UK£125k. His full brother Noble Mission's fee is $25k (£19.1k). That is a big gamble that Frankel passes on the racing ability.

    Lincoln Ashy Cub wrote: »
    Good luck anyway! I guess the next step is to evaluate a crop of yearlings and make some sort of performance prediction and see if it works?

    I have plenty of data to check the past.
    It is easier to check twenty years of past data over the winter than to wait for next year's runners to complete their season.
    If I can find reasonable cause and effect then I’ll look forward.

    Forgive me for multiquoting (one of my pet hates on boards.ie)


  • Closed Accounts Posts: 4,744 ✭✭✭diomed


    kidneyfan wrote: »
    I don't think that you should necessarily reduce the data files. Have you looked at indexing the database? 400,000 records isn't many.
    One thing I would suggest might be to apply the standard dosage method to your dataset and contrast your own method with that. etc
    I use indexing extensively. What "slows" the program is the amount of comparison (I'm not saying what with what).
    It does the comparisons for 894 horses, stepping down through the load file one at a time.

    Do you mean Dosage Index? I wrote a program for that in 1993. It is not worth considering. See elsewhere on boards for my comments.


  • Closed Accounts Posts: 4,744 ✭✭✭diomed


    tryfix wrote: »
    I'd be interested to know what your dosage index is predicting as the staying distance for some of the Frankel's that are hitting the track, because the official DI score of many of these horses is 12f+ territory but we're not seeing any great staying on performances from his stock so far.
    My attempt is guessing the quality of a horse, not the likely staying distance.

    For stamina estimation you should check the Equinome C.C, C.T, T.T classification system.
    You can only be sure of a foal's stamina (or lack of it) if both parents are C.C (the foal will be a C.C), or if both parents are T.T (the foal will be a T.T), or if one parent is a C.C and the other parent is a T.T (the foal is a C.T).
    If one parent is a C.C and the other a C.T then the foal can be a C.C or a C.T.
    If one parent is a C.T and the other a C.T then the foal can be a C.C, a C.T, or a T.T.
    C.C = sprinter, T.T = stayer
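    The rules above are ordinary single-gene inheritance: each parent passes on one of its two copies. A short sketch that lists the possible foal types for any pairing (genotypes are written "CC", "CT", "TT" instead of C.C, C.T, T.T; the function name is invented for illustration):

```python
from itertools import product


def offspring_genotypes(parent_a: str, parent_b: str) -> set:
    """Every genotype a foal could have, taking one allele from each parent."""
    return {"".join(sorted(a + b)) for a, b in product(parent_a, parent_b)}


for pair in [("CC", "CC"), ("TT", "TT"), ("CC", "TT"), ("CC", "CT"), ("CT", "CT")]:
    print(pair, sorted(offspring_genotypes(*pair)))
```

    Only the first three pairings give a single possible outcome, which is the point made above about when you can be sure.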

    From reading past research papers by Dr Emmeline Hill it seems that all C.C horses are descended from one mare about 300 years ago (the Crab mare?), while there were eleven (?) variations of T.T.
    The paper said that made sense when you remember that races were four mile heats between two horses, and the two often ran a few four mile heats on the same day. Staying was the game back in the day.


  • Closed Accounts Posts: 4,744 ✭✭✭diomed


    There wasn't much improvement from reducing the data field size, about 10% to 53 seconds a record.
    I'm not even sure there was an improvement as I didn't record the timings from earlier.
    I might go for a cut down program that looks at less of the pedigree, identify likely candidates, then run the full version on those.

    The other option is to do what they do on the cycling forum. If someone gets a puncture people recommend a new bike. Perhaps a new PC?


  • Banned (with Prison Access) Posts: 1,279 ✭✭✭kidneyfan


    diomed wrote: »
    The other option is to do what they do on the cycling forum. If someone gets a puncture people recommend a new bike. Perhaps a new PC?
    Perhaps a new database? What are you using currently ? if Access, Filemaker, Foxpro? If any of the above think about changing.


  • Registered Users Posts: 261 ✭✭show me the money.1


    Amazing dedication I call it


  • Closed Accounts Posts: 4,744 ✭✭✭diomed


    kidneyfan wrote: »
    Perhaps a new database? What are you using currently ? if Access, Filemaker, Foxpro? If any of the above think about changing.
    I'm not enthusiastic about the Microsoft model, a yearly charge for operating system and applications, or going that way.

    I use Microsoft FoxPro, and support stopped for that in 2015.
    The commercial pedigree software package I use to input pedigrees uses .dbf files. Their pedigree software moved to Access files about ten years back. At that time I was using Access in work and found it a bit weak so didn't move. The problem with (their) pedigree software was poor data, and they like to update your data with their data. I have 800 books - stud books, pedigree analysis, form books, so can look up things.

    I fix problems in my data by running small programs in FoxPro, e.g. a female as a sire, an incomplete pedigree (e.g. fewer than 31 entries in 4 generations, 1+2+4+8+16), missing Bruce Lowe family numbers, duplicate entries (two slightly different names, with same sire and dam).
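    Two of the checks listed above, sketched in Python instead of FoxPro. The record layout, the sample horses and the 0.9 similarity threshold are all assumptions made up for the example.

```python
from difflib import SequenceMatcher

# Hypothetical records: parents are referenced by name, None means unknown.
horses = [
    {"name": "Example Sire", "sex": "m", "sire": None, "dam": None},
    {"name": "Example Dam", "sex": "f", "sire": None, "dam": None},
    {"name": "Odd Mare", "sex": "f", "sire": None, "dam": None},
    {"name": "St Example", "sex": "m", "sire": "Example Sire", "dam": "Example Dam"},
    {"name": "St. Example", "sex": "m", "sire": "Example Sire", "dam": "Example Dam"},
    {"name": "Bad Foal", "sex": "f", "sire": "Odd Mare", "dam": "Example Dam"},
]


def female_sires(records):
    """Names recorded as a sire although the horse itself is entered as female."""
    sex_of = {h["name"]: h["sex"] for h in records}
    return sorted(
        {h["sire"] for h in records if h["sire"] and sex_of.get(h["sire"]) == "f"}
    )


def likely_duplicates(records, threshold=0.9):
    """Pairs with the same known sire and dam and nearly identical names."""
    found = []
    for i, a in enumerate(records):
        for b in records[i + 1:]:
            same_parents = (
                a["sire"] is not None
                and a["dam"] is not None
                and (a["sire"], a["dam"]) == (b["sire"], b["dam"])
            )
            if not same_parents:
                continue
            ratio = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
            if ratio >= threshold:
                found.append((a["name"], b["name"]))
    return found


print(female_sires(horses))
print(likely_duplicates(horses))
```

    Genuine full siblings also share a sire and dam, so the name-similarity threshold is what separates them from true duplicates and would need tuning on real data.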

    My PC is about seven years old, a gaming type PC. Last night I tested it on a website that ranks PCs. It was in about the 45th percentile for CPU which was ok, but only the 6th for graphics. Graphics are not important. Perhaps I need something with a faster CPU, caches, faster transfers.

    A bit of an update.
    Overnight I let the PC work on about 250 pedigrees for horses born in 1999 that had ratings in my files.
    Yesterday I did a lot of work on the results file, setting up a name field (the horse) and 28 results fields for that horse, and automating.
    The results file groups the numbers in a few ways: pedigree top half/bottom half; male/female; which generation; and fourteen fields of how complex the link between horses.

    Earlier I thought the numbers looked good.
    When I later looked at the numbers of the 250 sample I was disappointed. I used Pearson in Excel for a quick test on the many output fields.

    Overall it was a bit positive - the higher my number the higher the rating. This morning I looked at it again and decided to split the 250 horses into fillies, colts, geldings.
    (The categorisation of males in my database colts/geldings can be a bit off. A colt can become a gelding.)

    Anyway the 78 fillies came out with a much stronger positive relationship between my numbers and their ratings, the colts mildly positive, and the geldings slightly negative.
    This might be due to the small sample size. It might be that only promising females run, less promising females have a breeding career, and poor male runners are gelded.
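    The per-group check described above can be reproduced outside Excel. A minimal sketch with made-up numbers (the scores and ratings below are invented purely to show the mechanics and are not results from this analysis):

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# (sex, pedigree score, rating) -- invented illustration data only.
rows = [
    ("filly", 1, 70), ("filly", 2, 80), ("filly", 3, 90),
    ("colt", 1, 70), ("colt", 2, 90), ("colt", 3, 80),
    ("gelding", 1, 90), ("gelding", 2, 80), ("gelding", 3, 70),
]


def correlation_by_group(records):
    """Split the records by sex and compute Pearson's r within each group."""
    groups = {}
    for sex, score, rating in records:
        scores, ratings = groups.setdefault(sex, ([], []))
        scores.append(score)
        ratings.append(rating)
    return {sex: pearson(xs, ys) for sex, (xs, ys) in groups.items()}


result = correlation_by_group(rows)
print(result)
```

    With only a handful of horses per group the coefficient swings wildly, which is the small-sample caveat already raised above.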

    There are other possibilities:
    fillies showing poor ability are culled before they reach the sales/racecourse
    males with both poor and good pedigrees are sent to the sales as they are more saleable (59% of the 1999 dob sample were male, colts and geldings)

    I need to think about the careers of horses:
    This one year example gives an idea of the data.
    The file has 7826 horses (m,g,f) with a 1999 date of birth (dob)
    4640 females (2327 have no rating, presumably used only for breeding)
    1108 males (183 have no rating)
    2078 geldings (1 has no rating, is he an input error?, he has no dam, and is not in the books)

    Careers: females ran 2313 (29.6%); females no rating 2327 (29.7%); males ran 925 (11.8%); males no rating 183 (2.3%); geldings ran 2077 (26.5%); geldings no rating 1 (0.0%) = 7826 (100.0%)
    It might be an idea to compare the racing females with the breeding females.
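    The career split can be recomputed from the counts given above. A small sketch (the dictionary keys are just labels):

```python
counts = {
    "females ran": 2313,
    "females no rating": 2327,
    "males ran": 925,
    "males no rating": 183,
    "geldings ran": 2077,
    "geldings no rating": 1,
}

total = sum(counts.values())
# Share of the whole 1999 crop, to one decimal place.
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

print(total)
for label, pct in shares.items():
    print(f"{label}: {pct}%")
```

    Because each share is rounded separately, the printed percentages need not add up to exactly 100.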

    A slightly worrying result in the small sample size was the further back in the pedigree the better the relationship between my numbers and the ratings. I hope a bigger test will disprove this. I don't want to analyse more of the pedigree of each horse. It takes 53 seconds each horse now. Going back another generation doubles the data, and doubles the time.

    I have a file of ideas and I'll be working on those too.


  • Moderators, Sports Moderators Posts: 9,338 Mod ✭✭✭✭convert


    You've really undertaken a huge project there, diomed. The amount of detail acquired and entered is just frightening, yet the details must be fascinating.

    Have you had a look at Dr Emmeline Hill's research on equine genomics? It's focusing more on the gene side of things, but it would be interesting to see if just looking at your info can provide similar info.

    http://www.the-scientist.com/?articles.view/articleNo/31893/title/Emmeline-Hill--Genes-for-Speed/

    http://www.ucd.ie/agfood/staff/animalcropscience/academic/dremmelinehill/


  • Registered Users Posts: 16,352 ✭✭✭✭Francie Barrett


    About 3/4 of the horses in the Tattersalls catalogue would likely be out of my league. I think the below two might potentially be interesting at a reasonable price.

    http://www.tattersalls.com/cat/october/2016/283.pdf
    http://www.tattersalls.com/cat/october/2016/279.pdf

    Diomed - can you tell me what your computer says about the Intello progeny? It is potentially an interesting sire.


  • Banned (with Prison Access) Posts: 1,279 ✭✭✭kidneyfan


    Hi Diomed,

    400,000 records is significant but not huge. I would seriously consider a database like MySQL or PostgreSQL. There is a free edition of Microsoft SQL Server available.

    It might be worth your while to look at an application like R.

    I would suggest that you post this question to the software development forum or database design forum (if there is one).

    How much RAM is on your PC?



  • Closed Accounts Posts: 4,744 ✭✭✭diomed


    convert wrote: »
    Have you had a look at Dr Emmeline Hill's research on equine genomics?
    I am familiar with her work and the work of Prof Patrick Cunningham.
    Years ago Prof Cunningham analysed all horses in a few editions of the General Stud Book comparing their ratings with their pedigrees and found 1/3 of performance could be explained by pedigree.
    I read in the Bloodstock Breeders Review annuals (30 years ago?) about the Irish Thoroughbred Breeders Association (?) funding the research.

    Her work on the speed gene and the quality test is very useful to owners for racing now, and probably of great benefit in picking mares to use for breeding.
    That research will produce more discoveries in time.
    I wonder if the lots on offer at horse auctions in the future will be the genetically proven duds.

    I don't have a blood sample for each of 200,000 horses, living and dead, and the €500 fee each to get them tested.
    My results (if I get any results) will be guesswork. Her results are facts.

    Francie Barrett wrote: »
    About 3/4 of the horses in the Tattersalls catalogue would likely be out of my league. I think the below two might potentially be interesting at a reasonable price.
    Can you tell me what your computer says about the Intello progeny? It is potentially an interesting sire.
    For the past week or so I've been gathering more data on about 180k horses so I won't be stopping this work.
    I never have an opinion on one sire, only an opinion on how his pedigree works with the mare population to which he is bred.
    To expand on that, I wouldn't be a fan of the Frankel stud fee when you can get the same ancestors with his full brother Noble Mission.
    I've started a re-write of the analysis I did a few weeks back. I'll use a 12 generation pedigree instead of a 6 generation. That looks like slower work (12 v 6) but I'll do less checking.

    kidneyfan wrote: »
    400,000 records is significant but not huge. I would seriously consider a database like MySQL or PostgreSQL. There is a free edition of Microsoft SQL Server available.
    It might be worth your while to look at an application like R.
    I would suggest that you post this question to the software development forum or database design forum (if there is one).
    How much RAM is on your PC?
    I'll speed things by re-writing to reduce the checking, and by re-using some of the work
    e.g. for the 1,650 Sadler's Wells offspring in my database I'll just generate the SW pedigree once and the dam pedigrees of the 1,650 foals instead of each pedigree fully each time.
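    Re-using the sire's pedigree across his foals is a form of caching. A minimal sketch of the idea, assuming pedigrees are stored as a horse-to-parents lookup (the data and names below are hypothetical and this is not the FoxPro program itself):

```python
from functools import lru_cache

# Hypothetical toy data: horse -> (sire, dam).
PARENTS = {
    "Foal 1": ("Big Sire", "Mare 1"),
    "Foal 2": ("Big Sire", "Mare 2"),
    "Big Sire": ("Grandsire", "Granddam"),
    "Mare 1": ("Sire X", "Dam X"),
    "Mare 2": ("Sire Y", "Dam Y"),
}


@lru_cache(maxsize=None)
def ancestors(horse: str, generations: int) -> tuple:
    """Sire and dam first, then the sire's ancestors, then the dam's."""
    if generations == 0 or horse not in PARENTS:
        return ()
    sire, dam = PARENTS[horse]
    return (
        (sire, dam)
        + ancestors(sire, generations - 1)
        + ancestors(dam, generations - 1)
    )


first = ancestors("Foal 1", 2)
second = ancestors("Foal 2", 2)
info = ancestors.cache_info()
print(first)
print(second)
print(info.hits)  # the shared sire's pedigree was looked up from the cache
```

    With many foals by one sire, the sire's half of each pedigree is built once and only the dam's half is new work.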

    I assume you mean ... R version 3.2.3 (2015-12-10) The R Foundation for Statistical Computing ...
    No, I have it but have not used it yet. Data collection takes up most of my time.


  • Banned (with Prison Access) Posts: 1,279 ✭✭✭kidneyfan


    diomed wrote: »
    I assume you mean ... R version 3.2.3 (2015-12-10) The R Foundation for Statistical Computing ...
    No I have it but have not used it yet. Data collection takes up most of my time.
    I have 3.2.2 but obviously along the same lines. I use RStudio. I still don't understand why you are having these difficulties.


  • Closed Accounts Posts: 4,744 ✭✭✭diomed


    kidneyfan wrote: »
    I have 3.2.2 but obviously along the same lines. I use R Studio. I still don't understand why you are having these difficulties.
    I'm not having any difficulties.


  • Registered Users Posts: 48 RoyalAcademy2


    This project is doomed to failure I'm afraid.

    Coolmore have the best current band of broodmares, mostly dating back to when American blood was reimported to Ireland in the 1970s.

    Godolphin/Shadwell have the best mares money can buy.

    The traditional breeders like the Aga and Ballymacoll have their own lines.

    The new kids on the block will pay outrageous prices for blood (and they will be forced to!) - the Wildenstein draft at Goffs in November will be fireworks I expect.

    1. When did Coolmore last produce a real champion?
    2. Why can't Godolphin breed a stakes horse any more?
    3. What's the Aga's strike-rate for G1 success? One foal in fifty/hundred I expect
    4. How long will the Qataris remain interested? Is Group 1 racing more about the "season" in London than any horse race?

    All the above will get their winners/champions but they will fail miserably with most others.

    For any anoraks here, take a look at the average failed Weld/Abdullah runner - he gets large numbers of them - Prendergast/Sh Hamdan, Bolger home-breds and any one of hundreds of failed Sheikh Mo horses. It is simply a numbers game for which you need dozens of "well bred" mares with a history AKA pedigree AKA money pit.

    The sub-theme of this admirable thread is, presumably, to find value in the sales ring. I would rather be a seller than buyer!


  • Closed Accounts Posts: 4,744 ✭✭✭diomed


    The sub-theme of this admirable thread is, presumably, to find value in the sales ring. I would rather be a seller than buyer!
    I don't have any broodmares and will not be buying.
    The aim is to analyse horses before they race, or when they have raced little, and to have an opinion on every runner.
    I'll admit I did analyse a sales catalogue, but that was just a useful exercise in getting something together to analyse large numbers at once. My ideas file has about a dozen ideas that I'll be looking at over the winter, and testing against large results files.


  • Banned (with Prison Access) Posts: 1,279 ✭✭✭kidneyfan


    How bizarre to describe such a project as doomed to failure.

    Diomed, it occurs to me that databases work best when they work with sets rather than trying to work procedurally. I am not that familiar with FoxPro, but could you rework some of the queries in a more set-based manner?
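
    To illustrate what set-based means here, a rough sketch using Python's built-in `sqlite3` module rather than FoxPro. The table and column names are invented for the example. Instead of looping over horses and looking up each sire's sire one at a time, a single self-join returns the paternal grandsire of every horse at once.

```python
import sqlite3


def paternal_grandsires(rows):
    """rows: iterable of (name, sire, dam). Returns {horse: paternal grandsire}."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE horse (name TEXT PRIMARY KEY, sire TEXT, dam TEXT)")
    con.executemany("INSERT INTO horse VALUES (?, ?, ?)", rows)
    # One self-join handles every horse in a single pass; no per-horse loop.
    result = con.execute(
        "SELECT foal.name, father.sire "
        "FROM horse AS foal JOIN horse AS father ON father.name = foal.sire "
        "WHERE father.sire IS NOT NULL"
    ).fetchall()
    con.close()
    return dict(result)


# Small illustrative sample; a real table would hold the whole database.
rows = [
    ("Sadler's Wells", "Northern Dancer", "Fairy Bridge"),
    ("Fairy King", "Northern Dancer", "Fairy Bridge"),
    ("Northern Dancer", "Nearctic", "Natalma"),
    ("Galileo", "Sadler's Wells", "Urban Sea"),
]
grandsires = paternal_grandsires(rows)
```

    Horses whose sire is not in the table (Northern Dancer in this sample) drop out of the join, which is usually what you want when the pedigree runs out.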


  • Registered Users Posts: 48 RoyalAcademy2


    I am suggesting that the results will provide generalisations: nicks that will be already known to the teams of experts behind the conglomerations.

    I don't know why diomed would almost apologise for reviewing a catalogue because, as he says, he's neither an owner nor a breeder, therefore this is the logical starting point.

    His reference to the Betfair thread is also extremely naive and I cannot see the point of the exercise - and the Herculean task - unless it has some ultimate reward.

    My point is not to deride but to suggest the results will be predictable yet unusable in any useful, commercial sense


  • Closed Accounts Posts: 4,744 ✭✭✭diomed


    RoyalAcademy2
    Thanks for your review of the results.


  • Registered Users Posts: 48 RoyalAcademy2


    Diomed, your comment suggests I am missing the point and, if I am, I apologise.

    Address one question please: if two full brothers can be Pegasus on the one hand and noddy on the other how can your data predict between one and the other?

    Sailers wells and fairy King came from the same dam yet fairy King was a very good stallion despite no track success.

    Flat racing suggests the dam must have pedigree to have a decent chance of producing a winner whereas it's often complete pot luck over jumps.


  • Registered Users Posts: 2,702 ✭✭✭tryfix


    Diomed, your comment suggests I am missing the point and, if I am, I apologise.

    Address one question please: if two full brothers can be Pegasus on the one hand and noddy on the other how can your data predict between one and the other?

    Sailers wells and fairy King came from the same dam yet fairy King was a very good stallion despite no track success.

    Flat racing suggests the dam must have pedigree to have a decent chance of producing a winner whereas it's often complete pot luck over jumps.

    The purpose of his data mining is to spot pedigrees that are likely to produce high class horses. In the case of Sadler's Wells and his full brother Fairy King, the fact that Fairy King wasn't up to much on the track doesn't mean that the (Northern Dancer x Fairy Bridge) mating wasn't a super nick for producing quality racehorses. The subsequent successful stud career of Fairy King confirms that (Northern Dancer x Fairy Bridge) was a super nick, and the third full brother from that mating, Tate Gallery, was a Gp1-winning 2yo who also had a successful stallion career.

    All three full brothers produced different types from each other as stallions, but as Diomed pointed out he's not trying to predict staying power, just likely inherited class. On that basis the story of Sadler's Wells and his brothers justifies the kind of broad search for class in a pedigree that Diomed is pursuing.


  • Registered Users Posts: 48 RoyalAcademy2


    And my point is that this would be obvious from a detailed study of any individual pedigree.


  • Closed Accounts Posts: 4,744 ✭✭✭diomed


    Diomed, your comment suggests I am missing the point and, if I am, I apologise.
    1. Address one question please: if two full brothers can be Pegasus on the one hand and noddy on the other how can your data predict between one and the other?
    2. Sailers wells and fairy King came from the same dam yet fairy King was a very good stallion despite no track success.
    3. Flat racing suggests the dam must have pedigree to have a decent chance of producing a winner whereas it's often complete pot luck over jumps.
    1. Do you mean Sadler's Wells?
    I assume you are saying Sadler’s Wells was Pegasus and his full sibling Fairy King was Noddy. Is that correct?
    Sadler’s Wells was not the champion of his year. El Gran Senor was rated 136; Darshaan 133; Sadler’s Wells 132; Chief Singer 131.

    2. There is a reason for Fairy King's lack of success on the track: "made one start at Phoenix Park and suffered a fractured sesamoid." see pedigreequery.com

    Another full sibling, Tate Gallery, was rated 117 as a 2yo and got no rating as a 3yo "last of 15 in the 2000 Guineas ... lacked enthusiasm on reappearance … retired to Coolmore Stud".

    Obviously to me Fairy King was a sprinter (Equinome C.C) and Sadler’s Wells a middle distance type (Equinome C.T). Their dam Fairy Bridge ran twice, won twice, as a 2yo in 1977 at 5f and 6f in the Phoenix Park, and was retired to stud, given 8 stone 12 pounds on the Irish Free Handicap (3rd highest ranked). The prize money she could win as a 3yo paled into insignificance beside her stud value.

    3. In my opinion good jumps runners are slower than good flat horses. If jumps horses were competitive they would compete for prizes like today’s Irish Champion Stakes. Harzand won UK£1.5 million for his English and Irish Derby wins.
    The real prize is stallion fees. Galileo’s fee is said to be €300k at over 100 covers a year = €30 million a year x 15 years stud career = ~ €450 million - the real prize.

    The only jumps horse that I can remember being competitive on the flat was Alderbrook. He won the Group 2 Prix Dollar at Longchamp. He later won and was 2nd in the Champion Hurdle.
    Istabraq won 2 races from 11 runs on the flat, and 23 from 29 over hurdles.

    You cannot make a definitive statement that the dam must be good to breed a winner. I say there must be class, either in the dam, her dam, or further back.
    My example to confuse you is Sea-Bird (1962). He won the Epsom Derby, the Prix de l’Arc de Triomphe and other top races.
    Timeform rated him 145 and he is often considered the best horse of the 20th century.
    His dam Sicalade won 750 French Francs (about £75). Timeform say she "ended up being sold as butcher’s meat for £100".
    Her dam Marmelade (1949), was a full sibling of Camaree, the 1950 English 1000 Guineas winner.
    So Sicalade's dam Marmelade was a full sibling of a very classy filly.
    While I am discussing siblings I’ll say I don’t take much notice of half-siblings, i.e. different sire, same dam (often mentioned/identified in sales catalogues).


    Some general comments:
    You can not predict how good/bad a foal will be. There are over 20,000 genes involved and afaik the scientists have some idea about some of the genes/grouping of genes involved in athletic ability. They can test the foal when the foal is produced. They can not tell before the mating what ability the foal will inherit from the sire and what it will inherit from the dam.


    What I am trying to do is identify the ingredients that might produce a pleasing result by analysing the parents, grandparents, and further back.
    I know some will say only the sire and dam matter. True. If you have brothers, have you ever wondered why you are not identical? Each gets a different mix.

    One benefit from your questions is this afternoon I looked at the best horses by a sire. I found something in those pedigrees and wondered if it was in any other very top horses not by that sire. I found the same characteristic. What is strange is the thing I found was in the 11th generation of those three top horses, in both the sire side and dam side. It is not a simple feature. I’ll keep the instantly recognisable names of those three top horses to myself and put it in my ideas file for further testing.

    I want to analyse the pedigrees of both poor and good horses (horses that are not culled) against race data and work out a few rules.
    Some time ago I checked if group race winners produced group race winners (the sales catalogue approach of putting damline group winners in black type). I did not find a worthwhile link (very weak).
    I have a database of all Group 1, 2, 3 races for Ireland, England, France, Germany, Italy, USA (Grade 1 only) from 1900 to 2015. It wasn’t a simple task to fill in the database and the pedigrees.

    Here is some info on USA horses in a book on my desk:
    Number of foals: 399,121
    Starters: 69.4%
    Winners: 46.1%
    Stakes winners: 3.4%
    Graded stakes winners: 0.8%
    Grade 1 winners: 0.2%

    1 in 500 foals in the USA is a grade 1 winner.

    The aim is not to identify the Group 1 / Grade 1 winners by analysing pedigrees, i.e. to identify the 500/1 shot.
    A more humble target might be to identify the 30.6% who are not good enough to make the racecourse, or the 23.3% who run and cannot win a race (see the figures above).
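
    For anyone checking the arithmetic, the 30.6%, 23.3% and 1-in-500 figures follow directly from the list above. A small sketch in Python; the constant and function names are mine, the numbers are the ones quoted from the book.

```python
# Figures quoted above for USA foals (percentages of all foals).
FOALS = 399_121
STARTERS_PCT = 69.4
WINNERS_PCT = 46.1
GRADE1_PCT = 0.2


def derived_shares():
    """Derive the shares mentioned in the post from the quoted percentages."""
    return {
        # Foals that never make the racecourse.
        "never_ran_pct": round(100 - STARTERS_PCT, 1),
        # Foals that start but never win.
        "ran_no_win_pct": round(STARTERS_PCT - WINNERS_PCT, 1),
        # Grade 1 winners expressed as "1 in N" foals.
        "grade1_one_in": round(100 / GRADE1_PCT),
        # Approximate head count of Grade 1 winners in the sample.
        "grade1_count": round(FOALS * GRADE1_PCT / 100),
    }


shares = derived_shares()
```

    The head count works out at roughly 800 Grade 1 winners from the 399,121 foals, which shows how small the top group is compared with the non-starters and non-winners.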


  • Registered Users Posts: 9,205 ✭✭✭Gringo180


    Today's Leger winner Harbour Law has a real speedy pedigree. Surprised to see such a pedigree produce a stamina-laden stayer at the highest level.


  • Registered Users Posts: 2,702 ✭✭✭tryfix


    Gringo180 wrote: »
    Today's Leger winner Harbour Law has a real speedy pedigree. Surprised to see such a pedigree produce a stamina-laden stayer at the highest level.

    A DI of 1.29 isn't speedy, but I know what you mean re the dam, a sprinter by Pivotal, a sire who produces both sprinters and classic middle distance performers.
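
    For anyone unfamiliar with the DI (Dosage Index), it is the ratio of speed points to stamina points in a five-part dosage profile (Brilliant, Intermediate, Classic, Solid, Professional), with the Classic points split evenly between the two sides. A quick sketch of the calculation in Python; the function name is mine and any profile fed into it is an assumption, as I am not quoting Harbour Law's actual profile here.

```python
def dosage_index(brilliant, intermediate, classic, solid, professional):
    """DI = (B + I + C/2) / (C/2 + S + P): speed points over stamina points."""
    speed = brilliant + intermediate + classic / 2
    stamina = classic / 2 + solid + professional
    if stamina == 0:
        # No stamina points at all: the index is unbounded.
        return float("inf")
    return speed / stamina
```

    A perfectly balanced profile gives a DI of 1.0, and higher values lean towards speed, so 1.29 sits only slightly on the speed side.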

    There's the highest level and the Highest Level. The tight finish of the Leger was between three nice group class colts rather than between 3 proper Gp 1 horses. The proper Gp 1 horse in the race crashed out leaving a Gp1 in name only behind.

