
Is there a formula or method?

  • 10-04-2018 3:40pm
    #1
    Closed Accounts Posts: 4,744 ✭✭✭


    My interest is thoroughbred horse pedigrees.
    About a year ago I finished analysing the six-generation pedigrees (2+4+8+16+32+64 = 126 ancestors) of 160,000 horses.
    I extracted pedigree information from the 126 ancestors, and compared the results to the horse ratings.
    I found that better horses had more of certain pedigree features.
    These features in the pedigrees could be positive or negative - some increased ratings, others lowered ratings.

    In summary form, one of the features affected ratings like this: more occurrences of the feature gave higher ratings on average.

    occ rating horses
    0 90.05 876
    1 91.44 4974
    2 92.42 9889
    3 93.74 10803
    4 95.24 6826
    5 98.07 2662
    6 100.15 812
    7 104.66 189
    8 110.92 24
    9 86.00 2



    The above is a summary of 37,000+ horses.
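
    As a rough first step, the summary table above can be reduced to a single "rating points per occurrence" figure by fitting a straight line through the group averages, weighting each row by its number of horses. This is only a sketch: the straight line is an assumption (the averages climb faster at high counts, and the 9-occurrence row has only 2 horses), and the names SUMMARY and estimate are made up for illustration.

```python
# (occurrences, average rating, number of horses), copied from the summary table above.
SUMMARY = [
    (0, 90.05, 876),
    (1, 91.44, 4974),
    (2, 92.42, 9889),
    (3, 93.74, 10803),
    (4, 95.24, 6826),
    (5, 98.07, 2662),
    (6, 100.15, 812),
    (7, 104.66, 189),
    (8, 110.92, 24),
    (9, 86.00, 2),
]

total = sum(n for _, _, n in SUMMARY)
mean_occ = sum(occ * n for occ, _, n in SUMMARY) / total
mean_rating = sum(r * n for _, r, n in SUMMARY) / total

# Weighted least-squares line: each row counts once per horse in it.
sxy = sum(n * (occ - mean_occ) * (r - mean_rating) for occ, r, n in SUMMARY)
sxx = sum(n * (occ - mean_occ) ** 2 for occ, _, n in SUMMARY)
slope = sxy / sxx
intercept = mean_rating - slope * mean_occ


def estimate(occ):
    """Straight-line estimate of the average rating for a given count."""
    return intercept + slope * occ


print(f"{total} horses, about {slope:.2f} rating points per occurrence")
```

    This only describes one feature on its own; it says nothing about how the feature interacts with the others.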

    The table below lists individual horses with their individual ratings.
    E.g. horse AAAAAA_001 had a rating of 116 and had 1 of factor AAA, 1 of factor EEE and 2 of factor III.
    What I would like to do is have a formula to calculate the rating from the AAA, BBB, CCC, DDD, EEE, FFF, GGG, HHH, III numbers.

    Some missing information might be needed:
    Ratings average about 75 for the horse population, so the formula might have to start at 75 and then add a contribution for AAA, BBB and so on.
    Another bit of missing info that might be important is the rating of the mother of the horse (the dam or broodmare).
    Typically a mare with a low rating produces lower rated runners, a mare with a higher rating higher rated runners.
    e.g. AAAAAA_003 rated 120. His dam was rated 107.
    But I think selecting a sire for that dam to produce a good foal (in theory) with many of the positive factors will boost the foal rating above the rating of its dam.
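
    Written out, the kind of formula described above might look like the following. This is a sketch of the idea only: the symbols are introduced here for illustration, with b_0 the starting value (about 75 if the population average is used), n_k the count of factor k for the horse, w_k the unknown points gained or lost per occurrence of factor k, and an optional extra term c times the dam's rating d.

```latex
\hat{r} = b_0 + \sum_{k=1}^{K} w_k \, n_k \; (+\, c \, d),
\qquad b_0 \approx 75
```

    Positive factors would have w_k > 0 and negative factors w_k < 0; the weights are what would have to be estimated from the data.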


    Name rate AAA BBB CCC DDD EEE FFF GGG HHH III
    AAAAAA_001 116 1 0 0 0 1 0 0 0 2
    AAAAAA_002 87 1 0 1 0 0 0 0 0 2
    AAAAAA_003 120 3 0 1 1 0 1 0 0 5
    AAAAAA_004 100 3 0 0 1 0 2 0 0 0
    AAAAAA_005 70 2 1 1 0 0 0 0 2 0
    AAAAAA_006 67 5 2 1 0 0 0 0 1 0
    AAAAAA_007 71 2 2 0 0 0 0 0 2 0
    AAAAAA_008 107 2 1 0 1 0 0 0 6 0
    AAAAAA_009 63 3 1 0 0 1 0 0 0 0
    AAAAAA_010 106 4 2 2 0 0 0 0 0 0
    AAAAAA_011 85 4 2 0 1 0 0 0 3 0
    AAAAAA_012 62 3 0 0 0 1 0 0 8 2
    AAAAAA_013 116 2 0 2 0 0 0 0 3 1
    AAAAAA_014 103 4 2 1 0 0 0 0 2 2
    AAAAAA_015 113 6 1 2 2 0 0 1 2 1
    AAAAAA_016 54 2 1 1 0 0 0 0 3 0
    AAAAAA_017 103 5 0 1 1 0 1 0 0 0
    AAAAAA_018 77 4 0 1 2 0 1 0 2 0
    AAAAAA_019 106 2 0 2 0 0 0 0 2 0
    AAAAAA_020 108 4 0 0 3 1 0 0 2 0
    AAAAAA_021 90 5 1 0 2 0 1 1 3 0
    AAAAAA_022 123 2 1 1 0 0 0 0 1 0
    AAAAAA_023 65 5 1 1 1 0 0 1 2 0
    AAAAAA_024 87 3 0 1 1 1 0 0 2 0
    AAAAAA_025 101 1 0 1 0 0 0 0 0 3
    AAAAAA_026 66 6 3 1 1 0 0 0 2 3
    AAAAAA_027 114 3 0 0 1 1 1 1 1 0
    AAAAAA_028 91 2 1 0 1 0 0 0 5 3
    AAAAAA_029 80 3 1 1 0 1 0 0 0 0
    AAAAAA_030 30 1 1 0 0 0 0 0 1 1
    AAAAAA_031 77 3 2 1 0 0 0 0 4 2
    AAAAAA_032 94 3 2 1 0 0 0 0 1 0
    AAAAAA_033 87 4 1 2 1 0 0 0 0 0
    AAAAAA_034 108 3 1 2 0 0 0 0 2 0
    AAAAAA_035 134 3 1 0 1 0 0 1 0 0
    AAAAAA_036 112 5 2 1 1 1 0 0 3 3
    AAAAAA_037 103 3 1 1 1 0 0 0 0 2
    AAAAAA_038 98 3 0 2 1 0 0 0 3 0
    AAAAAA_039 49 2 0 1 1 0 0 0 0 0
    AAAAAA_040 47 2 1 0 0 0 1 0 4 4
    AAAAAA_041 123 3 2 0 1 0 0 0 3 5
    AAAAAA_042 98 4 0 2 2 0 0 0 2 2
    AAAAAA_043 117 4 0 2 1 0 0 0 1 0
    AAAAAA_044 116 4 1 0 0 3 0 0 2 0
    AAAAAA_045 79 4 1 1 0 1 0 0 5 2
    AAAAAA_046 124 4 1 0 1 1 0 0 2 4
    AAAAAA_047 65 6 4 1 0 0 0 0 1 2
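
    One standard way to get such a formula is multiple linear regression (ordinary least squares), which estimates a weight for every factor at the same time. A minimal sketch on the 47 rows above follows, in plain Python with no libraries. It assumes the effects are additive and linear, and 47 horses is far too few to trust the weights; the function names solve and predict are made up for illustration.

```python
# Name, rating and factor counts AAA..III, copied from the table above.
DATA = """
AAAAAA_001 116 1 0 0 0 1 0 0 0 2
AAAAAA_002 87 1 0 1 0 0 0 0 0 2
AAAAAA_003 120 3 0 1 1 0 1 0 0 5
AAAAAA_004 100 3 0 0 1 0 2 0 0 0
AAAAAA_005 70 2 1 1 0 0 0 0 2 0
AAAAAA_006 67 5 2 1 0 0 0 0 1 0
AAAAAA_007 71 2 2 0 0 0 0 0 2 0
AAAAAA_008 107 2 1 0 1 0 0 0 6 0
AAAAAA_009 63 3 1 0 0 1 0 0 0 0
AAAAAA_010 106 4 2 2 0 0 0 0 0 0
AAAAAA_011 85 4 2 0 1 0 0 0 3 0
AAAAAA_012 62 3 0 0 0 1 0 0 8 2
AAAAAA_013 116 2 0 2 0 0 0 0 3 1
AAAAAA_014 103 4 2 1 0 0 0 0 2 2
AAAAAA_015 113 6 1 2 2 0 0 1 2 1
AAAAAA_016 54 2 1 1 0 0 0 0 3 0
AAAAAA_017 103 5 0 1 1 0 1 0 0 0
AAAAAA_018 77 4 0 1 2 0 1 0 2 0
AAAAAA_019 106 2 0 2 0 0 0 0 2 0
AAAAAA_020 108 4 0 0 3 1 0 0 2 0
AAAAAA_021 90 5 1 0 2 0 1 1 3 0
AAAAAA_022 123 2 1 1 0 0 0 0 1 0
AAAAAA_023 65 5 1 1 1 0 0 1 2 0
AAAAAA_024 87 3 0 1 1 1 0 0 2 0
AAAAAA_025 101 1 0 1 0 0 0 0 0 3
AAAAAA_026 66 6 3 1 1 0 0 0 2 3
AAAAAA_027 114 3 0 0 1 1 1 1 1 0
AAAAAA_028 91 2 1 0 1 0 0 0 5 3
AAAAAA_029 80 3 1 1 0 1 0 0 0 0
AAAAAA_030 30 1 1 0 0 0 0 0 1 1
AAAAAA_031 77 3 2 1 0 0 0 0 4 2
AAAAAA_032 94 3 2 1 0 0 0 0 1 0
AAAAAA_033 87 4 1 2 1 0 0 0 0 0
AAAAAA_034 108 3 1 2 0 0 0 0 2 0
AAAAAA_035 134 3 1 0 1 0 0 1 0 0
AAAAAA_036 112 5 2 1 1 1 0 0 3 3
AAAAAA_037 103 3 1 1 1 0 0 0 0 2
AAAAAA_038 98 3 0 2 1 0 0 0 3 0
AAAAAA_039 49 2 0 1 1 0 0 0 0 0
AAAAAA_040 47 2 1 0 0 0 1 0 4 4
AAAAAA_041 123 3 2 0 1 0 0 0 3 5
AAAAAA_042 98 4 0 2 2 0 0 0 2 2
AAAAAA_043 117 4 0 2 1 0 0 0 1 0
AAAAAA_044 116 4 1 0 0 3 0 0 2 0
AAAAAA_045 79 4 1 1 0 1 0 0 5 2
AAAAAA_046 124 4 1 0 1 1 0 0 2 4
AAAAAA_047 65 6 4 1 0 0 0 0 1 2
"""
FACTORS = ["AAA", "BBB", "CCC", "DDD", "EEE", "FFF", "GGG", "HHH", "III"]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


rows = [line.split() for line in DATA.strip().splitlines()]
y = [float(r[1]) for r in rows]
X = [[1.0] + [float(v) for v in r[2:]] for r in rows]  # leading 1.0 is the base term

# Normal equations (X'X) coef = X'y give the least-squares weights.
k = len(X[0])
xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
coef = solve(xtx, xty)


def predict(counts):
    """Estimated rating from a list of nine factor counts (AAA..III)."""
    return coef[0] + sum(w * n for w, n in zip(coef[1:], counts))


for label, w in zip(["base"] + FACTORS, coef):
    print(f"{label:>4}: {w:+.2f}")
```

    A call such as predict([3, 0, 1, 1, 0, 1, 0, 0, 5]) then gives the estimate for a horse with those counts. Because every factor is fitted together, each weight is the estimated effect of that factor with the other factors held fixed. The same fit could be run on the full database, with extra columns for the remaining factors or the dam's rating.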


    I tested the numbers with chi-squared tests, which confirmed that higher counts of some of the "factors" give higher ratings.
    Although I know the size of the effect in summary form across tens of thousands of horses, I would like a formula that gives an approximation for a single horse.

    The difficulty is that it is not possible to isolate horses that have only one factor, e.g. CCC, because a horse will almost certainly have one or more of the other factors as well.
    A horse could have four positive factors and three negative factors, or any combination.

    And the summary data has the same problem - positive factors and negative factors that are inseparable.
    This factor (see below), although appearing neutral or slightly negative, could be highly negative, as it cannot be separated from the positive factors.
    The 688 horses with an average rating of 93.93 could have much higher average rating if they did not have this factor.
    occ XYZ count
    0 93.55 16074
    1 93.97 15396
    2 93.97 4857
    3 93.93 688
    4 92.50 42



    There can be about 18 pedigree factors for each horse, although most of these will be zero.
    Just think of columns AAA, BBB, CCC and so on, going out to around column RRR.
    The data is in a database.

    The summary data looks something like this:
    occ VVV count WWW count XXX count YYY count ZZZ count
    0 93.19 24383 93.11 25387 93.11 15847 93.55 16074 93.84 27995
    1 94.27 10236 94.77 9752 93.81 14034 93.97 15396 93.51 8590
    2 97.42 2184 97.71 1785 94.79 5621 93.97 4857 95.81 465
    3 100.15 242 98.63 127 96.71 1300 93.93 688 91.43 7
    4 96.18 11 115.5 6 96.65 229 92.5 42
    5 134 1 107.57 23
    6 91 3


    Any good ideas?


Comments

  • Registered Users, Registered Users 2 Posts: 1,595 ✭✭✭MathsManiac


    Post moved - shouldn't have been in the sticky where it was posted.

