Chi Squared ?

  • 25-01-2017 08:40PM
    #1
    Closed Accounts Posts: 4,744 ✭✭✭


    Occurrences   Group A   Group B
    0                 231       260
    1                 203       263
    2                  77       109
    3                  14        30
    4                   3         3
    5                   0         1

    Total             528       666


    The above is a snippet of data from something I've been working on for many years: comparing thoroughbred horse pedigrees with horse ratings/results.
    I wrote programs to compare things (occurrences) in the family tree of each horse with its rating.

    This week I finally completed the database programs, which analysed 159,660 horse pedigrees and output the results.
    The results data is 4,116 rows by 40 columns = 164,640 cells.
    That data is a combined file for colts, geldings, and fillies.
    Later I will split the initial data into three files: colts, geldings, and fillies.

    The rows are groups from 1 to 147. Each of those groups has 28 rows (values 0 to 27):
    group 1 has 0 to 27, group 2 has 0 to 27, and so on down to group 147, so there are 147 x 28 = 4,116 rows.
    Most of the data is clustered as in the above example, in the first few rows of each group, from 0 to 6 or 7.

    I have a book that mentions the Chi squared test for data like this.
    It mentions comparing values with expected values. I'm at a loss to know what should be my expected values.

    I've tried CHITEST in Excel but it doesn't like zero values.

    In the above example Group B are the better horses.
    Are the higher numbers at occurrences 2, 3, 5 significant?
    Ideally I would like to analyse all the output, the 164,640 cells.

    Any help welcomed.


Comments

  • Closed Accounts Posts: 4,744 ✭✭✭ diomed


    My attempt

    Actual Frequency   Group A   Group B   Total
    0                      231       260     491
    1                      203       263     466
    2                       77       109     186
    3                       14        30      44
    4 & 5                    3         4       7
    Total                  528       666    1194


    Expected Frequency   Group A   Group B   Total   Comment
    0                     217.13    273.87     491   [ 491/1194*528 = 217.13 ]
    1                     206.07    259.93     466
    2                      82.25    103.75     186
    3                      19.46     24.54      44
    4 & 5                   3.10      3.90       7
    Total                    528       666    1194
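
The expected frequencies in the table above can be reproduced with a short script. This is only a sketch: the counts are copied from the post, and the function name `expected_frequencies` is made up for illustration. Each expected cell is row total x column total / grand total, the same rule as the bracketed 491/1194*528 example.

```python
# Observed counts from the post: rows are occurrences 0, 1, 2, 3 and "4 & 5" pooled.
# Columns are (Group A, Group B).
observed = [
    (231, 260),
    (203, 263),
    (77, 109),
    (14, 30),
    (3, 4),
]


def expected_frequencies(table):
    """Expected count per cell = row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


expected = expected_frequencies(observed)
for row in expected:
    print("  ".join(f"{value:7.2f}" for value in row))
```

The expected values come only from the row and column totals, so no outside "expected" figures are needed for this kind of comparison.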

    p-value 0.28505746 [ =CHITEST(D41:E45,D50:E54) ]


    Chi-Square Terms   Group A   Group B   Comment
    0                     0.89      0.70   [ ( 231-217.13)^2 /217.13 = 0.89 ]
    1                     0.05      0.04
    2                     0.34      0.27
    3                     1.53      1.21
    4 & 5                 0.00      0.00

    Chi-Square 5.02 [ =SUM(B61:65) ]
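
As a cross-check on the spreadsheet sum, the chi-square statistic can be computed directly from the observed counts. A minimal sketch, assuming the pooled 5 x 2 table from the post; `chi_square_statistic` is a hypothetical helper name, not an Excel or library function.

```python
# Observed counts from the post, columns are (Group A, Group B).
observed = [
    (231, 260),
    (203, 263),
    (77, 109),
    (14, 30),
    (3, 4),
]


def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - exp) ** 2 / exp
    return statistic


statistic = chi_square_statistic(observed)
print(round(statistic, 2))
```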

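    Alpha 0.01 <--- 1% level
    Degrees of freedom 4 <--- (5 rows - 1) x (2 columns - 1)
    Critical Value 13.28 <--- critical chi-square value (x2 distribution table)
    [ =CHIINV(0.01,4) ]
    Decision Do not reject the null hypothesis, i.e. no significant difference between Group A & Group B was found (because 5.02 < 13.28)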
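
The p-value and critical value can also be checked without Excel. The sketch below assumes degrees of freedom of (rows - 1) x (columns - 1) = 4 for the pooled 5 x 2 table, which is consistent with the p-value CHITEST reports. It uses the closed-form upper tail of the chi-square distribution, which holds only for even degrees of freedom, and a simple bisection for the critical value. Both helper names are invented for this example.

```python
import math


def chi_square_sf(x, df):
    """Upper-tail probability P(X > x); closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k  # builds (x/2)^k / k!
        total += term
    return math.exp(-half) * total


def chi_square_critical(alpha, df, lo=0.0, hi=200.0):
    """Bisection search for the x whose upper-tail probability equals alpha."""
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi_square_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


rows, cols = 5, 2
df = (rows - 1) * (cols - 1)
statistic = 5.02  # chi-square value from the post
p_value = chi_square_sf(statistic, df)
critical = chi_square_critical(0.01, df)
reject = statistic > critical
print(df, round(p_value, 3), round(critical, 2), reject)
```

With these inputs the statistic falls below the 1% critical value, so the null hypothesis of no difference between the groups is not rejected.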


    Rows 4 & 5 are combined into one row because you can't have a zero value.
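
Combining the sparse rows can be automated when there are many groups to process. The rule sketched here (fold the last row into the one above while it contains a zero count) is an assumption that happens to reproduce the "4 & 5" row; `pool_trailing_zero_rows` is a hypothetical name. A common rule of thumb asks for expected counts of about 5 or more in most cells, which is stricter than this rule.

```python
# Raw counts from the original post, occurrences 0 to 5, columns (Group A, Group B).
raw = [
    (231, 260),
    (203, 263),
    (77, 109),
    (14, 30),
    (3, 3),
    (0, 1),
]


def pool_trailing_zero_rows(table):
    """Merge the last row into the previous one while it contains a zero."""
    pooled = [list(row) for row in table]
    while len(pooled) > 1 and 0 in pooled[-1]:
        last = pooled.pop()
        pooled[-1] = [a + b for a, b in zip(pooled[-1], last)]
    return pooled


pooled = pool_trailing_zero_rows(raw)
for row in pooled:
    print(row)
```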

