Chi-squared?

  • 25-01-2017 8:40pm
    #1
    Closed Accounts Posts: 4,744 ✭✭✭


    Occurrences   Group A   Group B
    0                 231       260
    1                 203       263
    2                  77       109
    3                  14        30
    4                   3         3
    5                   0         1
    Total             528       666


    The above is a snippet of data from a project I've been working on for many years, comparing thoroughbred horse pedigrees with horse ratings/results.
    I wrote programs that count occurrences of particular features in the family tree of each horse and compare them with its rating.

    This week I finally completed database programs that analysed 159,660 horse pedigrees and produced the results.
    The results data is 4,116 rows by 40 columns = 164,640 cells.
    That data combines colts, geldings and fillies in one file.
    I will later split it into three files: colts, geldings and fillies.

    The rows are divided into 147 groups, numbered 1 to 147, and each group has 28 rows (occurrence values 0 to 27), so 147 x 28 = 4,116 rows.
    Most of the data is clustered, as in the example above, in the first few rows of each group (values 0 to 6 or 7).

    I have a book that mentions the chi-squared test for data like this.
    It describes comparing observed values with expected values, but I'm at a loss as to what my expected values should be.
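    For a table like this (one row per occurrence value, one column per group), the usual choice is Pearson's chi-square test of independence. The expected count for a cell is what you would see if occurrences were distributed the same way in both groups, and it comes from the table's own totals, not from outside knowledge. Here O_ij is the observed count, R_i the row total, C_j the column total and N the grand total:

```latex
E_{ij} = \frac{R_i \, C_j}{N}, \qquad
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
\mathrm{df} = (r - 1)(c - 1)
```

    For example, occurrence 0 in Group A has E = 491 x 528 / 1194 = 217.13, where 491 = 231 + 260 is the row total.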

    I've tried CHITEST in Excel but it doesn't like zero values.
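    A likely cause of the Excel problem: every term of the chi-square sum is divided by an expected count, and a row where both groups are zero (for example the unused occurrence values up to 27) has an expected count of zero. A zero in only one group is not a problem by itself. The minimal sketch below assumes one group's counts are held as [Group A, Group B] pairs. Rows 6 and 7 are assumed zero padding standing in for those unused values. The sketch shows the zero expected counts and then drops all-zero rows before any testing.

```python
# Sketch: all-zero rows give zero expected counts (a division by zero in the chi-square sum).
# Assumption: `table` holds [group_a, group_b] counts, one row per occurrence value.

def expected_counts(table):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def drop_empty_rows(table):
    """Remove rows where every group has a zero count (their expected counts are zero)."""
    return [row for row in table if sum(row) > 0]

# Occurrence values 0..7; rows 6 and 7 are assumed zero padding for illustration.
table = [[231, 260], [203, 263], [77, 109], [14, 30], [3, 3], [0, 1], [0, 0], [0, 0]]

print(expected_counts(table)[6])   # [0.0, 0.0]  (would be divisors in the chi-square sum)
cleaned = drop_empty_rows(table)
print(len(cleaned))                # 6
```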

    In the example above, Group B are the better horses.
    Are the higher counts at occurrences 2, 3 and 5 significant?
    Ideally I would like to analyse all the output, i.e. all 164,640 cells.

    Any help welcomed.


Comments

  • Closed Accounts Posts: 4,744 ✭✭✭diomed


    My attempt

    Actual Frequency
    Occurrences   Group A   Group B    Total
    0                 231       260      491
    1                 203       263      466
    2                  77       109      186
    3                  14        30       44
    4 & 5               3         4        7
    Total             528       666     1194


    Expected Frequency
    Occurrences   Group A   Group B    Total
    0              217.13    273.87      491   [ 491/1194*528 = 217.13 ]
    1              206.07    259.93      466
    2               82.25    103.75      186
    3               19.46     24.54       44
    4 & 5            3.10      3.90        7
    Total             528       666     1194
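    The expected table can be reproduced from the row and column totals alone. Here is a minimal Python sketch using the merged observed counts from the Actual Frequency table. The dictionary layout is just a convenient choice for illustration.

```python
# Reproduce the expected-frequency table: E = row total * column total / grand total.
observed = {
    "0": (231, 260),
    "1": (203, 263),
    "2": (77, 109),
    "3": (14, 30),
    "4 & 5": (3, 4),
}

col_totals = [sum(counts[j] for counts in observed.values()) for j in range(2)]
grand_total = sum(col_totals)

expected = {
    label: tuple(sum(counts) * col_totals[j] / grand_total for j in range(2))
    for label, counts in observed.items()
}

for label, (e_a, e_b) in expected.items():
    print(f"{label:>5}  {e_a:7.2f}  {e_b:7.2f}")
```

    Rounded to two decimals, the printed rows match the Expected Frequency table, and each row of expected counts sums to the same total as the observed row.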

    p-value 0.28505746 [ =CHITEST(D41:E45,D50:E54) ]
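    The CHITEST p-value can be checked by hand. Excel takes the degrees of freedom as (rows - 1) x (columns - 1), which is 4 for this 5 x 2 table. For an even number of degrees of freedom, the chi-square upper-tail probability has a closed form. This sketch takes the statistic from the sum of the chi-square terms below, unrounded (about 5.0219). The function name `chi2_sf_even_df` is illustrative.

```python
import math

def chi2_sf_even_df(x, df):
    """P(X >= x) for a chi-square variable with an even number of degrees of freedom.

    Closed form: exp(-x/2) * sum_{i=0}^{k-1} (x/2)**i / i!, where df = 2k.
    """
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

rows, cols = 5, 2                  # occurrence rows 0, 1, 2, 3, "4 & 5"; groups A and B
df = (rows - 1) * (cols - 1)       # 4
chi_square = 5.0219                # sum of the ten chi-square terms, unrounded
p_value = chi2_sf_even_df(chi_square, df)
print(df, round(p_value, 3))       # 4 0.285
```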


    Chi-Square Terms
    Occurrences   Group A   Group B
    0                0.89      0.70   [ (231-217.13)^2/217.13 = 0.89 ]
    1                0.05      0.04
    2                0.34      0.27
    3                1.53      1.21
    4 & 5            0.00      0.00

    Chi-Square 5.02 [ =SUM(B61:65) ]
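    The ten terms and their sum can be computed in a few lines. This sketch uses the observed counts from the Actual Frequency table. It recomputes each expected count from the totals instead of using the rounded table values.

```python
# Pearson chi-square statistic: sum of (observed - expected)^2 / expected over all cells.
observed = [
    [231, 260],   # 0
    [203, 263],   # 1
    [77, 109],    # 2
    [14, 30],     # 3
    [3, 4],       # 4 & 5 (merged)
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

def cell_term(o, e):
    """Contribution of one cell to the chi-square statistic."""
    return (o - e) ** 2 / e

terms = [
    [cell_term(o, r * c / grand_total) for o, c in zip(row, col_totals)]
    for row, r in zip(observed, row_totals)
]
chi_square = sum(sum(row) for row in terms)

for row in terms:
    print("  ".join(f"{t:.2f}" for t in row))
print(f"chi-square = {chi_square:.2f}")   # chi-square = 5.02
```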

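    Alpha 0.01 <--- 1% significance level
    Degrees of freedom 4 <--- (rows - 1) x (columns - 1) = (5 - 1) x (2 - 1)
    Critical Value 13.28 <--- critical chi-square value (χ² distribution table)
    [ =CHIINV(0.01,4) ]
    Decision Do not reject H0, i.e. no significant difference between Group A and Group B at the 1% level (because 5.02 < 13.28; equivalently p = 0.285 > 0.01)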
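    The decision step can be sketched as follows. Work out the degrees of freedom as (rows - 1) x (columns - 1), which is 4 for this 5 x 2 table. Look up the 1% critical value for that many degrees of freedom, and reject the null hypothesis (no difference between the groups) only if the statistic exceeds it. The critical values are the standard 1% entries from a chi-square table. The dictionary and `decide` are illustrative names, not Excel features.

```python
# Decision rule for the chi-square test at the 1% level.
# Critical values for alpha = 0.01, taken from a standard chi-square table.
CRITICAL_1_PERCENT = {1: 6.635, 2: 9.210, 3: 11.345, 4: 13.277, 5: 15.086, 6: 16.812}

def decide(chi_square, n_rows, n_cols, critical_values=CRITICAL_1_PERCENT):
    """Return (degrees of freedom, critical value, verdict) for an n_rows x n_cols table."""
    df = (n_rows - 1) * (n_cols - 1)
    critical = critical_values[df]
    if chi_square > critical:
        return df, critical, "reject H0: the groups' distributions differ"
    return df, critical, "do not reject H0: no significant difference detected"

print(decide(5.02, n_rows=5, n_cols=2))
# (4, 13.277, 'do not reject H0: no significant difference detected')
```

    Failing to reject H0 means the data give no significant evidence of a difference. It does not prove that the two groups are the same.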


    Rows 4 & 5 combined because of the zero in Group A; cells with very small counts make the chi-square test unreliable.
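    For the full data set, merging can be automated. A common rule of thumb is that every expected count should be at least 5. The sketch below makes two assumptions: the threshold of 5, and pooling the tail upward into the row above. Neither choice comes from the thread. The sketch folds the highest occurrence row into its neighbour until the last row meets the threshold. On this example it also absorbs row 3 into "3+", because rows 4 and 5 together only reach expected counts of 3.10 and 3.90.

```python
# Pool sparse tail rows until the last row's smallest expected count is at least MIN_EXPECTED.
# Assumption: the threshold of 5 and pooling the tail upward are choices made for this sketch.
MIN_EXPECTED = 5.0

def pool_tail(labels, table, min_expected=MIN_EXPECTED):
    """Merge the last row into the one above it until its smallest expected count is large enough."""
    labels, table = list(labels), [list(row) for row in table]   # do not modify the inputs
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(col_totals)

    def min_expected_of(row):
        # The smallest expected count in a row is the one in the column with the smallest total.
        return sum(row) * min(col_totals) / grand_total

    while len(table) > 2 and min_expected_of(table[-1]) < min_expected:
        last = table.pop()
        labels.pop()
        table[-1] = [a + b for a, b in zip(table[-1], last)]
        labels[-1] = f"{labels[-1]}+"
    return labels, table

labels = ["0", "1", "2", "3", "4", "5"]
table = [[231, 260], [203, 263], [77, 109], [14, 30], [3, 3], [0, 1]]
print(pool_tail(labels, table))
# (['0', '1', '2', '3+'], [[231, 260], [203, 263], [77, 109], [17, 34]])
```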

