
Stats help

  • 04-03-2010 5:26pm
    #1
    Closed Accounts Posts: 1,581 ✭✭✭judas101


    Ok lads, here's a query for you all, might be a long one.

    I'm a postgrad engineer. I was never into stats; I didn't cover it at LC level and it was never my strongest area. I prefer calculus.

    So in a paper I'm reading, the data is analyzed statistically.
    The content is irrelevant, but there are 3 possible outcomes: grasped, ungrasped and damaged.

    There are 7 subjects that do 6 tests each, 3 assisted (by some new technology) and 3 unassisted.

    There were a further 12 tests for validity (don't know why).

    The test was a two-tailed paired t test.

    So there are 54 tests altogether.
    The significance level is 0.05.

    The testing results were:

    Unassisted: Not Grasped 8, Grasped 14, Damaged 5
    Assisted: Not Grasped 2, Grasped 16, Damaged 9



    The test result is 0.037

    Can anyone help me see where this result came from?

    This might be very obvious to some of you, but I'm not good at this and I don't think all the info is provided.

    I should mention that this is just for my own understanding.
    I read a lot of papers and always get lost when it comes to the statistical analysis of data.

    Any help is much appreciated.

    Thanks


Comments

  • Registered Users, Registered Users 2 Posts: 1,163 ✭✭✭hivizman


    This is rather strange, because it appears that there are three outcomes of the test (grasped, ungrasped, damaged), but you say that the data were tested using a paired t test. This method is really only appropriate if the outcome of the test can be measured using some sort of scale data (for example, the amount of pressure exerted by the test subject), and it isn't suitable for categorical data (where you have several distinct outcomes).

    The presentation of the data would allow you to carry out a Chi-squared test of the null hypothesis that there is no difference in outcome between assisted and unassisted tests. I have calculated the Chi-squared statistic as 4.876, with two degrees of freedom, and the p-value for this is 0.087. Hence, if you require a p-value of 0.05 or less to reject the null hypothesis, you are unable to reject the hypothesis that there is no difference in outcome between assisted and unassisted tests.
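
    If you want to check this in software rather than by hand, here is a minimal sketch in Python with scipy (my own illustration, not from the paper; the table is just the counts quoted above):

        # Chi-squared test of independence on the 3x2 contingency table.
        from scipy.stats import chi2_contingency

        # Rows: unassisted, assisted; columns: not grasped, grasped, damaged.
        table = [[8, 14, 5],
                 [2, 16, 9]]

        chi2, p, dof, expected = chi2_contingency(table)
        print(chi2, p, dof)  # ~4.876, ~0.087, 2
        print(expected)      # [[5, 15, 7], [5, 15, 7]] under the null hypothesis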


  • Closed Accounts Posts: 201 ✭✭ArmCandyBaby


    0.037 = P(ASSISTED & NOT GRASPED | TEST CONDUCTED) ;)


    Almost certain you're not giving all the details here.


  • Registered Users, Registered Users 2 Posts: 3,803 ✭✭✭El Siglo


    If you gave us a link or a citation to the paper that you're reading, that might help.


  • Registered Users, Registered Users 2 Posts: 1,163 ✭✭✭hivizman


    ArmCandyBaby wrote: »
    0.037 = P(ASSISTED & NOT GRASPED | TEST CONDUCTED) ;)

    Well spotted! But you wouldn't call this a "test result", would you? From the context, I was reading this as the p-value for the t statistic (though it could be the t-statistic itself, in which case it would not be significant at any conventional level).

    ArmCandyBaby wrote: »
    Almost certain you're not giving all the details here.

    I agree. But if you are not very confident with statistics, then you may not appreciate what the relevant details are in the first place. As El Siglo has requested, could the original poster please give a full reference for the paper in case anyone has online access to the original source?


  • Registered Users, Registered Users 2 Posts: 3,803 ✭✭✭El Siglo


    hivizman wrote: »
    I agree. But if you are not very confident with statistics, then you may not appreciate what the relevant details are in the first place. As El Siglo has requested, could the original poster please give a full reference for the paper in case anyone has online access to the original source?

    I think this might (not sure) be the paper the OP is talking about: http://web.mit.edu/mars/Conference_Archives/MarsWeek04_April/Speaker_Documents/HapticFeedback-DominicRizzo(Paper).pdf

    I'm too tired to comment on it properly, but it probably has something to do with the fact that there was significant correlation between the results (assisted and unassisted), which warranted a paired t-test, and that's why you got your results (I'm probably wrong). It's odd as well though, because the t-test assumes the data follow a normal (Gaussian) distribution, so why would you use it for such a small sample size? Strange, I'd probably have gone with a Wilcoxon rank sum test. Again, I'm pretty tired so I could be talking shite!:D


  • Closed Accounts Posts: 1,581 ✭✭✭judas101


    Thanks for the input folks. I must admit that most of your replies have gone right over my head.

    ArmCandyBaby, could you show me how you arrived at that value?

    Well played El Siglo, that is indeed the correct paper.
    As a paper, its content is quite poor. The idea is interesting, but the testing and presentation of results isn't good. I'm just trying to get my head around the statistical methods used, as I see tonnes of papers with similar analysis and always get lost.

    Would it be possible that the paired t test was used only to discern between grasped and ungrasped, thus ruling out damaged as a result?

    The result of 0.037 is stated as the result of the t test.
    I'm still not even sure what this means in simple terms.

    The Chi-squared test was used first; those tests returned a final value of 1.8%.

    Still confused.

    Thanks again everybody


  • Closed Accounts Posts: 1,581 ✭✭✭judas101


    El Siglo wrote: »
    It's odd as well though, because the t-test assumes the data follow a normal (Gaussian) distribution, so why would you use it for such a small sample size? Strange, I'd probably have gone with a Wilcoxon rank sum test. Again, I'm pretty tired so I could be talking shite!:D


    In my engineering studies it is often taken that once the sample size is over 30 it is approaching normal distribution.

    How accurate an approximation this is I don't know, but it's a commonly used figure.

    Out of interest, how did you know that it was that particular paper?
    There was very little content provided.
    I'm impressed!


  • Closed Accounts Posts: 201 ✭✭ArmCandyBaby


    I had a quick look there and the way I understand it is:

    For test A, Grasping Success, they use a Chi-squared test to see if there is a difference in the distribution between grasped, not grasped and damaged @ 5%. And since the results are 2.1% and 1.8%, the results are statistically significant, i.e. the two conditions do have different distributions.

    For test B, Force Control, they don't include any force control data (they just include an example in fig. 5) from what I can see (the fig. 4 data that you included is related to test A). They define the null hypothesis as there being no difference in force control between the two tests (Haptic, No Haptic) @ 5%. The test result of 0.037 means that there is a 3.7% chance of getting the results that they obtained if the null hypothesis were true, and since this is less than 5%, they reject the null hypothesis and conclude statistical significance.


    If I'm wrong I've just confused things massively! :pac:


  • Closed Accounts Posts: 1,581 ✭✭✭judas101


    That helps a lot, thanks!

    So, for test B, there's no way of explaining how they arrived at that number (0.037), as the data isn't provided?
    I've been looking at books and Z values all day and I'm pulling my hair out!

    And with test A, why do you think they chose a value of 0.05 and not 0.01 which is also a standard starting point?

    If the results were over 5% would that mean that there is no difference?
    Finally, do those values (1.8 and 2.1%) mean anything in real terms or is this figure just an indication of different distributions?


  • Closed Accounts Posts: 201 ✭✭ArmCandyBaby


    judas101 wrote: »
    Finally, do those values (1.8 and 2.1%) mean anything in real terms or is this figure just an indication of different distributions?

    To get those values you use this formula, using the No Haptic data as the expected value:

    \chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}   (O_i = observed count, E_i = expected count)

    For the fig. 3 data:

    ((2-8)^2)/8 + ((12-11)^2)/11 + ((13-8)^2)/8

    = 7.71590909

    If you look this value up in the tables or use this converter

    http://www.danielsoper.com/statcalc/calc11.aspx

    with 3 - 1 = 2 degrees of freedom, as hivizman said above, you get 0.021111, i.e. 2.1%. Try the data in fig. 4 yourself and you'll see you get the 1.8%. They just mean that, if the Haptic did nothing (i.e. both conditions had the same distribution), there would only be a 2.1% and 1.8% chance of seeing results like these.
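
    If it's any help, the same arithmetic can be checked in Python with scipy (a sketch of my own, using the fig. 3 counts from above):

        # Chi-squared goodness-of-fit test, taking the No Haptic counts
        # as the expected values, as described above.
        from scipy.stats import chisquare

        observed = [2, 12, 13]  # Haptic: not grasped, grasped, damaged
        expected = [8, 11, 8]   # No Haptic, treated as the expected counts

        stat, p = chisquare(f_obs=observed, f_exp=expected)
        print(stat, p)  # ~7.716, ~0.021 with 3 - 1 = 2 degrees of freedom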


  • Posts: 0 CMod ✭✭✭✭ Garrett Easy Principal


    judas101 wrote: »
    And with test A, why do you think they chose a value of 0.05 and not 0.01 which is also a standard starting point?

    From my own stats notes, testing at the 5% level is common enough.

    judas101 wrote: »
    If the results were over 5% would that mean that there is no difference?

    Well, you reject the null hypothesis if the p-value is smaller than or equal to 5%, so you don't have enough evidence to reject the null hypothesis if it's over 5%. So yeah, if the null hypothesis in this case is "there is no difference", then you simply can't conclude that there is a difference.


    Also currently trying to study/revise stats and tearing my hair out with this stuff :o


  • Registered Users, Registered Users 2 Posts: 3,803 ✭✭✭El Siglo


    judas101 wrote: »
    In my engineering studies it is often taken that once the sample size is over 30 it is approaching normal distribution.

    How accurate an approximation this is I don't know, but it's a commonly used figure.

    Out of interest, how did you know that it was that particular paper?
    There was very little content provided.
    I'm impressed!

    Yeh, it depends on what you're looking at I suppose; thirty is a good round number!
    'Twas a bit of luck. I'm used to downloading papers and forgetting where I got them from, so I have to backtrack. Thanks in any case! Hope you understand what you're dealing with now.
    Like what Bluewolf said, if the p-value is above 0.05 you fail to reject the null hypothesis. If you've worked with R, it will flag the p-value (i.e. whether it's really, really significant or just significant). You might also want to consider emailing the authors and asking them about their data sets; maybe you could run your own tests on the data (just a thought!:D).


  • Registered Users, Registered Users 2 Posts: 1,163 ✭✭✭hivizman


    I've now had a chance to read the paper. First, I'm not convinced that taking the unassisted (no haptic) outcomes as the expected values in the Chi-squared test is appropriate. I drew up a 3x2 contingency table with the three outcomes and the two experimental conditions (no haptic and haptic), and worked out expected values based on the null hypothesis that the experimental condition makes no difference to the outcome. As there were a total of 10 not grasped, 30 grasped and 14 damaged outcomes in the 54 tests, this gave me expected outcomes of 5, 15 and 7 for both the no haptic and haptic conditions. This explains the difference between my probability and that of the authors of the paper.

    For the paired t test, it appears that the authors did indeed have a scale variable, position (final) - position (maximum). The authors state that they had a total of nine subjects, each of whom did three no haptic and three haptic tests. What they don't tell us is how they paired the tests. Did they:
    (a) pair the first no haptic test with the first haptic test, the second with the second, and the third with the third, for each subject;
    (b) pair the largest score on the three no haptic tests with the largest score on the three haptic tests, and so on, for each subject;
    (c) pair the largest score on the three no haptic tests with the lowest score on the three haptic tests, and so on, for each subject;
    (d) pair randomly within each subject;
    (e) average the no haptic and the haptic scores for each subject;
    or something else? These choices won't affect the mean difference between test scores, but they will affect the standard error.

    In a paired t test, the standard null hypothesis is that the mean difference is zero. The paper does not report any helpful data, such as the mean scores across all the trials for the no haptic and the haptic tests. All that we are told is that the probability of obtaining a value of the t statistic actually calculated, or a higher value, by chance if the true mean of the differences in the pairs of scores is zero, is 0.037 (3.7%). As this is less than the 5% criterion, the null hypothesis that the mean of the differences is zero can be rejected. The paper states that "subjects' ability to control force increased, on average, by 205%", which is a bit vague, but which may mean that the mean score for the haptic trials was 3.05 times the mean score for the no haptic trials (remember, an increase by x% means that the increased score is (100+x)% of the original score).
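
    To make the mechanics concrete, here is a minimal sketch of a paired t test in Python with scipy. The scores are invented purely for illustration, since the paper doesn't publish the raw data:

        # Paired (matched) two-tailed t test on hypothetical scores.
        from scipy.stats import ttest_rel

        # One force-control score per subject per condition (made up).
        no_haptic = [0.42, 0.55, 0.61, 0.38, 0.49, 0.57, 0.44, 0.52, 0.60]
        haptic    = [0.30, 0.41, 0.52, 0.29, 0.35, 0.48, 0.33, 0.40, 0.51]

        t_stat, p_value = ttest_rel(no_haptic, haptic)  # two-tailed by default
        print(t_stat, p_value)  # reject the null at 5% if p_value <= 0.05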

    By the way, be careful about the statement "it is often taken that once the sample size is over 30 it is approaching normal distribution". The second "it" here refers not to the distribution of the sample but to the t distribution. This distribution is different for different degrees of freedom (basically sample size minus one). As the degrees of freedom increase, the t distribution tends towards the normal distribution, and for sample sizes of 30 or more it is common (particularly when you are doing your statistics by hand and using tables rather than using a computer package) to use the normal distribution rather than the t distribution. However, this has nothing to do with the underlying sample distribution. You can have a very large sample that does not follow a normal distribution (think about a sample of one million observations, all of which are zero).
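
    You can see this convergence directly (again a Python sketch, nothing to do with the paper itself):

        # Two-tailed 5% critical values of the t distribution approach
        # the normal value of about 1.96 as the degrees of freedom grow.
        from scipy.stats import norm, t

        for df in (5, 10, 30, 100):
            print(df, t.ppf(0.975, df))   # 2.571, 2.228, 2.042, 1.984
        print("normal", norm.ppf(0.975))  # 1.960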


  • Closed Accounts Posts: 1,581 ✭✭✭judas101


    hivizman wrote: »

    For the paired t test, it appears that the authors did indeed have a scale variable, position (final) - position (maximum). The authors state that they had a total of nine subjects, each of whom did three no haptic and three haptic tests. What they don't tell us is how they paired the tests. Did they:
    (a) pair the first no haptic test with the first haptic test, the second with the second, and the third with the third, for each subject;
    (b) pair the largest score on the three no haptic tests with the largest score on the three haptic tests, and so on, for each subject;
    (c) pair the largest score on the three no haptic tests with the lowest score on the three haptic tests, and so on, for each subject;
    (d) pair randomly within each subject;
    (e) average the no haptic and the haptic scores for each subject;
    or something else? These choices won't affect the mean difference between test scores, but they will affect the standard error.




    Thanks a lot for that. Really helpful.
    It's quite odd that the data wasn't provided.

    So is it safe to say that the use of the t-test in this case was not appropriate?

    Are there other methods that would suit better?


  • Registered Users, Registered Users 2 Posts: 1,163 ✭✭✭hivizman


    judas101 wrote: »
    Thanks a lot for that. Really helpful.
    It's quite odd that the data wasn't provided.

    So is it safe to say that the use of the t-test in this case was not appropriate?

    Are there other methods that would suit better?

    If the researchers had 27 test subjects, each of whom did one no haptic and one haptic trial, then the paired t test (which investigates whether the mean of the differences between the two trial scores for each subject is significantly different from zero) is appropriate. But as the researchers had only nine subjects, each of whom did three no haptic and three haptic tests, they have some flexibility over how they actually match the scores for each subject, and that reduces my confidence in what they have done. If they took the mean no haptic score and the mean haptic score for each subject and did a matched pairs t test on the nine pairs of means, then this would be a reasonable approach, which would control for learning effects (the subjects are likely to get better at the exercise as they have more trials).

    So what I'm saying is that the research design may not be ideal. However, if the researchers didn't play games with matching the test results, the paired t test would be an appropriate approach for analysing the data. Unfortunately, the paper doesn't give enough details to allow readers to assess the stated results fully. Researchers publishing in academic journals would not normally include their raw data, but some journals have a policy that researchers should make their data available to enquirers on request.
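
    In case it's useful, the averaging approach I described would look something like this in Python (the scores are randomly generated stand-ins, since the paper doesn't give the real ones):

        # Average each subject's three trials per condition, then run a
        # paired t test on the nine pairs of subject means.
        import numpy as np
        from scipy.stats import ttest_rel

        rng = np.random.default_rng(0)
        no_haptic = rng.normal(0.5, 0.1, size=(9, 3))  # 9 subjects x 3 trials (made up)
        haptic    = rng.normal(0.4, 0.1, size=(9, 3))  # 9 subjects x 3 trials (made up)

        t_stat, p_value = ttest_rel(no_haptic.mean(axis=1), haptic.mean(axis=1))
        print(t_stat, p_value)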

