
Help with statistical analysis

  • #1
    Moderators, Science, Health & Environment Moderators Posts: 6,367 mod Macha


    Hi guys,

    I have to statistically analyze results from a quantitative survey. I have a numerical head but higher LC maths is as far as I went so any help would be greatly appreciated.

    The majority of the questions are Likert-scale. I've read that I can't treat them as interval so does this mean I must include standard deviation as well as the mean? Or something else altogether..?

    Also, if anyone could help out with the concept of a t-test, I would be very grateful. Augh! I would have done qualitative research if I'd known about this!

    Taco


Comments



  • taconnol wrote: »
    The majority of the questions are Likert-scale. I've read that I can't treat them as interval so does this mean I must include standard deviation as well as the mean? Or something else altogether..?

    Strictly speaking, Likert scales are ordinal and not interval. That said, many people treat them as interval anyway. There is no globally accepted way to statistically analyze this kind of data. Expressing the data as means + SD is commonly done. This is wrong, but if you did that, you'd only be as wrong as everyone else. :) Simple graphs are always good too, showing either the means or the distribution of answers with the Likert scale on the x-axis.
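
    As a rough illustration of the "distribution of answers" idea, here is a minimal Python sketch that tallies one Likert item and reports its median, which only relies on the order of the categories. The responses are made up and the 1-5 scale is an assumption; nothing here comes from the actual survey.

```python
from collections import Counter
from statistics import median

# Hypothetical answers to one item on a 1-5 Likert scale (made-up data).
responses = [4, 5, 3, 4, 2, 5, 4, 1, 4, 3]

# Count how many respondents chose each point on the scale,
# keeping points that nobody chose as zero.
counts = Counter(responses)
distribution = {point: counts.get(point, 0) for point in range(1, 6)}

# The median uses only the ordering of the answers, not their spacing.
item_median = median(responses)

# Text bar chart with the Likert scale down the side.
for point, n in distribution.items():
    print(f"{point}: {'#' * n} ({n})")
print("median:", item_median)
```

    The same tally is what a bar chart of the item would show, with the scale points on the x-axis.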
    taconnol wrote: »
    Also, if anyone could help out with the concept of a t-test, I would be very grateful. Augh! I would have done qualitative research if I'd known about this!

    A t-test is a relatively simple calculation that determines the probability of finding the observed difference in 2 means, if the null hypothesis (i.e. that there is no difference, only sampling error) is true. This probability can be used to accept or reject the null hypothesis with a certain level of confidence. To calculate this, it uses the means as a measure of central tendency and the SD as a measure of its variability.

    There are 2 types of t-test. One is for unpaired samples. For example, you would use this if you had two different groups of people answering the same questionnaire (men and women) and were trying to see if their answers were different. A paired t-test is when the same people answer the same questionnaire item twice and you are trying to see if something has changed. You might also use this to compare the same people's scores on 2 items - although, regression/correlation would be a much more informative way to analyze that kind of data.
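
    To make the two types concrete, here is a small sketch of both t statistics written out by hand in Python. It assumes the equal-variance (pooled) form for the unpaired test, which may not be the variant your software uses, and the scores are invented. It only returns the t value, not the probability.

```python
from math import sqrt
from statistics import mean, stdev, variance

def unpaired_t(a, b):
    """Student's t for two independent groups, assuming equal variances."""
    na, nb = len(a), len(b)
    # Pool the two sample variances, weighted by their degrees of freedom.
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))

def paired_t(first, second):
    """Paired t: a one-sample t-test on each person's difference score."""
    diffs = [x - y for x, y in zip(first, second)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

# Hypothetical scores on one item for two different groups of people.
men = [3, 4, 2, 5, 4, 3]
women = [4, 5, 4, 5, 3, 5]

print(unpaired_t(men, women))
print(unpaired_t(women, men))  # same size, opposite sign
```

    Swapping the order of the two groups flips the sign of t but not its size, so the sign only tells you which mean is larger.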

    Unfortunately, the t-test assumes that the data are normally distributed, which is unlikely to be the case with Likert scale output. Luckily, there are non-parametric equivalents of the t-test that are just as easy to use and don't make that assumption (Mann-Whitney = unpaired; Wilcoxon signed rank = paired). These are probably the most correct way to proceed, but can usually only be done in proper statistical packages, which you may not have access to.
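
    For anyone with a scripting language to hand, the Mann-Whitney test can be sketched in a few lines of Python. This is a simplified version under stated assumptions: the p-value uses the large-sample normal approximation with no tie correction and no continuity correction, so with heavily tied Likert data it is only approximate. A proper statistics package will handle those details.

```python
from math import sqrt
from statistics import NormalDist

def mann_whitney_u(a, b):
    """Return (U for group a, approximate two-sided p-value).

    The p-value is from the normal approximation without tie or
    continuity corrections, so treat it as a rough guide only.
    """
    # U counts, over every (x, y) pair, how often x beats y; ties count half.
    u = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
    na, nb = len(a), len(b)
    mu = na * nb / 2                             # expected U under the null
    sigma = sqrt(na * nb * (na + nb + 1) / 12)   # SD of U under the null, no ties
    z = (u - mu) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, p

# Hypothetical Likert answers from two independent groups.
men = [3, 4, 2, 5, 4, 3]
women = [4, 5, 4, 5, 3, 5]
print(mann_whitney_u(men, women))
```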

    It might be easier to simply do whatever your supervisor says, even if it's wrong... :pac:

    Mods: politely request that this be moved to the researcher forum?




  • 2Scoops wrote: »
    Strictly speaking, Likert scales are ordinal and not interval. That said, many people treat them as interval anyway. There is no globally accepted way to statistically analyze this kind of data. Expressing the data as means + SD is commonly done. This is wrong, but if you did that, you'd only be as wrong as everyone else. :) Simple graphs are always good too, showing either the means or the distribution of answers with the Likert scale on the x-axis.
    Thank you so much for your response 2Scoops! I really, really appreciate it. I had already started displaying the Likert scale with the distribution of answers in a bar chart so I'll leave it at that.
    2Scoops wrote: »
    A t-test is a relatively simple calculation that determines the probability of finding the observed difference in 2 means, if the null hypothesis (i.e. that there is no difference, only sampling error) is true. This probability can be used to accept or reject the null hypothesis with a certain level of confidence. To calculate this, it uses the means as a measure of central tendency and the SD as a measure of its variability.

    There are 2 types of t-test. One is for unpaired samples. For example, you would use this if you had two different groups of people answering the same questionnaire (men and women) and were trying to see if their answers were different. A paired t-test is when the same people answer the same questionnaire item twice and you are trying to see if something has changed. You might also use this to compare the same people's scores on 2 items - although, regression/correlation would be a much more informative way to analyze that kind of data.

    Unfortunately, the t-test assumes that the data are normally distributed, which is unlikely to be the case with Likert scale output. Luckily, there are non-parametric equivalents of the t-test that are just as easy to use and don't make that assumption (Mann-Whitney = unpaired; Wilcoxon signed rank = paired). These are probably the most correct way to proceed, but can usually only be done in proper statistical packages, which you may not have access to.

    It might be easier to simply do whatever your supervisor says, even if it's wrong... :pac:

    Mods: politely request that this be moved to the researcher forum?
    Again, thanks so much for this reply. It's a great help. I'll take your advice and go with the standard t-test. My tutor initially advised against using SPSS and to just use excel so you're right that I don't have access to those statistical packages.

    What does it say about my tutor when a person on Boards is more helpful?




  • taconnol wrote: »
    Again, thanks so much for this reply. It's a great help. I'll take your advice and go with the standard t-test.

    Don't forget Excel can do simple correlation/regression too, if you want to compare answers to different questionnaire items by the same people.
    taconnol wrote: »
    What does it say about my tutor when a person on Boards is more helpful?

    :pac:




  • Thanks 2Scoops. I've decided to go ahead and just do the t-test as I'm running quite short of time and, well, that's what my tutor says to do. Just 2 questions:

    What happens when my t Stat is a negative?
    and
    Can I use the t-test to compare more than 2 groups? eg, I compared male/female but I've grouped them into 5 different income categories..so how does that work?

    Also, what is the correlation/regression you mentioned above? Sorry but I'm pretty much starting from scratch here..

    thanks in advance




  • i'm an undergrad but i thought i'd throw an oar in (I'm sure i'll be corrected :)).
    A neg t-test value tells about the relationship between the variables, you just have to interpret the results. Try putting in the variables the opposite way around. I'm sure you'll then get a positive result :)

    Each t-test carries a 5% chance of a false positive, so multiple t-tests will lead to analysis with very high levels of error! AFAIK an ANOVA is used as it doesn't carry the same kind of cumulative analysis error. I doubt excel can do ANOVAs?

    About the regression in excel i don't know as i've only used SPSS, but isn't the analysis provided by excel either graphical or very basic?

    Why would you not use something like SPSS if you're doing various analyses on data? IMO It would make it much handier if you've used it previously to use what you learned by using it now!
    I hope some of this helped :)

    Edit: "so you're right that I don't have access to those statistical packages". Analysis could be tricky and considerably slower
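
    To put a rough number on the multiple t-test problem mentioned above, this small Python sketch works out the chance of at least one false positive across several tests. It assumes every test is run at the 0.05 level and that the tests are independent, which pairwise comparisons on the same data are not, so read the figures as a guide rather than exact values.

```python
def familywise_error(alpha, n_tests):
    """Chance of at least one false positive across independent tests."""
    # Probability that no test gives a false positive is (1 - alpha) ** n_tests.
    return 1 - (1 - alpha) ** n_tests

# How the overall error grows as more tests are run at the 0.05 level.
for k in (1, 3, 10):
    print(k, "tests:", round(familywise_error(0.05, k), 3))
```

    With 10 tests the chance of at least one spurious "significant" result is about 40% under these assumptions.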




  • dango wrote: »
    i'm an undergrad but i thought i'd throw an oar in (I'm sure i'll be corrected :)).
    A neg t-test value tells about the relationship between the variables, you just have to interpret the results. Try putting in the variables the opposite way around. I'm sure you'll then get a positive result :)

    Each t-test carries a 5% chance of a false positive, so multiple t-tests will lead to analysis with very high levels of error! AFAIK an ANOVA is used as it doesn't carry the same kind of cumulative analysis error. I doubt excel can do ANOVAs?

    About the regression in excel i don't know as i've only used SPSS, but isn't the analysis provided by excel either graphical or very basic?

    Why would you not use something like SPSS if you're doing various analyses on data? IMO It would make it much handier if you've used it previously to use what you learned by using it now!
    I hope some of this helped :)

    Edit: "so you're right that I don't have access to those statistical packages". Analysis could be tricky and considerably slower

    Thanks Dango! I reversed the column order and got a positive value! It was identical but positive so from now on, if I get a negative value, I'll just make it a positive one.

    I'm afraid the whole thing is due in 2 weeks (gulp) and my tutor advised against using SPSS so I'm stuck with excel. Thanks so much for your reply. Would you know if it's possible to carry out a t-test on more than one group?




  • You don't have to reverse the order just interpret the results but if it works and makes sense then you should be fine.

    From wiki as it says it better than i could
    "One-way ANOVA is used to test for differences among two or more independent groups. Typically, however, the One-way ANOVA is used to test for differences among at least three groups, since the two-group case can be covered by a T-test (Gossett, 1908)."

    However if you wiki (how academic) under t-test and multivariate

    "A generalization of Student's t statistic, called Hotelling's T-square statistic, allows for the testing of hypotheses on multiple (often correlated) measures within the same sample. For instance, a researcher might submit a number of subjects to a personality test consisting of multiple personality scales (e.g. the Big Five). Because measures of this type are usually highly correlated, it is not advisable to conduct separate univariate t-tests to test hypotheses, as these would neglect the covariance among measures and inflate the chance of falsely rejecting at least one hypothesis (Type I error). In this case a single multivariate test is preferable for hypothesis testing. Hotelling's T 2 statistic follows a T 2 distribution. However, in practice the distribution is rarely used, and instead converted to an F distribution."

    Now i haven't done anything like that but i think it pertains to what you want to do?
    Sorry i couldn't be of more help! I'm a floundering student too!
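
    For the one-way ANOVA quoted above, the F statistic itself is not hard to compute. Here is a minimal Python sketch with invented groups. It returns F and the two degrees of freedom only; turning F into a p-value needs the F distribution, which is left to a stats package or a table.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Three hypothetical groups of scores.
print(one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]]))
```

    With exactly two groups, F is the square of the equal-variance t statistic, which is why the two-group case can be covered by a t-test.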




  • Ah, so I use an ANOVA... Excellent

    Er...please don't take offense but I am flailing around in the shallow end of the kiddie's pool of statistics and so am not even going to go near that Hotelling T 2 stuff..

    *blub*...*gargle*... :pac:

    I need to get to the BGRH bar to buy you all virtual pints in thanks.




  • taconnol wrote: »
    What happens when my t Stat is a negative?

    Looks like Dango dealt with this. It's not that important for what you need, which is where the t-statistic falls on the t-distribution - i.e. the significance of the difference.
    taconnol wrote: »
    Can I use the t-test to compare more than 2 groups? eg, I compared male/female but I've grouped them into 5 different income categories..so how does that work?

    The best way to do this is to use a 2(m/f) x 5(income category) ANOVA. Excel can't do this but you could probably do the calculations yourself with a calculator and some time... but it's asking a bit much with just 2 weeks to go. Also, if the ANOVA is significant, it eventually boils down to t-tests again, anyway. You could try cutting out the middle man and just doing the multiple t-tests. Your supervisor has shown, IMO, a casual disregard for statistics so I doubt s/he'd have a problem with this approach. :pac: The problem would be that multiple t-tests inflate type I error (the chance of seeing a significant test, even when there is no real difference) to completely untenable levels. Correcting for this (with Bonferroni or some other multiple comparison test) is possible, but will limit your power if you have a small sample size.
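
    As a sketch of what the Bonferroni correction involves for 5 income categories: count the pairwise comparisons, then divide the significance level by that count. The band labels and p-values below are placeholders, not real results.

```python
from itertools import combinations

# Hypothetical labels for the 5 income categories.
income_bands = ["band 1", "band 2", "band 3", "band 4", "band 5"]

# Every pair of bands needs its own t-test.
pairs = list(combinations(income_bands, 2))
alpha = 0.05
bonferroni_alpha = alpha / len(pairs)

def significant_after_bonferroni(p_values, alpha=0.05):
    """Flag which p-values pass the Bonferroni-adjusted threshold."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]

print(len(pairs), "comparisons, each tested at", bonferroni_alpha)
print(significant_after_bonferroni([0.001, 0.02, 0.04]))  # placeholder p-values
```

    The stricter threshold is what costs power: a difference has to be much clearer before any single comparison counts as significant.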
    taconnol wrote: »
    Also, what is the correlation/regression you mentioned above?

    This is a test that will show the relation between answers to different questions. For example, you might find out that people who answer positively to Q1 tend to answer positively to Q4; or people who answer positively to Q3 tend to answer negatively to Q7. If you have many items that are in a similar theme, it might be useful to present the fact that they elicit similar responses from people.
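
    A minimal sketch of the idea, using the Pearson correlation coefficient written out by hand in Python. The item names and answers are made up, and Pearson's r treats the Likert answers as interval, so the usual caveat about ordinal data applies.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of answers."""
    mx, my = mean(x), mean(y)
    # How the two items vary together, scaled by how much each varies alone.
    co_variation = sum((a - mx) * (b - my) for a, b in zip(x, y))
    spread_x = sum((a - mx) ** 2 for a in x)
    spread_y = sum((b - my) ** 2 for b in y)
    return co_variation / sqrt(spread_x * spread_y)

# Hypothetical answers from the same six people to three items.
q1 = [1, 2, 3, 4, 5, 4]
q4 = [2, 2, 3, 5, 5, 4]   # tends to move with q1
q7 = [5, 4, 3, 2, 1, 2]   # exact mirror image of q1

print(pearson_r(q1, q4))
print(pearson_r(q1, q7))
```

    Values near +1 mean people who score high on one item score high on the other, values near -1 mean the opposite, and values near 0 mean little linear relation.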




  • 2Scoops wrote: »
    Your supervisor has shown, IMO, a casual disregard for statistics so I doubt s/he'd have a problem with this approach. :pac:
    Tell me about it. I rang him to ask about what to use instead of a t-test for multiple sets and his response was "move on"...As it stands, I have started doing ANOVA tests, which are working out well, so thanks again to you Dango.

    But before that, I literally could only apply gender and would have been left with two results!! The guy is an A+ muppet.
    2Scoops wrote: »
    This is a test that will show the relation between answers to different questions. For example, you might find out that people who answer positively to Q1 tend to answer positively to Q4; or people who answer positively to Q3 tend to answer negatively to Q7. If you have many items that are in a similar theme, it might be useful to present the fact that they elicit similar responses from people.
    Ok that would be extremely handy. Using Excel, do I simply use the CORREL function?

    Thanks guys - you are all being an incredible help to me.




  • taconnol wrote: »
    As it stands, I have started doing ANOVA tests, which are working out well, so thanks again to you Dango.

    Well, remember that a significant ANOVA will still need to be broken down to find out where the difference is, so you'll end up using multiple comparisons anyway. Incidentally, are you analyzing for gender and income with separate ANOVAs in Excel?
    taconnol wrote: »
    Ok that would be extremely handy. Using Excel, do I simply use the CORREL function?

    Not sure about CORREL function - I would use the Data Analysis menu again and use regression.




  • 2Scoops wrote: »
    Well, remember that a significant ANOVA will still need to be broken down to find out where the difference is, so you'll end up using multiple comparisons anyway. Incidentally, are you analyzing for gender and income with separate ANOVAs in Excel?
    Yes, well I'm looking at the ANOVA and then putting the n, mean and standard deviation into a table so that the descriptive and inferential data combined gives me the result...is that correct?

    I am analyzing the variables gender, income and education on two data sets with separate ANOVAs in Excel, yes..is that incorrect?
    2Scoops wrote: »
    Not sure about CORREL function - I would use the Data Analysis menu again and use regression.
    Ok great, thanks. I will do so.




  • taconnol wrote: »
    Yes, well I'm looking at the ANOVA and then putting the n, mean and standard deviation into a table so that the descriptive and inferential data combined gives me the result...is that correct?

    The key statistic from the ANOVA will be the F-value and its significance. If it is significant, then you will have your answer for gender, since there are only two groups. However, if the 'income' ANOVA is significant, you won't know what is driving the difference until you break it down by each of the 5 sub-groups.
    taconnol wrote: »
    I am analyzing the variables gender, income and education on two data sets with separate ANOVAs in Excel, yes..is that incorrect?

    It's probably the best you can do given your situation, but you won't have the ability to detect interactions. For example, income might be a really predictive variable in women only, but the men in your groups could completely drown out the difference when you analyze them together, statistically speaking. Or maybe men and women are completely different in the lowest income group but the same everywhere else - you could potentially lose that information too.

    Not incorrect... more of a limitation to what conclusions you can draw.
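
    A toy example of how separate analyses can hide an interaction, sketched in Python with entirely invented numbers. It is a deliberately extreme case: income pushes scores up for women and down for men, so the income means look identical when the genders are pooled.

```python
from collections import defaultdict
from statistics import mean

# (gender, income band, score) rows; all values are made up for illustration.
rows = [
    ("F", "low", 2), ("F", "low", 2), ("F", "high", 4), ("F", "high", 4),
    ("M", "low", 4), ("M", "low", 4), ("M", "high", 2), ("M", "high", 2),
]

def group_means(rows, key):
    """Mean score for each group, where key(row) names the group."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row[2])
    return {name: mean(scores) for name, scores in groups.items()}

# Income on its own: what a separate one-way analysis would see.
by_income = group_means(rows, key=lambda r: r[1])
# Income within each gender: what a 2 x 5 style breakdown would see.
by_cell = group_means(rows, key=lambda r: (r[0], r[1]))

print(by_income)
print(by_cell)
```

    Looking at income alone shows no difference at all here, while the gender by income breakdown shows two opposite effects cancelling out.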




  • Yes, I see what you mean. To be honest, I put a lot of effort into the literature review, was really interested in the subject matter etc and am very disappointed to have been tripped up by the analysis. I was hoping to be able to analyse very thoroughly and do a good job. Oh well..

    On the ANOVA - yes, I'm looking for the significance with the F-value and p-value and then looking to the mean to find out where the significance is.

    Whew, what a steep learning curve. I didn't even know what a p-value was 2 weeks ago..




  • If you have the time, you could download the evaluation copy of SPSS from their website and work on your stats before the license expires. It's not that hard to use and we could help you with your analysis...

    http://www.spss.com/downloads/Papers.cfm?ProductID=00035&Name=SPSS_Base&DLType=Demo




  • Ok downloading..

    Edit: Tis going to take a while..




  • Hi guys,

    Just to let you know I finished the analysis and am powering ahead in the discussion chapter. Thanks to you both for your help! I really would have been stuck without it. I'm wondering what the protocol is for putting boards users in my acknowledgements page :pac:

    Taco




  • Glad we could help. Good luck with the thesis :)

