Interpreting a Regression

  • 27-03-2010 7:27pm
    #1
    Closed Accounts Posts: 59 ✭✭


    Need help interpreting this regression I ran with GDP per capita and life satisfaction. I'm having trouble explaining whether or not my regression explains the relationship well.


    SUMMARY OUTPUT

    Regression Statistics
    Multiple R           0.698074266
    R Square             0.487307681
    Adjusted R Square    0.468997242
    Standard Error       0.509580765
    Observations         30

    ANOVA
                  df   SS            MS            F             Significance F
    Regression     1   6.910835103   6.910835103   26.61365226   1.7965E-05
    Residual      28   7.270831564   0.259672556
    Total         29   14.18166667

                  Coefficients   Standard Error   t Stat        P-value       Lower 95%     Upper 95%
    Intercept     5.89881778     0.253741779      23.24732573   7.57449E-20   5.379051316   6.418584245
    GDP P.C       4.11988E-05    7.98606E-06      5.158842143   1.7965E-05    2.48401E-05   5.75575E-05



    Really sorry about the layout!!


Comments

  • Registered Users Posts: 13,076 ✭✭✭✭bnt


    I'm no statistician, but I do know that the R-squared value is a basic "goodness of fit" descriptor. A value of 1 would indicate a perfect fit between the data and a regression trendline calculated from it, while 0 would indicate that the trendline explains none of the variation in the data (no linear relationship at all). The reported value (about 0.49) is somewhere in between, which I read as meaning that the data is only a partial fit to the regression line, so use caution if interpolating or extrapolating.
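To see where the "Regression Statistics" block comes from, here is a small Python sketch that rebuilds R Square, Adjusted R Square, Standard Error, Multiple R and F from the ANOVA sums of squares in the original post. The formulas are the standard ones for ordinary least squares with an intercept; the variable names are my own.

```python
import math

# Sums of squares copied from the ANOVA table in the original post
ss_regression = 6.910835103   # variation explained by the GDP line
ss_residual = 7.270831564     # variation left over in the residuals
n = 30                        # observations
k = 1                         # explanatory variables: GDP per capita only

ss_total = ss_regression + ss_residual
r_squared = ss_regression / ss_total                         # share of variation explained
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)  # penalised for number of predictors
ms_residual = ss_residual / (n - k - 1)                      # residual variance estimate
standard_error = math.sqrt(ms_residual)                      # typical size of a residual
multiple_r = math.sqrt(r_squared)                            # |correlation| of y and x with one predictor
f_stat = (ss_regression / k) / ms_residual                   # explained vs unexplained variance

print(f"R Square          {r_squared:.4f}")
print(f"Adjusted R Square {adj_r_squared:.4f}")
print(f"Standard Error    {standard_error:.4f}")
print(f"Multiple R        {multiple_r:.4f}")
print(f"F                 {f_stat:.2f}")
```

These print 0.4873, 0.4690, 0.5096, 0.6981 and 26.61, matching the Excel output above.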




  • Closed Accounts Posts: 2,771 ✭✭✭TommyGunne


    Probably need a little more info, and it would be best just to see the entire data set, but I'll try to help.

    Are you running this in Excel? It looks like it.

    Firstly, try running it with no intercept. That forces the regression line through the point (0,0). At the moment the line doesn't have to pass through the origin: with the intercept included, the model says that if GDP is 0, everybody has a happiness rating (or whatever it is) of 5.9. Intuitively, a no-intercept model might be better.
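The difference between the two fits can be sketched in a few lines of Python. The numbers below are made-up toy data, not the life-satisfaction data set. With an intercept, the slope is the usual S_xy/S_xx and the line passes through the means; with no intercept, the slope is Σxy/Σx² and the line is forced through (0,0). In Excel's Regression tool this should correspond to ticking "Constant is Zero".

```python
def fit_with_intercept(x, y):
    """Ordinary least squares y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    b = s_xy / s_xx
    a = mean_y - b * mean_x   # the fitted line passes through (mean_x, mean_y)
    return a, b


def fit_through_origin(x, y):
    """Least squares y = b*x with the intercept fixed at 0; returns b."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


# Hypothetical toy data lying exactly on y = 2 + 3x
x = [1.0, 2.0, 3.0, 4.0]
y = [5.0, 8.0, 11.0, 14.0]

a, b = fit_with_intercept(x, y)
b0 = fit_through_origin(x, y)
print(f"with intercept: y = {a:.3f} + {b:.3f}x")
print(f"through (0,0):  y = {b0:.3f}x")
```

Here the intercept fit recovers y = 2 + 3x, while forcing the line through the origin changes the slope to about 3.667. So it is worth comparing both fits rather than assuming one of them is right.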

    Secondly, regarding interpreting the data in front of you:
    1) Both the intercept and the GDP coefficient are significant. You can see this from the very low p-values (7.57449E-20 and 1.7965E-05).
    2) This output yields the equation life satisfaction = 5.9 + 0.0000412*GDP. Try putting that formula in a column and graphing it against your actual data set to see how it looks. This is the line of best fit, and it explains about 49% of the variation in your data set (R Square).
    3) You can ignore the ANOVA stuff and the bounds; they're not really of much use to you here. With only one explanatory variable, the ANOVA F-test just repeats the t-test on the GDP coefficient (F = t², and Significance F equals the GDP p-value of 1.7965E-05).
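The arithmetic behind points 1 to 3 can be checked with the numbers from the Excel output. This Python sketch evaluates the fitted line at a few hypothetical GDP-per-capita values, recomputes the GDP t-stat as coefficient ÷ standard error, confirms F = t², and rebuilds the 95% bounds. It assumes 2.048407 as the two-tailed 5% critical value of Student's t with 28 degrees of freedom (taken from standard t tables, not from the original post).

```python
# Coefficients and standard error copied from the Excel output in the original post
intercept = 5.89881778
slope = 4.11988e-05          # change in life satisfaction per unit of GDP per capita
slope_se = 7.98606e-06
t_crit_28 = 2.048407         # assumed: two-tailed 5% critical value of t, 28 df


def predicted_satisfaction(gdp_per_capita):
    """Fitted line: life satisfaction = intercept + slope * GDP per capita."""
    return intercept + slope * gdp_per_capita


# Hypothetical GDP-per-capita values, chosen only for illustration
for gdp in (5_000, 20_000, 40_000):
    print(f"GDP p.c. {gdp:>6}: predicted satisfaction {predicted_satisfaction(gdp):.2f}")

t_stat = slope / slope_se                 # should match Excel's 't Stat' for GDP P.C
f_from_t = t_stat ** 2                    # with one predictor, F = t squared
lower = slope - t_crit_28 * slope_se      # 95% confidence bounds for the slope
upper = slope + t_crit_28 * slope_se
print(f"t = {t_stat:.3f}, t^2 = {f_from_t:.2f}")
print(f"95% interval for slope: {lower:.4e} to {upper:.4e}")
```

The interval comes out as roughly 2.4840e-05 to 5.7558e-05, the same as Excel's Lower 95% and Upper 95% columns.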

    Next, you probably want to refine your regression. If you are just modelling one on the other, the next steps I would take are to remove the intercept and try that, check whether there is any curve in the data, and check whether GDP^2 adds some information to your line of best fit. Then add a few other variables and test their significance: GNP (it might actually turn out to take precedence over GDP), pollution, neighbours' GDP (maybe expect some inverse correlation there, though probably not significant), prevalence of AIDS, child mortality, interactions between all of these, etc. You have to test each of these individually. You can either start with all the ones you can think of and remove the least significant at each iteration (backward elimination), or start with a very basic model and keep adding the most significant variable, as long as it is significant (forward selection). Or you can just mess around for ages and possibly come up with a model that explains the variation very well without getting overly complex. A simple model is usually best, though.
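Checking whether a squared term adds anything means fitting a regression with more than one column. Below is a minimal pure-Python least-squares sketch (normal equations solved by Gaussian elimination), applied to made-up toy data that follow a gentle curve. It is only meant to show the idea, not to replace Excel's or R's regression routines, and it does not compute the t-stat or p-value of the squared term, which a real check would also need. With real GDP figures it would be sensible to rescale GDP (say, to thousands) before squaring it, since squared values in the billions make the normal equations badly conditioned.

```python
def ols(columns, y):
    """Least squares via the normal equations (X'X)b = X'y, solved by Gaussian elimination."""
    p = len(columns)
    xtx = [[sum(a * b for a, b in zip(columns[i], columns[j])) for j in range(p)] for i in range(p)]
    xty = [sum(a * b for a, b in zip(columns[i], y)) for i in range(p)]
    aug = [row[:] + [rhs] for row, rhs in zip(xtx, xty)]   # augmented matrix [X'X | X'y]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))  # partial pivoting
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, p):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, p + 1):
                aug[r][c] -= factor * aug[col][c]
    coef = [0.0] * p
    for i in reversed(range(p)):                            # back substitution
        coef[i] = (aug[i][p] - sum(aug[i][j] * coef[j] for j in range(i + 1, p))) / aug[i][i]
    return coef


def r_squared(columns, y, coef):
    """Share of the variation in y explained by the fitted values."""
    fitted = [sum(b * col[i] for b, col in zip(coef, columns)) for i in range(len(y))]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical toy data lying on y = 1 + 2x + 0.5x^2
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.5, 7.0, 11.5, 17.0]
ones = [1.0] * len(x)
x_sq = [xi ** 2 for xi in x]

linear_cols = [ones, x]
quad_cols = [ones, x, x_sq]
linear_coef = ols(linear_cols, y)
quad_coef = ols(quad_cols, y)
print("linear:   ", [round(c, 3) for c in linear_coef], "R^2 =", round(r_squared(linear_cols, y, linear_coef), 4))
print("quadratic:", [round(c, 3) for c in quad_coef], "R^2 =", round(r_squared(quad_cols, y, quad_coef), 4))
```

The straight line already gets an R² of about 0.979 on this toy data, but adding the squared column recovers the curve exactly, which is the kind of improvement the GDP^2 check is looking for.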

    But basically, your regression says that there is indeed a relationship between GDP per capita and life satisfaction.


  • Closed Accounts Posts: 2,771 ✭✭✭TommyGunne


    You could also try downloading R. It's a free statistical package, and it makes life much easier when you are trying to run lots of analyses.

    Excel might be easier and better for you to use, though; it depends on your needs. If you are planning to do lots of these long term, then getting a stats package is gonna make life much easier, but if not, you'll probably find that Excel is a lot more comfortable.


  • Registered Users Posts: 3,620 ✭✭✭Grudaire


    If you really want to use Excel, there's another function available called logest(). You need to have the statistical package on, and it's an array formula. (On second thoughts, R is a better idea.)
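LOGEST fits an exponential curve of the form y = b·m^x. The usual way to get such a fit is to take logs, ln y = ln b + x·ln m, and run an ordinary straight-line regression of ln y on x. The Python sketch below does exactly that on hypothetical toy data; I'm assuming this log-linear approach matches what LOGEST does internally. It only works when every y value is positive.

```python
import math


def exponential_fit(x, y):
    """Fit y = b * m**x by least squares on ln(y); returns (b, m). Requires y > 0."""
    if any(yi <= 0 for yi in y):
        raise ValueError("exponential fit needs strictly positive y values")
    log_y = [math.log(yi) for yi in y]
    n = len(x)
    mean_x, mean_ly = sum(x) / n, sum(log_y) / n
    s_xy = sum((xi - mean_x) * (li - mean_ly) for xi, li in zip(x, log_y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = s_xy / s_xx                    # estimate of ln(m)
    intercept = mean_ly - slope * mean_x   # estimate of ln(b)
    return math.exp(intercept), math.exp(slope)


# Hypothetical toy data lying on y = 2 * 1.5**x
x = [0.0, 1.0, 2.0, 3.0]
y = [2.0, 3.0, 4.5, 6.75]
b, m = exponential_fit(x, y)
print(f"y = {b:.3f} * {m:.3f}^x")
```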


  • Closed Accounts Posts: 59 ✭✭Dancingjebus3


    Hey, thanks for the reply. I attached the file with my data in it; the regressions I ran are in the sheets at the bottom. The other variables I'm using are HDI (Human Development Index) and Consumption Share of Real Gross Domestic Product Per Capita (current prices). The idea is to figure out which of the 3 is the better measure. The problem I have is still understanding the output: HDI gives an R squared of .70, but its intercept is -6.49 and the t-stat is -3.870, so I'm getting myself confused about which to take as the better measure... In my actual thesis I only have to include the results for R squared, coefficient, intercept, t-stat and p-value. Here is a summary of my results:

                  GDP per capita   HDI      Consumption
    R²            .48              .70      .22
    Coefficient   4.12             14.59    -.044
    Intercept     5.89             -6.49    9.58
    T-stat        23.24            -3.870   11.0286
    P-value       7.57             .00059   1.06

    Any help explaining this summary to me would really help, and thanks already for all the advice so far...


  • Closed Accounts Posts: 2,771 ✭✭✭TommyGunne


    Cliste wrote: »
    If you really want to use Excel, there's another function available called logest(). You need to have the statistical package on, and it's an array formula. (On second thoughts, R is a better idea.)

    Perhaps you are thinking of linest() :P


  • Closed Accounts Posts: 2,771 ✭✭✭TommyGunne


    The p-values are the really important ones. These tell you whether x actually has a statistically significant relationship with y for the data given. Your p-values here look really, really weird. They should all be between 0 and 1. Also, the bigger the t-stat (in absolute terms), the smaller the p-value, and the more significant the relationship. This doesn't hold with the numbers that you have put up, so I dunno what's going on. With t-stats of at least 3.87 (in absolute terms) in all cases, all the p-values should be about 0.0006 or smaller anyway. Having p-values > 1 is certainly nonsensical. Perhaps you left out negative exponents?

    You might know this already, but a t-stat of -3.87 here is exactly as significant as a t-stat of 3.87. It's the absolute value of the t-stat that matters for significance; the sign only tells you the direction of the relationship. You're doing a two-tailed test, so being in the bottom tail is just as good as being in the top tail.
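To check which p-value goes with which t-stat, the two-tailed p-value for a t-stat with df degrees of freedom can be computed from the regularized incomplete beta function: p = I_x(df/2, 1/2) with x = df/(df + t²). The Python sketch below evaluates the incomplete beta function with the standard continued-fraction (modified Lentz) method and uses df = 28 (30 observations minus 2 estimated coefficients), as in the GDP regression; I've assumed the HDI regression also used 30 observations. Feeding in the full Excel output suggests that the 23.24 and 7.57 in the summary table are the intercept's t-stat and p-value (7.57449E-20), while the GDP coefficient itself has t = 5.159 and p = 1.7965E-05. Because the formula only uses t², a t-stat of -3.87 gives exactly the same p-value as +3.87.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=1e-15):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_incomplete_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b   # symmetry for faster convergence


def two_tailed_p(t, df):
    """Two-tailed p-value for a Student t statistic; the sign of t does not matter."""
    return reg_incomplete_beta(df / 2.0, 0.5, df / (df + t * t))


for label, t in [("intercept (GDP model)", 23.24732573),
                 ("GDP P.C coefficient", 5.158842143),
                 ("HDI t = -3.870", -3.870),
                 ("HDI t = +3.870", 3.870)]:
    print(f"{label:<22} p = {two_tailed_p(t, 28):.4e}")
```

The first two lines should come out close to Excel's 7.57449E-20 and 1.7965E-05, and the two HDI lines are identical, at roughly 0.0006.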

    Having a negative intercept doesn't really matter. That just means that when HDI is 0, predicted life statisfaction (:p) is negative. It also means you should definitely not extrapolate outside the HDI range given here, as there are very likely other interactions and power terms coming into play.
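Using the HDI coefficient (14.59) and intercept (-6.49) from the summary table, this short Python sketch evaluates the fitted HDI line at a few hypothetical HDI values. HDI runs from 0 to 1, so the negative value at HDI = 0 comes from extrapolating far below the HDI of any actual country, not from anything the data can support.

```python
# Intercept and coefficient from the HDI column of the summary table
HDI_INTERCEPT = -6.49
HDI_SLOPE = 14.59


def predicted_from_hdi(hdi):
    """Fitted line: life satisfaction = -6.49 + 14.59 * HDI."""
    return HDI_INTERCEPT + HDI_SLOPE * hdi


# Hypothetical HDI values for illustration (HDI lies between 0 and 1)
for hdi in (0.0, 0.5, 0.8, 0.95):
    print(f"HDI {hdi:.2f}: predicted life satisfaction {predicted_from_hdi(hdi):.2f}")
```

Each 0.1 increase in HDI adds about 1.46 points, so the line only gives sensible values over the upper part of the HDI range.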

    Your R^2 values kinda tell you which of the terms explains life satisfaction best.

    Have you tried regressing life satisfaction on all 3 variables at once yet? If you're looking to do well in a thesis, I'd imagine you'll have to do that anyway.


  • Closed Accounts Posts: 59 ✭✭Dancingjebus3


    Ah OK, that's helped a lot... But with the issue of the negative exponents, would this be a case of limitations in the data used, or would it be down to Excel not taking these exponents into account? Would these exponents be in life satisfaction, as it would be more susceptible to this, or would they be in GDP and consumption, since those are the two data sets that seem to have the high p-values??
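The exponents are not a property of any one data set: Excel writes very small numbers in scientific notation, so 7.57449E-20 means 7.57449 × 10^-20. Here is a quick Python illustration of what happens if only the part before the "E" is copied across; mantissa_only is a hypothetical helper written just for this example.

```python
def mantissa_only(excel_text):
    """Hypothetical helper: keep only the digits before Excel's 'E' exponent."""
    return float(excel_text.upper().split("E")[0])


# P-values as Excel displays them in the full GDP regression output
for text in ("7.57449E-20", "1.7965E-05"):
    full_value = float(text)          # Python reads Excel's E-notation directly
    truncated = mantissa_only(text)   # what ends up in a summary if 'E-nn' is dropped
    print(f"{text:>12} -> full value {full_value:.3g}, truncated {truncated}")
```

Dropping the exponent turns 7.57449E-20 into 7.57449, which is how a p-value bigger than 1 can appear in a summary table.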


  • Registered Users Posts: 3,620 ✭✭✭Grudaire


    TommyGunne wrote: »
    Perhaps you are thinking of linest() :P

    Depends what you want it for. I'm a fan of regressions that have a bit of a curve myself...


  • Registered Users Posts: 8,452 ✭✭✭Time Magazine


    TommyGunne wrote: »
    Your R^2 values kinda tell you which of the terms explains life satisfaction best.

    The standard practice in economics is to look at the t-stat to see which has the most significant effect, then look at the coefficient to see which has the "biggest" effect. R^2 is a poor measure of a model.

    TommyGunne wrote: »
    Have you tried regressing life satisfaction on all 3 variables at once yet? If you're looking to do well in a thesis, I'd imagine you'll have to do that anyway.

    Massive problems of multicollinearity there.
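One quick way to gauge a multicollinearity problem is the variance inflation factor (VIF) of each predictor. With just two predictors, the VIF reduces to 1/(1 - r²), where r is the correlation between them. The Python sketch below uses made-up numbers standing in for two closely related predictors (HDI is itself partly built from income per head, so it is likely to move with GDP per capita). A VIF above about 10 is a common rule-of-thumb warning sign. With all three predictors, each VIF would instead come from regressing that predictor on the other two.

```python
import math


def correlation(u, v):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    s_uv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    s_uu = sum((a - mu) ** 2 for a in u)
    s_vv = sum((b - mv) ** 2 for b in v)
    return s_uv / math.sqrt(s_uu * s_vv)


def vif_two_predictors(u, v):
    """Variance inflation factor when there are only two predictors: 1 / (1 - r^2)."""
    r = correlation(u, v)
    return 1.0 / (1.0 - r * r)


# Hypothetical, strongly related predictor values (not real data)
predictor_a = [1.0, 2.0, 3.0, 4.0, 5.0]
predictor_b = [2.0, 4.0, 6.0, 8.0, 11.0]
print(f"r = {correlation(predictor_a, predictor_b):.4f}")
print(f"VIF = {vif_two_predictors(predictor_a, predictor_b):.1f}")
```

Even though r here is "only" about 0.996, the VIF is 122, meaning the coefficient variances would be inflated roughly 122-fold compared with uncorrelated predictors.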

