Interpreting A regression

Dancingjebus3 · 27-03-2010 08:27PM #1

Need help interpreting this regression I ran with GDP per capita and life satisfaction. I'm having trouble explaining whether or not my regression explains the relationship well.

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.698074266
R Square 0.487307681
Adjusted R Square 0.468997242
Standard Error 0.509580765
Observations 30

ANOVA
             df   SS            MS            F             Significance F
Regression    1   6.910835103   6.910835103   26.61365226   1.7965E-05
Residual     28   7.270831564   0.259672556
Total        29   14.18166667

            Coefficients   Standard Error   t Stat        P-value       Lower 95%     Upper 95%
Intercept   5.89881778     0.253741779      23.24732573   7.57449E-20   5.379051316   6.418584245
GDP P.C     4.11988E-05    7.98606E-06      5.158842143   1.7965E-05    2.48401E-05   5.75575E-05

Really sorry about the layout!!

bnt · 27-03-2010 09:14PM

I'm no statistician, but I do know that the R-Squared value is a basic "goodness of fit" descriptor. A value of 1 would indicate a perfect fit between the data and a regression trendline calculated from it, while 0 would indicate no fit (random data). The reported value (about 0.49) is somewhere in between, which I read as meaning that the data is only a partial fit to the regression line, so use caution if interpolating or extrapolating.
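As a rough check on where the R-Squared figure comes from, it can be recomputed from the sums of squares in the ANOVA part of the output. This is only a sketch using the numbers posted above; the variable names are made up for illustration.

```python
import math

# Sums of squares and sample size copied from the posted Excel output.
ss_regression = 6.910835103
ss_total = 14.18166667
n_observations = 30
n_predictors = 1  # only GDP per capita

# R-squared: share of the total variation accounted for by the regression.
r_squared = ss_regression / ss_total

# Adjusted R-squared penalises the fit for the number of predictors used.
adjusted_r_squared = 1 - (1 - r_squared) * (n_observations - 1) / (
    n_observations - n_predictors - 1
)

# "Multiple R" in the Excel output is the square root of R-squared.
multiple_r = math.sqrt(r_squared)

print(f"R squared:          {r_squared:.6f}")
print(f"Adjusted R squared: {adjusted_r_squared:.6f}")
print(f"Multiple R:         {multiple_r:.6f}")
```

The three printed values should line up with the "Regression Statistics" block at the top of the output.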

TommyGunne · 31-03-2010 01:43AM

Probably need a little more info, and it would be best just to see the entire data set, but I'll try help.

Are you running this in excel? It looks like it.

Firstly, try running it with no intercept. At the moment the regression line does not pass through the point (0,0): the intercept means that if GDP is 0, everybody has a happiness rating (or whatever it is) of 5.9. Dropping the intercept forces the line through (0,0). Intuitively, a no-intercept case might be better.
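To make the difference concrete, here is a small sketch comparing a fit with an intercept to a fit forced through the origin. The four data points are invented purely for illustration (they are not from the attached data set), and the function names are hypothetical.

```python
def fit_with_intercept(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_through_origin(x, y):
    """Least squares for y = b*x (no intercept); returns b."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


# Made-up data lying exactly on y = 1 + 2x.
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]

intercept, slope = fit_with_intercept(x, y)
slope_no_intercept = fit_through_origin(x, y)

print("with intercept:   ", intercept, slope)
print("through the origin:", slope_no_intercept)
```

With the intercept removed, the slope has to absorb the level of the data as well as its trend, so the two slopes generally differ. That is worth keeping in mind when comparing the two versions of the model.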

Secondly, regarding interpreting the data in front of you:
1) Both the intercept and the GDP coefficient are significant. You can see this by looking at the very low p-values.
2) This output yields an equation of happiness = 5.9 + 0.0000412*GDP. Try inserting that formula in a column and graphing it against your actual data set. See how it looks. This is the line of best fit (apparently) and explains 49% of the variation of your data set (R-Square).
3) You can ignore the ANOVA stuff and the bounds. They're not really of much use to you.
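The fitted equation in point 2 can also be tried out in a few lines of code rather than a spreadsheet column. This is a sketch using the coefficients from the posted output; the GDP figure of 30,000 is a hypothetical input, not a value from the data set.

```python
# Coefficients copied from the posted Excel output.
INTERCEPT = 5.89881778
SLOPE = 4.11988e-05
SLOPE_STD_ERROR = 7.98606e-06


def predicted_satisfaction(gdp_per_capita):
    """Point on the fitted line: satisfaction = intercept + slope * GDP."""
    return INTERCEPT + SLOPE * gdp_per_capita


# The t Stat column is just the coefficient divided by its standard error.
t_stat = SLOPE / SLOPE_STD_ERROR

example_gdp = 30_000  # hypothetical GDP per capita, for illustration only
print(f"t stat for GDP: {t_stat:.4f}")
print(f"predicted satisfaction at GDP {example_gdp}: "
      f"{predicted_satisfaction(example_gdp):.3f}")
```

Plotting `predicted_satisfaction` over the observed GDP range against the actual points gives the same picture as graphing the formula column in Excel.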

Next, you probably want to refine your regression. If you are just modelling one on the other, the next steps I would do are to remove the intercept and try that, check if there is any curve in the data, and check if GDP^2 might add some info to your line of best fit. Then add a few other variables and test their significance: perhaps GNP might be a good one (it might actually turn out to take precedence over GDP), pollution, neighbours' GDP (maybe expect some inverse correlation there, probably not significant though), prevalence of AIDS, child mortality, interactions between all of these, etc. You have to do all this individually. You can either start with all the ones you can think of and remove the least significant at each iteration, or start with a very basic model and keep adding the most significant variable (as long as it is significant). Or you can just mess around for ages and possibly come up with a model that explains the variation very well without getting overly complex. A simple model is usually best though.

But basically, your regression says that there is indeed a relationship between GDP per capita and life satisfaction.

TommyGunne · 31-03-2010 01:35PM

You could also try downloading R. It's a free statistical package, and makes life much easier when you are trying to run lots of analyses.

Excel might be easier and better for you to use though, depends on your needs. If you are planning to do lots of these long term, then getting a stat package is gonna make life much easier, but if not, you are probably going to find that excel is a lot more comfortable.

Grudaire · 31-03-2010 06:11PM

If you really want to use excel there's another function available called logest(), you need to have the statistical package on, and it's an array-formula (On second thoughts, R is a better idea)

Dancingjebus3 · 31-03-2010 06:57PM

Hey, thanks for the reply. I attached the file with my data in it; the regressions I ran are in the sheets at the bottom. The other variables I'm using are HDI (Human Development Index) and Consumption Share of Real Gross Domestic Product Per Capita, current price. The idea is to figure out which of the 3 is the better measure. The problem is that I'm still trying to understand it: HDI gives an R squared of .70, but its intercept is -6.49 and the t-stat is -3.870, so I'm getting myself confused about what to take as the better measure... In my actual thesis I only have to include the results for R squared, coefficient, intercept, t-stat and p-value. Here is a summary of my results:

              GDP per capita   HDI       Consumption
R²            .48              .70       .22
Coefficient   4.12             14.59     -.044
Intercept     5.89             -6.49     9.58
T-stat        23.24            -3.870    11.0286
P-value       7.57             .00059    1.06

Any help explaining this summary to me really would help, and thanks already for all the advice so far...

TommyGunne · 01-04-2010 12:50AM

Cliste wrote: »

If you really want to use excel there's another function available called logest(), you need to have the statistical package on, and it's an array-formula (On second thoughts, R is a better idea)

Perhaps you are thinking of linest() :P

TommyGunne · 01-04-2010 01:24AM

Dancingjebus3 wrote: »

Hey, thanks for the reply. I attached the file with my data in it; the regressions I ran are in the sheets at the bottom. The other variables I'm using are HDI (Human Development Index) and Consumption Share of Real Gross Domestic Product Per Capita, current price. The idea is to figure out which of the 3 is the better measure. The problem is that I'm still trying to understand it: HDI gives an R squared of .70, but its intercept is -6.49 and the t-stat is -3.870, so I'm getting myself confused about what to take as the better measure... In my actual thesis I only have to include the results for R squared, coefficient, intercept, t-stat and p-value. Here is a summary of my results:

              GDP per capita   HDI       Consumption
R²            .48              .70       .22
Coefficient   4.12             14.59     -.044
Intercept     5.89             -6.49     9.58
T-stat        23.24            -3.870    11.0286
P-value       7.57             .00059    1.06

Any help explaining this summary to me really would help, and thanks already for all the advice so far...

The p-values are the really important ones. These tell you whether x actually has a statistically significant relationship with y for the data given. Your p-values here look really, really weird. They should all be less than 1. Also, the bigger the t-stat (in absolute terms), the smaller the p-value, and the more significant the relationship. This doesn't hold with the numbers that you have put up, so I dunno what's going on. With t-stats of at least 3.87 (in absolute terms) in all cases, all p-values should be no bigger than about 0.0006 anyway. Having p-values > 1 is certainly nonsensical. Perhaps you left out negative exponents?

You might know this already, but a t-stat of -3.87 here is the exact same as a t-stat of 3.87. It's the absolute value of the t-stat that matters. You're doing a 2-tailed test, so having it in the bottom tail is just as good as having it in the top tail.
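The link between a t-stat and its two-tailed p-value can be checked numerically. The sketch below integrates the Student t density with Simpson's rule using only the standard library; the function names are made up, and the 28 degrees of freedom assume 30 observations and one predictor, as in the first regression posted.

```python
import math


def t_density(t, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return norm * (1 + t * t / df) ** (-(df + 1) / 2)


def two_tailed_p(t_stat, df, steps=2000):
    """Two-tailed p-value: probability of a t at least this far from 0 in either tail."""
    upper = abs(t_stat)  # the sign of the t-stat does not matter
    h = upper / steps    # steps must be even for Simpson's rule
    total = t_density(0.0, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    area_zero_to_t = total * h / 3
    # Extremely small p-values fall below floating-point precision and come out as ~0.
    return max(0.0, 1 - 2 * area_zero_to_t)


df = 28  # assumption: 30 observations, one predictor plus intercept
print(two_tailed_p(5.158842143, df))  # GDP coefficient in the first output
print(two_tailed_p(-3.87, df))        # same result as for +3.87
print(two_tailed_p(3.87, df))
```

Whatever t-stat is fed in, the result stays between 0 and 1, which is why figures like 7.57 or 1.06 cannot be p-values as written.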

Having a negative intercept doesn't really matter. That just means that when HDI is 0, life statisfaction (:p) is negative. Just means you should definitely not look to extrapolate outside the HDI range given here, as there are very very likely other interactions and power terms coming into play.

Your R^2 values kinda tell you which of the terms explains life satisfaction best.

Have you tried regressing all 3 terms on life satisfaction yet? If you're looking to do well in a thesis, I'd imagine you'll have to do that anyway.

Dancingjebus3 · 02-04-2010 05:18PM

Ah OK, that's helped a lot... but on the issue of the negative exponents, would this be a case of there being limitations in the data used, or would it be down to Excel not taking these exponents into account? Would these exponents be in life satisfaction, as it would be more susceptible to this, or would they be in GDP and consumption, since they are the two data sets that seem to have the high p-values??

Grudaire · 02-04-2010 06:01PM

TommyGunne wrote: »

Perhaps you are thinking of linest() :P

depends what you want it for, i'm a fan of regressions that have a bit of a curve myself...

Time Magazine · 04-04-2010 09:50PM

TommyGunne wrote: »

Your R^2 values kinda tell you which of the terms explains life satisfaction best.

The standard practice in economics is to look at the t to see which has the most significant effect, then look at the coefficient to see which has the "biggest" effect. R^2 is a poor measure of a model.

Have you tried regressing all 3 terms on life satisfaction yet? If you're looking to do well in a thesis, I'd imagine you'll have to do that anyway.

Massive problems of multicollinearity there.
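One common way to gauge how serious multicollinearity is between two predictors is to look at their correlation and the variance inflation factor, VIF = 1 / (1 - r^2). The sketch below uses invented GDP and HDI figures purely for illustration (they are not from the attached file), and the helper name is hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up values: GDP per capita (thousands) and HDI for five imaginary countries.
gdp = [10, 20, 30, 40, 50]
hdi = [0.60, 0.70, 0.78, 0.85, 0.90]

r = pearson_r(gdp, hdi)

# With only two predictors, the VIF of either one depends on their correlation alone.
vif = 1 / (1 - r ** 2)

print(f"correlation between predictors: {r:.3f}")
print(f"variance inflation factor:      {vif:.1f}")
```

A VIF well above 10 is often taken as a rule-of-thumb warning that the coefficient estimates will be unstable when both predictors are in the same regression. Running the same check on the real GDP, HDI and consumption columns would show whether that applies here.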
