Econometrics projects

Time Magazine · 20-02-2007 05:42PM #1

Most undergrads, like myself, have to complete a project to pass any econometric modules they may have.

This thread is for those of us who want to boast (or cry) about our results.

In technical lingo, my classmates and I (third-year Economics in TCD) have to complete a multivariate regression. To explain this to non-econometricians, we have to try and predict one variable (say inflation) as a function of at least two other variables (say GDP growth and the interest rate).

My project is trying to track the BP share price (as measured on the NASDAQ) since the '80s. Because of availability of data and so on, I'm using monthly data from May 1981 until November 2006. This returns a fairly impressive 307 observations.

My explanatory variables are the price of oil, the US Fed Funds Rate (that's the interest rate) and the NASDAQ Industrial Average (that's basically how NASDAQ shares are doing on average this month). I might implement a dummy variable for war in Iraq (with yes=1 and no=0), but I might not be arsed since I'm fairly confident of at least a II.1 from this project and I've exams in three weeks.

The basic theory behind the project is that

a rise in the price of oil is good for BP's share price - positively related
a rise in the interest rate is bad for BP's share price - negatively related
a rise in the NASDAQ suggests a strong economy - positively related

So without further ado:

The blue line is what actually happened, whereas the green line is what my formula predicted given the price of oil, the interest rate, and (because of data-mining, the log of) the NASDAQ Industrial Average.

Not bad, eh?

Using Microfit, some statistical stuff:

                  Estimated Correlation Matrix of Variables                   
                                                                               
*********************************************************************
                 Y         X1       LX2        X3                              
 Y             1.0000    .48956    .91726   -.59064                            
                                                                               
 X1            .48956    1.0000    .25682  .3249E-3                            
                                                                               
 LX2           .91726    .25682    1.0000   -.74337                            
                                                                               
 X3           -.59064  .3249E-3   -.74337    1.0000                            
                                                                               
*********************************************************************

                       Ordinary Least Squares Estimation                       
*******************************************************************************
 Dependent variable is Y                                                       
 306 observations used for estimation from 1981M6  to 2006M11                  
*******************************************************************************
 Regressor              Coefficient       Standard Error         T-Ratio[Prob] 
 C                       -162.4481             5.7398           -28.3021[.000] 
 X1                         .45046            .032602            13.8169[.000] 
 LX2                       26.6999             .78879            33.8491[.000] 
 X3                         .59924             .16452             3.6424[.000] 
*******************************************************************************
 R-Squared                     .91421   R-Bar-Squared                   .91335 
 S.E. of Regression            5.8931   F-stat.    F(  3, 302)    1072.7[.000] 
 Mean of Dependent Variable   28.1177   S.D. of Dependent Variable     20.0200 
 Residual Sum of Squares      10487.9   Equation Log-likelihood      -974.9568 
 Akaike Info. Criterion     -978.9568   Schwarz Bayesian Criterion   -986.4040 
 DW-statistic                  .19466                                          
*******************************************************************************
                                                                               
                                                                               
                               Diagnostic Tests                                
*******************************************************************************
*    Test Statistics  *        LM Version        *         F Version          *
*******************************************************************************
*                     *                          *                            *
* A:Serial Correlation*CHSQ(  12)= 251.2654[.000]*F(  12, 290)= 110.9398[.000]*
*                     *                          *                            *
* B:Functional Form   *CHSQ(   1)=  20.8923[.000]*F(   1, 301)=  22.0569[.000]*
*                     *                          *                            *
* C:Normality         *CHSQ(   2)=   3.1191[.210]*       Not applicable       *
*                     *                          *                            *
* D:Heteroscedasticity*CHSQ(   1)=   .41745[.518]*F(   1, 304)=   .41529[.520]*
*******************************************************************************
   A:Lagrange multiplier test of residual serial correlation                   
   B:Ramsey's RESET test using the square of the fitted values                 
   C:Based on a test of skewness and kurtosis of residuals                     
   D:Based on the regression of squared residuals on squared fitted values
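
As a quick sanity check, the summary figures in the output above can be reproduced from one another. The sketch below is plain arithmetic on the reported numbers (306 observations, 4 estimated coefficients including the intercept). The variable names are my own, and small differences in the last digit come from rounding in the printout.

```python
# Figures copied from the OLS output above
n_obs = 306            # observations used for estimation
k = 4                  # estimated coefficients, including the intercept C
r_squared = 0.91421
coef_x1, se_x1 = 0.45046, 0.032602

# A t-ratio is the coefficient divided by its standard error
t_x1 = coef_x1 / se_x1

# Adjusted R-squared penalises R-squared for the number of regressors
r_bar_squared = 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - k)

# F-statistic for the joint significance of the k - 1 slope coefficients
f_stat = (r_squared / (k - 1)) / ((1 - r_squared) / (n_obs - k))

print(round(t_x1, 4), round(r_bar_squared, 5), round(f_stat, 1))
```

These line up with the reported T-Ratio for X1, the R-Bar-Squared and the F(3, 302) statistic.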

Now I have a problem with multicollinearity: my Durbin-Watson test clocks in under 0.2, when ideally it should come in at 2.0. I'm fairly certain this is because of the high level of correlation between the NASDAQ and the interest rate (-.743). However, we can deal with that later.
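
A note on the statistic itself: the Durbin-Watson value is computed from the regression residuals, and it is roughly 2(1 - rho), where rho is the first-order autocorrelation of those residuals. A value near 0 therefore points to strong positive serial correlation in the residuals, which is a separate issue from correlation between the regressors. Here is a minimal sketch of the calculation. The two residual series are made up purely for illustration; only the .19466 figure comes from the output above.

```python
def durbin_watson(residuals):
    """DW = sum of squared first differences of the residuals / sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den

# Hypothetical residuals that drift slowly (positive serial correlation)
smooth = [1.0, 1.1, 1.2, 1.1, 1.0, -1.0, -1.1, -1.2, -1.1, -1.0]
# Hypothetical residuals that flip sign every period (negative serial correlation)
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

dw_smooth = durbin_watson(smooth)            # well below 2
dw_alternating = durbin_watson(alternating)  # well above 2

# Rough first-order autocorrelation implied by the reported DW of .19466
implied_rho = 1 - 0.19466 / 2

print(dw_smooth, dw_alternating, implied_rho)
```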

So, any comments?

Anyone else willing to throw up their results?

22-02-2007 09:58PM

I notice that you are using a Log measure for X2, are you measuring the elasticity of the variable? Are you able to use logarithms for just one variable and leave the rest unadjusted?

Also, have you factored inflation into the model? Over twenty-six years the real value of money might be quite different from the nominal one.

Finally, how are you measuring interest? Are you basing it at 1980? I found that by throwing the CPI into my model, I perfected my results with an R-squared of .98, a D-W of 2.01 and perfect distribution.

However, I don't really believe there is a link between international tourism arrivals and the CPI, but rather the gradual increments of the data fooled the equation into thinking there was!
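
The "gradual increments" point can be illustrated with a small sketch. The two series below are entirely made up: each is a straight-line trend plus an unrelated wobble, and the names `arrivals` and `cpi` are just labels, not real data. In levels they look almost perfectly correlated; in period-to-period changes the link mostly disappears. Comparing levels with first differences is one common check, not the only one.

```python
import math

def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def diff(xs):
    """First differences: x[t] - x[t-1]."""
    return [b - a for a, b in zip(xs, xs[1:])]

# Hypothetical series: both trend upwards, but the wobbles are unrelated
arrivals = [10 + 0.5 * t + math.sin(t) for t in range(100)]
cpi = [100 + 2.0 * t + math.cos(1.7 * t) for t in range(100)]

corr_levels = pearson(arrivals, cpi)                # driven by the shared trend
corr_changes = pearson(diff(arrivals), diff(cpi))   # trend removed

print(corr_levels, corr_changes)
```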

22-02-2007 10:34PM

You are also bordering on a problem with heteroskedasticity. But I have serious problems with that, I think most people do.

Time Magazine · 26-02-2007 03:38PM

original_psycho wrote:

I notice that you are using a Log measure for X2, are you measuring the elasticity of the variable? Are you able to use logarithms for just one variable and leave the rest unadjusted?

My data-mining told me to do it.

After perusing the 2005 edition of the wonderful [plug]Student Economic Review[/plug], I changed my method of regression from the standard OLS format to the Cochrane-Orcutt method because it accounts for serial correlation over time.

My new R-squared is a whopping .99360 with my D-W coming in at 1.9939, almost perfect results. Whoot. As per the advice of my tutor, I changed the specific formula to:
Y = c + Log(X1) + Log(X2) + X3
Log(BP Share Price) = intercept + Log(Price of Oil) + Log(NASDAQ industrial average) + Interest rate.
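
For anyone wondering what Cochrane-Orcutt does under the hood, here is a minimal sketch of the iterative procedure. It is simplified to a single regressor (the model above has three), it runs on simulated data with made-up parameters, and it is not what Microfit actually executes, whose details may differ. The idea: estimate rho from the residuals, quasi-difference the data with it, re-run OLS, and repeat until rho settles.

```python
import random

def ols_simple(x, y):
    """OLS intercept and slope for a single regressor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def cochrane_orcutt(x, y, tol=1e-8, max_iter=100):
    """Iterate between estimating rho and re-estimating the regression."""
    a, b = ols_simple(x, y)
    rho = 0.0
    for _ in range(max_iter):
        # Residuals of the original (untransformed) equation
        e = [yi - a - b * xi for xi, yi in zip(x, y)]
        new_rho = (sum(e[t] * e[t - 1] for t in range(1, len(e)))
                   / sum(e[t - 1] ** 2 for t in range(1, len(e))))
        # Quasi-difference both series; the first observation is dropped
        xs = [x[t] - new_rho * x[t - 1] for t in range(1, len(x))]
        ys = [y[t] - new_rho * y[t - 1] for t in range(1, len(y))]
        a_star, b = ols_simple(xs, ys)
        a = a_star / (1 - new_rho)  # recover the original intercept
        converged = abs(new_rho - rho) < tol
        rho = new_rho
        if converged:
            break
    return a, b, rho

# Simulated data (hypothetical): y = 2 + 3x + u, with u following an AR(1) with rho = 0.8
rng = random.Random(0)
n = 500
x = [rng.gauss(0, 1) for _ in range(n)]
u = [rng.gauss(0, 1)]
for t in range(1, n):
    u.append(0.8 * u[t - 1] + rng.gauss(0, 1))
y = [2 + 3 * xi + ui for xi, ui in zip(x, u)]

a_hat, b_hat, rho_hat = cochrane_orcutt(x, y)
print(a_hat, b_hat, rho_hat)
```

Note that the transformed regression explains the quasi-differenced series, so its R-squared is not directly comparable with the R-squared of the original OLS regression.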

26-02-2007 03:47PM

Well done, apart from the data mining - that's just scummy!!

Got my project written up - 3200 words over 15 pages!

Just want to validate my findings, and then I can start to trim it down a bit

Absolut · 03-03-2007 01:06AM

I've just started an econometrics project on house prices in the Boston area, using this dataset by Harrison and Rubinfeld.

I'm using Stata to work with the data, regressing the median value of houses on 12 variables: crime, zoning, local industry, proximity to the Charles River, pollution levels (nitric oxide concentration), average number of rooms per house, proportion of houses built pre-1940, weighted distance to employment centres, index of accessibility to radial highways, property tax rate, pupil-teacher ratio, percentage of lower status population and proportion of black people in each area.

The latitude and longitude are also supplied, but I can't see how I could use them in any kind of regression, so I'm just ignoring them (or am I overlooking some useful application?).

Here are my initial results, regressing median value on all the variables listed above:


      Source |       SS       df       MS              Number of obs =     506
-------------+------------------------------           F( 12,   493) =  114.25
       Model |  31418.5407    12  2618.21172           Prob > F      =  0.0000
    Residual |   11297.755   493  22.9163386           R-squared     =  0.7355
-------------+------------------------------           Adj R-squared =  0.7291
       Total |  42716.2956   505   84.586724           Root MSE      =  4.7871

------------------------------------------------------------------------------
        medv |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       crime |  -.1131391    .033113    -3.42   0.001    -.1781991   -.0480791
        zone |   .0470525   .0138469     3.40   0.001     .0198463    .0742586
       indus |   .0403115   .0617074     0.65   0.514    -.0809305    .1615535
         nox |    -17.367   3.851224    -4.51   0.000    -24.93384   -9.800166
       rooms |   3.850491    .421402     9.14   0.000     3.022526    4.678457
         age |   .0027838    .013309     0.21   0.834    -.0233655     .028933
        dist |  -1.485374   .2011868    -7.38   0.000    -1.880663   -1.090085
         rad |    .328311   .0665423     4.93   0.000     .1975695    .4590526
         tax |  -.0137558   .0037657    -3.65   0.000    -.0211546    -.006357
     ptratio |  -.9909581   .1313991    -7.54   0.000    -1.249129   -.7327868
       black |   .0097415   .0027061     3.60   0.000     .0044246    .0150583
       lstat |  -.5341576   .0510716   -10.46   0.000    -.6345025   -.4338128
       _cons |   36.89196   5.146516     7.17   0.000     26.78015    47.00377
------------------------------------------------------------------------------

And then, a similar regression, but using the log of the median value, distance to employment centres, accessibility to highways and percentage of lower status population, along with the square of the number of rooms in each house:

Linear regression                                      Number of obs =     506
                                                       F( 12,   493) =  167.30
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.8027
                                                       Root MSE      =  .18375

------------------------------------------------------------------------------
             |               Robust
       lmedv |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       crime |  -.0120297   .0023321    -5.16   0.000    -.0166117   -.0074476
        zone |  -.0000506   .0003743    -0.14   0.893     -.000786    .0006849
       indus |   .0009113   .0017306     0.53   0.599     -.002489    .0043116
         nox |  -.8286227   .1619617    -5.12   0.000    -1.146843   -.5104023
      rooms2 |   .0065145   .0019371     3.36   0.001     .0027085    .0103206
         age |   .0003061    .000596     0.51   0.608    -.0008649     .001477
       ldist |    -.20047   .0412498    -4.86   0.000    -.2815171   -.1194229
        lrad |   .1027221   .0193889     5.30   0.000     .0646271    .1408172
         tax |  -.0004526   .0001099    -4.12   0.000    -.0006685   -.0002367
     ptratio |  -.0312132   .0038667    -8.07   0.000    -.0388103    -.023616
       black |   .0003787   .0001465     2.58   0.010     .0000908    .0006667
      llstat |  -.3771233   .0379444    -9.94   0.000     -.451676   -.3025706
       _cons |   4.812734   .2323958    20.71   0.000     4.356125    5.269342
------------------------------------------------------------------------------

That's only after a few mins of work, so I'm fairly sure it's not a very good model. I used the log and square transformations after testing for normality of residuals, but I'm still not sure if it's the right approach.

So, basically I've got a lot of work left to do...
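
One thing the log transformations change is how the coefficients read. With a log dependent variable, a coefficient on an untransformed regressor is approximately a proportional change, and a coefficient on a logged regressor is an elasticity. The sketch below is just arithmetic on two coefficients from the second table (crime and llstat), holding the other regressors fixed; the variable names are my own.

```python
import math

coef_crime = -0.0120297    # regressor in levels, dependent variable in logs
coef_llstat = -0.3771233   # regressor in logs, dependent variable in logs

# Log-level: a one-unit rise in crime changes medv by 100 * (exp(b) - 1) percent
crime_pct = 100 * (math.exp(coef_crime) - 1)

# Log-log: the coefficient is an elasticity, so a 10% rise in lstat
# scales medv by a factor of 1.10 ** b
lstat_pct = 100 * (1.10 ** coef_llstat - 1)

print(round(crime_pct, 2), round(lstat_pct, 2))
```

So a one-unit increase in the crime variable goes with roughly a 1.2% lower median value, and a 10% increase in the lower status share with roughly a 3.5% lower one.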

fobster · 07-03-2007 06:53PM

I've only started econometrics since early February so I might be completely off, but in relation to the BP regression model.

If BP's share price goes up this in part is explained by a rise in the NASDAQ industrial average, among other factors as mentioned.

This is my question, would simultaneous causality not exist whereby a rise in the share price of BP causes a rise in the NASDAQ?

Time Magazine · 08-03-2007 11:40AM

fobster wrote:

I've only started econometrics since early February so I might be completely off, but in relation to the BP regression model.

If BP's share price goes up this in part is explained by a rise in the NASDAQ industrial average, among other factors as mentioned.

This is my question, would simultaneous causality not exist whereby a rise in the share price of BP causes a rise in the NASDAQ?

No and yes.

No because BP is traded on the NYSE so they're not directly dependent on each other.

Yes because traders look beyond which exchange their shares are being sold on. By this I mean that if the BP share price falls by 10%, everyone gets a little bit scared, so the NASDAQ might fall 1% (obviously the figures aren't accurate, they're just illustrative). In matrix algebra this is a problem of linear dependence where the matrix of coefficients does not have full rank (I think). In econometrics lingo, you're right, there's simultaneous causality.

A similar problem exists with the relationship between the interest rate and the NASDAQ (obviously related). So yes, the model is theoretically a bit dodgy. However, imho, the level of dodginess is small relative to the importance that the general stock market buoyancy has on BP (market up -> production and consumption up -> demand for oil up). The theoretical strength of the model is ditched a little bit for accurate results (and they are pretty good).

We'll see what my lecturer thinks.
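
To give a feel for how much simultaneous causality can distort an OLS estimate, here is a small simulation. Everything in it is hypothetical: two variables that each depend on the other with a coefficient of 0.5, plus independent noise. Nothing is estimated from the BP data. With these made-up numbers the OLS slope of y on x settles near 0.8 rather than the true 0.5, because x is correlated with the error term in the y equation.

```python
import random

def ols_slope(x, y):
    """OLS slope of y on a single regressor x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

rng = random.Random(1)
beta, gamma = 0.5, 0.5   # hypothetical structural coefficients
n = 2000

xs, ys = [], []
for _ in range(n):
    u, v = rng.gauss(0, 1), rng.gauss(0, 1)
    # Solve y = beta*x + u and x = gamma*y + v jointly for x, then get y
    x = (gamma * u + v) / (1 - beta * gamma)
    y = beta * x + u
    xs.append(x)
    ys.append(y)

slope = ols_slope(xs, ys)
print(slope)  # noticeably above the true beta of 0.5
```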

27-03-2007 03:15PM

For anyone who is interested, my econometrics project is here.

Other (and better) projects can be found at the Student Economic Review website.

fobster · 27-03-2007 08:50PM

I have a question, would there be a correlation between the number of terrorist attacks and the seasons? Or did you find they were spread out over the year and not concentrated in any one season?

Do you think all terrorists seek the same level of desired disruption, deaths etc. and therefore times in the year with high levels of people, summer for example, would experience higher incidence of terrorist attacks, yes/no?

What effect, if any, would this have on the regression?

27-03-2007 10:43PM

ahh crap... real questions!

Fobster, I was expecting terrorist attacks to happen mostly in the summer when the tourist numbers are higher (purpose of terror being to terrorise and all that!), however quite a few took place in the first quarter (Jan, Feb, March).

What the model fails to capture is the type of terrorism:
For example, in the '80s the IRA would 'normally' call in a phone warning whereas the 7/7 attacks were a surprise.

This will affect people's perceptions, and thus have a different effect.

However, I did not have enough data or econometric skill to put this in the model.

fobster · 28-03-2007 01:18PM

Yeah, I was thinking about the motives behind terrorist attacks. The IRA weren't driven by the purpose of causing human suffering; they were more directed at the political establishment etc., while the opposite might be true for the more recent attacks.

So people would just accept the IRA attacks in the 80s but nowadays the attacks would, as you say, have a different effect on peoples' decisions to travel.

03-04-2007 09:52AM

Ibid, where is your project???

10-04-2007 05:45PM

Well Mr. Moderator?

(Offer open to all other Econometric Students as well)
