
Econometrics projects

  • 20-02-2007 5:42pm
    #1
    Registered Users Posts: 8,452 ✭✭✭


    Most undergrads, like myself, have to complete a project to pass any econometric modules they may have.

    This thread is for those of us who want to boast (or cry) about our results :).

    In technical lingo, my classmates and I (third-year Economics in TCD) have to complete a multivariate regression. To explain this to non-econometricians, we have to try to predict one variable (say inflation) as a function of at least two other variables (say GDP growth and the interest rate).
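
    As a rough illustration of what such a regression does, here is a minimal ordinary least squares sketch in plain Python. The numbers are invented for illustration (they are not real inflation, GDP or interest rate data), and the helper names `solve` and `ols` are hypothetical, not part of any package used in this thread.

```python
# Minimal OLS sketch: estimate y = b0 + b1*x1 + b2*x2 via the normal equations.
# All data below are made up so that the true coefficients are known.

def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def ols(rows, y):
    """rows: one tuple of regressors per observation (no constant).
    Returns [intercept, slope_1, slope_2, ...]."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    Xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

gdp_growth = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # invented
interest = [5.0, 3.0, 4.0, 1.0, 2.0, 6.0]     # invented
# Build "inflation" from a known rule so the estimates can be checked.
inflation = [1.0 + 0.5 * g - 0.2 * r for g, r in zip(gdp_growth, interest)]

coef = ols(list(zip(gdp_growth, interest)), inflation)
print([round(c, 6) for c in coef])  # close to [1.0, 0.5, -0.2] up to floating-point error
```

    Real data will not sit exactly on a plane like this, so the fitted coefficients come with standard errors, which is what the output tables later in the thread report.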

    My project is trying to track the BP share price (as measured on the NASDAQ) since the '80s. Because of availability of data and so on, I'm using monthly data from May 1981 until November 2006. This returns a fairly impressive 307 observations.

    My explanatory variables are the price of oil, the US Fed Funds Rate (that's the interest rate) and the NASDAQ Industrial Average (that's basically how the average NASDAQ share is doing this month). I might implement a dummy variable for the war in Iraq (with yes=1 and no=0), but I might not be arsed since I'm fairly confident of at least a II.1 from this project and I've exams in three weeks.

    The basic theory behind the project is that
    1. a rise in the price of oil is good for BP's share price - positively related
    2. a rise in the interest rate is bad for BP's share price - negatively related
    3. a rise in the NASDAQ suggests a strong economy - positively related

    So without further ado:

    [Figure: shareprice.GIF, actual vs. predicted BP share price]

    The blue line is what actually happened, whereas the green line is what my formula predicted given the price of oil, the interest rate, and (because of data-mining, the log of) the NASDAQ Industrial Average.

    Not bad, eh?

    Using Microfit, some statistical stuff:
                      Estimated Correlation Matrix of Variables                   
                                                                                   
    *********************************************************************
                     Y         X1       LX2        X3                              
     Y             1.0000    .48956    .91726   -.59064                            
                                                                                   
     X1            .48956    1.0000    .25682  .3249E-3                            
                                                                                   
     LX2           .91726    .25682    1.0000   -.74337                            
                                                                                   
     X3           -.59064  .3249E-3   -.74337    1.0000                            
                                                                                   
    *********************************************************************
    
                           Ordinary Least Squares Estimation                       
    *******************************************************************************
     Dependent variable is Y                                                       
     306 observations used for estimation from 1981M6  to 2006M11                  
    *******************************************************************************
     Regressor              Coefficient       Standard Error         T-Ratio[Prob] 
     C                       -162.4481             5.7398           -28.3021[.000] 
     X1                         .45046            .032602            13.8169[.000] 
     LX2                       26.6999             .78879            33.8491[.000] 
     X3                         .59924             .16452             3.6424[.000] 
    *******************************************************************************
     R-Squared                     .91421   R-Bar-Squared                   .91335 
     S.E. of Regression            5.8931   F-stat.    F(  3, 302)    1072.7[.000] 
     Mean of Dependent Variable   28.1177   S.D. of Dependent Variable     20.0200 
     Residual Sum of Squares      10487.9   Equation Log-likelihood      -974.9568 
     Akaike Info. Criterion     -978.9568   Schwarz Bayesian Criterion   -986.4040 
     DW-statistic                  .19466                                          
    *******************************************************************************
                                                                                   
                                                                                   
                                   Diagnostic Tests                                
    *******************************************************************************
    *    Test Statistics  *        LM Version        *         F Version          *
    *******************************************************************************
    *                     *                          *                            *
    * A:Serial Correlation*CHSQ(  12)= 251.2654[.000]*F(  12, 290)= 110.9398[.000]*
    *                     *                          *                            *
    * B:Functional Form   *CHSQ(   1)=  20.8923[.000]*F(   1, 301)=  22.0569[.000]*
    *                     *                          *                            *
    * C:Normality         *CHSQ(   2)=   3.1191[.210]*       Not applicable       *
    *                     *                          *                            *
    * D:Heteroscedasticity*CHSQ(   1)=   .41745[.518]*F(   1, 304)=   .41529[.520]*
    *******************************************************************************
       A:Lagrange multiplier test of residual serial correlation                   
       B:Ramsey's RESET test using the square of the fitted values                 
       C:Based on a test of skewness and kurtosis of residuals                     
       D:Based on the regression of squared residuals on squared fitted values     
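
    The summary statistics in the table above can be cross-checked from R-squared, the sample size and the log-likelihood. This is only a sketch: the AIC and SBC formulas used here (log-likelihood minus a penalty) are an assumption inferred from the fact that they reproduce the printed numbers, and other packages use different sign conventions.

```python
import math

n, k = 306, 4            # observations and estimated coefficients (C, X1, LX2, X3)
r2 = 0.91421             # R-Squared as printed
loglik = -974.9568       # Equation Log-likelihood as printed

# Adjusted R-squared penalises extra regressors.
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)

# F-test that all slope coefficients are jointly zero.
f_stat = (r2 / (k - 1)) / ((1 - r2) / (n - k))

# Assumed convention: criterion = log-likelihood minus penalty (larger is better).
aic = loglik - k
sbc = loglik - 0.5 * k * math.log(n)

print(round(adj_r2, 4), round(f_stat, 1), round(aic, 4), round(sbc, 3))
```

    The results line up with the printed R-Bar-Squared, F-stat, Akaike and Schwarz values up to rounding of the inputs.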
    
    
    

    Now I have a problem: my Durbin-Watson statistic clocks in under 0.2, when ideally it should come in at around 2.0 :(. A value that low points to serial correlation in the residuals (diagnostic test A above says the same thing). There's also a multicollinearity worry because of the high level of correlation between the NASDAQ and the interest rate (-.743). However, we can deal with that later ;).
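
    For reference, the Durbin-Watson statistic is computed from the regression residuals, and a value near 0 is conventionally read as positive first-order serial correlation, while multicollinearity is about correlation among the regressors themselves. The sketch below uses toy residuals (invented) plus the two figures quoted in the output above; the pairwise variance inflation factor is only a rough lower bound, since a proper VIF regresses each regressor on all the others.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den

# Toy residuals: alternating signs (negative autocorrelation) vs. constant (strongly positive).
dw_alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])   # 3.0
dw_flat = durbin_watson([1.0, 1.0, 1.0, 1.0])            # 0.0

# Rule of thumb: DW is roughly 2 * (1 - rho), so rho is roughly 1 - DW / 2.
implied_rho = 1 - 0.19466 / 2          # about 0.90 for the reported DW

# Pairwise VIF from the reported correlation between LX2 and X3.
r = -0.74337
pairwise_vif = 1 / (1 - r ** 2)        # about 2.2

print(dw_alternating, dw_flat, round(implied_rho, 3), round(pairwise_vif, 2))
```

    An implied autocorrelation of about 0.9 is large, whereas a pairwise VIF of about 2.2 is modest by the usual rules of thumb.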

    So, any comments?

    Anyone else willing to throw up their results?


Comments

  • Posts: 5,589 ✭✭✭ [Deleted User]


    I notice that you are using a Log measure for X2, are you measuring the elasticity of the variable? Are you able to use logarithms for just one variable and leave the rest unadjusted?
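
    On the mechanics of logging only one regressor: in a level-log specification the coefficient on the logged variable is not itself an elasticity. A short sketch, using the LX2 coefficient and the mean of Y from the first post's output (the interpretation is the standard textbook one, not something the original poster stated):

```python
import math

beta_lx2 = 26.6999    # coefficient on log(NASDAQ) from the OLS output
mean_y = 28.1177      # mean of the dependent variable from the same output

# A 1% rise in X2 changes Y by beta * ln(1.01) units, holding the other regressors fixed.
change_for_1pct = beta_lx2 * math.log(1.01)

# The elasticity varies with Y in this specification; evaluated at the mean of Y it is beta / mean(Y).
elasticity_at_mean = beta_lx2 / mean_y

print(round(change_for_1pct, 4), round(elasticity_at_mean, 4))
```

    Only when both the dependent variable and the regressor are in logs can the coefficient be read directly as a constant elasticity.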

    Also, have you factored inflation into the model? Over twenty six years the real value of money might be quite different from the nominal one.
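
    Adjusting for inflation usually means deflating each nominal value by a price index. A minimal sketch with invented prices and an invented CPI series (base period = 100); the function name `to_real` is hypothetical:

```python
def to_real(nominal, cpi, base_cpi=100.0):
    """Convert nominal values to base-period prices using a price index."""
    return [p * base_cpi / c for p, c in zip(nominal, cpi)]

nominal_price = [10.0, 12.0, 15.0]   # invented nominal share prices
cpi = [100.0, 120.0, 125.0]          # invented price index

real_price = to_real(nominal_price, cpi)
print(real_price)  # the second value is unchanged in real terms, the third rises to 12
```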

    Finally, how are you measuring interest? Are you basing it at 1980? I found that by throwing the CPI into my model, I perfected my results with an R-squared of .98, a D-W of 2.01 and a perfect distribution.

    However, I don't really believe there is a link between international tourism arrivals and the CPI; rather, the gradual increments of the data fooled the equation into thinking there was!
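
    The effect described here, where two series that both drift upwards look strongly related, can be reproduced with simulated data. In this sketch both series are generated independently apart from a shared time trend; the names `arrivals` and `cpi` are just labels for invented numbers, not the actual project data.

```python
import random

def pearson(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def diff(series):
    """Period-to-period changes."""
    return [b - a for a, b in zip(series, series[1:])]

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 200
arrivals = [100 + 2.0 * t + rng.gauss(0, 5) for t in range(n)]  # trend + independent noise
cpi = [50 + 1.0 * t + rng.gauss(0, 5) for t in range(n)]        # trend + independent noise

r_levels = pearson(arrivals, cpi)                 # dominated by the common trend
r_changes = pearson(diff(arrivals), diff(cpi))    # trend removed by differencing

print(round(r_levels, 3), round(r_changes, 3))
```

    The correlation in levels comes out very high even though the noise terms are unrelated, while the correlation in changes is much weaker, which is one common way to check whether a fit is driven by trends alone.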


  • Posts: 5,589 ✭✭✭ [Deleted User]


    You are also bordering on a problem with heteroskedasticity. But I have serious problems with that; I think most people do.


  • Registered Users Posts: 8,452 ✭✭✭Time Magazine


    [Deleted User] wrote:
    I notice that you are using a Log measure for X2, are you measuring the elasticity of the variable? Are you able to use logarithms for just one variable and leave the rest unadjusted?
    My data-mining told me to do it.

    After perusing the 2005 edition of the wonderful [plug]Student Economic Review[/plug], I changed my method of regression from the standard OLS format to the Cochrane-Orcutt method because it accounts for serial correlation over time.

    My new r-squared is a whopping .99360 with my D-W coming in at 1.9939, almost perfect results. Whoot. As per advice of my tutor, I changed the specific formula to
    Log(Y) = c + Log(X1) + Log(X2) + X3
    Log(BP Share Price) = intercept + Log(Price of Oil) + Log(NASDAQ industrial average) + Interest rate.
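
    For anyone curious what the Cochrane-Orcutt method mentioned above does, here is a minimal single-regressor sketch on simulated data with AR(1) errors. It is not the Microfit implementation and not the BP data: the series, the true values (intercept 1, slope 2, rho 0.8) and the helper names are all assumptions made for illustration.

```python
import math
import random

def fit_line(x, y):
    """Simple OLS of y on a constant and x; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def cochrane_orcutt(x, y, max_iter=50, tol=1e-8):
    """Iterate: estimate rho from residuals, quasi-difference the data, re-fit."""
    a, b = fit_line(x, y)
    rho = 0.0
    for _ in range(max_iter):
        e = [yi - a - b * xi for xi, yi in zip(x, y)]
        # Regress e_t on e_{t-1} (no constant) to estimate rho.
        new_rho = (sum(e[t] * e[t - 1] for t in range(1, len(e)))
                   / sum(v * v for v in e[:-1]))
        # Quasi-difference: z*_t = z_t - rho * z_{t-1}; the first observation is dropped.
        x_star = [x[t] - new_rho * x[t - 1] for t in range(1, len(x))]
        y_star = [y[t] - new_rho * y[t - 1] for t in range(1, len(y))]
        a_star, b = fit_line(x_star, y_star)
        a = a_star / (1 - new_rho)   # recover the intercept of the original equation
        converged = abs(new_rho - rho) < tol
        rho = new_rho
        if converged:
            break
    return a, b, rho

# Simulated data: y = 1 + 2x + u, with u following an AR(1) process with rho = 0.8.
rng = random.Random(0)
n = 300
x = [10 * math.sin(t / 5) for t in range(n)]
u = [0.0]
for _ in range(n - 1):
    u.append(0.8 * u[-1] + rng.gauss(0, 1))
y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]

a_hat, b_hat, rho_hat = cochrane_orcutt(x, y)
print(round(b_hat, 3), round(rho_hat, 3))
```

    One caveat: the R-squared from the quasi-differenced regression is not directly comparable to the R-squared from plain OLS, because the dependent variable has been transformed.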


  • Posts: 5,589 ✭✭✭ [Deleted User]


    Well done, apart from the data-mining - that's just scummy!!

    Got my project written up - 3200 words over 15 pages!

    Just want to validate my findings, and then I can start to trim it down a bit


  • Closed Accounts Posts: 208 ✭✭Absolut


    I've just started an econometrics project on house prices in the Boston area, using this dataset by Harrison and Rubinfeld.

    I'm using Stata to work with the data, regressing the median value of houses on 12 variables: crime, zoning, local industry, proximity to the Charles River, pollution levels (Nitric Oxide concentration), average number of rooms per house, proportion of houses built pre 1940, weighted distance to employment centres, index of accessibility to radial highways, property tax rate, pupil-teacher ratio, percentage of lower status population and proportion of black people in each area.

    The latitude and longitude is also supplied, but I can't see how I could use this in any kind of regression, so I'm just ignoring it (or am I overlooking some useful application?).

    Here are my initial results, regressing median value on all the variables listed above:
    
          Source |       SS       df       MS              Number of obs =     506
    -------------+------------------------------           F( 12,   493) =  114.25
           Model |  31418.5407    12  2618.21172           Prob > F      =  0.0000
        Residual |   11297.755   493  22.9163386           R-squared     =  0.7355
    -------------+------------------------------           Adj R-squared =  0.7291
           Total |  42716.2956   505   84.586724           Root MSE      =  4.7871
    
    ------------------------------------------------------------------------------
            medv |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           crime |  -.1131391    .033113    -3.42   0.001    -.1781991   -.0480791
            zone |   .0470525   .0138469     3.40   0.001     .0198463    .0742586
           indus |   .0403115   .0617074     0.65   0.514    -.0809305    .1615535
             nox |    -17.367   3.851224    -4.51   0.000    -24.93384   -9.800166
           rooms |   3.850491    .421402     9.14   0.000     3.022526    4.678457
             age |   .0027838    .013309     0.21   0.834    -.0233655     .028933
            dist |  -1.485374   .2011868    -7.38   0.000    -1.880663   -1.090085
             rad |    .328311   .0665423     4.93   0.000     .1975695    .4590526
             tax |  -.0137558   .0037657    -3.65   0.000    -.0211546    -.006357
         ptratio |  -.9909581   .1313991    -7.54   0.000    -1.249129   -.7327868
           black |   .0097415   .0027061     3.60   0.000     .0044246    .0150583
           lstat |  -.5341576   .0510716   -10.46   0.000    -.6345025   -.4338128
           _cons |   36.89196   5.146516     7.17   0.000     26.78015    47.00377
    ------------------------------------------------------------------------------
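
    The header statistics in the output above follow from the sums of squares and degrees of freedom. A short sketch that recomputes them from the printed figures (standard OLS formulas; the variable names are my own):

```python
import math

ss_model = 31418.5407    # Model SS as printed
ss_resid = 11297.755     # Residual SS as printed
n, k = 506, 12           # observations and slope coefficients (constant excluded)

ss_total = ss_model + ss_resid
df_resid = n - k - 1     # 493

r2 = ss_model / ss_total
adj_r2 = 1 - (1 - r2) * (n - 1) / df_resid
f_stat = (ss_model / k) / (ss_resid / df_resid)
root_mse = math.sqrt(ss_resid / df_resid)

# Each t statistic is the coefficient divided by its standard error, e.g. for rooms:
t_rooms = 3.850491 / 0.421402

print(round(r2, 4), round(adj_r2, 4), round(f_stat, 2), round(root_mse, 4), round(t_rooms, 2))
```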
    

    And then, a similar regression, but using the log of the median value, distance to employment centres, accessibility to highways and percentage of lower status population, along with the square of the number of rooms in each house:
    Linear regression                                      Number of obs =     506
                                                           F( 12,   493) =  167.30
                                                           Prob > F      =  0.0000
                                                           R-squared     =  0.8027
                                                           Root MSE      =  .18375
    
    ------------------------------------------------------------------------------
                 |               Robust
           lmedv |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           crime |  -.0120297   .0023321    -5.16   0.000    -.0166117   -.0074476
            zone |  -.0000506   .0003743    -0.14   0.893     -.000786    .0006849
           indus |   .0009113   .0017306     0.53   0.599     -.002489    .0043116
             nox |  -.8286227   .1619617    -5.12   0.000    -1.146843   -.5104023
          rooms2 |   .0065145   .0019371     3.36   0.001     .0027085    .0103206
             age |   .0003061    .000596     0.51   0.608    -.0008649     .001477
           ldist |    -.20047   .0412498    -4.86   0.000    -.2815171   -.1194229
            lrad |   .1027221   .0193889     5.30   0.000     .0646271    .1408172
             tax |  -.0004526   .0001099    -4.12   0.000    -.0006685   -.0002367
         ptratio |  -.0312132   .0038667    -8.07   0.000    -.0388103    -.023616
           black |   .0003787   .0001465     2.58   0.010     .0000908    .0006667
          llstat |  -.3771233   .0379444    -9.94   0.000     -.451676   -.3025706
           _cons |   4.812734   .2323958    20.71   0.000     4.356125    5.269342
    ------------------------------------------------------------------------------
    

    That's only after a few mins of work, so I'm fairly sure it's not a very good model. I used the log and square transformations after testing for normality of residuals, but I'm still not sure if it's the right approach.
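
    One thing the log transformations change is how the coefficients are read. A small sketch using two coefficients from the second table, under the usual textbook interpretation of log-level and log-log terms (an assumption on my part, since the post does not interpret them):

```python
import math

# Log-level term: lmedv on crime. A one-unit rise in crime changes medv by
# 100 * (exp(b) - 1) percent, which is close to 100 * b when b is small.
b_crime = -0.0120297
pct_change_per_unit_crime = 100 * (math.exp(b_crime) - 1)

# Log-log term: lmedv on llstat. The coefficient is an elasticity, so a 10% rise
# in lstat changes medv by 100 * (1.10 ** b - 1) percent.
b_llstat = -0.3771233
pct_change_for_10pct_lstat = 100 * (1.10 ** b_llstat - 1)

print(round(pct_change_per_unit_crime, 3), round(pct_change_for_10pct_lstat, 3))
```

    Because the dependent variable differs between the two regressions (medv vs. its log), their R-squared values are not directly comparable either.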

    So, basically I've got a lot of work left to do...


  • Registered Users Posts: 947 ✭✭✭fobster


    I've only started econometrics since early February so I might be completely off, but in relation to the BP regression model:

    If BP's share price goes up this in part is explained by a rise in the NASDAQ industrial average, among other factors as mentioned.

    This is my question, would simultaneous causality not exist whereby a rise in the share price of BP causes a rise in the NASDAQ?


  • Registered Users Posts: 8,452 ✭✭✭Time Magazine


    fobster wrote:
    I've only started econometrics since early February so I might be completely off, but in relation to the BP regression model:

    If BP's share price goes up this in part is explained by a rise in the NASDAQ industrial average, among other factors as mentioned.

    This is my question, would simultaneous causality not exist whereby a rise in the share price of BP causes a rise in the NASDAQ?
    No and yes.

    No because BP is traded on the NYSE so they're not directly dependent on each other.

    Yes because traders look beyond which exchange their shares are being sold on. By this I mean that if the BP share price falls by 10%, everyone gets a little bit scared, so the NASDAQ might fall 1% (obviously the figures aren't accurate, they're just illustrative). In matrix algebra this is a problem of linear dependence where the matrix of coefficients does not have full rank (I think :o). In econometrics lingo, you're right, there's simultaneous causality.

    A similar problem exists with the relationship between the interest rate and the NASDAQ (obviously related). So yes, the model is theoretically a bit dodgy. However, imho, the level of dodginess is small relative to the importance that the general stock market buoyancy has on BP (market up -> production and consumption up -> demand for oil up). The theoretical strength of the model is ditched a little bit for accurate results (and they are pretty good).

    We'll see what my lecturer thinks :D.


  • Posts: 5,589 ✭✭✭ [Deleted User]


    For anyone who is interested, my econometrics project is here.

    Other (and better) projects can be found at the Student Economic Review website.


  • Registered Users Posts: 947 ✭✭✭fobster


    I have a question, would there be a correlation between the number of terrorist attacks and the seasons? Or did you find they were spread out over the year and not concentrated in any one season?

    Do you think all terrorists seek the same level of desired disruption, deaths etc. and therefore times in the year with high levels of people, summer for example, would experience higher incidence of terrorist attacks, yes/no?

    What effect, if any, would this have on the regression?


  • Posts: 5,589 ✭✭✭ [Deleted User]


    ahh crap... real questions!

    Fobster, I was expecting terrorist attacks to happen mostly in the summer when the tourist numbers are higher (purpose of terror being to terrorise and all that!); however, quite a few took place in the first quarter (Jan, Feb, March).

    What the model fails to capture is the type of terrorism:
    For example, in the '80s the IRA would 'normally' call in a phone warning whereas the 7/7 attacks were a surprise.

    This will affect people's perceptions, and thus have a different effect.

    However, I did not have enough data or econometric skill to put this in the model.


  • Registered Users Posts: 947 ✭✭✭fobster


    Yeah, I was thinking about the motives behind terrorist attacks. The IRA weren't driven by the purpose of causing human suffering; they were more directed at the political establishment etc., while the opposite might be true for the more recent attacks.

    So people would just accept the IRA attacks in the '80s, but nowadays the attacks would, as you say, have a different effect on people's decisions to travel.


  • Posts: 5,589 ✭✭✭ [Deleted User]


    Ibid, where is your project???


  • Posts: 5,589 ✭✭✭ [Deleted User]


    Well Mr. Moderator?

    (Offer open to all other Econometric Students as well)

