
Multiple regression question

  • 23-09-2017 11:15am
    #1
    Registered Users, Registered Users 2 Posts: 1,595 ✭✭✭


    I have an intriguing question about how best to use data to make a regression-based prediction.

    Scenario:

    Quantity Y is known to have a linear association with quantities X1 and X2.
    I have 10,000 observations of pairs (X1, Y).
    For 50 of these observations, I also have an observation for X2.

    I want to get as good an estimate as possible of an unknown Y from a new observation of X1 and X2.

    I could ignore X2 entirely and use all 10,000 data points to fit a regression of Y on X1.

    Or I could ignore 9,950 of my observations and fit a regression of Y on X1 and X2 using only the 50 complete ones.

    In each case, I seem to be ignoring useful information.
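To make the trade-off concrete, here is a minimal sketch that fits both models on simulated data and compares them on fresh observations. Everything about the simulation is an assumption for illustration only: the coefficients, the noise level, the correlation between X1 and X2, and the helper names (`simulate`, `fit_ols`, `predict`, `rmse`) are hypothetical and not taken from my real data.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n, rng):
    # Hypothetical data-generating process: Y is linear in X1 and X2.
    x1 = rng.normal(size=n)
    x2 = 0.5 * x1 + rng.normal(size=n)  # assume X2 is partly correlated with X1
    y = 1.0 + 2.0 * x1 + 1.5 * x2 + rng.normal(size=n)
    return x1, x2, y

def fit_ols(columns, y):
    # Ordinary least squares with an intercept; returns the coefficient vector.
    X = np.column_stack([np.ones(len(y))] + list(columns))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def predict(beta, columns):
    X = np.column_stack([np.ones(len(columns[0]))] + list(columns))
    return X @ beta

def rmse(y_true, y_hat):
    return float(np.sqrt(np.mean((y_true - y_hat) ** 2)))

# 10,000 observations of (X1, Y); X2 is treated as observed for only 50 of them.
n_total, n_complete = 10_000, 50
x1, x2, y = simulate(n_total, rng)
has_x2 = np.zeros(n_total, dtype=bool)
has_x2[rng.choice(n_total, size=n_complete, replace=False)] = True

# Option A: ignore X2, regress Y on X1 using every observation.
beta_a = fit_ols([x1], y)

# Option B: regress Y on X1 and X2 using only the 50 complete observations.
beta_b = fit_ols([x1[has_x2], x2[has_x2]], y[has_x2])

# Compare out-of-sample error on fresh draws from the same assumed process.
x1_new, x2_new, y_new = simulate(5_000, rng)
rmse_a = rmse(y_new, predict(beta_a, [x1_new]))
rmse_b = rmse(y_new, predict(beta_b, [x1_new, x2_new]))
print(f"Y ~ X1       (n={n_total}): RMSE = {rmse_a:.3f}")
print(f"Y ~ X1 + X2  (n={n_complete}): RMSE = {rmse_b:.3f}")
```

Which model comes out ahead depends entirely on the assumed numbers, in particular on how much of Y's variation X2 explains beyond X1 and on how noisy a 50-point fit is.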

    Any ideas?

    One option I'm considering is to get two separate estimates from simple regressions, one of Y on X1 using all 10,000 data points and the other of Y on X2 alone using the 50 data points, and then combine them with weights proportional to the respective squared correlations.
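The mechanics of that weighting idea could be sketched as follows. Again the data are simulated under assumed coefficients and noise, and the names (`simulate`, `w1`, `w2`, `y_hat_combined`) are hypothetical. The sketch only shows how the combination would be computed; it does not establish that the weighting is a sound estimator.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(n, rng):
    # Hypothetical data-generating process, for illustration only.
    x1 = rng.normal(size=n)
    x2 = 0.5 * x1 + rng.normal(size=n)
    y = 1.0 + 2.0 * x1 + 1.5 * x2 + rng.normal(size=n)
    return x1, x2, y

x1, x2, y = simulate(10_000, rng)
complete = rng.choice(10_000, size=50, replace=False)  # rows where X2 is known

# Simple regression of Y on X1 using all 10,000 points.
slope1, intercept1 = np.polyfit(x1, y, 1)
r1_sq = np.corrcoef(x1, y)[0, 1] ** 2

# Simple regression of Y on X2 using only the 50 complete points.
slope2, intercept2 = np.polyfit(x2[complete], y[complete], 1)
r2_sq = np.corrcoef(x2[complete], y[complete])[0, 1] ** 2

# Weights proportional to the squared correlations, normalised to sum to 1.
w1 = r1_sq / (r1_sq + r2_sq)
w2 = r2_sq / (r1_sq + r2_sq)

# Combined prediction for new observations of (X1, X2).
x1_new, x2_new, y_new = simulate(1_000, rng)
y_hat_1 = intercept1 + slope1 * x1_new
y_hat_2 = intercept2 + slope2 * x2_new
y_hat_combined = w1 * y_hat_1 + w2 * y_hat_2

rmse_combined = float(np.sqrt(np.mean((y_new - y_hat_combined) ** 2)))
print(f"w1 = {w1:.3f}, w2 = {w2:.3f}, combined RMSE = {rmse_combined:.3f}")
```

Running the same out-of-sample comparison against the two single-model options would show whether the combination actually helps for a given set of assumptions.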

