Multiple regression question

  • 23-09-2017 12:15PM
    #1


    I have an intriguing question about how best to use data to make a regression-based prediction.

    Scenario:

    Quantity Y is known to have a linear association with quantities X1 and X2.
    I have 10,000 observations of pairs (X1, Y).
    For 50 of these observations, I also have an observation for X2.

    I want to get as good an estimate of an unknown Y as possible from a new observation of X1 and X2.

    I could ignore X2 entirely and use my 10,000 data points as the basis of a regression model for Y on X1 to get an estimate.

    Or, I could ignore 9,950 of my observations and create a regression model for Y on X1 and X2 from the remaining 50.

    In each case, I seem to be ignoring useful information.
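
    To make the two options concrete, here is a rough sketch on made-up data. Everything in it is an assumption for illustration only: the coefficients, the noise levels, the correlation between X1 and X2, and the helper names (fit_ols, predict_ols, rmse, simulate) are all hypothetical. Which option predicts better will depend on how much X2 really contributes, so this shows the mechanics rather than an answer.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit of y on the columns of X, with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef  # [intercept, slope_1, ...]


def predict_ols(coef, X):
    """Predictions from coefficients returned by fit_ols."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef


def rmse(y_true, y_pred):
    """Root-mean-square prediction error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def simulate(n, rng):
    """Hypothetical data: Y linear in X1 and X2, with X1 and X2 correlated."""
    x1 = rng.normal(size=n)
    x2 = 0.5 * x1 + rng.normal(size=n)
    y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)
    return x1, x2, y


rng = np.random.default_rng(0)
x1, x2, y = simulate(10_000, rng)
sub = slice(0, 50)  # pretend X2 was only recorded for these 50 rows

# Option A: Y on X1 alone, all 10,000 observations.
coef_a = fit_ols(x1[:, None], y)

# Option B: Y on X1 and X2, only the 50 complete observations.
coef_b = fit_ols(np.column_stack([x1[sub], x2[sub]]), y[sub])

# Compare both on fresh simulated observations.
x1_new, x2_new, y_new = simulate(2_000, rng)
rmse_a = rmse(y_new, predict_ols(coef_a, x1_new[:, None]))
rmse_b = rmse(y_new, predict_ols(coef_b, np.column_stack([x1_new, x2_new])))

print("Option A (X1 only, n=10000) RMSE:", round(rmse_a, 3))
print("Option B (X1 and X2, n=50)  RMSE:", round(rmse_b, 3))
```

    In this particular setup X2 carries a lot of the signal, so the 50-point model comes out ahead. With a weaker X2 effect or noisier data the comparison could easily go the other way.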

    Any ideas?

    One option I'm considering is to get two separate estimates from simple regressions, one of Y on X1 using all 10,000 data points and the other of Y on X2 alone using the 50 data points, and then combine them, weighting each in proportion to its squared correlation with Y.
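
    A minimal sketch of that weighting idea follows, again on invented data. The function names (simple_fit, combined_estimate) and the simulated numbers are hypothetical, and the weighting rule is just the heuristic described above, not an established estimator. In particular it takes no account of any correlation between X1 and X2.

```python
import numpy as np


def simple_fit(x, y):
    """Simple regression of y on x; returns intercept, slope and r squared."""
    slope, intercept = np.polyfit(x, y, 1)  # highest degree first
    r = np.corrcoef(x, y)[0, 1]
    return intercept, slope, r ** 2


def combined_estimate(x1_all, y_all, x2_sub, y_sub, x1_new, x2_new):
    """Blend two simple-regression estimates, weighted by squared correlation."""
    a1, b1, r2_1 = simple_fit(x1_all, y_all)   # Y on X1, all observations
    a2, b2, r2_2 = simple_fit(x2_sub, y_sub)   # Y on X2, complete cases only
    w1 = r2_1 / (r2_1 + r2_2)
    w2 = 1.0 - w1
    est1 = a1 + b1 * x1_new
    est2 = a2 + b2 * x2_new
    return w1 * est1 + w2 * est2, (w1, w2), (est1, est2)


# Hypothetical data, same shape as the scenario: 10,000 (X1, Y) pairs,
# with X2 available for only the first 50.
rng = np.random.default_rng(1)
n = 10_000
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

estimate, (w1, w2), (est1, est2) = combined_estimate(
    x1, y, x2[:50], y[:50], x1_new=0.8, x2_new=-0.3
)
print("weights:", round(w1, 3), round(w2, 3))
print("estimates:", round(est1, 3), round(est2, 3), "combined:", round(estimate, 3))
```

    Because the weights are non-negative and sum to 1, the combined estimate always lies between the two individual estimates.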

