
Multiple regression question

MathsManiac · 2017-09-23 11:15:09



I have an intriguing question about how best to use data to make a regression-based prediction.

Scenario:

Quantity Y is known to have a linear association with quantities X1 and X2.
I have 10,000 observations of pairs (X1, Y).
For 50 of these observations, I also have an observation for X2.

I want to get as good an estimate of an unknown Y as possible from a new observation of X1 and X2.

I could ignore X2 entirely and use all 10,000 data points to fit a regression model of Y on X1, then estimate Y from that.

Or, I could ignore 9,950 of my observations and fit a regression model of Y on X1 and X2 using only the 50 complete ones.
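To make the two options concrete, here is a rough sketch in Python with NumPy. The data are simulated purely for illustration: the coefficients (2.0, 1.5, 0.8), the noise level, the new observation, and the helper names `fit_ols` and `predict` are all assumptions, not anything from my real data set.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data (every number here is an assumption).
n_total, n_full = 10_000, 50
x1 = rng.normal(size=n_total)
x2 = rng.normal(size=n_total)
y = 2.0 + 1.5 * x1 + 0.8 * x2 + rng.normal(scale=0.5, size=n_total)

# Pretend X2 was only recorded for the first 50 observations.
has_x2 = np.zeros(n_total, dtype=bool)
has_x2[:n_full] = True


def fit_ols(X, y):
    """Least-squares fit with an intercept; returns [intercept, slope(s)...]."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict(coef, *xs):
    """Evaluate intercept + sum(slope_i * x_i) for one new observation."""
    return coef[0] + sum(c * x for c, x in zip(coef[1:], xs))


# Option A: Y on X1 only, using all 10,000 points (X2 ignored).
coef_a = fit_ols(x1, y)

# Option B: Y on X1 and X2, using only the 50 complete points.
coef_b = fit_ols(np.column_stack([x1[has_x2], x2[has_x2]]), y[has_x2])

# A hypothetical new observation of (X1, X2).
new_x1, new_x2 = 0.3, -1.2
pred_a = predict(coef_a, new_x1)
pred_b = predict(coef_b, new_x1, new_x2)
print("Option A estimate:", pred_a)
print("Option B estimate:", pred_b)
```

Option A gets a tightly estimated X1 slope but knows nothing about X2; Option B uses X2 but estimates all three coefficients from only 50 points.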

In each case, I seem to be ignoring useful information.

Any ideas?

One option I'm considering is getting two separate estimates from simple regressions - one of Y on X1 using all 10,000 data points, the other of Y on X2 alone using the 50 data points - and combining them with weights proportional to the respective squared correlations.
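A minimal sketch of that weighting idea, again in Python with NumPy and on simulated data. The function names `simple_fit` and `combined_estimate` and all the numbers are made up for illustration, and this only shows the mechanics of the scheme I described, not that it is the best way to combine the two estimates.

```python
import numpy as np


def simple_fit(x, y):
    """Simple regression of y on x; returns (intercept, slope, r_squared)."""
    slope, intercept = np.polyfit(x, y, 1)  # highest-degree coefficient first
    r_squared = np.corrcoef(x, y)[0, 1] ** 2
    return intercept, slope, r_squared


def combined_estimate(fit1, fit2, new_x1, new_x2):
    """Weight the two single-predictor estimates by their squared correlations."""
    a1, b1, r2_1 = fit1
    a2, b2, r2_2 = fit2
    est1 = a1 + b1 * new_x1
    est2 = a2 + b2 * new_x2
    w1 = r2_1 / (r2_1 + r2_2)  # weights sum to 1
    return w1 * est1 + (1.0 - w1) * est2


# Simulated data (assumed coefficients and noise, for illustration only).
rng = np.random.default_rng(1)
x1 = rng.normal(size=10_000)
x2 = rng.normal(size=10_000)
y = 2.0 + 1.5 * x1 + 0.8 * x2 + rng.normal(scale=0.5, size=10_000)

fit_x1 = simple_fit(x1, y)            # Y on X1, all 10,000 points
fit_x2 = simple_fit(x2[:50], y[:50])  # Y on X2, only the 50 points with X2

print("Combined estimate:", combined_estimate(fit_x1, fit_x2, 0.3, -1.2))
```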
