Logistic Regression

Kevo · 22-09-2011 05:58PM #1

Hi,
Just a quick question about logistic regression.

I'm doing a logistic regression analysis of my data. My data is in this format (this is just a sample):

outcome   A        B        C        D
0         0.3928   0.1115   0.0653   0.0258
1         0.358    0.1671   0.0205   0.0235

A, B, C and D are the predictor scores. The outcome is 1 for success and 0 for failure.

I have successfully computed the coefficients and intercept using R:
(Intercept) -8.0831
A 1.6395
B 1.2020
C 2.6645
D 2.0608

I am now predicting success/failure based on the scores.

outcome = -8.0831 + (1.6395)A + (1.2020)B + (2.6645)C + (2.0608)D

My question is how to interpret the value of outcome. Do I round it to 0 or 1 based on its value?

e.g. 0.99 → 1
e.g. 0.20 → 0

This is what I'm doing at the moment, but I'm unsure whether I have missed a step.

Thanks
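To make the question concrete, here is a small Python sketch (an illustration, not the poster's R code) that plugs the two sample rows into the fitted equation. Both rows give a value of about -7, far outside the 0 to 1 range. This shows that the right-hand side of the equation is the log-odds of success rather than a probability, so it can't be rounded directly. The names `INTERCEPT`, `COEFS`, `rows` and `linear_predictor` are made up for this example. The coefficients were presumably fitted on the full dataset, not on these two rows.

```python
# Coefficients as reported in the post (fitted in R).
INTERCEPT = -8.0831
COEFS = {"A": 1.6395, "B": 1.2020, "C": 2.6645, "D": 2.0608}

# The two sample rows from the post (the observed outcome is not needed here).
rows = [
    {"A": 0.3928, "B": 0.1115, "C": 0.0653, "D": 0.0258},
    {"A": 0.358, "B": 0.1671, "C": 0.0205, "D": 0.0235},
]

def linear_predictor(row):
    """Intercept plus the weighted sum of the predictors (the log-odds of success)."""
    return INTERCEPT + sum(COEFS[name] * row[name] for name in COEFS)

for row in rows:
    print(round(linear_predictor(row), 4))
```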

MathsManiac · 23-09-2011 10:02PM

I'm a bit out of my comfort zone here, but I think that the predicted outcome is effectively giving you the probability of "success" (i.e. the probability of the outcome being "1").

That is, if your predicted outcome based on a particular set of input values is, say, 0.8, then it means that out of every 100 cases displaying this exact set of input characteristics, you would expect 80 of them to be 1 and 20 to be 0.

(Someone who knows more about this might correct this if I'm wrong.)
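The frequency reading above can be checked with a quick simulation. The sketch below uses a hypothetical helper, `simulate_share_of_ones`, with a fixed seed so that runs are repeatable. It assumes the cases are independent and draws 0/1 outcomes that are 1 with probability 0.8. With 100 cases, the share of 1s only lands near 0.8. With 100,000 cases it is very close to 0.8, which is what "expect 80 out of every 100" means on average.

```python
import random

def simulate_share_of_ones(p, n_cases, seed=0):
    """Draw n_cases independent 0/1 outcomes, each equal to 1 with probability p,
    and return the fraction that came out as 1."""
    rng = random.Random(seed)
    ones = sum(1 for _ in range(n_cases) if rng.random() < p)
    return ones / n_cases

print(simulate_share_of_ones(0.8, 100))      # small sample: only roughly 0.8
print(simulate_share_of_ones(0.8, 100_000))  # large sample: very close to 0.8
```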

Ostrom · 26-09-2011 06:04PM

Don't round it. When you sub in specific predictor values for A, B, C and D, the outcome is the probability of success under those conditions.

WeatherOrWhich · 28-09-2011 05:54PM

If

outcome = -8.0831 + (1.6395)A + (1.2020)B + (2.6645)C + (2.0608)D

then in R this is converted to a probability of success using

plogis(outcome)

or else you can use predict(..., type="response")
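For reference, the expression in that formula is the log-odds (logit) of success, log(p / (1 - p)). `plogis` is the logistic function 1 / (1 + exp(-x)), which maps the log-odds back to a probability between 0 and 1. For readers not using R, a minimal Python re-implementation is sketched below. Note that `plogis` and `classify` here are hand-written stand-ins, not R's own functions, and the 0.5 threshold in `classify` is only a common default. Rounding as described in the original question amounts to applying this threshold to the probability, not to the raw value of the formula.

```python
import math

def plogis(x):
    """Logistic function 1 / (1 + exp(-x)), the curve R's plogis() uses by default."""
    # Two branches so that a large |x| never overflows math.exp.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def classify(probability, threshold=0.5):
    """Turn a probability into a 0/1 prediction; 0.5 is only a conventional default."""
    return 1 if probability >= threshold else 0

for log_odds in (-7.0779, 0.0, 2.0):
    p = plogis(log_odds)
    print(log_odds, round(p, 4), classify(p))
```

Here `plogis(0.0)` returns exactly 0.5. A log-odds of about -7.08 (the value the equation gives for the first sample row) corresponds to a probability below 0.001.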

Kevo · 30-09-2011 10:56AM

Thanks, that helps a lot.

I have one last question. Does the ratio of positives to negatives in the training dataset matter? For example, 400 out of 40,000 are positives and the remaining 39,600 are negatives. I think this may cause problems, and I may need to shrink the size of the dataset.
Thanks
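The counts in this question can be put in logistic-regression terms with a little arithmetic. In the Python sketch below (variable names are illustrative), 400 positives out of 40,000 is a base rate of 1%. Its log-odds, log(0.01 / 0.99), is about -4.6, which is the intercept a logistic model with no predictors would estimate on this data. When a fitted logistic model includes an intercept, its fitted probabilities average out to the training set's share of positives. Subsampling the negatives therefore raises the model's predicted probabilities relative to the original 1% population. This is a sketch of the arithmetic only.

```python
import math

positives = 400
total = 40_000

base_rate = positives / total                      # share of cases with outcome 1
log_odds = math.log(base_rate / (1 - base_rate))   # logit of the base rate

print(base_rate)
print(round(log_odds, 3))
```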
