Logistic Regression

  • 22-09-2011 4:58pm
    #1
    Registered Users, Posts: 427


    Hi,
    Just a quick question about logistic regression.

    I'm doing a logistic regression analysis of my data. My data is in the following format (this is just a sample):
    outcome  A       B       C       D
    0        0.3928  0.1115  0.0653  0.0258
    1        0.358   0.1671  0.0205  0.0235
    
    A, B, C and D are predictions. 0 and 1 indicate success or failure.


    I have successfully computed the coefficients and intercept using R:
    (Intercept) -8.0831
    A 1.6395
    B 1.2020
    C 2.6645
    D 2.0608



    I am now predicting success/failure based on the scores.

    outcome = -8.0831 + (1.6395)A + (1.2020)B + (2.6645)C + (2.0608)D

    My question is: how do I interpret the value of outcome? Do I round it to 0 or 1 based on its value?

    e.g. 0.99 = 1
    e.g. 0.20 = 0

    This is what I'm doing at the moment, but I'm unsure if I have missed a step.

    Thanks


Comments

  • MathsManiac (Registered Users, Posts: 1,595)


    I'm a bit out of my comfort zone here, but I think that the predicted outcome is effectively giving you the probability of "success" (i.e. probability of the outcome being "1".)

    That is, if your predicted outcome based on a particular set of input values is, say, 0.8, then it means that out of every 100 cases displaying this exact set of input characteristics, you would expect 80 of them to be 1 and 20 to be 0.

    (Someone who knows more about this might correct this if I'm wrong.)


  • Ostrom (Registered Users, Posts: 3,483)


    Don't round it. When you sub in specific predictor values for A, B, C and D and convert the result from log-odds with the inverse logit, you get the probability of success under those conditions.
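
    To spell out the distinction: the value of the linear formula in the question is on the log-odds scale, and the probability comes from passing it through the logistic (inverse logit) function. A short derivation, writing $\eta$ for the formula's value and $p$ for the probability of outcome 1 (both symbols are introduced here for illustration and do not appear in the original posts):

```latex
\begin{aligned}
\eta &= -8.0831 + 1.6395\,A + 1.2020\,B + 2.6645\,C + 2.0608\,D \\
\log\frac{p}{1-p} &= \eta \\
p &= \frac{e^{\eta}}{1+e^{\eta}} = \frac{1}{1+e^{-\eta}}
\end{aligned}
```

    So $\eta = 0$ corresponds to $p = 0.5$, and a large negative $\eta$ corresponds to $p$ close to 0.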


  • WeatherOrWhich (Registered Users, Posts: 12)


    if

    outcome = -8.0831 + (1.6395)A + (1.2020)B + (2.6645)C + (2.0608)D

    then in R this is converted to a probability of success using

    plogis(outcome)

    or else you can use predict(..., type="response")
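
    A minimal sketch of the `plogis()` route, using the coefficients and the two sample rows from the question. The data frame name `scores` and the 0.5 cut-off are assumptions made for illustration; nothing in the thread prescribes a particular threshold.

```r
# Fitted coefficients as reported in the question.
coefs <- c("(Intercept)" = -8.0831, A = 1.6395, B = 1.2020, C = 2.6645, D = 2.0608)

# The two sample rows from the question (hypothetical data frame name).
scores <- data.frame(
  A = c(0.3928, 0.3580),
  B = c(0.1115, 0.1671),
  C = c(0.0653, 0.0205),
  D = c(0.0258, 0.0235)
)

# Linear predictor: this is on the log-odds scale, not a probability.
X <- cbind(1, as.matrix(scores))
log_odds <- as.vector(X %*% coefs)

# The inverse logit turns log-odds into a probability of outcome == 1.
prob <- plogis(log_odds)

# Only if a hard 0/1 label is needed: compare the probability with a cut-off.
# 0.5 is an assumed default; another threshold may suit the problem better.
label <- as.integer(prob >= 0.5)

print(data.frame(log_odds, prob, label))
```

    For these two rows the log-odds come out at roughly -7, so rounding the raw formula value would not give a usable 0/1 answer. With a fitted model object (say a hypothetical `fit` returned by `glm(..., family = binomial)`), `predict(fit, newdata = scores, type = "response")` should return the same probabilities, up to the rounding of the printed coefficients.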


  • Kevo (Registered Users, Posts: 427)


    Thanks, that helps a lot.

    I have one last question. Does the ratio of positives to negatives in the training dataset matter? For example, 400 out of 40,000 are positives and the remaining 39,600 are negatives. I think this may cause problems and I may need to shrink the size of the dataset.
    Thanks
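
    One property of logistic regression bears on this last question, though the thread itself does not settle it. If the dataset is shrunk by keeping all positives and only a random fraction of the negatives, the slope coefficients still estimate the same quantities and only the intercept shifts, by the log of the ratio of the sampling rates. A rough sketch of that correction, where the helper `correct_intercept`, the 10% sampling rate and the intercept value are all made up for illustration:

```r
# Hypothetical helper: undo the intercept shift caused by sampling the
# positives and negatives at different rates before fitting.
#   b0_sub   intercept estimated on the subsampled data
#   rate_pos fraction of positives kept (1 = all of them)
#   rate_neg fraction of negatives kept
correct_intercept <- function(b0_sub, rate_pos, rate_neg) {
  b0_sub - log(rate_pos / rate_neg)
}

# Counts from the post: 400 positives, 39,600 negatives.
n_pos <- 400
n_neg <- 39600

# Assumed scheme: keep every positive, keep 10% of the negatives.
rate_pos <- 1
rate_neg <- 0.10
n_neg_kept <- n_neg * rate_neg

# Suppose the subsampled fit reported this intercept (made-up number).
b0_sub <- -5.0
b0_full <- correct_intercept(b0_sub, rate_pos, rate_neg)

# The corrected intercept is lower by log(10), which pulls predicted
# probabilities back down to the scale of the full population.
print(c(subsampled = b0_sub, corrected = b0_full))
```

    If the full 40,000 rows fit comfortably in memory, fitting on all of them avoids the need for any such correction.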

