Linear Regression with Independent Variable with many zeros

  • 13-05-2021 1:39am
    Registered Users Posts: 4 Billynose

    When I have a dataset where one of my independent variables is all zeroes, this is obviously a rank deficient matrix and the regression cannot be performed. When I have a dataset that has an indicator/dummy variable with many zeros (let's say 95 out of 100), is this OK? Here's a MATLAB example illustrating.
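    % A: base case, a full random design matrix
    A = rand(100,100);
    inverseATA = inv(A' * A);
    % B: 95 of the 100 values in column 2 set to zero
    B = rand(100,100);
    B(1:95,2) = 0;
    inverseBTB = inv(B' * B);
    % C: 99 of the 100 values in column 2 set to zero
    C = A;
    C(1:99,2) = 0;
    inverseCTC = inv(C' * C);
    % D: column 1 entirely zero, so D' * D is singular and inv warns
    D = rand(100,100);
    D(:,1) = 0;
    inverseDTD = inv(D' * D);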

    It appears that there is no issue here, but I can't find anything in any of my textbooks or online backing this up, outside of a one-liner here and there saying it's fine. I'd like a bit more detail on it.
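
A worked special case may help show why a dummy with few non-zero entries still gives an invertible matrix. This is a simplification that is not in the post above: assume a regression with only an intercept and a single 0/1 dummy that equals 1 in k of the n rows, with error variance \sigma^2. The symbols n, k and \sigma^2 are introduced here purely for illustration.

```latex
X^\top X = \begin{pmatrix} n & k \\ k & k \end{pmatrix},
\qquad
\det(X^\top X) = k(n-k),
\qquad
(X^\top X)^{-1} = \frac{1}{k(n-k)} \begin{pmatrix} k & -k \\ -k & n \end{pmatrix}

\operatorname{Var}(\hat\beta_{\text{dummy}}) = \sigma^2 \, \frac{n}{k(n-k)}
```

Under these assumptions X'X is singular only when k = 0 or k = n. With n = 100, k = 5 gives n/(k(n-k)) = 100/475, about 0.21, against 100/2500 = 0.04 for k = 50. So the matrix stays invertible, but the dummy's coefficient is estimated with roughly five times the variance of the balanced case.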



  • Moderators, Science, Health & Environment Moderators Posts: 1,835 Mod ✭✭✭✭ Michael Collins

    Hello Billynose. It really depends on what you're trying to do. Can you give us more info? Why are you trying to fit a line to data?

    It's not clear (to me) what is going on with your MATLAB code, or what your end objective is. Can you provide a minimal example with comments?

  • Registered Users Posts: 4 Billynose

    What I'm trying to do is to see if my set of independent variables is still viable (I guess you could say well conditioned) to perform a regression on. The MATLAB code is basically checking that X'X is not singular, i.e. that (X'X)^-1 exists. I run through 4 examples with a 100 x n matrix:

    A is the base case
    B takes one column and converts 95 of the values to 0
    C takes one column and converts 99 of the values to 0
    D takes one column and converts all of the values to 0

    D is obviously rank deficient, but B and C are not. I've tried looking at the condition number as well, but I actually don't see much of a difference between A and B. There is an increase up to C though.
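
The comparison described above could be sketched along these lines, using rank and cond on the design matrix rather than inv. This is only an illustration under stated assumptions: n = 10 columns is a guess (the post only says "100 x n"), column 2 is used for every case, and the fixed seed is there just to make runs repeatable.

```matlab
% Sketch: compare rank and condition number for the four cases A to D.
% Assumption: n = 10 columns; the thread only says "100 x n".
rng(0);                     % fixed seed so repeated runs give the same matrices
n = 10;
A = rand(100, n);           % base case
B = A;  B(1:95, 2) = 0;     % 95 of the 100 values in column 2 set to zero
C = A;  C(1:99, 2) = 0;     % 99 of the 100 values in column 2 set to zero
D = A;  D(:, 2)    = 0;     % the whole of column 2 set to zero

cases = {A, B, C, D};
names = {'A', 'B', 'C', 'D'};
for i = 1:numel(cases)
    X = cases{i};
    % rank(X) < n means X'*X is singular. cond(X) is the ratio of the
    % largest to the smallest singular value; cond(X'*X) is its square.
    fprintf('%s: rank = %2d of %d, cond(X) = %.3g\n', ...
            names{i}, rank(X), n, cond(X));
end
```

Working with cond(X) instead of inverting X'*X avoids squaring the condition number, which is one reason differences between the cases can be easier to read this way.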