Linear Regression with Independent Variable with many zeros

  • 13-05-2021 1:39am
    Registered Users Posts: 4

    When I have a dataset where one of my independent variables is all zeroes, the design matrix is obviously rank deficient and the regression cannot be performed. When I have a dataset with an indicator/dummy variable that has many zeros (let's say 95 out of 100), is this OK? Here's a MATLAB example illustrating this.
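    A = rand(100,100);            % base case: dense random matrix
    inverseATA = inv(A' * A);
    B = rand(100,100);
    B(1:95,2) = 0;                % column 2: 95 of its 100 entries set to zero
    inverseBTB = inv(B' * B);
    C = A;
    C(1:99,2) = 0;                % column 2: 99 of its 100 entries set to zero
    inverseCTC = inv(C' * C);
    D = rand(100,100);
    D(:,1) = 0;                   % column 1: all zeros, so D is rank deficient
    inverseDTD = inv(D' * D);     % D'*D is singular here, so expect a warning from inv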

    It appears that there is no issue here, but I can't find anything in any of my textbooks or online backing this up either, outside of a one-liner here and there saying it's fine. I'd like a bit more detail on it.
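One way to probe the question beyond invertibility is to fit a regression on simulated data where the true coefficients are known. The sketch below is only illustrative: the sample size, the coefficients in `beta_true`, the noise level, and the seed are all assumptions made up for the example, not values from this thread. It builds a dummy with 95 zeros and 5 ones, fits by least squares with the backslash operator, and computes the usual OLS standard errors so the dummy's coefficient can be compared with the others.

```matlab
rng(1);                               % fixed seed (assumption) for repeatability
n  = 100;
x1 = rand(n, 1);                      % an ordinary continuous regressor
d  = zeros(n, 1);
d(96:100) = 1;                        % dummy variable: 95 zeros, 5 ones
X  = [ones(n, 1), x1, d];             % design matrix with an intercept

beta_true = [1; 2; 3];                % hypothetical coefficients
y = X * beta_true + 0.1 * randn(n, 1);

beta_hat = X \ y;                     % least-squares fit
resid    = y - X * beta_hat;
s2       = sum(resid.^2) / (n - size(X, 2));   % residual variance estimate
se       = sqrt(s2 * diag(inv(X' * X)));       % OLS standard errors

disp([beta_true, beta_hat, se]);      % columns: true, estimated, standard error
```

The fit goes through because the dummy column is still linearly independent of the others. What the sparsity changes is how much information there is about that one coefficient, which shows up in its standard error rather than in whether `X' * X` can be inverted.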



  • Moderators, Science, Health & Environment Moderators Posts: 1,847 Mod ✭✭✭✭ Michael Collins

    Hello Billynose. It really depends on what you're trying to do. Can you give us more info? Why are you trying to fit a line to data?

    It's not clear (to me) what is going on with your MATLAB code, or what your end objective is. Can you provide a minimal example with comments?

  • Registered Users Posts: 4 Billynose

    What I'm trying to do is to see if my set of independent variables is still viable (I guess you could say well conditioned) to perform a regression on. The MATLAB code is basically checking that X'X is not singular, i.e. that (X'X)^-1 exists. I run through 4 examples with a 100 x n matrix:

    A is the base case
    B takes one column and converts 95 of the values to 0
    C takes one column and converts 99 of the values to 0
    D takes one column and converts all of the values to 0

    D is obviously rank deficient, but B and C are not. I've tried looking at the condition number as well, but I actually don't see much of a difference between A and B. There is an increase up to C though.
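The comparison described above can be sketched directly with `rank` and `cond` instead of calling `inv`. This is a minimal sketch under assumptions: `p = 5` columns and the fixed seed are hypothetical choices, and all four cases are derived from the same base matrix so that only column 2 differs between them.

```matlab
rng(0);                     % fixed seed (assumption) so the run is repeatable
n = 100;
p = 5;                      % hypothetical number of regressors
A = rand(n, p);             % base case
B = A;  B(1:95, 2) = 0;     % column 2 has 95 zeros
C = A;  C(1:99, 2) = 0;     % column 2 has 99 zeros
D = A;  D(:, 2)    = 0;     % column 2 is all zeros

cases = {A, B, C, D};
names = 'ABCD';
for k = 1:4
    X = cases{k};
    % rank(X) tests for exact deficiency; cond(X) measures how close X is to it
    fprintf('%s: rank = %d, cond(X) = %.3g\n', names(k), rank(X), cond(X));
end
```

Working with `cond(X)` rather than `cond(X' * X)` is a deliberate choice here: in the 2-norm the condition number of `X' * X` is the square of that of `X`, so forming the product makes any conditioning problem look worse than it is in the data.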