
Linear Regression with an Independent Variable with Many Zeros

  • 13-05-2021 01:39AM


    When I have a dataset where one of my independent variables is all zeros, the design matrix is obviously rank deficient and the regression cannot be performed. When I have a dataset with an indicator/dummy variable that has many zeros (let's say 95 out of 100), is this OK? Here's a MATLAB example illustrating it.
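    % A: base case, 100x100 matrix of uniform random values
    A = rand(100,100);
    inverseATA = inv(A' * A);
    all(isfinite(inverseATA(:)))   % true if inv(A'*A) has no Inf/NaN entries

    % B: column 2 gets 95 zeros (5 non-zero entries remain)
    B = rand(100,100);
    B(1:95,2) = 0;
    inverseBTB = inv(B' * B);
    all(isfinite(inverseBTB(:)))

    % C: copy of A with 99 zeros in column 2 (1 non-zero entry remains)
    C = A;
    C(1:99,2) = 0;
    inverseCTC = inv(C' * C);
    all(isfinite(inverseCTC(:)))

    % D: column 1 is all zeros, so D'*D is exactly singular
    % (MATLAB issues a singular-matrix warning and the result is not finite)
    D = rand(100,100);
    D(:,1) = 0;
    inverseDTD = inv(D' * D);
    all(isfinite(inverseDTD(:)))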
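A minimal sketch of a more direct check, assuming the same A to D constructions (with B given 95 zeros, as described in the post): report the numerical rank and the condition number instead of looking for Inf/NaN in inv(X'*X). Looking at inv() only flags exact or near-exact singularity, while cond shows how close to singular each matrix is. The helper variables r, mats and names are illustrative only.

```matlab
% Sketch: rank and condition number for the four test matrices A-D.
% Rebuilds the matrices the same way as above (B with 95 zeros in column 2).
rng(0);                               % reproducible random numbers
A = rand(100,100);
B = rand(100,100); B(1:95,2) = 0;     % 5 non-zeros left in column 2
C = A;             C(1:99,2) = 0;     % 1 non-zero left in column 2
D = rand(100,100); D(:,1) = 0;        % column 1 entirely zero
mats  = {A, B, C, D};
names = {'A', 'B', 'C', 'D'};
r = zeros(1,4);
for j = 1:4
    X = mats{j};
    r(j) = rank(X);                   % numerical rank via the SVD
    % in exact arithmetic cond(X'*X) == cond(X)^2, so forming X'*X
    % squares whatever ill-conditioning X already has
    fprintf('%s: rank = %3d, cond(X) = %9.3g, cond(X''*X) = %9.3g\n', ...
        names{j}, r(j), cond(X), cond(X'*X));
end
```

Only D should drop below full rank (to 99). The cond(X'*X) column also illustrates why least-squares solvers such as MATLAB's backslash (QR-based for this problem) avoid forming X'*X at all.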
    

    It appears that there is no issue here (only the all-zero case D fails), but I can't find anything in any of my textbooks or online backing this up, apart from the occasional one-liner saying it's fine. I'd like a bit more detail on it.

    Thanks.
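For the simplest version of the question, a single 0/1 dummy plus an intercept, the algebra can be done by hand. This assumes n observations, k of which have d_i = 1, and i.i.d. errors with variance sigma^2 (the usual OLS assumptions, not stated in the post); the matrices in the MATLAB example above have no intercept and many other columns, so this is a simplified case.

```latex
\begin{aligned}
y_i &= \beta_0 + \beta_1 d_i + \varepsilon_i, \qquad d_i \in \{0,1\}, \quad k = \sum_{i=1}^{n} d_i \\
X^\top X &= \begin{pmatrix} n & k \\ k & k \end{pmatrix}, \qquad \det(X^\top X) = nk - k^2 = k(n-k) \\
(X^\top X)^{-1} &= \frac{1}{k(n-k)} \begin{pmatrix} k & -k \\ -k & n \end{pmatrix} \\
\hat\beta_1 &= \bar y_{(d=1)} - \bar y_{(d=0)}, \qquad
\operatorname{Var}(\hat\beta_1) = \sigma^2 \, \frac{n}{k(n-k)} \\
h_i &= \begin{cases} 1/k & d_i = 1 \\ 1/(n-k) & d_i = 0 \end{cases}
\end{aligned}
```

So X'X is singular only when k = 0 (the all-zero column, case D) or k = n (the dummy then duplicates the intercept). Any 0 < k < n is estimable; the zeros cost precision, not identifiability. With n = 100, Var(beta_1_hat)/sigma^2 is 100/2500 = 0.04 for k = 50, 100/475 ≈ 0.21 for k = 5 and 100/99 ≈ 1.01 for k = 1. At k = 1 the lone observation with d_i = 1 has leverage h_i = 1, so it is fitted exactly and the dummy coefficient rests on that single point.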


Comments

  • Michael Collins (Moderator, Science, Health & Environment)


    Hello Billynose. It really depends on what you're trying to do. Can you give us more info? Why are you trying to fit a line to data?

    It's not clear (to me) what is going on with your MATLAB code, or what your end objective is. Can you provide a minimal example with comments?


  • Billynose (Registered User)


    What I'm trying to do is see if my set of independent variables is still viable (I guess you could say well-conditioned) for a regression. The MATLAB code is basically checking that X'X is not singular, i.e. that (X'X)^-1 exists and is finite. I run through 4 examples, each with a 100 x n matrix:

    A is the base case
    B takes one column and converts 95 of the values to 0
    C takes one column and converts 99 of the values to 0
    D takes one column and converts all of the values to 0
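To see what the number of non-zeros does to the regression itself rather than to invertibility, here is a small simulation sketch. It assumes an intercept-plus-dummy model with made-up true coefficients (1 and 2) and noise standard deviation sigma = 1; none of these values come from the thread. For k = 50, 5 and 1 ones out of n = 100 rows, it fits the model with backslash, estimates the standard error of the dummy coefficient, and prints it next to the standard OLS value sigma*sqrt(n/(k*(n-k))) for a single 0/1 regressor with an intercept.

```matlab
% Hypothetical simulation: intercept + 0/1 dummy with k ones out of n = 100.
% True coefficients (1 and 2) and noise sd (sigma = 1) are arbitrary choices.
rng(0);                               % reproducible random numbers
n = 100;
sigma = 1;
ks = [50 5 1];
se = zeros(size(ks));                 % estimated SE of the dummy coefficient
for j = 1:numel(ks)
    k = ks(j);
    d = zeros(n,1);
    d(1:k) = 1;                       % k ones, n-k zeros
    X = [ones(n,1) d];                % design matrix: intercept + dummy
    y = 1 + 2*d + sigma*randn(n,1);   % simulated response
    b = X \ y;                        % least-squares fit (QR), no explicit inv
    r = y - X*b;                      % residuals
    s2 = (r'*r) / (n - 2);            % residual variance, n - 2 dof
    covb = s2 * inv(X'*X);            % coefficient covariance (2x2, harmless)
    se(j) = sqrt(covb(2,2));
    fprintf('k = %2d: b1 = %6.3f, SE(b1) = %.3f, theory = %.3f\n', ...
        k, b(2), se(j), sigma*sqrt(n/(k*(n-k))));
    % the dummy coefficient equals the difference of the two group means
    assert(abs(b(2) - (mean(y(d==1)) - mean(y(d==0)))) < 1e-10);
end
```

All three fits succeed; the fewer ones the dummy has, the larger the standard error. With k = 1 the single row with d = 1 ends up with a zero residual, because the model fits it exactly.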

    D is obviously rank deficient, but B and C are not. I've also tried looking at the condition number, but I don't see much difference between A and B. There is an increase for C, though.
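One thing worth checking is how much of the jump for C is just column scaling. Column 2 of C keeps a single non-zero, so its Euclidean norm is typically much smaller than that of the other columns, and cond is not invariant to rescaling columns. Rescaling a column only changes the units of its coefficient (fitted values and t-statistics are unchanged), so comparing cond before and after scaling every column to unit norm is a rough way to separate scaling from genuine near-collinearity. This is a sketch under that assumption, rebuilding C as in the original code; colnorms and Cs are illustrative names.

```matlab
% Sketch: how much of cond(C) comes from column 2 simply being short?
% Rescale every column to unit Euclidean norm and compare condition numbers.
rng(0);                                % reproducible random numbers
A = rand(100,100);
C = A; C(1:99,2) = 0;                  % column 2 keeps a single non-zero
colnorms = sqrt(sum(C.^2, 1));         % Euclidean norm of each column
Cs = C ./ colnorms;                    % implicit expansion (R2016b+ / Octave)
fprintf('norm of column 2: %.3g, median column norm: %.3g\n', ...
    colnorms(2), median(colnorms));
fprintf('cond(C) = %.3g, cond with unit-norm columns = %.3g\n', ...
    cond(C), cond(Cs));
```

If cond(Cs) is much smaller than cond(C), most of the increase comes from column 2 being short rather than from near-collinearity. Equilibrating columns to equal 2-norm is a standard step for this reason; by a result of van der Sluis it comes within a factor of sqrt(n) of the best possible diagonal column scaling.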

