
Defunct files/cleaning out dead wood

  • 19-03-2008 1:17pm #1
    Registered Users Posts: 68,317 ✭✭✭✭seamus


    Right, maybe more a web question than a programming one, but I have a problem whereby the web application I've inherited has what seems like a few hundred completely unused files.

    Lots of them are obviously previous incarnations of a page (e.g. includes1.php, includes2.php, includes_old.php) but none of them have any documentation and there's a good chance that they *may* be referenced by one file or another.

    What I'm really looking for is a way to remove the unreferenced files - probably something along the lines of a parser which takes a list of strings (i.e. file names), searches every file in a particular directory and then spits back the strings which weren't found in any file.
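
    I'm imagining something like this rough PHP sketch (the names and path here are just placeholders, and it only looks at one flat directory):

    <?php
    // Minimal sketch: report which of $names never appear in any file in $dir.
    $names = array('includes1.php', 'includes2.php', 'includes_old.php');
    $dir   = '/var/www/app'; // placeholder path

    $found = array_fill_keys($names, false);
    foreach (glob($dir . '/*') as $file) {
        if (!is_file($file)) continue;
        $contents = file_get_contents($file);
        foreach ($names as $name) {
            if (basename($file) === $name) continue; // don't count a file referencing itself
            if (!$found[$name] && strpos($contents, $name) !== false) {
                $found[$name] = true;
            }
        }
    }
    foreach ($found as $name => $hit) {
        if (!$hit) echo $name . " appears to be unreferenced\n";
    }
    ?>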

    I'm doing this mainly to make my life easier. A, I'm in the middle of rebuilding the app's backend and documenting everything, and B, when I have to troubleshoot, it adds a few extra minutes trying to figure out whether the script is calling "includes1.php" or "includes2.php" and whether either of these include files actually contains the offending piece of code.

    Any ideas?


Comments

  • Registered Users Posts: 2,494 ✭✭✭kayos


    Windows Key + F

    Sometimes the easiest ways are the best :)


  • Registered Users Posts: 68,317 ✭✭✭✭seamus


    kayos wrote: »
    Windows Key + F

    Sometimes the easiest ways are the best :)
    Yes, but can it be scripted? :p

    I have a tonne of these files, so copying and pasting in the name of each file would be a PITA :(


  • Registered Users Posts: 7,468 ✭✭✭Evil Phil


    I don't know how you're going to implement it exactly, but you should develop it as an application and publish it. Lots of people would use it.


  • Registered Users Posts: 68,317 ✭✭✭✭seamus


    :)
    I've been looking for a reason to do something in Visual Studio 2005. Though I suppose it would be good to have it for any platform. Perl maybe?


  • Registered Users Posts: 7,468 ✭✭✭Evil Phil


    Perl would be good, 'specially for all that string processing/regexp stuff.


  • Registered Users Posts: 2,494 ✭✭✭kayos


    I got bored. Not exactly what you want, but meh, something you can build on if you fancy doing it in .NET.

    Works more along the lines of taking multiple string search expressions and a directory to search, then pumps out all the matches into a treeview grouped by Search Expression - File - Line.

    So

    Search Exp1
      - File A
        - Line X - Line text
        - Line Y - Line text
        - Line Z - Line text
      - File B
        - Line Z - Line text

    Search Exp2
      - File A
        - Line Z - Line text
      - File B
        - Line Y - Line text
        - Line Z - Line text

    Feel free to mock, improve, question or rofl!
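
    For anyone who'd rather stay in PHP than .NET, a rough sketch of the same expression - file - line grouping might look like this (the patterns and path are placeholders):

    <?php
    // Rough sketch: group matches as expression -> file -> line, like the treeview.
    $expressions = array('/includes1\.php/', '/includes2\.php/'); // placeholder patterns
    $dir = '/var/www/app';                                        // placeholder path

    $results = array();
    foreach ($expressions as $exp) {
        foreach (glob($dir . '/*') as $file) {
            if (!is_file($file)) continue;
            foreach (file($file) as $num => $line) {
                if (preg_match($exp, $line)) {
                    $results[$exp][$file][$num + 1] = rtrim($line);
                }
            }
        }
    }

    // Print the same tree shape: expression, then file, then line number and text.
    foreach ($results as $exp => $files) {
        echo $exp . "\n";
        foreach ($files as $file => $lines) {
            echo "  - " . basename($file) . "\n";
            foreach ($lines as $num => $text) {
                echo "    - Line " . $num . " - " . $text . "\n";
            }
        }
    }
    ?>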


  • Registered Users Posts: 68,317 ✭✭✭✭seamus


    Damn, missed that kayos. I'll take a look at it in work tomorrow.

    I wrote a script in PHP this afternoon to do this, or something like it. Takes forever (as someone in work pointed out to me, the complexity is O(n^2)) so I've left it running overnight to process 600-odd files. I'll post the script with comments on the performance tomorrow.

    Basically aggregates a list of all files in a directory and its sub-directories and goes through each one-by-one to determine if any of the other files link to it...
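
    The shape of it is roughly this (a sketch of the approach, not the actual script; the path is a placeholder):

    <?php
    // Sketch: list every file under $dir recursively, then check each file's
    // name against the contents of all the others - n files x n files = O(n^2).
    function listFiles($dir) {
        $files = array();
        foreach (scandir($dir) as $entry) {
            if ($entry === '.' || $entry === '..') continue;
            $path = $dir . '/' . $entry;
            if (is_dir($path)) {
                $files = array_merge($files, listFiles($path));
            } else {
                $files[] = $path;
            }
        }
        return $files;
    }

    $files = listFiles('/var/www/app'); // placeholder path
    foreach ($files as $candidate) {
        $name = basename($candidate);
        $referenced = false;
        foreach ($files as $other) {
            if ($other === $candidate) continue;
            if (strpos(file_get_contents($other), $name) !== false) {
                $referenced = true;
                break;
            }
        }
        if (!$referenced) echo $name . " looks unreferenced\n";
    }
    ?>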


  • Registered Users Posts: 68,317 ✭✭✭✭seamus


    OK, turns out that my script failed overnight. A couple of things I overlooked in my 45 minutes of hacking:
    1. I assumed that all files were small.
    2. I assumed that PHP could load in the entire contents of any file.

    There was a 0.5 GB file in the directory which made it fall over when the script attempted to read it in.

    So I added a variable for the maximum filesize to open, and now it runs. I also tweaked the way I read in files: instead of reading in an entire file and then doing a preg_match, it reads the files line-by-line and performs a preg_match on each line. This is faster. I don't know why, but I'm guessing it has something to do with having a smaller memory requirement (i.e. only having to store a line in memory instead of a whole file).
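
    In code, the two fixes amount to something like this (a sketch of the idea, not the attached script; the function name and $maxSize value are placeholders):

    <?php
    // Sketch of the fixes: skip oversized files, and scan line-by-line so only
    // one line is held in memory at a time instead of the whole file. Returning
    // on the first hit also means most files never get read to the end.
    $maxSize = 5 * 1024 * 1024; // placeholder cap: 5 MB

    function fileReferences($path, $pattern, $maxSize) {
        if (filesize($path) > $maxSize) {
            return false; // too big to scan - the 0.5 GB file case
        }
        $handle = fopen($path, 'r');
        while (($line = fgets($handle)) !== false) {
            if (preg_match($pattern, $line)) {
                fclose($handle);
                return true; // stop at the first match
            }
        }
        fclose($handle);
        return false;
    }

    // Usage: fileReferences('/var/www/app/index.php', '/includes1\.php/', $maxSize);
    ?>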

    Scanned through my directory of 600+ files in about 30 minutes and gave me a list of 280 files that were no longer needed. :eek:

    It's still a bit raw and there are a couple of enhancements that could be added to speed it up and make it more accurate:
    1. Tell it to only scan certain file types - .php, .htm, .asp, etc. (something like the check sketched after this list). It currently reads through everything - images, log files and all. I noticed that log files tend to mention scripts/pages which are no longer in use, thus giving false negatives.
    2. Tweak the regexps. Regular expressions aren't my forte, so I did the best with what I know.
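
    For number 1, the filter could be as simple as an extension whitelist checked before each scan (the helper name and list here are placeholders):

    <?php
    // Sketch of enhancement 1: only scan files whose extension is whitelisted.
    function shouldScan($path, $extensions) {
        $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
        return in_array($ext, $extensions);
    }

    $scanExtensions = array('php', 'htm', 'html', 'asp'); // placeholder list
    // Usage: if (shouldScan($file, $scanExtensions)) { ...scan it... }
    ?>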

    Let me know what ye think. I've hopefully commented it enough that it's obvious how to change and run it. I've set it up to be run from the command line, not as a web page, purely because of the length of time it takes - most if not all servers would time out.

    (vBulletin doesn't like me uploading scripts with .php extensions :D)

