
regex to find a string in a comment

  • 19-03-2008 2:40pm
    #1
    Registered Users Posts: 26,558 ✭✭✭✭


    Hey guys,

    I have a Perl file (let's call it a1.pl) that needs to read another file (let's call this one b1.pl) and find a string that's placed in a comment in b1.pl.

    What I'm trying to match is a script description in b1.pl, which has this format:
    #script_description:blahblahblahblah.
    

    It's the blahblahblahblah part that I need to store in $desc in a1.pl.

    I've got this so far:
        my $file="b1.pl";
        #open the second file
        open(FHANDLE, "$file");
        #array stores the whole file.
        my @file = <FHANDLE>;
        
        foreach my $line (@file)
        {
            $line =~ m/script_description\:(.+)/;
            $desc = $1;
            print "desc is $desc";        
        } 
    

    Does anyone know where I'm going wrong? When I run a1.pl, I just get "desc is <blank>" printed once for every line in the file.


Comments

  • Registered Users Posts: 568 ✭✭✭phil


    $1 is only set by a successful match, so on a line where nothing matches you can't rely on it (here it comes out blank). Meanwhile, that print statement runs for every line, matched or not.

    First of all, wrap it in an if statement.
    if ($line =~ m/script_description:(.+)/) {
        $desc = $1;
        print "desc is $desc\n";
    }
    

    That works for me:
    $ cat test.txt
    Line 1:
        #script_description:blahblahblahblah.
    Line 3:
    $ perl test.pl
    desc is blahblahblahblah.
    

    If you're still having problems, double-check that you haven't got any weird formatting or characters in the file. Also, it's good practice to chomp() those lines to remove the trailing newlines (unless you explicitly want to keep them).

    Also, print with a newline at the end (your shell might be covering up the line that's actually being printed).

    Phil.
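For reference, a minimal sketch of a1.pl with Phil's suggestions combined: a three-argument open with an error check, chomp on each line, $1 used only inside the if, and a newline on the print. The helper name find_description is hypothetical, and the sketch assumes each file has a single script_description comment, so it returns the first match rather than looping over every line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: scan a filehandle line by line and return the text
# after "script_description:" on the first line containing it, or undef
# if no line matches.
sub find_description {
    my ($fh) = @_;
    while (my $line = <$fh>) {
        chomp $line;    # drop the trailing newline before matching
        # Only read $1 when the match actually succeeded.
        if ($line =~ /script_description:(.+)/) {
            return $1;
        }
    }
    return;    # no description found
}

# Usage: perl a1.pl [file]  (defaults to b1.pl, as in the question)
my $file = $ARGV[0] // 'b1.pl';
if (-e $file) {
    open(my $fh, '<', $file) or die "Cannot open $file: $!";
    my $desc = find_description($fh);
    close($fh);
    print defined $desc ? "desc is $desc\n" : "no description found\n";
}
else {
    warn "$file not found\n";
}
```

Returning early keeps $desc from being overwritten by later lines; if a file could hold several descriptions, push each $1 onto an array inside the if instead.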


  • Registered Users Posts: 26,558 ✭✭✭✭Creamy Goodness


    You're a lifesaver! Wrapping it in an if statement did the job. I did have the chomp in there as well, but I typed the above example from memory, as I wasn't working on the machine with those files when I posted.


  • Registered Users Posts: 26,558 ✭✭✭✭Creamy Goodness


    Back again...

    I have an array that holds the HTML source code of a file.

    A line of the file looks like this:

    <IMG SRC="/icons/folder.gif" ALT="[DIR]"> <A HREF="./mydir/">mydir</A> 09-Jan-2008 01:20 -

    You may notice it's an HTML file generated by Apache ;)

    The part I want to extract is this bit: <A HREF="./mydir/">mydir</A>

    Here's what I've got so far:
    #store what's found in regex in @dirs
    my @dirs;
    
    #@page contains the source code of the HTML file.
    
    foreach my $line (@page)
    {
    	if($line =~ /(.*)<\/A>$ /){
    		push(@dirs,$1);
    	}
    }
    


  • Registered Users Posts: 21,264 ✭✭✭✭Hobbes


    />(.*)<\/a>/i

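    That's the basic idea, although the matching is a bit off: .* is greedy and will run on to the last </a> it can find. I would go with the non-greedy version: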

    />(.*?)<\/a>/i
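A small sketch to illustrate Hobbes's point. The original /(.*)<\/A>$ / should never match the sample line: $ anchors at the end of the line, and the pattern then demands a literal space after it (besides, there is more text after </A>). On the sample line the greedy and non-greedy versions capture the same thing, because it holds only one </A>; the difference shows up once several links share one string. The sketch also shows a tighter alternative (my own suggestion, not from the thread) that captures the HREF and the link text separately using negated character classes; extract_links is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return [href, text] for every <A HREF="...">text</A>
# link in a string. [^"]* and [^<]* cannot run past the closing quote or
# the next tag, so each match stays inside a single link.
sub extract_links {
    my ($html) = @_;
    my @links;
    while ($html =~ /<A\s+HREF="([^"]*)">([^<]*)<\/A>/gi) {
        push @links, [ $1, $2 ];
    }
    return @links;
}

my $line = '<IMG SRC="/icons/folder.gif" ALT="[DIR]"> <A HREF="./mydir/">mydir</A> 09-Jan-2008 01:20 -';
for my $link (extract_links($line)) {
    print "href=$link->[0] text=$link->[1]\n";    # prints: href=./mydir/ text=mydir
}

# Greedy vs non-greedy when two links share one string.
my $two = '<A HREF="a/">a</A> <A HREF="b/">b</A>';
my ($greedy) = $two =~ />(.*)<\/a>/i;     # runs on to the LAST </A>
my ($lazy)   = $two =~ />(.*?)<\/a>/i;    # stops at the FIRST </A>
print "greedy: $greedy\n";
print "lazy:   $lazy\n";
```

Here $greedy ends up as a</A> <A HREF="b/">b while $lazy is just a. Note that both Hobbes patterns still start at the first > on the line, which on the Apache line is the end of the <IMG> tag, so anchoring on <A HREF=" as above gives a cleaner capture.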


  • Registered Users Posts: 26,558 ✭✭✭✭Creamy Goodness


    Right, I've got this regex pulling the directory links out of the HTML source; it pulls out anything after a "./" in the source.

    E.g. <A HREF="./mydir/">. Now I try to print out these directory matches inside the foreach and the if statement, but it only matches once.

    I opened a debug file and wrote out each line as I loop through it, to make sure it has the full source code, which it does.

    I think the regex is fine, as only directories will start with "./".

    Am I missing something blindingly obvious here? I'm totally stumped as to why it isn't matching the rest of the directories and printing them out.

    use LWP::Simple;
    
    my $page = new CGI();
    
    my $url="www.blah.com/myfiles";
    
    my @page;
    
    @page=get($url); #get the source and store it in an array. 
    
    my @dirs; #array of directories
    
    open(DEBUG, ">debug.txt"); # printing html source to file
    
    foreach my $line (@page)
    {
        print DEBUG $line;
        chomp();
        
        if( $line =~ m#\Q<A HREF="./\E(.+)\Q/">\E# )
        {
            print "in match!";
            push(@dirs,$1);
            print $1;
        }
    }
    close(DEBUG);
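A likely cause, going by the LWP::Simple documentation: get() returns the whole page as one string (or undef on failure), not a list of lines. So @page ends up with a single element, the foreach runs once, and the if can fire at most once. Separately, chomp() with no argument works on $_, not $line, and CGI is never loaded with use CGI. Below is a hedged sketch that scans the whole string with a /g match instead. extract_dirs is a hypothetical helper name, the pattern assumes Apache-style links of the form <A HREF="./name/">, and LWP::Simple is only loaded when a URL is passed on the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the directory names from every
# <A HREF="./name/"> link in an HTML string. The /g flag in a while
# loop walks through ALL matches in the string, not just the first.
# [^"/]+ cannot run past the closing slash or quote, so one match
# never swallows several links, even if they share a line.
sub extract_dirs {
    my ($html) = @_;
    my @dirs;
    while ($html =~ m{<A\s+HREF="\./([^"/]+)/">}gi) {
        push @dirs, $1;
    }
    return @dirs;
}

# Only fetch when a URL is given. LWP expects an absolute URL including
# the scheme, e.g. http://www.blah.com/myfiles rather than www.blah.com/myfiles.
if (@ARGV) {
    my $url = $ARGV[0];
    require LWP::Simple;                  # loaded at runtime, only when needed
    my $html = LWP::Simple::get($url);    # whole page as ONE string, or undef
    die "Could not fetch $url\n" unless defined $html;

    open(my $debug, '>', 'debug.txt') or die "Cannot write debug.txt: $!";
    print {$debug} $html;                 # same debug dump as in the question
    close($debug);

    print "$_\n" for extract_dirs($html);
}
```

An alternative that keeps the original line-by-line loop is to split the page first, e.g. my @page = split /\n/, $html; and then run the existing foreach over @page.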
    

