Perl text parsing

  • 07-06-2012 01:59PM
    #1
    Registered Users, Registered Users 2 Posts: 43


    I'm new to Perl scripting, so I was wondering if anyone could help with this problem. I have a tab-separated file set up like this:
    q 1
    q 3
    b 3
    b 4
    b 4
    a 1
    a 3
    a 3
    I want to add up the values in column 2 for each letter in column 1, e.g. q should equal 4.

    Thanks a lot for any help


Comments

  • Registered Users, Registered Users 2 Posts: 962 ✭✭✭darjeeling


    Here's a script I wrote a while ago to do this job. There are probably lots of other ways to do this, but this should work:
    #!/usr/bin/perl -w
    
    # get_totals.pl
    # reads in a set of identifiers (which may repeat) with corresponding scores,
    # computes the total for each identifier and prints the results
    
    # enforce strict pragma for variable declaration etc (good practice):
    use strict;
    
    # read input file name from command line args,
    # or use STDIN if none is provided (two-argument open treats '-' as STDIN):
    my $infile = shift;
    if (! $infile) { $infile = '-'; }
    open(INFILE, $infile) || die "opening $infile: $!";
    
    # variables:
    my $item;                  # name of current item
    my $value;                 # value of item from current line
    my %item_totals;           # hash of totalised values for each item
    
    # read through input, one line at a time, 
    # adding values to running totals for each item:
    while(<INFILE>) {
      chomp;
      ($item,$value) = split(/\t/, $_);
      if ( exists ( $item_totals{$item} ) ){
        $item_totals{$item} += $value;
      }
      else {
        $item_totals{$item} = $value;
      }
    }
    close (INFILE);
    
    # print out the totals:
    print STDOUT "Item\tScore\n";
    for $item ( sort {$a cmp $b} keys %item_totals){
      print STDOUT "$item\t$item_totals{$item}\n";
    }
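For anyone who wants to check the result against the sample data from the question, here is a minimal sketch of the same hash-accumulation idea. The helper name `sum_by_key` is made up for illustration and is not part of the script above. It also leans on the fact that `+=` on a hash key that does not exist yet starts from zero, so the `exists` check is optional.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): sums column 2 per column-1 key.
sub sum_by_key {
    my @lines = @_;
    my %totals;
    for my $line (@lines) {
        chomp $line;
        my ($item, $value) = split /\t/, $line;
        next unless defined $item && defined $value;   # skip blank or short lines
        $totals{$item} += $value;   # a missing key is undef, which += treats as 0
    }
    return %totals;
}

# Sample rows from the question, tab-separated.
my @rows = ("q\t1", "q\t3", "b\t3", "b\t4", "b\t4", "a\t1", "a\t3", "a\t3");
my %totals = sum_by_key(@rows);
print "$_\t$totals{$_}\n" for sort keys %totals;
```

With the rows from the question this should give a = 7, b = 11 and q = 4.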
    
    


  • Registered Users, Registered Users 2 Posts: 43 Rhavin


    Thanks very much, works perfectly!


  • Registered Users, Registered Users 2 Posts: 1,110 ✭✭✭Skrynesaver


    or more concisely
    perl -e 'while(<>){$value{$1}+=$2 if (/\w+\s+\d+/);}END{for (sort keys %value){print "$_\t$value{$_}\n";}}' <FILENAME>
    


  • Registered Users, Registered Users 2 Posts: 962 ✭✭✭darjeeling


    Skrynesaver wrote: »
    or more concisely
    perl -e 'while(<>){$value{$1}+=$2 if (/\w+\s+\d+/);}END{for (sort keys %value){print "$_\t$value{$_}\n";}}' <FILENAME>
    

    Thanks. I added a couple of pairs of brackets in the regexp to get your one-liner to work for me:
    perl -e 'while(<>){$value{$1}+=$2 if (/^(\w+)\s+(\d+)/);}END{for (sort keys %value){print "$_\t$value{$_}\n";}}' <FILENAME>
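To show why the brackets matter, here is a small sketch comparing the two patterns on one line of input. The helper names `parse_without_groups` and `parse_with_groups` are invented for this example. Without parentheses the pattern still matches, but nothing is captured, so `$1` and `$2` stay undefined.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration; neither name appears in the thread.

# No parentheses in the pattern: it can match, but nothing is captured,
# so $1 and $2 are undefined after the match.
sub parse_without_groups {
    my ($line) = @_;
    return $line =~ /\w+\s+\d+/ ? ($1, $2) : ();
}

# Parentheses capture the key into $1 and the number into $2.
sub parse_with_groups {
    my ($line) = @_;
    return $line =~ /^(\w+)\s+(\d+)/ ? ($1, $2) : ();
}

for my $parser (\&parse_without_groups, \&parse_with_groups) {
    my ($key, $value) = $parser->("q\t3");
    printf "key=%s value=%s\n",
        defined $key   ? $key   : 'undef',
        defined $value ? $value : 'undef';
}
```

The first call reports both values as undef, the second reports key=q and value=3. In the original one-liner that means every line is added to a single empty key with a value of zero.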
    


  • Registered Users, Registered Users 2 Posts: 43 Rhavin


    Why would the second answer round the numbers while the first one gave the whole number?


  • Registered Users, Registered Users 2 Posts: 962 ✭✭✭darjeeling


    Rhavin wrote: »
    Why would the second answer round the numbers while the first one gave the whole number?

    It's down to the regular expression used:

    Matches one or more word characters at the start of a line, followed by one or more whitespace characters, followed by one or more digits:
    /^(\w+)\s+(\d+)/

    This additionally allows for an optional decimal point and optional following digits:
    /^(\w+)\s+(\d+\.?\d*)/

    The bits of text matching the terms in brackets (i.e. '\w+' and '\d+\.?\d*') are stored in the match variables $1 and $2, which are then used to populate Skrynesaver's hash %value.

    Here's the revised code:
    perl -e 'while(<>){$value{$1}+=$2 if (/^(\w+)\s+(\d+\.?\d*)/);}END{for  (sort keys %value){print "$_\t$value{$_}\n";}}' <FILENAME>
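As a rough illustration of the difference, the sketch below runs both patterns over the same two made-up rows with decimal values. The helper `total_with` and the sample numbers are assumptions for the example, not data from the thread. The digits-only pattern stops capturing at the decimal point, so the fractional part is dropped rather than rounded.

```perl
use strict;
use warnings;

# Hypothetical helper: totals column 2 per key using whichever pattern is passed in.
sub total_with {
    my ($regex, @lines) = @_;
    my %value;
    for (@lines) {
        # $1 is the key and $2 the number captured by the pattern
        $value{$1} += $2 if /$regex/;
    }
    return %value;
}

# Made-up sample rows with decimal values, tab-separated.
my @rows = ("q\t1.5", "q\t3.25");

my %digits_only = total_with(qr/^(\w+)\s+(\d+)/,        @rows);
my %decimals    = total_with(qr/^(\w+)\s+(\d+\.?\d*)/,  @rows);

print "digits only:   q = $digits_only{q}\n";   # captures 1 and 3
print "with decimals: q = $decimals{q}\n";      # captures 1.5 and 3.25
```

Here the first total comes out as 4 and the second as 4.75. Neither pattern handles negative numbers or values written like .5, so the regexp would need extending for data of that kind.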
    


  • Registered Users, Registered Users 2 Posts: 43 Rhavin


    Thanks for all the help! It made work today a lot easier :D


  • Registered Users, Registered Users 2 Posts: 1,414 ✭✭✭Fluffy88


    It's not Perl if it can't be written in one line :P

