
Perl text parsing

  • 07-06-2012 01:59PM
    #1
    Registered Users, Registered Users 2 Posts: 43


    I'm new to Perl scripting, so I was wondering if anyone could help with a problem I have. I have a tab-separated file that's set up like this:
    q 1
    q 3
    b 3
    b 4
    b 4
    a 1
    a 3
    a 3
    I want to add up the values in column 2 for each letter in column 1, e.g. q should equal 4 (1 + 3).

    Thanks a lot for any help.


Comments

  • Registered Users, Registered Users 2 Posts: 962 ✭✭✭darjeeling


    Here's a script I wrote a while ago to do this job. There are probably lots of other ways to do this, but this should work:
    #!/usr/bin/perl
    
    # get_totals.pl
    # reads in a set of identifiers with replicates and corresponding scores,
    # computes the total score for each identifier and prints the totals
    
    # enforce the strict and warnings pragmas (good practice):
    use strict;
    use warnings;
    
    # read input file name from command line args,
    # or use STDIN if none is provided:
    my $infile = shift;
    $infile = '-' unless defined $infile;
    # the two-argument form of open treats '-' as STDIN
    open(INFILE, $infile) || die "opening $infile: $!";
    
    # variables:
    my $item;                  # name of current item
    my $value;                 # value of item from current line
    my %item_totals;           # hash of totalised values for each item
    
    # read through input, one line at a time,
    # adding values to running totals for each item:
    while(<INFILE>) {
      chomp;
      next unless /\S/;        # skip blank lines
      ($item,$value) = split(/\t/, $_);
      if ( exists ( $item_totals{$item} ) ){
        $item_totals{$item} += $value;
      }
      else {
        $item_totals{$item} = $value;
      }
    }
    close (INFILE);
    
    # print out the totals:
    print STDOUT "Item\tScore\n";
    for $item ( sort {$a cmp $b} keys %item_totals){
      print STDOUT "$item\t$item_totals{$item}\n";
    }
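    A shorter variant of the same hash-accumulation idea, as a minimal sketch: adding to a hash entry that doesn't exist yet treats it as 0 (and += on an undefined value doesn't trigger a warning), so the exists() check can be dropped. The subroutine name sum_by_key is hypothetical, and the sample data is read from an in-memory string only to keep the example self-contained.

```perl
use strict;
use warnings;

# Hypothetical helper: sums column 2 per key in column 1 for the
# tab-separated lines read from the given filehandle; returns a hash ref.
sub sum_by_key {
    my ($fh) = @_;
    my %totals;
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /\S/;          # skip blank lines
        my ($item, $value) = split /\t/, $line;
        $totals{$item} += $value;           # a missing key starts from 0
    }
    return \%totals;
}

# Sample data from the question, opened as an in-memory filehandle.
my $data = "q\t1\nq\t3\nb\t3\nb\t4\nb\t4\na\t1\na\t3\na\t3\n";
open(my $fh, '<', \$data) or die "opening in-memory data: $!";
my $totals = sum_by_key($fh);
close($fh);

for my $item (sort keys %$totals) {
    print "$item\t$totals->{$item}\n";
}
```

    With the sample data from the question this prints a 7, b 11 and q 4, one per line.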
    
    


  • Registered Users, Registered Users 2 Posts: 43 Rhavin


    Thanks very much, works perfectly!


  • Registered Users, Registered Users 2 Posts: 1,110 ✭✭✭Skrynesaver


    or more concisely
    perl -e 'while(<>){$value{$1}+=$2 if (/\w+\s+\d+/);}END{for (sort keys %value){print "$_\t$value{$_}\n";}}' <FILENAME>
    


  • Registered Users, Registered Users 2 Posts: 962 ✭✭✭darjeeling


    

    Thanks. I had to add a couple of pairs of brackets (capture groups) to the regexp to get your one-liner to work for me, since $1 and $2 are only set by parenthesised groups:
    perl -e 'while(<>){$value{$1}+=$2 if (/^(\w+)\s+(\d+)/);}END{for (sort keys %value){print "$_\t$value{$_}\n";}}' <FILENAME>
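    To see why the brackets matter, here is a minimal sketch comparing the two patterns on one line of the sample input (the variable names are just for illustration). Without parentheses the pattern still matches, but nothing is captured, so $1 and $2 stay undefined and the totals come out empty.

```perl
use strict;
use warnings;

my $line = "q\t3";    # one line of the sample input

# No capture groups: the match succeeds, but $1 and $2 are undefined.
my ($item_a, $value_a);
if ($line =~ /\w+\s+\d+/) {
    ($item_a, $value_a) = ($1, $2);
}

# With capture groups: $1 holds the item name, $2 the number.
my ($item_b, $value_b);
if ($line =~ /^(\w+)\s+(\d+)/) {
    ($item_b, $value_b) = ($1, $2);
}

my $report = defined $item_a
    ? "no groups: $item_a\n"
    : "no groups: \$1 is undefined\n";
print $report;
print "with groups: $item_b => $value_b\n";
```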
    


  • Registered Users, Registered Users 2 Posts: 43 Rhavin


    Why would the second answer round the numbers while the first one gave the whole number?


  • Registered Users, Registered Users 2 Posts: 962 ✭✭✭darjeeling


    Rhavin wrote: »
    Why would the second answer round the numbers while the first one gave the whole number?

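    It's down to the regular expression used. Strictly, the numbers aren't rounded but truncated: \d+ matches digits only, so on a value like 2.5 the match stops at the decimal point and only the 2 is captured and added.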

    This matches one or more word characters at the start of a line, followed by one or more whitespace characters, followed by one or more digits:
    /^(\w+)\s+(\d+)/

    This additionally allows for an optional decimal point and optional following digits:
    /^(\w+)\s+(\d+\.?\d*)/

    The bits of text matching the terms in parentheses (i.e. '\w+' and '\d+\.?\d*') are stored in the match variables $1 and $2, which are then used to populate Skrynesaver's hash %value.
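    A minimal sketch of the difference on a line with a decimal value (the value 2.5 is made up for illustration; the sample in the question only shows whole numbers). The first pattern stops at the decimal point, while the second also takes the '.' and the digits after it.

```perl
use strict;
use warnings;

my $line = "q\t2.5";    # hypothetical line with a decimal value

my ($int_only, $with_decimals);

# \d+ stops at the '.', so only the integer part is captured
if ($line =~ /^(\w+)\s+(\d+)/) { $int_only = $2; }

# \d+\.?\d* also accepts an optional '.' and any digits after it
if ($line =~ /^(\w+)\s+(\d+\.?\d*)/) { $with_decimals = $2; }

print "first pattern captures: $int_only\n";         # 2
print "second pattern captures: $with_decimals\n";   # 2.5
```

    Summed over many lines, the first pattern therefore drops the fractional part of every value, which is what showed up as "rounding".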

    Here's the revised code:
    perl -e 'while(<>){$value{$1}+=$2 if (/^(\w+)\s+(\d+\.?\d*)/);}END{for (sort keys %value){print "$_\t$value{$_}\n";}}' <FILENAME>
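    The revised regex still skips lines like "q<TAB>-3" or "q<TAB>.5" (it needs a leading digit and has no sign) and truncates "1e3" to 1. If the second column may hold such values, one alternative is to split on the tab and let Perl decide what counts as a number. A minimal sketch, assuming the input really is tab-separated; it uses looks_like_number from the core Scalar::Util module, and the helper name sum_numeric is hypothetical.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper: totals column 2 per column-1 key, skipping any
# line whose second field is missing or not numeric.
sub sum_numeric {
    my (@lines) = @_;
    my %totals;
    for my $line (@lines) {
        chomp(my $copy = $line);
        my ($item, $value) = split /\t/, $copy;
        next unless defined $value && looks_like_number($value);
        $totals{$item} += $value;
    }
    return \%totals;
}

# Made-up sample lines covering forms the regex would miss or truncate.
my $sums = sum_numeric("q\t1\n", "q\t-0.5\n", "b\t1e1\n", "b\t.5\n", "bad line\n");
for my $item (sort keys %$sums) {
    print "$item\t$sums->{$item}\n";
}
```

    This prints b 10.5 and q 0.5; the line without a tab is ignored rather than silently counted as 0.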
    


  • Registered Users, Registered Users 2 Posts: 43 Rhavin


    Thanks for all the help! It made work today a lot easier :D


  • Registered Users, Registered Users 2 Posts: 1,414 ✭✭✭Fluffy88


    It's not Perl if it can't be written in one line :P

