
simple string manipulation

  • 08-04-2008 8:48am
    #1
    Registered Users, Registered Users 2 Posts: 6,265 ✭✭✭


    Hi All,
    My brain is asleep, and I need to do a simple piece of string manipulation.
    I've got a file containing about 100k records like this:
    &SEQUENCENUMBER=43|TIMESTAMP=20071023113913|TRIGGER=interim|CONTEXTID=node1;167773127+471dd383@1|MSISDN=1234567890|IPADDRESS=10.11.12.13|CLIENTPROTOCOL=rtsp|NETWORKELEMENTID=node1|SERVICETYPE=streaming|APPLICATIONDESCRIPTION=Streaming/downloading service|CONTENTURL=rtsp://100.10.12.13/vod/beer.3gp|TRANSFERREDVOLUME=0|TIMEDURATION=0|CHARGINGSTATUS=notDebited|CONTENTRATE=2000|PRICETYPE=pricePerMinute|CURRENCYCODE=978|CURRENCYEXPONENT=-2|&

    I need to extract these values:
    167773127+471dd38

    Piping one awk command into another like this works:
    awk -F'|' '{print $4}' input.ASCII | awk -F';' '{print $2}'

    but it's not very efficient.
    Any ideas how to avoid the second awk?


Comments

  • Registered Users, Registered Users 2 Posts: 868 ✭✭✭brianmc


    Hi, nothing terribly simple springs to mind for basic AWK unless there is something more uniform about the structure of that 4th field in each record that could be used.

    If the length of the field and the position within the field of the value you want are the same in each record, you could use the substr function.

    If not, then nawk or gawk might solve your problem with a sub command to substitute the first part of the field's value up to the "=" with nothing.

    If none of this works out and you have to use traditional AWK (not nawk or gawk) then you could write a loop that uses substr to look at each character in the value one at a time up to the "=" and then extract the remaining characters.
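
A character-by-character loop may not be needed: awk's index() function returns the position of a separator, and substr() can take everything after it. The following is a minimal sketch, assuming the separator is the ";" seen in the sample record and that it occurs in the fourth field of every record. The `record` variable is a shortened copy of the sample, used only for illustration.

```shell
# Shortened copy of the sample record; only the fourth field matters here.
record='&SEQUENCENUMBER=43|TIMESTAMP=20071023113913|TRIGGER=interim|CONTEXTID=node1;167773127+471dd383@1|MSISDN=1234567890|'

# index() finds the first ";" in field 4; substr() with two arguments
# returns everything from the next character to the end of the field.
result=$(printf '%s\n' "$record" | awk -F'|' '{ print substr($4, index($4, ";") + 1) }')
echo "$result"
```

Against the real file, the same awk program would be run as a single process, with input.ASCII as its argument.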


  • Registered Users, Registered Users 2 Posts: 6,265 ✭✭✭MiCr0


    thanks for that

    Does anyone know if there's a quicker way to do it that doesn't involve awk?


  • Registered Users, Registered Users 2 Posts: 1,606 ✭✭✭djmarkus


    For what you're doing it looks like you could just use cut
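
A sketch of what the cut approach might look like, assuming "|" separates the fields and ";" separates the two halves of the fourth field, as in the sample record. The `record` variable is a shortened copy of that sample, used only for illustration. Note that this still starts two processes, so it may not be any faster than the two awk commands.

```shell
# Shortened copy of the sample record from the first post.
record='&SEQUENCENUMBER=43|TIMESTAMP=20071023113913|TRIGGER=interim|CONTEXTID=node1;167773127+471dd383@1|MSISDN=1234567890|'

# First cut takes the 4th "|"-delimited field,
# second cut takes the part after the ";" inside it.
result=$(printf '%s\n' "$record" | cut -d'|' -f4 | cut -d';' -f2)
echo "$result"
```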


  • Registered Users, Registered Users 2 Posts: 868 ✭✭✭brianmc


    djmarkus wrote: »
    For what you're doing it looks like you could just use cut


    Yep, "cut" into another "cut" or into "sed". Or "awk" into "sed". I don't know if any of it is much more efficient though. What shell are you using? The following should work in ksh...
    awk -F"|" '{print $4}' input.ASCII | while read -r FIELD; do
        echo "${FIELD##*;}"
    done
    

    Probably works like that in Bash too but I've not tried it.

    I was talking about "=" symbols in my last post where I meant to say ";" symbols btw. Also the ";" in the second command line above may need to be \ escaped. Haven't tested.
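
For the sed route, one possibility is to do the whole extraction in a single sed process rather than piping awk or cut into it. This is an untested-on-the-real-file sketch, assuming the first ";" in each record is the one inside the CONTEXTID field and that the wanted value runs up to the next "|". The `record` variable is a shortened copy of the sample record, used only for illustration.

```shell
# Shortened copy of the sample record from the first post.
record='&SEQUENCENUMBER=43|TIMESTAMP=20071023113913|TRIGGER=interim|CONTEXTID=node1;167773127+471dd383@1|MSISDN=1234567890|'

# Drop everything up to the first ";", capture up to the next "|",
# and discard the rest. -n plus the p flag skips lines without a match.
result=$(printf '%s\n' "$record" | sed -n 's/^[^;]*;\([^|]*\)|.*$/\1/p')
echo "$result"
```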


  • Registered Users, Registered Users 2 Posts: 6,265 ✭✭✭MiCr0


    awk -F'|' '{print substr($4,17, length($0)-1)}' file.name


  • Registered Users, Registered Users 2 Posts: 868 ✭✭✭brianmc


    MiCr0 wrote: »
    awk -F'|' '{print substr($4,17, length($0)-1)}' file.name

    Curious about this bit...
    length($0) - 1
    

    That's the length of the entire record, which is obviously longer than the length of the fourth field but otherwise an irrelevant number. Could you just replace that with, say, 999, i.e. any large number?


  • Registered Users, Registered Users 2 Posts: 1,306 ✭✭✭carveone


    MiCr0 wrote: »
    I need to extract these values:
    167773127+471dd38

    Piping one awk command into another like this works:
    awk -F'|' '{print $4}' input.ASCII | awk -F';' '{print $2}'

    Bit late but:

    awk -F '[|;]' '{ print $5 }'

    or

    awk -F '|' '{ str = $4; sub(/^.*;/, "", str); print str; }'

    or

    awk '{ str = $0; sub(/^.*;/, "", str); sub(/\|.*$/, "", str); print str }'

    or many others...

    As for substr, just leave off the last parameter to specify the remaining length. substr("washington", 5) returns "ington".
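
Applied to the substr command posted earlier in the thread, dropping the third argument might look like the sketch below. It assumes, as that command does, that the prefix "CONTEXTID=node1;" is always exactly 16 characters, so the value starts at position 17. The `record` variable is a shortened copy of the sample record, used only for illustration.

```shell
# Shortened copy of the sample record from the first post.
record='&SEQUENCENUMBER=43|TIMESTAMP=20071023113913|TRIGGER=interim|CONTEXTID=node1;167773127+471dd383@1|MSISDN=1234567890|'

# With no length argument, substr() returns the rest of field 4
# starting at character 17.
result=$(printf '%s\n' "$record" | awk -F'|' '{ print substr($4, 17) }')
echo "$result"
```

If the node name can vary in length, the fixed offset of 17 would no longer hold and one of the separator-based versions above would be safer.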

