Regexp

  • 08-02-2010 10:21PM
    #1
    Closed Accounts Posts: 265


    Anyone know how I can match the following words in my large text file:

    aaaaaaaahhhhhhhhhhh
    grrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
    ooooooooooooooooooccchhhhh
    gggooooooooooooooggggggggggle
    etc.

    I'm trying to clean up some noisy text.

    I've got something like s/[a-zA-Z]{3,}//g but it's not working... :(

    If the same letter occurs three or more times in a row, then the regexp should match so I can delete the word.
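    To see why the attempt above deletes too much, here is a minimal Perl sketch (the sample line is made up). The class [a-zA-Z]{3,} means "any three or more letters", whether or not they are the same letter, so it removes every word of three or more letters.

```perl
use strict;
use warnings;

# Made-up sample line: one noisy word among ordinary words.
my $line = "the cat said aaaaaaahhhhh today";

# The original attempt: [a-zA-Z]{3,} matches any run of 3+ letters,
# so every word here (all have 3+ letters) is deleted, not just the noisy one.
(my $attempt = $line) =~ s/[a-zA-Z]{3,}//g;

# Only the four separating spaces survive.
print "[$attempt]\n";
```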


Comments

  • humbert, Registered Users, Posts: 5,238


    I think this tutorial is the best by a country mile and Expresso is a very handy tool for building up expressions.

    http://www.codeproject.com/KB/dotnet/regextutorial.aspx

    gggooogggle = g{3}o{3}g{3}le

    g{1,3} matches between 1 and 3 repetitions. Dunno if that's exactly what you're looking for.
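    A quick Perl check of the two quantifier forms mentioned above; the test strings are made up. It also shows why exact counts like {3} don't suit the question: the same pattern fails as soon as a run is longer.

```perl
use strict;
use warnings;

# {n} means exactly n repetitions of the preceding item.
my $exact = "gggooogggle" =~ /^g{3}o{3}g{3}le$/ ? 1 : 0;

# {1,3} means between 1 and 3 repetitions; it is greedy, so it takes 3 if it can.
my ($run) = "ggggle" =~ /(g{1,3})/;

# Exact counts don't generalise: four leading g's break the {3} pattern.
my $longer = "ggggooogggle" =~ /^g{3}o{3}g{3}le$/ ? 1 : 0;

print "exact=$exact run=$run longer=$longer\n";
```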

    Oh, and from experience I'd recommend always building up a regular expression gradually, testing it after each addition, instead of writing it all at once and then wondering why you are getting no matches.

    Oh, didn't fully read your post. You want a back reference like \S*(.)\1{2,}\S*

    (.) matches any character and captures it as numbered group \1. Then \1{2,} looks for two or more further repetitions of that same character, so the whole match needs a run of at least three!
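    A minimal Perl sketch of that back reference, run against a few made-up words. The ^ and $ anchors are an addition of mine (not in the post) so that each list element is tested as a whole word.

```perl
use strict;
use warnings;

# Made-up sample words: three noisy ones and two ordinary ones.
my @words = qw(aaaaaaahhhhhhhhhhh grrrrrrr gggooogggle hello book);

# (.) captures one character as group 1; \1{2,} then requires two or more
# further copies of that same character, i.e. a run of three or more.
# "hello" and "book" only have doubled letters, so they are kept out.
my @noisy = grep { /^\S*(.)\1{2,}\S*$/ } @words;

print "@noisy\n";
```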


  • DogmaticLefty, Closed Accounts, Posts: 265


    Here is the answer:

    $line =~ s/\b\w*(\w)\1{2,}\w*(?:\s|$)//g
    Any words like aaaaaahhh or ggggggrrrrrr on the line are deleted.
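    A minimal sketch of running that substitution over a whole file, assuming the text is processed line by line. The helper name clean_line and the sample text are hypothetical, and an in-memory filehandle stands in for the large text file; with a real file you would open its path instead.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the substitution from the post.
sub clean_line {
    my ($line) = @_;
    # \b\w*       start of a word, then any leading word characters
    # (\w)\1{2,}  a word character followed by two or more copies of itself
    # \w*         the rest of the word
    # (?:\s|$)    one following whitespace character, or end of string
    $line =~ s/\b\w*(\w)\1{2,}\w*(?:\s|$)//g;
    return $line;
}

# Made-up sample text; an in-memory filehandle stands in for the file.
my $text = "well aaaaaahhh that was grrrrrr annoying\nbook a room for 1000 people\n";
open my $fh, '<', \$text or die "cannot open: $!";

my @cleaned;
while (my $line = <$fh>) {
    chomp $line;
    push @cleaned, clean_line($line);
}
close $fh;

print "$_\n" for @cleaned;
```

    Note that \w also matches digits and underscores, so a token like 1000 is removed as well (the second sample line becomes "book a room for people"). If only letters should count, swapping (\w) for ([a-zA-Z]) in the pattern is one possible variant; it is not from the thread.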

    I love line noise on vim terminals...

