Regexp

  • 08-02-2010 9:21pm
    #1
    Closed Accounts Posts: 265 ✭✭


    Anyone know how I can match the following words in my large text file:

    aaaaaaaahhhhhhhhhhh
    grrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
    ooooooooooooooooooccchhhhh
    gggooooooooooooooggggggggggle
    etc.

    I'm trying to clean up some noisy text.

    I've got something like s/[a-zA-Z]{3,}//g but it's not working... :(

    If the same letter occurs three or more times in a row, then the regexp should match so I can delete the word.


Comments

  • Registered Users, Registered Users 2 Posts: 5,238 ✭✭✭humbert


    I think this tutorial is the best by a country mile and Expresso is a very handy tool for building up expressions.

    http://www.codeproject.com/KB/dotnet/regextutorial.aspx

    gggooogggle = g{3}o{3}g{3}le

    g{1,3} matches between 1 and 3 repetitions. Dunno if that's exactly what you're looking for.

    Oh, and from experience I'd recommend always building up a regular expression gradually, running it after each addition, instead of writing it all at once and then wondering why you are getting no matches.
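
    To make the "build it up gradually" advice concrete, here is a rough sketch in Python (not the .NET/Expresso setup from the tutorial). The helper name matched_prefixes and the list of steps are made up for illustration; the sample word and the final pattern are the ones from above.

```python
import re

sample = "gggooogggle"

# Grow the pattern one piece at a time and check that each step still matches.
steps = ["g{3}", "g{3}o{3}", "g{3}o{3}g{3}", "g{3}o{3}g{3}le"]

def matched_prefixes(patterns, text):
    """Return what each partial pattern matches at the start of text (or None)."""
    results = []
    for pattern in patterns:
        m = re.match(pattern, text)
        results.append(m.group(0) if m else None)
    return results

for pattern, hit in zip(steps, matched_prefixes(steps, sample)):
    print(f"{pattern!r:20} -> {hit!r}")
```

    If one step suddenly reports None, the piece you just added is the one to look at.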

    Oh, didn't fully read your post. You want a back reference like \S*(.)\1{2,}\S*

    (.) matches any single character and captures it as group 1, which \1 refers back to. Then \1{2,} requires at least two more repetitions of that same character, so three or more in a row!
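
    A small Python sketch of the difference, assuming Python's re module behaves like the other flavours here (it does for these constructs). The names any_letters, repeated and has_triple are just illustrative. The character-class attempt from the first post accepts any three letters, while the back reference insists on the same character repeated.

```python
import re

# The original attempt: three or more letters in a row, not necessarily the same one.
any_letters = re.compile(r"[a-zA-Z]{3,}")

# The back-reference version: one captured character followed by
# at least two more copies of that same character.
repeated = re.compile(r"\S*(.)\1{2,}\S*")

def has_triple(word):
    """True if the whole word matches the back-reference pattern."""
    return repeated.fullmatch(word) is not None

words = ["aaaaaaaahhhhhhhhhhh", "gggooooooooooooooggggggggggle", "google", "hello"]
for w in words:
    # Columns: word, matched by [a-zA-Z]{3,}?, matched by the back reference?
    print(w, bool(any_letters.fullmatch(w)), has_triple(w))
```

    Note that (.) also matches a space, so a run of three spaces counts as a repeat too; swapping it for (\w) restricts the check to word characters.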


  • Closed Accounts Posts: 265 ✭✭DogmaticLefty


    Here is the answer:

    $line =~ s/\b\w*(\w)\1{2,}\w*(?:\s|$)//g;
    Any words like aaaaaahhh or ggggggrrrrrr on the line are deleted.
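
    For anyone not using Perl, the same substitution can be sketched in Python. This is a port under the assumption that the pattern carries over unchanged, which it should for these constructs; NOISY_WORD and clean_line are names invented for the example.

```python
import re

# Same pattern as the Perl s///g above, compiled once for reuse.
NOISY_WORD = re.compile(r"\b\w*(\w)\1{2,}\w*(?:\s|$)")

def clean_line(line):
    """Delete every word that has one character repeated three or more times in a row."""
    return NOISY_WORD.sub("", line)

print(clean_line("that was aaaaaahhh so ggggggrrrrrr loud"))
```

    Two things to watch: \w also covers digits and underscore, so a number like 1000 gets deleted as well, and a noisy word followed directly by punctuation (aaaah!) is left alone because the pattern wants whitespace or end of line after the word.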

    I love line noise on vim terminals...

