Losing the plot: Basic grep

  • 17-02-2010 12:12PM
    #1
    Registered Users · Posts: 37,485 ✭✭✭✭


    Losing the plot here. Trying to match any capital letter.

    Ok, first off we have the working one:
    $ echo Test | grep -E -e "[tT]est"
    Test
    $ echo $?
    0
    

    Perfect.
    $ echo est | grep -E -e "[tT]est"
    $ echo $?
    1
    

    Perfect also.

    Now I do:
    $ echo test | grep -E -e "[A-Z]"
    test
    $ echo $?
    0
    

    What in the name of god is happening here? I have tried many, many variations of this and they all match.
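Since the goal is to match any capital letter, one option is the POSIX character class `[[:upper:]]` instead of the range `[A-Z]`: a class names the letters directly, so its meaning does not depend on how the current locale orders A through Z. A minimal sketch, assuming a POSIX shell and a POSIX-compliant grep; `has_upper` is a hypothetical helper name, not something from the thread.

```shell
#!/bin/sh
# has_upper: hypothetical helper; succeeds (status 0) if $1 contains an
# uppercase letter. [[:upper:]] is a POSIX character class, so it does not
# depend on the locale's collation order the way a range like [A-Z] can.
has_upper() {
  printf '%s\n' "$1" | grep -q -E '[[:upper:]]'
}

for word in Test test; do
  if has_upper "$word"; then
    echo "$word: has a capital"
  else
    echo "$word: no capitals"
  fi
done
```

Run as-is, it should print `Test: has a capital` followed by `test: no capitals`.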


Comments

  • Khannie · Registered Users · Posts: 37,485 ✭✭✭✭


    Slight update. I am running Gentoo. It works as expected on an Ubuntu box.

    alias grep says:
    alias grep='grep --colour=auto'

    I just installed the latest repository version of grep. Same rubbish.


  • Skrynesaver · Registered Users · Posts: 1,110 ✭✭✭


    A lot of my scripts rely on:

    grep "thing" && doSomeMagicIncantation

    This is quite weird; RHEL has the same issue:
    $ grep --version
    grep (GNU grep) 2.5.1
    
    $ echo test | grep -E   '^te[A-Z]t$'
    test
    $ echo test | grep -E   '^te[0-9]t$'
    $
    

    Seems as though -i is set by default!!
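For scripts that branch on grep's exit status, like the `grep "thing" && ...` pattern mentioned above, one defensive option is to pin the locale for the grep call itself, so the result does not change with whoever runs the script. A minimal sketch, assuming a POSIX shell; `run_if_capital` and `do_some_magic` are hypothetical names standing in for the poster's own script logic.

```shell
#!/bin/sh
# do_some_magic: hypothetical stand-in for the script's real follow-up step.
do_some_magic() {
  echo "magic for: $1"
}

# run_if_capital: hypothetical helper. LC_ALL=C applies only to this grep
# invocation; in the C locale the range [A-Z] covers exactly the 26 ASCII
# capitals, whatever the user's LANG/LC_* settings are.
run_if_capital() {
  if printf '%s\n' "$1" | LC_ALL=C grep -q -E '[A-Z]'; then
    do_some_magic "$1"
  fi
}

run_if_capital Test   # prints "magic for: Test"
run_if_capital test   # no capitals, so do_some_magic is not called
```

An `if` is used instead of a bare `&&` so the helper still returns 0 when nothing matches, which keeps it safe in scripts running under `set -e`.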


  • Khannie · Registered Users · Posts: 37,485 ✭✭✭✭


    That looks like a bug to me, to be honest. My head is melted here. I'm glad it's not just me, though. :)


  • Skrynesaver · Registered Users · Posts: 1,110 ✭✭✭


    It seems to happen only in character ranges:
    $ echo test | grep -E   '^teSt$'
    $ echo test | grep -E   '^te[S]t$'
    $ echo test | grep -E   '^te[R-T]t$'
    test
    $
    

    So are you reporting it or am I?
    PS. Which release of grep are you on?
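One way to confirm that only ranges are affected is to rerun the same patterns with grep pinned to the C locale, where collation is plain byte (ASCII) order. A minimal sketch, assuming a POSIX shell and GNU grep; the `matches` helper and the extra `[[:lower:]]` pattern are additions for comparison, not from the original post.

```shell
#!/bin/sh
# matches: hypothetical helper; succeeds if string $1 matches the
# extended regex $2 when grep runs in the C locale.
matches() {
  printf '%s\n' "$1" | LC_ALL=C grep -q -E "$2"
}

for re in '^teSt$' '^te[S]t$' '^te[R-T]t$' '^te[[:lower:]]t$'; do
  if matches test "$re"; then
    echo "$re: match"
  else
    echo "$re: no match"
  fi
done
```

In the C locale only the last pattern should match, because `[R-T]` then means exactly R, S and T.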


  • Skrynesaver · Registered Users · Posts: 1,110 ✭✭✭


    Aha, found it. Run
    $ export LC_COLLATE=POSIX
    and try again. It seems that if you have dictionary sort order rather than POSIX sort order, the range [a-d] is actually [aBbCcDd].
    Obvious really ;)
    (From a grep bug search.)
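The same fix as a script fragment. A minimal sketch, assuming a POSIX shell: POSIX gives `LC_ALL` priority over the individual `LC_*` categories, which in turn take priority over `LANG`, so exporting `LC_COLLATE` has no effect if `LC_ALL` is already set in the environment. Prefixing a single command with `LC_ALL=C` is a narrower alternative that leaves the rest of the session alone.

```shell
#!/bin/sh
# POSIX locale precedence: LC_ALL overrides LC_COLLATE, which overrides LANG.
# Exporting LC_COLLATE therefore only takes effect when LC_ALL is unset.
unset LC_ALL
export LC_COLLATE=POSIX

if echo test | grep -q -E '^te[R-T]t$'; then
  echo "[R-T] still matches lowercase: check LC_ALL and the environment"
else
  echo "[R-T] now means R, S and T only"
fi

# Narrower alternative: set the locale for one command only, leaving the
# rest of the session's locale settings untouched.
if echo Test | LC_ALL=C grep -q -E '[A-Z]'; then
  echo "Test contains a capital"
fi
```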


  • Khannie · Registered Users · Posts: 37,485 ✭✭✭✭


    Stuff like that really irritates me about Linux. Never in the history of the universe has a-d ever meant aAbBcCdD. Never!

    Nice catch btw.


  • Skrynesaver · Registered Users · Posts: 1,110 ✭✭✭


    I think it's an aspect of catering to usability: people aren't used to the idea that Z comes before a, so a "friendly" locale setting fixes it for you.

    If, as a result, some "obscure" thing like regex gets broken, well, the geeks should be able to fix it themselves; that seems to be the attitude.
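The "Z comes before a" point is plain ASCII order: capitals occupy codes 65 to 90 and lowercase letters 97 to 122, and the C locale sorts by those byte values. A minimal sketch, assuming a POSIX shell and a POSIX `sort`.

```shell
#!/bin/sh
# In the C locale, sort compares raw bytes: every ASCII capital (65..90)
# sorts before every lowercase letter (97..122), so Z comes before a.
printf '%s\n' b a Z B | LC_ALL=C sort
# B
# Z
# a
# b
```

A dictionary-order locale would instead interleave the cases, which is the same collation behaviour that turned `[A-Z]` into a range containing lowercase letters.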


  • niallb · Registered Users · Posts: 2,780 ✭✭✭


    Skrynesaver wrote:
    So are you reporting it or am I?
    PS. Which release of grep are you on?

    Real nice catch.
    Who to report it to, though?
    It's really just a LOCALE issue.

    Let's get started on en_IE.UTF8.pedantic :-)
    To paraphrase sendmail: Don't blame grep!

    Of course, it would be far more productive to fix the collate order in ga_IE,
    and wait for the bewilderment further down the line when only people from gaelscoileanna (Irish-medium schools) can pass a regex exam.

