
Losing the plot: Basic grep

  • 17-02-2010 12:12PM
    #1
    Registered Users, Registered Users 2 Posts: 37,485 ✭✭✭✭


    Losing the plot here. Trying to match on any capital letters.

    Ok, first off we have the working one:
    echo Test | grep -E -e "[tT]est"
    Test
    echo $?
    0
    

    Perfect.
    echo est | grep -E -e "[tT]est"
    echo $?
    1
    

    Perfect also.

    Now I do:
    echo test | grep -E -e "[A-Z]"
    test
    echo $?
    0
    

    What in the name of god is happening here? I have tried many many variations of this and they all match.


Comments

  • Registered Users, Registered Users 2 Posts: 37,485 ✭✭✭✭Khannie


    Slight update. I am running gentoo. It works as expected on an ubuntu box.

    alias grep says:
    alias grep='grep --colour=auto'

    I just installed the latest repository version of grep. Same rubbish.


  • Registered Users, Registered Users 2 Posts: 1,110 ✭✭✭Skrynesaver


    A lot of my scripts rely on

    grep "thing" && doSomeMagicIncantation

    This is quite weird. RHEL has the same issue:
    grep --version
    grep (GNU grep) 2.5.1
    
    $ echo test | grep -E   '^te[A-Z]t$'
    test
    $ echo test | grep -E   '^te[0-9]t$'
    $
    

    It looks as though -i is set by default!


  • Registered Users, Registered Users 2 Posts: 37,485 ✭✭✭✭Khannie


    That looks like a bug to me to be honest. My head is melted here. I'm glad it's not just me though. :)


  • Registered Users, Registered Users 2 Posts: 1,110 ✭✭✭Skrynesaver


    It seems to happen only with character ranges:
    $ echo test | grep -E   '^teSt$'
    $ echo test | grep -E   '^te[S]t$'
    $ echo test | grep -E   '^te[R-T]t$'
    test
    $
    

    So are you reporting it or am I?
    PS. Which release of grep are you on?


  • Registered Users, Registered Users 2 Posts: 1,110 ✭✭✭Skrynesaver


    Aha, found it.
     export LC_COLLATE=POSIX
    
    and try again. It seems that if you have dictionary sort order rather than POSIX sort order, the range
    [a-d]
    
    is actually
    [aBbCcDd]
    
    Obvious really ;)
    from grep bug search
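
For scripts that rely on grep's exit status, here is a minimal sketch of two ways to avoid the collation surprise described above. The function names `has_upper_c` and `has_upper_class` are made up for illustration. The first pins the locale for a single grep call using `LC_ALL`, which takes precedence over `LC_COLLATE`; the second uses the POSIX character class `[[:upper:]]`, which does not depend on how ranges are collated.

```shell
#!/bin/sh

# Hypothetical helper: succeeds (exit 0) if "$1" contains an ASCII capital.
# LC_ALL=C applies only to this grep invocation, so [A-Z] is read in
# byte order and cannot pick up lowercase letters.
has_upper_c() {
    printf '%s\n' "$1" | LC_ALL=C grep -q -E '[A-Z]'
}

# Hypothetical helper: same check with a POSIX character class instead
# of a range, so the collation order of the current locale is irrelevant.
has_upper_class() {
    printf '%s\n' "$1" | grep -q '[[:upper:]]'
}

has_upper_c test || echo "test: no capitals"
has_upper_c Test && echo "Test: has a capital"
```

Pinning the locale per command keeps the rest of the session's locale settings untouched, whereas `export LC_COLLATE=POSIX` changes sorting for everything started from that shell.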


  • Registered Users, Registered Users 2 Posts: 37,485 ✭✭✭✭Khannie


    Stuff like that really irritates me about Linux. Never in the history of the universe has a-d ever meant aAbBcCdD. Never!

    Nice catch btw.


  • Registered Users, Registered Users 2 Posts: 1,110 ✭✭✭Skrynesaver


    I think it's an aspect of catering to usability: people aren't used to the idea that Z comes before a, so a "friendly" locale setting fixes it for you.

    If, as a result, some "obscure" thing like regex gets broken, well, the geeks should be able to fix it themselves. That seems to be the attitude.


  • Registered Users, Registered Users 2 Posts: 2,775 ✭✭✭niallb


    So are you reporting it or am I?
    PS. Which release of grep are you on?

    Real nice catch.
    Who to report it to though?
    It's really just a LOCALE issue.

    Let's get started on en_IE.UTF8.pedantic :-)
    To paraphrase sendmail: Don't blame grep!

    Of course, it would be far more productive to fix the collate order in ga_IE,
    and wait for bewilderment further down the line when only people from gaelscoileanna can pass a regex exam

