POSIX regex file expression question

  • 07-11-2002 06:19PM
    #1
    Registered Users, Registered Users 2, Paid Member Posts: 14,174 ✭✭✭✭


    hey guys,

    I'm setting up some file-extension filters and am just wondering whether or not I'm getting this right.

    for example:
    .*\..*vb.*
    
    will pick up filename1.vbs, or filename2.vbe, or filename.blah.vbs.txt, yes?



    similarly ....
    .*\.d*l
    
    will pick up filename1.dll and filename2.dpl, but not filename3.dogl, yes?
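One way to check guesses like these is to run the patterns against the sample names. The sketch below uses Python's re module as a stand-in for a POSIX matcher, which is an assumption (the thread never says which filter engine is in use). It also assumes the filter looks for the pattern anywhere in the name rather than requiring a whole-name match. The helper name hits and the extra sample names are made up for illustration.

```python
import re

def hits(pattern, names):
    """Return the names in which the pattern is found anywhere (unanchored search)."""
    return [n for n in names if re.search(pattern, n)]

vb_names = ["filename1.vbs", "filename2.vbe", "filename.blah.vbs.txt", "photo.myvbfile.jpg"]
dl_names = ["filename1.dll", "filename2.dpl", "filename3.dogl", "notes.l"]

# ".*vb.*" after the dot allows anything before and after "vb",
# so a harmless name with "vb" somewhere after a dot is caught too.
vb_hits = hits(r".*\..*vb.*", vb_names)

# "d*" means zero or more "d" characters, not "d followed by anything",
# so the pattern needs a dot, optional d's, then "l".
dl_hits = hits(r".*\.d*l", dl_names)

print(vb_hits)
print(dl_hits)
```

Under those assumptions filename2.dpl is not caught, and filename1.dll is caught only because the search is satisfied by the leading ".dl". Something like \.d.l$ would be closer to "d, any one character, l".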


Comments

  • Closed Accounts Posts: 5,563 ✭✭✭Typedef


    Hmm.

    Here is the line I use to reject window$ executable attachments on mailservers if it's any help

    /etc/postfix/body_checks


    /(filename|name)=".*\.(au|bat|chm|cmd|com|css|dll|dot|exe|hlp|hta|jse|lnk|ocx|pak|pif|pps|scr|sct|shs|src|vbe|vbs|vxd|wsh)"/ REJECT
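For anyone wanting to try this kind of rule outside a mail server, here is a rough sketch using Python's re module in place of Postfix's regexp engine. That substitution is an assumption, as are the sample header lines, the shortened extension list, and the IGNORECASE flag, so treat it as an illustration of the alternation rather than a copy of the real configuration.

```python
import re

# Shortened, hypothetical version of the extension list.
# IGNORECASE is an assumption made here so that upper-case names are caught as well.
ATTACHMENT = re.compile(
    r'(filename|name)=".*\.(bat|com|exe|pif|scr|vbe|vbs)"',
    re.IGNORECASE,
)

# Made-up MIME header lines of the sort a body check would see.
headers = [
    'Content-Disposition: attachment; filename="report.txt.vbs"',
    'Content-Type: application/octet-stream; name="SETUP.EXE"',
    'Content-Disposition: attachment; filename="holiday.jpg"',
]

rejected = [h for h in headers if ATTACHMENT.search(h)]
for h in rejected:
    print("REJECT", h)
```

Because ".*" sits in front of the final "\.", a double extension such as report.txt.vbs is caught by its last extension.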


  • Registered Users, Registered Users 2, Paid Member Posts: 14,174 ✭✭✭✭Lemming


    Originally posted by Typedef

    /(filename|name)=".*\.(au|bat|chm|cmd|com|css|dll|dot|exe|hlp|hta|jse|lnk|ocx|pak|pif|pps|scr|sct|shs|src|vbe|vbs|vxd|wsh)"/ REJECT

    Hmm ... would I be right in saying that that filename setup checks for filename.extension?

    If it does, then whilst it's a one-stop-filter-shopping-list, what I'm trying to do is create as generic a list as possible, rather than list every file type and/or file-type combination (e.g. name.txt.vbs) that I want blocked, since some file extensions are similar, give or take a character. Something like:
    .*\.(vbs|vbe|dll|dpl|asp|tsp|etc etc).
    

    So what I'm asking is the following:

    "Was the syntax in my initial post correct?"


  • Closed Accounts Posts: 5,563 ✭✭✭Typedef


    That'll pick up a filename.txt.vbs (I just tested it).

    /(filename|name)=".*\.(vb*)"/ REJECT

    should work for vb(x) I think.


  • Registered Users, Registered Users 2, Paid Member Posts: 14,174 ✭✭✭✭Lemming


    Originally posted by Typedef
    That'll pick up a filename.txt.vbs (I just tested it).

    /(filename|name)=".*\.(vb*)"/ REJECT

    should work for vb(x) I think.

    Cheers type :)

    one last question (I hope anyway) for ye. The syntax * means one character, whilst if you have .* does this mean that you have 0 or more characters ? Or does it mean that you have one or more?

    If you follow the distinction I'm trying to make in that question?

    So, say for example I want to filter hta, htt, htm, html, would I just need .*\.(ht.*) ? Or would I have to specify a separate filter for .html (since it's four characters as opposed to three)?


  • Closed Accounts Posts: 95 ✭✭krinDar


    * means 0 or more occurrences of the previous RE.
    . (<period>) matches any character except newline.
    Therefore the RE '.*' matches 0 or more occurrences of any character.

    The RE you give will match what you want, but be careful as it will match *any* file that has
    '.ht' anywhere in the name, e.g. important.ht.exe

    Check out regexp(5)
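The warning about '.ht' anywhere in the name can be seen with a small test. This is only a sketch: Python's re module stands in for the POSIX matcher, the file names other than important.ht.exe are invented, and the tighter pattern called strict is a hypothetical alternative rather than anything proposed in the thread.

```python
import re

names = ["index.html", "page.htm", "app.hta", "important.ht.exe", "notes.txt"]

loose = re.compile(r".*\.(ht.*)")            # the pattern from the question
strict = re.compile(r"\.(hta|htt|html?)$")   # hypothetical: extension must end the name

loose_hits = [n for n in names if loose.search(n)]
strict_hits = [n for n in names if strict.search(n)]

print("loose :", loose_hits)
print("strict:", strict_hits)
```

Anchoring with $ and listing the endings keeps important.ht.exe out of the htm/html rule, at the cost of being less generic.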


  • Registered Users, Registered Users 2, Paid Member Posts: 14,174 ✭✭✭✭Lemming


    Originally posted by krinDar
    * means 0 or more occurrences of the previous RE.
    . (<period>) matches any character except newline.
    Therefore the RE '.*' matches 0 or more occurrences of any character.

    Ok, so let me just clarify this.

    \.vb* will give me an RE that checks vb(vb n times)

    \.vb.* will give me an RE that checks all patterns with .vb in them? (e.g. vbs/vbe)


    Check out regexp(5)
    MyBox# man 5 regexp
    No entry for regexp in section 5 of the manual
    

    pants ....


  • Closed Accounts Posts: 5,563 ✭✭✭Typedef


    man re_syntax

    * a sequence of 0 or more matches of the atom
    + a sequence of 1 or more matches of the atom
    ? a sequence of 0 or 1 matches of the atom
    {m} a sequence of exactly m matches of the atom

    . matches any single character
    \k (where k is a non-alphanumeric character) matches that character taken as an ordinary character, e.g. \\ matches a backslash character.

    rtfm ; )
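The quantifier list above can be tried out directly. A minimal sketch, assuming Python's re module behaves like the re_syntax description for these simple cases. The helper name whole and the test strings are made up.

```python
import re

def whole(pattern, text):
    """True when the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

checks = {
    "ab*":   [whole("ab*", s) for s in ("a", "ab", "abbb")],     # * : zero or more b
    "ab+":   [whole("ab+", s) for s in ("a", "ab", "abbb")],     # + : one or more b
    "ab?":   [whole("ab?", s) for s in ("a", "ab", "abbb")],     # ? : zero or one b
    "ab{3}": [whole("ab{3}", s) for s in ("a", "ab", "abbb")],   # {3}: exactly three b
    r"a\.b": [whole(r"a\.b", s) for s in ("a.b", "axb")],        # \. : a literal dot
    "a.b":   [whole("a.b", s) for s in ("a.b", "axb")],          # .  : any single character
}

for pattern, results in checks.items():
    print(pattern, results)
```

Note that each quantifier applies only to the atom immediately before it, here the single letter b.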


  • Closed Accounts Posts: 95 ✭✭krinDar


    Originally posted by Lemming
    Ok, so let me just clarify this.

    \.vb* will give me an RE that checks vb(vb n times)

    \.vb.* will give me an RE that checks all patterns with .vb in them? (e.g. vbs/vbe)

    You can use either, they both do the same thing really.


  • Registered Users, Registered Users 2, Paid Member Posts: 14,174 ✭✭✭✭Lemming


    Originally posted by Typedef
    man re_syntax


    rtfm ; )

    hehe ... helps if you know that said man page exists to rtfm in the first place ;)


  • Closed Accounts Posts: 5,563 ✭✭✭Typedef


    : )


  • Registered Users, Registered Users 2, Paid Member Posts: 14,174 ✭✭✭✭Lemming


    Originally posted by krinDar
    You can use either, they both do the same thing really.


    Hmm .. well the understanding I had was that they don't

    .*\.vb* checks for name.vb, or name.vbvb, or name.vbvbvb(n times)

    whereas

    .*\.vb.* checks for name.vb(x), or name.vb(xy) etc.

    Going by the man page that I was so graciously told to rtfm (courtesy of Type :p ), it would appear that to do this you would actually type:

    .*\.vb. to do a search for name.vb(x), but something like .*\.vb.. or .*\.vb.* to search for anything more than .vb(x)


    yes ? No ?


  • Closed Accounts Posts: 5,563 ✭✭✭Typedef


    I think so.


  • Closed Accounts Posts: 286 ✭✭Kev


    Originally posted by Lemming


    .*\.vb* checks for name.vb, or name.vbvb, or name.vbvbvb(n times)

    that would check for name.vb or name.vbbbbb or name.v
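The difference between repeating the b and repeating the whole "vb" is easy to test. In this sketch Python's re module is assumed to behave like the POSIX matcher for these patterns, and whole-name matching (fullmatch) is used on purpose so the effect of each quantifier is visible. The variable names are invented.

```python
import re

names = ["name.v", "name.vb", "name.vbbbbb", "name.vbvb", "name.vbs"]

# * binds to the single character before it, so only the "b" repeats.
star_on_b = [n for n in names if re.fullmatch(r".*\.vb*", n)]

# Parentheses are needed to repeat "vb" as a unit.
star_on_group = [n for n in names if re.fullmatch(r".*\.(vb)*", n)]

# "vb" followed by exactly one more character of any kind.
one_more_char = [n for n in names if re.fullmatch(r".*\.vb.", n)]

print(star_on_b)
print(star_on_group)
print(one_more_char)
```

With an unanchored search instead of a whole-name match, all three patterns would accept more names, since anything after the matched part is ignored.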


  • Closed Accounts Posts: 5,563 ✭✭✭Typedef


    use .*\.vb.


  • Registered Users, Registered Users 2, Paid Member Posts: 14,174 ✭✭✭✭Lemming


    Cheers guys :)

    now to see what other Regexp situations I can come up with ......


  • Registered Users, Registered Users 2, Paid Member Posts: 14,174 ✭✭✭✭Lemming


    And she's back .........

    let's say I want to filter for .com or .cmd extensions,

    instead of putting in:
    .*\.cmd
    .*\.com
    or .*\.(cmd|com)

    can I do the following:

    .*\.c{1,}o?m{1,}d?

    to get the same effect?

    From what I /think/, it will interpret this as the 'c' character once, 'o' occurring 0-1 times, 'm' once, and then 'd' occurring 0-1 times, leaving me with the following possibilities:

    .cm
    .cmd
    .com
    .comd


    that right ?


  • Closed Accounts Posts: 286 ✭✭Kev


    {1,} means 1 or more times. The + modifier is a shortcut for this and looks nicer.

    So it would match multiple c's and m's.

    If you want to filter for just com and cmd:

    .*\.c[om]d$

    The $ means match the end of the string.
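It may be worth running these candidates against a few extensions before relying on them. The sketch below assumes Python's re module is a fair stand-in for the POSIX matcher and checks whole extensions only. The helper name matching, the sample list, and the alternation pattern are illustrative additions. As far as this test can tell, the bracket form accepts .cmd and .cod but not .com, so the alternation may be the safer choice.

```python
import re

exts = [".com", ".cmd", ".cod", ".cm", ".comd", ".ccmmd"]

def matching(pattern):
    """Extensions from the sample list that the pattern matches in full."""
    return [e for e in exts if re.fullmatch(pattern, e)]

with_ranges = matching(r"\.c{1,}o?m{1,}d?")   # {1,} behaves like +
with_class = matching(r"\.c[om]d")            # [om] is exactly one of o or m
with_alternation = matching(r"\.(com|cmd)")   # spells out both extensions

print(with_ranges)
print(with_class)
print(with_alternation)
```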


  • Registered Users, Registered Users 2, Paid Member Posts: 14,174 ✭✭✭✭Lemming


    Originally posted by Kev
    {1,} means 1 or more times. The + modifier is a shortcut for this and looks nicer.

    So it would match multiple c's and m's.

    If you want to filter for just com and cmd:

    .*\.c[om]d$

    The $ means match the end of the string.


    oops .. that should have been c{1}o?m{1}d?

    but anyway ..... doesn't the [] only allow the matching of one character? So I could match either 'o' or 'm', but not 'om' ??


  • Closed Accounts Posts: 286 ✭✭Kev


    Yes, it would only match a single o or m. If you just want to match cmd or com then you only need one; for more, use + or {x,y}.

    Also, {1} is redundant.

    c{1}o?m{1}d? will match all of

    comd
    com
    cmd
    cm
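That list can be confirmed by brute force. A small sketch, again assuming Python's re module matches these patterns the same way a POSIX matcher would: it tries every string of up to five letters drawn from c, o, m and d, and also checks that dropping the redundant {1} changes nothing. The variable names are made up.

```python
import re
from itertools import product

pattern_long = re.compile(r"c{1}o?m{1}d?")
pattern_short = re.compile(r"co?md?")   # same pattern with the redundant {1} dropped

# Every string of length 1 to 5 over the letters c, o, m, d.
candidates = ["".join(p) for n in range(1, 6) for p in product("comd", repeat=n)]

long_hits = sorted(s for s in candidates if pattern_long.fullmatch(s))
short_hits = sorted(s for s in candidates if pattern_short.fullmatch(s))

print(long_hits)
print(short_hits == long_hits)
```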

