Regular Expression Help

  • 23-04-2004 4:32pm
    #1
    Registered Users Posts: 6,762 ✭✭✭


    I'm looking for a regular expression to match HTML tags with specific attributes & values. The only case-sensitive value would be the id attribute's value.

    For example, searching for a table tag with attribute values of "tblMainArea" and "server", these must all match:
    1. <table id="tblMainArea" cellSpacing="0" cellPadding="0" width="100%" align="left" border="0" runat="server">
    2. <table id="tblMainArea" runat="server" cellSpacing="0" cellPadding="0" width="100%" align="left" border="0" >
    3. <table cellSpacing="0" cellPadding="0" id="tblMainArea" width="100%" align="left" border="0" runat="server">
    
    The regex must find all of the sub-strings for a tag to count as a match.

    This one must fail, because it has no runat="server" attribute:
    <table cellSpacing="0" cellPadding="0" id="tblMainArea" width="100%" align="left" border="0">
    

    I haven't a lot of experience with Regular expressions so I have been trying in vain for a while now :(

    By the way, I'm using .NET (C#) and am testing my regexes here

    Thanks!


Comments

  • Closed Accounts Posts: 304 ✭✭Zaltais


    Yeah you've a problem here in that you're trying to match two things in a string. Normally that's not a major concern, but in this case you don't know what order they appear in.

    Best thing in this instance is actually to match twice...

    I'm not a C#'er so I won't give you any code examples, but basically:

    if the string you're testing matches this RegEx
    <table.+?tblMainArea.+?>

    AND this one
    <table.+?server.+?>

    then do whatever
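The "match twice" idea above can be sketched as follows. This is only an illustration in Python (the thread is about .NET, and no C# was posted); the two patterns are the ones from the post, while the function name `tag_matches` is made up for the example.

```python
import re

# The two patterns suggested in the post, used unchanged.
HAS_ID = re.compile(r'<table.+?tblMainArea.+?>')
HAS_SERVER = re.compile(r'<table.+?server.+?>')


def tag_matches(tag: str) -> bool:
    """Return True only when BOTH patterns find a match in the string."""
    return HAS_ID.search(tag) is not None and HAS_SERVER.search(tag) is not None


if __name__ == "__main__":
    ok = '<table id="tblMainArea" runat="server" cellSpacing="0" border="0" >'
    bad = '<table cellSpacing="0" id="tblMainArea" border="0">'
    print(tag_matches(ok), tag_matches(bad))
```

Because each pattern is tested on its own, the order of the two attributes inside the tag does not matter.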


  • Registered Users Posts: 437 ✭✭Spunj


    I don't have time to do it all for you right now, but I will post a section of it and point you in the right direction...

    For this to work for any order of attributes, you will have to write the expression out again for each order and OR (|) the whole expressions together, e.g. (exp1)|(exp2)|(exp3)

    So for 1 tag the expression will look like this:
    
    <table\s+?id=\"\w+?\"\s+?cellSpacing=\"\d{1,2}\"\s+?cellPadding=\"\d{1,2}\"\s+?width=\"\d{0,3}%\"\s+?align=\"(left|right|center)\"\s+?border=\"\d{1,2}\"\s+?runat=\"server\"[^>]*>
    
    

    Any questions on why I did some things just ask.

    edit: d'oh, there's a couple of mistakes in there, I'll fix them after tea.
    edit: there you go that should do it...


  • Registered Users Posts: 6,762 ✭✭✭WizZard


    I actually half solved this earlier today. I don't have the exact solution regex to hand but I'll post it up tomorrow.

    It went something like this:
    <(?i)table.*(?-i)tblMainArea(?i).*server.*>
    

    And I used the Compiled and Singleline regex options so that . would match across multiple lines if needed (I could also have used [^..]* instead, but I have a feeling that .NET's RegexOptions may be slightly better; correct me if I'm wrong here).

    However, I do have to create two regexes, as I'm not sure how to search for the strings in any order.

    Any ideas here?
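For anyone wanting to try the pattern from this post, here is a rough Python translation. It is a sketch under assumptions: Python has no mid-pattern `(?-i)` toggle, so the scoped form `(?i:...)` is used instead, and `re.DOTALL` stands in for .NET's Singleline option. The name `is_match` is made up for the example.

```python
import re

# Translation of <(?i)table.*(?-i)tblMainArea(?i).*server.*>
# (?i:...) switches case-insensitivity on only inside the group, so the
# id value "tblMainArea" stays case-sensitive. re.DOTALL lets . match newlines.
PATTERN = re.compile(r'<(?i:table).*tblMainArea(?i:.*server).*>', re.DOTALL)


def is_match(text: str) -> bool:
    """Return True if the pattern is found anywhere in the text."""
    return PATTERN.search(text) is not None


if __name__ == "__main__":
    print(is_match('<TABLE id="tblMainArea" RUNAT="SERVER">'))   # tag/attribute case ignored
    print(is_match('<table id="tblmainarea" runat="server">'))   # id value case matters
    print(is_match('<table runat="server" id="tblMainArea">'))   # order matters
```

The last call shows the limitation described in the post: the pattern only matches when tblMainArea appears before server.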


  • Closed Accounts Posts: 304 ✭✭Zaltais


    Actually, ignore what I said in my previous post....

    This one checks for both in the same regex. It seems I was getting my syntax confused yesterday, which is why it wouldn't work.
    <table.+?(tblMainArea|server).+?(tblMainArea|server).+?>
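A quick way to try the single pattern above, sketched in Python for illustration (the helper name `either_order` is made up). It handles both attribute orders, but note one caveat worth checking against your own data: since each group accepts either word, a tag that contains the same word twice also passes.

```python
import re

# The single pattern from the post, used unchanged.
PATTERN = re.compile(r'<table.+?(tblMainArea|server).+?(tblMainArea|server).+?>')


def either_order(tag: str) -> bool:
    """Return True if the pattern is found in the string."""
    return PATTERN.search(tag) is not None


if __name__ == "__main__":
    print(either_order('<table id="tblMainArea" runat="server">'))
    print(either_order('<table runat="server" id="tblMainArea">'))
    # Caveat: the same word twice satisfies both groups, with no "server" at all.
    print(either_order('<table id="tblMainArea" name="tblMainArea">'))
```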
    


  • Registered Users Posts: 6,762 ✭✭✭WizZard


    Great, a combination of yours, Zaltais, and mine should do the trick. Thanks.

    If anyone knows how to check for the strings without using the alternation (|) operator, can you tell me? I intend to expand this regex in future to handle more than two strings, and an alternation of four strings might be a little processing-intensive.
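One common way to require several strings in any order without alternation is a chain of lookaheads, one per string. The sketch below is in Python for illustration; .NET's regex engine also has lookaheads and scoped `(?i:...)` groups, so the generated pattern should port over, though that has not been tested here. The function name `build_tag_pattern` and its parameters are made up for the example.

```python
import re


def build_tag_pattern(tag_name, case_sensitive=(), case_insensitive=()):
    """Build a regex matching <tag_name ...> that contains every given value.

    Each value becomes one lookahead; [^>]* keeps the search inside the tag.
    """
    lookaheads = [f'(?=[^>]*{re.escape(v)})' for v in case_sensitive]
    lookaheads += [f'(?=[^>]*(?i:{re.escape(v)}))' for v in case_insensitive]
    return re.compile(
        f'<(?i:{re.escape(tag_name)})\\b' + ''.join(lookaheads) + '[^>]*>'
    )


if __name__ == "__main__":
    pattern = build_tag_pattern('table', ['tblMainArea'], ['server'])
    print(pattern.pattern)
    print(bool(pattern.search('<table runat="server" id="tblMainArea">')))
    print(bool(pattern.search('<table id="tblMainArea" border="0">')))
```

Adding a third or fourth required string only adds another lookahead, so the pattern grows linearly rather than needing one alternative per ordering.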


  • Closed Accounts Posts: 304 ✭✭Zaltais


    I don't think that increasing the number of alternates to four will significantly increase the processing resources required. I think it's more likely to cause problems when the string you're actually matching against gets very long.

    If you're going to be doing a lot of HTML checking, it may be worthwhile using an HTML parser instead of a RegEx. (Naturally, it depends on what you're using the code to do.)

    A very quick Google turned up this

    No idea if it'll do what you want, but might be worth investigating.
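To give a feel for the parser approach, here is a minimal sketch using Python's standard-library `html.parser`. It is not the library linked above, just an illustration of the idea; the class `TagFinder`, the function `find_tags` and the shape of the `required` mapping are all made up for the example.

```python
from html.parser import HTMLParser


class TagFinder(HTMLParser):
    """Collects start tags of one kind whose attributes have the required values."""

    def __init__(self, tag_name, required):
        super().__init__()
        self.tag_name = tag_name.lower()
        # required maps attribute name -> (wanted value, case_sensitive)
        self.required = required
        self.found = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; values keep their case.
        if tag != self.tag_name:
            return
        values = dict(attrs)
        for name, (wanted, case_sensitive) in self.required.items():
            actual = values.get(name.lower())
            if actual is None:
                return  # attribute missing or has no value
            if not case_sensitive:
                actual, wanted = actual.lower(), wanted.lower()
            if actual != wanted:
                return
        self.found.append(values)


def find_tags(html, tag_name, required):
    """Return the attribute dicts of all matching start tags in html."""
    finder = TagFinder(tag_name, required)
    finder.feed(html)
    finder.close()
    return finder.found


if __name__ == "__main__":
    wanted = {"id": ("tblMainArea", True), "runat": ("server", False)}
    page = '<table cellSpacing="0" id="tblMainArea" runat="server"><tr></tr></table>'
    print(find_tags(page, "table", wanted))
```

Since the parser hands over attributes as name/value pairs, attribute order stops being a problem, and each value can be compared exactly rather than as a sub-string.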

