Any regular expression experts?

  • 20-08-2007 03:36PM
    #1
    Moderators, Science, Health & Environment Moderators Posts: 9,221 Mod ✭✭✭✭


    I use a thing called the widgEditor for HTML editing in a browser, mainly because it's unobtrusive and degrades nicely. Anyway, I decided to allow tables to be pasted into the thing. The code in the widgEditor automatically cleans pasted Word or Excel stuff and does a nice job. It removes tables though, so I have stopped it doing this. This bit:
    theHTML = theHTML.replace(/(<[^\/]>|<[^\/][^>]*[^\/]>)\s*<\/[^>]*>/g, "");

    strips out all empty tags. Now I need to strip all empty tags except empty td tags, since empty table cells should be allowed. Any regex experts know what to change here?
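
    To make the problem concrete, here is a small sketch of what that expression does to a pasted fragment. The sample markup and the variable names are made up for illustration; only the pattern comes from the line quoted above.

    ```javascript
    // The empty-tag pattern quoted above, unchanged.
    var emptyTag = /(<[^\/]>|<[^\/][^>]*[^\/]>)\s*<\/[^>]*>/g;

    // Hypothetical sample markup, invented for this illustration.
    var originalSample = "<p>text</p><b></b><td></td>";

    // Every "opening tag, optional whitespace, closing tag" run is removed,
    // so the empty <td></td> cell disappears along with the empty <b></b>.
    var originalCleaned = originalSample.replace(emptyTag, "");
    console.log(originalCleaned);
    ```

    The paragraph with text in it survives, but the empty cell does not, which is what breaks pasted tables.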


Comments

  • Closed Accounts Posts: 4,943 ✭✭✭Mutant_Fruit


    This might help: http://www.regular-expressions.info/conditional.html

    Basically what you want to do is something like:
    if(tag is not td) then (remove tag if it's empty)

    I'd attempt to write it, but the actual regex syntax changes between languages, so I'd be wasting my time ;) Plus, it'd take me a while. If you can't get it figured out, I'll write it up for ya later, and you should be able to translate it then (if it needs translating).
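
    For what it's worth, JavaScript's regex dialect has no conditional construct, so one possible way to express the "if the tag is not td" idea there is to let a replace callback make the decision. This is only a rough sketch: the function name stripEmptyTagsExceptTd and the pattern are assumptions for illustration, not anything taken from the widgEditor.

    ```javascript
    // Hypothetical helper, not part of the widgEditor.
    function stripEmptyTagsExceptTd(html) {
      // Opening tag (name captured), optional whitespace, then a closing tag
      // with the same name. \1 refers back to the captured tag name.
      var emptyPair = /<([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>\s*<\/\1\s*>/g;

      return html.replace(emptyPair, function (match, tagName) {
        // if (tag is not td) then (remove tag if it's empty)
        return tagName.toLowerCase() === "td" ? match : "";
      });
    }

    var callbackResult = stripEmptyTagsExceptTd("<tr><td></td><td><b></b>x</td></tr>");
    console.log(callbackResult);
    ```

    The sketch only handles open/close pairs such as <b></b>; self-closing forms like <b /> would need a second pattern.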


  • Registered Users, Registered Users 2 Posts: 2,931 ✭✭✭Ginger


    I find this very handy

    http://tools.osherove.com/CoolTools/Regulazy/tabid/182/Default.aspx

    I use it to verify any regexes I need to build.


  • Subscribers Posts: 4,077 ✭✭✭IRLConor


    Try:
    theHTML = theHTML.replace(/(?:<(?!td)[^>\/]*>\s*<\/(?!td)[^>\/]*>|<(?!td)[^>\/]*\s*\/>)/g, "");
    

    I used the following code to test it:
    #!/usr/bin/perl
    
    my $sample = "<tr id=\"foo\"><td><i>foo</i><b id=\"bar\"></b><b /></td><td /><td id=\"baz\"></td></tr><tr id=\"quux\" /><tr></tr>\n";
    
    print $sample;
    $sample =~ s/(?:<(?!td)[^>\/]*>\s*<\/(?!td)[^>\/]*>|<(?!td)[^>\/]*\s*\/>)//g;
    print $sample;
    

    Edit: I know that Perl's regex dialect is different to Javascript's (which is what I'm assuming you're using) but I think I've only used features that are in the JS dialect.
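
    A rough JavaScript version of the same test might look like the sketch below. The pattern is the one from the replace() call above; the helper names stripOnce and stripUntilStable are made up for illustration. The second helper is there because a single pass only removes tags that are already empty, so a wrapper that becomes empty after its contents are stripped needs another pass.

    ```javascript
    // Same pattern as in the replace() call above.
    var emptyNonTd = /(?:<(?!td)[^>\/]*>\s*<\/(?!td)[^>\/]*>|<(?!td)[^>\/]*\s*\/>)/g;

    // One pass: remove empty pairs and self-closing tags, except td.
    function stripOnce(html) {
      return html.replace(emptyNonTd, "");
    }

    // Repeat the pass until the string stops changing, so that tags which
    // only become empty after an inner tag is removed are caught too.
    function stripUntilStable(html) {
      var previous;
      do {
        previous = html;
        html = stripOnce(html);
      } while (html !== previous);
      return html;
    }

    // The sample string from the Perl test.
    var perlSample = '<tr id="foo"><td><i>foo</i><b id="bar"></b><b /></td><td /><td id="baz"></td></tr><tr id="quux" /><tr></tr>\n';

    console.log(stripOnce(perlSample));
    // Nested case: one pass leaves the now-empty <p> behind.
    console.log(stripOnce("<div><p><b></b></p></div>"));
    console.log(stripUntilStable("<div><p><b></b></p></div>"));
    ```

    Two things to keep in mind with this pattern: it does not check that the closing tag has the same name as the opening one, and (?!td) is case-sensitive, so uppercase TD tags would need the i flag added.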


  • Moderators, Science, Health & Environment Moderators Posts: 9,221 Mod ✭✭✭✭mewso


    Thanks Conor. Works a charm. Was about to sit down and try my hand at this myself but I've always hated regular expressions. Thanks again. Thanks for the links guys. Next time I have a regex problem I promise I'll try it myself :)


  • Subscribers Posts: 4,077 ✭✭✭IRLConor


    No problem. Glad to be of help.

    Mastering Regular Expressions by Jeffrey Friedl is your friend if you need to use regular expressions a lot. Well worth the investment.

