
java regex

  • 05-02-2009 11:43PM
    #1
    Registered Users, Registered Users 2 Posts: 163 ✭✭


    I need two Java regular expressions to extract the anchor text and the URL from an HTTP link like this: "<a href="http://www.example.com/chapter2.html">chapter two</a>"

    So what I want to be left with is:
    url: http://www.example.com/chapter2.html
    anchor: chapter two

    I have something like this for the URL: "http://[a-zA-Z_0-9.-&/+=]+"

    It works for simple URLs, but exotic characters mess it up. Also, I'm relying on the " at the end to terminate the match, and I don't think that's the best way.
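
Instead of listing every character a URL may contain, one common approach is to capture everything between the quotes with a negated character class, [^"]*, which accepts any character except a double quote. The closing quote then ends the match without becoming part of the captured URL. A minimal sketch, assuming the href value is double-quoted (single-quoted or unquoted values are not handled); the class and method names are hypothetical. Separately, as posted, the class [a-zA-Z_0-9.-&/+=] reads .-& as a range from '.' to '&', which runs backwards, so java.util.regex should reject it with a PatternSyntaxException; putting the hyphen last in the class avoids that.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HrefExtractor {
    // href, optional whitespace, =, optional whitespace, then a double-quoted value.
    // Group 1 is whatever sits between the quotes: [^"]* allows any character
    // except the closing double quote, so "exotic" URL characters are kept.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    /** Returns the first double-quoted href value in the input, or null if there is none. */
    static String extractHref(String html) {
        Matcher m = HREF.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String link = "<a href=\"http://www.example.com/chapter2.html\">chapter two</a>";
        System.out.println(extractHref(link)); // prints http://www.example.com/chapter2.html
    }
}
```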

    I have ">[a-zA-Z_0-9[\\W]]+<a/>" for the anchor text, but neither expression seems to cover every eventuality. Has anyone got a set that would handle any permutation?
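
Two things stand out in the anchor expression as posted. The closing tag is written <a/>, but a link closes with </a>, so the literal <a/> never occurs in the input. And [a-zA-Z_0-9[\\W]] is a union of the word characters with every non-word character, so it already matches any character and does not restrict anything. A minimal sketch of an alternative, assuming links are not nested and no attribute value contains a > character; the class and method names are hypothetical. It matches the opening tag with whatever attributes it carries, then takes the text up to the first </a> using a reluctant (.*?) group.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnchorTextExtractor {
    // <a\b[^>]*>  the opening tag with any attributes (\b stops it matching <abbr>)
    // (.*?)       the anchor text, reluctant so it stops at the first </a>
    // </a>        the closing tag
    // DOTALL lets the text span line breaks; CASE_INSENSITIVE also accepts <A>...</A>.
    private static final Pattern ANCHOR = Pattern.compile(
            "<a\\b[^>]*>(.*?)</a>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the text of the first <a>...</a> element, or null if none is found. */
    static String extractAnchorText(String html) {
        Matcher m = ANCHOR.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String link = "<a href=\"http://www.example.com/chapter2.html\">chapter two</a>";
        System.out.println(extractAnchorText(link)); // prints chapter two
    }
}
```

Any markup inside the link (for example <b>...</b>) is returned as-is in the captured text.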


Comments

  • Subscribers Posts: 4,077 ✭✭✭ IRLConor


    Untested:
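    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Group 1: double-quoted href, group 2: single-quoted href, group 3: anchor text.
    Pattern p = Pattern.compile("<a\\s+[^>]*?href\\s*=\\s*(?:\"(.*?)\"|'(.*?)')[^>]*>(.*?)</a>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    Matcher m = p.matcher(theStringToSearch);
    // find() locates each link in turn; matches() only succeeds if the whole string is one link.
    while (m.find()) {
        String anchorText = m.group(3);
        String url = m.group(1);
        if (url == null) { // the href value was single-quoted
            url = m.group(2);
        }
        // ... use url and anchorText here ...
    }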
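
A minimal self-contained sketch that wraps the same three-group idea from the reply above in a class and scans a page containing more than one link. The class and method names are hypothetical, and the \\s* around =, the [^>]* in place of .*, and the CASE_INSENSITIVE | DOTALL flags are assumptions added for robustness rather than part of the original reply. Like any regex over HTML, it can still be fooled by a > inside an attribute value or by links inside comments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkScanner {
    // Group 1 = double-quoted href, group 2 = single-quoted href, group 3 = anchor text.
    private static final Pattern LINK = Pattern.compile(
            "<a\\s+[^>]*?href\\s*=\\s*(?:\"(.*?)\"|'(.*?)')[^>]*>(.*?)</a>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns {url, anchorText} pairs for every link found, in document order. */
    static List<String[]> scan(String html) {
        List<String[]> links = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            // Exactly one of groups 1 and 2 participates, depending on the quote style.
            String url = m.group(1) != null ? m.group(1) : m.group(2);
            links.add(new String[] {url, m.group(3)});
        }
        return links;
    }

    public static void main(String[] args) {
        String page = "<p>See <a href=\"http://www.example.com/chapter2.html\">chapter two</a>"
                + " and <A class='x' HREF='http://www.example.com/chapter3.html'>chapter three</A>.</p>";
        for (String[] link : scan(page)) {
            System.out.println("url: " + link[0] + "  anchor: " + link[1]);
        }
    }
}
```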
    


  • Registered Users, Registered Users 2 Posts: 163 ✭✭ stephenlane80


    Thanks, I'll try it this afternoon, but it looks pretty good.

