Help needed with preg_match regex

  • 22-07-2011 12:36PM
    #1
    Registered Users, Registered Users 2 Posts: 1,127 ✭✭✭


    Hi guys

    I have this line in a text file
    X Y ZZZ 444444 <a name="6">developer name</a>
    

    And I need to extract certain values from it. So far I have this
    $developer_details = preg_split("/[\s,]+/", $line);
    

    This gives me an array whose first four elements are populated correctly (every value separated by a space or comma becomes its own array item). However, I need the fifth element as a whole string, and there might be more elements between the fourth and the fifth. So the pseudocode would be:

    Extract all elements separated by a space or comma as separate array items, except when you encounter an HTML tag, in which case take the whole string from the opening tag to its matching closing tag as a single item.


    The one consolation is that I know the HTML is well formed.

    Any ideas?


Comments

  • Registered Users, Registered Users 2 Posts: 89 ✭✭tehjimmeh


    Would this not work? (note: I don't know PHP very well at all)
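    $developer_details = preg_split("/[\s,]+/", $line);
    // Glue every piece after the fifth back onto element 4, then drop the leftovers
    for ($i = 5; $i < count($developer_details); $i++) {
        $developer_details[4] .= " " . $developer_details[$i];
    }
    array_splice($developer_details, 5);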
    

    EDIT: Actually I think it'll only work if you can guarantee there'll be no commas in the 5th item.


  • Registered Users, Registered Users 2 Posts: 1,393 ✭✭✭Inspector Gadget


    If the format of the file you're reading is fixed, and looks like what you've got there, then maybe preg_split() is the wrong function. You could write a regex that matches the whole line at once (i.e. matches each desired item, assuming there'll always be five items), or perhaps take every match whose index is greater than 3 (0..3 should be your first four terms) and implode() them together?

    There are a lot of ways of skinning this particular cat, but it's possible that you haven't provided enough examples of what you're parsing?
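
    As a rough sketch of the two options mentioned above, assuming the line always has exactly four leading fields and that the sample ends in a proper `</a>` closing tag: option 1 matches the whole line with one regex, option 2 splits everything and implode()s the pieces from index 4 onwards. The variable names `$whole_line` and `$imploded` are just for illustration.

    ```php
    <?php
    // Sample line from the first post (closing tag assumed to be </a>).
    $line = 'X Y ZZZ 444444 <a name="6">developer name</a>';

    // Option 1: match the whole line at once.
    // Four fields free of whitespace/commas, then the rest of the line as field five.
    $pattern = '/^([^\s,]+)[\s,]+([^\s,]+)[\s,]+([^\s,]+)[\s,]+([^\s,]+)[\s,]+(.+)$/';
    $whole_line = [];
    if (preg_match($pattern, $line, $m)) {
        $whole_line = array_slice($m, 1); // drop the full match stored at index 0
    }

    // Option 2: split on every separator, keep items 0..3,
    // and glue everything from index 4 onwards back together with spaces.
    $parts = preg_split('/[\s,]+/', $line);
    $imploded = array_slice($parts, 0, 4);
    $imploded[] = implode(' ', array_slice($parts, 4));

    var_export($whole_line);
    var_export($imploded);
    ```

    Both arrays should come out with the same five items for this sample. Note that option 2 loses the original separators inside the fifth item (a comma there would come back as a space), whereas option 1 keeps that part of the line untouched.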


  • Subscribers Posts: 9,716 ✭✭✭CuLT


    If you're trying to parse free-format HTML, regex alone won't do the trick; you'll need an HTML parser.

    If you know it's always going to be some text followed by a single "a" element containing everything, then it's straightforward and can be broken into two simple expressions:
    [php]
    <?php
    $string = 'X Y ZZZ 444444 <a name="6">developer name</a>';

    /* Breaks the string into two components: everything before the a tag, and the a tag onwards */
    preg_match('/^([^<]+)(<a\b.*)$/', $string, $matches);

    /* Splits the first component on runs of spaces or commas, skipping empty pieces */
    $split = preg_split('/[\s,]+/', $matches[1], -1, PREG_SPLIT_NO_EMPTY);

    var_export($matches);
    var_export($split);
    ?>
    [/php]
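
    For the more general case in the original question (any number of plain fields, with an element somewhere among them), one possible sketch is to collect tokens with preg_match_all() instead of splitting. It assumes the elements are simple and not nested inside an element of the same name, and the extra "extra,field" values in the sample line are made up for illustration. It is not a substitute for a real HTML parser.

    ```php
    <?php
    // Hypothetical sample: the original line with two extra fields added before the tag.
    $line = 'X Y ZZZ 444444 extra,field <a name="6">developer name</a>';

    // A token is either:
    //   - a complete element: <tag ...> ... </tag>, where \1 refers back to the tag name, or
    //   - a run of characters that are not whitespace, a comma or '<'.
    $pattern = '~<(\w+)\b[^>]*>.*?</\1>|[^\s,<]+~s';

    preg_match_all($pattern, $line, $matches);

    // $matches[0] holds the full text of every token, in order of appearance.
    $developer_details = $matches[0];

    var_export($developer_details);
    ```

    With this sample, the six plain fields come back as separate items and the whole `<a ...>developer name</a>` element comes back as one item.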

