
PHP to parse HTML

  • 15-03-2011 1:19pm
    #1
    Closed Accounts Posts: 27,857 ✭✭✭✭


    Hey there,

    There's a series of webpages from which I want to retrieve certain data. I've pretty much figured out how to do it, except I'm getting parse errors because of all the apostrophes (') and quotation marks (") in the HTML.

    This is the script I'm using:
    $str = '
    
    <!-- HTML goes here -->
    
    ';
    $DOM = new DOMDocument;
    // Parser options only take effect if they are set before loading.
    $DOM->preserveWhiteSpace = false;
    $DOM->validateOnParse = true;
    @$DOM->loadHTML($str); // @ hides warnings about malformed markup
    
    $tables = $DOM->getElementById('report');
    $rows = $tables->getElementsByTagName('td');
    foreach ($rows as $row) {
        $att = $row->getAttribute("headers");
        if ($att == "header5") {
            $v = $row->nodeValue;
            echo $v.'<br />';
        }
    }
    


    I was just using the above to test whether I can parse out the correct data, and I can, but it requires me to go through the HTML and delete or escape all the characters that are confusing PHP.

    Obviously what I want to do is just pass the URL into the script, have it go through the elements and give me the relevant information.

    How can I avoid the parse error I'm receiving?

    And also, what method(s) can I use to pass in a URL or file rather than a string?

    Cheeeeeeeeers


Comments

  • Closed Accounts Posts: 27,857 ✭✭✭✭Dave!


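    Well that was pretty easy :p Fetching the page with file_get_contents() means the HTML never sits inside a PHP string literal, so the quotes stop being a problem.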
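    <?php
    //$str = 'Dave';
    
    $str = file_get_contents('http://www.nwhc.usgs.gov/publications/quarterly_reports/1995_qtr_1.jsp');
    if ($str === false) {
        exit('Could not fetch the page');
    }
    
    $DOM = new DOMDocument;
    // Parser options only take effect if they are set before loading.
    $DOM->preserveWhiteSpace = false;
    $DOM->validateOnParse = true;
    @$DOM->loadHTML($str); // @ hides warnings about malformed markup
    
    $tables = $DOM->getElementById('report');
    if ($tables === null) {
        exit('No element with ID "report" found');
    }
    $rows = $tables->getElementsByTagName('td');
    foreach ($rows as $row) {
        $att = $row->getAttribute("headers");
        if ($att == "header5") {
            $v = $row->nodeValue;
            echo $v.'<br />';
        }
    }
    
    ?>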
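
For anyone landing here with the same two questions, here is a rough sketch of the other options. If the HTML really does have to live in the script, a nowdoc string takes it literally, so the apostrophes and quotation marks need no escaping. And DOMDocument can read a file itself via loadHTMLFile(), which should also accept a URL when allow_url_fopen is enabled. The sample markup, the temp file and the helper name cellsForHeader() are made up for illustration; only the 'report' and 'header5' IDs come from the script above.

```php
<?php
// Sketch only: the sample markup and the helper name are illustrative assumptions.

/**
 * Return the text of every <td> inside the element with ID $tableId
 * whose "headers" attribute equals $headerId.
 */
function cellsForHeader(DOMDocument $dom, string $tableId, string $headerId): array
{
    $table = $dom->getElementById($tableId);
    if ($table === null) {
        return []; // no element with that ID in this document
    }

    $values = [];
    foreach ($table->getElementsByTagName('td') as $cell) {
        if ($cell->getAttribute('headers') === $headerId) {
            $values[] = trim($cell->textContent);
        }
    }
    return $values;
}

// A nowdoc (<<<'HTML') is taken literally: quotes and apostrophes inside it
// need no escaping, and $variables are not interpolated.
$html = <<<'HTML'
<html><body>
<table id="report">
  <tr><td headers="header1">Dave's "first" row</td><td headers="header5">12</td></tr>
  <tr><td headers="header1">Second row</td><td headers="header5">34</td></tr>
</table>
</body></html>
HTML;

// Collect libxml warnings about sloppy markup instead of hiding them with @.
libxml_use_internal_errors(true);

// Option 1: parse the string directly.
$fromString = new DOMDocument();
$fromString->loadHTML($html);
$stringCells = cellsForHeader($fromString, 'report', 'header5');

// Option 2: let DOMDocument read the file itself. A temp file stands in
// for the real page here so the example runs without network access.
$path = tempnam(sys_get_temp_dir(), 'report');
file_put_contents($path, $html);
$fromFile = new DOMDocument();
$fromFile->loadHTMLFile($path);
$fileCells = cellsForHeader($fromFile, 'report', 'header5');
unlink($path);

libxml_clear_errors();

echo implode("\n", $stringCells), "\n";
```

Both routes give the same cells. Which one is nicer probably depends on whether you want the raw HTML string for anything else; if not, loadHTMLFile() saves a step.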
    
    

