
Help scraping data from a website?

  • 12-05-2010 12:55pm
    #1
    Closed Accounts Posts: 3,689 ✭✭✭


    Hey,

    I'd like to scrape the data from a single cell of a table on another website. The other website is happy to let me use their data. I've tried a few tutorials online but haven't managed to get it working yet.

    This method looks perfect but I'm having no joy with it.

    Here's a snippet from the source of the site showing the value I want to extract....

    [HTML] <td bgcolor="#EBEBEB"><i>
    <table>
    <tr>
    <td width="155"><font size="2">
    Guage22
    </font>
    </td>
    <td width="80"><font size="2">
    12 May 12:00
    </font>
    </td>
    <td width="100" align="center"><font size="2">
    -1.202
    </font>
    </td>
    <td width="100" align="center"><font size="2">
    11.400
    </font>
    </td>
    [/HTML]

    I'd like to be able to extract that 11.400 and echo it on the page.

    Would also like to truncate that to 1 decimal place, but first things first I suppose!

    Thanks for any help!


Comments

  • Subscribers Posts: 1,911 ✭✭✭Draco


    I've used simplehtmldom to scrape sites before with great success. There are a few good examples in their documentation.

    Once you have your number you can use sprintf to display 1 decimal place:
    echo sprintf("%.1f", $yourDecimal);


  • Registered Users, Registered Users 2 Posts: 21,263 ✭✭✭✭Eoin


    How far have you gotten with this? Obviously the first thing to do is get the entire contents of the remote page, but then you're going to need to identify what table cell you're getting the contents from.
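One way to identify the cell, sketched with PHP's built-in DOMDocument and DOMXPath rather than a regular expression or simplehtmldom. The markup below is the snippet from the first post, trimmed and with the row and table closed so it parses on its own; `cellValue` is a hypothetical helper name, and the assumption is that the gauge label in the first cell of the row is a stable way to find it.

```php
<?php
// Sample markup based on the first post. In practice $html would hold
// the full page fetched from the remote site.
$html = <<<'HTML'
<table>
<tr>
<td width="155"><font size="2">
Guage22
</font>
</td>
<td width="80"><font size="2">
12 May 12:00
</font>
</td>
<td width="100" align="center"><font size="2">
-1.202
</font>
</td>
<td width="100" align="center"><font size="2">
11.400
</font>
</td>
</tr>
</table>
HTML;

// Hypothetical helper: returns the trimmed text of the Nth <td> (1-based)
// in the row whose first cell equals $label, or null if nothing matches.
// Assumes $label contains no double quotes.
function cellValue(string $html, string $label, int $column): ?string
{
    // Collect parser warnings about old-style markup instead of printing them.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $query = sprintf('//tr[normalize-space(td[1]) = "%s"]/td[%d]', $label, $column);
    $nodes = $xpath->query($query);

    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

$value = cellValue($html, 'Guage22', 4);
// Format to 1 decimal place once the value has been found.
echo $value === null ? 'not found' : sprintf('%.1f', (float) $value);
```

Matching on the label rather than on every tag and attribute means the lookup should survive small changes to widths or font tags, though it will still break if the columns are reordered.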


  • Closed Accounts Posts: 3,689 ✭✭✭joeKel73


    Well according to this tutorial, after a bit of editing, this should work shouldn't it?

    [PHP]<?php

    // get the HTML
    $html = file_get_contents("URL");

    preg_match_all(
    '/<td bgcolor="#EBEBEB"><i>.*?<table>.*?<tr>.*?<td width="155"><font size="2">.*?<\/font>.*?<\/td>.*?<td width="80"><font size="2">.*?<\/font>.*?<\/td>.*?<td width="100" align="center"><font size="2">.*?<\/font>.*?<\/td>.*?<td width="100" align="center"><font size="2">(.*?)<\/font>/s',
    $html,
    $value, // will contain the value
    PREG_SET_ORDER // formats data into an array of posts
    );

    echo $value;

    }


    ?>
    [/PHP]

    If I run it all I get is a blank page. :confused:
    I have the actual URL in there, of course.


  • Registered Users, Registered Users 2 Posts: 21,263 ✭✭✭✭Eoin


    I don't know PHP, but try printing out the whole page first to check that's working, and then you can confirm that it's your regular expression that's causing the problem.


  • Closed Accounts Posts: 3,689 ✭✭✭joeKel73


    I don't know PHP either really!

    I tried
    [PHP]echo $html;[/PHP]

    just under the "$html = file_get_conte......" line, but still a blank page. :confused:


  • Subscribers Posts: 1,911 ✭✭✭Draco


    You've possibly hit a fatal error then, and you need to look in the error log (or turn on display_errors). You'll find those settings in your php.ini.

    Instead of echoing out $html, try 'var_dump($html);'. It could be that the machine you are running the script on can't access the url.
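A rough sketch of that debugging step, assuming you can edit the script but not php.ini. `describeFetch` is a hypothetical helper, and "URL" is the same placeholder used in the earlier post. One caveat: a parse error in the same file (the lone `}` near the end of the earlier script would be one) stops PHP before these lines ever run, so for that case the php.ini setting or the error log is still needed.

```php
<?php
// Report every error and print it in the page (for debugging only).
error_reporting(E_ALL);
ini_set('display_errors', '1');

// Hypothetical helper: summarise what file_get_contents() returned.
function describeFetch($result): string
{
    if ($result === false) {
        return 'fetch failed: check the URL and the allow_url_fopen setting';
    }
    return 'fetched ' . strlen($result) . ' bytes';
}

// "URL" is a placeholder. As written this fails with a warning,
// which is now shown instead of a blank page.
$html = file_get_contents('URL');

// var_dump() distinguishes false from an empty string; echo prints
// nothing for either of them.
var_dump($html);
echo describeFetch($html);
```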


  • Registered Users, Registered Users 2 Posts: 6,570 ✭✭✭daymobrew


    I often run my scripts through a syntax checker to rule out a typo in my code.

    Since the other web site is okay with you getting the data, could you ask them to make the data available in another form e.g. a web page that only has the cell you need?

    It could be that fetching a URL with file_get_contents is disabled for security reasons (the allow_url_fopen setting in php.ini).
    Often you can use the curl functions to achieve the same thing. They're more powerful, and more complicated as a result.
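A minimal sketch of the curl route, assuming the curl extension is installed on the host. `fetchPage` is a hypothetical helper name and the timeout value is an arbitrary choice, not something the remote site requires.

```php
<?php
// Hypothetical helper: fetch a page with curl. Returns the body on
// success or null on failure, writing the reason to the error log.
function fetchPage(string $url, int $timeoutSeconds = 10): ?string
{
    if (!function_exists('curl_init')) {
        error_log('curl extension is not loaded');
        return null;
    }

    $ch = curl_init($url);
    // Return the response as a string instead of printing it.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Follow redirects, and give up if the server is too slow.
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);

    $body = curl_exec($ch);
    if ($body === false) {
        error_log('curl error: ' . curl_error($ch));
        return null;
    }
    return $body;
}

// Shows whether file_get_contents() is allowed to open URLs at all.
var_dump((bool) ini_get('allow_url_fopen'));
```

Usage would be along the lines of `$html = fetchPage("URL");` followed by a check for null before trying to pull the value out of the page.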

