Scraping info from website

  • 05-11-2010 08:52PM
    #1
    Registered Users, Registered Users 2 Posts: 207 ✭✭


    I was hoping to start downloading information from a website on a daily basis in order to analyse the data. The files are published in HTML and in XML format. I then want to store the data on the hard drive of a laptop that has gone past its usefulness and operate it almost like a network.

    My query: is there a “scraper” that would allow me to download this publicly available information from the website on a daily basis and put it in a database for analysis? I am mildly technically minded and obviously use computers on a daily basis, but this is pushing the limits of my abilities, and I apologise if the query is a bit naive.

    Any help is appreciated

    Shakeydude


Comments

  • Registered Users, Registered Users 2 Posts: 2,370 ✭✭✭Knasher


    There is probably an application that can parse websites in some sort of automated way, but unless somebody can come up with a better suggestion, I'd recommend just using regular expressions to parse the raw XML/HTML and pull the data you need into a database yourself. Provided they publish the data in a reasonably standardized way (which their use of XML would suggest they do), it really shouldn't be all that difficult.
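
As a rough illustration of the regular-expression approach suggested above, here is a minimal Python sketch. The sample markup, the tag names (`record`, `date`, `price`) and the function name `extract_records` are all made up for the example, since the thread never says which site or format is involved. It only works while the markup stays this regular; a proper XML parser is usually more forgiving of layout changes.

```python
import re

# Hypothetical sample of the published data; the real site's tags will differ.
SAMPLE = """
<records>
  <record><date>2010-11-04</date><price>12.50</price></record>
  <record><date>2010-11-05</date><price>12.75</price></record>
</records>
"""

# One match per record: capture the text of the date and price elements.
RECORD_RE = re.compile(
    r"<record>\s*<date>(.*?)</date>\s*<price>(.*?)</price>\s*</record>",
    re.DOTALL,
)


def extract_records(text):
    """Return a list of (date, price) tuples found in the raw markup."""
    return [
        (date.strip(), float(price))
        for date, price in RECORD_RE.findall(text)
    ]
```

Calling `extract_records(SAMPLE)` gives one tuple per record, ready to be inserted into whatever database is used.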


  • Registered Users, Registered Users 2 Posts: 1,530 ✭✭✭CptSternn


    Yeah, if the data is already in XML format, what would you need a scraper for?

    Just write a script that will download the XML and put it into a database.
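
A script along those lines might look something like the sketch below, using only the Python standard library. Everything specific in it is an assumption for illustration: the URL, the `record`/`date`/`price` tag names, the `scraped.db` file name and the `prices` table are placeholders, not details from the thread.

```python
import sqlite3
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical feed address; replace with the real XML file's URL.
FEED_URL = "https://example.com/daily.xml"


def fetch_xml(url):
    """Download the feed and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def parse_records(xml_data):
    """Return (date, price) tuples from <record> elements (assumed layout)."""
    root = ET.fromstring(xml_data)
    return [
        (rec.findtext("date"), float(rec.findtext("price")))
        for rec in root.iter("record")
    ]


def store_records(conn, records):
    """Insert records, skipping dates that are already in the table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS prices (date TEXT PRIMARY KEY, price REAL)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO prices (date, price) VALUES (?, ?)", records
    )
    conn.commit()


def main():
    """Download once and store the results in a local SQLite file."""
    conn = sqlite3.connect("scraped.db")
    try:
        store_records(conn, parse_records(fetch_xml(FEED_URL)))
    finally:
        conn.close()
```

Calling `main()` from a job scheduled once a day (cron on Linux, Task Scheduler on Windows) would cover the daily download. SQLite keeps everything in a single file on the old laptop, and the primary key on the date means re-running the script on the same day does not create duplicate rows.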

