
Data Scraping from Website

  • 18-02-2014 1:44pm
    #1
    Registered Users, Registered Users 2 Posts: 72 ✭✭


    Hi, can anyone recommend good, user-friendly software for scraping data from a website that I am developing an app for? Rather than using the Android WebView, which would just render a web view of the site, I want to scrape only certain dynamic items.

    Thanks


Comments

  • Registered Users, Registered Users 2 Posts: 18,272 ✭✭✭✭Atomic Pineapple


    Generally the best approach is to do it yourself server-side using something like PHP. If it's small and the content is under your control, you can parse on the device in Android using JSoup, but I wouldn't recommend it. If the content is under your control, it would be better to create a web service that feeds the mobile application some JSON.
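
    To illustrate the server-side route, here is a minimal sketch in Python rather than PHP, using only the standard library. The class name `ItemParser`, the function `scrape_to_json` and the CSS class `item` are made up for the example; the real site's markup will differ, and the parser assumes reasonably well-formed HTML with no unclosed tags such as a bare `<br>` inside the elements it collects.

```python
import json
from html.parser import HTMLParser


class ItemParser(HTMLParser):
    """Collects the text of every element whose class list contains target_class.

    Hypothetical helper for illustration only.
    """

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.items = []
        self._depth = 0    # > 0 while we are inside a matching element
        self._buffer = []  # text fragments of the current matching element

    def handle_starttag(self, tag, attrs):
        if self._depth:
            # Nested tag inside a matching element: track depth so we
            # know which end tag closes the element we care about.
            self._depth += 1
        elif self.target_class in (dict(attrs).get("class") or "").split():
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.items.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def scrape_to_json(html, target_class="item"):
    """Parse the HTML on the server and return the JSON the app would download."""
    parser = ItemParser(target_class)
    parser.feed(html)
    parser.close()
    return json.dumps({"items": parser.items})
```

    The app then only has to fetch and decode a small JSON document, and any change in the site's markup is fixed in one place on the server.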


  • Registered Users, Registered Users 2 Posts: 72 ✭✭shanard


    Thanks, but I'm new to Android app development and have not used any of the above, hence why I was wondering whether there are any good, free, user-friendly scraping tools out there. What would you advise for a novice?

    Thanks :-)


  • Closed Accounts Posts: 19,777 ✭✭✭✭The Corinthian


    If you're certain you want to go down this road, rather than write your own scraping system, then you might want to look for web-based toolkits, scrape from a server, and then allow your users to download the resulting parsed data from it.

    Otherwise, you'll have difficulty finding such a toolkit for Android and, TBH, I've generally found that doing it yourself works out as less hassle in the long term than becoming dependent on a third-party toolkit that may not quite fit the bill.


  • Moderators, Society & Culture Moderators Posts: 17,643 Mod ✭✭✭✭Graham


    scrape from a server

    +1

    Definitely scrape from your own web server rather than trying to do it on the device.

    Scraping on your server also allows you to cache results if necessary, which is much politer than putting additional load on your target's server(s).

    I'd also recommend you consider how the site you're scraping is going to react. If it's a government data source, you're probably covered by the 're-use of public information' rules. If you're thinking of scraping live ticket prices from somewhere like Ryanair, you probably won't get a great reaction. Just because the data is there and you can see it, that doesn't mean you can just help yourself to it for your app.
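
    As a rough sketch of the caching idea (Python, standard library only): keep each fetched page for a fixed time and only go back to the target site once it has expired. `TTLCache`, `fetch` and the 300-second default are assumptions for the example, not anything the target site dictates.

```python
import time


class TTLCache:
    """Caches fetched pages for ttl_seconds so repeat requests skip the target site.

    `fetch` is any callable taking a URL and returning the page content
    (hypothetical; supply your own download function).
    """

    def __init__(self, fetch, ttl_seconds=300, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # url -> (expires_at, value)

    def get(self, url):
        now = self._clock()
        entry = self._store.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: no request to the target site
        value = self._fetch(url)
        self._store[url] = (now + self._ttl, value)
        return value
```

    However many app users ask for the same page within the TTL, the target site sees one request.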


  • Closed Accounts Posts: 19,777 ✭✭✭✭The Corinthian


    Graham wrote: »
    If you're thinking of scraping live ticket prices from somewhere like Ryanair, you probably won't get a great reaction. Just because the data is there and you can see it, that doesn't mean you can just help yourself to it for your app.
    They'll block your IP and/or change how their data is encoded so as to break your parsing algorithm. Failing all else they may bring you to court.

    If they break your parsing algorithm, then doing the parsing in an app will present update problems. Parsing on the server means you can adjust your code rapidly without requiring users to update their app.

    But if you're parsing on a server, then you're an easy target for an IP block, as you'll be sucking down all their data from that one server.

    So here's an alternative, hybrid strategy. Have the user apps suck down the raw data and relay it to your server, where you parse, cache and send it back to them nice and processed.

    Naturally though, this would not protect you from lawyers.


  • Registered Users, Registered Users 2 Posts: 124 ✭✭shanefitz360


    import.io


  • Moderators, Society & Culture Moderators Posts: 17,643 Mod ✭✭✭✭Graham


    import.io

    If you can get over the app crashing every few seconds, and you're happy to build an app on top of something with no pricing listed yet, go for it.

    There's also kimonolabs.com, which is still in beta but does at least give you an idea of what they'll be charging.


  • Registered Users, Registered Users 2 Posts: 72 ✭✭shanard


    Hi folks,
    I found out how to scrape the website, but we are currently waiting on permission to access their data. Whether that will be via a cloud service or a database, I'm not sure at the moment. If it is through a database, is there any particular tool or software we would use to access it, or is it complicated?

    Thanks in advance


  • Moderators, Society & Culture Moderators Posts: 17,643 Mod ✭✭✭✭Graham


    If you're accessing their data directly from their database then there won't be any scraping.

    If the data is intended for an app, there would usually be an API which your app accesses to get the data. If they're giving you access to the database itself, then you or they will need to build an API if they don't already have one.
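
    A minimal sketch of what that API layer does, in Python with the built-in `sqlite3` module standing in for whatever database they actually run. The `items` table, its columns and the function name `items_as_json` are invented for the example.

```python
import json
import sqlite3


def items_as_json(conn, limit=50):
    """Return up to `limit` rows of the hypothetical `items` table as a JSON string.

    This is the kind of function a web service endpoint would call, so the
    app only ever sees JSON and never talks to the database directly.
    """
    cur = conn.cursor()
    cur.row_factory = sqlite3.Row  # rows become name-addressable
    rows = cur.execute(
        "SELECT id, name, price FROM items ORDER BY id LIMIT ?", (limit,)
    ).fetchall()
    return json.dumps({"items": [dict(row) for row in rows]})
```

    In practice this would sit behind an HTTP endpoint with authentication; the point is just that the database stays on the server side of the API.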


  • Closed Accounts Posts: 19,777 ✭✭✭✭The Corinthian


    shanard wrote: »
    I found out how to scrape the website, but we are currently waiting on permission to access their data. Whether that will be via a cloud service or a database, I'm not sure at the moment. If it is through a database, is there any particular tool or software we would use to access it, or is it complicated?
    Microsoft Word. Or Open Office. Or Word Perfect. Or, most likely, Adobe Acrobat.

    If they're opening up the data for you to access remotely then what they will be doing is exposing an API, either so you can query the database directly, or via some Web service or other TCP/IP based protocol.

    Whatever the means, they'll be giving you an API document describing it, written in some format such as Microsoft Word, Open Office, or PDF.

