Advertisement
If you have a new account but are having problems posting or verifying your account, please email us on hello@boards.ie for help. Thanks :)
Hello all! Please ensure that you are posting a new thread or question in the appropriate forum. The Feedback forum is overwhelmed with questions that are having to be moved elsewhere. If you need help to verify your account contact hello@boards.ie

Scraping a website using VBA

Options
  • 30-01-2016 7:09pm
    #1
    Registered Users Posts: 1,863 ✭✭✭


    I recorded a basic macro that does a data web import. Usually this brings in all the website data and i can then use other macros to filter out data I don't require. However, when trying to use the above method for this url (http://www.espn.co.uk/rugby/match?gameId=267769&league=267979) next to no data is returned. Why is this and is there anyway to return all the data that you can physically see on the above mentioned URL?


Comments

  • Registered Users Posts: 1,311 ✭✭✭Procasinator


    The actual link you have posted here has a ) at the end, which is breaking it, but I assume this is an issue with vBulletin rather than the URL you are using.

    The actual stat content is loaded with AJAX, rather than in the initial response. Looking at the AJAX requests made by the page, this is probably the URL you could get your data from:
    http://site.api.espn.com/apis/site/v2/sports/rugby/267979/summary?event=267769

    Which is an API that returns a JSON structure.

    Of course, the terms of service and/or limits on using their API is another issue.


  • Registered Users Posts: 1,863 ✭✭✭MuddyDog


    The actual link you have posted here has a ) at the end, which is breaking it, but I assume this is an issue with vBulletin rather than the URL you are using.

    The actual stat content is loaded with AJAX, rather than in the initial response. Looking at the AJAX requests made by the page, this is probably the URL you could get your data from:
    http://site.api.espn.com/apis/site/v2/sports/rugby/267979/summary?event=267769

    Which is an API that returns a JSON structure.

    Of course, the terms of service and/or limits on using their API is another issue.

    Thanks for the quick reply. The URL should not end with a ). So should I be using a data import and using that URL you provided or something else? When I use that URL for a data import it keeps trying to download it instead of importing it!


  • Registered Users Posts: 1,311 ✭✭✭Procasinator


    I haven't done tasks like this using VBA/Excel before, but I assume it's an actual browser environment (IE shell) doing the work. More than likely when IE in this mode sees the Content-type of "application/json;charset=UTF-8" it thinks downloading is the best course of action.

    Look up how to make requests and parse JSON with VBA. Quick glance and appears this is something that can be done. It should be easier than parsing HTML, and less fragile to changes as well as it looks like ESPN versions their APIs.


  • Registered Users Posts: 1,863 ✭✭✭MuddyDog


    I haven't done tasks like this using VBA/Excel before, but I assume it's an actual browser environment (IE shell) doing the work. More than likely when IE in this mode sees the Content-type of "application/json;charset=UTF-8" it thinks downloading is the best course of action.

    Look up how to make requests and parse JSON with VBA. Quick glance and appears this is something that can be done. It should be easier than parsing HTML, and less fragile to changes as well as it looks like ESPN versions their APIs.

    I've looked it up but am getting a lot of conflicting results. Lots of JSON to CSV converters which you'd then import into excel etc. I can't seem to find code that would look at a URL and if it's json code contained within, then spit it out onto a worksheet within excel.


Advertisement