
Data Mining/Building HTTP Requests

  • 25-03-2010 6:09pm
    #1
    Registered Users, Registered Users 2 Posts: 10,148 ✭✭✭✭


    Hey, I'm looking for some help with a side-project I'm working on.

    I'm trying to scrape a number of IDs from a website and then use those IDs to build URLs that return the data for each one. The problem is that the site doesn't properly process the requests I make directly; I think browsing through the website sets some sort of cookie or session that my requests are missing. I don't care how I get the information, I just want a solution that works!
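    The first half of the plan (pull IDs out of the search results, then turn each ID into a detail-page URL) can be sketched with Python's standard library. This is a minimal sketch under loud assumptions: the `code=NNNN` link pattern, the `DETAIL_BASE` placeholder URL and the `method`/`code` parameter names are all hypothetical and need to be replaced with whatever the real results page and detail page actually use.

```python
import re
from urllib.parse import urlencode

# Hypothetical: assumes each search result links to its detail page with a
# "code=NNNN" query parameter. Inspect the real results HTML and adjust.
ID_PATTERN = re.compile(r"code=(\d{4})")

# Hypothetical placeholder; substitute the real detail-page URL.
DETAIL_BASE = "http://www.example.com/stock-detail"


def extract_ids(results_html: str) -> list[str]:
    """Return each ID once, in the order it first appears in the page."""
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return list(dict.fromkeys(ID_PATTERN.findall(results_html)))


def build_detail_url(stock_id: str) -> str:
    """Build a detail-page URL; the parameter names are hypothetical."""
    params = {"method": "detail", "code": stock_id}
    return f"{DETAIL_BASE}?{urlencode(params)}"


if __name__ == "__main__":
    sample = ('<a href="detail.do?code=7203">A</a>'
              '<a href="detail.do?code=6758">B</a>'
              '<a href="chart.do?code=7203">A chart</a>')
    for stock_id in extract_ids(sample):
        print(build_detail_url(stock_id))
```

    Using `urlencode` rather than string concatenation means any special characters in an ID get escaped correctly.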

    The website I'm trying to scrape from is at http://www.tse.or.jp/tseHpFront/HPLCDS0101E.do?method=init&callJorEFlg=1

    Do a search and you'll get a list of results. Click "Display Stock Price" and you get a page of information for that stock. Basically, I want to get all the data off that page for every stock.
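    Getting "all the data off the page" usually means reading the HTML tables on it. A minimal sketch using the standard-library `html.parser`, assuming (not verified) that the stock page lays its figures out in `<tr>`/`<td>` rows and uses well-formed markup with closing `</td>` tags; nested tables are not handled.

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows: list[list[str]] = []
        self._row = None   # cells of the row currently open, or None
        self._cell = None  # text fragments of the cell currently open, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row, self._cell = [], None
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace/newlines inside the cell text.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row, self._cell = None, None


def parse_rows(html: str) -> list[list[str]]:
    parser = TableCellParser()
    parser.feed(html)
    parser.close()
    return parser.rows


if __name__ == "__main__":
    page = ("<table><tr><th>Item</th><th>Value</th></tr>"
            "<tr><td>Last Price</td><td> 1,234 </td></tr></table>")
    for row in parse_rows(page):
        print(row)
```

    Each row comes back as a list of strings, which is easy to write out with the `csv` module once the right table has been identified.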

    Any thoughts, ideas or suggestions on how I do this?


Comments

  • Registered Users, Registered Users 2 Posts: 885 ✭✭✭clearz


    Yes, it seems the site is checking whether the request comes from the same place as the search. It could be a cookie, a server-side session, or the HTTP Referer header. You can always check this by copying the URL and pasting it into another browser, or into a private browsing window in your current browser: if the page doesn't load properly there, the request depends on state from your original browsing session.
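    The possibilities above can be covered together with Python's standard library: a cookie jar (which also carries a session ID if the session is cookie-based) plus a Referer header pointing at the search page. This is a sketch, not a known fix for this site. Visiting the search page first to pick up cookies is an assumption (if the site needs the search form submitted, that request would have to be replayed as well), and the `User-Agent` value is an assumption too. To find out which mechanism the site actually checks, drop the Referer or the cookie processor and see which request stops working.

```python
import http.cookiejar
import urllib.request

# The search page URL from the original post, reused as the Referer value.
SEARCH_URL = "http://www.tse.or.jp/tseHpFront/HPLCDS0101E.do?method=init&callJorEFlg=1"


def make_session_opener() -> tuple[urllib.request.OpenerDirector, http.cookiejar.CookieJar]:
    """Opener that stores cookies from every response and sends them back."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def make_detail_request(url: str, referer: str = SEARCH_URL) -> urllib.request.Request:
    """GET request that says it came from the search page."""
    return urllib.request.Request(url, headers={
        "Referer": referer,
        # Assumption: some sites reject urllib's default User-Agent.
        "User-Agent": "Mozilla/5.0",
    })


# Usage (needs network access, so not run here):
# opener, jar = make_session_opener()
# opener.open(SEARCH_URL).read()                          # 1. collect cookies
# page = opener.open(make_detail_request(detail_url)).read()  # 2. reuse them
```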

    When I'm developing a scraper for any site and run into problems, I use Wireshark to see exactly what is being sent in the requests and responses.
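    For the script side of that comparison, a lighter option than Wireshark is urllib's built-in debug output. With `debuglevel=1`, `http.client` prints the raw request it sends and the status line and headers it receives, so you can compare them with what the browser sends (the browser side still needs Wireshark or the browser's own developer tools). A minimal sketch; the function name is just for illustration:

```python
import urllib.request


def make_debug_opener() -> urllib.request.OpenerDirector:
    """Opener whose HTTP/HTTPS handlers print the raw exchange to stdout."""
    # Passing our own handler instances replaces build_opener's defaults.
    return urllib.request.build_opener(
        urllib.request.HTTPHandler(debuglevel=1),
        urllib.request.HTTPSHandler(debuglevel=1),
    )


# Usage (needs network access, so not run here):
# make_debug_opener().open("http://www.example.com/").read()
```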

