Scraping tweets from twitter.

  • 21-09-2014 03:54PM
    #1
    Registered Users, Registered Users 2 Posts: 225
    ✭✭


    I am looking to scrape and collect tweets from twitter given a certain search term.

    I have looked around the web and found various modules for Python etc. which claim to do this. However, most of them are outdated and frustratingly difficult to set up (at least for me anyway!) so I have had no luck so far.

    Has anyone any experience or input on this topic?

    I tried a python module named tweepy this morning but after downloading it from the github repository and running into a seemingly never-ending number of errors when trying to set it up from the cmd, I gave up about 20 minutes ago.

    I'm hoping there is an easier way of doing this, appreciate any help anyone can give me!



Comments

  • Moderators, Society & Culture Moderators Posts: 17,642 Graham
    Mod ✭✭✭✭


    Take a look at the twitter api, you can access tweets without adding in the complications of scraping.


  • Registered Users, Registered Users 2 Posts: 401 irishbuzz
    ✭✭


    If you think your usage would be within rate limits you could simply use Twitter's API:

    https://dev.twitter.com/rest/reference/get/search/tweets

    Alternatively have a look at Scrapy or ScraperJS
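
For anyone following along, here is a rough sketch of what a request to the search endpoint linked above looks like. It only builds the URL: real requests also have to be OAuth-signed, which a client library normally does for you. The helper name `build_search_url` is made up for illustration. The `count` parameter defaults to 15 and is capped at 100 by the API.

```python
from urllib.parse import urlencode

# v1.1 search endpoint, as documented at the link above.
SEARCH_URL = "https://api.twitter.com/1.1/search/tweets.json"


def build_search_url(term, count=15):
    """Return the search URL for `term` (hypothetical helper).

    `count` is clamped to 100, the maximum the endpoint accepts.
    """
    params = {"q": term, "count": min(count, 100)}
    return SEARCH_URL + "?" + urlencode(params)


# The '#' in a hashtag is percent-encoded as %23.
print(build_search_url("#dublin", count=50))
```

This does not send anything; it only shows which parameters control the search term and the number of results per page.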


  • Registered Users, Registered Users 2 Posts: 124 shanefitz360
    ✭✭


    Use the Twitter API with the pypi.python.org/pypi/twitter package


  • Registered Users, Registered Users 2 Posts: 225 TheSetMiner
    ✭✭


    thanks for the replies, I'll have a look into this today and hopefully have some luck!


  • Registered Users, Registered Users 2 Posts: 225 TheSetMiner
    ✭✭


    Use the Twitter API with the pypi.python.org/pypi/twitter package

    I visited this link and downloaded the file but I am a little unsure what the next step is. Sorry, I am a bit new to directories and installing modules etc.

    There was a python wheel file and a .tar.gz file there. I downloaded both. The former could not be opened and the latter opened just like a regular zip file.

    My question is: do I need to put this entire file (zipped or unzipped?) into my python34 "lib" folder, or is there another destination? And will it be installed then, or do I need to run setup.py from the cmd like I read somewhere else?

    Sorry for the questions but the documentation wasn't that clear on setup procedure.

    I'd greatly appreciate any help!


  • Registered Users, Registered Users 2 Posts: 159 magooly
    ✭✭


    twitter4j


  • Technology & Internet Moderators Posts: 28,858 oscarBravo
    Mod ✭✭✭✭


    You probably shouldn't be downloading files from PyPI directly, but using something like pip:
    pip install twitter
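
pip downloads the package and puts it in the right place itself, so nothing needs to be copied into the "lib" folder by hand. As a small sketch, one way to confirm the install worked is to ask Python whether it can find the module. The helper name `is_installed` is just for illustration.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can find `module_name` on its import path."""
    return importlib.util.find_spec(module_name) is not None


print(is_installed("json"))     # standard library module, so it is always found
print(is_installed("twitter"))  # True only once `pip install twitter` has succeeded
```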
    


  • Registered Users, Registered Users 2 Posts: 7,579 jmcc
    ✭✭✭


    Get the book "Mining The Social Web" by Matthew A. Russell. It is an O'Reilly book and is quite good on the subject. It does require some understanding and expertise with the Python language though.

    Regards...jmcc


  • Registered Users, Registered Users 2 Posts: 225 TheSetMiner
    ✭✭


    Thanks for all the helpful comments.


    I managed to install twitter using pip, which came preinstalled with Python 3.4. I had some success using the search API to get 15 random tweets (very messily encoded in JSON, I think) for a given search query, but no such luck with the streaming API: I keep getting errors from the stream.py and api.py files when I try it.
    Deciphering the tweets from the messy string of JSON is going to be a task for regex I think.

    And has anyone any idea on what might be going wrong with the TwitterStream class?


  • Technology & Internet Moderators Posts: 28,858 oscarBravo
    Mod ✭✭✭✭


    Deciphering the tweets from the messy string of JSON is going to be a task for regex I think.
    import json
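
To spell that out a little: the json module turns the raw text into ordinary dicts and lists, so no regex is needed. The sketch below uses a made-up sample shaped like a search response (a "statuses" list of tweet objects with "text" and "user" fields); the sample values and the helper name `extract_texts` are invented for illustration. Some client libraries already hand back parsed objects, in which case the `json.loads` step can be skipped.

```python
import json

# Made-up sample data, shaped like a search API response.
raw = """
{"statuses": [
  {"text": "Hello from Dublin",
   "created_at": "Sun Sep 21 14:54:00 +0000 2014",
   "user": {"screen_name": "example_user"}},
  {"text": "Scraping is hard",
   "created_at": "Sun Sep 21 15:01:30 +0000 2014",
   "user": {"screen_name": "another_user"}}
]}
"""


def extract_texts(raw_json):
    """Parse the JSON text and return (screen_name, text) pairs."""
    data = json.loads(raw_json)
    return [(s["user"]["screen_name"], s["text"]) for s in data["statuses"]]


for name, text in extract_texts(raw):
    print(name, "said:", text)
```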
    


  • Registered Users, Registered Users 2 Posts: 225 TheSetMiner
    ✭✭


    update: I saw an answer on stack overflow recommending Twython so I said I'd give it a quick go and thanks to their great, up-to-date documentation I managed to get the streaming api working in less than 15 minutes! Amazingly satisfying to watch the data flowing in on the python shell at last! Now my next step will be to figure out how to store the tweets along with their respective time of post and country of origin.

    Does anyone know what the streaming limit is exactly? I've had the shell going non-stop for about 10 minutes now, and I wonder if I am getting close to the limit.


  • Registered Users, Registered Users 2 Posts: 159 magooly
    ✭✭


    http://www.eirwig.com uses the Twitter streaming API (Java + Spring MVC) and it runs 24/7.

    There is no streaming limit per se since as a developer connected to the streaming API you are only seeing approx 1% of random tweets globally.

    Re: tweet times and country the info is all there in the tweet object returned, you simply need to call the correct getters on the tweet.

    You will find all the info you need in the Twitter streaming API docs. Sign up at dev.twitter.com and create your own app to get the connection details.

    It's a great API and a very rewarding experience, as you already know.
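
On storing the time of post and country: in Python the tweet arrives as a dict rather than an object with getters, so the same information is read by key. Below is a minimal sketch that writes tweets to CSV, assuming the v1.1 tweet fields "created_at", "text" and "place" (which is null for the many tweets with no location attached). The sample tweets and the helper names `tweet_to_row` and `write_rows` are made up for illustration.

```python
import csv
import io
from datetime import datetime

# Format of the "created_at" field, e.g. "Sun Sep 21 14:54:00 +0000 2014".
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def tweet_to_row(tweet):
    """Return [ISO timestamp, country, text] for one tweet dict."""
    posted = datetime.strptime(tweet["created_at"], TWITTER_TIME_FORMAT)
    place = tweet.get("place") or {}  # "place" may be missing or null
    return [posted.isoformat(), place.get("country", ""), tweet["text"]]


def write_rows(tweets, out):
    """Write a header line plus one CSV row per tweet to the file-like `out`."""
    writer = csv.writer(out)
    writer.writerow(["posted_at", "country", "text"])
    for tweet in tweets:
        writer.writerow(tweet_to_row(tweet))


# Made-up sample tweets; in practice these would come from the stream.
sample = [
    {"created_at": "Sun Sep 21 14:54:00 +0000 2014", "text": "Hello",
     "place": {"country": "Ireland"}},
    {"created_at": "Sun Sep 21 15:01:30 +0000 2014", "text": "No location",
     "place": None},
]

buffer = io.StringIO()
write_rows(sample, buffer)
print(buffer.getvalue())
```

Swapping the `io.StringIO()` buffer for a file opened with `open("tweets.csv", "w", newline="")` would save the rows to disk instead.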

