
Robots sucking bandwidth?

  • 04-02-2004 12:28pm
    #1
    Registered Users, Registered Users 2 Posts: 6,315 ✭✭✭


    I recently added a gallery to my site and have noticed that my bandwidth usage is off the Richter scale.

    Do googlebot and other bots download every file they find?

    If so how do I stop them entering the /gallery/ directory?


Comments

  • Registered Users, Registered Users 2 Posts: 258 ✭✭peterd


    Many bots will obey your robots.txt file (if you have one), Google included. Look up the http://www.robotstxt.org/ website for more info, but basically putting...
    User-agent: *
    Disallow: /gallery/
    

    in a robots.txt file in your web directory will keep them out of there.
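One way to sanity-check rules like these before uploading them is Python's standard urllib.robotparser module. This is only a sketch: the helper name is_allowed and the example paths are made up for illustration, and it shows how a well-behaved bot would read the rules, not what any particular crawler actually does.

```python
from urllib.robotparser import RobotFileParser

# The two rules from the post above, as they would appear in robots.txt.
RULES = """\
User-agent: *
Disallow: /gallery/
"""


def is_allowed(rules, user_agent, path):
    """Return True if a compliant bot with this user agent may fetch path.

    Hypothetical helper for illustration; it parses the rules from a
    string instead of fetching robots.txt over the network.
    """
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Example paths are assumptions, not taken from the original site.
    for path in ("/gallery/photo1.jpg", "/index.html"):
        print(path, is_allowed(RULES, "Googlebot", path))
```

Bear in mind that robots.txt is advisory. It keeps polite crawlers out of /gallery/ but does nothing against software that ignores it.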


  • Moderators, Politics Moderators Posts: 41,240 Mod ✭✭✭✭Seth Brundle


    The bots will download any files that are available to the public. If you are finding that you have all your bandwidth used then you have a problem with your hosting account - what happens if 1 million people want to view your gallery? Are you going to stop them because it is going to ruin your transfer allowance?


  • Registered Users, Registered Users 2 Posts: 6,315 ✭✭✭ballooba


    The gallery section is of interest only to a few people.

    I don't want bots downloading 45 megs of stuff off the site every day.


  • Banned (with Prison Access) Posts: 16,659 ✭✭✭✭dahamsta


    Strange bots if they're downloading your images. Or strange gallery if it contains 45 megs of HTML.

    adam /confused


  • Registered Users, Registered Users 2 Posts: 6,315 ✭✭✭ballooba


    Originally posted by ballooba

    Do googlebot and other bots download every file they find?

    The term 'every file' includes image files.


  • Banned (with Prison Access) Posts: 16,659 ✭✭✭✭dahamsta


    Most spiders download HTML only, usually up to a specified length. If your bandwidth usage is off the charts, either a spider is stuck in a loop (unlikely these days) or something else is happening. If you generate stats for the site, check them out; otherwise install something.

    adam
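If no stats package is installed, a rough first look at who is eating the bandwidth can come straight from the access log. The sketch below assumes the server writes the common "combined" log format (host, timestamp, request, status, bytes, referer, user agent). The function name bytes_by_agent and the sample lines are invented for illustration and are not from the original site.

```python
import re
from collections import Counter

# Matches one line of the "combined" access log format (an assumption
# about how the server is configured).
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def bytes_by_agent(lines):
    """Sum the response sizes per user agent string."""
    totals = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        size = match.group("size")
        # A size of "-" means no body was sent (for example a 304 response).
        totals[match.group("agent")] += 0 if size == "-" else int(size)
    return totals


if __name__ == "__main__":
    # Hypothetical sample lines; in practice, read them from the log file.
    sample = [
        '192.0.2.1 - - [04/Feb/2004:12:00:00 +0000] "GET /gallery/a.jpg HTTP/1.1" 200 500000 "-" "ExampleRipper/1.0"',
        '192.0.2.1 - - [04/Feb/2004:12:00:01 +0000] "GET /gallery/b.jpg HTTP/1.1" 200 700000 "-" "ExampleRipper/1.0"',
        '192.0.2.7 - - [04/Feb/2004:12:00:02 +0000] "GET /index.html HTTP/1.1" 304 - "-" "Googlebot/2.1"',
    ]
    for agent, total in bytes_by_agent(sample).most_common():
        print(total, agent)
```

Sorting by total bytes should show quickly whether the traffic comes from a search engine spider or from something else entirely.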


  • Registered Users, Registered Users 2 Posts: 476 ✭✭Pablo


    The ones I don't allow:
    # Rover is a bad dog <http://www.roverbot.com>
    User-agent: Roverbot
    Disallow: /
    # Another annoying bot
    User-agent: ia_archiver
    Disallow: /
    # No point in having images stored like this
    User-agent: Googlebot-Image
    Disallow: /
    
    Make sure the file is in your web root and is named robots.txt, not robot.txt.
    HTH


  • Banned (with Prison Access) Posts: 16,659 ✭✭✭✭dahamsta


    # No point in having images stored like this
    User-agent: Googlebot-Image
    
    That could be the kiddy right there, I forgot about Google Images. I don't think I've ever seen images from galleries in Google Images, but there's plenty of other sites out there that go looking for 'em. That being said, only the stupidest spider would go in and download every image on a regular basis.

    adam


  • Registered Users, Registered Users 2 Posts: 7,521 ✭✭✭jmcc


    Originally posted by dahamsta
    That being said, only the stupidest spider would go in and download every image on a regular basis.

    Unless it has completely purged its database, any returning robot should get 304 (Not Modified) responses, indicating that the file/page has not changed.

    What you have to watch out for are muppets who hoover the complete site with website rippers such as Xenu's Link Sleuth. These are best blocked using either .htaccess or httpd.conf.

    Regards...jmcc
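The 304 behaviour mentioned above depends on the robot sending a conditional request, typically with an If-Modified-Since header carrying the date of its cached copy. A minimal sketch of the decision a server makes, assuming a simple comparison against the file's last-modified time; the function name conditional_status is hypothetical and real servers also consider other headers such as ETag:

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_status(last_modified, if_modified_since=None):
    """Return 304 if the client's cached copy is still current, else 200.

    last_modified is a timezone-aware datetime for the file;
    if_modified_since is the raw header value sent by the client, if any.
    """
    if if_modified_since is None:
        return 200  # unconditional request: send the full file
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: ignore the header
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates only have one-second resolution.
    if last_modified.replace(microsecond=0) <= since:
        return 304  # headers only, no body, so almost no bandwidth
    return 200


if __name__ == "__main__":
    modified = datetime(2004, 2, 1, 12, 0, 0, tzinfo=timezone.utc)
    header = format_datetime(modified, usegmt=True)
    print(conditional_status(modified, header))  # cached copy is current
    print(conditional_status(modified))          # first visit, no header
```

A 304 response carries no body, which is why a returning spider that sends conditional requests costs very little compared with a ripper that fetches everything unconditionally.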


  • Registered Users, Registered Users 2 Posts: 476 ✭✭Pablo


    .htaccess is a good way to prevent people hotlinking your images. Well worth the little effort to put in place.

