Wordpress and duplicate content

  • 31-08-2009 12:55pm
    #1
    Registered Users Posts: 6,464 ✭✭✭


    Just noticed that Google has indexed 130+ pages of my two-post blog.
    I presume I should be using robots.txt or .htaccess entries to control this.
    Any tips on how to do this properly?

    Judging by site:xxxx.com, it looks like I should be excluding /wp-content, and probably /tag.

    Should I just disallow these in robots.txt? Is there anything else I'm missing? I don't want to accidentally exclude content.
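    One thing worth knowing before adding Disallow lines: robots.txt rules are prefix matches, so `Disallow: /tag` also blocks any URL whose path merely starts with "/tag". Below is a minimal sketch using Python's standard `urllib.robotparser` to compare the two forms. www.site.com and the `/tagline-tips` page are hypothetical examples. Python's parser follows the original first-match rules; for simple Disallow-only files like these it should give the same answers as Googlebot.

```python
from urllib.robotparser import RobotFileParser

SITE = "http://www.site.com"  # hypothetical domain used throughout the thread


def can_crawl(robots_lines, path, agent="Googlebot"):
    """Return True if `agent` may fetch SITE + path under the given robots.txt lines."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, SITE + path)


# No trailing slash: plain prefix match, so "/tagline-tips" is caught as well.
loose = ["User-agent: *", "Disallow: /tag"]
# Trailing slash: only URLs inside the /tag/ directory are blocked.
strict = ["User-agent: *", "Disallow: /tag/"]

for path in ["/tag/tag1", "/tagline-tips", "/2009/post1"]:
    print(f"{path:15} loose={can_crawl(loose, path)} strict={can_crawl(strict, path)}")
```

    Both versions block /tag/tag1 and leave /2009/post1 alone. Only the trailing-slash version leaves the hypothetical /tagline-tips page crawlable.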

    [edit]
    Just to clarify, it's a WordPress install on my own site.


Comments

  • Registered Users Posts: 1,452 ✭✭✭tomED


    Why not try the canonical tag? There's a plugin here: http://yoast.com/wordpress/canonical/

    Not sure how effective it will be, but it's a good place to start. I've been meaning to try it out on mine for some time now.


  • Registered Users Posts: 6,464 ✭✭✭MOH


    Hmm. That looks interesting, so it basically just sticks a canonical tag in the header of each post? Seems a lot simpler than messing around with my .htaccess. I'll give it a shot, thanks.


  • Registered Users Posts: 1,452 ✭✭✭tomED


    Yeah that's the idea.

    I've just installed it on my own blog. Don't know how well it will work, but we'll see how it goes!

    Tom
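    For reference, the canonical tag is a single `<link rel="canonical">` element in the page's <head> that names the preferred URL for that content. Here is a rough Python sketch (standard-library `html.parser` only) for checking what a plugin has actually written into a page. The sample HTML and URLs are made up for illustration, not copied from the yoast plugin's output.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> element in a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # rel can hold several space-separated values, so split before checking.
        rel_values = (attrs.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values and self.canonical is None:
            self.canonical = attrs.get("href")


def find_canonical(html):
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonical


# Hypothetical <head> of post1 when reached through a second URL
# (e.g. with a query string tacked on): it points back at the preferred address.
page = """<html><head>
<title>Post 1</title>
<link rel="canonical" href="http://www.site.com/2009/post1" />
</head><body>...</body></html>"""

print(find_canonical(page))
```

    Self-closing tags like `<link ... />` are handled too, because `HTMLParser` passes them to `handle_starttag` by default.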


  • Registered Users Posts: 6,464 ✭✭✭MOH


    Actually, just realised that's not going to work (and the All in One SEO Pack already puts in the canonical tag for me).

    Canonical tags will still help with the issue Joost describes, but I'll need the robots.txt too.

    www.site.com/2009/post1
    www.site.com/tag/tag1
    www.site.com/category/cat1

    These are all going to contain some duplicate content, since excerpts from post1 are listed on the tag and category pages.
    To stop Google indexing the same content multiple times, I'll need to use robots.txt to block it from:
    /tag
    /category

    but make sure it still has a way of getting to the individual posts.
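    Here is a sketch of the robots.txt described above, checked with Python's `urllib.robotparser`. The trailing slashes keep the rules from matching other paths that happen to start with "tag" or "category". The Sitemap line is an optional extra. It assumes a sitemap plugin publishes one at /sitemap.xml, which would give Google a route to each post that doesn't go through the blocked listings. `site_maps()` needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for www.site.com; the Sitemap URL is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tag/
Disallow: /category/
Sitemap: http://www.site.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

urls = [
    "http://www.site.com/2009/post1",
    "http://www.site.com/tag/tag1",
    "http://www.site.com/category/cat1",
]
for url in urls:
    status = "crawlable" if parser.can_fetch("Googlebot", url) else "blocked"
    print(url, status)

# Sitemap directives are collected separately from the Disallow rules (Python 3.8+).
print(parser.site_maps())
```

    The post URL stays crawlable while the tag and category listings are blocked.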


    [edit]
    Proof, if any more were needed, that I'm an idiot. The All in One SEO plugin has checkboxes for both Categories and Tags that add a robots meta tag with noindex,nofollow to the page header. So turning those on should do it.
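    This sketch assumes the plugin writes a standard robots meta tag into the <head> of tag and category pages, something like `<meta name="robots" content="noindex,nofollow" />`. The exact markup here is illustrative. It's a quick Python check (standard-library `html.parser`) that the directives are actually there.

```python
from html.parser import HTMLParser


class RobotsMetaReader(HTMLParser):
    """Collects the directives from any <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            # content is a comma-separated list such as "noindex,nofollow".
            for directive in (attrs.get("content") or "").split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def robots_directives(html):
    reader = RobotsMetaReader()
    reader.feed(html)
    reader.close()
    return reader.directives


# Hypothetical <head> of a tag archive page with the plugin option switched on.
tag_page = '<head><meta name="robots" content="noindex,nofollow" /></head>'
print(sorted(robots_directives(tag_page)))
```

    One thing to watch: if the same /tag/ and /category/ URLs were also blocked in robots.txt, Google would never fetch those pages and so would never see the noindex. For those URLs it's one approach or the other. Also, nofollow tells Google not to follow the links on the page. If the plugin offers noindex,follow, that variant would still let it reach posts through the tag pages.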

    I do still need robots.txt to stop Google indexing my plugin files under /wp-content/plugins (I'd have thought that would be in the default setup), but I'm getting there now.
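    If it's only the plugin files you want kept out, it may be worth blocking /wp-content/plugins/ rather than all of /wp-content/. Uploaded images normally live under /wp-content/uploads/, so a blanket rule would hide them too. Below is a small Python comparison of the two rules using the standard-library `urllib.robotparser`. The file paths are hypothetical examples.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(robots_lines, paths, agent="Googlebot"):
    """Return the paths on www.site.com that `agent` is not allowed to crawl."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return [p for p in paths if not parser.can_fetch(agent, "http://www.site.com" + p)]


# Hypothetical example paths.
paths = [
    "/wp-content/plugins/some-plugin/readme.txt",
    "/wp-content/uploads/2009/08/photo.jpg",
    "/2009/post1",
]

print(blocked_paths(["User-agent: *", "Disallow: /wp-content/"], paths))
print(blocked_paths(["User-agent: *", "Disallow: /wp-content/plugins/"], paths))
```

    The first rule blocks both the plugin file and the photo. The narrower one blocks only the plugin file.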

