June 17th, 2006
Duplicate Content in Blogs: The Problem
In recent days, the issue of duplicate content has been getting far more attention than it used to, sparking some enlightening discussions at forums such as WebMasterWorld. For those out of the loop, duplicate content is primarily an SEO problem: search engines (SEs) are believed to penalize sites that contain too much of it.
If you need a deeper understanding of the duplicate content issue, check out the detailed articles recently written on the subject at Stuntdubl and SEO by the Sea. After reading those, come back here for the implications of the duplicate content issue for blogs - in particular, the automatic generation of duplicate content by blog CMSes.
As most blog CMS users well know, whether for WordPress, Movable Type, or the many others that I can’t possibly list here, there is automatic generation of duplicate content through archive pages such as Monthly Archives, Category Archives or even Tag Archives (for those using the Ultimate Tag Warrior WordPress plugin), and of course, RSS feeds. And given the (mostly good) tendency of bloggers to provide links in their navigation to those archives and feeds (remember your Subscribe in your RSS Reader link?), search engines will definitely be given the opportunity to index them.
Of course, many of you might be asking: So what? Google indexes all my individual post pages and archive pages - category and date archives. Indeed, for most of us, that’s the case. There seems to be sufficient difference in our archives to prompt search engines to index them, or perhaps search engines are just smart enough to detect what they are: archives. In addition, I’m sure many of you have seen your XML feeds being indexed in Google as well (though I’ve seen far fewer recently, and none for many of my blogs). We just never notice them in our searches because the original post pages have higher SE rankings.
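To get a feel for what “sufficient difference” might mean, here is a rough Python sketch that compares a post page with an archive page using overlapping word triples (“shingles”). This is only an illustration: nobody outside the search engines knows how they actually measure duplication, and the function names and sample texts below are made up for the example.

```python
def shingles(text, size=3):
    """Return the set of overlapping word n-grams (shingles) in text."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Shingles shared by both texts, as a fraction of all shingles in either."""
    sa, sb = shingles(a), shingles(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def containment(part, whole):
    """Fraction of the shingles in `part` that also appear in `whole`."""
    sp, sw = shingles(part), shingles(whole)
    if not sp:
        return 0.0
    return len(sp & sw) / len(sp)


# Hypothetical sample texts: one post, and an archive page that repeats it
# in full alongside another post.
post = "duplicate content is primarily an seo problem for blogs"
other_post = "rss feeds let readers subscribe to new posts by email"
archive = post + " " + other_post

if __name__ == "__main__":
    print("post fully contained in archive:", containment(post, archive))
    print("overall similarity of post and archive:", jaccard(post, archive))
```

The point of the two measures: the archive page contains the whole post, yet the two pages are far from identical overall because the archive also carries other posts. That gap may be part of why archives and post pages both get indexed.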
So, should we even do anything about it? After all, no point fixing what ain’t broke, right? True enough, but as the primary search engines start taking tougher stands against duplicate content, or just want to lower their indexing overheads, we might very well see more and more penalties being given out.
And even if you’re looking at negligible penalties, duplicate content may very well prevent search engines from indexing your best version of the same content (usually your individual post page) if they choose to index only one of your many variants. To an algorithm, your category page might “feel” more contextually relevant than your individual post page. But you can be sure your AdSense or YPN ads will be less relevant on your category page than on your individual post page. Of course, this doesn’t necessarily happen very often, but if you check your traffic stats, I’m sure you have more than just a visitor or two entering from your archives.
What can we do about this? Plenty, but whether the effort is worth it is still unknown. Many of the most well-indexed blogs with thousands of indexed pages still have multiple locations of duplicate content, with date archives, category archives, and individual posts all showing up in Google. In fact, I don’t see any reason why I’d want to do anything about it either. If all this is part and parcel of blogging, the search engines should adapt to the medium, instead of us adapting to the SEs. Nevertheless, if you are interested in what can be done, read my post tomorrow on the subject.
If you found this post useful, keep updated with future posts by subscribing to blogHelper (for free) through RSS or email.

2 Comments
June 19th, 2006 at 10:49 pm
[…] This comes a day late since “tomorrow” after my previous post discussing the duplicate content problem on blogs was published on the 17th, but that’s what you get in World Cup season, eh? […]
August 31st, 2006 at 10:46 pm
[…] Archive Effectively: One of the suggestions here is to tackle the pagination issue, which is deemed harmful to search engine (SE) traffic since it provides “constantly changing, duplicate content pages”, via the “noindex” meta tag or robots.txt file. In June, I wrote on the duplicate content problem in blogs (and provided some possible solutions), but didn’t consider pagination to be an issue then. While I still don’t think it’s an issue now, it could be an issue for blogs already in trouble with SEs. […]