Questions about Duplicate Content
Q: With our platform, we cannot customize URLs in our content library, so we have hundreds of URLs like /content-library/files/72 where only the number changes. Are those duplicate URLs?
A: In most cases you can edit your .htaccess file to rewrite URLs, which I strongly suggest. If you’re on Windows hosting and using something like .NET (IIS 7.0), you can use this URL rewrite extension (which also prevents uppercase/lowercase variations from becoming duplicate URLs) and get the SEO-friendly URL structure you really want in the long run.
HOWEVER, if you still can’t rewrite your URLs and have to keep your existing structure, your particular URLs aren’t necessarily causing duplicate content. To check, type this search operator into the Google search bar: site:yoursite.com/content-library/files/ Then browse through what’s indexed and see how many URLs (if any) share the same page/meta title and the same meta description. Do a full site crawl to know for sure.
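Once you have crawl data, the title/description check can be automated. Here is a minimal Python sketch, assuming your crawler exports one record per page with `url`, `title` and `description` fields (those field names and the sample values are hypothetical, so adjust them to whatever your crawl tool produces):

```python
from collections import defaultdict


def find_duplicate_meta(pages):
    """Group URLs that share the same (title, meta description) pair.

    `pages` is an iterable of dicts with the hypothetical keys
    'url', 'title' and 'description', e.g. rows from a crawl export.
    """
    groups = defaultdict(list)
    for page in pages:
        # Collapse whitespace and ignore case so trivial differences
        # do not hide an otherwise identical title or description.
        key = (
            " ".join(page["title"].split()).lower(),
            " ".join(page["description"].split()).lower(),
        )
        groups[key].append(page["url"])
    # Keep only the pairs that are used by more than one URL.
    return {key: urls for key, urls in groups.items() if len(urls) > 1}


# Made-up sample data for illustration only.
crawl = [
    {"url": "/content-library/files/72", "title": "Content Library",
     "description": "Download our files."},
    {"url": "/content-library/files/73", "title": "content  library",
     "description": "Download our files."},
    {"url": "/content-library/files/74", "title": "Pricing Guide",
     "description": "Our pricing guide."},
]

duplicates = find_duplicate_meta(crawl)
for (title, _description), urls in duplicates.items():
    print(title, urls)
```

Any group it reports is a set of URLs worth looking at by hand; an empty result suggests the numbered URLs are not duplicating each other's titles and descriptions.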
Q: Is this important for a simple site with maybe 30 or 40 pages?
A: Yes, because duplicate content is amplified even more on smaller sites. Think of it in terms of ratios. If even 5 pages of a 40-page site happen to be duplicates, that’s 12.5% of the entire site! With all sites (large or small), it’s especially important to pay attention to the ratio of duplicate content within each meta title, that is, to make sure the actual text of each meta title isn’t duplicated. Sometimes the brand name takes up much of your desired 70-character limit, and you end up with a whole site of meta titles that are largely duplicate.
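To make the ratio thinking concrete, here is a small Python sketch. It computes the share of duplicate pages on a site and how much of a meta title is taken up by the brand name. The function names are my own for illustration, and the brand check is a simple character count, not how any search engine measures it:

```python
def duplicate_ratio(duplicate_pages, total_pages):
    """Return the percentage of pages on the site that are duplicates."""
    if total_pages <= 0:
        raise ValueError("total_pages must be positive")
    return 100.0 * duplicate_pages / total_pages


def brand_share(title, brand):
    """Return the percentage of the title's characters used by the brand name.

    Returns 0.0 when the title is empty or does not contain the brand.
    """
    if not title or brand.lower() not in title.lower():
        return 0.0
    return 100.0 * len(brand) / len(title)


print(duplicate_ratio(5, 40))  # 5 duplicate pages on a 40-page site
print(duplicate_ratio(5, 30))  # the same 5 pages on a 30-page site
```

The same 5 duplicate pages weigh more on the 30-page site than on the 40-page one, which is the point of the ratio argument. A high `brand_share` across many titles is a hint that the brand name is crowding out the unique part of each title.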
Q: What columns should be included in the keyword map/list?
A: I always use the following: URL, primary keyword(s), secondary keyword(s), meta description, meta title, header tag, on-page content (actual text of the page), image files/URLs of image files and DON’T even think about meta keywords.
Q: If you have the same copy on your website, Squidoo, Pinterest, etc., is this duplicate content?
A: Oh boy, yes! A quick content dupe check/test I like to do is just put a block of text in quotes and search in Google (example “this potential block of duplicate content that may be residing on Squidoo and your site…”). Then look to see if that block of text is contained on any other URLs in the Google SERPs. Always write unique content for every separate platform whether it be social media, Squidoo page, etc. Note, this duplicate content is treated differently than traditional syndicated content.
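If you run this quoted-text check often, a few lines of Python can build the search URL for you. This is only a sketch: the helper name is hypothetical, and it just assembles an ordinary Google search URL with the block of text wrapped in quotes (optionally combined with the site: operator), which you then open in a browser yourself:

```python
from urllib.parse import urlencode


def exact_match_search_url(text, site=None):
    """Build a Google search URL for an exact-phrase (quoted) query.

    `site` is optional; when given, the query is limited with the
    site: operator, e.g. site:yoursite.com "your block of text".
    """
    # Collapse whitespace so a pasted paragraph becomes one clean phrase.
    phrase = " ".join(text.split())
    query = f'"{phrase}"'
    if site:
        query = f"site:{site} {query}"
    return "https://www.google.com/search?" + urlencode({"q": query})


print(exact_match_search_url("this potential block of duplicate content"))
print(exact_match_search_url("your product", site="yoursite.com"))
```

Keep the quoted block fairly short (a sentence or two), since very long exact-phrase queries tend to return nothing even when the content is duplicated.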
Q: How do you avoid duplicates when you have products in multiple categories?
A: Great question. You can change your navigation to link to only ONE parent category and not let those products reside in another, UNLESS you are planning to de-index the other categories using the meta robots tag or rel="canonical". Now, this may seem extreme, but it may be necessary not only to avoid Panda penalties but also for the overall ranking ability of those particular products. Do a quick check in Google using this search operator (site:yoursite.com “your product”) and see how many duplicate URLs show up in the SERPs.
Q: What steps in a redesign can I take to avoid/remove duplicate content?
A: If your site is small enough, you should map every URL of the old site in a spreadsheet and list out the “plan” for each page on the new site. Ensure the new site is set up with proper rel="canonical" tags, and 301 redirect every old URL to its equivalent URL on the new site. Make sure that the index/home page is properly “canonicalized” and that internal links, including the navigation, aren’t sending mixed signals by linking to two variations (http://yoursite.com and http://www.yoursite.com/). Also remember to prevent any staging site from getting indexed via the robots.txt file (if applicable).
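The spreadsheet plan can be sanity-checked with a short script. The Python sketch below assumes you have exported the old URLs and the old-to-new mapping from your spreadsheet. It picks the www version as the canonical host purely for illustration (choose whichever variation you actually use), and the `/about.html` and `/contact.html` paths are made up:

```python
from urllib.parse import urlsplit, urlunsplit

BARE_HOST = "yoursite.com"            # placeholder domain
CANONICAL_HOST = "www." + BARE_HOST   # assumption: www is the canonical version


def canonicalize(url):
    """Rewrite a URL so the host and the empty home-page path are consistent."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host == BARE_HOST:
        host = CANONICAL_HOST
    path = parts.path or "/"  # treat "http://host" and "http://host/" alike
    return urlunsplit((parts.scheme, host, path, parts.query, ""))


def unmapped_urls(old_urls, redirect_map):
    """Return the old URLs that have no 301 target in the plan yet."""
    return [url for url in old_urls if not redirect_map.get(url)]


old_urls = [
    "http://yoursite.com/about.html",
    "http://yoursite.com/contact.html",
]
redirect_map = {
    "http://yoursite.com/about.html": "http://www.yoursite.com/about/",
}

print(canonicalize("http://yoursite.com"))
print(unmapped_urls(old_urls, redirect_map))
```

Running every internal link through something like `canonicalize` before launch shows whether the navigation points at one variation only, and `unmapped_urls` lists the old pages that would otherwise be left without a redirect.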
Q: We archive newsletters that generally include blog posts, which are also housed on the same site. Will nofollow tags address the duplicate content between the two?
A: If I understand you correctly, blog posts are converted to newsletters which are then archived separately from the original blog post and both reside on the site? If so, then yes use meta robots “NOINDEX, FOLLOW” to allow the bots to continue through your content but don’t allow both versions of the post to remain indexed.
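To verify that the archived copies really carry the tag, you can parse each page and read its meta robots directives. Here is a minimal sketch using Python's standard `html.parser` module. The class and function names are my own, and the sample HTML is only an example of what the tag looks like in a page's head:

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        # Directives are comma separated and case-insensitive.
        for directive in content.split(","):
            if directive.strip():
                self.directives.add(directive.strip().lower())


def robots_directives(html):
    """Return the set of meta robots directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


archived_page = '<html><head><meta name="robots" content="NOINDEX, FOLLOW"></head></html>'
print(robots_directives(archived_page))
```

A page is set up the way described above when the result contains `noindex` and does not contain `nofollow`.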
Q: For a travel site we’re using destination descriptions on itinerary pages that are similar. Does using the blockquote tag (html5) prevent this from being considered dupe content?
A: It’s good you’re doing something to protect your duplicate destination descriptions, but the HTML5 blockquote tag was designed for quoting external sources. If your site generates a new itinerary for every client, resulting in dozens, hundreds, or thousands of pages that aren’t unique, I’d make sure to de-index them all via meta robots “NOINDEX, FOLLOW.”
Q: We send Press Releases across the web that get picked up on freewires. They are posted in multiple places online as well as on our website. Is this a bad practice?
A: Search engines treat syndicated content differently than regular duplicate content, BUT I don’t recommend posting the press release on your own site (or, if you really want it there for branding/usability purposes, use meta robots “NOINDEX, FOLLOW”). Let the freewires do their thing; you won’t be penalized in any way for OTHER sites re-purposing your content. However, I have seen sites get outranked for their own press release content when it’s also posted on the company’s parent site, and then, ironically, the original content owner’s parent site gets penalized for duplicate content. There used to be a handy syndication source tag, but it was devalued in 2012, so don’t rely on it.
Q: What percentage of duplicate content is acceptable on a page? Is there a certain percentage that is allowed that will not make the page appear duplicated?
A: Many SEOs agree that 95% unique content is a safe threshold, counting code, text, images, everything. But my experience has been that Google is fairly generous with this threshold when you’re using a code-heavy platform/CMS like Magento, WordPress, etc., as long as each meta title is completely unique, fewer than 5% of the URLs on the entire site are duplicates, and there is no intentionally scraped content on your site.
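If you want a rough number for how unique one page is compared with another, word shingles (overlapping runs of words) are a common way to estimate it. The Python sketch below is my own approximation: it looks at visible text only, not code or images, it uses an arbitrary shingle size of 5 words, and it is not how Google calculates anything:

```python
def shingles(text, size=5):
    """Return the set of overlapping word runs ("shingles") in the text."""
    words = text.lower().split()
    if len(words) < size:
        # Very short texts become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def unique_percentage(page_text, other_text, size=5):
    """Estimate what percentage of page_text does not appear in other_text."""
    page = shingles(page_text, size)
    if not page:
        return 100.0  # nothing to compare, so nothing is duplicated
    shared = page & shingles(other_text, size)
    return 100.0 * (1 - len(shared) / len(page))


page_a = "our guide to the best beaches on the north coast of the island"
page_b = "our guide to the best beaches on the south coast of the island"
print(round(unique_percentage(page_a, page_b), 1))
```

Run it pairwise over the pages you suspect are near-duplicates. Treat the result as a monitoring aid for your own ratio, not as a pass/fail test against the 95% figure.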
Do your best to measure and monitor your duplicate content ratio but handle it properly, sit back, and have no fear.