Add-on Stolen Content: Protection against scrapers, spammers and other unauthorized copying of content.

Alfa1

Well-known member
#1
Some members repost their content to multiple sites.
Some human spammers post copied content from other sites to look like valid users.
Other sites copy our content to monetize it, often without a link to our site.
Scraper sites benefit from mass copying.

This sort of activity often happens without the admin knowing about it. All of this has a damaging impact on the SEO of a community.
It would be nice if a developer would release a solution to address this. It's just as important as dealing with dead links.

Basically, what I am suggesting is an add-on that checks whether the exact content of forum posts exists on other sites.
It would then list all copied posts and the URLs where the copies were found.
It would have a function to automatically email a DMCA take-down request or a backlink request to the webmaster of the offending site.
It would also have a function to submit a form to the Google DMCA Dashboard. Some info.
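
To make the first two points concrete, here is a rough Python sketch of the matching step only. It is not an existing add-on: the function names, the `min_words` threshold and the idea of feeding in already-downloaded page text are all assumptions for illustration. Finding candidate pages and sending the DMCA or backlink emails are left out.

```python
import re

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial reformatting does not hide a copy."""
    return re.sub(r"\s+", " ", text).strip().lower()

def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter; good enough for a sketch."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def copied_sentences(post_text: str, page_text: str, min_words: int = 6) -> list[str]:
    """Return the sentences of a forum post that appear verbatim in a page."""
    haystack = normalize(page_text)
    hits = []
    for sentence in split_sentences(post_text):
        if len(sentence.split()) < min_words:
            continue  # short sentences match by coincidence too easily
        if normalize(sentence) in haystack:
            hits.append(sentence)
    return hits

def build_report(posts: dict[int, str], pages: dict[str, str]) -> list[tuple[int, str, int]]:
    """List (post_id, url, copied_sentence_count) for every post/page pair with a match.

    `posts` maps a hypothetical post ID to its text; `pages` maps a URL to
    page text that has already been fetched and stripped of HTML.
    """
    report = []
    for post_id, post_text in posts.items():
        for url, page_text in pages.items():
            count = len(copied_sentences(post_text, page_text))
            if count:
                report.append((post_id, url, count))
    return report
```

The output of `build_report` is the "list of copied posts and their URL" from above; an admin page could sort it by the number of copied sentences.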

This would deal with the most important issues.

Then there is the issue of forum members using copied content. These are most likely human spammers who will later go back and edit spam links into these posts. So it's useful to be aware if users in certain usergroups are making use of copied content.
A list of such members would be useful.
It would also be useful to get a report if currently active (new) members do this (as an optional setting, of course).
 

Yugensoft

Active member
#2
It's an interesting concept. I have some questions about it:
  1. How pervasive is this, as far as you know? Is it just because your site specifically has become very large and popular due to good content?
  2. Do the scrapers make any attempt to spin or salt the content so it looks different than a pure copy-paste?
  3. About what percentage are backlinking to your content when they copy it?
  4. Are there any patterns in the types of posts that are being copied?
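
On question 2: a pure copy-paste can be caught with an exact match, but spun or salted text would need a fuzzy comparison. One common approach is to compare overlapping word groups (shingles) with Jaccard similarity. A minimal sketch, where the shingle size of 3 and the function names are my own assumptions:

```python
import re

def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (overlapping word groups) in the text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of two texts: shared shingles divided by all shingles."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)
```

A score of 1.0 means the two texts share every shingle; swapping a few words lowers the score without dropping it to zero, so a threshold can flag lightly spun copies. Any threshold would have to be tuned on real posts.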
 
Last edited:

Alfa1

Well-known member
#4
1 & 2: As a whole it's quite pervasive, but it's made up of many different factors, which is what makes it so hard to manage.
There are scrapers who just copy entire sites, which of course is quite a hazard when it happens. See: https://encrypted.google.com/search?q=my+whole+site+copied. On The Admin Zone we see threads about this now and then:
https://theadminzone.com/threads/scraper-sites-whats-the-drill-in-2017.143427/
https://theadminzone.com/threads/wh...en-your-site-is-being-duplicated-live.133472/
We encounter copy sites once every few years.

There are e-retailers who just need good content to add to their store. This happens a lot.

For example, I have just searched Google for the exact text from a random thread with more than 100k views. The text is original and was written especially for the site.
The exact text appears on:
  1. A shady site plastered with porn ads.
  2. A comments site, where a spammer posted the plain text in order to spam related links. (no backlink to my site)
  3. Somehow Google finds the exact text on Quora and actually ranks the Quora threads higher than the original. But when opening the threads, the text is not in there. This is probably non-actionable.
  4. A vendor on tradekey has used the exact text in an advertisement.
  5. A scraper site with a generic URL has copied the text from tradekey but refuses all non-bots.
  6. A vendor has put the exact text in their page source.
  7. A vendor has put up a fake forum claiming copyright over the text just before the text starts. This is found in Google's cache, but the page redirects visitors (non-bots) to an online store.
The above is just a random example from a few minutes of searching on Google.

A different example: let's take a look at the most recent spammers banned through the spam cleaner.
A Google search for the exact content of one spammer's last post returns dozens of sites dating back to 2010. It seems to me that the 2010 post was a legit post which has since been reused by spammers around the net. A search for the other posts by this spammer shows the same pattern. It took a while for this spammer to be recognized because the posts seemed legit. This is certainly not a unique case. I see this frequently on different sites, including TAZ.

Another example is a regular member who likes to move up the ranks on various sites, so he just copies content from other members, reposts it to forums and reddit in the same niche, and acts as if it's his own. This happens a lot, because the member gets higher rank and status by doing this.

3. A very small percentage is backlinking. The sites that backlink are mostly sites that scrape in order to display only a snippet with a clear link to my site. We have a lot of automatic links inserted in the forum post text, but all scrapers remove these from the text.

4. The posts that get copied are posts that have a lot of views and often have more than a few lines. My guess is that scrapers / spammers search Google for high-ranking content related to their topic and just copy that. For very long posts there is an affinity for the start of the post (the first paragraph).
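
If that pattern holds, a checker would not need to search for every post. A rough Python sketch of the idea: pick only high-view, longer posts and build a quoted exact-phrase query from the first paragraph. The `Post` class, the thresholds and the 25-word cap are all assumptions for illustration, not anything an existing add-on does.

```python
from dataclasses import dataclass

@dataclass
class Post:
    post_id: int  # hypothetical forum post ID
    views: int    # view count of the thread the post belongs to
    text: str     # plain-text post body

def first_paragraph(text: str) -> str:
    """Return the first non-empty paragraph with whitespace collapsed."""
    for block in text.split("\n\n"):
        block = " ".join(block.split())
        if block:
            return block
    return ""

def exact_query(text: str, max_words: int = 25) -> str:
    """Build a quoted exact-phrase search query from the start of a post."""
    words = first_paragraph(text).replace('"', "").split()
    return '"' + " ".join(words[:max_words]) + '"'

def candidates(posts: list[Post], min_views: int = 1000, min_words: int = 40) -> list[Post]:
    """Keep popular, longer posts and check the most viewed ones first."""
    picked = [p for p in posts if p.views >= min_views and len(p.text.split()) >= min_words]
    return sorted(picked, key=lambda p: p.views, reverse=True)
```

The query string from `exact_query` is what I typed into Google by hand in the examples above; running it automatically would still need a search API, which is not covered here.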