Blog Scrapers Imagine a Magical Concern

RSS Scraping

Original Photo by Bret Arnett

Since there have been blogs, there have been people who steal your content. I’m not talking about borrowing your thoughts or words under a Creative Commons license; I’m talking about directly stealing your content to house on blogs loaded with Google AdWords or other advertising. Actually, for some of these blogs I’m not even sure what the point is, and I don’t understand much of the scraping and comment spam I’ve seen. If anyone has a good post on it, let me know in the comments.

Shel Holtz briefly introduced CopyGator during episode #416 of The Hobson and Holtz Report last week. CopyGator is:

…a free service designed to monitor your RSS feed and find where your content has been republished in the blogosphere. We automatically notify you when a new post of yours is copied to another feed, we also build an overview page you can view to see how/when/where your content is being duplicated, quoted or plagiarized.

It’s a great idea, but I haven’t been able to test it out yet. I’m looking forward to it, as in the past I’ve found a surprising amount of my content posted to other sites, which, while flattering, is annoying.

So while monitoring the blogosphere for client mentions today, imagine my surprise when I found a bizarre review of a product written in the strangest non-native-English tone, e.g.

  • “Imagine a magical concern where you read text scribbled by a kinsfolk member in their poorest cowardly scratch” or
  • “Make trusty to yield a interpret here to intend this terminal entry.”

And while it’s absolutely hysterical to read in the Engrish Funny kind of way, it just shows that for every tool created, hackers, scrapers and spammers will figure out a way around it.

Upon further review, I did discover the original blog post written about my client’s product. So apparently scrapers are now running your content through some sort of thesaurus program or other word-altering script so you can’t easily find their copies. In this case, though, the product name was still in there, along with the images. Not cool. CopyGator appears to work on the feed, not the content, so I look forward to digging further into it and seeing how it works.

So if you find your content being scraped you might want to look into CopyGator. Has anyone tried it? Thoughts? Comments?

For kicks, I just wish I had whatever program they were running this content through. It would be fun to push some classic poems or literature through it, e.g.

  • “To have being or to not exist, that is the interrogatory statement”
  • “Times being the most plentiful, also worst of all were the times.”
  • “More than one pathway did fork in timberland, and myself taken to me the uninhabited choice, and that has made all the expression of the form f(x + h) − f(x).”

3 Responses

  1. Hi, Luke –

    The thesaurus trick is a new one on me. We’ve had problems with a blog scraper for a long time, but one persistent scraper in particular got my goat – especially when they ignored a series of requests to stop.

    So I posted a screen capture of their site – including our stolen blog post – to our blog. When they scraped it, I posted another screen capture. And another. The result: what I modestly consider to be a fine work of recursive performance art.

    One little bonus: the post where I explained to our readers what I was up to, and why, is now the number three Google result for the scraper’s name.

  2. Rob,

    That’s hysterical. I love the concept of fractal blogging. Interesting how Google turned around and bit them back through your post. Nice work.

    It’s an interesting dilemma. Up until recently, I’ve been prone to ignoring it, but I may have to look further into it. Have you tried CopyGator? Is it something you think you’d find value in?

  3. I’d heard of it in passing, but your post was the first I actually read. 🙂 I can imagine it would be very useful… if only to feed my seething rage. Unfortunately, my experience with actually getting a scraper to stop misbehaving hasn’t been too successful so far.
