Monitoring The Conversation
One of the side effects of the rise of weblogging is that companies are becoming keenly interested in what people are saying about them. People are increasingly turning to search engines to gather information about the experiences of other customers before they consider doing business with a company. This has led to the growth of reputation management services that “mine” weblogs, message boards, and any other relevant websites to find what is being said, and in some cases to “counter” that information with their own.
Sometimes that strikes me as a bit “Big Brother-ish,” but I suppose as long as the company joins the “conversation” (as these reputation managers/miners like to call it) in an honest fashion, it’s probably not too bad. At least it gets them talking. It’s when companies try to astroturf blogs or message boards that I find the practice reprehensible.
It was almost by accident that I discovered these services. I tend to review my referrer logs frequently in an effort to block abusive spam referrer entries. I don’t remember the name of the service anymore, but it showed up because it was using its own robot to spider my site, and it was hitting the site often enough that it looked abusive. Because so many robots already hit the site, I tend to block anything that hits it too often, other than Google, Yahoo, and MSN (and even for those I use robots.txt directives to slow them down and prevent performance impacts). So whenever I discover one of these services hitting the site, I just block its referrer, user agent, and/or IP, because I don’t trust them to obey robots.txt.
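For anyone who wants to automate the “block anything that hits the site too often” step, here is a minimal sketch in Python. It assumes an Apache-style Combined Log Format access log; the function name `flag_heavy_hitters`, the `max_hits` threshold, and the user-agent substrings in `ALLOWED_BOTS` are hypothetical choices for illustration, not what this site actually runs. It only produces a list of candidates; the actual blocking still happens in the server configuration.

```python
import re
from collections import Counter

# Combined Log Format: host ident user [time] "request" status size "referrer" "user-agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical allowlist: user-agent substrings for the crawlers left unblocked.
ALLOWED_BOTS = ("Googlebot", "Yahoo! Slurp", "msnbot")


def flag_heavy_hitters(lines, max_hits=100):
    """Return sorted (ip, user_agent) pairs that made more than max_hits
    requests, ignoring the allowlisted search-engine crawlers."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that are not in Combined Log Format
        agent = m.group("agent")
        if any(bot in agent for bot in ALLOWED_BOTS):
            continue  # tolerated crawler; throttled via robots.txt instead
        counts[(m.group("ip"), agent)] += 1
    return sorted(key for key, n in counts.items() if n > max_hits)
```

Counting by the (IP, user agent) pair rather than IP alone avoids lumping together unrelated visitors behind a shared proxy, at the cost of missing a robot that rotates its user agent.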
But this morning I discovered another service through an almost accidental referrer entry. It’s called Conversation Miner and it’s from a company called Converseon. In this case they appear to be using Google’s cache to get my page data, thereby circumventing the need for a robot of their own and also preventing me from easily blocking them (or at least that’s how it looks to me, since I didn’t see any obvious activity from a Converseon robot in the past few days). It’s actually a smart strategy. Since Google is already spidering the web and monitoring blogs, why reinvent the wheel by indexing and storing all that content yourself? It’s simpler to just mine Google’s database by searching for directly relevant information.
Anyhow, the way I discovered this was with this referrer entry:
http://72.51.39.238/~jdoak/converseon/pagerecord.php?id=436647
Curious as to what that was, I attempted to follow the link and was greeted with an HTTP authorization prompt for “Conversation Miner.” A quick search on Google took me to Converseon’s page on the product. But when I checked the logs, I discovered that the page at the above link was just loading my CSS file and nothing else. This is an indicator that it’s using a cached copy of the content, but that the cache was not scrubbed to remove all external references. I often see similar entries for users who hit the Google cache for my page (it only loads CSS and graphics, and the graphic file loads cause entries in my image hotlink log that are very distinctive). In this case they seem to be ignoring images and just loading the CSS, though.
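The “only the CSS file was loaded” clue can also be checked mechanically. The Python sketch below groups requests by referrer and reports referrers whose every logged request was for a stylesheet, which is the pattern described above for a cached copy viewed elsewhere. It assumes an Apache-style Combined Log Format log, and the name `css_only_referrers` is hypothetical. It is a heuristic only: a Google cache view that also pulls in images would not be flagged, and an ordinary referrer could occasionally match by chance.

```python
import re
from collections import defaultdict

# Combined Log Format, capturing the request path and the referrer field.
LOG_RE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]*\] "(?:GET|HEAD) (?P<path>\S+)[^"]*" \d{3} \S+ '
    r'"(?P<referrer>[^"]*)" "[^"]*"'
)


def css_only_referrers(lines):
    """Return sorted referrers whose logged requests were all for .css files."""
    paths_by_referrer = defaultdict(set)
    for line in lines:
        m = LOG_RE.match(line)
        if not m or m.group("referrer") in ("", "-"):
            continue  # unparseable line or no referrer recorded
        paths_by_referrer[m.group("referrer")].add(m.group("path"))
    return sorted(
        ref
        for ref, paths in paths_by_referrer.items()
        # strip any query string (e.g. cache-busting ?v=2) before checking
        if all(p.split("?", 1)[0].endswith(".css") for p in paths)
    )
```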
The only other information I could glean is that the IP belongs to a server hosted by Server Beach, and the user’s IP (not shown here) belonged to a Speakeasy DSL customer in Los Angeles, CA.
Whoever you are, ~jdoak, I’d be really interested to know why you’re monitoring my site and/or what triggered Converseon’s interest in it.
Hi Aubrey! I am the mysterious jdoak.
Your site made it into our system because you had a post that was discussing a topic that is important to one of our clients (I’m assuming you know which post it is). We pull results a variety of ways, often starting with one of several different search engines and then using our own technology to screen those results. Once they have been pre-screened, one of our employees will actually visit the blog post and read it to determine if it is of interest to our client; the hit you saw was via that system.
Let me know if you have any further questions or want to learn more about what we do. Nice detective work btw.
Thanks for stopping by.
I actually don’t know which post you were hitting, because all I saw was the referrer loading my CSS file. But I suppose if it’s of sufficient interest to your client I’ll eventually find out.
You could block Google from caching your website, though it would be kind of pointless unless you are worried about scrubbing private information.
If Jeff Doak reads this again, and it is proper to have this sort of third-party conversation, is Jeff related to Jack Doak? I mean, being from Los Angeles and all.
I am related to a Jack Doak, but he’s my 6-year-old son.