the state of the blog: past, present & future
06.28.2005, 6:37 PM
posted by dan visel
Since Ben's on vacation (you may have noticed the crickets chirping in his absence), I've been in charge of pruning the comment- and trackback-spam that if:book and the rest of our website attract. Hopefully you haven't noticed much of it around here, but it arrives in ever-increasing volume: lately we've been getting upwards of twenty comment-spams per day. They've also grown less coherent: where they once tried to cajole our visitors into trying dubious sexual aids or patronizing online casinos, the latest batch has been streams of random letters linking to websites that don't seem to exist.
To combat the problem (which I imagine is much the same at any blog), we've installed a Movable Type plugin that filters comments and trackbacks. It does a pretty good job: like a spam filter in a mail program, it can guess what spam is, and it learns quickly. One curious piece of its method, however, might have wider repercussions for how we read & use blogs: it automatically suspects comments made on older posts to be comment spam. This is, by and large, correct: there aren't a lot of people finding our old posts and leaving comments on them. But this does feel like we're increasingly killing off old discussions. This ties into my musings from two weeks back, when I wondered how well blogs function as an archive.
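The "old entry" rule is simple enough to sketch. What follows is a minimal Python illustration, not the plugin's actual code: the `Entry` and `Comment` types, the `triage_by_age` function, the 14-day cutoff and the dates in the example are all assumptions made up for this post. A comment that arrives too long after its entry was published gets held for a moderator instead of being published.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed cutoff for illustration; the real plugin's setting may differ.
OLD_ENTRY_AGE = timedelta(days=14)


@dataclass
class Entry:
    title: str
    published_at: datetime


@dataclass
class Comment:
    author: str
    body: str
    posted_at: datetime


def is_old_entry(entry: Entry, comment: Comment,
                 max_age: timedelta = OLD_ENTRY_AGE) -> bool:
    """True if the comment arrives more than max_age after the entry went up."""
    return comment.posted_at - entry.published_at > max_age


def triage_by_age(entry: Entry, comment: Comment) -> str:
    """Hold comments on old entries for a moderator; publish the rest."""
    return "moderate" if is_old_entry(entry, comment) else "publish"


if __name__ == "__main__":
    # Hypothetical entry and comments, dated only to show the rule at work.
    entry = Entry("an older post", datetime(2005, 3, 1))
    late = Comment("Online Poker", "congrats mate!", datetime(2005, 6, 29))
    prompt = Comment("gary frost", "5 years is short", datetime(2005, 3, 2))
    print(triage_by_age(entry, late))    # moderate
    print(triage_by_age(entry, prompt))  # publish
```

The rule never looks at what the comment says, which is exactly why it catches so much spam and also why genuine late arrivals to an old discussion end up waiting on a human.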
A discussion at Slashdot zooms out to look at the ever-decreasing signal-to-noise ratio of the soi-disant blogosphere as a whole. Spam blogs (often created to drive up Google rankings, for example) are becoming ever more common; just as it's simple for you to create a blog, it's simple for a robot to create a thousand. At what point does the sheer volume of spam start turning users away?
A decent guess, if the history of forms on the web is any indicator, is that something new will arise. Mentioned in the Slashdot discussion is Usenet, the newsgroup-based discussion system. Spam first reared its ugly head on Usenet, and by the late 1990s had almost consumed it. As the level of spam rose, users departed; some, undoubtedly, went to the comparatively safer environs of the blogosphere. What comes after blogs?
While we're on the history of blogs: Matt Sharkey has an interesting history of suck.com (here helpfully archived by its creator, Carl Steadman). Suck wasn't a blog as we know it (readers could email the author, but couldn't leave comments for others to see), but it did premiere, in 1995, what would become a key concept of the blog: fresh content daily. It also brought snarky, semi-anonymous commentators to the Web, along with the idea of using hyperlinks for humor. Suck did get in five solid years, though, and the site is arguably an important milestone in the history of how we read online. Browsing through Steadman's archive provides food for thought about archives on the web: while it's still entertaining, you quickly notice that almost every one of the links is broken. Nothing lasts forever.
gary frost on June 28, 2005 11:25 PM:
5 years is short for "forever"!
Tell us more about IftFotB If:Book traffic. What referers? What characterizes authentic comments?
I can't recall a sustained thread at a given posting since the beginning.
dan visel on June 28, 2005 11:50 PM:
A demonstration: every time we get a comment (or a trackback, which is a little different) we get an email that looks something like this (for Gary's comment above):
>A new comment has been posted on your blog if:book, on entry
>#613 (the state of the blog: past, present & future).
>View this comment:
>Edit this comment:
>De-spam this comment:
>IP Address: XXX.XXX.XXX.XXX
>Name: gary frost
>Email Address: email@example.com
>5 years is short . . .
(I've taken out the real URLs & Gary's email address.) This is a "real" comment (=not spam): the program ran it through its filters, determined that it didn't meet any of the criteria that define comment-spam, and so posted it. If the program weren't sure whether the comment was "real" or spam, it would ask a moderator (me) what to do.
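That publish / ask-a-moderator decision can be sketched roughly as a three-way triage. This is a hypothetical Python illustration, not Movable Type's or MT-Blacklist's actual rules: the `Verdict` names, the blacklist patterns and the link-count threshold are all assumptions. A comment linking to a blacklisted URL is rejected outright, a link-heavy comment is held for a moderator, and everything else goes straight up.

```python
import re
from enum import Enum


class Verdict(Enum):
    PUBLISH = "publish"    # posted immediately
    MODERATE = "moderate"  # held until a moderator decides
    REJECT = "reject"      # treated as spam outright


# Hypothetical blacklist: patterns matched against every URL in a comment.
BLACKLIST = [re.compile(p, re.IGNORECASE) for p in (r"online-poker", r"casino")]

# Hypothetical threshold: more links than this makes the filter "unsure".
MAX_LINKS = 2

# Rough URL matcher; stops at whitespace, quotes or a closing bracket.
URL_RE = re.compile(r"https?://[^\s\"'\]]+", re.IGNORECASE)


def triage(body: str) -> Verdict:
    """Classify a comment body as publish, moderate or reject."""
    urls = URL_RE.findall(body)
    # Any blacklisted URL is enough to reject the comment.
    if any(pattern.search(url) for url in urls for pattern in BLACKLIST):
        return Verdict.REJECT
    # Link-heavy comments are not rejected, just handed to a human.
    if len(urls) > MAX_LINKS:
        return Verdict.MODERATE
    return Verdict.PUBLISH


if __name__ == "__main__":
    print(triage('5 years is short for "forever"!').value)  # publish
    print(triage('[a href="http://online-poker-en.itp4kids.com"]Online Poker[/a]').value)  # reject
```

The middle verdict is the interesting design choice: rather than forcing a yes/no answer, uncertain cases cost a moderator a few seconds instead of either losing a real comment or publishing an ad.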
I'm sure comment-spam will turn up before the morning; I'll post that here too as a comparison.
dan visel on June 29, 2005 11:22 AM:
Okay, here's an example of comment-spam. This came in from Alex Itin's blog:
>MT-Blacklist has forced moderation of a comment made by an
>unregistered user on IT IN place, on entry #134 (The Duality
>REASON: Old entry
>Edit this comment:
>De-spam this comment:
>IP Address: 22.214.171.124
>Name: Online Poker
>Email Address: firstname.lastname@example.org
>congrats mate! Fine job and fine site!
>[p][a href="http://online-poker-en.itp4kids.com" title="Online Poker"]Online Poker[/a][/p]
This is typical in some ways. You'll note that the comment starts out looking realistic (this might be one of Alex's friends congratulating him), but the last line, a link to an online poker site (I've broken the HTML so you can see it), reveals the economic motive here. Notice at the top, however, that the Blacklist plugin didn't flag it as spam because it recognized the URL as belonging to a poker site, or the IP address as belonging to a known spammer, but because it was a comment on an old entry.
dan visel on June 29, 2005 11:40 AM:
And, just for the sake of completeness, here's a slightly different variety, the trackback spam. Trackbacks were originally meant to be ways to see easily who's linking to a particular entry on your blog. What ends up happening, however, is that spammers fake them; here's one that turned up this morning:
>A new TrackBack ping has been sent to your weblog, on the
>entry 558 (reading manga on Sony Librie).
>De-spam this ping:
>IP Address: 126.96.36.199
This isn't actually from someone else's blog; it's someone advertising their eBay auction (of collector cars, which aren't remotely related). The reason people do this is free advertising: blogs usually link back to people who link to them with trackbacks, and the spammer imagines that we won't notice that collector cars have nothing to do with reading manga on a Sony Librié. Here, though, the plugin couldn't tell that this was spam rather than a genuine trackback, and it stayed up until I deleted it by hand.
This has been rather a self-indulgent excursion, and I think I might be straying from my original premise: that the fall in signal-to-noise ratio that this kind of spam heralds may well have a negative impact on the blog as a form.
[An addendum: my first attempt at posting this was blocked because the system decided I was posting "questionable content", presumably the URL of the collector cars site. Trying this again . . .]
Dave Munger on June 30, 2005 10:00 AM:
I think the signal-to-noise ratio may be a problem for comments, but not so much for blogs in general. After all, we choose the blogs we want to visit, so unlike with e-mail, we're not forced to consider all the millions of blogs created just for the sake of referrer spam.
Now the referrer system may end up being useless, but we'll still be able to use blogs themselves.
And e-mail, which has been subjected to spam for much longer, is still holding out okay. Most people still regularly use e-mail, despite the spam problems, and still are able to get messages to the people they want to.