magpiebrain

Sam Newman's site, a Consultant at ThoughtWorks

Archive for ‘October, 2003’

For a while now some colleagues have been raving about XPath, but I must admit it’s something I’ve never really looked into. In a brief post Simon has managed to explain not only what XPath is, but also why it’s so damn handy. I would quote from his post, but it’s so concise there isn’t any real point – go read it for yourselves! He goes on to mention that those of us with valid XHTML markup (I knew there was a reason I did it) can use XPath queries to search our websites, and the search engine on Sam Ruby’s blog allows you to do just that.
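To give a flavour of what querying valid XHTML with XPath looks like, here is a minimal sketch in Python. It is not taken from Simon's post or Sam Ruby's search engine: the sample page, the `entry` class name and both helper functions are made up for illustration, and the standard library's `ElementTree` only understands a subset of XPath.

```python
import xml.etree.ElementTree as ET

# XHTML elements live in this namespace, so queries need a prefix for it.
XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}

# Hypothetical page; the "entry" class is an assumption for this sketch.
SAMPLE_PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Sample archive</title></head>
  <body>
    <div class="entry">
      <h2>XPath</h2>
      <p>See <a href="http://example.org/xpath">this post</a>.</p>
    </div>
    <div class="entry">
      <h2>RSS</h2>
      <p>No links here.</p>
    </div>
  </body>
</html>"""


def find_links(xhtml_text):
    """Return the href of every <a> element that has one."""
    root = ET.fromstring(xhtml_text)
    return [a.get("href") for a in root.findall(".//x:a[@href]", XHTML_NS)]


def entry_titles(xhtml_text):
    """Return the <h2> text of every <div class="entry">."""
    root = ET.fromstring(xhtml_text)
    path = ".//x:div[@class='entry']/x:h2"
    return [h2.text for h2 in root.findall(path, XHTML_NS)]


if __name__ == "__main__":
    print(find_links(SAMPLE_PAGE))
    print(entry_titles(SAMPLE_PAGE))
```

This only works because the markup is well-formed XML; tag soup would make the parser give up, which is the practical payoff of keeping the XHTML valid.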

I’ve been reliably informed that the “Telegraph(Telegraph.co.uk)”:http://www.telegraph.co.uk/ are currently carrying out experiments with RSS, and are planning on rolling the service out in the near future. I don’t know yet if there are going to be any restrictions on the service (e.g. ads in the feeds), but I’ll let you know as soon as I do.

Work is proceeding on my second article on the “Informa API”:http://informa.sourceforge.net/ – I’m currently refactoring the FeedManager class whose use forms the basis of the article. The FeedManager’s job is to manage multiple feeds, and handle their lazy loading (which will fix a major flaw with the simple code presented in “my first article(Java.net – Using RSS in JSP pages)”:http://today.java.net/pub/a//today/2003/08/08/rss.html). I’m also getting to grips with OPML – support for which was added to Informa as of the 0.4.0 release by Niko Schmuck (Informa’s project manager), and will be adding a method to add channels from an OPML file to the FeedManager itself. All changes will be going back into the Informa code base – expect them to appear in the next release.
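The two ideas here, lazy loading and registering channels from OPML, can be sketched roughly as follows. This is Python rather than Java, and it is not Informa's FeedManager: the `LazyFeedManager` class, its method names and the injected `loader` function are all hypothetical. The only outside convention it relies on is that OPML subscription lists keep the feed address in the `xmlUrl` attribute of `outline` elements.

```python
import xml.etree.ElementTree as ET

_NOT_LOADED = object()  # marks a feed that is registered but not yet fetched


class LazyFeedManager:
    """Hypothetical manager: feeds are registered cheaply, loaded on first use."""

    def __init__(self, loader):
        # loader is any callable taking a URL and returning a parsed feed;
        # it is injected so this sketch needs no network access.
        self._loader = loader
        self._feeds = {}

    def add_feed(self, url):
        """Register a feed without fetching it."""
        self._feeds.setdefault(url, _NOT_LOADED)

    def add_feeds_from_opml(self, opml_text):
        """Register every outline that carries an xmlUrl attribute."""
        root = ET.fromstring(opml_text)
        added = []
        for outline in root.iter("outline"):  # also finds nested outlines
            url = outline.get("xmlUrl")
            if url:
                self.add_feed(url)
                added.append(url)
        return added

    def get_feed(self, url):
        """Fetch the feed on first access, then serve the cached copy."""
        if url not in self._feeds:
            raise KeyError(url)
        if self._feeds[url] is _NOT_LOADED:
            self._feeds[url] = self._loader(url)
        return self._feeds[url]
```

Registering a long OPML list then costs nothing up front; a feed is only fetched when a page actually asks for it. A real version would also need some refresh policy for feeds that are already cached, which this sketch leaves out.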

It seems that just as spammers decided to start targeting blogs, we (or more accurately more able, less lazy people) have come up with all kinds of solutions to keep our blogs spam free. Jay Allen’s “MT Blacklist”:http://www.jayallen.org/journey/2003/10/mtblacklist_stop_spam_now works like a charm and really should be a feature of MT itself. James Seng’s “elegant solution(James Seng’s Blog – Solution for comments spams)”:http://james.seng.cc/archives/000145.html, which uses automatic image generation to determine if a poster is human, is very nice and works well if you don’t mind making your comments less accessible. Now James has created a “Bayesian filter for MT(James Seng’s Blog – Bayesian filter for MT)”:http://james.seng.cc/archives/000152.html which I’m installing as we speak, and I found out that Feedster are attempting to generate a “definitive blacklist of spammers(Feedster – OPML in Action : Updates to the Comment Spammer BlackList)”:http://www.feedster.com/blog/archives/187_OPML_in_Action__Updates_to_the_Comment_Spammer_BlackList.html. The only problem with Feedster’s list is that it’s OPML, which means I have to cut and paste to get it into MT Blacklist, and they seem to have very few entries right now, although this should be rectified when I send them my list of 500+ IP addresses.

_Update_: You can report spam to Feedster to get IPs added to their OPML file by using “this interface(Feedster – Report a comment spammer)”:http://www.feedster.com/commentspam.php.

_Update 16-Oct-03, 12:37_: OK, as Jay helpfully pointed out, I am of course putting URLs rather than IP addresses in MT Blacklist. For those who care, you can see my blacklist “here(magpiebrain – Spammer blacklist)”:http://www.magpiebrain.com/blacklist.txt.

“Jay Allen(Jay Allen :: The Daily Journey)”:http://www.jayallen.org/journey/ has, as per his “promise(Jay Allen :: The Daily Journey – MT-Blacklist almost ready)”:http://www.jayallen.org/journey/2003/10/mtblacklist_almost_ready, made his “MT Blacklist plugin(Jay Allen :: The Daily Journey – MT-Blacklist: Stop Spam Now)”:http://www.jayallen.org/journey/2003/10/mtblacklist_stop_spam_now available for download, and I’ve now installed it. It’s quite nifty – the interface for adding blacklisted sites is fantastic (and supports regexps). It also handles blacklisting of trackbacks, and seems to block comments based on their content, not just their links, although I’m unsure how this is configured. Anyway, thanks Jay!
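For anyone wondering what a regexp-based blacklist boils down to, here is a minimal sketch in Python. It is not how MT Blacklist is implemented (that is a Movable Type plugin and I haven't read its internals); the one-pattern-per-line format, the `#` comment convention, the function names and the example entries are all assumptions made for illustration.

```python
import re


def compile_blacklist(lines):
    """Turn blacklist lines into compiled, case-insensitive patterns.

    Assumed format: one regular expression per line; blank lines and
    lines starting with '#' are skipped.
    """
    patterns = []
    for line in lines:
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue
        patterns.append(re.compile(entry, re.IGNORECASE))
    return patterns


def is_spam(comment_text, patterns):
    """Reject a comment if any pattern matches anywhere in its text."""
    return any(p.search(comment_text) for p in patterns)


# Hypothetical entries; the domains are made up.
EXAMPLE_ENTRIES = [
    "# pill pushers",
    r"cheap-pills\.example",
    r"casino[0-9]*\.example\.com",
]

if __name__ == "__main__":
    blacklist = compile_blacklist(EXAMPLE_ENTRIES)
    print(is_spam("Visit http://www.casino42.example.com now!", blacklist))
    print(is_spam("Nice post about XPath.", blacklist))
```

Because the patterns are searched against the whole comment body, the same list catches a spammy domain whether it appears in a link or in plain text, which may be roughly what the content-based blocking is doing.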

As I “mentioned before”:http://www.magpiebrain.com/archives/000099.html, James Seng has an “elegant MT plugin”:http://james.seng.cc/archives/000145.html, which attempts to identify if a commenter posting to your blog is actually human. He dynamically generates a random number as a GIF, and has the commenter type the displayed number. The only problem with such a solution at present is that those commenters who cannot view graphics will be unable to read the number and therefore will be unable to type in the correct value. My potential solution is to have the plugin generate a sentence explaining what values to type in (something like “From the word “Orange” type in the 3rd character and the number position of the letter g”). Probably not as foolproof as James’s solution, but more accessible anyway.
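A rough sketch of how such a sentence could be generated and checked, in Python rather than as an MT plugin. All the function names are hypothetical, and the choice to only ask about letters that occur exactly once in the word is my own assumption, made so the "position of the letter" part has a single right answer.

```python
import random


def ordinal(n):
    """Return 1 -> '1st', 2 -> '2nd', 3 -> '3rd', 11 -> '11th', and so on."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def make_challenge(word, char_position, letter):
    """Build (question, expected_answer); char_position is 1-based."""
    lowered = word.lower()
    if not 1 <= char_position <= len(word):
        raise ValueError("char_position is outside the word")
    if lowered.count(letter.lower()) != 1:
        raise ValueError("letter must occur exactly once in the word")
    question = (
        f'From the word "{word}" type in the {ordinal(char_position)} '
        f"character and the number position of the letter {letter}"
    )
    answer = word[char_position - 1] + str(lowered.index(letter.lower()) + 1)
    return question, answer


def random_challenge(words, rng=random):
    """Pick a word, a position and an unambiguous letter at random.

    Assumes every word in the list has at least one non-repeated letter.
    """
    word = rng.choice(words)
    lowered = word.lower()
    unique_letters = [c for c in lowered if lowered.count(c) == 1]
    letter = rng.choice(unique_letters)
    position = rng.randrange(1, len(word) + 1)
    return make_challenge(word, position, letter)


def check_answer(submitted, expected):
    """Compare answers, ignoring case and surrounding whitespace."""
    return submitted.strip().lower() == expected.lower()
```

For the example sentence, `make_challenge("Orange", 3, "g")` expects the answer `a5`: the 3rd character is "a" and "g" sits at position 5. A script could parse a sentence with a fixed template like this fairly easily, so varying the wording would be needed before relying on it.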

Jeremy Zawodny is “fighting back(Jeremy Zawodny’s blog – Cheap Viagra, Vicodin, Xanax, Prescription Drugs, and Penis Enlargement Pills!!!)”:http://jeremy.zawodny.com/blog/archives/001002.html, by trying to beat the spammers at their own game. Comment spammers are trying to make sure that search phrases such as those mentioned in his post’s title will match their sites. They’re doing this by posting links to their websites along with those phrases, thereby warping Google’s PageRank. Jeremy is hoping his post will get enough Trackbacks (like the one I’m sending) so that his post will actually come out top and annoy all those spammers. Let’s see what happens…