magpiebrain

Sam Newman's site, a Consultant at ThoughtWorks

Archive for ‘April, 2004’

“Paperairplane”:http://www.paperairplane.us/ looks _very_ interesting (link courtesy of “Die Puny Humans”:http://www.diepunyhumans.com/archives/000161.html). Based on JXTA technology, it is a Mozilla plugin (here’s hoping Firefox is supported too) which allows the development of decentralized communities. The content is stored on and served from individuals’ machines, hopefully allowing individuals to set up communities where they would otherwise be restricted from doing so (for either commercial or political reasons). I was a big fan of the “peek-a-booty”:http://www.peek-a-booty.org/ project, originally launched with much fanfare, which attempted to provide methods for sharing information that could not be blocked by governments; however, little has materialized since the project was first announced over two years ago. Paperairplane seems to have made at least one of the same mistakes as peekabooty – it has announced a project without anything to actually show for it (the java.net website for the project contains the usual “We’ll post code when we have it” type message). From the peekabooty Lessons Learned doc (PDF):

What did I learn from the first version of Peekabooty?

1. Don’t release before it does something useful.
This lesson is recounted in Open Sources: Voices from the Open Source
Revolution as well as other places. I had even read about this rule before we
released, but I had to learn it for myself. If you release too soon, you spend a
lot of your time answering emails instead of developing.

I hope that the Paperairplane project manages to produce some actual usable code, and doesn’t join the ranks of other vapourware OS projects like peekabooty (and many of my own). If it can, then it will undoubtedly prove useful in a world where censorship is on the rise.

It’s been a day for revisiting previous topics, so I thought I’d readdress some of the “troubles(Strange Java regexp behaviour – grouping)”:http://www.magpiebrain.com/archives/000219.html I was having with regular expressions last week. To recap, I was writing an expression to grab the serial number from the following string:

|          Serial          |
|        1234567890        |

The regexp I was using didn’t seem to work – @Serial(?s).*([0-9]+)@ should have captured the serial number into a group, but was only capturing the last digit. Many commenters posted that the reason for this is that @.*@ is a greedy operator (“Doug’s”:http://www.magpiebrain.com/archives/000219.html#comment419 comment should especially be noted – rarely have I seen such effort put into a blog comment!).

Simply put, a greedy operator matches as many characters as possible – when this stuff was new to me I used to think of a greedy operator as a little Pacman, chomping his way through my string, waiting till the last possible moment before letting the next operator get a look in. In this instance, the @.*@ ate everything until it left just enough text for the @[0-9]+@ to match – which was just the last digit of the serial number. As you would expect, to balance greedy operators, you have lazy (or reluctant) operators. To further abuse a metaphor, I think of a lazy operator as a very full Pacman, who is looking for any excuse to go off for a nap. The lazy form of the @*@ operator in Java is @*?@ – in my case this operator gives up when it sees the first digit, letting the @[0-9]+@ take over. So let’s look at my fixed code:


String input = "|          Serial          |\n|         1234567890        |";
Pattern p = Pattern.compile("Serial(?s).*?([0-9]+)");
Matcher m = p.matcher(input);

while(m.find()) {
  System.out.println("Found match: " + m.group());
  System.out.println("Found serial number: " + m.group(1));
}
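As a rough side-by-side sketch of the difference, the snippet below runs both the greedy and the reluctant pattern over the same input. The class name @GreedyVsReluctant@ and the helper @firstGroup@ are made up for illustration; only the two patterns and the input come from the post.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsReluctant {
    // Hypothetical helper: returns group 1 of the first match, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "|          Serial          |\n|        1234567890        |";

        // Greedy .* runs to the end of the input, then backtracks one character
        // at a time until [0-9]+ can match, so the group holds a single digit.
        System.out.println("greedy:    " + firstGroup("Serial(?s).*([0-9]+)", input));

        // Reluctant .*? grows one character at a time and stops as soon as
        // [0-9]+ can match, so the group holds the whole run of digits.
        System.out.println("reluctant: " + firstGroup("Serial(?s).*?([0-9]+)", input));
    }
}
```

The greedy pattern leaves only the final @0@ for the group, while the reluctant one captures the full @1234567890@.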

This particular mistake was quite embarrassing. I’ve always prided myself on my regexp knowledge and to make such a bonehead mistake (not to mention exhibit at least one fundamental misunderstanding about the whole thing) has gone some way to puncture my ego, which I guess is no bad thing… The moral of the story? Reach for the manual before reaching for the blog – you might still make mistakes, but at least you’re making them in private that way!

As part of my ongoing efforts to streamline this site, I recently added the ability to link to specific comments. I detailed this process in an “earlier post”:http://www.magpiebrain.com/archives/000222.html (which let’s hope gets automatically linked to using RelatedEntries), whereupon “Cheah(Redemption In a Blog)”:http://blog.codefront.net/ “pointed out(Comment on Comment Permalinks with MovableType)”:http://www.magpiebrain.com/archives/000222.html#comment422 that my use of anchors (via the “name (Links in HTML Documents – name attribute)”:http://www.w3.org/TR/html4/struct/links.html#adef-name-A attribute) was not as meaningful as linking to a semantic construct. I did some reading around the subject, notably Tantek’s “Anorexic Anchors”:http://www.tantek.com/log/2002/11.html#L20021128t1352, and the W3C’s “specification”:http://www.w3.org/TR/html4/struct/links.html#anchors-with-id.

I was using a standard @<a>@ tag to create an anchor within a page – for example on a page @post.html@, a @<a name="comment">@ tag allows the page to be loaded and scrolled automatically to the location of the anchor when a link to @post.html#comment@ is clicked. Allowing specific areas of a page to be linked to directly can be very powerful – I have used this in other areas; for example, I have dropped the MT standard comment listing template in favour of a single page containing the post itself and the comments, with an anchor linking to the comment region. The thrust of Cheah’s comment and Tantek’s post was that using an @

As this blog has attracted more and more visitors each month, the number of comments I’ve been receiving has also been on the increase. Every now and then one of my threads generates some extremely interesting feedback, and it was starting to get annoying that I couldn’t reference these comments directly. What I really wanted was the ability to link directly to a comment.

A quick look through the Movable Type documentation, and I found details for the @MTCommentID@ tag. Simply put, when placed inside an @MTComments@ tag (or in my case “@MTSimpleComments@(Plugin which will print comments and trackbacks in the same listing.)”:http://mt-plugins.org/archives/entry/simplecomments.php) it will print the ID for a comment. It is then a simple matter to create an HTML anchor for that specific comment like so:

" />

In and of itself this isn’t terribly useful – I need to look at the source of the page to get the anchor name, and then have to create the link by hand. So I’ve also added a comment permalink, using the following code – again within the @MTComments@ tag:

#"
title="Permalink to this comment">Permalink

So, when I next get an interesting post, I can link right to it, and so can anyone else.

_Updated_: I posted more on this subject in “Comment Permalinks revisited”:http://www.magpiebrain.com/archives/000223.html.

Matt Raible has encountered a performance issue with his JUnit tests, and is “advocating the use of static data(Make your JUnit Tests run faster when using Spring)”:http://raibledesigns.com/page/rd?anchor=make_your_junit_tests_run to construct his Spring @ApplicationContext@. I think in this specific instance he might be justified in his choice (I’m nothing if not pragmatic) but I am always extremely cautious about allowing the use of a shared environment between tests.

Let’s look at the crux of Matt’s problem – by the book you should define the environment for a test within @setUp()@, and clean it up in @tearDown()@. @setUp()@ is called before each @test@ call, and @tearDown()@ after, in order to isolate each test from the others. In Matt’s case, the @setUp()@ call is quite slow – the creation of his @ApplicationContext@ involves file IO and XML parsing, not the fastest of processes. So why is @setUp()@ called prior to each test? Let’s look at a very simple example:



import junit.framework.TestCase;

public class ExampleTest extends TestCase {
  private SharedObject sharedObject;

  public void setUp() {
    sharedObject = new SharedObject();
  }

  public void testOne() {
    //run some test using sharedObject
  }

  public void testTwo() {
    //run some test using sharedObject
  }
}

Imagine if @setUp()@ is only called once for the whole @TestCase@ – what if @testOne@ or @testTwo@ changes the state of @sharedObject@? If @testOne@ is run first, then @testTwo@ will be reliant on the state of @sharedObject@ as manipulated by @testOne@ – can you be sure @testTwo@ will still work if you run it first? Many of us run our JUnit tests directly in our IDE, and we have no control over the order in which tests are run. By sharing potentially mutable data between tests we are exposing ourselves to the possibility of variable results based on the order in which tests are run.

Now I’m not advocating the notion that shared data should never be used – for example I cannot see a problem sharing immutable data between tests. However, think very carefully about sharing any data which has the capacity to be altered by one of the tests – it’s important to balance the benefits of instantiating objects once for all of the tests against the potential risk of polluting the state of one test with the operations of another.
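To make the ordering risk concrete, here is a minimal plain-Java sketch that deliberately avoids JUnit itself. Everything in it is hypothetical: @SharedStateDemo@, the list-backed @SharedObject@, and the two boolean-returning "tests" simply stand in for the @ExampleTest@ above. @runShared@ mimics a @setUp()@ that runs once for all tests, and @runIsolated@ mimics one that runs before every test.

```java
import java.util.ArrayList;
import java.util.List;

public class SharedStateDemo {
    // Hypothetical mutable stand-in for the SharedObject in the example above.
    static class SharedObject {
        private final List<String> items = new ArrayList<>();
        void add(String item) { items.add(item); }
        int size() { return items.size(); }
    }

    // "testOne" mutates the object, then checks it holds exactly one item.
    static boolean testOne(SharedObject shared) {
        shared.add("one");
        return shared.size() == 1;
    }

    // "testTwo" assumes it starts with a fresh, empty object.
    static boolean testTwo(SharedObject shared) {
        return shared.size() == 0;
    }

    // One instance for both tests, as if setUp() ran only once.
    // Returns {testOne result, testTwo result}.
    static boolean[] runShared(boolean oneFirst) {
        SharedObject shared = new SharedObject();
        boolean one, two;
        if (oneFirst) { one = testOne(shared); two = testTwo(shared); }
        else          { two = testTwo(shared); one = testOne(shared); }
        return new boolean[] { one, two };
    }

    // A fresh instance per test, as if setUp() ran before each one.
    static boolean[] runIsolated(boolean oneFirst) {
        boolean one, two;
        if (oneFirst) { one = testOne(new SharedObject()); two = testTwo(new SharedObject()); }
        else          { two = testTwo(new SharedObject()); one = testOne(new SharedObject()); }
        return new boolean[] { one, two };
    }

    public static void main(String[] args) {
        System.out.println("shared, testOne first:   testTwo passes = " + runShared(true)[1]);
        System.out.println("shared, testTwo first:   testTwo passes = " + runShared(false)[1]);
        System.out.println("isolated, testOne first: testTwo passes = " + runIsolated(true)[1]);
    }
}
```

With the shared instance, @testTwo@ only passes when it happens to run first; with a fresh instance per test the order no longer matters.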

I was playing around with the Regular Expression support in Java 1.4, with a view to repeating my earlier tutorials on the use of sed for text manipulation (this time with Java), when I came across a rather strange problem. Imagine my input is a multi-line string, part of which looks like this:

|          Serial          |
|        1234567890        |

I want to match the serial number, which in this case is 1234567890. First off, I want to match the title itself (I cannot just assume any numbers are serial numbers) but I also have to match the numbers themselves. The code to match the string I want to extract the number from looks like this:



String input = "|          Serial          |\n|        1234567890        |";
Pattern p = Pattern.compile("Serial(?s).*[0-9]+");
Matcher m = p.matcher(input);

while(m.find()) {
  System.out.println("Found match " + m.group());
}


Note: The embedded @(?s)@ flag forces the @.@ to match line terminators – by default it doesn’t unless this flag or @DOTALL@ is used.
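A small sketch of that note, assuming a shortened version of the input: the same pattern is tried with no flag, with the embedded @(?s)@ flag, and with @Pattern.DOTALL@ passed to @Pattern.compile@. The class name @DotAllDemo@ and the helper @found@ are invented for illustration.

```java
import java.util.regex.Pattern;

public class DotAllDemo {
    // Hypothetical helper: true if the regex, compiled with the given flags,
    // matches anywhere in the input.
    static boolean found(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "Serial |\n| 1234567890";

        // Without DOTALL the . cannot cross the line terminator, so no digits are reachable.
        System.out.println("no flag:        " + found("Serial.*[0-9]+", 0, input));

        // The embedded flag turns on DOTALL from that point in the pattern onwards.
        System.out.println("embedded (?s):  " + found("Serial(?s).*[0-9]+", 0, input));

        // Passing Pattern.DOTALL to compile() has the same effect for the whole pattern.
        System.out.println("Pattern.DOTALL: " + found("Serial.*[0-9]+", Pattern.DOTALL, input));
    }
}
```

Only the first call fails to find a match; the other two behave identically on this input.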

Put simply, the pattern reads “Match the word Serial, followed by any characters until you get to a run of digits, and stop there”. Sure enough, running this gives the following result:


Found match Serial          |
|        1234567890

Next I group the numbers being matched using ‘(‘ and ‘)’. This gives me grouping exactly as with sed – I can now index these matching groups, using m.group(index):



  String input = "|          Serial          |\n|        1234567890        |";
  Pattern p = Pattern.compile("Serial(?s).*([0-9]+)");
  Matcher m = p.matcher(input);

  while(m.find()) {
    System.out.println("Found match: " + m.group());
    System.out.println("Found serial number: " + m.group(1));
  }


But this gives the following output:


Found match: Serial          |
|        1234567890
Found serial number: 0

For some reason the grouping is only matching the last number, not the whole list. I can’t for the life of me work out why…. Oh well, expect some tutorials on the use of regular expressions soon.

_Updated_: ditched all the dashes as they screwed up the formatting.

_Updated_: Fixed the second code fragment

“Paper prototypes”:http://www-106.ibm.com/developerworks/library/us-paper/?dwzone=usability are a great tool for quickly designing and demonstrating GUIs. Sometimes, however, the interactions can be a little hard to see – in which case a GUI prototype can be a boon. Problems can arise from knocking up these dummy interfaces, however – management and users can get the idea that the product itself is nearly done, or they may start obsessing over little UI idiosyncrasies that aren’t really the point of the exercise. Ken Arnold’s “Napkin Look and Feel”:http://napkinlaf.sourceforge.net/ is an attempt to give coded interfaces a paper-prototype feel – so users get a clear idea that this is a rough draft and nothing more. The webstart demo shows the SwingSet demo using the new look and feel, and it seems to work very well.

File Under…

Ideas for a new .com website now the bubble is expanding again

Synopsis

Site rates things based on their evil nature

Evil List

* Getters
* Setters
* GOTO’s
* Sun/Microsoft settlement
* Renaming Patterns
* Long Commutes

Not Evil List

* Expose
* Battlefield Vietnam
* Scripting Languages
* Categorising people as Dwarves, Elves, Pirates or Ninjas

Pending filing

* Groovy
* Egoware
* Extreme Programming
* Lotus Notes

_Update_: Both “Mark Pilgrim(Interoperability)”:http://diveintomark.org/archives/2004/04/06/interoperability and “Russell Beattie(Real Player 10: Evil or Not?)”:http://www.russellbeattie.com/notebook/1007374.html are also categorising things as evil or not.