By and large, I’ve been able to ignore the whole character encoding issue in Java. During a move to a new server however I had to sit up and take notice. Part of our code is using the getBytes()
method to prepare some text files. We’d been using the default method, whose documentation states:
Encodes this String into a sequence of bytes using the platform’s default charset, storing the result into a new byte array.
The behavior of this method when this string cannot be encoded in the default charset is unspecified. The CharsetEncoder class should be used when more control over the encoding process is required.
Our new machine of course defaults to UTF-8 unlike our other servers, so upon producing our files it duly screwed up all our £ symbols. A system-property change later and all was well, but it has just reinforced the fact that I should really pay more attention to all those little caveats in the API documentation.
Leave a Reply