Monday, 15 May 2006

Sorting it all out

One of the things that I learned in the last month or so is that sorting in different languages, with accented characters, isn't the same. :-( I'm going to outline a case-insensitive example in Java and Oracle and my experience with that.

With Oracle as a back-end, if I wanted to order by a String column I would call the upper (or lower) method on the field to return a sorted list. I knew that this was a bit of a hack because some (European?) languages have strange conversions from upper to lower case. I remember reading that's why they made XML case sensitive...

So, with the work that I was doing I run some sql (or hql) with something like "... order by (UPPER(col_name))" and in my test suite I would check that it was sorting correctly by using a case insensitive comparator. Now, to find out that it will not work you just have to read the documentation. But course I never thought that it applied to what I was doing.

Here's an example: if you have the names "BÈretta" and "Butler", the default in Java and using upper in the db are both to return "BÈretta" after "Butler". Not correct in a French locale.

To sort it correctly in Oracle, you should do something like:
"order by nlssort(col_name, 'NLS_SORT = FRENCH')"

To sort it correctly in Java, you would have to use a collator like the RuleBasedCollator. Example:
Collator collator = Collator.getInstance(Locale.CANADA_FRENCH);
// make the ordering case-insensitive
collator.setStrength(Collator.SECONDARY);

The Collator is actually very interesting because you can put different emphasis on different parts of the sorter. Really kind of cool.

4 comments:

  1. I was just reading about this for MySQL. You can put a collator on each column that applies to all queries, so you wouldn't have this problem in MySQL.

    ReplyDelete
  2. humm... that would make things easier, but I won't be able to use it in this case. :-( Good to know though.
    My next problem is that hibernate doesn't seem to have collator's built in (from what I have looked for). I either have to build a hack-ish implementation or wait for it to be incorporated. I just don't know how portable collator's are across different db's or how they implement it.
    I just don't want to sort large datasets in Java when I can do it in the db.

    ReplyDelete
  3. Ryan: Based on your comments I've been asking around if (in Oracle) this can be done on the user level. So far I know it works on the session level.
    alter session set NLS_LANGUAGE='FRENCH';
    I'll post more info if I figure out a better solution.

    ReplyDelete
  4. I found my exact question on the hibernate forum.
    "ALTER SESSION SET NLS_LANGUAGE = French"
    http://forum.hibernate.org/viewtopic.php?t=950050&view=next&sid=cd80419e464ae0c6327c341649e3ab34
    Of course it's posted Dec 12, 2005 and still has no responces... *sigh*
    When I figure out a good way to do it, I'll post a responce.

    ReplyDelete