Collation in Java
If database collation is not acceptable (as is the case with some
database products, such as MySQL), Java's Collator class
lets the application perform string comparisons for different
languages. You invoke the Collator.compare method to
perform a locale-sensitive string comparison. The compare
method returns an integer less than, equal to, or greater than zero
when the first string argument is less than, equal to, or greater
than the second string argument.
Use this class to build searching and sorting routines for natural
language text.
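As a minimal sketch of such a sorting routine, the example below passes a Collator directly to List.sort, which works because Collator implements Comparator. The class name CollatorSortSketch, the method sortForLocale, and the sample words are assumptions made for illustration only.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class CollatorSortSketch {
    // Hypothetical helper: returns a sorted copy of the words,
    // ordered by the collation rules of the given locale.
    public static List<String> sortForLocale(List<String> words, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        List<String> sorted = new ArrayList<>(words);
        // Collator implements Comparator<Object>, so it can be used as the sort order.
        sorted.sort(collator);
        return sorted;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("cherry", "Banana", "apple");
        // String's natural ordering would put "Banana" first, because
        // uppercase letters have smaller encoded values than lowercase ones.
        System.out.println(sortForLocale(words, Locale.US)); // [apple, Banana, cherry]
    }
}
```

The helper copies the input list rather than sorting it in place, so the caller's list is left untouched.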
Collator is an abstract base class. Subclasses implement
specific collation strategies. One subclass, RuleBasedCollator,
is currently provided with the JDK and is applicable to a wide set
of languages. Other subclasses may be created to handle needs that
are more specialized.
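When a specialized ordering is needed, one option is to construct a RuleBasedCollator from a rule string instead of writing a new subclass. The sketch below uses a deliberately artificial rule that orders "c" before "b" before "a". The class name, the helper method, and the rule string are assumptions chosen for illustration.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;

public class CustomRulesSketch {
    // Hypothetical rule string: "c" sorts first, then "b", then "a".
    public static final String REVERSED_ABC = "< c < b < a";

    // Compares two strings with a collator built from the given rules.
    public static int compareWithRules(String rules, String s1, String s2) {
        try {
            RuleBasedCollator collator = new RuleBasedCollator(rules);
            return collator.compare(s1, s2);
        } catch (ParseException e) {
            // The constructor rejects rule strings it cannot parse.
            throw new IllegalArgumentException("Invalid collation rules: " + rules, e);
        }
    }

    public static void main(String[] args) {
        // Under these rules "a" comes after "c", so the result is positive.
        System.out.println(compareWithRules(REVERSED_ABC, "a", "c") > 0); // true
    }
}
```

A collator built this way only knows the characters named in its rules, so a practical rule string would normally extend the rules of an existing collator rather than start from scratch.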
Like other locale-sensitive classes, Collator provides
a static factory method, getInstance, that returns
the appropriate Collator object for a given
locale. You will only need to look at the subclasses
of Collator if you need to understand the details of
a particular collation strategy or if you need to modify that strategy.
You can set a Collator's strength property to determine
the level of difference considered significant in comparisons. Four
strengths are provided: PRIMARY, SECONDARY,
TERTIARY, and IDENTICAL. The exact assignment
of strengths to language features is locale dependent. For example,
in Czech, "e" and "f" are considered primary
differences, while "e" and "ê" are secondary
differences, "e" and "E" are tertiary differences
and "e" and "e" are identical.
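The effect of each strength can be checked with a small sketch. It uses the US English collator rather than Czech, on the assumption that accent differences are secondary and case differences are tertiary for that locale. The class name StrengthSketch and the helper equalAt are hypothetical.

```java
import java.text.Collator;
import java.util.Locale;

public class StrengthSketch {
    // Returns true if the two strings compare as equal at the given strength.
    public static boolean equalAt(int strength, String s1, String s2) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setStrength(strength);
        return collator.compare(s1, s2) == 0;
    }

    public static void main(String[] args) {
        String accented = "\u00e9"; // "e" with an acute accent

        // PRIMARY: only base letters matter.
        System.out.println(equalAt(Collator.PRIMARY, "e", "f"));        // false
        System.out.println(equalAt(Collator.PRIMARY, "e", accented));   // true
        System.out.println(equalAt(Collator.PRIMARY, "e", "E"));        // true

        // SECONDARY: accents matter, case does not.
        System.out.println(equalAt(Collator.SECONDARY, "e", accented)); // false
        System.out.println(equalAt(Collator.SECONDARY, "e", "E"));      // true

        // TERTIARY: case matters as well.
        System.out.println(equalAt(Collator.TERTIARY, "e", "E"));       // false
    }
}
```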
Java Collation Example
Look at the following three strings: äpple, banan,
and orange. The order shown is the correct order if
we were to sort these strings using German collation rules. An uninformed
programmer might try to sort these strings using the following program:
public class IncorrectSort {
    public static void main(String[] argv) {
        String fruit[] = { "orange", "äpple", "banan" };
        String tmp;
        for (int i = 0; i < fruit.length; i++) {
            for (int j = i + 1; j < fruit.length; j++) {
                if (fruit[i].compareTo(fruit[j]) > 0) {
                    // Swap fruit[i] and fruit[j]
                    tmp = fruit[i];
                    fruit[i] = fruit[j];
                    fruit[j] = tmp;
                }
            }
        }
        for (int k = 0; k < fruit.length; k++)
            System.out.println(fruit[k]);
    }
}
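The program sorts the strings incorrectly as banan,
orange, äpple. It does this because String.compareTo
compares strings by the numeric values of their characters, and
the encoded value of "ä" (U+00E4, decimal 228) is greater than
those of "b" (98) and "o" (111).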
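The encoded values behind this result can be inspected directly. The following sketch prints the numeric values of the three leading characters and confirms the ordering that String.compareTo produces. The class name and helper method are assumptions for illustration, and "ä" is written as the escape \u00e4 so that the source file's encoding does not affect the result.

```java
public class CodeUnitSketch {
    // Returns true if String's natural (encoded-value) ordering puts a after b.
    public static boolean sortsAfter(String a, String b) {
        return a.compareTo(b) > 0;
    }

    public static void main(String[] args) {
        System.out.println((int) '\u00e4'); // 228
        System.out.println((int) 'b');      // 98
        System.out.println((int) 'o');      // 111

        // Because 228 is the largest value, the word starting with it sorts last.
        System.out.println(sortsAfter("\u00e4pple", "banan"));  // true
        System.out.println(sortsAfter("\u00e4pple", "orange")); // true
    }
}
```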
Below is the correct way to sort these strings:
import java.util.Locale;
import java.text.Collator;

public class CorrectSort {
    public static void main(String[] argv) {
        String fruit[] = { "orange", "äpple", "banan" };
        String tmp;
        Collator collate = Collator.getInstance(Locale.GERMAN);
        for (int i = 0; i < fruit.length; i++) {
            for (int j = i + 1; j < fruit.length; j++) {
                if (collate.compare(fruit[i], fruit[j]) > 0) {
                    // Swap fruit[i] and fruit[j]
                    tmp = fruit[i];
                    fruit[i] = fruit[j];
                    fruit[j] = tmp;
                }
            }
        }
        for (int k = 0; k < fruit.length; k++)
            System.out.println(fruit[k]);
    }
}
In this example, the strings properly sort as äpple,
banan, and orange.
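The hand-written sorting loop is not required. Since Collator implements Comparator, the same result can be sketched with Arrays.sort. The class name ArraySortSketch and the helper sortGerman are assumptions for illustration, and "ä" is written as \u00e4 to keep the source independent of file encoding.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

public class ArraySortSketch {
    // Returns a copy of the input sorted with the German collator.
    public static String[] sortGerman(String[] input) {
        String[] copy = input.clone();
        Arrays.sort(copy, Collator.getInstance(Locale.GERMAN));
        return copy;
    }

    public static void main(String[] args) {
        String[] fruit = { "orange", "\u00e4pple", "banan" };
        // Prints the three words in collated order, one per line.
        for (String s : sortGerman(fruit)) {
            System.out.println(s);
        }
    }
}
```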
The following example shows how to compare two strings using the
Collator for the default locale:
// Compare two strings in the default locale
Collator myCollator = Collator.getInstance();
if (myCollator.compare("abc", "ABC") < 0)
    System.out.println("abc is less than ABC");
else
    System.out.println("abc is greater than or equal to ABC");
The following shows how both case and accents could be ignored
for US English:
// Get the Collator for US English and set its strength to PRIMARY
Collator usCollator = Collator.getInstance(Locale.US);
usCollator.setStrength(Collator.PRIMARY);
if (usCollator.compare("abc", "ABC") == 0) {
    System.out.println("Strings are equivalent");
}