ICU Collator for C/C++
The ICU Collator performs collation based on a set of rules defined
by the current locale. Whereas US English sorts everything by the
letters A-Z, other languages follow different rules:
- The letters A-Z can be sorted in a different order than in English.
For example, in Lithuanian, "y" is sorted between "i"
and "k".
- Combinations of letters can be treated as if they were one letter.
For example, in traditional Spanish "ch" is treated
as a single letter, and sorted between "c" and "d".
- Accented letters can be treated as minor variants of the unaccented
letter. For example, "é" can be treated as equivalent
to "e".
- Accented letters can be treated as distinct letters. For example,
"Å" in Danish is treated as a separate letter
that sorts after "Z", at the very end of the alphabet.
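To make the contraction rule concrete, here is a toy sketch in plain C.
It is independent of ICU and is not how ICU implements collation. It
assumes lowercase ASCII input only, and the names toy_next_weight and
toy_compare are hypothetical, chosen for this illustration. Each letter
gets an even weight, and the pair "ch" gets the odd weight that falls
between "c" and "d":

```c
/* Toy rule set: lowercase ASCII letters only, with "ch" treated as one
   letter that sorts between "c" and "d" (as in traditional Spanish).
   An illustration only, not ICU's implementation. */

/* Read one collation element starting at *p, advance *p past it, and
   return its weight. Returns 0 at the end of the string. */
static int toy_next_weight(const char **p)
{
    const char *s = *p;
    if (s[0] == '\0') {
        return 0;
    }
    if (s[0] == 'c' && s[1] == 'h') {
        *p = s + 2;                     /* contraction: consume both chars */
        return 2 * ('c' - 'a' + 1) + 1; /* odd weight between c and d */
    }
    *p = s + 1;
    return 2 * (s[0] - 'a' + 1);        /* a=2, b=4, c=6, d=8, ... */
}

/* Compare two strings under the toy rules.
   Returns -1, 0, or 1, mirroring UCOL_LESS, UCOL_EQUAL, UCOL_GREATER. */
int toy_compare(const char *a, const char *b)
{
    for (;;) {
        int wa = toy_next_weight(&a);
        int wb = toy_next_weight(&b);
        if (wa != wb) {
            return (wa < wb) ? -1 : 1;
        }
        if (wa == 0) {
            return 0; /* both strings ended at the same point */
        }
    }
}
```

Under these rules toy_compare("cubo", "chico") returns -1, so "cubo"
sorts before "chico", whereas a plain byte comparison such as strcmp
puts "chico" first. A real collator handles far more than this, which
is why the rules come from locale data rather than hand-written code.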
The following code uses the ICU Collator to sort a list of strings with
a simple bubble sort:
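#include <unicode/ucol.h>

UChar *s[] = { /* list of Unicode strings */ };
uint32_t listSize = size_of_the_list;
UErrorCode status = U_ZERO_ERROR;
UCollator *coll = ucol_open("en_US", &status);
uint32_t i, j;
if (U_SUCCESS(status)) {
    /* Bubble sort. Counting i down to 2 and testing j + 1 < i avoids
       unsigned wraparound when the list is empty. */
    for (i = listSize; i > 1; i--) {
        for (j = 0; j + 1 < i; j++) {
            /* ucol_strcoll takes the collator as its first argument;
               a length of -1 means the string is NUL-terminated.
               Swapping on UCOL_LESS leaves the list in descending
               order; test for UCOL_GREATER to sort ascending. */
            if (ucol_strcoll(coll, s[j], -1, s[j+1], -1) == UCOL_LESS) {
                UChar *tmp = s[j];
                s[j] = s[j+1];
                s[j+1] = tmp;
            }
        }
    }
    ucol_close(coll);
}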