ICU Collator for C/C++

The ICU Collator performs collation (locale-sensitive string comparison and sorting) according to the rules defined for a given locale. US English sorts text in plain A-Z order, but other languages follow different rules:

The letters A-Z can be sorted in a different order than in English. For example, in Lithuanian, "y" is sorted between "i" and "k".
Combinations of letters can be treated as if they were one letter. For example, in traditional Spanish "ch" is treated as a single letter, and sorted between "c" and "d".
Accented letters can be treated as minor variants of the unaccented letter. For example, "é" can be treated as equivalent to "e".
Accented letters can be treated as distinct letters. For example, "Å" in Danish is treated as a separate letter that sorts after "Z", at the end of the alphabet.

The following code illustrates the ICU Collator:

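    #include <unicode/ucol.h>

    UChar *s[] = { /* list of Unicode strings */ };
    uint32_t listSize = size_of_the_list;
    UErrorCode status = U_ZERO_ERROR;
    UCollator *coll = ucol_open("en_US", &status);
    uint32_t i, j;

    if (U_SUCCESS(status)) {
        /* Bubble sort driven by the collator's comparison. */
        for (i = listSize - 1; i >= 1; i--) {
            for (j = 0; j < i; j++) {
                /* Swap when s[j] collates before s[j+1]; test for
                   UCOL_GREATER instead to obtain ascending order. */
                if (ucol_strcoll(coll, s[j], -1, s[j+1], -1) == UCOL_LESS) {
                    swap(s[j], s[j+1]); /* caller-supplied pointer swap */
                }
            }
        }
        ucol_close(coll);
    }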
