Locales in C/C++
A locale identifier is a string composed of two or three elements
specifying a language, the region in which that language is used, and
an optional variant. A somewhat oversimplified view of how an
application uses a locale is that the language portion selects the
text to display, and the region portion determines how dates, times,
currency, and so on are formatted.
Examples of locales are:
Locale | Language | Region | Variant
en_US | English | USA | None
de_AT | German | Austria | None
en_GB | English | United Kingdom | None
fr_FR | French | France | None
fr_FR_Euro | French | France | The variant specifies that currency is to be displayed as Euro, rather than Francs
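To make the structure of these identifiers concrete, the following is a minimal sketch that splits an identifier into its parts. The names LocaleParts and parse_locale are hypothetical and exist only for this illustration; the sketch assumes the underscore-separated form shown in the table and does no validation against ISO 639 or ISO 3166.

```cpp
#include <string>

// Parts of a locale identifier such as "fr_FR_Euro".
// LocaleParts and parse_locale are hypothetical names used only in this sketch.
struct LocaleParts {
    std::string language;  // language code, e.g. "fr"
    std::string region;    // region code, e.g. "FR" (may be empty)
    std::string variant;   // platform-defined variant (may be empty)
};

// Split "language_REGION[_variant]" on the first two underscores.
LocaleParts parse_locale(const std::string& id) {
    LocaleParts parts;
    const std::string::size_type first = id.find('_');
    parts.language = id.substr(0, first);
    if (first == std::string::npos) {
        return parts;  // language only
    }
    const std::string::size_type second = id.find('_', first + 1);
    if (second == std::string::npos) {
        parts.region = id.substr(first + 1);  // no variant present
        return parts;
    }
    parts.region = id.substr(first + 1, second - first - 1);
    parts.variant = id.substr(second + 1);
    return parts;
}
```

For example, parse_locale("fr_FR_Euro") yields the language "fr", the region "FR", and the variant "Euro", while parse_locale("en_US") leaves the variant empty.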
Abbreviations for the language portion of the locale, always written
in lower case, are defined by ISO 639.
Abbreviations for the region (country) portion of the locale, which
are written in upper case, are defined by ISO 3166.
The variant field is defined by the run-time environment and differs
between IBM, Sun, and Microsoft. For example, IBM and Sun support
the _Euro variant but Microsoft does not. Oracle uses its own
language and region designators rather than standard locale designators.
For applications that can be fully converted to Unicode,
Lingoport recommends the use of the ICU Locale. Applications relying
on the platform character set should use the ANSI C locale.
ANSI C Locale
According to the glibc
documentation, the default locale is available to all the objects
in a program. If you set a new default locale in one section of
code, the change can affect the entire program. Application programs
should not set the default locale as a way to request an international
object. Initially, the default locale is the system locale of the
platform.
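Because the C locale is shared by the whole program, one way to limit the effect of a temporary change is to save the current setting and restore it afterwards. The following is a minimal sketch of that idea, not a recommended replacement for the guidance above. ScopedCLocale is a hypothetical helper name, and the sketch assumes single-threaded use, since setlocale changes state for the whole process.

```cpp
#include <clocale>
#include <string>

// Hypothetical helper: switches the C locale for the lifetime of the object
// and restores the previous setting when the object is destroyed.
class ScopedCLocale {
public:
    ScopedCLocale(int category, const char* name) : category_(category) {
        // Passing nullptr only queries the current locale. Copy the result,
        // because a later setlocale call may overwrite the returned buffer.
        const char* current = std::setlocale(category, nullptr);
        if (current) {
            saved_ = current;
        }
        changed_ = std::setlocale(category, name) != nullptr;
    }

    ~ScopedCLocale() {
        if (!saved_.empty()) {
            std::setlocale(category_, saved_.c_str());
        }
    }

    ScopedCLocale(const ScopedCLocale&) = delete;
    ScopedCLocale& operator=(const ScopedCLocale&) = delete;

    // False if the requested locale is not available on this system.
    bool changed() const { return changed_; }

private:
    int category_;
    std::string saved_;
    bool changed_ = false;
};
```

A block such as { ScopedCLocale guard(LC_ALL, ""); ... } would then run with the environment's locale and return to the previous locale when the block ends.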
A C program inherits locale environment variables upon startup.
However, by default, these variables do not control the locale used
by library functions, because every program starts in the standard
"C" locale. To make the environment variables take effect, call
setlocale with an empty string as the locale name:
setlocale (LC_ALL, "");
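As an illustration of the call above, the following sketch wraps the query and the switch in two small functions. The helper names current_locale_name and adopt_environment_locale are hypothetical; only std::setlocale itself comes from the standard library.

```cpp
#include <clocale>
#include <string>

// Returns the name of the locale currently in effect for all categories.
// current_locale_name is a hypothetical helper name.
std::string current_locale_name() {
    const char* name = std::setlocale(LC_ALL, nullptr);  // nullptr = query only
    return name ? name : "";
}

// Adopts the locale described by the environment (LANG and the LC_* variables).
// Returns false if the environment names a locale the system does not provide;
// in that case the current locale is left unchanged.
bool adopt_environment_locale() {
    return std::setlocale(LC_ALL, "") != nullptr;
}
```

Before adopt_environment_locale() is called, current_locale_name() reports the startup locale, "C". What it reports afterwards depends on the environment of the process.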
The following table lists the locale categories, followed by the
LC_ALL macro and the LANG environment variable. Each category name is
both a macro that can be passed to setlocale and an environment
variable that a user can set:
Category | Description
LC_COLLATE | This category applies to collation of strings.
LC_CTYPE | This category applies to classification and conversion of characters.
LC_MONETARY | This category applies to formatting monetary values.
LC_NUMERIC | This category applies to formatting numeric values that are not monetary.
LC_TIME | This category applies to formatting date and time values.
LC_ALL | This is not a separate category; it is a macro that you can use with setlocale to set a single locale for all purposes.
LANG | If this environment variable is defined, its value specifies the locale to use for all purposes except as overridden by the variables above.
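To show how a single category can be selected and inspected, here is a minimal sketch. The helpers set_category and decimal_separator are hypothetical names; std::setlocale and std::localeconv are the standard library calls they wrap.

```cpp
#include <clocale>
#include <string>

// Hypothetical helper: selects the locale for one category only,
// leaving the other categories as they were.
// Returns false if the requested locale is not available.
bool set_category(int category, const char* name) {
    return std::setlocale(category, name) != nullptr;
}

// Hypothetical helper: reports the decimal separator that the current
// LC_NUMERIC locale uses for non-monetary numbers.
std::string decimal_separator() {
    const std::lconv* conv = std::localeconv();
    return conv->decimal_point;
}
```

For instance, after set_category(LC_NUMERIC, "C") the function decimal_separator() returns ".", regardless of how LC_TIME or LC_MONETARY are set.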
In Visual C++, a locale is a unique combination of language,
country/region, and code page. It is specified with the setlocale()
function in an MBCS application, _wsetlocale() in a UTF-16 Unicode
application, or _tsetlocale() in a generic-text application, where the
_MBCS and _UNICODE compiler flags determine which function is called.
For example, using the wide-character _wsetlocale function:
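#include <locale.h>   /* _wsetlocale (Microsoft CRT), LC_ALL, LC_NUMERIC */
const wchar_t *fr_FR = L"French_France";   /* Microsoft language_country string */
_wsetlocale(LC_ALL, fr_FR);       /* set every locale category */
_wsetlocale(LC_NUMERIC, fr_FR);   /* set only the numeric category */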
This code sets the locale to fr_FR . The first call
to _wsetlocale() sets all locale information to fr_FR ,
while the second sets only the numeric locale information to fr_FR .
The second call is redundant here, because LC_ALL already covers
LC_NUMERIC; it is included to show how a single category is set.
Subsequent calls to wprintf() will display numeric
information in the manner appropriate to the locale that was set.
See the MSDN Library
for the list of language and country strings that can be used to specify the locale.
ICU Locale for C
In ICU
for C, a locale is a character string. For example, to set a
locale based upon Belgian French with a Euro currency convention:
const char *loc = "fr_BE_EURO";
The locale string will be used by the ICU components to perform
various locale-based formatting activities. For example, the following
creates various number formatters for the German locale:
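UErrorCode status = U_ZERO_ERROR;
UNumberFormat *nf;
/* Arguments: style, pattern, pattern length, locale, parse error, status.
   The predefined styles need no pattern, so NULL and 0 are passed. */
nf = unum_open(UNUM_DEFAULT, NULL, 0, "de_DE", NULL, &status);
unum_close(nf);
nf = unum_open(UNUM_CURRENCY, NULL, 0, "de_DE", NULL, &status);
unum_close(nf);
nf = unum_open(UNUM_PERCENT, NULL, 0, "de_DE", NULL, &status);
unum_close(nf);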
More information can be found at the ICU
Userguide website.
Standard C++ Locale
In C++, the locale class is an abstraction that manages
the locale facets, which are separate classes that encapsulate specific
internationalization functionality. More information on using the
standard locale for C++ can be found at cantrip.org.
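As a brief sketch of how a facet is obtained from a locale object, the following uses std::use_facet to reach two of the standard facets, ctype and collate. The helper names to_upper_in and compare_in are hypothetical; the example uses the classic "C" locale so that it does not depend on which named locales are installed.

```cpp
#include <locale>
#include <string>

// Uppercases a string with the ctype facet of the given locale.
// to_upper_in is a hypothetical helper name.
std::string to_upper_in(const std::locale& loc, std::string text) {
    const std::ctype<char>& ct = std::use_facet<std::ctype<char>>(loc);
    if (!text.empty()) {
        ct.toupper(&text[0], &text[0] + text.size());  // converts in place
    }
    return text;
}

// Compares two strings with the collate facet of the given locale.
// Returns -1, 0, or 1, like the facet's compare() member.
// compare_in is a hypothetical helper name.
int compare_in(const std::locale& loc,
               const std::string& a, const std::string& b) {
    const std::collate<char>& coll = std::use_facet<std::collate<char>>(loc);
    return coll.compare(a.data(), a.data() + a.size(),
                        b.data(), b.data() + b.size());
}
```

With std::locale::classic(), to_upper_in(loc, "locale") returns "LOCALE" and compare_in(loc, "apple", "banana") returns a negative value. Passing a different locale object changes the behavior without changing the calling code.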
The following describes the capabilities of the standard C++ facets:
- Code Conversion: The facet codecvt<internT,externT,stateT> is used
when converting from one encoding scheme to another, such as from the
multibyte encoding JIS to the wide-character encoding Unicode. The
main member functions are in() and out().
- Collate: The facet collate<charT> provides features for string
collation, including a compare() function used for string comparison.
- Ctype: The facet ctype<charT> encapsulates the Standard C++ Library
ctype features for character classification, like tolower(),
toupper(), is(ctype_base::space,...), etc.
- Messages: The facet messages<charT> implements message retrieval.
It provides facilities to access message catalogues via open() and
close(catalog), and to retrieve messages via get(..., int msgid,...).
- Monetary: The facets money_get<charT,InputIterator> and
money_put<charT,OutputIterator> handle formatting and parsing of
monetary values. They provide get() and put() member functions that
parse or produce a sequence of digits representing a count of the
smallest unit of the currency. For example, the sequence $1,056.23 in
a common US locale would yield 105623 units, or the character sequence
"105623". The facet moneypunct<charT,bool International> handles
monetary formats and punctuation, just as the facet numpunct<charT>
handles numeric formats and punctuation. It provides functions like
curr_symbol(), etc.
- Numeric: The facets num_get<charT,InputIterator> and
num_put<charT,OutputIterator> handle numeric formatting and parsing.
The facets provide get() and put() member functions for values of
type long, double, etc. The facet numpunct<charT> specifies numeric
formats and punctuation. It provides functions like decimal_point(),
thousands_sep(), etc.
- Time: The facets time_get<charT,InputIterator> and
time_put<charT,OutputIterator> handle date and time formatting and
parsing. They provide functions like put(), get_time(), get_date(),
get_weekday(), etc.
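To illustrate how the Numeric facets in the list above work together, the following sketch installs a custom numpunct facet into a stream's locale. GermanStylePunct and format_german_style are hypothetical names, and the punctuation is hard-coded for illustration rather than taken from an installed system locale.

```cpp
#include <iomanip>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical punctuation facet that mimics German conventions:
// ',' as the decimal point and '.' between groups of three digits.
class GermanStylePunct : public std::numpunct<char> {
protected:
    char do_decimal_point() const override { return ','; }
    char do_thousands_sep() const override { return '.'; }
    std::string do_grouping() const override { return "\3"; }
};

// Formats a value with two decimal places using the facet above.
// format_german_style is a hypothetical helper name.
std::string format_german_style(double value) {
    std::ostringstream out;
    // The locale takes ownership of the facet and deletes it when unused.
    out.imbue(std::locale(std::locale::classic(), new GermanStylePunct));
    out << std::fixed << std::setprecision(2) << value;
    return out.str();
}
```

Here format_german_style(1234567.891) returns "1.234.567,89". The num_put facet produces the digits, while the replaced numpunct facet supplies the separators and grouping.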
More information about the C++ locale and facets can be found
here.
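The $1,056.23 example from the Monetary item in the facet list can be sketched in the same way. UsStyleMoney and format_cents are hypothetical names, and the facet only approximates common US conventions with hard-coded values; a real application would normally rely on an installed locale instead.

```cpp
#include <iomanip>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet approximating common US monetary conventions.
class UsStyleMoney : public std::moneypunct<char, false> {
protected:
    char do_decimal_point() const override { return '.'; }
    char do_thousands_sep() const override { return ','; }
    std::string do_grouping() const override { return "\3"; }
    std::string do_curr_symbol() const override { return "$"; }
    int do_frac_digits() const override { return 2; }  // cents
    pattern do_pos_format() const override {
        return {{symbol, sign, value, none}};  // symbol first, then the value
    }
};

// Formats an amount given as a count of the smallest currency unit (cents).
// format_cents is a hypothetical helper name.
std::string format_cents(long double cents) {
    std::ostringstream out;
    out.imbue(std::locale(std::locale::classic(), new UsStyleMoney));
    // showbase asks money_put to include the currency symbol.
    out << std::showbase << std::put_money(cents);
    return out.str();
}
```

With this facet, format_cents(105623.0L) is expected to produce "$1,056.23": money_put receives the count of cents, and moneypunct supplies the symbol, separators, and the number of fractional digits.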
ICU for C++
In ICU for C++, locales are represented by the Locale
class. These Locale objects can be specified according to a user's
preference and then passed as arguments to functions requiring
locale-sensitive processing. Note that ICU locales
do not specify the character encoding used by the operating system.
For example, to set a locale based upon Belgian French
with a Euro currency convention:
Locale *loc = new Locale("fr", "BE", "EURO");
The locale object will be used by the ICU components
to perform various locale-based formatting activities. For example,
the following creates various number formatters for the "Germany"
locale:
UErrorCode status = U_ZERO_ERROR;
NumberFormat *nf;
// Locale::getGermany() returns the predefined locale for Germany.
nf = NumberFormat::createInstance(Locale::getGermany(), status);
delete nf;
nf = NumberFormat::createCurrencyInstance(Locale::getGermany(), status);
delete nf;
nf = NumberFormat::createPercentInstance(Locale::getGermany(), status);
delete nf;