Microsoft Code Pages
A code page is a platform-specific encoding of a character set and
can be represented in a table as a mapping of characters to
single-byte or multibyte values. Many code pages share the ASCII
character set for characters in the range 0x00 - 0x7F.
The Microsoft run-time library uses the following types of
code pages:
- System-default ANSI code page. When an application starts,
the run-time system automatically sets the multibyte code page to
the operating system's default ANSI code page. The locale itself
still starts as "C"; to set the locale to the system-default ANSI
code page as well, use the C call:
setlocale(LC_ALL, "");
- Locale code page. The behavior of many C run-time routines
depends on the current locale setting, which includes the locale
code page. On application startup, the locale-dependent routines in
the Microsoft run-time library use the code page that corresponds to
the "C" locale. You can query or change the locale code page within
your application by calling setlocale.
- Multibyte code page. In addition to the locale-sensitive C
run-time functions, the Microsoft run-time library supports many
multibyte-character functions that depend on the application's
multibyte code page setting. By default, these routines use the
system-default ANSI code page. At run time, you can query and
change the multibyte code page by calling _getmbcp and _setmbcp,
respectively.
- "C" locale code page. The code page of the "C" locale
corresponds to the ASCII character set, and it is the default locale
code page of a C/C++ application. For example, in the "C" locale,
islower returns true only for the values 0x61 - 0x7A.
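The startup behavior described above can be observed with a short C sketch. It relies only on the standard setlocale and islower functions; the _getmbcp call is Microsoft-specific (declared in <mbctype.h>) and is compiled only when _MSC_VER is defined. The helper names (locale_is_c, use_c_locale, use_system_default_locale, count_lowercase_bytes, print_code_page_settings) are hypothetical, chosen for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>
#ifdef _MSC_VER
#include <mbctype.h>  /* _getmbcp: Microsoft-specific */
#endif

/* Returns nonzero if every locale category is currently the "C" locale,
   which is the state a C program starts in. */
int locale_is_c(void)
{
    const char *name = setlocale(LC_ALL, NULL);  /* NULL queries, does not change */
    return name != NULL && strcmp(name, "C") == 0;
}

/* Switch all categories to the "C" locale (ASCII code page).
   The C standard requires this locale to exist, so the call cannot fail. */
const char *use_c_locale(void)
{
    const char *name = setlocale(LC_ALL, "C");
    assert(name != NULL);
    return name;
}

/* Switch all categories to the user's default locale: setlocale(LC_ALL, "").
   Returns NULL if that locale cannot be set. */
const char *use_system_default_locale(void)
{
    return setlocale(LC_ALL, "");
}

/* Count the byte values 0-255 that islower() accepts in the current locale.
   In the "C" locale only 'a'-'z' (0x61-0x7A) qualify, so this returns 26. */
int count_lowercase_bytes(void)
{
    int count = 0;
    for (int c = 0; c <= 255; ++c)  /* 0-255 are valid unsigned char values */
        if (islower(c))
            ++count;
    return count;
}

/* Print the current locale name and, on the Microsoft CRT only,
   the current multibyte code page. */
void print_code_page_settings(void)
{
    const char *name = setlocale(LC_ALL, NULL);
    printf("locale: %s\n", name != NULL ? name : "(unknown)");
#ifdef _MSC_VER
    printf("multibyte code page: %d\n", _getmbcp());
#endif
}
```

A program would typically call use_system_default_locale() once near the start of main; after that, count_lowercase_bytes() may report more than 26, depending on the user's locale.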
Multibyte Code Page Functions
Most multibyte-character routines in the Microsoft run-time library
recognize multibyte-character sequences according to the current
multibyte code page setting. This includes the _ismbc routines. The
multibyte code page also affects multibyte processing in the
following routines:
- _exec functions
- _mktemp
- _stat
- _fullpath
- _spawn functions
- _tempnam
- _makepath
- _splitpath
- tmpnam
In addition, all run-time library routines that receive
multibyte-character argv or envp program arguments (such as the
_exec and _spawn families) process these strings according to the
multibyte code page. These routines are therefore also affected by a
call to _setmbcp that changes the multibyte code page.
See the MSDN Library for more information on the multibyte code
page-dependent functions.
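To make the distinction concrete, here is a sketch of a character-counting helper. On the Microsoft CRT it calls _mbslen (declared in <mbstring.h>), which interprets lead bytes according to the current multibyte code page, so its result can change after a call to _setmbcp. Other C libraries have no multibyte code page, so the fallback branch uses the standard mblen, which follows the LC_CTYPE locale instead. The function name count_mb_chars is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>   /* mblen */
#include <string.h>   /* strlen */
#ifdef _MSC_VER
#include <mbstring.h> /* _mbslen: Microsoft-specific */
#endif

/* Count characters (not bytes) in a NUL-terminated multibyte string. */
size_t count_mb_chars(const char *s)
{
    assert(s != NULL);
#ifdef _MSC_VER
    /* Microsoft CRT: lead/trail bytes follow the multibyte code page (_setmbcp). */
    return _mbslen((const unsigned char *)s);
#else
    /* Portable fallback: mblen follows the LC_CTYPE locale, not a code page. */
    size_t left = strlen(s);
    size_t count = 0;
    mblen(NULL, 0);                 /* reset any shift state */
    while (left > 0) {
        int len = mblen(s, left);   /* examines at most `left` bytes */
        if (len <= 0)               /* invalid or incomplete sequence: skip one byte */
            len = 1;
        s += len;
        left -= (size_t)len;
        ++count;
    }
    return count;
#endif
}
```

For a plain ASCII string the result equals strlen. In a double-byte code page such as 932, a lead byte and its trail byte count as one character, so the result can be smaller than strlen.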
Locale Code Page Functions
A number of functions depend on the locale code page. Call
setlocale to make sure the locale is set as intended before calling
any of these functions:
- atof, atoi, atol
- is functions
- isleadbyte
- localeconv
- MB_CUR_MAX
- _mbccpy
- _mbclen
- mblen
- _mbstrlen
- mbstowcs
- mbtowc
- printf functions
- scanf functions
- setlocale, _wsetlocale
- strcoll, wcscoll
- _stricmp, _wcsicmp, _mbsicmp
- _stricoll, _wcsicoll
- _strncoll, _wcsncoll
- _strnicmp, _wcsnicmp, _mbsnicmp
- _strnicoll, _wcsnicoll
- strftime, wcsftime
- _strlwr
- strtod, wcstod, strtol, wcstol, strtoul, wcstoul
- _strupr
- strxfrm, wcsxfrm
- tolower, towlower
- toupper, towupper
- wcstombs
- wctomb
- _wtoi, _wtol
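As an illustration of why the locale matters for the functions listed above, the sketch below parses a floating-point string using the "C" locale's decimal point no matter which locale the program has selected. It saves the LC_NUMERIC setting with setlocale, switches to "C" around strtod, and then restores the saved setting. The function names are hypothetical. Because setlocale changes process-wide state, this approach is not thread-safe; on the Microsoft CRT, _strtod_l with an explicit _locale_t is one alternative.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Parse text with strtod using the "C" locale's '.' decimal point, then
   restore the caller's LC_NUMERIC setting. Not thread-safe. If copying the
   old locale name fails, LC_NUMERIC is left as "C". */
double parse_double_c_locale(const char *text, char **end)
{
    const char *current;
    char *saved = NULL;
    double value;

    assert(text != NULL);

    /* Copy the current name: the string setlocale returns may be overwritten. */
    current = setlocale(LC_NUMERIC, NULL);
    if (current != NULL) {
        saved = malloc(strlen(current) + 1);
        if (saved != NULL)
            strcpy(saved, current);
    }

    setlocale(LC_NUMERIC, "C");
    value = strtod(text, end);

    if (saved != NULL) {
        setlocale(LC_NUMERIC, saved);
        free(saved);
    }
    return value;
}

/* Return the decimal-point character of the current locale, via localeconv. */
char current_decimal_point(void)
{
    const struct lconv *conv = localeconv();
    return conv->decimal_point[0];
}
```

In a locale whose decimal point is a comma, a plain strtod("2.5", ...) would typically stop at the '.', whereas the wrapper parses the full value.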
See the MSDN Library for more details on C locale-dependent functions.
There are also many locale-dependent Win32 functions. See Windows C++
Locale Functions for details.
For a comprehensive list of Microsoft code page identifiers, see the
Microsoft documentation on code page identifiers.
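On Windows, the system-default ANSI and OEM code page identifiers can be queried with the Win32 functions GetACP and GetOEMCP. The sketch below pairs them with a small, deliberately incomplete lookup table of well-known identifiers. The helper names and the table layout are illustrative choices, not an official API, and non-Windows builds simply report 0.

```c
#include <stddef.h>
#ifdef _WIN32
#include <windows.h>  /* GetACP, GetOEMCP */
#endif

/* A few well-known Microsoft code page identifiers (illustrative, not exhaustive). */
static const struct {
    unsigned int id;
    const char *description;
} known_code_pages[] = {
    {437,   "OEM United States"},
    {850,   "OEM Multilingual Latin 1"},
    {932,   "Japanese (Shift-JIS)"},
    {936,   "Chinese Simplified (GB2312)"},
    {949,   "Korean (Unified Hangul Code)"},
    {950,   "Chinese Traditional (Big5)"},
    {1252,  "ANSI Latin 1 (Western European)"},
    {65001, "Unicode (UTF-8)"},
};

/* Return a description for a code page identifier, or NULL if it is not in the table. */
const char *code_page_description(unsigned int id)
{
    for (size_t i = 0; i < sizeof known_code_pages / sizeof known_code_pages[0]; ++i)
        if (known_code_pages[i].id == id)
            return known_code_pages[i].description;
    return NULL;
}

/* System-default ANSI code page identifier, or 0 when not built for Windows. */
unsigned int ansi_code_page(void)
{
#ifdef _WIN32
    return GetACP();
#else
    return 0;
#endif
}

/* System-default OEM code page identifier, or 0 when not built for Windows. */
unsigned int oem_code_page(void)
{
#ifdef _WIN32
    return GetOEMCP();
#else
    return 0;
#endif
}
```

On a U.S. English installation of Windows, GetACP typically returns 1252 and GetOEMCP typically returns 437.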