Single-Byte Character Manipulation Functions
Related Links
Link to Wide Character Manipulation Functions.
Link to Multibyte Character Manipulation Functions.
Link to Windows Generic Character Manipulation Functions.
Internationalization (I18n) Issue:
This category of potentially locale-sensitive functions operates on or manipulates
single-byte characters, that is, 7-bit ASCII characters or characters from an 8-bit code set.
I18n Solution:
The appropriate wide-character or multibyte
equivalent function should be used within the internationalized
application. In the case of a Windows Generic application, call the equivalent
generic function, and use the _MBCS or _UNICODE
define to map to the correct multibyte or wide-character function.
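The mapping idea can be sketched portably as follows. This is a simplified imitation, not the Windows headers themselves: USE_WIDE, gchar, G, g_isalpha and count_alpha_generic are hypothetical names invented for this illustration, with USE_WIDE standing in for _UNICODE. On Windows, <tchar.h> supplies real mappings of this kind (for example _istalpha).

```c
#include <ctype.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* USE_WIDE is a hypothetical switch standing in for _UNICODE. */
#ifdef USE_WIDE
typedef wchar_t gchar;                           /* generic character type */
#define G(s) L##s                                /* generic string literal */
#define g_isalpha(c) iswalpha((wint_t)(c))       /* wide-character test */
#else
typedef char gchar;
#define G(s) s
#define g_isalpha(c) isalpha((unsigned char)(c)) /* single-byte test */
#endif

/* Count alphabetic characters; the same source compiles against either
   the single-byte or the wide-character function. */
size_t count_alpha_generic(const gchar *s)
{
    size_t n = 0;
    for (; *s != 0; ++s) {
        if (g_isalpha(*s)) {
            ++n;
        }
    }
    return n;
}
```

A call such as count_alpha_generic(G("a1b")) compiles unchanged in both builds; only the macro definitions decide which underlying function is used.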
I18n Discussion:
Character testing and conversion functions
Expanded ctype functions can handle only 256 distinct character values
because their argument must be a value that can be represented
as an unsigned char (or EOF). This is despite the fact
that the functions actually take and return an int,
which is always at least 16 bits wide and usually 32. As a result, these functions
work only with single-byte code sets.
Multibyte characters require either specific multibyte functions
or conversion to wide characters so that the wide-character functions can be used.
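A minimal sketch of calling the ctype functions safely on single-byte data follows. The helper names upcase_single_byte and count_alpha_bytes are hypothetical and not part of any library. The cast to unsigned char matters because a plain char may be negative for bytes above 0x7F, and passing a negative value other than EOF to these functions is undefined behavior.

```c
#include <ctype.h>
#include <stddef.h>

/* Uppercase a single-byte string in place (hypothetical helper).
   Only meaningful for single-byte code sets. */
void upcase_single_byte(char *s)
{
    for (; *s != '\0'; ++s) {
        /* The argument must be representable as unsigned char. */
        *s = (char)toupper((unsigned char)*s);
    }
}

/* Count bytes the current locale classifies as alphabetic
   (hypothetical helper). */
size_t count_alpha_bytes(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        if (isalpha((unsigned char)*s)) {
            ++n;
        }
    }
    return n;
}
```

Applied to a multibyte string, these loops would classify and convert each byte separately, which is exactly the limitation described above.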
Character I/O
The single-byte character string input/output functions generally work for multibyte-character strings, since
multibyte encodings still use a single null byte to terminate the string. These functions will not work
for wide-character strings, because wide characters can contain all-zero octets that a byte-oriented
function mistakes for the terminator. For wide-character strings (i.e. those based on wchar_t characters), use the
wide-character input/output functions.
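The zero-octet problem can be illustrated with a short sketch. The helper names byte_length_of_wide and wide_length are hypothetical. The first deliberately misuses the byte-oriented strlen on wide-character data to show where it stops; the second uses wcslen, the wide-character counterpart.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Length a byte-oriented function reports when (incorrectly) pointed
   at wide-character data. It stops at the first zero octet, which for
   ASCII-range characters lies inside the first wchar_t. */
size_t byte_length_of_wide(const wchar_t *ws)
{
    return strlen((const char *)ws);
}

/* Correct length in wide characters. */
size_t wide_length(const wchar_t *ws)
{
    return wcslen(ws);
}
```

For the string L"AB", wide_length returns 2, while byte_length_of_wide returns a smaller value that depends on the platform's byte order.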
As for single-character input/output functions, although they take a single-byte character argument,
they can be called multiple times to output a multibyte character. This works both for Windows MBCS
characters, which are either 1 or 2 bytes per character, and for UTF-8 platforms, where a character
occupies 1 to 4 bytes (the original UTF-8 definition allowed sequences of up to 6 bytes).
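As a sketch of the repeated-call approach, the hypothetical helper below writes one multibyte character by calling the single-byte fputc once per byte. The caller is assumed to know the byte length of the character; the helper name and signature are illustrative only.

```c
#include <stddef.h>
#include <stdio.h>

/* Write one multibyte character, given as `len` bytes, to `stream`
   using the single-byte fputc(). Returns the number of bytes written,
   which is less than `len` if an output error occurs. */
size_t put_multibyte_char(const char *mb, size_t len, FILE *stream)
{
    size_t i;
    for (i = 0; i < len; ++i) {
        if (fputc((unsigned char)mb[i], stream) == EOF) {
            break;
        }
    }
    return i;
}
```

For example, put_multibyte_char("\xC3\xA9", 2, stdout) emits the two bytes that encode U+00E9 in UTF-8. Whether they display as one character depends on the encoding the terminal or consumer expects.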
Special consideration needs to be given to Windows MBCS applications, and to Windows Unicode applications
that are running on older versions of Windows (i.e. Win 95/98/Me) that do not support
UTF-16 Unicode as the system's native encoding. In the case of a Windows MBCS application,
the system's multibyte code page will be used to either directly utilize the multibyte string (when
running on a non-Unicode system), or to convert the application's multibyte string
to a UTF-16 Unicode string before using it (on a Unicode system). In the case of a Unicode
application running on a non-Unicode system, the OS will use the system's multibyte code page
to convert the application's UTF-16 Unicode string to a multibyte string prior to use. In order
for strings to be correctly accessed in these scenarios, the application's UI language must be in
agreement with the system's multibyte code page; otherwise, characters may be lost in the
conversions.
There is no issue for a Windows Unicode application running on a later version of Windows (NT/2K/XP)
where the native encoding is UTF-16 Unicode. Using the wide character
functions will correctly input and output the wide characters.
See File I/O for information on reading and writing non-ASCII data
to files and streams.
Functions in this category:
_cgets
_cputs
ecvt/_ecvt
ecvt_r
fcvt/_fcvt
fcvt_r
fgetc
fgetc_unlocked
fgetchar/_fgetchar
fgets
fgets_unlocked
fputc
fputc_unlocked
fputchar/_fputchar
fputs
fputs_unlocked
gcvt/_gcvt
_gcvt_s
_getch
_getche
getc
getc_unlocked
getchar
getchar_unlocked
getdelim
getline
gets
isalnum
isalpha
isascii/__isascii
isblank
iscntrl
iscsym/__iscsym
iscsymf/__iscsymf
isdigit
isgraph
isleadbyte
islower
isprint
ispunct
isspace
isupper
isxdigit
_i64toa
itoa/_itoa
_itoa_s
ltoa/_ltoa
_ltoa_s
putc
putc_unlocked
_putch
putchar
putchar_unlocked
puts
qecvt
qecvt_r
qfcvt
qfcvt_r
qgcvt
strfry
__toascii
tolower/_tolower
toupper/_toupper
_ui64toa
_ui64toa_s
ultoa/_ultoa
_ultoa_s
ungetc
_ungetch
Locale-Sensitive C++ Methods