

Single-Byte Character Manipulation Functions

Related Links

Link to Wide Character Manipulation Functions.
Link to Multibyte Character Manipulation Functions.
Link to Windows Generic Character Manipulation Functions.

Internationalization (I18n) Issue:

This category of potentially locale-sensitive functions operates on or manipulates single-byte characters, that is, 7-bit ASCII characters or characters from an 8-bit code set.

I18n Solution:

An internationalized application should use the appropriate wide-character or multibyte equivalent function instead. In a Windows Generic application, call the equivalent generic function and define either _MBCS or _UNICODE so that the generic name maps to the correct multibyte or wide-character function.
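The mapping mechanism can be illustrated with a minimal, portable sketch. It only imitates the idea behind the Windows <tchar.h> header (which provides names such as _TCHAR, _T() and _tcslen); the names GCHAR, G_TEXT, g_strlen and label_length below are hypothetical and exist only for this example, and the multibyte (_MBCS) branch is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical generic-text layer: one source, two character types.
   Defining _UNICODE at compile time selects the wide-character build. */
#ifdef _UNICODE
#include <wchar.h>
typedef wchar_t GCHAR;
#define G_TEXT(s) L##s
#define g_strlen  wcslen
#else
#include <string.h>
typedef char GCHAR;
#define G_TEXT(s) s
#define g_strlen  strlen
#endif

/* Application code is written once against the generic names. */
size_t label_length(const GCHAR *label)
{
    assert(label != NULL);
    return g_strlen(label);
}
```

Calling label_length(G_TEXT("abc")) yields 3 in either build; only the underlying character type and length function change.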

I18n Discussion:

Character testing and conversion functions

Expanded ctype functions can handle only 256 distinct character values, because their argument must be EOF or a value that can be represented as an unsigned char. This is despite the fact that the functions actually take and return an int, which is always at least 16 bits wide and usually 32. As a result, these functions work only with single-byte code sets. Multibyte characters require either the specific multibyte functions or conversion to wide characters so that the wide-character functions can be used.
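The unsigned char restriction also matters when a ctype function is applied to the bytes of a string. The following is a minimal sketch, assuming a hypothetical helper named count_alpha_bytes; it classifies bytes, not characters, so it is only meaningful for single-byte code sets.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: counts the bytes that isalpha() classifies as
   letters in the current locale. Each char is cast to unsigned char,
   because passing a negative value other than EOF is undefined. */
size_t count_alpha_bytes(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; ++s) {
        if (isalpha((unsigned char)*s)) {
            ++n;
        }
    }
    return n;
}
```

In the default "C" locale, the two bytes of a UTF-8 encoded accented letter are not counted at all, which is one way the single-byte limitation shows up in practice.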

Character I/O

The single-byte character string input/output functions generally work for multibyte-character strings, because a multibyte encoding never places a zero byte inside a character, so a single null byte still terminates the string. These functions will not work for wide-character strings, because wide-character code units may include all-zero octets, which the functions treat as the end of the string. For wide-character strings (i.e. strings based on wchar_t characters), use the wide-character input/output functions.
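The effect of embedded zero octets can be seen with a small sketch. The helper names bytes_before_first_zero and wide_units are hypothetical; the first shows how much of a buffer a byte-oriented function such as strlen or fputs would process, the second uses the wide-character counterpart.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: the number of bytes a byte-oriented string
   function sees before it stops at the first zero octet. */
size_t bytes_before_first_zero(const void *buf)
{
    assert(buf != NULL);
    return strlen((const char *)buf);
}

/* Hypothetical helper: the wide-character function counts wchar_t
   units and stops only at a full wide null character. */
size_t wide_units(const wchar_t *ws)
{
    assert(ws != NULL);
    return wcslen(ws);
}
```

For the UTF-8 bytes "\xE3\x81\x82" the byte-oriented count covers the whole three-byte character. For the wide string L"AB" it stops inside the first wchar_t (after at most one byte), whereas wide_units(L"AB") reports both characters.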

As for the single-character input/output functions, although each call takes a single-byte character argument, they can be called multiple times to output a multibyte character one byte at a time. This works both for Windows MBCS characters, which are either 1 or 2 bytes per character, and on UTF-8 platforms, where a character occupies 1 to 4 bytes (the original UTF-8 definition allowed up to 6).
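Repeated single-byte output can be sketched as follows. The function put_mb_char is a hypothetical helper, and the caller is assumed to already know the byte length of the character (for example from mblen in a suitable locale).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes one multibyte character of 'len' bytes
   by calling fputc() once per byte. Returns the number of bytes
   written, or -1 if fputc() reports an error. */
int put_mb_char(const char *mbc, size_t len, FILE *out)
{
    size_t i;

    assert(mbc != NULL && out != NULL);
    for (i = 0; i < len; ++i) {
        if (fputc((unsigned char)mbc[i], out) == EOF) {
            return -1;
        }
    }
    return (int)len;
}
```

For instance, put_mb_char("\xE3\x81\x82", 3, stdout) emits the three bytes of one UTF-8 encoded character. The output is only displayed correctly if the receiving stream or terminal interprets the bytes in the same encoding.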

Special consideration needs to be given to Windows MBCS applications, and to Windows Unicode applications running on older Windows versions (i.e. Win 95/98/Me) that do not support UTF-16 Unicode as the system's native encoding. For a Windows MBCS application, the system's multibyte code page is used either to interpret the multibyte string directly (on a non-Unicode system) or to convert the application's multibyte string to a UTF-16 Unicode string before it is used (on a Unicode system). For a Unicode application running on a non-Unicode system, the OS uses the system's multibyte code page to convert the application's UTF-16 Unicode string to a multibyte string prior to use. For strings to be handled correctly in these scenarios, the application's UI language must agree with the system's multibyte code page; otherwise, characters may be lost in the conversions.

There is no issue for a Windows Unicode application running on a later version of Windows (NT/2K/XP), where the native encoding is UTF-16 Unicode. Using the wide-character functions will correctly input and output the wide characters.

See File I/O for information on reading and writing non-ASCII data to files and streams.

The functions in this category are:

_cgets

_cputs

ecvt/_ecvt

ecvt_r

fcvt/_fcvt

fcvt_r

fgetc

fgetc_unlocked

fgetchar/_fgetchar

fgets

fgets_unlocked

fputc

fputc_unlocked

fputchar/_fputchar

fputs

fputs_unlocked

gcvt/_gcvt

_gcvt_s

_getch

_getche

getc

getc_unlocked

getchar

getchar_unlocked

getdelim

getline

gets

isalnum

isalpha

isascii/__isascii

isblank

iscntrl

iscsym/__iscsym

iscsymf/__iscsymf

isdigit

isgraph

isleadbyte

islower

isprint

ispunct

isspace

isupper

isxdigit

_i64toa

itoa/_itoa

_itoa_s

ltoa/_ltoa

_ltoa_s

putc

putc_unlocked

_putch

putchar

putchar_unlocked

puts

qecvt

qecvt_r

qfcvt

qfcvt_r

qgcvt

strfry

__toascii

tolower/_tolower

toupper/_toupper

_ui64toa

_ui64toa_s

ultoa/_ultoa

_ultoa_s

ungetc

_ungetch
