Single-Byte Character Manipulation Functions

Internationalization (I18n) Issue:

This category of potentially locale-sensitive functions operates on or manipulates characters that are stored in a single byte (7-bit ASCII or an 8-bit code set).

I18n Solution:

The appropriate wide-character or multibyte equivalent function should be used within the internationalized application. In the case of a Windows Generic application, call the equivalent generic function, and define _MBCS or _UNICODE so that it maps to the correct multibyte or wide-character function.
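The generic-function idea can be sketched in portable C as follows. The names APP_CHAR, APP_T, app_isalpha, and count_letters are hypothetical and exist only for this illustration; on Windows the real mappings come from <tchar.h> (for example TCHAR, _T, and _istalpha), which also provides an _MBCS branch that this two-way sketch leaves out.

```c
#include <ctype.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical generic-text layer: one source text, two possible builds. */
#ifdef _UNICODE
typedef wchar_t APP_CHAR;
#define APP_T(s) L##s                              /* wide string literal */
#define app_isalpha(c) iswalpha((wint_t)(c))       /* wide-character test */
#else
typedef char APP_CHAR;
#define APP_T(s) s                                 /* narrow string literal */
#define app_isalpha(c) isalpha((unsigned char)(c)) /* single-byte test */
#endif

/* Count alphabetic characters; the same code compiles in either build. */
size_t count_letters(const APP_CHAR *s)
{
    size_t n = 0;
    for (; *s != 0; s++) {
        if (app_isalpha(*s))
            n++;
    }
    return n;
}
```

Application code calls count_letters(APP_T("ab1c")) and never names the narrow or wide function directly; the build definition selects it.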

I18n Discussion:

Character testing and conversion functions

The ctype functions can handle only 256 distinct character values because their argument must be representable as an unsigned char (or be EOF). This holds even though the functions take and return an int, which is at least 16 bits wide and usually 32. As a result, these functions work only with single-byte code sets. Multibyte characters require either multibyte-specific functions or conversion to wide characters so that the wide-character functions can be used.
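One way to apply the conversion route is to decode each multibyte character with mbrtowc and classify it with a wide-character function. The sketch below is a minimal illustration, not a definitive implementation: count_alpha_mb is a hypothetical helper name, and it assumes the caller has already selected a suitable locale with setlocale if the input contains non-ASCII text.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Count alphabetic characters in a multibyte string.
 * Returns -1 if the string contains an invalid or truncated sequence
 * for the current locale's encoding. */
long count_alpha_mb(const char *s)
{
    mbstate_t state;
    size_t remaining = strlen(s);
    long count = 0;

    memset(&state, 0, sizeof state);      /* start in the initial shift state */
    while (remaining > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, s, remaining, &state);

        if (n == (size_t)-1 || n == (size_t)-2)
            return -1;                    /* invalid or incomplete character */
        if (n == 0)
            break;                        /* decoded a null character */
        if (iswalpha((wint_t)wc))         /* classify the whole character */
            count++;
        s += n;                           /* advance by the bytes consumed */
        remaining -= n;
    }
    return count;
}
```

By contrast, passing the individual bytes of a multibyte character to isalpha would classify fragments of characters rather than characters.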

Character I/O

The single-byte character string input/output functions generally work for multibyte-character strings because multibyte encodings do not use a null byte inside a character, so a single null byte still terminates the string. These functions will not work for wide-character strings because wide characters may contain all-zero octets. For wide-character strings (i.e., strings based on wchar_t characters), use the wide-character input/output functions.
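The zero-octet problem can be made visible by treating a wide string as if it were a byte string. This is only a demonstration sketch: bytes_before_first_zero is a hypothetical helper, and it assumes wchar_t is wider than one byte, as it is on mainstream platforms.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Return how many bytes a byte-oriented function would see before it
 * stops at the first zero octet inside a wide-character string. */
size_t bytes_before_first_zero(const wchar_t *ws)
{
    return strlen((const char *)ws);
}
```

For L"AB", wcslen reports 2 characters, but the byte-oriented view stops inside the first character because 'A' is stored with at least one zero octet. A function such as puts would therefore truncate the string.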

Although the single-character input/output functions take a single-byte character argument, they can be called repeatedly to output the bytes of a multibyte character. This works both for Windows MBCS characters, which are 1 or 2 bytes each, and on ANSI UTF-8 platforms, where a character occupies 1 to 4 bytes (the original UTF-8 definition allowed up to 6).
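A minimal sketch of the repeated-call approach is shown here. The helper name put_multibyte is hypothetical, and the caller is assumed to know the byte length of the character (for example from mblen).

```c
#include <stddef.h>
#include <stdio.h>

/* Write one multibyte character to a stream, one byte per fputc call.
 * Returns the number of bytes written, or -1 on a write error. */
int put_multibyte(const char *mbchar, size_t len, FILE *out)
{
    size_t i;

    for (i = 0; i < len; i++) {
        /* Cast through unsigned char so bytes above 0x7F stay positive. */
        if (fputc((unsigned char)mbchar[i], out) == EOF)
            return -1;
    }
    return (int)len;
}
```

For example, put_multibyte("\xC3\xA9", 2, stdout) writes the two UTF-8 bytes of U+00E9; whether the terminal shows them as one character depends on its encoding.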

Special consideration needs to be given to Windows MBCS applications, and to Windows Unicode applications running on older Windows systems (i.e., Windows 95/98/Me) that do not support UTF-16 Unicode as the system's native encoding. For a Windows MBCS application, the system's multibyte code page is used either to interpret the multibyte string directly (on a non-Unicode system) or to convert the application's multibyte string to a UTF-16 Unicode string before use (on a Unicode system). For a Unicode application running on a non-Unicode system, the OS uses the system's multibyte code page to convert the application's UTF-16 Unicode string to a multibyte string before use. For strings to be handled correctly in these scenarios, the application's UI language must agree with the system's multibyte code page; otherwise, characters may be lost in the conversions.
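The risk of losing characters in such a conversion can be checked ahead of time. The following is a portable analogue using the standard C function wcrtomb, not the Windows conversion APIs themselves; is_convertible is a hypothetical helper name, and the result depends on the locale selected by the caller.

```c
#include <limits.h>
#include <string.h>
#include <wchar.h>

/* Return 1 if every character of a wide string can be represented in
 * the current locale's multibyte encoding, 0 otherwise. */
int is_convertible(const wchar_t *ws)
{
    mbstate_t state;
    char buf[MB_LEN_MAX];                 /* room for one converted character */

    memset(&state, 0, sizeof state);      /* start in the initial shift state */
    for (; *ws != L'\0'; ws++) {
        if (wcrtomb(buf, *ws, &state) == (size_t)-1)
            return 0;                     /* no representation: data would be lost */
    }
    return 1;
}
```

A string whose characters all belong to the active encoding passes the check; a string containing characters outside that encoding does not, which corresponds to the mismatch between UI language and code page described above.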

There is no issue for a Windows Unicode application running on a later version of Windows (NT/2K/XP) where the native encoding is UTF-16 Unicode. Using the wide character functions will correctly input and output the wide characters.

See File I/O for information on reading and writing non-ASCII data to files and streams.

Click on a function for more information:

_cgets

_cputs

ecvt/_ecvt

ecvt_r

fcvt/_fcvt

fcvt_r

fgetc

fgetc_unlocked

fgetchar/_fgetchar

fgets

fgets_unlocked

fputc

fputc_unlocked

fputchar/_fputchar

fputs

fputs_unlocked

gcvt/_gcvt

_gcvt_s

_getch

_getche

getc

getc_unlocked

getchar

getchar_unlocked

getdelim

getline

gets

isalnum

isalpha

isascii/__isascii

isblank

iscntrl

iscsym/__iscsym

iscsymf/__iscsymf

isdigit

isgraph

isleadbyte

islower

isprint

ispunct

isspace

isupper

isxdigit

_i64toa

itoa/_itoa

_itoa_s

ltoa/_ltoa

_ltoa_s

putc

putc_unlocked

_putch

putchar

putchar_unlocked

puts

qecvt

qecvt_r

qfcvt

qfcvt_r

qgcvt

strfry

__toascii

tolower/_tolower

toupper/_toupper

_ui64toa

_ui64toa_s

ultoa/_ultoa

_ultoa_s

ungetc

_ungetch

			 