File and Path Functions
Related Links
Link to Wide Character File and Path Functions.
Link to Windows Generic File and Path Functions.
Internationalization (I18n) Discussion:
This class of functions accepts single-byte or multibyte string arguments. In an
ANSI UTF-16/32 or Windows UTF-16 Unicode application, the
appropriate wide-character equivalent function should be used, if available.
In the case of a Windows Generic application, call the equivalent
generic function, and use the _MBCS or _UNICODE
defines to map to the correct multibyte or wide-character function.
Internationalized Paths
This class of functions accepts path names as arguments. Path names
may vary depending on the localized version of the machine. Hardcoded
path names must be carefully evaluated within the context of internationalization.
One possible solution is to use a locale-based path structure. In some
internationalized designs, it may be prudent to externalize the path name.
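As a minimal sketch of externalizing a path name, the directory can be read from the environment instead of being hardcoded. The variable name APP_DATA_DIR, the fallback "data", and both helper functions are hypothetical and exist only for illustration; a real design might read the value from a resource file or installer setting instead.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical names: the environment variable and fallback are illustrative. */
#define APP_DATA_DIR_ENV     "APP_DATA_DIR"
#define APP_DATA_DIR_DEFAULT "data"

/* Returns the externally configured data directory, or the built-in fallback
   when the variable is unset or empty. */
const char *app_data_dir(void)
{
    const char *dir = getenv(APP_DATA_DIR_ENV);
    return (dir != NULL && dir[0] != '\0') ? dir : APP_DATA_DIR_DEFAULT;
}

/* Builds "<data dir>/<name>" into buf.
   Returns 0 on success, -1 if the result does not fit in buf. */
int app_data_path(char *buf, size_t size, const char *name)
{
    int n = snprintf(buf, size, "%s/%s", app_data_dir(), name);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

A caller would pass the resulting buffer to fopen or a similar function, so a localized installation can relocate the directory without a rebuild.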
Pathnames
In an internationalized application, it is important to understand the relationship between the
target system's support of non-ASCII filenames and the C/C++ functions that are used to create and
access files and paths.
For example, a Windows 95/98/ME system does not support UTF-16 Unicode natively, but does support filenames
containing non-ASCII characters from the system's multibyte code page. When a Windows Unicode application runs on
such a system, each Unicode filename is converted to a multibyte string, and any character that the code page
cannot represent may be lost. In the case of a Windows MBCS application, the multibyte
code page used by the application should be the same as that used by the OS for full support of
that character set.
In the case of a Windows NT/2K/XP system, Unicode is supported natively. Therefore, any
Windows Unicode application has the full range of characters available when creating or accessing
filenames and paths. In a Windows MBCS application, however, the system's code page
is used to convert the MBCS filename to a UTF-16 encoded filename. If non-ASCII filenames
are to be used in the MBCS application, the application's code page must therefore
be the same as the system's code page for those filenames to be converted correctly.
On Linux/Unix platforms, the filesystem often supports UTF-8 and so although wide-character strings
will need to be converted to UTF-8 strings, filenames and paths should work as expected.
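The conversion described above can be sketched with the standard wcstombs function. The helper names wide_path_to_multibyte and open_wide_path and the 4096-byte buffer are assumptions made for this example. The result is UTF-8 only if the caller has already selected a UTF-8 locale for LC_CTYPE, for example with setlocale(LC_CTYPE, "").

```c
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

/* Converts a wide-character path to the multibyte encoding of the current
   LC_CTYPE locale (UTF-8 when a UTF-8 locale is active).
   Returns 0 on success, -1 if a character cannot be converted or the
   result, including its terminating null, does not fit in buf. */
int wide_path_to_multibyte(const wchar_t *wpath, char *buf, size_t size)
{
    size_t n = wcstombs(buf, wpath, size);
    if (n == (size_t)-1 || n >= size)
        return -1;
    return 0;
}

/* Opens a file named by a wide-character path; returns NULL on failure.
   The buffer size is an arbitrary choice for this sketch. */
FILE *open_wide_path(const wchar_t *wpath, const char *mode)
{
    char path[4096];
    if (wide_path_to_multibyte(wpath, path, sizeof path) != 0)
        return NULL;
    return fopen(path, mode);
}
```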
File I/O
An important consideration in an internationalized application is how it handles reading and writing
non-ASCII data to files.
Windows Platforms
On Windows platforms, file I/O operations take place in one of two translation modes: text or
binary, depending on the mode in which the file is opened. Note that a file is assumed to have multibyte
characters when in text mode, and UTF-16 Unicode characters when in binary mode.
The default mode for files is text mode, though that can be changed by directly setting the
global variable _fmode in the program. Alternatively, binary mode can be
requested for a single file by passing the appropriate argument to a file-open function
such as _open, fopen, freopen, or _fsopen; this overrides the current setting of _fmode.
Note that the stdin, stdout, and stderr streams always
open in text mode by default, though you can also override this default when opening any of these
files. Use _setmode to change the translation mode using the file descriptor after the file is open.
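As an illustration of requesting binary mode through the file-open call, the following sketch passes "wb" and "rb" to fopen. The helper names write_binary and file_size are hypothetical. The "b" flag is standard C and is accepted on platforms that make no text/binary distinction.

```c
#include <stdio.h>

/* Writes len bytes to path in binary mode, so no CR-LF translation occurs.
   Returns the number of bytes written, or -1 on failure. */
long write_binary(const char *path, const char *text, size_t len)
{
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    size_t written = fwrite(text, 1, len, f);
    if (fclose(f) != 0)
        return -1;
    return (long)written;
}

/* Returns the size in bytes of the file as stored on disk, or -1 on failure. */
long file_size(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    long size = -1;
    if (fseek(f, 0, SEEK_END) == 0)
        size = ftell(f);
    fclose(f);
    return size;
}
```

Writing the two bytes "a\n" this way stores exactly two bytes. In Windows text mode the linefeed would be expanded to a CR-LF pair and the file would hold three.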
In a Windows Unicode application, if the stream I/O routines, such as fwprintf, fwscanf,
fgetwc, fputwc, fgetws, and fputws,
operate on a file that is open in the default text mode, there are two kinds of character
conversion that will take place:
Unicode-to-MBCS or MBCS-to-Unicode conversion. As mentioned above, when a Unicode stream-I/O
function operates in text mode, the source or destination stream is assumed to be a sequence of
multibyte characters. Therefore, the Unicode stream-input functions convert multibyte characters
to wide characters (as if by a call to the mbtowc function). For the same reason,
the Unicode stream-output functions convert wide characters to multibyte characters (as if by a
call to the wctomb function).
Carriage return/linefeed (CR-LF) translation. On input, each CR-LF pair is converted to a
single linefeed character before the MBCS-to-Unicode conversion. On output, each linefeed
character is converted back to a CR-LF pair after the Unicode-to-MBCS conversion.
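The Unicode-to-MBCS step can lose characters, which the following sketch makes visible by calling wctomb and mbtowc directly. The helper name roundtrips_in_locale is hypothetical, and the result depends on the LC_CTYPE locale that is active when it is called.

```c
#include <limits.h>
#include <stdlib.h>
#include <wchar.h>

/* Converts one wide character to a multibyte sequence and back, using the
   current LC_CTYPE locale. Returns 1 if the character survives the round
   trip, 0 if the locale's multibyte encoding cannot represent it. */
int roundtrips_in_locale(wchar_t wc)
{
    char mb[MB_LEN_MAX];
    wchar_t back = 0;

    wctomb(NULL, 0);                 /* reset any shift state */
    int n = wctomb(mb, wc);
    if (n <= 0)
        return 0;                    /* not representable in this locale */

    mbtowc(NULL, NULL, 0);           /* reset any shift state */
    if (mbtowc(&back, mb, (size_t)n) != n)
        return 0;
    return back == wc;
}
```

A character for which this returns 0 is one that a text-mode wide stream could not write faithfully under the same locale.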
When a file on a Windows platform is open in binary mode, it is assumed to be UTF-16 Unicode, and thus no CR-LF translation or
character conversion occurs during input or output. To correctly use wcin, the global
input stream as a wide stream, or wcout, the global output stream as a wide stream,
call _setmode(_fileno(stdin), _O_BINARY) or _setmode(_fileno(stdout), _O_BINARY), respectively.
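A minimal sketch of the _setmode call named above, wrapped so that the same source also builds on platforms without a text/binary distinction. The wrapper name set_binary_mode is hypothetical. It should be called before any I/O has been performed on the stream.

```c
#include <stdio.h>
#ifdef _WIN32
#include <fcntl.h>
#include <io.h>
#endif

/* Switches an already open stream to binary translation mode.
   Returns 0 on success, -1 on failure. On non-Windows platforms there is
   no translation mode to change, so the call does nothing and succeeds. */
int set_binary_mode(FILE *stream)
{
#ifdef _WIN32
    return _setmode(_fileno(stream), _O_BINARY) == -1 ? -1 : 0;
#else
    (void)stream;
    return 0;
#endif
}
```

For example, set_binary_mode(stdout) at the start of main corresponds to the second call shown in the text.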
ANSI Platforms
As on Windows, a stream must be configured properly to handle either UTF-8 Unicode
characters or UTF-16/32 wide characters. On ANSI platforms this is done through the orientation of the stream:
UTF-8 multibyte characters require a narrow orientation, and wide characters require a wide orientation.
The orientation of the stream is set in one of three ways:
By making a call to one of the narrow I/O function calls: this will set the stream orientation to narrow.
By making a call to one of the wide-character I/O function calls: this will set the stream orientation to wide.
By calling fwide(stream, mode) to set the orientation to narrow (pass in a negative mode)
or wide (pass in a positive mode). Pass in 0 for mode to query
the stream's current orientation.
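The third option can be sketched as follows. The helper names stream_orientation and make_wide are hypothetical; fwide itself is standard C and is declared in wchar.h.

```c
#include <stdio.h>
#include <wchar.h>

/* Queries the orientation without changing it.
   Returns 1 for wide, -1 for narrow, 0 if the stream is not yet oriented. */
int stream_orientation(FILE *stream)
{
    int o = fwide(stream, 0);
    return (o > 0) - (o < 0);
}

/* Requests wide orientation. Returns 1 if the stream is wide afterwards,
   0 if it had already been oriented narrow. */
int make_wide(FILE *stream)
{
    return fwide(stream, 1) > 0;
}
```

Once a stream has an orientation, a later fwide call with the opposite sign does not change it, so the request should be made right after the stream is opened and before any I/O.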
Note that it is important to never mix the use of wide and narrow operations
on a stream, as the behavior will be unpredictable. In addition, in the case of wide-character
stream I/O, it is important to set the locale for the LC_CTYPE category prior to
opening the stream.
This is because, unless the character set is specified via the mode argument (using the
ccs=CHARSET string) when the stream is opened, it will be taken from the LC_CTYPE
category of the current locale, and the associated conversion functions to convert to and from the
internal wchar_t characters will be loaded. Once set, the conversion functions will not change even if the locale selected for
the LC_CTYPE category is changed.
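The ordering described above might look like the following sketch. The helper name write_wide_text is hypothetical, and the ",ccs=UTF-8" suffix in the mode string is an extension (available in glibc, for instance) rather than standard C, so treat its presence on the target platform as an assumption.

```c
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

/* Writes a wide string to path through a wide-oriented stream.
   Returns 0 on success, -1 on failure. */
int write_wide_text(const char *path, const wchar_t *text)
{
    /* Select LC_CTYPE from the environment before the stream is opened,
       since the stream's conversion is fixed when it is first set up. */
    setlocale(LC_CTYPE, "");

    /* Assumption: the C library accepts ",ccs=CHARSET" in the mode string.
       Here the character set is named explicitly instead of being taken
       from the locale. */
    FILE *f = fopen(path, "w,ccs=UTF-8");
    if (f == NULL)
        return -1;

    int ok = fputws(text, f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Without the ccs suffix, the same function would rely entirely on the locale chosen by the setlocale call, which is why that call comes first.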
Click on a function for more information:
access/_access
basename
canonicalize_file_name
chdir/_chdir
chmod/_chmod
chown
chroot
creat/_creat
creat64
ctermid
dirname
fdopen/_fdopen
_findfirst/_findfirst64/_findfirsti64
_findnext/_findnext64/_findnexti64
fopen
fopen64
freopen
freopen64
_fsopen
_fullpath
fwide
_get_current_dir_name
getcwd/_getcwd
_getdcwd
getwd
lchown
link
lstat
lstat64
lutimes
_makepath
mkdir/_mkdir
mkdtemp
mkstemp
mktemp/_mktemp
open/_open
open64
opendir
pathconf
popen/_popen
readlink
realpath
remove
rename
rmdir/_rmdir
_searchenv
sopen/_sopen
_splitpath
stat/_stat
stat64/_stat64
_stati64
symlink
tempnam/_tempnam
tmpnam
tmpnam_r
truncate
truncate64
ttyname
ttyname_r
unlink/_unlink
utime/_utime
_utime64
utimes
Locale-Sensitive C++ Methods