Writing programs that access large files
AIX® supports files that are larger than 2 gigabytes (2 GB). This section helps programmers understand the implications of large files for their applications and modify those applications accordingly. Application programs can be modified, through programming interfaces, to be aware of large files. The file system programming interfaces are generally based on the off_t data type.
Implications for existing programs
The 32-bit application environment that all applications used prior to AIX 4.2 remains unchanged. However, existing application programs cannot handle large files.
For example, the st_size field in the stat structure, which is used to return file sizes, is a signed, 32-bit long. Therefore, that stat structure cannot be used to return file sizes that are larger than LONG_MAX. If an application attempts to use the stat subroutine with a file that is larger than LONG_MAX, the stat subroutine will fail, and errno will be set to EOVERFLOW, indicating that the file size overflows the size field of the structure being used by the program.
This behavior is significant because existing programs that appear to be unaffected by large files will still fail when they encounter one, even if the file size is irrelevant to what the program does.
The errno EOVERFLOW can also be returned by the lseek subroutine and by the fcntl subroutine if the values that need to be returned are larger than the data type or structure that the program is using. For lseek, if the resulting offset is larger than LONG_MAX, lseek will fail and errno will be set to EOVERFLOW. For the fcntl subroutine, if the caller uses F_GETLK and the blocking lock's starting offset or length is larger than LONG_MAX, the fcntl call will fail, and errno will be set to EOVERFLOW.
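As a minimal sketch of how a program might tell the EOVERFLOW case apart from other stat failures, the following helper reports it separately. The function name get_file_size and its return convention are hypothetical and are not part of any AIX interface.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper (not an AIX interface).
 * Returns 0 and stores the size on success,
 * 1 if the size does not fit in this program's stat structure (EOVERFLOW),
 * -1 on any other error. */
int get_file_size(const char *path, long long *size_out)
{
    struct stat st;

    if (stat(path, &st) == 0) {
        *size_out = (long long)st.st_size;
        return 0;
    }
    if (errno == EOVERFLOW) {
        fprintf(stderr, "%s: size overflows the stat structure in use\n", path);
        return 1;
    }
    return -1;
}
```

A caller that receives 1 knows that the file exists but is too large for the compilation environment, which is a different situation from a missing file or a permission error.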
Open protection
Many existing application programs could have unexpected behavior, including data corruption, if allowed to operate on large files. AIX uses an open-protection scheme to protect applications from this class of failure.
In addition to open protection, a number of other subroutines offer protection by providing an execution environment that is identical to the environment under which these programs were developed. If an application uses the write family of subroutines and the write request crosses the 2 GB boundary, the write subroutines transfer data only up to 2 GB minus 1. If the application attempts to write at or beyond the 2 GB minus 1 boundary, the write subroutines fail and set errno to EFBIG. The behavior of the mmap, ftruncate, and fclear subroutines is similar.
The read family of subroutines also participates in the open-protection scheme. If an application attempts to read a file across the 2 GB threshold, only the data up to 2 GB minus 1 is read. Reads at or beyond the 2 GB minus 1 boundary fail, and errno is set to EOVERFLOW.
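The write behavior described above means that a request can first return a short count and then fail on the next call. The following sketch shows one way a program might cope with that sequence. The helper name write_all is hypothetical, and the loop is an illustration under that assumption rather than a recommended implementation.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: writes len bytes, retrying after short writes.
 * Returns the number of bytes actually written. If the result is less
 * than len because a write failed, errno holds the reason (EFBIG when
 * the file size boundary was reached). */
size_t write_all(int fd, const char *buf, size_t len)
{
    size_t done = 0;

    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before any data moved; retry */
            if (errno == EFBIG)
                fprintf(stderr, "write stopped at the file size boundary\n");
            break;
        }
        if (n == 0)
            break;              /* no progress; avoid looping forever */
        done += (size_t)n;
    }
    return done;
}
```

A caller compares the return value with the requested length and inspects errno only when the two differ.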
Open protection is implemented by a flag associated with an open file description. The current state of the flag can be queried with the fcntl subroutine using the F_GETFL command. The flag can be modified with the fcntl subroutine using the F_SETFL command.
Because open file descriptions are inherited across the exec family of subroutines, application programs that pass file descriptors that are enabled for large-file access to other programs should consider whether the receiving program can safely access the large file.
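The following sketch shows how a program that receives a descriptor might query the flag with F_GETFL before using the file. The text above does not name the flag. O_LARGEFILE is assumed here, and the helper name large_file_flag is hypothetical. The #ifdef guard covers compilation environments in which the macro is not visible.

```c
#include <fcntl.h>

/* Hypothetical helper. Returns 1 if the open file description is enabled
 * for large-file access, 0 if it is not, and -1 if the query fails.
 * Assumption: the open-protection flag is named O_LARGEFILE. */
int large_file_flag(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1)
        return -1;
#ifdef O_LARGEFILE
    return (flags & O_LARGEFILE) ? 1 : 0;
#else
    return 0;   /* flag not visible in this compilation environment */
#endif
}
```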
Porting applications to the large file environment
An application can be enabled for large-file access in one of two ways:
- Define _LARGE_FILES, which carefully redefines all of the relevant data types, structures, and subroutine names to their large-file enabled counterparts. Defining _LARGE_FILES has the advantage of maximizing application portability to other platforms because the application is still written to the normal POSIX and XPG interfaces. It has the disadvantage of creating some ambiguity in the code because the size of the various data items cannot be determined from looking at the code.
- Recode the application to explicitly call the large-file enabled subroutines. Recoding the application has the disadvantages of requiring more effort and reducing application portability. It can be used when the redefinition effect of _LARGE_FILES would have a considerable negative impact on the program or when it is desirable to convert only a very small portion of the program.
In either case, the application program must be carefully audited to ensure correct behavior in the new environment.
Using _LARGE_FILES
In the default compilation environment, the off_t data type is defined as a signed, 32-bit long. If the application defines _LARGE_FILES before the inclusion of any header files, then the large-file programming environment is enabled and off_t is defined to be a signed, 64-bit long long. In addition, all of the subroutines that deal with file sizes or file offsets are redefined to be their large-file enabled counterparts. Similarly, all of the data structures with embedded file sizes or offsets are redefined.
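A minimal sketch of the ordering requirement follows: the macro is defined before the first header is included, and a small function reports the resulting width of off_t. The function name off_t_bits is hypothetical. On platforms that do not recognize _LARGE_FILES, the definition has no effect and the reported width is whatever that platform uses by default.

```c
#define _LARGE_FILES            /* must precede every #include */
#include <limits.h>
#include <sys/types.h>

/* Hypothetical helper: width of off_t, in bits, in this compilation
 * environment. */
int off_t_bits(void)
{
    return (int)(sizeof(off_t) * CHAR_BIT);
}
```

Defining the macro on the compiler command line (for example, with a -D option) is another way to guarantee that it is seen before any header file.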
The following table shows the redefinitions that occur in the _LARGE_FILES environment:
Entity | Redefined as | Header file |
---|---|---|
off_t Object | long long | <sys/types.h> |
fpos_t Object | long long | <sys/types.h> |
struct stat Structure | struct stat64 | <sys/stat.h> |
stat Subroutine | stat64() | <sys/stat.h> |
fstat Subroutine | fstat64() | <sys/stat.h> |
lstat Subroutine | lstat64() | <sys/stat.h> |
mmap Subroutine | mmap64() | <sys/mman.h> |
lockf Subroutine | lockf64() | <sys/lockf.h> |
struct flock Structure | struct flock64 | <sys/flock.h> |
open Subroutine | open64() | <fcntl.h> |
creat Subroutine | creat64() | <fcntl.h> |
F_GETLK Command Parameter | F_GETLK64 | <fcntl.h> |
F_SETLK Command Parameter | F_SETLK64 | <fcntl.h> |
F_SETLKW Command Parameter | F_SETLKW64 | <fcntl.h> |
ftw Subroutine | ftw64() | <ftw.h> |
nftw Subroutine | nftw64() | <ftw.h> |
fseeko Subroutine | fseeko64() | <stdio.h> |
ftello Subroutine | ftello64() | <stdio.h> |
fgetpos Subroutine | fgetpos64() | <stdio.h> |
fsetpos Subroutine | fsetpos64() | <stdio.h> |
fopen Subroutine | fopen64() | <stdio.h> |
freopen Subroutine | freopen64() | <stdio.h> |
lseek Subroutine | lseek64() | <unistd.h> |
ftruncate Subroutine | ftruncate64() | <unistd.h> |
truncate Subroutine | truncate64() | <unistd.h> |
fclear Subroutine | fclear64() | <unistd.h> |
pwrite Subroutine | pwrite64() | <unistd.h> |
pread Subroutine | pread64() | <unistd.h> |
struct aiocb Structure | struct aiocb64 | <sys/aio.h> |
aio_read Subroutine | aio_read64() | <sys/aio.h> |
aio_write Subroutine | aio_write64() | <sys/aio.h> |
aio_cancel Subroutine | aio_cancel64() | <sys/aio.h> |
aio_suspend Subroutine | aio_suspend64() | <sys/aio.h> |
aio_return Subroutine | aio_return64() | <sys/aio.h> |
aio_error Subroutine | aio_error64() | <sys/aio.h> |
liocb Structure | liocb64 | <sys/aio.h> |
lio_listio Subroutine | lio_listio64() | <sys/aio.h> |
Using 64-bit file system subroutines
The 64-bit file system data types, structures, and subroutines are listed below, grouped by the header file that declares them:
<sys/types.h>
typedef long long off64_t;
typedef long long fpos64_t;
<fcntl.h>
extern int open64(const char *, int, ...);
extern int creat64(const char *, mode_t);
#define F_GETLK64
#define F_SETLK64
#define F_SETLKW64
<ftw.h>
extern int ftw64(const char *, int (*)(const char *, const struct stat64 *, int), int);
extern int nftw64(const char *, int (*)(const char *, const struct stat64 *, int, struct FTW *), int, int);
<stdio.h>
extern int fgetpos64(FILE *, fpos64_t *);
extern FILE *fopen64(const char *, const char *);
extern FILE *freopen64(const char *, const char *, FILE *);
extern int fseeko64(FILE *, off64_t, int);
extern int fsetpos64(FILE *, fpos64_t *);
extern off64_t ftello64(FILE *);
<unistd.h>
extern off64_t lseek64(int, off64_t, int);
extern int ftruncate64(int, off64_t);
extern int truncate64(const char *, off64_t);
extern off64_t fclear64(int, off64_t);
extern ssize_t pread64(int, void *, size_t, off64_t);
extern ssize_t pwrite64(int, const void *, size_t, off64_t);
extern int fsync_range64(int, int, off64_t, off64_t);
<sys/flock.h>
struct flock64;
<sys/lockf.h>
extern int lockf64 (int, int, off64_t);
<sys/mman.h>
extern void *mmap64(void *, size_t, int, int, int, off64_t);
<sys/stat.h>
struct stat64;
extern int stat64(const char *, struct stat64 *);
extern int fstat64(int, struct stat64 *);
extern int lstat64(const char *, struct stat64 *);
<sys/aio.h>
struct aiocb64;
int aio_read64(int, struct aiocb64 *);
int aio_write64(int, struct aiocb64 *);
int aio_listio64(int, struct aiocb64 *[], int, struct sigevent *);
int aio_cancel64(int, struct aiocb64 *);
int aio_suspend64(int, struct aiocb64 *[]);
struct liocb64;
int lio_listio64(int, struct liocb64 *[], int, void *);
Common pitfalls in using the large file environment
Porting application programs to the large-file environment can expose a number of problems in the application. These problems are frequently the result of poor coding practices that are harmless in a 32-bit off_t environment but cause failures when the program is compiled in a 64-bit off_t environment. Some of the more common problems and solutions are discussed in this section.
Improper use of data types
A common source of problems with application programs is a failure to use the proper data types. If an application attempts to store file sizes or file offsets in an integer variable, the resulting value will be truncated and lose significance. To avoid this problem, use the off_t data type to store file sizes and offsets.
Incorrect:
int file_size;
struct stat s;
file_size = s.st_size;
Better:
off_t file_size;
struct stat s;
file_size = s.st_size;
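Where a file size must still be handed to older code that accepts only an int, one option is to check the value before narrowing it instead of letting it be truncated silently. The following sketch illustrates that check. The helper name fits_in_int is hypothetical.

```c
#include <limits.h>

/* Hypothetical helper: returns 1 when a 64-bit size or offset can be
 * stored in an int without loss, and 0 otherwise. */
int fits_in_int(long long size)
{
    return size >= INT_MIN && size <= INT_MAX;
}
```

With a 32-bit int, a size of 3 GB fails this check, so the caller can report an error instead of continuing with a corrupted value.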
When you are passing 64-bit integers to functions as arguments or when you are returning 64-bit integers from functions, both the caller and the called function must agree on the types of the arguments and the return value.
Passing a 32-bit integer to a function that expects a 64-bit integer causes the called function to misinterpret the caller's arguments, leading to unexpected behavior. This type of problem is especially severe if the program passes scalar values to a function that expects to receive a 64-bit integer.
You can avoid problems by using function prototypes carefully. In the code fragments below, fexample() is a function that takes a 64-bit file offset as a parameter. In the first example, the compiler generates the normal 32-bit integer function linkage, which would be incorrect because the receiving function expects 64-bit integer linkage. In the second example, the LL specifier is added, forcing the compiler to use the proper linkage. In the last example, the function prototype causes the compiler to promote the scalar value to a 64-bit integer. This is the preferred approach because the source code remains portable between 32-bit and 64-bit environments.
Incorrect:
fexample(0);
Better:
fexample(0LL);
Best: