This chapter introduces files, the most important abstraction in the Unix environment, and file I/O, the basis of the Linux programming model. It covers reading from and writing to files, along with other basic file I/O operations, and culminates with a discussion of how the Linux kernel implements and manages files.
Opening Files
Reading via read( )
Return Values:
Writing with write( )
Synchronized I/O
Direct I/O
Closing Files
Seeking with lseek( )
Positional Reads and Writes
Truncating Files
Opening Files
Reading via read( )
#include <unistd.h>
ssize_t read (int fd, void *buf, size_t len);
Return Values:
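- The call returns a value equal to len. All len bytes read are stored in buf. The results are as intended.
- The call returns a value less than len, but greater than zero. The read bytes are stored in buf. This can happen because a signal interrupted the read midway, an error occurred in the middle of the read, fewer than len bytes were available, or EOF was reached before len bytes were read. Reissuing the read, with buf and len updated accordingly, reads the remaining bytes or reveals the cause of the problem.
- The call returns 0. This indicates EOF. There is nothing to read.
- The call blocks because no data is currently available. This won’t happen in nonblocking mode.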
- The call returns -1, and errno is set to EINTR. This indicates that a signal was received before any bytes were read. The call can be reissued.
- The call returns -1, and errno is set to EAGAIN. This indicates that the read would block because no data is currently available, and that the request should be reissued later. This happens only in nonblocking mode.
- The call returns -1, and errno is set to a value other than EINTR or EAGAIN. This indicates a more serious error.
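To make these cases concrete, the following minimal sketch performs a single read( ) and sorts the result into the categories above. The names read_outcome, classify_read( ), and set_nonblocking( ) are hypothetical helpers for illustration, not part of any library. The sketch assumes a POSIX system where a nonblocking read on an empty pipe fails with EAGAIN (EWOULDBLOCK is also checked; on Linux the two are equal).

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* One label per non-blocking case in the list above (hypothetical names). */
enum read_outcome {
        READ_FULL,      /* ret == len */
        READ_PARTIAL,   /* 0 < ret < len */
        READ_EOF,       /* ret == 0 */
        READ_INTR,      /* -1 with EINTR: signal before any data, safe to retry */
        READ_AGAIN,     /* -1 with EAGAIN: nonblocking fd, no data yet */
        READ_ERROR      /* -1 with any other errno */
};

/* Switch fd to nonblocking mode so read( ) fails with EAGAIN instead of blocking. */
int set_nonblocking (int fd)
{
        int flags = fcntl (fd, F_GETFL);

        if (flags == -1)
                return -1;
        return fcntl (fd, F_SETFL, flags | O_NONBLOCK);
}

/* Issue one read( ) and classify it; *nread receives read( )'s raw return value. */
enum read_outcome classify_read (int fd, void *buf, size_t len, ssize_t *nread)
{
        ssize_t ret = read (fd, buf, len);

        *nread = ret;
        if (ret == -1) {
                if (errno == EINTR)
                        return READ_INTR;
                if (errno == EAGAIN || errno == EWOULDBLOCK)
                        return READ_AGAIN;
                return READ_ERROR;
        }
        if (ret == 0)
                return READ_EOF;
        return (size_t) ret == len ? READ_FULL : READ_PARTIAL;
}
```

Suppose the read end of a pipe has been made nonblocking. Reading from the empty pipe yields READ_AGAIN. Writing 3 bytes and then asking for 8 yields READ_PARTIAL. Once the write end is closed and the data is drained, the next read yields READ_EOF.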
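/* Read up to len bytes into buf, retrying on EINTR and resuming after
   partial reads; stops early at EOF. Assumes fd (int), buf (void *) and
   len (size_t) are already set up, and that <errno.h>, <stdio.h> and
   <unistd.h> are included. */
ssize_t ret;

while (len != 0 && (ret = read (fd, buf, len)) != 0) {
        if (ret == -1) {
                if (errno == EINTR)
                        continue;       /* interrupted before any data: retry */
                perror ("read");
                break;
        }
        len -= ret;                     /* bytes still wanted */
        buf = (char *) buf + ret;       /* arithmetic on void * is not standard C */
}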
Writing with write( )
Synchronized I/O
Direct I/O
Closing Files
Seeking with lseek( )
Positional Reads and Writes
Truncating Files
Multiplexed I/O:
Multiplexed I/O allows an application to concurrently block on multiple file descriptors, and receive notification when any one of them becomes ready to read or write without blocking. Multiplexed I/O thus becomes the pivot point for the application, designed similarly to the following:
1. Multiplexed I/O: Tell me when any of these file descriptors are ready for I/O.
2. Sleep until one or more file descriptors are ready.
3. Woken up: What is ready?
4. Handle all file descriptors ready for I/O, without blocking.
5. Go back to step 1, and start over.
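These steps can be sketched with select( ), which is described next. This is a minimal sketch that assumes only readability matters and that every descriptor is below FD_SETSIZE. multiplex_loop( ), the ready_handler callback type, and the sample count_bytes( ) handler are hypothetical names for illustration.

```c
#include <errno.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical callback: handle one readable fd; return 0 to keep watching it, -1 to drop it. */
typedef int (*ready_handler) (int fd, void *ctx);

/*
 * Steps 1-5, for readability only: watch fds[0..count-1] (entries of -1
 * are skipped) until the handler has dropped every descriptor.
 * Returns 0 when nothing is left to watch, -1 if select( ) fails.
 */
int multiplex_loop (int *fds, int count, ready_handler handler, void *ctx)
{
        for (;;) {
                fd_set readfds;
                int maxfd = -1;

                /* Step 1: name the fds to watch. Rebuilt every pass because select( ) overwrites the set. */
                FD_ZERO (&readfds);
                for (int i = 0; i < count; i++) {
                        if (fds[i] < 0)
                                continue;
                        FD_SET (fds[i], &readfds);
                        if (fds[i] > maxfd)
                                maxfd = fds[i];
                }
                if (maxfd == -1)
                        return 0;

                /* Step 2: sleep until at least one fd is ready; NULL means no timeout. */
                if (select (maxfd + 1, &readfds, NULL, NULL, NULL) == -1) {
                        if (errno == EINTR)
                                continue;
                        return -1;
                }

                /* Steps 3 and 4: find out what is ready and handle it. */
                for (int i = 0; i < count; i++)
                        if (fds[i] >= 0 && FD_ISSET (fds[i], &readfds)
                            && handler (fds[i], ctx) == -1)
                                fds[i] = -1;

                /* Step 5: back to step 1. */
        }
}

/* Sample handler: add the bytes read from fd to *(size_t *) ctx; drop fd at EOF or on error. */
int count_bytes (int fd, void *ctx)
{
        char buf[256];
        ssize_t ret = read (fd, buf, sizeof buf);

        if (ret <= 0)
                return -1;
        *(size_t *) ctx += (size_t) ret;
        return 0;
}
```

Consider two pipes holding 5 and 3 bytes, with both write ends closed. multiplex_loop( ) hands each read end to count_bytes( ) until both reach EOF. It then returns 0, leaving a total of 8.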
select( )
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>
int select (int n,
fd_set *readfds,
fd_set *writefds,
fd_set *exceptfds,
struct timeval *timeout);
void FD_CLR(int fd, fd_set *set);
int FD_ISSET(int fd, fd_set *set);
void FD_SET(int fd, fd_set *set);
void FD_ZERO(fd_set *set);
A call to select( ) blocks until at least one of the given file descriptors is ready to perform I/O, or until an optionally specified timeout has elapsed.
The timeout parameter is a pointer to a timeval structure, which is defined as follows:
#include <sys/time.h>
struct timeval {
long tv_sec; /* seconds */
long tv_usec; /* microseconds */
};
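Because timeout points to a struct timeval, a caller working in milliseconds must split the value into tv_sec and tv_usec. The sketch below waits on a single descriptor; wait_readable( ) is a hypothetical helper name. It rebuilds both the set and the timeval on every call, because select( ) overwrites the set, and on Linux it also rewrites timeout with the time remaining.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms milliseconds for fd to become readable.
 * Returns 1 if fd is readable, 0 if the timeout elapsed, -1 on error.
 * fd must be below FD_SETSIZE.
 */
int wait_readable (int fd, long timeout_ms)
{
        fd_set readfds;
        struct timeval tv;

        FD_ZERO (&readfds);
        FD_SET (fd, &readfds);

        /* Split milliseconds into whole seconds plus leftover microseconds. */
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;

        return select (fd + 1, &readfds, NULL, NULL, &tv);
}
```

A timeout of 0 makes select( ) return immediately, which turns this into a non-blocking readiness check. To wait indefinitely, select( ) takes NULL instead of a timeval pointer, a case this helper does not cover.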
FD_ISSET tests whether a file descriptor is part of a given set. It is used after select( ) returns, to test whether a given file descriptor is ready for action:
if (FD_ISSET(fd, &readfds))
/* 'fd' is readable without blocking! */
pselect( )
#define _XOPEN_SOURCE 600
#include <sys/select.h>
int pselect (int n,
fd_set *readfds,
fd_set *writefds,
fd_set *exceptfds,
const struct timespec *timeout,
const sigset_t *sigmask);
void FD_CLR(int fd, fd_set *set);
int FD_ISSET(int fd, fd_set *set);
void FD_SET(int fd, fd_set *set);
void FD_ZERO(fd_set *set);
There are three differences between pselect( ) and select( ):
- pselect( ) uses the timespec structure, not the timeval structure, for its timeout parameter. The timespec structure uses seconds and nanoseconds, not seconds and microseconds, providing theoretically superior timeout resolution. In practice, however, neither call reliably provides even microsecond resolution.
- A call to pselect( ) does not modify the timeout parameter. Consequently, this parameter does not need to be reinitialized on subsequent invocations.
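- pselect( ) takes a sigmask parameter, which select( ) lacks. pselect( ) atomically replaces the caller’s signal mask with sigmask for the duration of the call and restores the original mask on return. When sigmask is NULL, pselect( ) behaves like select( ) with respect to signals.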
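The sigmask parameter closes a race. Suppose a program checks a flag set by a signal handler and then calls select( ). A signal arriving between the check and the call goes unnoticed until select( ) returns for some other reason. With pselect( ), the program keeps the signal blocked and lets pselect( ) unblock it only while it sleeps. The following minimal sketch assumes a single-threaded program (a threaded one would use pthread_sigmask( )); wait_readable_allowing( ) is a hypothetical name.

```c
#define _XOPEN_SOURCE 600
#include <signal.h>
#include <sys/select.h>
#include <time.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms milliseconds for fd to become readable. The caller
 * keeps `sig` blocked; pselect( ) unblocks it atomically only during the wait.
 * Returns 1 if fd is readable, 0 on timeout, -1 on error (EINTR if sig arrived).
 */
int wait_readable_allowing (int fd, long timeout_ms, int sig)
{
        fd_set readfds;
        sigset_t during;
        /* const: pselect( ) never modifies its timespec, unlike select( )'s timeval. */
        const struct timespec ts = {
                .tv_sec = timeout_ms / 1000,
                .tv_nsec = (timeout_ms % 1000) * 1000000L
        };

        /* Mask in effect while sleeping: the current mask minus sig. */
        if (sigprocmask (SIG_SETMASK, NULL, &during) == -1)
                return -1;
        sigdelset (&during, sig);

        FD_ZERO (&readfds);
        FD_SET (fd, &readfds);
        return pselect (fd + 1, &readfds, NULL, NULL, &ts, &during);
}
```

A typical caller blocks the signal (SIGUSR1, say) once with sigprocmask( ). It then loops, checking its flag and calling wait_readable_allowing( ), so the signal can only be delivered while the process is actually asleep in pselect( ).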
poll( )
The poll( ) system call is System V’s multiplexed I/O solution. It solves several deficiencies in select( ), although select( ) is still often used (most likely out of habit, or in the name of portability):
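#include <sys/poll.h>
int poll (struct pollfd *fds, unsigned int nfds, int timeout);
Unlike select( ), with its inefficient three bitmask-based sets of file descriptors, poll( ) employs a single array of nfds pollfd structures, pointed to by fds. The timeout parameter is given in milliseconds; a negative value waits indefinitely, and 0 returns immediately. The structure is defined as follows:
#include <sys/poll.h>
struct pollfd {
int fd; /* file descriptor */
short events; /* requested events to watch */
short revents; /* returned events witnessed */
};
Each pollfd structure names a single file descriptor to watch. The events field is a bitmask of the events to watch for on that descriptor, and the kernel sets the revents field to the events that actually occurred. POLLIN | POLLPRI is equivalent to select( )’s read event, and POLLOUT | POLLWRBAND is equivalent to select( )’s write event. POLLIN is equivalent to POLLRDNORM | POLLRDBAND, and POLLOUT is equivalent to POLLWRNORM.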
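The following minimal sketch fills in pollfd structures, watching one descriptor for reading and another for writing in a single call, and reads the results back from revents. It uses the POSIX header <poll.h> (on Linux, <sys/poll.h> works too). poll_pair( ) is a hypothetical helper name.

```c
#include <poll.h>
#include <unistd.h>

/*
 * Ask poll( ) about two descriptors at once: `in` for reading and `out`
 * for writing. Returns poll( )'s result (number of entries with nonzero
 * revents, 0 on timeout, -1 on error) and reports readiness via the flags.
 */
int poll_pair (int in, int out, int timeout_ms, int *in_ready, int *out_ready)
{
        struct pollfd fds[2] = {
                { .fd = in,  .events = POLLIN },  /* readable without blocking? */
                { .fd = out, .events = POLLOUT }, /* writable without blocking? */
        };
        int ret = poll (fds, 2, timeout_ms);

        if (ret == -1)
                return -1;
        /* The kernel writes only revents; events keeps the values set above. */
        *in_ready = (fds[0].revents & POLLIN) != 0;
        *out_ready = (fds[1].revents & POLLOUT) != 0;
        return ret;
}
```

On a fresh pipe, the write end has room but the read end has nothing to read, so poll( ) reports one ready entry. After a byte is written, both entries are ready.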
poll( ) Versus select( )
Although they perform the same basic job, the poll( ) system call is superior to select( ) for a handful of reasons:
- poll( ) does not require that the user calculate and pass in as a parameter the value of the highest-numbered file descriptor plus one.
- poll( ) is more efficient for large-valued file descriptors. Imagine watching a single file descriptor with the value 900 via select( ): the kernel would have to check each bit of each passed-in set, up to the 900th bit.
- select( )’s file descriptor sets are statically sized, introducing a tradeoff: they are small, limiting the maximum file descriptor that select( ) can watch, or they are inefficient. Operations on large bitmasks are not efficient, especially if it is not known whether they are sparsely populated. With poll( ), one can create an array of exactly the right size. Only watching one item? Just pass in a single structure.
- With select( ), the file descriptor sets are reconstructed on return, so each subsequent call must reinitialize them. The poll( ) system call separates the input (events field) from the output (revents field), allowing the array to be reused without change.
- The timeout parameter to select( ) is undefined on return. Portable code needs to reinitialize it. This is not an issue with pselect( ), however.
The select( ) system call does have a few things going for it, though:
- select( ) is more portable, as some Unix systems do not support poll( ).
- select( ) provides better timeout resolution: down to the microsecond. Both ppoll( ) and pselect( ) theoretically provide nanosecond resolution, but in practice, none of these calls reliably provides even microsecond resolution.
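Because events and revents are kept separate, a pollfd array can be handed to poll( ) repeatedly without being rebuilt. select( ) callers, by contrast, must redo FD_ZERO( ) and FD_SET( ) before every call. The sketch below polls the same array several times and counts how often the first descriptor was reported readable; poll_rounds( ) is a hypothetical helper name.

```c
#include <poll.h>
#include <unistd.h>

/*
 * Call poll( ) `rounds` times on the same array without reinitializing it.
 * Returns how many rounds reported fds[0] readable, or -1 on error.
 */
int poll_rounds (struct pollfd *fds, nfds_t nfds, int rounds, int timeout_ms)
{
        int hits = 0;

        for (int i = 0; i < rounds; i++) {
                /* No rebuild step: poll( ) reads events and writes only revents. */
                if (poll (fds, nfds, timeout_ms) == -1)
                        return -1;
                if (fds[0].revents & POLLIN)
                        hits++;
        }
        return hits;
}
```

If a pipe holds unread data, every round reports it readable. The events field still holds its original value afterward.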