Saturday, May 23, 2009

Linux System Programming -- Introduction and Essential Concepts



System Calls

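System programming starts with system calls. System calls (often shortened to syscalls) are function invocations made from user space into the kernel to request a service or resource from the operating system.
  • The application tells the kernel which system call to execute, and with what parameters, via machine registers.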
  • As a system programmer, you usually do not need any knowledge of how the kernel handles system call invocation. That knowledge is encoded into the standard calling conventions for the architecture, and handled automatically by the compiler and the C library.
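To make the wrapper idea concrete, here is a minimal sketch, assuming Linux with glibc, that reaches the same kernel system call two ways: through the C library's getpid() wrapper, and through glibc's generic syscall() function with the SYS_getpid number. The function names pid_via_wrapper and pid_via_syscall are hypothetical, chosen only for illustration.

```c
#define _GNU_SOURCE          /* glibc: declare syscall() in <unistd.h> */
#include <assert.h>
#include <sys/syscall.h>     /* SYS_getpid and the other system call numbers */
#include <sys/types.h>
#include <unistd.h>

/* Path 1: the C library's wrapper issues the system call for us. */
pid_t pid_via_wrapper(void)
{
    return getpid();
}

/* Path 2: the generic syscall() entry point with an explicit call number.
 * glibc still loads the number and arguments into the registers the ABI
 * expects and executes the architecture's trap instruction. */
pid_t pid_via_syscall(void)
{
    long ret = syscall(SYS_getpid);
    assert(ret > 0);                 /* getpid cannot fail */
    return (pid_t)ret;
}
```

Both functions return the same pid. In everyday code the wrapper is the idiomatic choice; syscall() is mainly useful for system calls that glibc does not wrap.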

The C Library

The GNU C library (glibc) provides more than its name suggests. In addition to implementing the standard C library, glibc provides wrappers for system calls, threading support, and basic application facilities.
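As a small, glibc-specific illustration (it will not build against other C libraries such as musl), the sketch below reports both the glibc version the program was compiled against and the version actually loaded at run time. describe_glibc is a hypothetical helper; gnu_get_libc_version() and the __GLIBC__ and __GLIBC_MINOR__ macros are real glibc facilities.

```c
#include <assert.h>
#include <gnu/libc-version.h>   /* glibc extension: gnu_get_libc_version() */
#include <stddef.h>
#include <stdio.h>

/* __GLIBC__ and __GLIBC_MINOR__ are fixed when the program is compiled;
 * gnu_get_libc_version() asks the shared library loaded at run time. */
int describe_glibc(char *buf, size_t size)
{
    const char *runtime = gnu_get_libc_version();
    assert(runtime != NULL);
    return snprintf(buf, size, "compiled against glibc %d.%d, running %s",
                    __GLIBC__, __GLIBC_MINOR__, runtime);
}
```

The two versions can differ when a binary built on one system runs on another with a newer glibc.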

The C Compiler

In Linux, the standard C compiler is provided by the GNU Compiler Collection (gcc).

APIs and ABIs: both define and describe the interfaces between different pieces of computer software.
  • APIs
    • Application programming interface
    • An API defines the interfaces by which one piece of software communicates with another at the source level.
    • A real-world example is the API defined by the C standard and implemented by the standard C library.
  • ABIs
    • Whereas an API defines a source interface, an ABI defines the low-level binary interface between two or more pieces of software on a particular architecture.
    • An ABI ensures binary compatibility, guaranteeing that a piece of object code will function on any system with the same ABI, without requiring recompilation.
    • ABIs are concerned with issues such as calling conventions, byte ordering, register use, system call invocation, linking, library behavior, and the binary object format.
    • The ABI is enforced by the toolchain: the compiler, the linker, and so on.
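Two ABI details from the list above, data-type sizes and byte ordering, can be observed from C. A minimal sketch, assuming a Linux target and a C11 compiler; is_little_endian is a hypothetical helper name:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>
#include <string.h>

/* Linux ABIs are ILP32 (e.g. i386) or LP64 (e.g. x86-64): in both, a long
 * is as wide as a pointer. The ABI fixes this; the C standard does not. */
static_assert(sizeof(long) == sizeof(void *), "unexpected Linux data model");

/* Byte order is part of the ABI. Returns 1 if the lowest-addressed byte of
 * a multi-byte integer is its least significant byte (little-endian). */
int is_little_endian(void)
{
    uint32_t value = 1;
    unsigned char first;
    memcpy(&first, &value, 1);       /* copy out the lowest-addressed byte */
    return first == 1;
}
```

On x86-64 the function returns 1. The static_assert refuses to compile on a platform whose data model breaks the assumption, which is the kind of guarantee an ABI provides and the C standard alone does not.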
Standards


  • POSIX
    • In the mid-1980s, the Institute of Electrical and Electronics Engineers (IEEE) spearheaded an effort to standardize system-level interfaces on Unix systems. Richard Stallman, founder of the Free Software movement, suggested the standard be named POSIX (pronounced pahz-icks), which now stands for Portable Operating System Interface.
  • C Language Standards
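    • Dennis Ritchie and Brian Kernighan’s famed book, The C Programming Language (1978), served for many years as the informal C specification; this version of the language came to be known as K&R C.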
    • In 1990, the International Organization for Standardization (ISO) ratified ISO C90.
    • In 1995, ISO released a minor revision of the language, ISO C95.
    • This was followed in 1999 with a large update to the language, ISO C99, that introduced many new features, including inline functions, new data types, variable-length arrays, C++-style comments, and new library functions.
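Several of the C99 features named above fit into a few lines. This sketch (square and sum_of_squares are hypothetical names) uses an inline function, fixed-width integer types from <stdint.h>, a variable-length array, loop-scoped declarations, and a C++-style comment. It assumes a compiler in C99 mode or later, which recent gcc versions use by default.

```c
#include <assert.h>
#include <stdint.h>    /* C99: fixed-width integer types */

// C99 accepts C++-style line comments like this one.

/* C99 inline function; "static inline" needs no external definition. */
static inline int64_t square(int32_t x)
{
    return (int64_t)x * x;
}

/* Sum of the squares of 1..n, staged in a variable-length array. */
int64_t sum_of_squares(int n)
{
    assert(n > 0);                      /* a VLA must have a positive size */
    int32_t values[n];                  /* VLA: size known only at run time */
    for (int i = 0; i < n; i++)         /* C99: declaration in the for-init */
        values[i] = i + 1;

    int64_t total = 0;
    for (int i = 0; i < n; i++)
        total += square(values[i]);
    return total;
}
```

For example, sum_of_squares(3) returns 14 (1 + 4 + 9).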
Linux and the Standards



Concepts of Linux Programming


Files and the Filesystem
  • Regular files
    • A regular file contains bytes of data, organized into a linear array called a byte stream.
    • Any of the bytes within a file may be read from or written to. These operations start at a specific byte, which is one’s conceptual “location” within the file. This location is called the file position or file offset.
    • Writing a byte to a file position beyond the end of the file will cause the intervening bytes to be padded with zeros.
    • It is not possible to write bytes to a position before the beginning of a file.
    • Writing a byte to the middle of a file overwrites the byte previously located at that offset. Thus, it is not possible to expand a file by writing into the middle of it.
    • The size of a file is measured in bytes, and is called its length.
    • A file is referenced by an inode (originally information node), which is assigned a unique numerical value. This value is called the inode number, often abbreviated as i-number or ino.
    • Files are always opened from user space by a name, not an inode number.

  • Directories and links
    • A directory acts as a mapping of human-readable names to inode numbers. A name and inode pair is called a link.
    • Initially, there is only one directory on the disk, the root directory. This directory is usually denoted by the path /.
    • A pathname that starts at the root directory is said to be fully qualified, and is called an absolute pathname.
    • Some pathnames are not fully qualified; instead, they are provided relative to some other directory (for example, todo/plunder). These paths are called relative pathnames. When provided with a relative pathname, the kernel begins the pathname resolution in the current working directory.
    • Hard links
      • When multiple links map different names to the same inode, we call them hard links.
      • Hard links allow for complex filesystem structures with multiple pathnames pointing to the same data.
      • Deleting a file involves unlinking it from the directory structure, which is done simply by removing its name and inode pair from a directory.
      • When a pathname is unlinked, the link count is decremented by one; only when it reaches zero are the inode and its associated data actually removed from the filesystem.
    • Symbolic links (symlinks)
      • Hard links cannot span filesystems because an inode number is meaningless outside of the inode’s own filesystem. To allow links that can span filesystems, and that are a bit simpler and less transparent, Unix systems also implement symbolic links (often shortened to symlinks).
      • A symbolic link that points to a nonexistent file is called a broken link.
  • Special files
    • Special files are kernel objects that are represented as files.
    • Linux supports four: block device files, character device files, named pipes, and Unix domain sockets.
    • Special files are a way to let certain abstractions fit into the filesystem, partaking in the everything-is-a-file paradigm. Linux provides a system call, mknod( ), to create a special file.
    • Device files may be opened, read from, and written to, allowing user space to access and manipulate devices (both physical and virtual) on the system.
    • Unix devices are generally broken into two groups: character devices and block devices.
      • A character device is accessed as a linear queue of bytes. The device driver places bytes onto the queue, one by one, and user space reads the bytes in the order that they were placed on the queue.
      • A block device, in contrast, is accessed as an array of bytes. The device driver maps the bytes over a seekable device, and user space is free to access any valid bytes in the array, in any order.
    • Named pipes (often called FIFOs, short for “first in, first out”) are an interprocess communication (IPC) mechanism that provides a communication channel over a file descriptor, accessed via a special file.
    • Sockets are an advanced form of IPC that allow for communication between two different processes, not only on the same machine, but on two different machines. Unix domain sockets use a special file residing on a filesystem, often simply called a socket file.
  • Filesystems and namespaces
    • Linux, like all Unix systems, provides a global and unified namespace of files and directories.
    • A filesystem is a collection of files and directories in a formal and valid hierarchy.
    • Filesystems usually exist physically (i.e., are stored on disk), although Linux also supports virtual filesystems that exist only in memory, and network filesystems that exist on machines across the network.
    • Linux supports a wide range of filesystems, including media-specific filesystems (for example, ISO9660), network filesystems (NFS), native filesystems (ext3), filesystems from other Unix systems (XFS), and even filesystems from non-Unix systems (FAT).
    • The smallest addressable unit on a block device is the sector. A block device cannot transfer or access a unit of data smaller than a sector.
    • The smallest logically addressable unit on a filesystem is the block. The block is an abstraction of the filesystem, not of the physical media on which the filesystem resides.
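Several behaviors from this section (writing past the end of a file, link counts, symbolic links, and special files such as FIFOs) can be observed with a few POSIX calls. A minimal sketch, assuming Linux with glibc; write_past_end, link_count, and file_type are hypothetical helper names:

```c
#define _GNU_SOURCE          /* glibc: expose POSIX/XSI declarations */
#include <assert.h>
#include <fcntl.h>           /* open() */
#include <stdio.h>           /* snprintf() */
#include <stdlib.h>          /* mkdtemp() */
#include <string.h>
#include <sys/stat.h>        /* stat(), lstat(), mkfifo(), S_IS*() */
#include <unistd.h>          /* lseek(), write(), link(), symlink(), unlink() */

/* Seek `gap` bytes past the end of the file and write one byte there. The
 * skipped bytes read back as zeros, so the length grows by gap + 1. */
int write_past_end(int fd, off_t gap)
{
    assert(gap >= 0);
    if (lseek(fd, gap, SEEK_END) == (off_t)-1)
        return -1;
    return write(fd, "X", 1) == 1 ? 0 : -1;
}

/* Number of links (name and inode pairs) that refer to this inode. */
long link_count(const char *path)
{
    struct stat st;
    return stat(path, &st) == -1 ? -1 : (long)st.st_nlink;
}

/* Type of the path itself (lstat() does not follow symlinks), using the
 * letters ls -l prints: '-' regular, 'd' directory, 'l' symlink, 'p' FIFO,
 * 'c' character device, 'b' block device, 's' socket. */
char file_type(const char *path)
{
    struct stat st;
    if (lstat(path, &st) == -1)
        return '?';
    if (S_ISREG(st.st_mode))  return '-';
    if (S_ISDIR(st.st_mode))  return 'd';
    if (S_ISLNK(st.st_mode))  return 'l';
    if (S_ISFIFO(st.st_mode)) return 'p';
    if (S_ISCHR(st.st_mode))  return 'c';
    if (S_ISBLK(st.st_mode))  return 'b';
    if (S_ISSOCK(st.st_mode)) return 's';
    return '?';
}
```

For example, after writing "ab" to a new file and calling write_past_end(fd, 3), the file is 6 bytes long and reads back as a, b, three zero bytes, and X. Creating a hard link with link() raises link_count() from 1 to 2 and unlink() lowers it again, while file_type() reports 'l' for a symlink, 'p' for a FIFO made with mkfifo(), and 'c' for /dev/null.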
Processes


  • Processes are object code in execution: active, alive, running programs. But they’re more than just object code—processes consist of data, resources, state, and a virtualized computer.
  • Processes begin life as executable object code, which is machine-runnable code in an executable format that the kernel understands (the format most common in Linux is ELF).
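  • The most important and common sections are the text section (executable code and read-only data), the data section (initialized data, such as C variables with defined values), and the bss section (uninitialized global data, which is zero-filled when the program is loaded).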
  • Processes typically request and manipulate resources only through system calls.
  • A process’ resources, along with data and statistics related to the process, are stored inside the kernel in the process’ process descriptor.
  • Threads
    • Each process consists of one or more threads of execution (usually just called threads).
    • A thread is the unit of activity within a process, the abstraction responsible for executing code, and maintaining the process’ running state.
    • Most processes consist of only a single thread; they are called single-threaded. Processes that contain multiple threads are said to be multithreaded.
    • A thread consists of a stack (which stores its local variables, just as the process stack does on nonthreaded systems), processor state, and a current location in the object code (usually stored in the processor’s instruction pointer). The majority of the remaining parts of a process are shared among all threads.
    • Internally, the Linux kernel implements a unique view of threads: they are simply normal processes that happen to share some resources (most notably, an address space). In user space, Linux implements threads in accordance with POSIX 1003.1c (known as pthreads). The name of the current Linux thread implementation, which is part of glibc, is the Native POSIX Threading Library (NPTL).
  • Process hierarchy
    • Each process is identified by a unique positive integer called the process ID (pid). The pid of the first process, the init process, is 1.
    • New processes are created via the fork( ) system call. This system call creates a duplicate of the calling process. The original process is called the parent; the new process is called the child.
    • If a parent process terminates before its child, the kernel will reparent the child to the init process.
    • A process that has terminated, but not yet been waited upon, is called a zombie.
  • Users and Groups
    • Authorization in Linux is provided by users and groups. Each user is associated with a unique positive integer called the user ID (uid).
    • Each process is in turn associated with exactly one uid, which identifies the user running the process, and is called the process’ real uid.
    • In addition to the real uid, each process also has an effective uid, a saved uid, and a filesystem uid.
    • Each user may belong to one or more groups, including a primary or login group, listed in /etc/passwd, and possibly a number of supplemental groups, listed in /etc/group.
    • Each process is therefore also associated with a corresponding group ID (gid), and has a real gid, an effective gid, a saved gid, and a filesystem gid.
  • Permissions
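    • Each file is owned by a user and a group. Its permission bits describe whether the owning user, the owning group, and everyone else may read, write, and execute the file.
    • Table 1-1. Permission bits and their values
      • 400 (r--------): Owner may read
      • 200 (-w-------): Owner may write
      • 100 (--x------): Owner may execute
      • 040 (---r-----): Group may read
      • 020 (----w----): Group may write
      • 010 (-----x---): Group may execute
      • 004 (------r--): Everyone else may read
      • 002 (-------w-): Everyone else may write
      • 001 (--------x): Everyone else may execute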
  • Signals
    • Signals are a mechanism for one-way asynchronous notifications. A signal may be sent from the kernel to a process, from a process to another process, or from a process to itself.
    • Handled signals cause the execution of a user-supplied signal handler function. The program jumps to this function as soon as the signal is received, and when the signal handler returns, control of the program resumes at the previously interrupted instruction.
  • Interprocess Communication
    • Allowing processes to exchange information and notify each other of events is one of an operating system’s most important jobs.
    • IPC mechanisms supported by Linux include pipes, named pipes, semaphores, message queues, shared memory, and futexes (short for "fast userspace mutexes"; Ulrich Drepper's paper on them is aptly titled "Futexes Are Tricky").
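The process-hierarchy and IPC points above (fork( ) duplicating the caller, a parent waiting on its child, zombies, and pipes) can be sketched in one function. A minimal sketch, assuming Linux/POSIX; child_roundtrip is a hypothetical name, and error handling is kept deliberately simple:

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>        /* waitpid(), WIFEXITED(), WEXITSTATUS() */
#include <unistd.h>          /* pipe(), fork(), read(), write(), _exit() */

/* Fork a child that sends msg to its parent over a pipe and exits with
 * `code`. The parent collects the message into buf (always NUL-terminated),
 * then waits on the child; until that waitpid() call, the exited child is a
 * zombie. Returns the child's exit status, or -1 on error. */
int child_roundtrip(const char *msg, char *buf, size_t size, int code)
{
    assert(msg != NULL && buf != NULL && size > 0);
    int fds[2];                          /* fds[0] read end, fds[1] write end */
    if (pipe(fds) == -1)
        return -1;

    pid_t pid = fork();                  /* child is a duplicate of the parent */
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                      /* child */
        close(fds[0]);
        size_t len = strlen(msg);
        ssize_t n = write(fds[1], msg, len);
        _exit(n == (ssize_t)len ? code : 127);
    }

    close(fds[1]);                       /* parent keeps only the read end */
    size_t total = 0;
    ssize_t n;
    while (total < size - 1 &&
           (n = read(fds[0], buf + total, size - 1 - total)) > 0)
        total += (size_t)n;
    buf[total] = '\0';
    close(fds[0]);

    int status;
    if (waitpid(pid, &status, 0) == -1)  /* reap the child */
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

child_roundtrip("hello from child", buf, sizeof buf, 3) leaves "hello from child" in buf and returns 3. If the parent skipped waitpid(), the exited child would stay a zombie until the parent itself exited and the child was reparented to init, which waits on it.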

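For the signal bullets above, a handled signal can be set up with sigaction(). In this minimal sketch (catch_signal, record_signal, and last_signal are hypothetical names), the handler only records which signal arrived, since little else is safe to do inside a handler; raise() then sends SIGUSR1 to the process itself.

```c
#define _GNU_SOURCE                   /* glibc: full sigaction() declarations */
#include <assert.h>
#include <signal.h>
#include <string.h>

/* volatile sig_atomic_t is the type C guarantees a handler can write safely. */
static volatile sig_atomic_t last_signal = 0;

/* Keep the handler minimal: just record the signal number. */
static void record_signal(int signo)
{
    last_signal = signo;
}

/* Install record_signal() for signo. Returns 0 on success, -1 on error. */
int catch_signal(int signo)
{
    assert(signo > 0);
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = record_signal;
    sigemptyset(&sa.sa_mask);          /* block no extra signals in the handler */
    sa.sa_flags = SA_RESTART;          /* restart interrupted system calls */
    return sigaction(signo, &sa, NULL);
}
```

After catch_signal(SIGUSR1) and raise(SIGUSR1), last_signal equals SIGUSR1: in a single-threaded program the handler runs before raise() returns.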
Headers

Linux system programming revolves around a handful of headers. Both the kernel itself and glibc provide the headers used in system-level programming. These headers include the standard C fare and the usual Unix offerings.
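As an illustration of mixing the two families of headers (the particular headers are chosen for this example), the sketch below combines standard C's <string.h> with the Unix headers <unistd.h> and <errno.h> in a hypothetical helper, write_str, that keeps calling write() until the whole string is written, because write() may write fewer bytes than requested.

```c
#include <assert.h>
#include <errno.h>      /* errno, EINTR */
#include <string.h>     /* standard C: strlen() */
#include <unistd.h>     /* Unix: write(), ssize_t */

/* Write all of the NUL-terminated string s to fd, retrying on short writes
 * and on EINTR. Returns 0 on success, -1 on error (errno set by write()). */
int write_str(int fd, const char *s)
{
    assert(s != NULL);
    size_t left = strlen(s);
    while (left > 0) {
        ssize_t n = write(fd, s, left);
        if (n == -1) {
            if (errno == EINTR)        /* interrupted by a signal: try again */
                continue;
            return -1;
        }
        s += n;                        /* skip past what was written */
        left -= (size_t)n;
    }
    return 0;
}
```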

Error Handling

Table 1-2. Errors and their descriptions
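A failing system call usually returns -1 and sets the global variable errno to an error code (the values described in Table 1-2); strerror() or perror() turns that code into readable text. A minimal sketch; try_open is a hypothetical helper:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Open path read-only. On failure, print the error the way many Unix tools
 * do and return the errno value; on success, close the file and return 0. */
int try_open(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        int saved = errno;             /* save it before calling anything else */
        fprintf(stderr, "open %s: %s\n", path, strerror(saved));
        return saved;
    }
    close(fd);
    return 0;
}
```

try_open("/nonexistent-dir/missing-file") prints a message such as "No such file or directory" and returns ENOENT. Saving errno immediately matters because later library calls, including fprintf(), may overwrite it.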

