This is the Linux kernel capabilities FAQ.

Its history, to the extent that I am able to reconstruct it, is that v2.0 was posted to the Linux kernel list on 1999/04/02 by Boris Tobotras. Thanks to Denis Ducamp for forwarding me a copy.

Cheers
Andrew

Linux Capabilities FAQ 0.2
==========================

1) What is a capability?

The name "capabilities" as used in the Linux kernel can be confusing. First there are capabilities as defined in computer science. A capability is a token used by a process to prove that it is allowed to do an operation on an object. The capability identifies the object and the operations allowed on that object.

A file descriptor is a capability. You create the file descriptor with the "open" call and request read or write permissions. Later, when doing a read or write operation, the kernel uses the file descriptor as an index into a data structure that indicates what operations are allowed. This is an efficient way to check permissions. The necessary data structures are created once during the "open" call. Later read and write calls only have to do a table lookup.

Operations on capabilities include copying capabilities, transferring capabilities between processes, modifying a capability, and revoking a capability. Modifying a capability can be something like taking a read-write file descriptor and making it read-only. A capability often has a notion of an "owner" which is able to invalidate all copies and derived versions of a capability.

Entire OSes are based on this "capability" model, with varying degrees of purity. There are other ways of implementing capabilities than the file descriptor model - traditionally special hardware has been used, but modern systems also use the memory management unit of the CPU.

Then there is something quite different called "POSIX capabilities", which is what Linux uses.
These capabilities are a partitioning of the all-powerful root privilege into a set of distinct privileges (but look at securelevel emulation to find out that this isn't necessarily the whole truth). Users familiar with VMS or "Trusted" versions of other UNIX variants will know this under the name "privileges". The name "capabilities" comes from the now defunct POSIX draft 1003.1e, which used this name.

2) So what is a "POSIX capability"?

A process has three bitmaps, called the inheritable (I), permitted (P), and effective (E) capability sets. Each capability is implemented as a bit in each of these bitmaps, which is either set or unset.

When a process tries to do a privileged operation, the operating system will check the appropriate bit in the effective set of the process (instead of checking whether the effective uid of the process is 0, as is normally done). For example, when a process tries to set the clock, the Linux kernel will check that the process has the CAP_SYS_TIME bit (which is currently bit 25) set in its effective set.

The permitted set of the process indicates the capabilities the process can use. The process can have capabilities set in the permitted set that are not in the effective set. This indicates that the process has temporarily disabled this capability. A process is allowed to set a bit in its effective set only if it is available in the permitted set. The distinction between effective and permitted exists so that processes can "bracket" operations that need privilege.

The inheritable capabilities are the capabilities of the current process that should be inherited by a program executed by the current process. The permitted set of a process is masked against the inheritable set during exec(). Nothing special happens during fork() or clone(). Child processes and threads are given an exact copy of the capabilities of the parent process.

3) What about other entities in the system? Users, Groups, Files?

Files have capabilities.
Conceptually they have the same three bitmaps that processes have, but to avoid confusion we call them by other names. Only executable files have capabilities; libraries don't have capabilities (yet). The three sets are called the allowed set, the forced set, and the effective set.

The allowed set indicates what capabilities the executable is allowed to receive from an execing process. This means that during exec(), the capabilities of the old process are first masked against a set which indicates what the process gives away (the inheritable set of the process), and then they are masked against a set which indicates what capabilities the new process image is allowed to receive (the allowed set of the executable).

The forced set is a set of capabilities created out of thin air and given to the process after execing the executable. The forced set is similar in nature to the setuid feature. In fact, the setuid bit from the filesystem is "read" as a full forced set by the kernel.

The effective set indicates which bits in the permitted set of the new process should be transferred to the effective set of the new process. The effective set is best thought of as a "capability aware" set. It should consist of only 1s if the executable is capability-dumb, or only 0s if the executable is capability-smart. Since the effective set consists of only 0s or only 1s, the filesystem can implement this set using a single bit.

NOTE: Filesystem support for capabilities is not part of Linux 2.2.

Users and Groups don't have associated capabilities from the kernel's point of view, but it is entirely reasonable to associate users or groups with capabilities. By letting the "login" program set some capabilities, it is possible to make role users such as a backup user that will have the CAP_DAC_READ_SEARCH capability and be able to do backups. This could also be implemented as a PAM module, but nobody has implemented one yet.

4) What capabilities exist?
The capabilities available in Linux are listed and documented in the file /usr/src/linux/include/linux/capability.h.

5) Are Linux capabilities hierarchical?

No, you cannot make a "subcapability" out of a Linux capability as in capability-based OSes.

6) How can I use capabilities to make sure Mr. Evil Luser (eluser) can't exploit my "suid" programs?

This is the general outline of how this works, given that filesystem capability support exists. First, you have a PAM module that sets the inheritable capabilities of the login shell of eluser. Then for all "suid" programs on the system, you decide what capabilities they need and set the _allowed_ set of the executable to that set of capabilities.

The capability rule

    new permitted = forced | (allowed & inheritable)

means that you should be careful about setting forced capabilities on executables. In a few cases, this can be useful though. For example, the login program needs to set the inheritable set of the new user and therefore needs an almost full permitted set. So if you want eluser to be able to run login and log in as a different user, you will have to set some forced bits on that executable.

7) What about passing capabilities between processes?

Currently this is done by the system call "capset", which can set the capabilities of another process. This requires the CAP_SETPCAP capability, which you really only want to grant to a _few_ processes. CAP_SETPCAP was originally intended as a workaround to be able to implement filesystem support for capabilities using a daemon outside the kernel.

There have been discussions about implementing socket-level capability passing. This means that you can pass a capability over a socket. No support for this exists in the official kernel yet.

8) I see securelevel has been removed from 2.2 and is superseded by capabilities. How do I emulate securelevel using capabilities?

The capset system call can remove a capability from _all_ processes on the system in one atomic operation.
The setcap utility from the libcap distribution will do this for you. The utility requires the CAP_SETPCAP privilege to do this. The CAP_SETPCAP capability is not enabled by default.

libcap is available from ftp://ftp.kernel.org/pub/linux/libs/security/linux-privs/kernel-2.2/

9) I noticed that the capability.h file lacks some capabilities that are needed to fully emulate 2.0 securelevel. Is there a patch for this?

Actually yes - funny you should ask :-). The problem with 2.0 securelevel is that it, for example, stops root from accessing block devices. At the same time it restricts the use of iopl. These two changes are fundamentally different. Blocking access to block devices means restricting something that usually isn't restricted. Restricting the use of iopl, on the other hand, means restricting (blocking) access to something that is already blocked.

Emulating the parts of 2.0 securelevel that restrict things that are normally not restricted means that the kernel has to have a set of capabilities that are usually _on_ for a normal process (note that this breaks the explanation that capabilities are a partitioning of the root privileges).

There is an experimental patch at

    ftp://ftp.guardian.no/pub/free/linux/capabilities/patch-cap-exp-1

which implements a set of capabilities with the "CAP_USER" prefix:

    cap_user_sock - allowed to use socket()
    cap_user_dev  - allowed to open char/block devices
    cap_user_fifo - allowed to use pipes

These should be enough to emulate 2.0 securelevel (tell me if we need something more).

10) Seems I need a CAP_SETPCAP capability that I don't have to make use of capabilities. How do I enable this capability?

Change the definition of CAP_INIT_EFF_SET and CAP_INIT_INH_SET to the following in include/linux/capability.h:

    #define CAP_INIT_EFF_SET { ~0 }
    #define CAP_INIT_INH_SET { ~0 }

This will start init with a full capability set and not with CAP_SETPCAP removed.
11) How do I start a process with a limited set of capabilities?

Get the libcap library and use the execcap utility. The following example starts the update daemon with only the CAP_SYS_ADMIN capability.

    execcap 'cap_sys_admin=eip' update

12) How do I start a process with a limited set of capabilities under another uid?

Use the sucap utility, which changes uid from root without losing any capabilities. Normally all capabilities are cleared when changing uid from root. The sucap utility requires the CAP_SETPCAP capability. The following example starts update under uid updated and gid updated with CAP_SYS_ADMIN raised in the Effective set.

    sucap updated updated execcap 'cap_sys_admin=eip' update

[ Sucap is currently available from ftp://ftp.guardian.no/pub/free/linux/capabilities/sucap.c. Put it in the progs directory of libcap to compile. ]

13) What are the "capability rules"?

The capability rules are the rules used to set the capabilities of the new process image after an exec. They work like this:

          pI' = pI
    (***) pP' = fP | (fI & pI)
          pE' = pP' & fE          [NB. fE is 0 or ~0]

    I=Inheritable, P=Permitted, E=Effective // p=process, f=file
    ' indicates post-exec().

Now to make sense of the equations, think of fP as the Forced set of the executable, and fI as the Allowed set of the executable. Notice how the Inheritable set isn't touched at all during exec().

14) What are the laws for setting capability bits in the Inheritable, Permitted, and Effective sets?

Bits can be transferred from Permitted to either the Effective or the Inheritable set. Bits can be removed from all sets.

15) Where is the standard on which the Linux capabilities are based?

There used to be a POSIX draft called POSIX.6 and later POSIX 1003.1e. However, after the committee had spent over 10 years on it, POSIX decided that enough was enough and dropped the draft. There will therefore not be a POSIX standard covering security anytime soon. This may, however, lead to the POSIX draft becoming available for free.
-- Best regards, -- Boris.