Separation Kernel for a Secure Real-Time Operating System

2008-02-12T09:50:46+00:00

By Rance J. DeLong, LynuxWorks

The separation kernel (SK), first proposed by Rushby, is a combination of hardware and software that permits multiple functions to be realised on a common set of physical resources without unwanted mutual interference. The MILS (Multiple Independent Levels of Security) initiative, a coalition of leaders from government, industry, and academia, has promulgated the vision of a marketplace for standardized commercial components serving as modular building blocks for high-assurance secure systems. The technical foundation adopted for the so-called MILS architecture is a separation kernel. MILS encourages an aggressive factoring of components and subsystems into units that are small enough to be subject to rigorous scrutiny, and possibly to differing degrees and types of scrutiny.

The modern definition of a separation kernel is captured in the security functional and assurance requirements set forth in the U.S. Government Protection Profile for Separation Kernels in Environments Requiring High Robustness, Version 1.03 (SKPP). The SKPP, written within the framework established by the Common Criteria, provides criteria for a systematic assessment of COTS separation kernels under the oversight of the Common Criteria Evaluation and Validation Scheme (CCEVS) in the United States. High robustness corresponds approximately with the Common Criteria’s EAL 6/7, while medium robustness corresponds approximately with EAL 4/5.

Any COTS avionics partitioning kernel would partially meet the functional requirements of a minimal interpretation of the SKPP, that is, to provide static, coarse-grained partitioning. Such a kernel would then need to be extended to meet the information flow policy enforcement requirements, and other security functional requirements of the SKPP. The bigger challenge would be to meet the security assurance requirements of the Common Criteria for high assurance, and those of the SKPP in particular. These requirements include the use of formal methods to specify the security policy model and the functional specification (at a minimum) and to prove their correspondence. It is beyond dispute that formal methods are best applied during the development of a product rather than after the fact.

The conventional wisdom since circa 1970 has been that it is costly and risky to attempt to retrofit security into an existing product. In the context of the Common Criteria, EAL 4 is widely considered to be the highest assurance level at which a pre-existing product can be practically and cost-effectively evaluated. LynuxWorks™ estimated that the cost of a security retrofit would amount to many millions of dollars, and would yield a less than optimal result in terms of quality, extensibility, and future maintenance cost. There is considerable cost, schedule, and technical risk in a plan founded on the hope of successfully evaluating a pre-existing product that was not developed under the security assurance requirements of the SKPP. This is because the SKPP assurance requirements demand particular characteristics of the internal structure of a product, characteristics unlikely to have been met serendipitously before the requirements had even been written. The clinching assurance consideration is the strict requirement that certain supporting artifacts not only be presented in support of the evaluation, but be accompanied by evidence demonstrating that their development coincided with the development of the product. This requirement is intended to eliminate the loophole of developing high-assurance artifacts after the product is built, when they would have little or no influence on the intrinsic assurance of the product.

Apart from the daunting issues related to the assurance of pre-existing products, functionality was another area of concern. As with the assurance requirements, the SKPP contains functional requirements that are not met by any pre-existing product, though the SKPP does admit a degenerate separation kernel model that existing products could reach with appropriate functional modifications. One of the best-known and most respected principles of secure system design is the principle of least privilege, which states that an entity should not be granted more privilege than is necessary for it to perform its function. In this way the damage that the entity could do, should it fail, is limited. To put this principle into practice in a computer system requires that the system provide a sufficiently fine granularity of privilege. A well-known weakness of traditional UNIX® systems is the root user, which can break all the rules. The solution is to factor the privileges of the all-powerful root into many small privileges. The principle of least privilege may be applied in many dimensions of secure system design.
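As an illustration only, the following C sketch shows one way such a factoring might look: a monolithic root authority is replaced by fine-grained privileges that are checked individually per operation. The privilege names and structures are hypothetical and are not drawn from LynxSecure or any other product.

    /* Illustrative sketch only: factoring a monolithic "root" privilege into
     * fine-grained privileges checked per operation. The names are hypothetical
     * and do not correspond to any particular product's API. */

    #include <stdint.h>
    #include <stdbool.h>

    typedef uint32_t priv_mask_t;

    enum {
        PRIV_CONFIG_SCHEDULE = 1u << 0,  /* may alter the partition schedule */
        PRIV_CONFIG_MEMORY   = 1u << 1,  /* may reassign memory regions      */
        PRIV_AUDIT_READ      = 1u << 2,  /* may read the audit trail         */
        PRIV_PARTITION_HALT  = 1u << 3,  /* may halt a partition             */
    };

    /* Each subject carries only the privileges it needs for its function. */
    struct subject {
        const char *name;
        priv_mask_t privs;
    };

    /* Least privilege: an operation succeeds only if the subject holds the
     * specific privilege, not some all-powerful "root" bit. */
    static bool has_priv(const struct subject *s, priv_mask_t required)
    {
        return (s->privs & required) == required;
    }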

[Figure: LynxSecure architecture]

The figure illustrates LynxSecure running three different paravirtualised guest operating systems in virtual machines and multiple instances of the minimal runtime implementing high-assurance applications (or parts of such applications) in small partitions.

In a separation kernel the principle of least privilege demands that the SK support a sufficiently fine granularity of resources and control over the interactions of active entities with resources and with each other. As mentioned previously, the SKPP permits a degenerate case of resource definition and information flow control wherein the SK creates only partitions and controls the flow of data between partitions. However, the SKPP encourages a more granular approach wherein the SK creates subjects, resources, and partitions, and can control the flow of data among subjects and resources and/or among partitions. This so-called least privilege separation kernel (LPSK) is described at length in the literature by several of the SKPP authors. In an LPSK a partition consists of a set of subjects and a set of resources, either of which may be empty. The information flow policy may then be defined at the level of individual subjects and resources.
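A hedged sketch in C may make the LPSK model concrete: subjects, resources, and partitions are represented explicitly, and every access is checked against a subject/resource flow policy with a default-deny rule. The identifiers and data layout are assumptions for illustration, not the interface of any SKPP-conformant kernel.

    /* A minimal sketch of a least-privilege flow policy, assuming a static
     * configuration compiled into the kernel image. Identifiers and structures
     * are illustrative, not taken from any SKPP-conformant product. */

    #include <stdbool.h>
    #include <stddef.h>

    typedef unsigned subject_id;    /* active entity (e.g., a thread)          */
    typedef unsigned resource_id;   /* passive entity (e.g., a memory region)  */
    typedef unsigned partition_id;

    enum flow_mode { FLOW_NONE = 0, FLOW_READ = 1, FLOW_WRITE = 2, FLOW_RW = 3 };

    /* One entry of the subject/resource flow policy: finer-grained than a
     * partition-to-partition rule, as the LPSK model allows. */
    struct flow_rule {
        subject_id     subj;
        resource_id    res;
        enum flow_mode mode;
    };

    /* Partition membership: a partition is a set of subjects and a set of
     * resources, either of which may be empty. */
    struct partition {
        partition_id       id;
        const subject_id  *subjects;
        size_t             n_subjects;
        const resource_id *resources;
        size_t             n_resources;
    };

    /* The kernel mediates every access against the flow policy. */
    static bool flow_permitted(const struct flow_rule *policy, size_t n_rules,
                               subject_id s, resource_id r, enum flow_mode want)
    {
        for (size_t i = 0; i < n_rules; i++) {
            if (policy[i].subj == s && policy[i].res == r)
                return (policy[i].mode & want) == want;
        }
        return false;   /* default deny */
    }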

A real-time system, and a safety-critical system in particular, requires determinism in its operation and in the service it provides to the applications it supports. Tasks must receive their allocated portions of memory and processor time with high reliability. An added consideration in a security-critical system is that of covert channels, whereby colluding users can convey information between resources in a way that would be forbidden by the policy. To achieve these ends, RTOSs typically provide a fixed schedule consisting of a major frame that allocates fixed time slices to subjects/partitions, and they perform the scheduled transfers of control very accurately.
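The following C fragment sketches what such a fixed schedule might look like: a major frame is a table of fixed-duration time slices, each owned by one partition, and the kernel simply steps through the table. The partition roles and durations are invented for the example.

    /* Illustrative fixed schedule: a major frame divided into fixed time slices,
     * each assigned to one partition. Values and names are hypothetical. */

    #include <stdint.h>
    #include <stddef.h>

    struct time_slice {
        unsigned  partition;     /* partition that owns this slice               */
        uint32_t  duration_us;   /* fixed duration, enforced by a hardware timer */
    };

    /* One major frame; it repeats indefinitely, so each partition receives its
     * processor allocation at a fixed, predictable point in every frame. */
    static const struct time_slice major_frame[] = {
        { .partition = 0, .duration_us = 5000 },   /* flight control   */
        { .partition = 1, .duration_us = 3000 },   /* sensor fusion    */
        { .partition = 2, .duration_us = 2000 },   /* maintenance/log  */
    };
    static const size_t frame_len = sizeof major_frame / sizeof major_frame[0];

    /* At each slice boundary the kernel switches to the next entry. */
    static size_t next_slice(size_t current)
    {
        return (current + 1) % frame_len;
    }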

A static fixed schedule is most useful when the respective duty cycles of the tasks are predictable and a schedule can be devised that meets the needs of each task. As a system is used for less predictable workloads, or for combinations of predictable and unpredictable workloads, its fixed schedule becomes a millstone, and processor capacity is wasted by over-provisioning the tasks with unpredictable workloads. When tasks demanding determinism share resources with non-deterministic tasks, we are motivated to find a way to meet the needs of both; the flexible scheduling of LynxSecure does this.
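One way to reconcile the two, sketched below under stated assumptions, is to let a fixed slice owner that has no runnable work donate its remaining time to a background partition serving the unpredictable workloads. This is an illustrative approach only, not a description of LynxSecure's actual scheduling algorithm.

    /* Sketch: deterministic tasks keep their guaranteed slices; slack is
     * reclaimed by a background partition rather than wasted by
     * over-provisioning. Names and the query function are hypothetical. */

    #include <stdbool.h>

    #define BACKGROUND_PARTITION  3u

    /* Hypothetical query: does the slice owner still have runnable work? */
    extern bool partition_has_runnable_work(unsigned partition);

    static unsigned select_partition(unsigned slice_owner)
    {
        if (partition_has_runnable_work(slice_owner))
            return slice_owner;          /* honour the fixed allocation      */
        return BACKGROUND_PARTITION;     /* donate the remaining slack       */
    }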

Traditional deeply embedded applications have been well served by a partitioning kernel with a configuration and schedule fixed at design time. Higher-level applications, servers, workstations, and future information appliances, however, will exhibit an increasing need for dynamism. The changing shape of coalitions and the changing workload of components, due to reconfiguration at the level of systems-of-systems, will place greater demands for dynamism upon infrastructure components. Successful building blocks for systems of the future will be those possessing hardware flexibility, software flexibility, and dynamic reconfigurability.

Resource configuration is the process of constructing resources from a collection of primitive resources and giving them identities. Dynamic reconfiguration is the process of reconstituting resources and changing their identities within the bounds of a subset of the collection of primitive resources. A configuration change agent may change the number, identity, and protection attributes of resources such as memory regions, and allocate those regions to partitions within a domain. As with scheduling, dynamic reconfiguration must be performed by a trusted agent. If the agent were permitted system-wide configuration privileges then it would have to be trusted at the level of the kernel itself, because every activity running on the kernel could potentially be disrupted by a faulty or malicious configuration change agent.
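The idea of a domain-scoped change agent can be sketched as follows; the structures are hypothetical, but they show how a reconfiguration request is denied whenever it reaches outside the agent's authorized domain, so a faulty or malicious agent cannot disturb the rest of the system.

    /* Sketch of a domain-scoped reconfiguration check. The structures are
     * hypothetical and do not describe any particular kernel's interface. */

    #include <stdbool.h>
    #include <stdint.h>
    #include <stddef.h>

    struct mem_region {
        uintptr_t base;
        size_t    size;
        unsigned  domain;        /* domain to which this region belongs     */
        unsigned  partition;     /* partition currently holding the region  */
        unsigned  prot;          /* protection attributes (r/w/x)           */
    };

    struct config_agent {
        unsigned authorized_domain;   /* the only domain this agent may touch */
    };

    /* Reconfiguration request: move a region to another partition and change
     * its protection attributes, within the agent's own domain only. */
    static bool reconfigure_region(const struct config_agent *agent,
                                   struct mem_region *r,
                                   unsigned new_partition, unsigned new_prot)
    {
        if (r->domain != agent->authorized_domain)
            return false;             /* outside the agent's authority: deny */
        r->partition = new_partition;
        r->prot      = new_prot;
        return true;
    }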

Memory loaning is a capability that LynuxWorks has devised in response to the requirement for efficient support of MILS middleware, specifically the zero-copy inter-partition data movement needed by the Partitioning Communications System (PCS). Memory loaning is a direct response to the zero-copy objective, and is itself one behavior of an even more general cluster of features that enables the creation of trusted object managers.
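To illustrate the concept only (this is not LynxSecure's interface), a memory loan can be modelled as a descriptor recording the lender, the borrower, the loaned region, and the access granted, with the kernel adjusting the borrower's mappings when the loan is granted or revoked so that data moves between partitions without being copied.

    /* A minimal sketch of a memory loan descriptor supporting zero-copy
     * inter-partition transfer: the lender temporarily grants a borrower
     * access to a region, and the loan can later be revoked. Hypothetical. */

    #include <stdbool.h>
    #include <stdint.h>
    #include <stddef.h>

    struct mem_loan {
        unsigned   lender;        /* partition that owns the region          */
        unsigned   borrower;      /* partition granted temporary access      */
        uintptr_t  base;          /* start of the loaned region              */
        size_t     size;
        bool       writable;      /* loan may be read-only or read-write     */
        bool       active;
    };

    /* Grant: the region is made visible to the borrower, so data moves
     * without being copied between partitions. */
    static void loan_grant(struct mem_loan *loan)
    {
        loan->active = true;
        /* ...update the borrower's memory mappings here... */
    }

    /* Revoke: the lender reclaims exclusive access; the borrower's mapping
     * is removed before the region is reused. */
    static void loan_revoke(struct mem_loan *loan)
    {
        loan->active = false;
        /* ...remove the borrower's memory mappings here... */
    }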