Building in RTOS Support for Safety- & Security-Critical Systems

2011-08-27T12:36:41+00:00

Safety- and security-critical systems both exist to protect people's lives. But despite their similarly critical nature, the two kinds of system are designed to function in very different worlds. With different environmental assumptions, different problem spaces, and different governing legislation, each has its own requirements when it comes to selecting foundational technology such as the platform operating system.

Safety- and security-critical systems are by no means mutually exclusive, but when building critical systems it is essential to have the right tool for the job. No one-size-fits-all solution covers both realms without sacrificing some aspect of safety or security. For that reason, LynuxWorks offers two independent RTOS products: LynxOS-178 for safety-critical platforms and LynxSecure for security-critical platforms.

LynxOS-178 for Safety-critical Systems
The primary objective for safety-critical systems is to maintain operation. Whether in avionics, medical, automotive, or SCADA systems, the slightest flaw can ripple down to a system failure and loss of life. Any technology integrated into these systems must be closely scrutinized to ensure that system reliability is preserved. While robustness is a primary concern, these systems are constantly evolving to increase functionality, reduce footprint, and save operational cost. To meet these demands, systems become more software-intensive and more complex, which makes robustness harder to achieve.

In an airplane, for example, multiple functions of different criticality levels are located on the same platform to reduce the cost of maintaining separate platforms. An aircraft’s cabin entertainment function might be hosted on the same operating system as its flight control function. If the cabin entertainment system corrupts the flight control function’s memory, it could lead to a deadly failure condition of the aircraft’s ailerons.

To mitigate this problem and increase system reliability, a new class of operating systems emerged: partitioning real-time operating systems, or p-RTOSes. A p-RTOS creates isolated computing zones, or “partitions,” so that software running in one partition does not affect the behavior or performance of software running in another. LynxOS-178 (Figure 1 below) is a partitioning RTOS that allows safety-critical functions of varying criticality to run simultaneously on the same platform without the risk that one function corrupts underlying platform resources or starves other timing-critical functions. LynxOS-178 achieves this through time, space, and resource partitioning.

Time partitioning allocates a strict budget of CPU time to each partition. Using these CPU budgets, a system timer interrupt, and a schedule, LynxOS-178 operates in a privileged mode to control the order in which partitions execute on the CPU and the amount of CPU time granted to the software in each partition. If a function within a partition attempts to overrun its CPU budget, the timer interrupt fires, LynxOS-178 takes control of the CPU, preempts the offending function, and allows functions within the next partition in the schedule to run.
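The budget-and-preempt behavior described above can be sketched as a small simulation. This is an illustrative model only, not LynxOS-178 code: the `Partition` class, the tick unit, the `maintenance_log` partition, and the choice to leave unused budget idle are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    name: str
    budget_ticks: int  # fixed CPU budget per pass (hypothetical unit: timer ticks)
    demand_ticks: int  # CPU time the partition's software tries to consume


def run_major_frame(schedule):
    """Simulate one pass through a fixed partition schedule.

    Returns (name, window_start, ticks_used, preempted) for each partition.
    Assumption: every window starts at a fixed offset, so an overrun in one
    partition never delays the start of the next.
    """
    timeline = []
    window_start = 0
    for p in schedule:
        used = min(p.demand_ticks, p.budget_ticks)
        # Stands in for the timer interrupt firing when the budget runs out.
        preempted = p.demand_ticks > p.budget_ticks
        timeline.append((p.name, window_start, used, preempted))
        window_start += p.budget_ticks
    return timeline


schedule = [
    Partition("flight_control", budget_ticks=50, demand_ticks=40),
    Partition("cabin_entertainment", budget_ticks=20, demand_ticks=35),
    Partition("maintenance_log", budget_ticks=10, demand_ticks=10),
]
timeline = run_major_frame(schedule)
for name, start, used, preempted in timeline:
    print(f"{name}: start={start} used={used} preempted={preempted}")
```

In this model the entertainment partition asks for more time than its budget and is cut off, while the partitions around it start at the same offsets they would have had anyway.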

Space partitioning allocates an independent quota of memory (for example, RAM and stack space) to each partition. LynxOS-178 uses the CPU’s Memory Management Unit (MMU) to enforce this partitioning.
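As a rough illustration of what MMU-enforced space partitioning guarantees, the sketch below models each partition's memory quota as an address range, checks that the ranges are disjoint, and rejects any access outside a partition's own range. The addresses, sizes, and the `Region`, `MemoryFault`, and `access` names are hypothetical; on real hardware the check is performed by the MMU, not by software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    base: int
    size: int

    def contains(self, addr):
        return self.base <= addr < self.base + self.size

    def overlaps(self, other):
        return (self.base < other.base + other.size
                and other.base < self.base + self.size)


class MemoryFault(Exception):
    """Stands in for the hardware fault raised on an illegal access."""


# Hypothetical memory quotas, one region per partition.
REGIONS = {
    "flight_control": Region(base=0x1000_0000, size=0x0010_0000),
    "cabin_entertainment": Region(base=0x2000_0000, size=0x0040_0000),
}


def check_disjoint(regions):
    """Return True if no two partitions share any address."""
    names = list(regions)
    return all(
        not regions[a].overlaps(regions[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    )


def access(partition, addr):
    """Allow the access only if addr lies inside the partition's own region."""
    if not REGIONS[partition].contains(addr):
        raise MemoryFault(f"{partition} faulted at {addr:#x}")
    return addr
```

With this layout, `access("cabin_entertainment", 0x1000_0040)` raises `MemoryFault`, which corresponds to the entertainment function being stopped before it can touch flight-control memory.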

Resource partitioning grants each partition explicit access (read/write, write-only, read-only, or none) to resources such as interrupts and I/O devices, and LynxOS-178 enforces these access controls.
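One way to picture resource partitioning is as a static table that maps each (partition, resource) pair to one of the four access modes, with "none" as the default. The following is a minimal sketch under that assumption; the `Access` flags, the `POLICY` table, and the device names are invented for illustration and do not reflect the LynxOS-178 configuration format.

```python
from enum import Flag


class Access(Flag):
    NONE = 0
    READ = 1
    WRITE = 2
    READ_WRITE = READ | WRITE


# Hypothetical policy table; any pair not listed gets Access.NONE.
POLICY = {
    ("flight_control", "aileron_actuator"): Access.READ_WRITE,
    ("flight_control", "airspeed_sensor"): Access.READ,
    ("cabin_entertainment", "audio_out"): Access.WRITE,
}


def is_allowed(partition, resource, requested):
    """Permit the request only if every requested right was explicitly granted."""
    granted = POLICY.get((partition, resource), Access.NONE)
    return requested != Access.NONE and (requested & granted) == requested


print(is_allowed("flight_control", "aileron_actuator", Access.WRITE))
print(is_allowed("cabin_entertainment", "aileron_actuator", Access.READ))
```

The default-deny lookup is the important design choice here: a partition can reach a device only if the table says so, never because an entry was forgotten.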

 

Figure 1: LynxOS-178 Partitioning RTOS

LynxOS-178 is named after the aviation safety-critical software development guidelines, “DO-178B/C”, authored by the RTCA (Radio Technical Commission for Aeronautics) and accepted by the FAA as the standard for approving all new aviation software. DO-178B/C defines Design Assurance Levels (DAL) A through E, which are assigned to software components. Each software component goes through a safety assessment process and hazard analysis to determine which level it is assigned. Assigning a higher level (A being the highest) to a software component increases the process rigor required to verify software correctness, which substantially increases cost and lengthens the program schedule.

Using LynxOS-178’s partitioning platform, software components at varying criticality levels can be separated and placed into isolated partitions, which increases system reliability, simplifies the system architecture, and ultimately reduces the cost of system development and certification.
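The cost argument can be made concrete with a deliberately simplified model. It assumes that components sharing a platform without isolation must all be verified to the most stringent level present, whereas isolated components can each be verified to their own level. That assumption, the `verification_levels` helper, and the `maintenance_log` component are illustrative only; the credit actually granted depends on the system's safety assessment.

```python
DAL_ORDER = "ABCDE"  # A is the most stringent level, E the least


def most_stringent(levels):
    """Return the level that comes first in DAL_ORDER."""
    return min(levels, key=DAL_ORDER.index)


def verification_levels(components, partitioned):
    """Return the level each component must be verified to.

    Simplifying assumption: without partitioning, every component inherits
    the most stringent level on the platform.
    """
    if partitioned:
        return dict(components)
    shared = most_stringent(components.values())
    return {name: shared for name in components}


components = {
    "flight_control": "A",
    "cabin_entertainment": "E",
    "maintenance_log": "D",
}
print(verification_levels(components, partitioned=False))
print(verification_levels(components, partitioned=True))
```

Under this model, only the flight control function carries Level A rigor once the components are partitioned.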

LynxSecure for Security-critical Systems
National security relies on the explicit control of information, and the goal of all security-critical systems is to ensure that every access to system information is authorized. The slightest flaw in a system’s security-enforcing function, such as a cryptographic algorithm or random number generator, can become a threat to national security.

Traditionally, sensitive information is protected on completely isolated infrastructures, ensuring that no unauthorized user gains access to these infrastructures and that no unauthorized data infiltrates or exfiltrates these infrastructures. But as government organizations mobilize, mission success puts higher demands on the ability to access and share information.

In response to this demand, government, industry, and academia have developed the Multiple Independent Levels of Security (MILS) specification. MILS allows a single information system to simultaneously process data of different security domains while maintaining isolation between the domains. This capability offers a multitude of benefits in both productivity and cost savings. One of the first demonstrations of a MILS system was a user PC that hosted multiple Operating Systems of different security levels. This gave users the ability to access their data of separate security levels from a single device while at the same time reducing the number of previously required machines.

The foundation of the MILS architecture is a separation kernel (SK), which is an RTOS that explicitly controls platform resources to create isolated computing partitions, ensuring that all information processed within a partition remains in that partition and does not leak out through underlying side channels.

LynxSecure (Figure 2 below) is a MILS separation kernel capable of both OS paravirtualization and full virtualization, making it a separation kernel and a hypervisor at once. Using LynxSecure, various security-critical information systems can be composed by creating protection boundaries around security-critical software components and explicitly controlling the flow of information between these boundaries. Within a protection boundary, users can run software as simple as a message guard or as complex as a fully virtualized instance of Microsoft Windows 7. In either case the security architecture of the system remains the same.
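The idea of explicitly controlled information flow between protection boundaries can be sketched as an allow-list of directed channels, with a message guard sitting on the only path between two domains. Everything named below (`ALLOWED_FLOWS`, `send`, `guard_release`, the domain names, the message labels) is hypothetical and is not the LynxSecure configuration or API.

```python
class FlowDenied(Exception):
    """Raised when a partition tries to use a channel that was never configured."""


# Hypothetical static policy: the only permitted directed flows.
ALLOWED_FLOWS = frozenset({
    ("secret_domain", "guard"),
    ("guard", "unclassified_domain"),
})


def send(src, dst, message, inboxes):
    """Deliver a message only if the (src, dst) flow is explicitly allowed."""
    if (src, dst) not in ALLOWED_FLOWS:
        raise FlowDenied(f"no channel from {src} to {dst}")
    inboxes.setdefault(dst, []).append(message)


def guard_release(message):
    """Toy guard rule: forward only messages labelled as releasable."""
    return message.get("label") == "releasable"


inboxes = {}
send("secret_domain", "guard", {"label": "releasable", "body": "weather"}, inboxes)
send("secret_domain", "guard", {"label": "secret", "body": "plans"}, inboxes)

# The guard forwards only what its rule permits.
for msg in inboxes.pop("guard"):
    if guard_release(msg):
        send("guard", "unclassified_domain", msg, inboxes)

print(inboxes)
```

A direct `send("secret_domain", "unclassified_domain", ...)` raises `FlowDenied`, because that channel is absent from the policy; the guard could equally be replaced by a full guest operating system without changing the policy structure.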

 

Figure 2: LynxSecure Separation Kernel & Hypervisor

LynxSecure achieves its separation properties by means similar to those of LynxOS-178, providing time, space, and resource partitioning between security domains. However, the two RTOSes are not related. Abiding by the principle that security must be built in rather than bolted on, LynuxWorks decided that an RTOS intended as the foundation of security-critical systems had to be built from the ground up.

Therefore, LynxSecure was written from scratch, designed and implemented to different requirements and under different environmental assumptions than LynxOS-178.

LynxSecure assumes that it operates in a malicious environment, in which software both inside and outside the security domains is trying to breach or bypass the separation that LynxSecure enforces. Given these assumptions, additional protection mechanisms are in place to actively protect the integrity of LynxSecure’s security-enforcing functions.

LynxSecure ensures that all code with privileged access to platform resources is both necessary and sufficient to host guest partitions, maintain separation, and control information flow. This differs greatly from the design of LynxOS-178, which includes extra privileged functionality such as a network stack, device drivers, and a real-time multithreading API. By implementing only the essential separation kernel and virtualization logic, LynxSecure drastically reduces its attack surface.

LynxSecure was designed to achieve the most stringent Common Criteria Evaluation Assurance Level, EAL 7, such that the security enforcing components can be proven correct according to the security requirements defined in the Separation Kernel Protection Profile (SKPP). In order to achieve EAL 7, LynxSecure was designed to be small and simple enough to undergo the process of mathematical formal verification.

Summary
As stated, safety-critical and security-critical systems are not mutually exclusive. Security is becoming a greater concern for safety-critical systems. As systems are further integrated and remotely managed, the threat level for safety-critical systems rises, making information security a fundamental requirement. In these circumstances, LynxSecure, with its virtualization capabilities, can be used as a foundational layer that provides protection boundaries around entire safety-critical systems by hosting multiple instances of LynxOS-178, offering a solution that is both safety- and security-critical.

Will Keegan is a security software specialist at LynuxWorks. He has more than five years of experience working in security-critical and safety-critical industries. He previously served as a product and sales engineer for OIS, where he worked on the development and marketing of various high-assurance cryptographic network and embedded middleware products. He was also a network engineer for USAA, maintaining a world-class data center. He graduated from the University of Texas at Austin with a B.S. in Computer Science. He can be contacted at wkeegan@lnxw.com.