Now Any Embedded System Can Be Hi-Reliability

The distinction between ‘hi-reliability’ and ‘standard’ embedded software design is blurring as the commercial and human consequences of embedded system failure escalate. Whether in avionics, medical, automotive, or SCADA systems, the slightest flaw can cascade into a system failure, with potential injury or loss of life.

Technology integrated into these systems must be rigorously scrutinised to ensure that system reliability is preserved. Yet while robustness is a primary concern, these systems are constantly evolving to add functionality, reduce footprint, and cut operational cost.

To meet these demands, systems are becoming more software-intensive and more complex, which makes robustness harder to achieve.

At the same time, the increasingly connected nature of commercial embedded systems exposes them to external threats, driving demand for designs that build in security from the start.

High-reliability systems such as power and lighting can be brought down by malware as readily as by software failures. For example, security services warned the London 2012 Olympic authorities of a potential cyber-attack on their power supply just days before the Games' opening ceremony, prompting urgent checks on a backup system and emergency overload tests on lighting and communications networks.

As the Guardian newspaper commented: “A successful attack on the computerised power system would have caused chaos, stopped the ceremony, and blocked all communications.” At stake are not just human lives but the infrastructure on which society depends.

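The trend towards multi-core processors makes the twin objectives of safety and security harder still: software running on different cores contends for shared caches, memory buses and peripherals, so interference between functions is harder to bound. Furthermore, few commercial embedded software developers have the budget to meet the escalating requirements of certification.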

Commercial off-the-shelf (COTS) solutions originally developed for aerospace designs can provide the answer by offering secure partitioning, virtualisation, and pre-certification.

Secure partitioning
In industrial, medical and infrastructure applications, multiple functions of different criticality levels are located on the same platform to improve size, weight and power (SWaP) characteristics and to reduce the cost of maintaining separate platforms.

Just as an aircraft’s cabin entertainment function might be hosted on the same operating system as its flight control systems, medical monitoring might share a platform with patient infotainment, or premises security with lighting. In each case, a failure of the non-critical function must not be able to compromise the critical one.

A new class of operating systems – described as partitioning real-time operating systems or p-RTOS – has emerged to mitigate this problem and thereby increase system reliability. P-RTOSes create isolated computing zones or “partitions” such that software running in one partition does not affect the performance of software running in another.

A partitioning RTOS such as LynxOS-178 allows functions of varying criticality levels to run simultaneously on the same platform without one function corrupting underlying platform resources or starving other timing-critical functions. The RTOS achieves this by enforcing, independently for each partition, a CPU time budget, a memory allocation space, and access rights to interrupts and I/O resources.
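To make this per-partition enforcement more concrete, the following C sketch models a static partition configuration: each partition gets a fixed time window within a repeating major frame, a private memory region, and a mask of the interrupt lines it owns. The type `partition_cfg`, the 100 ms major frame, the two example partitions and all function names are hypothetical illustrations, not the LynxOS-178 API. The sketch also assumes windows run back to back from the start of the frame, whereas real schedules (for example ARINC 653-style ones) can place windows at arbitrary offsets.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-partition configuration, fixed at build time.
 * Field names are illustrative only, not a real RTOS API. */
typedef struct {
    const char *name;
    uint32_t window_ms;  /* CPU time budget in each major frame */
    uintptr_t mem_base;  /* start of the partition's private memory */
    uintptr_t mem_size;  /* size of that region in bytes */
    uint32_t irq_mask;   /* bit n set: partition owns interrupt line n */
} partition_cfg;

#define MAJOR_FRAME_MS 100u  /* assumed length of the repeating schedule */

static const partition_cfg partitions[] = {
    { "flight_control",      60u, 0x10000000u, 0x00100000u, 1u << 3 },
    { "cabin_entertainment", 40u, 0x10100000u, 0x00400000u, 1u << 7 },
};
#define NUM_PARTITIONS (sizeof partitions / sizeof partitions[0])

/* Reject a configuration that overbooks the CPU, overlaps memory
 * regions, or gives two partitions the same interrupt line. */
bool config_is_valid(void) {
    uint32_t total_ms = 0;
    for (size_t i = 0; i < NUM_PARTITIONS; i++) {
        total_ms += partitions[i].window_ms;
        for (size_t j = i + 1; j < NUM_PARTITIONS; j++) {
            const partition_cfg *a = &partitions[i], *b = &partitions[j];
            bool overlap = a->mem_base < b->mem_base + b->mem_size &&
                           b->mem_base < a->mem_base + a->mem_size;
            if (overlap || (a->irq_mask & b->irq_mask) != 0)
                return false;
        }
    }
    return total_ms <= MAJOR_FRAME_MS;
}

/* Which partition owns the CPU at time t_ms? Windows run back to back
 * from the start of each major frame; -1 means unallocated slack. */
int partition_at(uint32_t t_ms) {
    uint32_t offset = t_ms % MAJOR_FRAME_MS;
    for (size_t i = 0; i < NUM_PARTITIONS; i++) {
        if (offset < partitions[i].window_ms)
            return (int)i;
        offset -= partitions[i].window_ms;
    }
    return -1;
}

/* Memory check: may partition p touch address addr? */
bool may_access(size_t p, uintptr_t addr) {
    const partition_cfg *c = &partitions[p];
    return addr >= c->mem_base && addr - c->mem_base < c->mem_size;
}

/* Interrupt check: may partition p service interrupt line irq? */
bool may_handle_irq(size_t p, unsigned irq) {
    return irq < 32u && ((partitions[p].irq_mask >> irq) & 1u) != 0;
}
```

In a real partitioning RTOS these rules would be enforced by the kernel, using a hardware timer for the time windows and the MMU for memory, rather than by functions that application code calls; the sketch only shows the bookkeeping. Keeping the table constant means it can be checked once, before deployment, as `config_is_valid()` does here.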

Virtualisation
Complementing the partitioning approach, operating systems that support virtualisation can isolate critical processes by running them in separate virtual machines. A separation kernel enforces the protection and separation of processes, resources and partitions, while a virtual machine manager (or hypervisor) adds further security by allowing multiple operating systems to run concurrently on the host computer.

Software such as the LynxSecure Separation Kernel Hypervisor (SKH) combines both functions in one product. The hypervisor maintains an abstraction layer between the hardware resources of the target system on one hand and the hosted operating systems and software on the other.

Meanwhile, the separation kernel partitions the software and hardware resources and isolates them from each other. This approach allows multiple subjects to run concurrently on the same hardware while strictly enforcing policies of isolation and information flow control. Critical processes are fully isolated from vulnerable applications such as Web browsers or mail servers.
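Information flow control of this kind can be pictured as a static policy matrix that the kernel consults on every transfer between subjects. The following C sketch assumes three hypothetical subjects (a critical controller, a logger and a web browser) and a kernel-owned, one-slot mailbox per subject. The policy lets the controller send to the logger and the logger send to the browser, but gives the browser no outbound path at all. The subject names, the mailbox mechanism and the `kernel_send`/`kernel_receive` functions are illustrative assumptions, not the LynxSecure configuration format or API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subjects hosted by a separation kernel. */
typedef enum { SUBJ_CONTROL, SUBJ_LOGGER, SUBJ_BROWSER, NUM_SUBJECTS } subject;

/* Static policy: flow_allowed[src][dst] is true only if src may send
 * data to dst. Any flow not listed here is denied by default. */
static const bool flow_allowed[NUM_SUBJECTS][NUM_SUBJECTS] = {
    /*                 to: CONTROL LOGGER BROWSER */
    [SUBJ_CONTROL] = {     false,  true,  false },
    [SUBJ_LOGGER]  = {     false,  false, true  },
    [SUBJ_BROWSER] = {     false,  false, false },
};

/* One-slot mailbox per destination, owned by the kernel. */
typedef struct {
    char data[64];
    size_t len;
    bool full;
} mailbox;

static mailbox boxes[NUM_SUBJECTS];

typedef enum { SEND_OK, SEND_DENIED, SEND_TOO_LONG, SEND_BUSY } send_result;

/* Kernel-mediated send: every cross-subject transfer passes this
 * policy check, so a compromised browser has no path to the controller. */
send_result kernel_send(subject src, subject dst, const char *msg, size_t len) {
    if (src >= NUM_SUBJECTS || dst >= NUM_SUBJECTS || !flow_allowed[src][dst])
        return SEND_DENIED;
    if (len > sizeof boxes[dst].data)
        return SEND_TOO_LONG;
    if (boxes[dst].full)
        return SEND_BUSY;
    memcpy(boxes[dst].data, msg, len);
    boxes[dst].len = len;
    boxes[dst].full = true;
    return SEND_OK;
}

/* Kernel-mediated receive: drains dst's mailbox into out (at most cap
 * bytes) and returns the number of bytes copied, or 0 if it was empty. */
size_t kernel_receive(subject dst, char *out, size_t cap) {
    if (dst >= NUM_SUBJECTS || !boxes[dst].full)
        return 0;
    size_t n = boxes[dst].len < cap ? boxes[dst].len : cap;
    memcpy(out, boxes[dst].data, n);
    boxes[dst].full = false;
    return n;
}
```

Because the policy is a constant table rather than scattered runtime logic, every permitted flow can be read off in one place, and anything omitted is denied by default.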

Certification
Partitioning and security software must satisfy extensive testing and proof requirements as part of the overall certification process. This is particularly onerous in a sector that has traditionally built its applications from scratch and is only slowly moving to modular software and reusable components.

Developers no longer have the budgets to navigate the complex certification process, gather all the evidence required to support a certification claim, and pay for its examination by the certification authority. It is even more costly if errors are found late in the development cycle or, worse, during certification. Developers need to get it right first time or, at the very least, catch mistakes early.

The burden of compliance is being eased by modular COTS operating systems that encourage reuse and adhere to open standards.