30 years of QNX: First multicore-ready RTOS

This story starts in 1997. But to set some context, let's fast-forward to 2010 and look at a snapshot of CPU usage on my quad-core PC:

When you look at this snapshot, two things stand out. First, it appears that my PC has 8 CPU cores, not 4. That's because each core supports simultaneous multithreading; i.e. it can perform two computing tasks at the same time. As a result, the operating system sees each core as not one, but two, processors.

Second, each core shows a brief but intense spike in CPU usage. Now, this phenomenon could have two explanations: 1) multiple software applications suddenly required a lot of CPU cycles at almost exactly the same time, or 2) a single application spawned multiple execution threads to split its workload across the various cores.

In this case, it's door number 2: A multi-threaded image-processing application harnessed the compute power of every core to render an 18-megapixel photo at very high speed — much faster than if the application had used a single core.
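For the curious, here is a rough sketch of how an application might carve up that kind of job. It is not the image-processing program from the snapshot; the function names (`split_rows`, `brighten_band`, `brighten`) and the brighten operation are hypothetical stand-ins. The idea is simply to split the image into bands of rows and hand each band to its own worker thread.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(height, workers):
    """Split rows 0..height-1 into contiguous (start, stop) bands, one per worker."""
    base, extra = divmod(height, workers)
    bands, start = [], 0
    for i in range(workers):
        # The first `extra` bands get one additional row each.
        size = base + (1 if i < extra else 0)
        bands.append((start, start + size))
        start += size
    return bands


def brighten_band(image, band, amount):
    """Brighten one band of rows, clamping each pixel at 255."""
    start, stop = band
    return [[min(255, p + amount) for p in row] for row in image[start:stop]]


def brighten(image, amount, workers=4):
    """Process each band on its own worker thread, then stitch the bands together."""
    bands = split_rows(len(image), workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda b: brighten_band(image, b, amount), bands))
    return [row for part in parts for row in part]
```

One caveat: in CPython, threads running pure-Python code take turns rather than running in parallel, so this sketch shows the partitioning pattern, not the speedup. A real application would do the per-band work in native code or in separate processes.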

To perform this magic, a computer needs more than a multicore processor. It also needs an OS that supports symmetric multiprocessing, or SMP.

SMP for embedded systems?
Rewind to 1997. Back then, SMP was still the domain of operating systems for large servers and other compute-intensive applications. These systems didn't have multicore processors, of course, but they did have multiple discrete processors running on the same board.

The size, cost, and power consumption of these multi-processor systems put them beyond the reach of most embedded systems, where the prevailing design mantra isn’t “do more with more” but “do more with less.” Thus, the idea of adding SMP to an operating system for embedded systems seemed exotic at best. Yet, that is exactly what QNX did back in 1997.

It turned out to be a prescient move. First of all, networking equipment companies were starting to realize that SMP could help address several of their computing problems, such as maintaining routing tables that contained hundreds of thousands of entries. As a result, these companies embraced QNX SMP big time in their high-end routers. And, when multicore chips started to become available, these same customers found the migration process very natural — they simply continued to use the same code and the same OS as they had before.

SMP in a car?
This early support also allowed QNX SMP to become very mature by the time multicore chips started to move downmarket into cost- and power-sensitive devices. Audi, for example, is using QNX SMP to drive an ARM Cortex-A9 multicore processor in its next-generation in-car infotainment systems.

SMP in a car. Who would have thought?

Blast from the past
Now here's something that hasn't seen the light of day since the late 1990s: The press release introducing QNX's support for SMP.


QNX Brings Power of SMP to Embedded Telecom Systems

Telecommunications developers can now combine the raw performance of SMP systems with the hard realtime determinism of the QNX/Neutrino RTOS.

SUPERCOMM’97 Conference, New Orleans LA, June 3, 1997 - QNX Software Systems (QSSL) demonstrated today that QNX/Neutrino delivers both realtime determinism and a near-linear speedup in processing power with the addition of CPUs on SMP machines. QNX achieves this high performance along with robustness and fine-grained parallelism with their new incredibly small kernel (35K). As a result, QSSL is bringing the power of SMP down to embedded systems.

“Our SMP version of QNX/Neutrino is ideal for embedded environments where system capacity is stressed,” says Dan Dodge, Vice President of R&D at QNX Software Systems. “Although Neutrino is optimized for deeply embedded systems, it’s also fully scalable. You don’t have to change any application code to transform a single-processor system into a high-end SMP cluster: simply add more processors and restart the system.”

Scale Beyond a Single SMP Machine, Build Immense Systems
QNX/Neutrino supports up to 8 CPUs per SMP machine and allows the networking of multiple SMP machines (each with up to 8 CPUs) to create architectures with immense processing power. Using standard, off-the-shelf hardware, designers can link hundreds of machines into a single QNX/Neutrino system!

QNX/Neutrino’s native message-passing Inter-Process Communication (IPC) seamlessly and transparently turns the network of independent SMP machines into a single, logical kernel.

Optimized for Very High-End Applications
Since QNX/Neutrino offers the best possible utilization of available CPU cycles, it’s ideal for very high-end realtime applications such as high-capacity telecom switches, image processing, and aircraft simulators.

Scale Systems in the Field
QNX/Neutrino applications can run on uniprocessor systems, multiprocessor systems, and network-connected SMP machines. With this range of flexibility, developers can ship systems with one processor, then expand the system’s processing power as the need arises.

In both uniprocessor and SMP systems, QNX/Neutrino’s realtime scheduling ensures that the highest priority threads are run on the available CPUs; whenever possible, a thread is dispatched to the CPU it ran on previously to optimize cache performance. Since the SMP version of QNX/Neutrino supports a processor “affinity mask,” designers can further optimize performance by selecting which CPU(s) each thread may run on.
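(An aside from 2010: the dispatch rule described above can be illustrated with a toy model. This is not QNX code. The `Thread` class, the `dispatch` function, and the convention that a larger number means a higher priority are all assumptions made for the sketch. It runs the highest-priority ready threads first, prefers the CPU a thread ran on last, and never places a thread on a CPU that its affinity mask excludes.)

```python
from dataclasses import dataclass


@dataclass
class Thread:
    name: str
    priority: int       # assumption: larger number = higher priority
    last_cpu: int       # CPU this thread ran on previously
    affinity_mask: int  # bit i set => thread may run on CPU i


def allowed(thread, cpu):
    """True if the thread's affinity mask permits this CPU."""
    return bool(thread.affinity_mask & (1 << cpu))


def dispatch(ready, num_cpus):
    """Greedily place the highest-priority ready threads on free CPUs.

    Returns a dict mapping CPU index -> thread name.
    """
    free = set(range(num_cpus))
    placement = {}
    for t in sorted(ready, key=lambda t: -t.priority):
        if not free:
            break
        candidates = [c for c in sorted(free) if allowed(t, c)]
        if not candidates:
            continue  # every free CPU is excluded by this thread's mask
        # Prefer the previous CPU (warm cache) when it is free and allowed.
        cpu = t.last_cpu if t.last_cpu in candidates else candidates[0]
        placement[cpu] = t.name
        free.discard(cpu)
    return placement
```

For example, on a two-CPU machine, a priority-20 thread pinned to CPU 1 with mask `0b10` takes CPU 1 even though it last ran elsewhere, and the next-highest thread falls back to CPU 0.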

“We expect to see a variety of applications for our ZT 5520, the market’s first 2-slot CompactPCI SBC with dual Pentium Pro processors,” says Rob Davidson, CompactPCI Product Manager of Ziatech Corporation. “Teamed with the SMP version of QNX/Neutrino, the ZT 5520 provides highly reliable symmetric multiprocessing, yielding the maximum available up-time and the maximum amount of processing for embedded applications.”

Lean Microkernel Allows Simplified Locks for Faster Performance
Because traditional monolithic kernels contain the bulk of all operating system services, they require numerous performance-robbing spin-locks in the main code paths to support SMP. In contrast, QNX/Neutrino’s lean microkernel architecture requires few locks, resulting in faster performance.

Access to data structures shared between threads and processes across CPUs is protected using standard POSIX mutexes, condition variables, and semaphores. Synchronized access to structures shared between threads and interrupt handlers across CPUs is provided through an exclusion lock available to both the thread and the interrupt handler.
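(Another aside: the mutex pattern mentioned above looks roughly like this. The sketch uses Python's `threading.Lock` as a stand-in for a POSIX mutex; `SharedCounter` and `run_workers` are hypothetical names, and nothing here is QNX-specific.)

```python
import threading


class SharedCounter:
    """A value shared between threads, guarded by a mutex."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def add(self, n):
        # Only one thread at a time may perform the read-modify-write.
        with self._lock:
            self._value += n

    @property
    def value(self):
        with self._lock:
            return self._value


def run_workers(num_threads, increments):
    """Start several threads that all update the same counter, then return the total."""
    counter = SharedCounter()

    def work():
        for _ in range(increments):
            counter.add(1)

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

Because every update happens under the lock, four threads making 1000 increments each always produce a total of 4000, no matter how their execution is interleaved.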

Full Memory Protection
For very large systems, up to 4G of memory addressing is supported. To match the size and complexity of each target system, QNX/Neutrino offers four levels of memory protection ranging from no protection (for systems without MMU hardware) to full memory protection between programs. With memory protection, embedded PCs can intelligently recover from software failures without a system shutdown and field technicians can perform detailed postmortem diagnostics.

