3/30/2013

What has the QNX auto team been up to?

Well, let's see...


3/25/2013

Why does all the cool stuff happen while I'm away?

Now appearing in both Fortune and Daily Planet
Do you ever get the feeling that the party starts the minute you leave the room? Well, it just happened to me. I was on vacation only a few days last week, but while I was away, Fortune magazine and Daily Planet both did pieces on QNX. What's up with that?

But seriously, this is cool. The Fortune article covers several bases: the history of QNX in mission-critical embedded systems, the leadership that QNX enjoys in automotive, and the new QNX concept car that made its debut at 2013 CES. Meanwhile, the Daily Planet video puts you in the front seat of the concept car for a tour of its many features — from voice control and video conferencing to the virtual mechanic. (Is it just me, or do the coolest features all start with the letter 'v'?)

Read the Fortune article here (you'll need a subscription to access it). And view the Daily Planet video here.

3/11/2013

The isolation imperative: protecting software components in an ISO 26262 system

Software components can be impolite, if not downright delinquent. For instance, a component might:

  • rob other components of CPU time
  • rob other components of file descriptors and other system resources (see the sketch after this list)
  • access the private memory of other components
  • corrupt data shared with other components
  • create a deadlock or livelock situation with other components
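
To make the second bullet concrete: below is a minimal sketch, using plain POSIX calls rather than anything QNX-specific, of a parent capping a component's file-descriptor allowance before launching it, so that a descriptor leak in that process can't drain the system-wide pool. The helper name and the limit of 64 are illustrative only, not from any real codebase.

    #include <stdio.h>
    #include <sys/resource.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Hypothetical helper: fork, clamp RLIMIT_NOFILE in the child,
     * then exec the (possibly delinquent) component. */
    int launch_confined(const char *path)
    {
        pid_t pid = fork();
        if (pid == 0) {
            /* Child: restrict this process only; the parent and its
             * siblings keep their own limits. */
            struct rlimit rl = { .rlim_cur = 64, .rlim_max = 64 };
            if (setrlimit(RLIMIT_NOFILE, &rl) == -1) {
                perror("setrlimit");
                _exit(1);
            }
            execl(path, path, (char *)NULL);
            _exit(1); /* only reached if exec fails */
        }
        return (pid > 0) ? 0 : -1;
    }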

Shameful, I know. But in all seriousness, this sort of behavior can wreak havoc in a safety-critical system. For instance, let's say that a component starts to perform a CPU-intensive calculation just as the system enters a failure condition. Will that component hog the CPU and prevent an alarm process from running?

The answer, of course, is that it damn well better not.
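
One classic mitigation is priority-based preemptive scheduling: run the alarm process at a fixed priority above any compute-bound worker, so the scheduler hands it the CPU the instant it becomes ready. Here's a hedged sketch using the portable POSIX call (QNX's adaptive partitioning goes further by guaranteeing each partition a CPU budget, but that's beyond a few lines). The function name and priority choice are illustrative.

    #include <sched.h>
    #include <stdio.h>

    /* Hypothetical helper: promote the calling (alarm) process to a
     * high SCHED_FIFO priority. A SCHED_FIFO thread always preempts
     * lower-priority threads the moment it becomes ready, so a
     * CPU-bound calculation at normal priority cannot starve it. */
    int make_alarm_urgent(void)
    {
        struct sched_param sp = {
            /* One step below the max, leaving room for a watchdog. */
            .sched_priority = sched_get_priority_max(SCHED_FIFO) - 1
        };
        if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1) {
            perror("sched_setscheduler");
            return -1;
        }
        return 0;
    }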

It becomes important, then, to prevent components from interfering with one another. In fact, this principle is baked into the ISO 26262 functional safety standard for road vehicles, which defines interference as:

    "...the presence of cascading failures from a sub-element with no ASIL [Automotive Safety Integrity Level] assigned, or a lower ASIL assigned, to a sub-element with a higher ASIL assigned leading to the violation of a safety requirement of the element”

To put it crudely, less important stuff can't stop more important stuff from happening.

So how do you prevent interference? One approach is through isolation. For instance, a system may implement spatial isolation between application processes. This would include mechanisms for interprocess communication and interprocess locking that prevent one process from inadvertently affecting another.
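
To make the locking half of that concrete, here's a hedged sketch using POSIX robust, process-shared mutexes. If one process dies while holding the lock, the next locker receives EOWNERDEAD instead of blocking forever, and gets the chance to repair the shared data, so the crash of one component can't silently deadlock or corrupt the others. The function names are mine, and I'm assuming the mutex lives in a shared-memory region the processes have already mapped.

    #include <errno.h>
    #include <pthread.h>

    /* Initialize a mutex that lives in shared memory and survives
     * the death of its owner in a recoverable way. */
    int init_shared_lock(pthread_mutex_t *m)
    {
        pthread_mutexattr_t attr;
        pthread_mutexattr_init(&attr);
        pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
        int rc = pthread_mutex_init(m, &attr);
        pthread_mutexattr_destroy(&attr);
        return rc;
    }

    /* Lock, handling the case where the previous owner died
     * mid-update. */
    int lock_shared(pthread_mutex_t *m)
    {
        int rc = pthread_mutex_lock(m);
        if (rc == EOWNERDEAD) {
            /* The old owner crashed while holding the lock: verify
             * or repair the protected data, then mark the mutex
             * consistent again. */
            pthread_mutex_consistent(m);
            rc = 0;
        }
        return rc;
    }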

Mind you, there are multiple types of interference, so you need to implement multiple forms, or axes, of isolation. Time for a picture:

[Figure: the axes of isolation]

In general, you need to determine what does, and what doesn't, need to be isolated. You also need to identify which components are apt to be delinquent and build a cage around them to protect more critical components. Which brings me to a recent paper by my inestimable colleagues Chris Hobbs and Yi Zheng. It's titled "Protecting Software Components from Interference in an ISO 26262 System," and it explores techniques that can help you:

  • implement the component isolation required by ISO 26262
  • demonstrate that such isolation has been implemented

And while you're at it, check out the other titles in our "safe" whitepaper series. These include "The Dangers of Over-Engineering a Safe System" and "Ten Truths about Building Safe Embedded Software Systems."

And don't worry: there's nothing delinquent about downloading all of them.

This post originally appeared in the QNX auto blog.

3/07/2013

Can a safety-critical system be over-engineered?

Too much of a good thing?
It's a rhetorical question, of course. But hear me out.

As you can imagine, many safety-critical systems must be designed to handle conditions far outside their intended scope. For instance, in many jurisdictions, passenger elevators must be capable of supporting 11 times their rated maximum load; you just never know what people will haul into an elevator car. So, if the stated limit for a passenger elevator is 2000 pounds, the actual limit is closer to 22,000 pounds. (Do me a favor and avoid the temptation to test this for yourself.)

Nonetheless, over-engineering can sometimes be too much of a good thing. This is especially true when an over-engineered component imposes an unanticipated stress on the larger system. In fact, focusing on a specific safety issue without considering overall system dependability can sometimes yield little or no benefit — or even introduce new problems. The engineer must always keep the big picture in mind.

Case in point: the SS Eastland. In 1915 this passenger ship rolled over, killing more than 840 passengers and crew. The Eastland Memorial Society explains what happened:

    "...the Eastland's top-heaviness was largely due to the amount and weight of the lifeboats required on her... after the sinking of the Titanic in 1912, a general panic led to the irrational demand for more lifesaving lifeboat capacity for passengers of ships.
    Lawmakers unfamiliar with naval engineering did not realize that lifeboats cannot always save all lives, if they can save any at all. In conformance to new safety provisions of the 1915 Seaman’s Act, the lifeboats had been added to a ship already known to list easily... lifeboats made the Eastland less not more safe..."

There you have it. A well-intentioned safety feature that achieved the very opposite of its intended purpose.

Fast forward to the 21st century. Recently, my colleague Chris Hobbs wrote a whitepaper on how a narrow design approach can subtly work its way into engineering decisions. Here's the scenario he uses for discussion:

    "The system is a very simple, hypothetical in-cab controller (for an equally hypothetical) ATO system running a driverless Light Rapid Transit (LRT) system...
    Our hypothetical controller has already proven itself in Rome and several other locations. Now a new customer is considering it for an LRT ATO in the La Paz-El Alto metropolitan area in Bolivia. La Paz-El Alto has almost 2.5 million inhabitants living at an elevation that rises above 4,100 meters (13,600 ft.—higher than Mount Erebus). This is a significant change in context, because the threat of soft and hard memory errors caused by cosmic rays increases with elevation. The customer asks for proof that our system can still meet its safety requirements when the risk of soft memory errors caused by radiation is included in our dependability estimates..."

So where should the engineer go from here? How can he or she ensure that the right concerns are being addressed? That is what Chris endeavours to answer. (Spoiler alert: The paper determines that, in this hypothetical case, software detection of soft memory errors isn't a particularly useful solution.)

Highly recommended.
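
And if you're wondering what "software detection of soft memory errors" might look like in practice, here's a hedged, generic sketch of one common form: a periodic scrubbing task that re-verifies a checksum over a critical data block and flags a probable bit flip on any mismatch. The structure and names are illustrative, not taken from the paper, and, per Chris's conclusion, a guard like this may buy you less than intuition suggests.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Plain bit-reflected CRC-32 (polynomial 0xEDB88320). */
    static uint32_t crc32(const uint8_t *p, size_t n)
    {
        uint32_t c = 0xFFFFFFFFu;
        for (size_t i = 0; i < n; i++) {
            c ^= p[i];
            for (int k = 0; k < 8; k++)
                c = (c & 1) ? (c >> 1) ^ 0xEDB88320u : (c >> 1);
        }
        return ~c;
    }

    /* A critical block paired with a checksum that is refreshed on
     * every legitimate write. */
    struct guarded_block {
        uint8_t  data[256];
        uint32_t crc;
    };

    /* Called from a periodic scrubbing task: false suggests a soft
     * (bit-flip) error in the block since its last update. */
    bool scrub_ok(const struct guarded_block *b)
    {
        return crc32(b->data, sizeof b->data) == b->crc;
    }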