
6/30/2015

Developing software for safety-critical systems? Have I got a book for you

Chris Hobbs is the only person I know who holds a math degree with a specialization in mathematical philosophy. In fact, before I met him, I didn’t know such a thing even existed. But guess what? That’s one of the things I really like about Chris. The more I hang out with him, the more I learn.

Come to think of it, helping people learn has become something of a specialty for Chris. He is, for example, a flying instructor and the author of Flying Beyond: The Canadian Commercial Pilot Textbook. And, as a software safety specialist at QNX Software Systems, he regularly provides advice to customers building systems that must comply with functional safety standards like IEC 61508, EN 5012x, and ISO 26262.

Chris has already written a number of papers on software safety, some of which I have had the great privilege to edit. You can find several of them on the QNX website. But recently, Chris upped the ante and wrote an entire book on the subject, titled Embedded Software Development for Safety-Critical Systems. The book:

  • covers the development of safety-critical systems under ISO 26262, IEC 61508, EN 50128, and IEC 62304
  • helps readers understand and apply remarkably esoteric development practices and be prepared to justify their work to external auditors
  • discusses the advantages and disadvantages of architectural and design practices recommended in the standards, including replication and diversification, anomaly detection, and so-called “safety bag” systems
  • examines the use of open-source components in safety-critical systems

I haven’t yet had a chance to review the book, but at 358 pages, it promises to be a substantial read.

Interested? Well, you can’t get the book just yet. But you can pre-order it today and get one of the first copies off the press. It’s scheduled for release September 1.

A version of this post appeared in the QNX Auto Blog.

1/19/2015

Breaking up is hard to do

Separation can be painful. But often, the failure to separate can result in even more pain over the long haul.

No, I’m not talking love, marriage, or other affairs of the human heart. I am talking software design. In particular, the design of complex software systems that must perform safety-critical functions. The software, for example, in a medical device, automotive ADAS unit, or train-control system.

In systems like these, separation is critical: software components must be cleanly isolated from one another. Otherwise, you risk the chance that the behavior of one component will inadvertently interfere with the behavior of another. For this reason, component isolation is a key thrust of functional safety standards like IEC 61508 and ISO 26262.

Several forms of interference, all undesirable.
Interference can take many forms. For instance, a component could improperly use file descriptors or flash memory needed by other components. Or it could enter a tight loop under a failure condition and starve a more-critical component of CPU time. Or it could write to the private memory of another component.

You could, of course, run every component on separate hardware. But that becomes an expensive proposition. Moreover, the market trend is toward hardware consolidation, which, for reasons of economy, merges previously discrete systems onto a single platform.

It’s important, then, to embrace software-based separation techniques. These include OS mechanisms to prevent resource deprivation, time starvation, data corruption, and so on. For instance, the adaptive time partitioning provided by the QNX Neutrino OS can ensure that a software component always gets a minimum percentage of CPU time, whenever it needs it. That way, other components can't prevent it from running, either unintentionally or maliciously.
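
For the curious, here's roughly what carving out a guaranteed CPU budget can look like in code. This is a minimal sketch based on my reading of the QNX adaptive partitioning documentation; the partition name and the 20% budget are illustrative, so treat the details as assumptions rather than a recipe:

    #include <sys/sched_aps.h>
    #include <string.h>
    #include <stdio.h>

    int main(void)
    {
        /* Ask the scheduler for a partition with a guaranteed CPU budget.
           (Field names per my reading of the QNX docs; verify before use.) */
        sched_aps_create_parms params;
        memset(&params, 0, sizeof params);
        params.name = "safety_partition";  /* hypothetical partition name */
        params.budget_percent = 20;        /* guaranteed minimum CPU share */

        if (SchedCtl(SCHED_APS_CREATE_PARTITION, &params, sizeof params) == -1) {
            perror("SCHED_APS_CREATE_PARTITION");
            return 1;
        }

        /* Threads that join this partition can now count on 20% of the CPU,
           no matter how busy or misbehaved the rest of the system becomes. */
        printf("created partition, id %d\n", params.id);
        return 0;
    }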

Software separation is as much art as science. In fact, my colleague Yi Zheng goes further than that. She argues that there is as yet no precise methodology for separating system functions. There are no textbooks, no pat answers.

So is separation only a matter of asking the right questions? That would be an oversimplification, of course. Skill also comes into play, as does experience, not to mention a good dose of thoroughness. But really, you should read Yi’s article, “The Art of Separation”, in Electronic Design and judge for yourself.

5/14/2014

The end of software testing? No, not really

Testing: no longer about establishing the correctness of a system
A few years ago, I penned a whitepaper that contained these words:

    "No amount of testing can fully eliminate the bugs and security holes in a complex software system, as no test suite could possibly anticipate every scenario the system may encounter."

As it turns out, I wasn't whistling Dixie. My colleague Chris Hobbs, who has forgotten more about software design than I could hope to learn in multiple lifetimes, notes that:

    "... a modern, pre-emptible, embedded operating system with about 800 assembler instructions in its core has more than 10300 possible internal states. To put this into perspective, the Eddington Number (the number of protons in the observable universe) is about 1080.

Don't know about you, but those numbers far exceed what my brain can grasp. And if that's not enough, the 10³⁰⁰ figure applies only to the OS core — it doesn't account for the huge number of additional states that are introduced when you start running applications and their supporting libraries.
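
To make those numbers concrete, here's a back-of-envelope calculation of my own (not from Chris's paper): a test rig that somehow exercised a billion distinct states per second, running non-stop for a century, would visit about 10⁹ × 3 × 10⁹ ≈ 3 × 10¹⁸ states. That works out to roughly 3 × 10⁻²⁸² of the 10³⁰⁰ total, a fraction indistinguishable from zero.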

So why bother with testing when you can only hope to exercise, say, 0.00000000000000000000000000000000000000001% of the system's possible states? It all has to do with a concept called confidence from use.

Rather than attempt an explanation here, I invite you to read a paper that Chris has published, titled "Testing as a road to confidence-from-use". Chris not only explores the concept, but discusses the degree to which confidence-from-use data gathered on one version of a system can be applied to a slightly modified version. Recommended for anyone interested in software testing or reliability.

10/15/2013

Striking a balance between reliability and availability

Can you achieve one without sacrificing the other?
Maybe it's just me, but a lot of people seem to use reliability and availability interchangeably. I often hear people say 99.999% reliability when, in fact, they are referring to availability.

So what is the difference between the two? And why is that difference important? I'm glad you asked. :-)

In a software-based system, availability refers to how often the system responds to events or stimuli in a timely manner; reliability, on the other hand, refers to how often the responses are correct. The distinction can be a matter of life or death. For instance, in some medical devices, it is preferable to have no response (where little or nothing happens to the patient) than a wrong response (where the device harms the patient irreparably). Whereas in other systems, any response of sufficient accuracy or quality may be preferable to no response at all.
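
A quick aside, and the gloss here is mine rather than Chris's: availability is conventionally quantified as the fraction of time a system is up, A = MTTF / (MTTF + MTTR), so the oft-quoted "five nines" (99.999%) allows only about five minutes of downtime per year. Reliability, by contrast, is usually expressed as the probability that the system operates correctly over a given interval. The two can diverge wildly: a system can be up 99.999% of the time while quietly producing wrong answers.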

But here's the thing. Regardless of whether a system is more sensitive to availability or reliability, it should still take pre-defined (and carefully considered) actions when a dangerous condition arises. For instance, if the control system for a high-speed train fails, it will move to its design safe state, which will probably involve applying the brakes.
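
To make the idea concrete, here's a minimal sketch (my own, not from any QNX paper) of a control loop that falls back to its design safe state when a fault is detected. The read_speed() and apply_brakes() functions are hypothetical stand-ins:

    #include <stdbool.h>
    #include <stdio.h>

    /* Hypothetical hardware hooks, standing in for a real train controller. */
    static bool read_speed(double *speed) { *speed = 0.0; return false; /* fault */ }
    static void apply_brakes(void) { puts("design safe state: brakes applied"); }
    static void drive(double speed) { printf("driving at %.1f km/h\n", speed); }

    int main(void)
    {
        for (;;) {
            double speed;
            if (!read_speed(&speed)) { /* sensor fault detected */
                apply_brakes();        /* move to the design safe state */
                break;                 /* and halt the control loop */
            }
            drive(speed);
        }
        return 0;
    }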

So far, so good. The problem is, many systems are components of larger systems. So even when a component is avoiding a genuinely dangerous situation, its behavior may put stress on the larger system and lower that system's availability.

Moreover, the behavior of an overall system when an unanticipated condition occurs can be very difficult to predict, for the simple reason that the system depends on multiple, largely independent components moving to their design safe states. None of those components, or their safe states, can be considered in isolation. For instance, in 1993, Lufthansa Flight 2904 overran a runway because the reverse thrust deployment system operated exactly to specification. Unfortunately, the system designers hadn't anticipated conditions during a cross-wind landing.

Enough from me. I invite you to read the ECN article "Balancing reliability and availability", written by my colleague and senior software developer Chris Hobbs. Chris discusses how it's possible to strike a balance between reliability and availability — and why designing safe software can require the ability and willingness to think from the outside in.

3/11/2013

The isolation imperative: protecting software components in an ISO 26262 system

Software components can be impolite, if not downright delinquent. For instance, a component might:

  • rob other components of CPU time
  • rob other components of file descriptors and other system resources
  • access the private memory of other components
  • corrupt data shared with other components
  • create a deadlock or livelock situation with other components

Shameful, I know. But in all seriousness, this sort of behavior can wreak havoc in a safety-critical system. For instance, let's say that a component starts to perform a CPU-intensive calculation just as the system enters a failure condition. Will that component hog the CPU and prevent an alarm process from running?

The answer, of course, is that it damn well better not.

It becomes important, then, to prevent components from interfering with one another. In fact, this principle is baked into the ISO 26262 functional safety standard for road vehicles, which defines interference as:

    "...the presence of cascading failures from a sub-element with no ASIL [Automotive Safety Integrity Level] assigned, or a lower ASIL assigned, to a sub-element with a higher ASIL assigned leading to the violation of a safety requirement of the element”

To put it crudely, less important stuff can't stop more important stuff from happening.

So how do you prevent interference? One approach is through isolation. For instance, a system may implement spatial isolation between application processes. This would include mechanisms for interprocess communication and interprocess locking that prevent one process from inadvertently affecting another.
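
One small illustration, my example rather than the paper's: POSIX robust mutexes let processes share a lock without being held hostage by a peer that dies while holding it. The survivor gets EOWNERDEAD and can repair the shared state instead of deadlocking:

    #include <pthread.h>
    #include <errno.h>

    /* Initialize a process-shared, robust mutex (typically placed in shared memory). */
    int init_shared_lock(pthread_mutex_t *m)
    {
        pthread_mutexattr_t attr;
        pthread_mutexattr_init(&attr);
        pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
        return pthread_mutex_init(m, &attr);
    }

    /* Lock that survives the death of a previous owner. */
    int take_shared_lock(pthread_mutex_t *m)
    {
        int rc = pthread_mutex_lock(m);
        if (rc == EOWNERDEAD) {
            /* The previous owner died holding the lock: repair the protected
               data here, then mark the mutex consistent and carry on. */
            pthread_mutex_consistent(m);
            rc = 0;
        }
        return rc;
    }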

Mind you, there are multiple types of interference, so you need to implement multiple forms, or axes, of isolation. Time for a picture:




In general, you need to determine what does, and what doesn't, need to be isolated. You also need to identify which components are apt to be delinquent and build a cage around them to protect more critical components. Which brings me to a recent paper by my inestimable colleagues Chris Hobbs and Yi Zheng. It's titled "Protecting Software Components from Interference in an ISO 26262 System," and it explores techniques that can help you:

  • implement the component isolation required by ISO 26262
  • demonstrate that such isolation has been implemented

And while you're at it, check out the other titles in our "safe" whitepaper series. These include "The Dangers of Over-Engineering a Safe System" and "Ten Truths about Building Safe Embedded Software Systems."

And don't worry: there's nothing delinquent about downloading all of them.

This post originally appeared in the QNX auto blog.

3/07/2013

Can a safety-critical system be over-engineered?

Too much of a good thing?
It's a rhetorical question, of course. But hear me out.

As you can imagine, many safe systems must be designed to handle scenarios outside their intended scope. For instance, in many jurisdictions, passenger elevators must be capable of handling 11 times more weight than their recommended maximum — you just never know what people will haul into an elevator car. So, if the stated limit for a passenger elevator is 2000 pounds, the actual limit is closer to 22,000 pounds. (Do me a favor and avoid the temptation to test this for yourself.)

Nonetheless, over-engineering can sometimes be too much of a good thing. This is especially true when an over-engineered component imposes an unanticipated stress on the larger system. In fact, focusing on a specific safety issue without considering overall system dependability can sometimes yield little or no benefit — or even introduce new problems. The engineer must always keep the big picture in mind.

Case in point: the SS Eastland. In 1915 this passenger ship rolled over, killing more than 840 passengers and crew. The Eastland Memorial Society explains what happened:

    "...the Eastland's top-heaviness was largely due to the amount and weight of the lifeboats required on her... after the sinking of the Titanic in 1912, a general panic led to the irrational demand for more lifesaving lifeboat capacity for passengers of ships.
    Lawmakers unfamiliar with naval engineering did not realize that lifeboats cannot always save all lives, if they can save any at all. In conformance to new safety provisions of the 1915 Seaman’s Act, the lifeboats had been added to a ship already known to list easily... lifeboats made the Eastland less not more safe..."

There you have it. A well-intentioned safety feature that achieved the very opposite of its intended purpose.

Fast forward to the 21st century. Recently, my colleague Chris Hobbs wrote a whitepaper on how a narrow design approach can subtly work its way into engineering decisions. Here's the scenario he uses for discussion:

    "The system is a very simple, hypothetical in-cab controller (for an equally hypothetical) ATO system running a driverless Light Rapid Transit (LRT) system...
    Our hypothetical controller has already proven itself in Rome and several other locations. Now a new customer is considering it for an LRT ATO in the La Paz-El Alto metropolitan area in Bolivia. La Paz-El Alto has almost 2.5 million inhabitants living at an elevation that rises above 4,100 meters (13,600 ft.—higher than Mount Erebus). This is a significant change in context, because the threat of soft and hard memory errors caused by cosmic rays increases with elevation. The customer asks for proof that our system can still meet its safety requirements when the risk of soft memory errors caused by radiation is included in our dependability estimates..."

So where should the engineer go from here? How can he or she ensure that the right concerns are being addressed? That is what Chris endeavours to answer. (Spoiler alert: The paper determines that, in this hypothetical case, software detection of soft memory errors isn't a particularly useful solution.)

Highly recommended.

2/07/2013

10 truths about building safe embedded software systems

I wish I could remember his exact words. But it has been a long time — 20 years — and my memory has probably added words that he never wrote and removed words that he did write. That said, this is how I remember it:

    "We all strive to write bug-free code. But in the real world, bugs can and do occur. Rather than pretend this isn't so, we should adopt a mission-critical mindset and create software architectures that can contain errors and recover from them intelligently."

The "he" in question is my late (and great) colleague Dan Hildebrand. I'm sure that Dan's original sentences were more nuanced and to the point. But the important thing is that he grokked the importance of "culture" when it comes to designing software for safety-critical systems. A culture in which the right attitudes and the right questions, not just the right techniques, are embraced and encouraged.

Which brings me to a paper written by my colleagues Chris Hobbs and Yi Zheng. It's titled "Ten truths about building safe embedded software systems" and, sure enough, the first truth is about culture. I quote:

    "A safety culture is not only a culture in which engineers are permitted to raise questions related to safety, but a culture in which they are encouraged to think of each decision in that light..."

I was particularly delighted to read truth #5, which echoes Dan's advice with notable fidelity:

    "Failures will occur: build a system that will recover or move to its design safe state..."

I also remember Dan writing about the importance of software architectures that allow you to diagnose and repair issues in a field-deployed system. Which brings us to truth #10:

    "Our responsibility for a safe system does not end when the product is released. It continues until the last device and the last system are retired."

Dan argued for the importance of these truths in 1993. If anything, they are even more important today, when so much more depends on software. If you care about safe software design, you owe it to yourself to read the paper.

9/23/2012

Which OS for IEC 62304 medical systems?

The question, to some degree, is rhetorical. I work for an OS company, and that company has developed a 62304-compliant OS for medical device manufacturers... you see where this is going.

But don't go yet. This week, my colleague Chris Ault will present a webinar on this very topic, and the content he'll cover should prove useful to anyone choosing an OS for a medical device — or, for that matter, any device that must operate reliably and safely.

In case you're wondering, the Linux question will definitely come up. Linux does lots of things very well, but does it belong in a safety-critical device? Knowing Chris, he'll offer a suitably unambiguous answer — and some solid reasoning to back it up.

Okay, enough from me. To learn more about the webinar, which will be held this Thursday, September 27, at 2:00 p.m. Eastern, visit the QNX website.

2/15/2012

Vector's software testing tools now support QNX Neutrino RTOS Certified Plus

Learn how you can become eligible to win this cool T-shirt
This just in: Vector Software, a provider of software tools for testing safety-critical embedded applications, has announced that its VectorCAST suite now supports QNX Neutrino RTOS Certified Plus, an OS that combines the benefits of the QNX Neutrino RTOS Safe Kernel and the QNX Neutrino RTOS Secure Kernel.

According to the press release, "The VectorCAST product suite has supported the QNX Neutrino RTOS since 2009... this latest integration helps our customers accelerate time-to-market by streamlining product planning, design, and validation."

QNX Neutrino RTOS Certified Plus offers both IEC 61508 certification at Safety Integrity Level 3 (SIL 3) and Common Criteria ISO/IEC 15408 certification at Evaluation Assurance Level 4+ (EAL 4+). Its certification credentials — combined with its microkernel architecture, POSIX-compliant API, and adaptive partitioning technology — make Certified Plus well-suited to systems that have both functional safety and security requirements.

To read Vector's press release, click here.
 

2/14/2012

Multicore webinar coming to a screen near you

If you're developing software for an embedded system equipped with a multicore processor, have we got a webinar for you.

On Wednesday, February 15, the "two Jeffs" — Jeff Schaffer of QNX and Jeff Logan of Freescale — will present a webinar on achieving maximum performance on multicore chips. Topics include threading models for creating multiple concurrent tasks, design patterns for increasing parallelism, and techniques for optimizing cache usage.
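
If you can't wait for the webinar, here's the flavor of one such technique: a bare-bones sketch of mine (not the presenters' material) that splits a data-parallel loop across threads so each core gets a slice of the work:

    #include <pthread.h>
    #include <stdio.h>

    #define NUM_THREADS 4
    #define N 1000000

    static double data[N];

    struct slice { int start, end; };

    static void *process_slice(void *arg)
    {
        struct slice *s = arg;
        for (int i = s->start; i < s->end; i++)
            data[i] *= 2.0; /* stand-in for real per-element work */
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[NUM_THREADS];
        struct slice slices[NUM_THREADS];

        /* Divide the array into equal slices, one per thread. */
        for (int t = 0; t < NUM_THREADS; t++) {
            slices[t].start = t * (N / NUM_THREADS);
            slices[t].end = (t + 1) * (N / NUM_THREADS);
            pthread_create(&tid[t], NULL, process_slice, &slices[t]);
        }
        for (int t = 0; t < NUM_THREADS; t++)
            pthread_join(tid[t], NULL);

        puts("done");
        return 0;
    }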

Did I mention? If you attend the live webinar, you'll be eligible to win a QorIQ P2020RDB-PCA reference design board through Arrow Electronics. Cool, that.

So what are you waiting for? Click the link and register. The webinar happens Wednesday, February 15, at 2:00 p.m. EST.
 

1/03/2012

An (automotive) developer's perspective on HTML5

HTML5 may represent the future of automotive infotainment, but it won't exist in isolation. HTML5 apps will, for the foreseeable future, need to communicate with native apps, such as navigation programs written in OpenGL. Some will even need to communicate with lower-level services, such as vehicle bus drivers. The easier these communications can be implemented, the faster automakers can adopt HTML5.

Which brings me to a new installment in the QNX video series on HTML5. In this video, Sheridan Ethier, who manages the company's automotive development team, discusses integration between HTML5 and native apps. He also explains how HTML5 can simplify development efforts and deliver the performance needed for graphically rich infotainment systems.

Roll the tape...


 

12/12/2011

Meet a true "Hiro" of robotic research

If developing next-gen robotics is your thing, Hiro's your man.

A couple of years ago, I introduced you to Hiro, a QNX-based robot designed for research and teaching programs in university labs. Even if you didn't read about Hiro here, he may still seem familiar, what with his appearances on Engadget, übergizmo, and other über-popular sites.

Kawada Industries, the company that created Hiro, describes him as a starter set for research into humanoid robots. To that end, Hiro comes equipped with a stereo vision camera, speech recognition, hands with 15 degrees of freedom, hand-mounted cameras, and a repeat positioning accuracy of less than 20 micrometers — that's 20 one-thousandths of a millimeter.

Since my last post, Kawada has uploaded some videos to demonstrate Hiro's chops. For instance, here's a clip showing how he has all the right moves:



And here's a clip showing how he can listen to voice commands:



If Hiro's role is to serve as a platform for next-gen robotics, he is succeeding. Recently, Osamu Hasegawa, a professor at the Tokyo Institute of Technology, used Hiro as the basis for a new "thinking" robot. The robot, also dubbed Hiro (confusing, I know), employs a Self-Organizing Incremental Neural Network algorithm to adapt to its environment and learn new tasks.

For instance, in this video, a researcher asks Hiro to pour a cup of water. Hiro has never done this before, but figures out how to do it. That's some algorithm!



For more information on Hiro and his manufacturer, Kawada Industries, click here.
 

12/05/2011

LDRA, QNX help medical device developers gear up on IEC 62304 standard

Image courtesy LDRA
Until a few weeks ago, I had never heard of LDRA.

My bad. LDRA has been in business for more than 35 years, developing tools that automate code analysis and software testing for safety-, mission-, security-, and business-critical systems. (A lot of hyphens, I know, but did you really want me to say "critical" four times? :-)  In other words, LDRA has been helping systems work reliably for even longer than QNX.

Fortunately, my colleague Bob Monkman isn't as clued out as I am. In fact, he recently got together with LDRA to develop a new webinar, "Optimizing the Development of Certified Medical Devices".

The webinar, which happens this Wednesday at 2:00 p.m. EST, covers several topics, including:
  • Using IEC 62304 development templates
  • Specifying requirements to ensure requirements traceability through all phases of development
  • Leveraging safe design training courses and pre-audit consulting
  • Securing code — 70% of security vulnerabilities arise from programming errors
  • Scheduling code inspections — early inspections eliminate errors
  • Gaining IEC 62304 compliance using qualifiable and certified products from LDRA and QNX
     
Unified tooling
Don't go just yet. I also want to mention that LDRA recently ported their tool suite — which includes tools for lifecycle software testing for all phases of development — to the QNX Momentics Tool Suite and QNX Neutrino RTOS.

This makes for nice integration between LDRA tools and QNX tools. For instance, if the LDRA tool suite identifies a code violation, you can view the error interactively from within the QNX Momentics IDE — no need to switch tooling environments. Good, that.


Using the QNX Momentics IDE to inspect a violation caught by the LDRA tool suite.

To view two full-size screen captures showing LDRA-QNX integration, visit the Hughes Communications website.

And for more details on the LDRA suite for QNX, check out the press release.

11/16/2011

30 years of QNX: Celebrating a decade of Eclipse

Correct me if I'm wrong, but until Eclipse came along, the software industry didn't have a standard platform for developing applications in C and C++. This was certainly true in the embedded market, where almost every OS vendor offered their own proprietary development environment.

As a wise man once said, what a dumb approach. Vendors wasted time reinventing the wheel, when they could have focused on innovative tools that offered real value. Meanwhile, developers had to learn a new toolset every time they worked with a different OS. Doh!

Folks at QNX knew this situation had to change. Which explains why Dan Dodge, the company's CEO, became a founding steward of Eclipse.org, the consortium responsible for creating the Eclipse open-source tooling platform. It also explains why Sebastien Marineau, the company's VP of engineering, became the first project lead of CDT, the C and C++ development environment for Eclipse.

QNX's contribution didn't stop there. The company also donated a large amount of source code and developer time to the CDT project. As a result of these and other community efforts, Eclipse CDT subsequently became the C/C++ platform of choice for IBM, Ericsson, Texas Instruments, and other multi-billion dollar organizations.

Eclipse CDT also formed the basis of a major new QNX product, the QNX Momentics Tool Suite. More importantly, the platform gave QNX more freedom to innovate, particularly when it came to tools for debugging and optimizing multi-core systems. In fact, these multi-core tools garnered several awards, including:
  • Eclipse community awards, best developer tool, 2007
  • EDN China innovation award, 2007
  • Embedded World embedded AWARD, 2006
Here, for example, is a screen capture of the system profiler for the QNX Momentics Suite. The profiler is displaying CPU activity for the 4 cores of a quad-core processor:



Eclipse is ten years old this month. If you're interested in its history, or in crashing an Eclipse birthday party, check out the Eclipse website.
 

7/18/2011

The new skin is in! Before and after shots of the Corvette head unit

A glimpse of the Corvette's
virtual mechanic, in the
original skin.
A few months ago, the QNX concept development team pimped out a stock Corvette with a multimedia head unit and a digital instrument cluster based on the QNX CAR Application Platform. Now, QNX has always claimed that automakers can easily re-skin the platform with their own look-and-feel. But how easy is it, really?

To answer that question, we created the 30-day UI challenge. In a nutshell, we gave Lixar, a mobile UI company, a month to create new skins for the Corvette's head unit.

But here's the thing: Lixar didn't have any experience using the QNX OS. Nor did they have any experience in the automotive market. If a small team at Lixar could pull this off, the argument went, so could any automotive customer with good UI developers on hand.

I've already posted several articles on this project and received lots of great feedback. One reader even drew cool mockups (see here and here) to show how he would redesign the UI!

Keep in mind, however, that we didn't ask Lixar to re-think the UI, but rather, to re-skin it — to give it a fresh look that captures the spirit of the Corvette. This is exactly the kind of thing an automaker would often want to do: Take an existing UI and tune it to match the brand image of multiple vehicle models. Re-use rather than re-invent.

So without further ado, here are some before and after screenshots of the head unit. I've had to shrink the screenshots to fit the layout of my blog, so they aren't quite as sharp and as smooth as the originals. A fair representation, nonetheless:

Main menu, before...



Main menu, after:





HVAC controls, before...



HVAC controls, after:





MP3 player, before...



MP3 player, after:



I don't know about you, but to me, the new skins seem punchier and easier to navigate, visually speaking. That said, you be the judge.



Before I go, here's the "making of" video filmed to document the project:



 

4/05/2011

Whitepaper: Building functional safety into complex software systems, Part II

Recently, our local town councillor called a neighborhood meeting to discuss the construction of a new apartment building at the top of my street. Halfway through the meeting, one of my neighbors stood up and made this demand:

    “I want a 100% guarantee that the blasting required for this project won’t damage the foundation of my house.”

Let’s face it. When it comes to anything that could threaten our property, our family, or our health, we all want a 100% guarantee. Problem is, the one constant in life is that there are no absolute guarantees.

This rule applies to software as much as to anything else. Just try to create a software system that is both reasonably useful and absolutely dependable. It’s well-nigh impossible. Unfortunately, the same rule also applies to software validation methods: no method is absolutely foolproof. The more complex a software system becomes, the more this rule applies.

A difficult pill to swallow? You bet. But acknowledging it is key to designing a system that successfully achieves functional safety.

Which brings me to Chris Hobbs’ latest paper, “Building Functional Safety into Complex Software Systems, Part II.” For Chris, functional safety must be built into a software system from day one. Moreover, all work should follow from the premise that software always contains faults and that these faults may lead to failures. We must, as a result, include multiple lines of defense when designing a system:

  • isolate safety-critical processes
  • reduce faults
  • prevent faults from becoming errors
  • prevent errors from becoming failures


All this begins with the best available expertise and a crystal-clear definition of the system’s dependability requirements — what Chris refers to as “sufficient dependability.” This definition is essential: It not only provides an accurate measure for validating the system’s functional safety, but also eliminates vague (and therefore meaningless) requirements.

We must also follow rigorous standards and practices throughout the system design and development, and implement a comprehensive validation program that includes not only traditional state-based testing at the module level, but also statistical testing and design verification.

I’m just scratching the surface of Chris’s paper. For the full story, download the paper here.

3/02/2011

QNX unveils first RTOS to offer both safety and security certification

A couple of days ago, I mentioned that QNX always likes to make a big splash at the annual embedded world conference. Well, the big splash for this year is now public: QNX has taken the covers off the first RTOS product to provide both safety and security certification.

Yesterday, QNX announced QNX Neutrino RTOS Certified Plus, which offers both IEC 61508 certification at Safety Integrity Level 3 (SIL 3) and Common Criteria ISO/IEC 15408 certification at Evaluation Assurance Level 4+ (EAL 4+).

The goal of this product is simple: To help developers of railway control systems, medical devices, automotive systems, wind turbines, and other mission-critical applications reduce the time and expense of certifying their end-products.

You see, safety and security certification at the system level can cost millions of dollars and take years to achieve. Using a pre-certified OS can help cut that cost and accelerate certification efforts. A few operating systems provide safety or security certification, but not both. QNX Neutrino RTOS Certified Plus is the first OS to fill this gap.

Whitepapers
Chris Hobbs, a kernel developer at QNX, has authored several papers on creating applications that meet rigorous reliability and functional safety requirements, including IEC 61508 SIL 3. For a list of these papers, see my previous blog post.

2/27/2011

QNX VP showcases multitasking prowess of BlackBerry PlayBook

Hey, check out this video of Sebastien Marineau, VP of engineering at QNX, as he explains how the multi-core capabilities of the QNX Neutrino OS allow the BlackBerry PlayBook to run multiple apps simultaneously:


Technically speaking, QNX Neutrino's advanced support for symmetric multiprocessing, or SMP, makes this multitasking possible. A large variety of systems, including the world's largest Internet routers, have used QNX SMP for well over a decade. Which means that the PlayBook, with its ability to deliver a full web browsing experience, uses the same technology that helps power the Web itself. Now that's pretty cool.

2/07/2011

Whitepaper: Building functional safety into complex software systems

My colleague Chris Hobbs writes books, designs software, sings Schubert, teaches pilots, and, if all that isn't enough, pens papers on functional safety. Speaking of which, I've just started reading Chris's latest paper, "Building Functional Safety into Complex Software Systems, Part I," which contains the following anecdote:

    "Thirty-seven seconds after it was launched on June 4, 1996, the European Space Agency’s (ESA) new Ariane 5 rocket rained back to earth in pieces. This failure was rather costly: some US $370 million, and a stinging embarrassment for ESA.

    It has become one of the best known instances of software that had been exhaustively tested and even field proven — in this case, more accurately, sky-proven — ceasing to function correctly though it had not been changed. What had changed was the context in which the software ran..."

This story highlights the paper's thesis: that the functional safety of today’s complex, multi-threaded software systems cannot be validated by traditional, state-based testing alone.

In theory, such systems are deterministic. And in theory, all of their states and state transitions can be identified. But in practice, these states and transitions are so numerous that they cannot be counted, let alone tested.

Does this mean we must throw up our collective hands in despair? Not at all, says Chris. He emphasizes that it is still possible to build functionally safe complex software systems — but since I don't want to spoil the story, I'll stop talking now and invite you to read the paper.

And while you're at it, I invite you to check out other papers Chris has written on safety-critical systems and software:

  • Fault Tree Analysis with Bayesian Belief Networks for Safety-Critical Software
  • Using an IEC 61508-Certified RTOS Kernel for Safety-Critical Systems
  • Protecting Applications Against Heisenbugs

8/04/2010

Protect your software against Heisenbugs

By definition, Heisenbugs are sensitive to being observed: They appear sporadically during normal operation, but disappear when the developer attempts to track them down in debug mode. The very act of debugging eliminates the subtle timing interactions or other conditions that trigger these bugs into action.
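
A classic example, mine rather than Chris's: two threads bump a shared counter without a lock. Run it flat out and updates get lost; add a printf or single-step it in a debugger, and the changed timing often makes the race (and hence the bug) vanish:

    #include <pthread.h>
    #include <stdio.h>

    static long counter; /* shared and unprotected: the seed of a Heisenbug */

    static void *bump(void *arg)
    {
        for (int i = 0; i < 1000000; i++)
            counter++; /* load, increment, store: not atomic */
        return NULL;
    }

    int main(void)
    {
        pthread_t a, b;
        pthread_create(&a, NULL, bump, NULL);
        pthread_create(&b, NULL, bump, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);

        /* Expect 2000000; lost updates usually leave it short, unless
           instrumentation slows the threads enough to hide the race. */
        printf("counter = %ld\n", counter);
        return 0;
    }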

It's no surprise, then, that Heisenbugs are often difficult, if not impossible, to eradicate. Nonetheless, developers can create applications that are resilient to these maddeningly elusive defects. They can, for example, use virtually synchronous replication (VSR), a technique described in a recent whitepaper by Chris Hobbs, a kernel developer at QNX.

A few days ago, QNX posted this paper on its website, along with a paper on memory analysis (co-authored by yours truly) and two papers on developing in-car telematics systems. Here they are: