Dependable Mission-Critical Software
In this Embedded.com post, Software for Dependable Systems, Jack Ganssle introduced me to the book Software for Dependable Systems: Sufficient Evidence?. It was written by the “Committee on Certifiably Dependable Software Systems” and it’s available as a free PDF download.
Despite being written by a committee (blech!), and despite the bland title (yawn), I agree with Jack that it’s a riveting geek read. It’s understandable to field-hardened practitioners and it’s filled with streetwise wisdom about building dependability into large, mission-critical software systems that can kill people or cause massive financial loss if they collapse under stress. Essentially, it says that all the bloated, costly, high-falutin safety, security, and certification processes in existence today don’t guarantee squat – except jobs for self-important bureaucrats and wannabe engineers. They don’t say it THAT way of course, but that’s my warped and unprofessional interpretation of their message.
Here are a few gems from the 149-page PDF:
- As is well known to software engineers, by far the largest class of problems arises from errors made in eliciting, recording, and analysis of requirements.
- Undependable software suffers from an absence of a coherent and well articulated conceptual model.
- Today’s certification regimes and consensus standards have a mixed record. Some are largely ineffective, and some are counterproductive. (<- This one is mind-blowing to me.)
- The goal of certifiably dependable software cannot be achieved by mandating particular processes and approaches regardless of their effectiveness in certain situations.
In addition to lampooning the “way things are currently done” for certifying software-centric dependability, the committee dudes actually make some recommendations for improving the so-called state of the art. Stunningly, they don’t prescribe yet another costly, heavyweight process of dubious effectiveness. They would accept any process composed of best practices, as long as there is scrutable connectivity from phase to phase and from start to end to “preserve the chain of evidence” for the claim of dependability that vendors of such software should be required to make. Where there is a gap between links in the chain of scrutability, they recommend rigorous analysis to fill it.
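To make the “chain of evidence” idea concrete, here’s a minimal sketch of what such traceability might look like in code. This is my own illustration, not anything from the book; the record types and the gap check are hypothetical:

```python
# A minimal sketch (my illustration, not the book's) of a "chain of
# evidence": every dependability claim traces back to the requirement
# it supports, and gaps are surfaced for rigorous analysis rather
# than silently ignored.

from dataclasses import dataclass

@dataclass
class Requirement:
    req_id: str
    text: str

@dataclass
class Evidence:
    req_id: str   # which requirement this item supports
    kind: str     # e.g. "test", "analysis", "review"
    passed: bool

def find_evidence_gaps(requirements, evidence):
    """Return requirements with no passing evidence -- the broken
    links in the chain that need rigorous analysis to fill."""
    supported = {e.req_id for e in evidence if e.passed}
    return [r for r in requirements if r.req_id not in supported]

if __name__ == "__main__":
    reqs = [
        Requirement("REQ-1", "Braking shall engage within 2 s of touchdown."),
        Requirement("REQ-2", "System shall tolerate a single sensor failure."),
    ]
    ev = [Evidence("REQ-1", "test", passed=True)]
    for gap in find_evidence_gaps(reqs, ev):
        print(f"No evidence for {gap.req_id}: {gap.text}")
```

The point is simply that an unsupported claim shows up as an explicit gap to be analyzed, rather than disappearing into the process.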
To make the transition to this new mindset of scrutable connectivity, they say that the following skills, rare today and difficult to acquire, will be required:
- True systems thinking (not just the specialized, localized, algorithmic thinking that’s erroneously praised as systems thinking by corpocracies) about the properties of the system as a whole and the interactions among its components.
- The art of simplifying complex concepts, which is difficult to appreciate because awareness of the need for simplification usually comes (if it DOES come at all) only with bitter experience and the humility gained from years of practice.
Drum roll please, because my absolute favorite entry in the book, which tugs at my heart, is as follows:
To achieve high levels of dependability in the foreseeable future, striving for simplicity is likely to be by far the most cost-effective of all interventions. Simplicity is not easy or cheap but its rewards far outweigh its costs.
That passage resonates deeply with me because, even though I’m not good at it, striving for simplicity has been my primary professional goal for 20+ years. Clueless companies that put complexifying, obfuscating experts whom nobody can understand up on a pedestal deserve what they get:
- incomprehensible, unmaintainable, and undependable products
- a disconnected and apathetic workforce
- low (if any) profit margins.
As my Irish friend would say, they are all fecked up. They’re innocent and ignorant, but still fecked up.


Another part that interests me is the testing of software. Most software is tested against its requirements, but in mission-critical software, how do you know whether the requirements cover all the situations the software may encounter? (See the Airbus crash in Warsaw, page 65.) Most engineering failures until now have been mechanical ones – bridges falling, roofs collapsing, engines falling off. These are all investigated, and processes are put in place to lessen the chance of the same problem occurring again. Since software has less standardization (like bolt strength and such) than other engineering disciplines, how will any safety lessons be applied? This is an area software developers should begin to address as software moves into more mission-critical areas of life.
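To make the Warsaw example concrete: as widely reported, the braking aids were inhibited until the logic decided the aircraft was on the ground, and a crosswind landing on one gear with aquaplaning wheels didn’t satisfy that logic for several seconds. Here is a toy sketch of that kind of gap; the function and thresholds are my own illustration, not the actual Airbus logic:

```python
# A toy sketch (my own illustration, not the actual Airbus logic) of
# a requirement implemented correctly yet failing to cover a real
# situation. Thresholds are illustrative only.

def ground_braking_enabled(left_gear_load_kg: float,
                           right_gear_load_kg: float,
                           wheel_speed_kts: float) -> bool:
    """Enable spoilers/reversers only when the logic decides the
    aircraft is 'on the ground'."""
    both_gears_loaded = left_gear_load_kg > 6000 and right_gear_load_kg > 6000
    wheels_spun_up = wheel_speed_kts > 72
    return both_gears_loaded or wheels_spun_up

# Crosswind landing: one gear firmly down, the other barely loaded,
# wheels aquaplaning on a wet runway. The aircraft is on the runway
# and needs brakes, but by the written requirement it is "not on the
# ground".
print(ground_braking_enabled(11000.0, 500.0, 40.0))  # -> False
```

The function is arguably correct to its written requirements; the requirements simply didn’t anticipate the situation.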
There are follow-on topics in this area that are interesting to look at. I have found that it takes talent and time to develop a simple, understandable design. The problem is that in the software world, many managers want to see results (i.e., coding and testing), not someone designing. So I have seen software developers start coding before the design is ready, then get credit for mountains of code, the result of this effort being a hard-to-maintain, buggy mess. Then these same people get more credit for fixing the bugs. Software is still measured in lines of code, and developers are rewarded accordingly.
The first paragraph of my comment was cut; I guess my comment was too long. Here is my first paragraph.
This area of software development is of great interest to me, so I downloaded the book you linked to and read it. I am a software engineer/code monkey (whatever gets the job and the girl) at a company that makes mission-critical products. I find that most software starts out as a simple design and evolves in complexity into a ball of mud (http://codinghorror.com/blog/archives/001003.html or http://www.laputan.org/mud/). Poor requirements at the start cause the software to evolve into the mud ball faster, so good requirements up front delay the evolution.