Archive

Archive for the ‘technical’ Category

TDD Overhype

November 8, 2011 Leave a comment

There is much to like about unit-level testing and its extreme cousin, Test Driven Design (TDD). However, like with any tool/technique/methodology, beware of overhyping by snake oil salesman.

Cédric Beust is the author of the book, “Next Generation Java Testing“. He is also the founder and lead developer of TestNG, the most widely used Java testing framework outside of JUnit. In a Dr. Dobb’s guest post titled “Breaking Away From The Unit Test Group Think”, I found it refreshing that a renowned unit testing expert, Mr. Beust, wrote about the down side of the current obsession with unit testing:

  • Unit tests are a luxury. Sometimes, you can afford a luxury; and sometimes, you should just wait for a better time and focus on something more essential — such as writing a functional test.
  • There are two specific excesses regarding code coverage: focusing on an arbitrary percentage and adding useless tests.
  • TDD code also tends to focus on the very small picture: relentlessly testing individual methods instead of thinking in terms of classes and how they interact with each other. This goal is further crippled by another precept called YAGNI, (You Aren’t Going to Need It), which suggests not preparing code for features that you don’t immediately need.
  • Obsessing over numbers in the absence of context leads developers to write tests for trivial pieces of their code in order to increase this percentage with very little benefit to the end user (for example, testing getters).

Being a designer and developer of real-time, multi-threaded code that runs 24 X 7, I’ve found that unit testing is not nearly as cost effective as functional testing. As soon as I can, I get a skeletal instance of the (non-client-server) system that I’m writing up and running and I pump deterministic, canned data streams through the system from end-to-end to find out how the beast is behaving.  As I add functionality to the code base, I rerun the data streams through the system again and again and I:

  • try to verify that the new functionality inserted into the threading structure works as expected
  • try to ensure that threads don’t die,
  • try to verify that there are no deadlocks or data races in the quagmire

If someone can show me how unit testing helps with these issues, I’m all ears. No blow hard pontificating, please.

Persistent Discomfort

November 3, 2011 Leave a comment

As part of the infrastructure of the distributed, multi-process, multi-threaded system that my team is developing, a parameterized, mutex protected, inter-thread message queue class has been written and dropped into a general purpose library. To unburden application component developers from having to do it, the library-based queue class manages a reusable pool of message buffers that functionally “flow” from one thread to the next.

On the “push” side of the queue, usage is as follows:

  • Thread acquires a handle to the next empty Message buffer
  • Thread fills Message buffer
  • Thread returns handle to the queue (push)

On the “pop” side of the queue, usage is as follows:

  • Thread acquires a handle to the next full Message buffer (pop)
  • Thread processes the Message content
  • Thread returns handle to the queue

So far, so good, right? I thought so too – at the beginning of the project. But as I’ve moved forward during the development of my application component, I’ve been experiencing a growing and persistent discomfort. D’oh!

Using the figure below, I’m gonna share the cause of my “inner thread” discomfort with you.

In order to functionally process an input message and propagate it forward, the inner thread must do the following work:

  • Acquire a handle to the next input Message buffer from queue 1 (pop)
  • Acquire a handle to the next empty output Message buffer from queue 2
  • Utilize the content of the Message from queue 1 to compute/fill in the Message to queue 2
  • Return the handle of the input message to queue 1
  • Return the handle of the output message to queue 2 (push)

For small messages and/or when the messages are of different types, I don’t see much wrong with this inter-thread message passing approach. However, when the messages are big and of the same type, my discomfort surfaces. In this case (as we shall see), the “utilize” bullet amounts to an unnecessary copy. The more “inner” threads there are in the pipeline, the more performance degradation there is from unnecessary copies.

So, how can the copies be eliminated and system performance increased? One way, as the figure below shows, is to move message buffer management responsibility out of the local queue class and into a global, shared message pool class.

In this memory-less queue design, the two pipeline end point threads explicitly assume the responsibility of acquiring and releasing the Message buffer handles from the mutex protected, shared message pool. The first thread “acquires” and the last thread “releases” message buffer handles. Each inner thread, i, in the pipeline performs the following work:

  • Pop the handle to the next input Message buffer from queue i-1
  • Process the message
  • Push the Message buffer handle to queue i

The key to avoiding unessential inner thread copies is that the messages must be intentionally designed to be of the same type.

As soon as I get some schedule breathing room (which may be never), I’m gonna refactor my application infrastructure design and rewrite the code to implement the memoryless queue + global message pool approach. That is, unless someone points out a fatal flaw in my reasoning and/or presents a superior inter-thread message communication pattern.

Is It Safe?

November 1, 2011 Leave a comment

Assume that we place a large, socio-technical system designed to perform some safety-critical mission into operation. During the performance of its mission, the system’s behavior at any point in time (which emerges from the interactions between its people and machines), can be partitioned into 2 zones: safe and unsafe.

Since the second law of dynamics, like all natural laws, is impersonal and unapologetic, the system’s behavior will eventually drift into the unsafe system behavior zone over time. Combine the effects of the second law with poor stewardship on behalf of the system’s owner(s)/designer(s), and the frequency of trips into scary-land will increase.

Just because the system’s behavior veers off into the unsafe behavior zone doesn’t guarantee that an “unacceptable” accident will occur. If mechanisms to detect and correct unsafe behaviors are baked into the system design at the outset, the risk of an accident occurring can be lowered substantially.

Alas, “safety“, like all the other important, non-functional attributes of a system, is often neglected during system design. It’s unglamorous, it costs money, it adds design/development/test time to the sacred schedule. and the people who “do safety” are under-respected. The irony here is that after-the-accident, bolt-on, safety detection and control kludges cost more and are less trustworthy than doing it right in the first place.

Design Disclosure

October 31, 2011 Leave a comment

Recently, I had to publicly disclose the design of the multi-threaded CSCI (Computer Software Configuration Item) that I’m currently bringing to life with a small group of colleagues. The figure below shows the entities (packages, CSCs (Computer Software Components), Classes) and the inter-entity relationship schema that I used to create the CSCI design artifacts.

As the figure illustrates, the emerging CSCI contains M packages and K top level CSCs . Each package contains from 1 to N 2nd level CSCs that associate with each other via the “communicates” (via messages) relationship. Each CSC is of the “passive” or “active” class type, where “active” means that the CSC executes within its own thread of control.

Using the schema, I presented the structural and behavioral aspects of the design as a set of views:

Like any “real” design effort (and unlike the standard sequential design-code-test mindset of “authorities“), I covertly used the incremental and iterative PAYGO methodology (design-a-little, code-a-little, test-a-little, document-a-little) in various mini sequences that, shhhh – don’t tell any rational thinker, just “felt right” in the moment.

As we speak, the effort is still underway and, of course, the software is 90% done. Whoo Hoo! Only 10 more percent to go.

Unfriendly Fire

October 29, 2011 2 comments

In Nancy Leveson’s new book, “Engineering A Safer World“, she analyzes (in excruciating detail) all the little screw-ups that occurred during an accident in Iraq where two F-15 fighters shot down two friendly black hawk helicopters – killing all aboard. To set the stage for dispassionately explaining the tragedy, Ms. Leveson provides the following hierarchical command and control model of the “system” at the time of the fiasco:

Holy shite! That’s a lot of levels of “approval required, no?

In typical BD00 fashion, the dorky figure below dumbs down and utterly oversimplifies the situation so that he can misunderstand it and jam-fit it into his flawed UCB mental model. Holy shite! That’s still a lot of levels of “ask me for permission before you pick your nose“, no?

So, what’s the point here? It’s that every swingin’ dick wants to be an esteemed controller and not a low level controlleee. Why? Because….

“Work is of two kinds: first, altering the position of matter at or near the earth’s surface relatively to other such matter; second, telling other people to do so. The first kind is unpleasant and ill paid; the second is pleasant and highly paid.” – Bertrand Russell

People who do either kind of work can be (but perhaps shouldn’t be) judged as bozos, or non-bozos. The bozo to non-bozo ratio in the “pleasant” form of work is much higher than the “unpleasant” form of work. – BD00

Shocking!

October 28, 2011 1 comment

In the brilliant YouTube video, “An Introduction to Erlang (for Python programmers) “, the narrator presents the following facts about Erlang that are shocking to “normal” programmers trained in object-oriented languages:

So, why would anyone want to learn Erlang? Because….

In my case, the fact that concurrency, distribution, fault-tolerance, and hot-swapping are built in to the language make me a huge Erlang fan. On the downside, I’m not a fan of “managed” languages.  However, Erlang’s positives far outweigh its negatives. What do you think?

Awhile ago, I started a focused effort to learn how to program idiomatically in Erlang. However, because:

1 I’m a slooow learner.

2 C++/Java are the languages that my company’s products are written in.

3 ……

I’ve abandoned  the “deep dive” into the language. However, I’m definitely still watching from the periphery because my gut tells me that Erlang will continue its rise into prominence.

Dysfunctional Interactions

October 27, 2011 Leave a comment

In “Engineering A Safer World“, Nancy Leveson states that dysfunctional interactions between system parts play a bigger role in accidents than individual part failures. Relative to yesterday’s systems, today’s systems contain many more parts. But because of manufacturing advances, each part is much more reliable than it used to be.

A consequence of adding more parts to a system is that the numbers of potential connections and interactions between parts starts exploding fast. Hence, there’s a greater chance of one dysfunctional interaction crashing the whole system – even whilst the individual parts and communication links continue to operate reliably.

Even with a “simple” two part system, if its designed-in purpose requires many rich and interdependent interactions to be performed over the single interface, watch out. A single dysfunctional interaction can cause the system to seize up and stop producing the emergent behavior it was designed to provide:

So, what’s the lesson here for system designers? It’s two-fold. Minimize the number of interfaces in your design and, more importantly, limit the number, types, and exchanges over each interface to only those that are required to fulfill the system’s purpose. Of course, if no one knows what’s required (which is the number one cause of unsuccessful systems), then you’re hosed no matter what. D’oh!

Out With The Old And In With The New

October 26, 2011 Leave a comment

Nancy Leveson has a new book coming out that’s titled “Engineering A Safer World” (the full draft of the book is available here: EASW). In the beginning of the book, Ms. Leveson asserts that the conventional assumptions, theory, and techniques (FMEA == Failure Modes and Effects Analysis, Fault Tree Analysis == FTA, Probability Risk Assessment == PRA) for analyzing accidents and building safe systems are antiquated and obsolete.

The expert, old-guard mindset in the field of safety engineering is still stuck on the 20th century notion that systems are aggregations of relatively simple, electro-mechanical parts and interfaces. Hence, the steadfast fixation on FMEA, FTA, and PRA. On the contrary, most of the 21st century safety-critical systems are now designed as massive, distributed, software-intensive systems.

As a result of this emerging, brave new world, Ms. Leveson starts off her book by challenging the flat-earth assumptions of yesteryear:

Note that Ms. Leveson tears down the former truth of reliability == system_safety. After proposing her set of new assumptions, Ms. Leveson goes on to develop a new model, theory, and set of techniques for accident analysis and hazard prevention.

Since the subject of safety-critical systems interests me greatly, I plan to write more about her novel approach as I continue to progress through the book. I hope you’ll join me on this new learning adventure.

The Ultimate External Controller

October 21, 2011 Leave a comment

Control theory postulates that in order to make effective goal-seeking decisions, one of the requirements imposed on an external controller, human or otherwise, is that it contain an accurate model of the “real” process to be controlled.

When an external controller’s process model matches the “real” process taking place, effective control may be, but isn’t guaranteed to be, achieved. When there’s a mismatch, all hope is lost.

The ultimate external controller is no external controller. It’s intimately integrated within, and distributed across, the process itself.

Invisible, Thus Irrelevant

October 19, 2011 Leave a comment

A system that has a sound architecture is one that has conceptual integrity, and as (Fred) Brooks firmly states, “conceptual integrity is the most important consideration in system design”. In some ways, the architecture of a system is largely irrelevant to its end users. However, having a clean internal structure is essential to constructing a system that is understandable, can be extended and reorganized, and is maintainable and testable.

The above paragraph was taken from Booch et al’s delightful “Object-Oriented Analysis And Design With Applications“. BD00’s version of the bolded sentence is:

In some ways, the architecture of a system is largely irrelevant to its end users, its developers, and all levels of management in the development organization.

If the architecture is invisible because of the lack of a lightweight, widely circulated, communicated, and understood, set of living artifacts, then it can’t be relevant to any type of stakeholder – even a developer. As the saying goes: “out of site, out of mind“.

Despite the long term business importance of understandability, extendability, reorganizability, maintainability, and testability, many revenue generating product architectures are indeed invisible – unlike short term schedulability and budgetability; which are always highly visible.