Archive

Posts Tagged ‘dds’

Tickled Pink

April 22, 2013

Since I’m a fan of DDS, I was tickled pink to see this newsletter appear in my mailbox:

[RTI newsletter graphic]

Since the above graphic is a bitmap, clicking on the links won’t work. In case you do want to read the linked-to pages, here they are:

Not Either, But Both?

October 26, 2012

I recently dug up an old DDS tutorial pitch by distributed system middleware expert extraordinaire, Doug Schmidt. The last slide in the pitch shows a side-by-side, high-level feature comparison of CORBA and DDS:

High performance middleware technologies like CORBA and DDS are big, necessarily complex beasts that have high learning curves. Thus, I’m not so sure I agree with Doug’s assessment that complex software systems (like sensor-based command and control systems) need both. One can build a pub-sub mechanism on top of CORBA (using the notification, event, or messaging services) and one can build a client-server, request-response mechanism on top of DDS (using the TCP/IP-like “reliability” QoS). What do you think about the tradeoff? Fill the holes yourself with a tad of home-grown infrastructure code, or use both and create a two-headed, fire-breathing dragon?
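For what it’s worth, here’s a rough sketch of what “filling the hole yourself” on the DDS side might look like: a request-reply exchange layered on two topics plus a correlation field. The Request and Reply types (and their fields) are hypothetical IDL-generated stand-ins, and the calls follow the ISO C++ DDS API, so exact vendor spellings may differ.

#include <dds/dds.hpp>
#include <string>

// Hypothetical IDL-generated types: Request{long client_id; long seq; string body;}
// and Reply{long client_id; long seq; string body;}.
void make_request(dds::domain::DomainParticipant& dp, int client_id) {
    dds::topic::Topic<Request> req_topic(dp, "ServiceRequest");
    dds::topic::Topic<Reply>   rep_topic(dp, "ServiceReply");

    // Reliable (TCP-like) delivery on the request path.
    dds::pub::Publisher pub(dp);
    dds::pub::qos::DataWriterQos wqos = pub.default_datawriter_qos();
    wqos << dds::core::policy::Reliability::Reliable();
    dds::pub::DataWriter<Request> requester(pub, req_topic, wqos);

    // Only see replies addressed to this client (correlation by client_id).
    dds::topic::ContentFilteredTopic<Reply> my_replies(
        rep_topic, "MyReplies",
        dds::topic::Filter("client_id = " + std::to_string(client_id)));
    dds::sub::Subscriber sub(dp);
    dds::sub::DataReader<Reply> reply_reader(sub, my_replies);

    requester << Request(client_id, /*seq=*/1, "do_something");
    // ...then block on a WaitSet (or attach a listener) until the matching
    // Reply sample shows up in reply_reader.
    (void)reply_reader;
}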

Message-Centric Vs. Data-Centric

August 13, 2012

The slide below, plagiarized from a recent webinar presented by RTI Inc’s CEO Stan Schneider, shows the evolution of distributed system middleware over the years.

At first, I couldn’t understand the difference between the message-centric pub-sub (MCPS) and data-centric pub-sub (DCPS) patterns. I thought the difference between them was trivial, superficial, and unimportant. However, as Stan’s webinar unfolded, I sloowly started to “get it”.

In MCPS, application tier messages are opaque to the middleware (MW). The separation of concerns between the app and MW tiers is clean and simple:

In DCPS systems, app tier messages are transparent to the MW tier – which blurs the line between the two layers and violates the “ideal” separation of concerns tenet of layered software system design. Because of this difference, the term “message” is superseded in DCPS-based technologies (like the OMG’s DDS) by the term “topic”. The entity formerly known as a “message” is now defined as a topic sample.

Unlike MCPS MW, DCPS MW supports being “told” by the app tier pub-sub components which Quality Of Service (QoS) parameters are important to each of them. For example, a publisher can “promise” to send topic samples at a minimum rate and/or declare whether it will use a best-effort, UDP-like or a reliable, TCP-like protocol for transport. On the receive side, a subscriber can tell the MW that it only wants to see every third topic sample and/or only those samples that meet certain data-field filtering criteria. DCPS MW technologies like DDS support a rich set of QoS parameters that are usually hard-coded and frozen into MCPS MW – if they’re supported at all.
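To make that concrete, here’s a hedged sketch in the ISO C++ DDS API style (the Track type, topic name, and field names are made-up stand-ins, and vendor spellings may differ) of an app declaring its QoS intent to the MW instead of hand-coding the behavior itself:

#include <dds/dds.hpp>

// Track is a hypothetical IDL-generated topic type with a range_km field.
void declare_qos(dds::domain::DomainParticipant& dp,
                 dds::topic::Topic<Track>& track_topic) {
    using namespace dds::core::policy;

    // Publisher side: promise reliable (TCP-like) delivery and a fresh sample
    // at least every 100 ms; the MW raises a deadline-missed status otherwise.
    dds::pub::Publisher pub(dp);
    dds::pub::qos::DataWriterQos wqos = pub.default_datawriter_qos();
    wqos << Reliability::Reliable()
         << Deadline(dds::core::Duration::from_millisecs(100));
    dds::pub::DataWriter<Track> writer(pub, track_topic, wqos);

    // Subscriber side: throttle delivery with a time-based filter and only
    // accept samples whose fields pass a content filter.
    dds::sub::Subscriber sub(dp);
    dds::topic::ContentFilteredTopic<Track> near_tracks(
        track_topic, "NearTracks", dds::topic::Filter("range_km < 50"));
    dds::sub::qos::DataReaderQos rqos = sub.default_datareader_qos();
    rqos << Reliability::Reliable()
         << TimeBasedFilter(dds::core::Duration::from_millisecs(300));
    dds::sub::DataReader<Track> reader(sub, near_tracks, rqos);

    (void)writer; (void)reader;  // created only to illustrate the QoS handshake
}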

With smart, QoS-aware DCPS MW, app components tend to be leaner and take less time to develop because the tedious logic that implements the QoS functionality is pushed down into the MW. The app simply specifies these behaviors to the MW during launch and it gets notified by the MW during operation when QoS requirements aren’t being, or can’t be, met.
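For example, the notification path might look something like the sketch below, where a no-op listener base class from the ISO C++ PSM is specialized to react when a publisher’s promised deadline is missed (Track is again a hypothetical stand-in; exact names may vary by vendor):

#include <dds/dds.hpp>
#include <iostream>

// React when the QoS contract (here, the Deadline promise) is broken.
class TrackReaderListener : public dds::sub::NoOpDataReaderListener<Track> {
public:
    void on_requested_deadline_missed(
        dds::sub::DataReader<Track>& /*reader*/,
        const dds::core::status::RequestedDeadlineMissedStatus& status) override {
        // The app decides what "degraded" means; the MW just reports the miss.
        std::cerr << "deadline missed, total count = "
                  << status.total_count() << '\n';
    }
};

// Attaching it for asynchronous notification of just that status:
//   reader.listener(&my_listener,
//                   dds::core::status::StatusMask::requested_deadline_missed());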

The cost of switching from an MCPS to a DCPS-based distributed system design approach is the increased upfront, one-time, learning curve (or more likely, the “unlearning” curve).

Centralized, Federated, Decentralized

1 Prelude

A colleague on LinkedIn.com pointed me toward this Doug Schmidt et al. paper: “Evaluating Technologies for Tactical Information Management in Net-Centric Systems”. In it, Doug and crew qualitatively (scalability, availability, configurability) and quantitatively (latency, jitter) evaluate three different architectural implementations of the Object Management Group’s (OMG) Data Distribution Service (DDS): centralized, federated, and decentralized.

The stacked trio of figures below model the three DDS architecture types. They’re slightly enhanced renderings of the sketches in the paper.

2 Quantitative Comparisons

DDS was specifically designed to meet the demanding latency and jitter (the standard deviation of latency) performance attributes that are characteristic of streaming, Distributed Real-time and Embedded (DRE) systems like defense and air traffic control radars. Unlike in most client-server, request-response systems, if the data required for human or computer decision-making isn’t made available in a timely fashion, people could die. It’s as simple and potentially horrible as that.

Applying the systems thinking idiom of “purposeful, selective ignorance”, the pics below abstract away the unimportant details of the pics above so that the architecture types can be compared in terms of latency and jitter performance.

By inspecting the figures, it’s a no-brainer, right? The steady-state latency and jitter performance of the decentralized and centralized architectures should exceed that of the federated architecture, since there is no “middleman” daemon for application layer data messages to pass through.

Sure enough, on their two-node test fixture (I don’t know why they even bothered with the one-node fixture, since that really isn’t a “distributed” system in my mind), the Schmidt et al. measurements indicate that the latency/jitter performance of the decentralized and centralized architectures exceeds that of the federated architecture. The performance difference that they measured was on the order of 2X.

3 Qualitative Comparisons

In all distributed systems, both DRE and Client-Server types, achieving high operational availability is a huge challenge. Hell, when the system goes bust, fuggedaboud the timeliness of the data, no freakin’ work can get done and panic can and usually does set in. D’oh!

With that scary aspect in mind, let’s look at each of the three architectures in terms of their ability to withstand faults.

3.1 Decentralized Architecture Availability

In a decentralized architecture, there are no invasive daemons that “leak” into the application plane so we can’t talk about daemon crashes. Thus, right off the bat we can “arguably” say that a decentralized architecture is more resistant to faults than the federated or centralized architectures.

As the picture below shows, when a user application layer process dies, the others can continue to communicate with each other. Depending on what the specific application is required to do during operation, at least some work may be able to still get accomplished even though one or more app components go kaput.

3.2 Federated Architecture Availability

In a federated architecture, when a daemon process dies, a whole node and all the subscriber application user processes running on it are severed from communicating with the user processes running on the other nodes (see the sketch below) in the system. Thus, the federated architecture is “arguably” less fault tolerant than the decentralized and (as we’ll see) centralized architectures. However, through judicious “allocation” of user processes to nodes (the fewer the better – which sort of defeats the purpose of choosing a federation for per node intra-communication performance optimization), some work still may be able to get accomplished when a node’s daemon crashes.

If the node daemons stay viable but a user application dies, then the behavior of a federated-architecture, DDS-based system is the same as that of a decentralized architecture.

3.3 Centralized Architecture Availability

Finally, we come to the robustness of a centralized DDS architecture. As shown below, since the single daemon overlord in the system is not (or should not be) involved in inter-process application layer data communications, if it crashes, then the system can continue to do its full workload. When a user process crashes instead of, or in addition to, the daemon, then the system’s behavior is the same as a decentralized architecture.

4 BD00 Commentary

Because he works on data streaming DRE radar systems, Jimmy likes, I mean BD00 likes, the DDS pub-sub architectural style over broker-based, distributed communication technologies like C/S CORBA and JMS queues. It should be obvious that the latter technologies are not a good match for high availability and low latency DRE applications. Thus, trying to jam fit a new DRE application into a CORBA or JMS communication platform “just because we have one” is a dumb-ass thing to do and is sure to lead to high downstream maintenance costs and a quicker route to archeosclerosis.

Within the DDS space, BD00 prefers the decentralized architecture over the federated and centralized styles because of the semi-objective conclusions arrived at and documented in this post.

Using The New C++ DDS API

June 10, 2011

PrismTech‘s Angelo Corsaro is a passionate and tireless promoter of the OMG’s Data Distribution Service (DDS) distributed system communication middleware technology. (IMHO, the real power of DDS over other messaging services and messaging-based languages like Erlang (which I love) is the rich set of Quality of Service (QoS) settings it supports). In his terrific “The Present and Future of DDS” pitch, Angelo introduces the new, more streamlined,  C++ and Java DDS APIs.

The UML activity diagram below illustrates the steps for setting up and sending (receiving) topic samples over DDS: define a domain; create a domain participant; create a QoS configured, domain-resident publisher (or subscriber); create a QoS configured and type-safe topic; create a QoS configured, publisher-resident, and topic-specific dataWriter; and then go!

Concrete C++ implementations of the activity diagram, snipped from Angelo’s presentation, are presented below. Note the incorporation of overloaded insertion operators and class templates in the new API. Relatively short and sweet, no?
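Since those snippets are bitmaps, here’s a hedged reconstruction of the publish-side flow in the new API (this is not Angelo’s code verbatim; TempSensorType is a hypothetical IDL-generated topic type):

#include <dds/dds.hpp>

int main() {
    dds::domain::DomainParticipant dp(0);                    // join domain 0
    dds::topic::Topic<TempSensorType> topic(dp, "TempSensor");
    dds::pub::Publisher pub(dp);
    dds::pub::DataWriter<TempSensorType> dw(pub, topic);

    TempSensorType sample(/*id=*/1, /*temp=*/25.0);          // hypothetical ctor
    dw << sample;            // the overloaded insertion operator noted above
    // equivalently: dw.write(sample);
    return 0;
}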

Even though the sample code shows non-blocking, synchronous send/receive usage, DDS, like any other communication service worth its salt, provides API support for blocked synchronous and asynchronous notification usage.
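And here’s a hedged sketch of the blocked-synchronous receive path, using a WaitSet to park the calling thread until samples arrive (same hypothetical TempSensorType; condition and waitset spellings follow the ISO C++ PSM):

#include <dds/dds.hpp>
#include <iostream>

int main() {
    dds::domain::DomainParticipant dp(0);
    dds::topic::Topic<TempSensorType> topic(dp, "TempSensor");
    dds::sub::Subscriber sub(dp);
    dds::sub::DataReader<TempSensorType> dr(sub, topic);

    // Block until at least one not-yet-read sample is available.
    dds::sub::cond::ReadCondition data_ready(
        dr, dds::sub::status::DataState::new_data());
    dds::core::cond::WaitSet ws;
    ws += data_ready;
    ws.wait();

    for (const auto& s : dr.take()) {
        if (s.info().valid())
            std::cout << "temp = " << s.data().temp() << '\n';   // hypothetical field
    }
    return 0;
}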

So, what are you waiting for? Skedaddle over to PrismTech’s OpenSplice DDS page, download the community edition libraries for your platform, and start experimenting!

New DDS Vendors

December 16, 2010

Since I’m a fan of the DDS (Data Distribution Service) inter-process communication infrastructure for distributed systems, I try to keep up with developments in the DDS space. Via an Angelo Corsaro slideshare presentation, I discovered two relatively new commercial vendors of the high performance, low latency, OMG! messaging standard: Twin Oaks Computing and Gallium Visual Systems. I don’t know how long the newcomers have been pitching their DDS implementations or how mature their products are, but I’ll be learning more about them in the weeks to come.

Last week, the four DDS vendors got together at the OMG DDS meeting in CA and they collaboratively executed 9 distributed system test scenarios to highlight the interoperability of the vendors’ products. The Angelo Corsaro slide snippets below show the conceptual context and the goals of the event.

The VP of marketing at RTI, Dave Barnett, published a short video showing that the demo was a success: DDS Interoperability Demo. Since I’m a distributed, real-time software developer geek, this stuff makes me giddy with enthusiasm. Sad, no?

OMG! Design By Committee

November 5, 2010

In Federico Biancuzzi’s terrific “Masterminds Of Programming”, Federico interviews the three Amigos, the co-creators of UML. In discussing the “advancement” of the UML after the Amigos freely donated their work to the OMG for further development, Jim Rumbaugh had this to say:

The OMG (Object Management Group) is a case study in how political meddling can damage any good idea. The first version of UML was simple enough, because people didn’t have time to add a lot of clutter. Its main fault was an inconsistent viewpoint—some things were pretty high-level and others were closely aligned to particular programming languages. That’s what the second version should have cleared up. Unfortunately, a lot of people who were jealous of our initial success got involved in the second version. – Jim Rumbaugh

LOL! Following up, Jim landed a second blow:

The OMG process allowed all kinds of special interests to stuff things into UML 2.0, and since the process is mainly based on consensus, it is almost impossible to kill bad ideas. So UML 2.0 became a bloated monstrosity, with far too much dubious content, and still no consistent viewpoint and no way to define one. – Jim Rumbaugh

Double LOL!

Another UML co-creator, Grady Booch, says essentially the same thing but without specifically mentioning the OMG cabal:

UML 2.0 to some degree, and I’ll say this a little bit harshly, suffered a bit of a second system effect in that there were great opportunities and special interest groups, if you will, clamoring for certain specific features which added to the bloat of UML 2.0. – Grady Booch

Triple LOL!

Michi Henning, a key player during the CORBA era, rants about the OMG in this controversial “The Rise And Fall Of CORBA” article. Michi enraged the corbaholic community by lambasting both CORBA and the dysfunctional OMG politburo that maintains it:

Over the span of a few years, CORBA moved from being a successful middleware that was hailed as the Internet’s next-generation e-commerce infrastructure to being an obscure niche technology that is all but forgotten. This rapid decline is surprising. How can a technology that was produced by the world’s largest software consortium fall from grace so quickly? Many of the reasons are technical: poor architecture, complex APIs, and lack of essential features all contributed to CORBA’s downfall. However, such technical shortcomings are a symptom rather than a cause. Ultimately, CORBA failed because its standardization process virtually guarantees poor technical quality. Seeing that other standards consortia use a process that is very similar, this does not bode well for the viability of other technologies produced in this fashion. – Michi Henning

Maybe the kings and queens of the OMG should add an exclamation point to the end of their acronym: OMG!

The OMG! junta interests me because I’ve been working hands-on with RTI’s implementation of the OMG Data Distribution Service (DDS) standard to design and build the infrastructure for a distributed sensor data processing server that will be embedded in a safety-critical supersystem. At this point in time, since DDS was co-designed, tested, and fielded by two commercial companies and wasn’t designed from scratch by a big OMG committee, I think it’s a terrific standard. In particular, I think RTI’s version is spectacular relative to the other two implementations that I know about. I hope the OMG! doesn’t transform DDS into an abomination…

Communication Layer Performance Benchmarking

October 30, 2010

Along with two outstanding and dedicated peers, I’m currently designing and writing (in C++) a large, distributed, multi-process, multi-threaded, scalable, real-time, sensor software system. Phew, that’s a lot of “see how smart I am” techno-jargon, no?

Since the performance and reliability of the underlying Inter-Process Communication (IPC) layer is critical to meeting our customer’s end-to-end system latency and throughput requirements, we decided to measure the performance of three different IPC candidates: RTI’s DDS, AMQ JMS, and our home-grown Boost.Asio comm layer.

The figure below shows the average CPU load vs throughput performance of the three distributed system messaging communication candidates. Notice that the centralized broker-based JMS approach yielded horrendous relative results.

Transmit batching, one of a whole bevy of “free” (to application layer programmers) tunable features in RTI’s DDS, consists of aggregating a bunch of application layer messages into one network packet to increase throughput (at the expense of increased latency). Since batching isn’t available in AMQ JMS or our “homegrown” Boost.Asio comm layer candidate, only the DDS performance increase is shown on the graph.
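For the curious, the batching idea itself is simple. The toy sketch below (hypothetical names, not RTI’s actual API) packs several small application messages into one buffer and hands the transport a single packet once a size threshold is crossed:

#include <cstddef>
#include <vector>

// Toy transmit batcher: trade a little latency for a lot of throughput by
// sending one big packet instead of many small ones.
struct Batcher {
    std::vector<std::byte> buffer;
    std::size_t flush_threshold = 8 * 1024;   // bytes per network packet

    template <typename SendFn>
    void publish(const void* msg, std::size_t len, SendFn send) {
        const auto* p = static_cast<const std::byte*>(msg);
        buffer.insert(buffer.end(), p, p + len);   // append the app message
        if (buffer.size() >= flush_threshold) {    // one send() per batch
            send(buffer.data(), buffer.size());
            buffer.clear();
        }
    }
};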

Measurement Approach

One way to measure the CPU load imposed on a processor node by an IPC layer candidate in a data streaming, real-time, system is to quantize time into discrete slices and measure the per slice processing time that it takes to send a fixed number of messages out via the comm software stack. Since other non-deterministic OS runtime functionality shares the CPU with the application processes and the comm software stack, measuring and averaging the normalized CPU time across a large number of slices can give some quantitative feel for the load imposed on the processor.

The figure below shows the approach that was taken to measure the CPU load versus throughput performance of the three communication layer candidates. To implement this strategy, I wrote a small C++ test application that is designed to operate in a time sliced mode, where the time slice size (default = 50 msecs) is user selectable via the command line.

During runtime, the test app generates and publishes a stream of “canned” messages at a user-specified rate and for a user-defined test run duration. Upon the start of each time slice, the current time is “grabbed” and stored for later use. At the end of each tight, K-message, generate-and-publish loop, the end time is retrieved from the OS and then the percent CPU load for the slice is calculated in accordance with the simple equations below. At the end of the test run, the first 1000 sample points are averaged, and the result, along with the max and min loads measured during the run, are printed to the console and a date-stamped log file.
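From that description, the per-slice arithmetic boils down to: CPU load (%) = 100 × (time spent generating and publishing the K messages) ÷ (slice duration). Here’s a hedged sketch of the measurement loop (placeholder names, not the original test app):

#include <chrono>
#include <cstddef>
#include <numeric>
#include <vector>

double measure_avg_load(std::size_t num_slices, std::size_t k_msgs_per_slice,
                        std::chrono::milliseconds slice = std::chrono::milliseconds(50)) {
    using clock = std::chrono::steady_clock;
    std::vector<double> loads;
    loads.reserve(num_slices);

    for (std::size_t i = 0; i < num_slices; ++i) {
        const auto slice_start = clock::now();     // "grab" the slice start time

        const auto busy_start = clock::now();
        for (std::size_t k = 0; k < k_msgs_per_slice; ++k) {
            // generate_and_publish_one_message();  // placeholder for the comm stack call
        }
        const auto busy = clock::now() - busy_start;   // time spent in the publish loop

        loads.push_back(100.0 * std::chrono::duration<double>(busy).count()
                              / std::chrono::duration<double>(slice).count());

        while (clock::now() - slice_start < slice) { /* idle out the rest of the slice */ }
    }
    return std::accumulate(loads.begin(), loads.end(), 0.0)
           / static_cast<double>(loads.size());
}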

Of course, to ensure that the comm layer candidate wasn’t dropping or corrupting application messages during test runs, I wrote a subscriber app to provide a “resistor load” on the performance-measuring publisher app process. By comparing the number and integrity of messages received to the number and integrity of those transmitted, the measurements were given higher credibility. The figure below shows the test fixtures that I ran the performance tests on. For the AMQ JMS candidate, a broker process was running alongside the app processes, but this single-point-of-failure component is not shown in the diagram.

ICONIX SysML Training Preview

January 18, 2010

During the week of January 25-29, I, along with 19 other enginerds, will be attending a SysML training course given by Doug Rosenberg of ICONIX. At this point in time, I think the specific course is named “SysML JumpStart Training with Enterprise Architect”. I love the following “no-candyasses-allowed” warning at the bottom of the web page:

JumpStart Training is not a class for uncommitted students or uneasy beginners. Because of the rapid-learning atmosphere, JumpStart Training is ideal for those looking for an intense and highly-focused training course which will get the project up and running, fast!

I’m grateful for the opportunity and excited to attend the course because I’m a big fan of both the UML and its SysML profile. Over the past several years, I’ve been teaching myself at a leisurely pace how to apply these two increasingly important technologies to the software design and specification work that I do.

Because of the powerful and sprawling nature of the UML family, unless you’re an Einsteinian genius, it’s my uncredentialed opinion that you can’t learn UML or SysML overnight. The symbology, syntax, and semantics of the languages are rich and necessarily complex because they’re designed to tackle big hairball software-centric systems problems. Based on my personal experience, I don’t think very many people can acquire a deep understanding of the UML’s 13 diagrams or SysML’s 9 diagrams in 5 days, but you gotta start somewhere and formal training is a good start. We’ll see how it goes.

While surfing the ICONIX web site, I stumbled upon this “MDG Technology For DDS” page (MDG = Model Driven Generation, DDS = Data Distribution Service). Interestingly, there seems to be no equivalent “MDG Technology For CORBA” page on the ICONIX web site. It’s interesting because I’m currently fighting a CORBA vs DDS battle in house. It’s even more interesting in that this InformIT author page states that Doug Rosenberg has previously developed and taught a CORBA class. I’ll be sure to ask Doug about this anomaly when I meet him.


Data-Centric, Transaction-Centric

December 24, 2009

The market (ka-ching!, ka-ching!) for transaction-centric enterprise IT (Information Technology) systems dwarfs that for real-time, data-centric sensor control systems. Because of this market disparity, the lion’s share of investment dollars is naturally and rightfully allocated to creating new software technologies that facilitate the efficient development of behemoth enterprise IT systems.

I work in an industry that develops and sells distributed, real-time, data-centric sensor systems and it frustrates me to no end when people don’t “see” (or are ignorant of) the difference between the domains. With innocence and, unfortunately, ignorance embedded in their psyche, these people try to jam-fit risky, transaction-centric technologies into data-centric sensor system designs. By risky, I mean an elevated chance of failing to meet scalability, real-time throughput, and latency requirements that are much more stringent for data-centric systems than they are for most transaction-centric systems. Attributes like scalability, latency, capacity, and throughput are usually only measurable after a large investment has been made in the system development. To add salt to the wound, re-architecting a system after the mistake is discovered and (more importantly) acknowledged delays release and consumes resources by amounts that can seriously damage a company’s long-term viability.

As an example, consider the CORBA and DDS OMG standard middleware technologies. CORBA was initially designed (by committee) from scratch to accommodate the development of big, distributed, client-server, transaction-centric systems. Thus, minimizing latency and maximizing throughput were not the major design drivers in its definition and development. DDS was designed to accommodate the development of big, distributed, publisher-subscriber, data-centric systems. It was not designed by a committee of competing vendors, each eager to throw their pet features into a fragmented, overly complex quagmire. DDS was derived from the merger of two fielded and proven implementations, one by Thales (distributed naval shipboard sensor control systems) and the other by RTI (distributed robotic sensor and control systems). In contrast, as a well-meaning attempt to be all things to all people, publish-subscribe capability was tacked on to the CORBA beast after the fact. Meanwhile, DDS has remained lean and mean. Because of the architecture-busting risk for the types of applications DDS targets, no client-server capability has been back-fitted into its design.

Consider the example distributed, data-centric, sensor system application layer design below. If this application sits on top of a DDS middleware layer, there is no “intrusion” of a (single point of failure) CORBA broker into the application layer. Each application layer system component simply boots up, subscribes to the topic (message) streams it needs, starts crunching them with its app-specific algorithms, and publishes its own topic instances (messages) to the other system components that have subscribed to those topics.

Now consider a CORBA broker-based instantiation of this application example (refer to the figure below). Because of the CORBA requirement for each system component to register with an all-knowing, centralized ORB (Object Request Broker) authority, CORBA “leaks” into the application layer design. After registration, and after each service has found (via post-registration ORB lookup) the other services it needs to subscribe to, the ORB can disappear from the application layer – until a crash occurs and the system gets hosed. DDS avoids leakage into the value-added application layer by avoiding the centralized broker concept and providing for fully distributed, under-the-covers “auto-discovery” of publishers and subscribers. No DDS application process has to interact with a registrar at startup to find other components – it only has to tell DDS which topics it will publish and which topics it needs to subscribe to. Each CORBA component, in contrast, has to know about and look up the specific services it needs to interact with.

The more a system is required to accommodate future growth, the more inapplicable a centralized, ORB-based architecture is. Relative to a fully distributed coordination and auto-discovery mechanism that’s transparent to each application component, attempting to jam-fit a single, centralized coordinator into a large-scale distributed system so that the components can find and interact with each other reduces robustness and fault tolerance, and increases long-term maintenance costs.
