The Cost Challenges of System of Systems

The Department of Defense (DoD) has migrated from a platform-based acquisition strategy to one focused on delivering capabilities. Instead of delivering a fighter aircraft or an unmanned air vehicle, contractors are now being asked to deliver the right collection of hardware and software to meet specific wartime challenges. This means that much of the burden associated with conceptualizing, architecting, integrating, implementing, and deploying complex capabilities into the field has shifted from desks in the Pentagon to desks at Lockheed Martin, Boeing, Rockwell, and other large aerospace and defense contractors.

In “The Army’s Future Combat Systems’ [FCS] Features, Risks and Alternatives,” the Government Accounting Office states the challenge as:

…14 major weapons systems or platforms have to be designed and integrated simultaneously and within strict size and weight limitations in less time than is typically taken to develop, demonstrate, and field a single system. At least 53 technologies that are considered critical to achieving critical performance capabilities will need to be matured and integrated into the system of systems. And the development, demonstration, and production of as many as 157 complementary systems will need to be synchronized with FCS content and schedule. [1]
The planning, management, and execution of such projects will require changes in the way organizations do business. This article reports on ongoing research into the cost challenges associated with planning and executing a system of systems (SOS) project. Because of the relatively immature nature of this acquisition strategy, there is not nearly enough hard data to establish statistically significant cost-estimating relationships. The conclusions drawn to date are based on what we know about the cost of system engineering and project management activities in more traditional component system projects augmented with research on the added factors that drive complexities at the SOS level.

The article begins with a discussion of what an SOS is and how projects that deliver SOS differ from those projects delivering stand-alone systems. Following this is a discussion of the new and expanded roles and activities associated with SOS that highlight increased involvement of system engineering resources. The focus then shifts to cost drivers for delivering the SOS capability that ties together and optimizes contributions from the many component systems. The article concludes with some guidelines for using these cost drivers to perform top-level analysis and trade-offs focused on delivering the most affordable solution that will satisfy mission needs.

Related Research
Extensive research has been conducted on many aspects of SOS by the DoD, academic institutions, and industry. Earlier research focused mainly on requirements, architecture, test and evaluation, and project management [2, 3, 4, 5, 6, 7, 8]. As the industry gets a better handle on the technological and management complexities of SOS delivery, the research focus is expanding from the right way to solve the problem to the right way to solve it affordably. At the forefront of this cost-focused research are the University of Southern California’s Center for Software Engineering [9], the Defense Acquisition University [10], Carnegie Mellon’s Software Engineering Institute [11], and Cranfield University [12].

What Is an SOS?
An SOS is a configuration of component systems that are independently useful but synergistically superior when acting in concert. In other words, it represents a collection of systems whose capabilities, when acting together, are greater than the sum of the capabilities of each system acting alone.

According to Maier [13], an SOS must have most, if not all, of the following characteristics:

Operational independence of component systems.
Managerial independence of component systems.
Geographical distribution.
Emergent behavior.
Evolutionary development processes.
For the purposes of this research, this definition has been expanded to explicitly state that there be a network-centric focus that enables these systems to communicate effectively and efficiently.

Today, many platforms are deployed throughout the battlefield with limited means of communication. This becomes increasingly problematic when multiple services are deployed on a single mission, because there is no consistent means for the Army to communicate with the Navy or for the Navy to communicate with the Air Force. Inconsistent and unpredictable means of communication across the battlefield often result in unacceptably long times from detection of a threat to engagement. This can ultimately endanger the lives of our service men and women.

How Different Are SOS Projects?
How much different is a project intended to deliver an SOS capability from a project that delivers an individual platform such as an aircraft or a submarine? Each case presents a set of customer requirements that need to be elicited, understood, and maintained. Based on these requirements, a solution is crafted, implemented, integrated, tested, verified, deployed, and maintained. At this level, the two projects are similar in many ways. Dig a little deeper and differences begin to emerge. The differences fall into several categories: acquisition strategy, software, hardware, and overall complexity.

The SOS acquisition strategy is capability-based rather than platform-based. For example, the customer presents a contractor with a set of capabilities to satisfy particular battlefield requirements. The contractor then needs to determine the right mix of platforms, the sources of those platforms, where existing technology is adequate, and where invention is required. Once those questions are answered, the contractor must decide how best to integrate all the pieces to satisfy the initial requirements. This capability-based strategy leads to a project with many diverse stakeholders. Besides the contractor selected as the lead system integrator (LSI), other stakeholders may include representatives from multiple services, the Defense Advanced Research Projects Agency, and the prime contractor(s) responsible for supplying component systems, as well as their subcontractors. Each of these stakeholders brings to the table different motivations, priorities, values, and business practices, and each brings new people management issues to the project.

Software is an important part of most projects delivered to DoD customers. In addition to satisfying the requirements necessary to function independently, each of the component systems needs to support the interoperability required to function as a part of the entire SOS solution. Much of this interoperability will be supplied through the software resident in the component systems. This requirement for interoperability dictates that well-specified and applied communication protocols are a key success factor when deploying an SOS. Standards are crucial, especially for the software interfaces. Additionally, because of the need to deliver large amounts of capability in shorter and shorter timeframes, the importance of commercial off-the-shelf (COTS) software in SOS projects continues to grow.
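To make the point about well-specified protocols concrete, here is a minimal Python sketch of a shared message definition that every component system would encode and decode identically. The `TrackReport` message, its fields, the version number, and the choice of JSON are all illustrative assumptions, not drawn from any DoD interface standard.

```python
import json
from dataclasses import asdict, dataclass, fields

# Hypothetical interface version agreed on by all component systems.
PROTOCOL_VERSION = 1


@dataclass(frozen=True)
class TrackReport:
    """Illustrative message exchanged between component systems (hypothetical fields)."""
    source_id: str      # platform that produced the report
    track_id: int       # identifier of the detected object
    latitude: float     # degrees
    longitude: float    # degrees
    timestamp_s: float  # seconds since an agreed epoch


def encode(report: TrackReport) -> str:
    """Serialize a report as JSON, tagged with the protocol version."""
    payload = {"version": PROTOCOL_VERSION, **asdict(report)}
    return json.dumps(payload, sort_keys=True)


def decode(message: str) -> TrackReport:
    """Parse a report, rejecting unknown versions and unexpected or missing fields."""
    payload = json.loads(message)
    version = payload.pop("version", None)
    if version != PROTOCOL_VERSION:
        raise ValueError(f"unsupported protocol version: {version!r}")
    expected = {f.name for f in fields(TrackReport)}
    if set(payload) != expected:
        raise ValueError(f"field mismatch: {sorted(set(payload) ^ expected)}")
    return TrackReport(**payload)


if __name__ == "__main__":
    sent = TrackReport("uav-7", 42, 34.05, -118.25, 1000.0)
    print(decode(encode(sent)) == sent)  # True
```

Rejecting an unknown version or an unexpected field at the boundary, rather than guessing, is one way a receiver can keep a mismatched partner from silently corrupting the shared picture.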

With platform-based acquisitions, the customer generally has a fairly complete understanding of the requirements early on in the project with a limited amount of requirements growth once the project commences. Because of the large scale and long-term nature of capability-based acquisitions, the requirements tend to emerge over time with changes in governments, policies, and world situations. Because requirements are emergent, planning and execution of both hardware and software contributions to the SOS project are impacted.

SOS projects are also affected by the fact that the hardware components being used are of varying ages and technologies. In some cases, an existing hardware platform is being modified or upgraded to meet the increased needs of operating in an SOS environment, while in other instances brand-new equipment with state-of-the-art technologies is being developed. SOS project teams need to deal with components that span the spectrum from high-tech but relatively untested equipment to low-tech, tried-and-true technologies.

Basically, a project to deliver an SOS capability is similar in nature to a project intended to deliver a specific platform except that overall project complexity may be increased substantially. These complexities grow from capability-based acquisition strategies, increased number of stakeholders, increased overall cost (and the corresponding increased political pressure), emergent requirements, interoperability, and equipment in all stages from infancy to near retirement.

New and Expanded Roles and Activities
Understanding how these increased complexities manifest on a project is the first step to determining how the planning and control of an SOS project differs from that of a project that delivers one of the component systems. One of the biggest and most obvious differences in the project team is the existence of an LSI. The LSI is the contractor tasked with delivering the SOS that provides the capabilities the DoD customer is looking for. The LSI can be thought of as the super prime, or the prime of prime contractors. It is responsible for managing all the other primes and contractors and, ultimately, for fielding the required capabilities. The main areas of focus for the LSI include:

Requirements analysis for the SOS.
Design of SOS architecture.
Evaluation, selection, and acquisition of component systems.
Integration and test of the SOS.
Modeling and simulation.
Risk analysis, avoidance, and mitigation.
Overall program management for the SOS.
One of the primary jobs of the LSI is completing the system engineering tasks at the SOS level.

Focus on System Engineering
The “Encyclopedia Britannica” describes system engineering as follows:

“… system engineering is a technique of using knowledge from various branches of engineering and science to introduce technological innovations into the planning and development stages of systems. Systems engineering is not as much a branch of engineering as it is a technique for applying knowledge from other branches of engineering and disciplines of science in an effective combination.” [14]
System engineering as a discipline first emerged during World War II as technology improvements collided with the need for more complex systems on the battlefield. As systems grew in complexity, it became apparent that an engineering presence well versed in many engineering and science disciplines was needed to understand the entire problem a system had to solve. To quote Admiral Grace Hopper, “Life was simple before World War II. After that, we had systems.” [15]

With this top-level view, the system engineers were able to grasp how best to optimize emerging technologies to address the specific complexities of a problem. Where an electrical engineer would concoct a solution focused on the latest electronic devices and a software engineer would develop the best software solution, the system engineer knows enough about both disciplines to craft a solution that gets the best overall value from technology. Additionally, the system engineer has the proper understanding of the entire system to perform validation and verification upon completion, ensuring that all component pieces work together as required.

Today, a new level of complexity has been added with the emerging need for SOS, and once again the diverse expertise of the system engineers is required to overcome this complexity. System engineers need to comprehend the big picture problem(s) whose solution is to be provided by the SOS. They need to break these requirements down into the hardware platforms and software pieces that best deliver the desired capability, and they need to have proper insight into the development, production, and deployment of the systems to ensure not only that they will meet their independent requirements, but also that they will be designed and implemented to properly satisfy the interoperability and interface requirements of the SOS. It is the task of the system engineers to verify and validate that the component systems, when acting in concert with other component systems, do indeed deliver the necessary capabilities.

Large Software Systems: Back to Basics

Today when we launch a software project, its likelihood of success is inversely proportional to its size. The Standish Group reports that the probability of a successful software project is zero for projects costing $10 million or more [1]. This is because the complexity of the problem exceeds one person’s ability to comprehend it. According to The Standish Group, small projects succeed because they reduce confusion, complexity, and cost. The solution to the problem of building large systems is to employ those same techniques that help small projects succeed—minimize complexity and emphasize clarity.
The goals, constraints, and operating environment of a large software system, along with its high-level functional specification, describe the requirements of the system. Assuming we have good requirements, we can decompose our system into smaller subsystems. Decomposition proceeds until we have discrete, coherent modules. The modules should be understandable apart from the system and represent a single idea or concept. When decomposition is finished, the modules can be incorporated into an architecture.
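The decomposition described above can be pictured as a tree whose leaves are the modules. The minimal Python sketch below assumes that representation; the `Component` class and the example subsystem names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """A node in a decomposition tree (hypothetical representation)."""
    name: str
    children: list["Component"] = field(default_factory=list)

    def is_module(self) -> bool:
        # Decomposition stops at leaves: discrete, coherent modules.
        return not self.children


def modules(component: Component) -> list[str]:
    """Collect the leaf modules of a decomposition, depth first, left to right."""
    if component.is_module():
        return [component.name]
    leaves: list[str] = []
    for child in component.children:
        leaves.extend(modules(child))
    return leaves


if __name__ == "__main__":
    system = Component("Mission planner", [
        Component("Route planning", [Component("Terrain lookup"), Component("Path search")]),
        Component("Reporting", [Component("Report formatter")]),
    ])
    print(modules(system))  # ['Terrain lookup', 'Path search', 'Report formatter']
```

The leaf list is what the architect then arranges into an architecture; intermediate nodes exist only to keep each decomposition step small enough to reason about.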
Frederick P. Brooks said that the conceptual integrity of the architecture is the most important factor in obtaining a robust system [2]. Brooks observed that it can only be achieved by one mind, or a very small number of resonant minds. He also made the point that architectural design must be separated from development. In his view, a competent software architect is a prerequisite to building a robust system.
An architecture is basically the framework of the system, detailing interconnections, expected behaviors, and overall control mechanisms. If done right, it lets the developers concentrate on specific module implementations by freeing them of the need to design and implement these interconnections, data flow routines, access synchronization mechanisms, and other system functions. Developers typically expend a considerable amount of energy on these tasks, so being relieved of them saves considerable time and effort [3].
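As a sketch of this division of labor, the Python below assumes a hypothetical `MessageBus` supplied by the architecture, which owns routing and locking, and a hypothetical `AltitudeMonitor` module whose author writes only domain logic. Publish/subscribe is one illustrative style, not the only way an architecture can provide interconnections.

```python
import threading
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class MessageBus:
    """Hypothetical architecture-provided framework: routing and locking live here,
    so module authors never implement them."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)
        self._lock = threading.Lock()  # access synchronization handled once, centrally

    def subscribe(self, topic: str, handler: Handler) -> None:
        with self._lock:
            self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> int:
        """Deliver a message to every subscriber of a topic; return how many received it."""
        with self._lock:
            handlers = list(self._subscribers.get(topic, []))  # snapshot under the lock
        for handler in handlers:
            handler(message)  # deliver outside the lock so handlers may publish too
        return len(handlers)


class AltitudeMonitor:
    """Hypothetical module: contains domain logic only, no plumbing."""

    def __init__(self, ceiling_m: float) -> None:
        self.ceiling_m = ceiling_m
        self.alerts: list[str] = []

    def on_position(self, message: dict) -> None:
        if message["altitude_m"] > self.ceiling_m:
            self.alerts.append(message["id"])


if __name__ == "__main__":
    bus = MessageBus()
    monitor = AltitudeMonitor(ceiling_m=3000.0)
    bus.subscribe("position", monitor.on_position)
    bus.publish("position", {"id": "uav-7", "altitude_m": 3500.0})
    bus.publish("position", {"id": "uav-8", "altitude_m": 1200.0})
    print(monitor.alerts)  # ['uav-7']
```

Because the lock and the routing table exist in one place, a defect in synchronization is fixed once in the framework rather than in every module.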
A robust architecture is one that is flexible, changeable, simple yet elegant. If done right and documented well, it reduces the need for interteam communication and facilitates successful implementation of complex modules. If done well, it is practically invisible; if done poorly, it is a never-ending source of aggravation, cost, and needless complexity.
Architecture flows from the requirements and the functional specification. The requirements and functional specification need to be traced to the architecture and its modules, and the modules in the architecture should be traced to the requirements and functional specification. The requirements must necessarily be correct, complete, unambiguous, and, where applicable, measurable. Obtaining requirements with these qualities is the responsibility of the architect. It must be his highest priority. He does this by interacting closely with the customers and domain experts. If necessary, he builds prototypes to validate and clarify the requirements. The architect acts as the translator between the customers and the developers. The customers do not know how to specify their needs in the unambiguous language that developers need, and the developers do not always have the skills to do requirements analysis.
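Bidirectional traceability lends itself to a mechanical check. The minimal Python sketch below assumes the trace is kept as a mapping from requirement IDs to the modules that implement them; the IDs, module names, and the `check_traceability` helper are hypothetical.

```python
from typing import NamedTuple


class TraceReport(NamedTuple):
    untraced_requirements: set[str]  # requirements that no module implements
    orphan_modules: set[str]         # modules that trace back to no requirement
    unknown_modules: set[str]        # trace entries naming modules absent from the architecture


def check_traceability(trace: dict[str, set[str]], architecture: set[str]) -> TraceReport:
    """Check requirement-to-module traces in both directions."""
    traced = set().union(*trace.values())
    return TraceReport(
        untraced_requirements={req for req, mods in trace.items() if not mods},
        orphan_modules=architecture - traced,
        unknown_modules=traced - architecture,
    )


if __name__ == "__main__":
    trace = {
        "REQ-1": {"Path search"},
        "REQ-2": {"Terrain lookup", "Path search"},
        "REQ-3": set(),  # not yet allocated to any module
    }
    architecture = {"Terrain lookup", "Path search", "Report formatter"}
    # Expect REQ-3 to be untraced and Report formatter to be an orphan.
    print(check_traceability(trace, architecture))
```

An orphan module is not necessarily wrong, but it is a prompt to ask which requirement it serves or whether a requirement is missing.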
The architect communicates his desires to the developers by specifying black-box descriptions of the modules. Black boxes are abstract entities that can be understood and analyzed independently of the rest of the system. The process of building black-box models is called abstraction. Abstraction is used to simplify the design of a complex system by reducing the number of details that must be considered at the same time, thus reducing confusion and aiding clarity of understanding [4]. For safety-critical, military-critical, and other high-integrity systems, black boxes can be specified unambiguously with mathematical logic using formal methods. Supplemented with natural language descriptions, this is probably the safest way to specify a system. It is usually more expensive and time-consuming as well. In the future, however, all software architects should know how to mathematically specify a module.
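Python cannot state a formal specification the way formal methods do, but a black box can still be pinned down by a mathematical contract. In the sketch below, the hypothetical `IntegerSquareRoot` box is specified only by its precondition (n is a non-negative integer) and postcondition (r*r <= n < (r+1)*(r+1)), both checked at run time. The two implementations are illustrative: either could be handed in by a developer without the rest of the system noticing.

```python
import math
from abc import ABC, abstractmethod


class IntegerSquareRoot(ABC):
    """Black-box specification: says what, not how.

    Precondition:  n is an int and n >= 0
    Postcondition: the result r satisfies r*r <= n < (r+1)*(r+1)
    """

    def __call__(self, n: int) -> int:
        # The contract is checked at the boundary; internals are the developer's choice.
        if not isinstance(n, int) or n < 0:
            raise ValueError("precondition violated: n must be a non-negative int")
        r = self._compute(n)
        # Postcondition check (note: assert statements are skipped under python -O).
        assert r * r <= n < (r + 1) * (r + 1), "postcondition violated"
        return r

    @abstractmethod
    def _compute(self, n: int) -> int:
        """Implementation hook supplied by the module developer."""


class BisectionSqrt(IntegerSquareRoot):
    """One possible implementation: binary search for the largest r with r*r <= n."""

    def _compute(self, n: int) -> int:
        lo, hi = 0, n
        while lo < hi:
            mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
            if mid * mid <= n:
                lo = mid
            else:
                hi = mid - 1
        return lo


class LibrarySqrt(IntegerSquareRoot):
    """Another implementation of the same black box, delegating to math.isqrt."""

    def _compute(self, n: int) -> int:
        return math.isqrt(n)


if __name__ == "__main__":
    print(BisectionSqrt()(10), LibrarySqrt()(10))  # 3 3
```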
A robust architecture is necessary for a high-quality, dependable system. But it is not sufficient. A lot depends on how the developers implement modules handed to them by the architect.
The Rest of the Solution
Developers need to build systems that are dependable and free from faults. Because developers are human, a completely fault-free system is impossible. Instead, they must strive to build systems that minimize faults by using best practices, and they must use modern tools that find faults during unit testing and maintenance. They should also be familiar with the concepts of measuring reliability and how to build a dependable system. (A dependable system is one that is available, reliable, safe, confidential, has high integrity, and is maintainable [5].) In order for the system to be dependable, the subsystems and modules must be dependable.
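The last sentence can be quantified under simple assumptions. If the system works only when every module works (a series configuration) and modules fail independently, system reliability is the product of module reliabilities, so modest per-module shortfalls compound quickly. The sketch below also includes the standard steady-state availability formula, MTBF / (MTBF + MTTR). Both functions are illustrative helpers and the figures in the example are made up.

```python
import math


def series_reliability(module_reliabilities: list[float]) -> float:
    """Probability the system works, assuming every module must work and
    modules fail independently (a series configuration)."""
    return math.prod(module_reliabilities)


def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Long-run fraction of time the system is up: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    # Twenty modules, each 99% reliable over the mission (illustrative figures).
    print(round(series_reliability([0.99] * 20), 3))          # 0.818
    print(round(steady_state_availability(1000.0, 10.0), 4))  # 0.9901
```

Twenty modules that are each 99 percent reliable yield a system that is only about 82 percent reliable under these assumptions, which is why dependability has to be built in at the module level.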
Fault prevention starts with clear, unambiguous requirements. The architect should provide these so the developer can concentrate on implementation. If the architecture is robust, the developer can concentrate on his particular module, free of extraneous details and concerns. The architect’s module description tells the developer what to implement, but not how to implement it. The internals of the implementation are up to him. To ensure dependability, the developer needs to use sound software engineering principles and best practices, as these are his chief means of minimizing complexity. Two best practices are coding standards and formal inspections.
Coding standards are necessary because every language has problem areas related to reliability and understandability. The best way to avoid the problem areas is to ban them, using an enforceable standard. Les Hatton describes why coding standards are important for safety and reliability and how to introduce a coding standard [6]. A key point he stresses is to never incorporate stylistic information into the standard. It will be a never-ending source of acrimony and debate. Such information, he says, should be placed in a style guide. Coding standards can be enforced with automatic tools that check the code, and by formal inspections. The benefits of formal inspections for defect prevention are well-known and well-documented. They are also invaluable for clarifying issues related to the software.
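Enforcement by automatic tools can be as simple as walking a program's syntax tree. The Python sketch below assumes a hypothetical two-rule standard (no `eval`/`exec` calls, no bare `except:` clauses), chosen because both constructs are reliability hazards rather than matters of style, in keeping with Hatton's advice. Production checkers cover far more rules.

```python
import ast

# Hypothetical two-rule standard: bans reliability hazards, not style.
BANNED_CALLS = {"eval", "exec"}


def check_source(source: str) -> list[str]:
    """Return one message per violation, ordered by line number."""
    found: list[tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in BANNED_CALLS):
            found.append((node.lineno, f"call to {node.func.id}() is banned"))
        elif isinstance(node, ast.ExceptHandler) and node.type is None:
            found.append((node.lineno, "bare 'except:' can hide faults"))
    return [f"line {line}: {message}" for line, message in sorted(found)]


SAMPLE = """\
try:
    value = eval(user_input)
except:
    value = None
"""

if __name__ == "__main__":
    for violation in check_source(SAMPLE):
        print(violation)
    # line 2: call to eval() is banned
    # line 3: bare 'except:' can hide faults
```

A checker like this can run on every commit, leaving formal inspections free to concentrate on design and logic rather than rule violations.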
Developers need to measure their code to ensure its quality. This provides useful feedback to the developer on his coding practices, and it provides reassurance to the system’s acquirers and users. Many static metrics can be used to assess the code. Among these are purity ratio, volume, functional density, and cyclomatic complexity. As a doctor uses a battery of tests to gauge a person’s health, relying on more than one metric and covering all his bases, a developer using static analysis tools can do the same [7].
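Two of the metrics named above come from Halstead's software science. Given counts of distinct operators (n1) and operands (n2) and their total occurrences (N1, N2), volume is V = N log2 n, where N = N1 + N2 and n = n1 + n2, and purity ratio is the estimated length n1 log2 n1 + n2 log2 n2 divided by the observed length N. The Python sketch below takes the counts as input, because the rules for classifying tokens as operators or operands vary between tools; the example counts are made up, and functional density is not covered.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class HalsteadCounts:
    """Raw Halstead counts for one module (all assumed to be at least 1)."""
    distinct_operators: int  # n1
    distinct_operands: int   # n2
    total_operators: int     # N1
    total_operands: int      # N2

    @property
    def length(self) -> int:  # N = N1 + N2
        return self.total_operators + self.total_operands

    @property
    def vocabulary(self) -> int:  # n = n1 + n2
        return self.distinct_operators + self.distinct_operands

    @property
    def volume(self) -> float:  # V = N * log2(n)
        return self.length * math.log2(self.vocabulary)

    @property
    def estimated_length(self) -> float:  # n1*log2(n1) + n2*log2(n2)
        n1, n2 = self.distinct_operators, self.distinct_operands
        return n1 * math.log2(n1) + n2 * math.log2(n2)

    @property
    def purity_ratio(self) -> float:  # estimated length / observed length
        return self.estimated_length / self.length


if __name__ == "__main__":
    counts = HalsteadCounts(8, 8, 16, 16)  # illustrative counts
    print(counts.volume, counts.purity_ratio)  # 128.0 1.5
```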
A good metric, for example, is cyclomatic complexity. A large value is a sign of complex code, which may be an indication of poor thought given to the design and implementation. It is also a sign that the code will be difficult to test and maintain.
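A rough cyclomatic complexity count can be taken from Python's own syntax tree: for structured code, counting decision points and adding one approximates McCabe's V(G) = E - N + 2P for a single routine (P = 1). The sketch below is a simplification of what dedicated tools compute (for example, it ignores `match` statements), and the sample function is hypothetical.

```python
import ast

# Node types that each add one decision (branch) point.
_DECISIONS = (ast.If, ast.IfExp, ast.For, ast.AsyncFor, ast.While, ast.ExceptHandler)


def cyclomatic_complexity(func_source: str) -> int:
    """Approximate McCabe complexity of the code in func_source: decision points + 1."""
    decisions = 0
    for node in ast.walk(ast.parse(func_source)):
        if isinstance(node, _DECISIONS):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # 'a and b and c' short-circuits at two points.
            decisions += len(node.values) - 1
        elif isinstance(node, ast.comprehension):
            # One for the comprehension's loop, one per 'if' filter.
            decisions += 1 + len(node.ifs)
    return decisions + 1


SAMPLE = """\
def classify(x):
    if x < 0:
        return "negative"
    elif x == 0:
        return "zero"
    for d in (2, 3, 5):
        if x % d == 0 and x != d:
            return "composite"
    return "other"
"""

if __name__ == "__main__":
    print(cyclomatic_complexity(SAMPLE))  # 6
```

Each extra decision is another independent path that a unit test must exercise, which is why a large value signals code that is hard to test.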
Fault detection by proper unit testing is vitally important. To be done right, it requires the use of code coverage and path analysis tools. Unfortunately, this type of testing is usually overlooked. Many managers say they cannot afford the tools. Somehow, though, they can afford to fix the problems after the software has been fielded. This is penny-wise and pound-foolish. It is axiomatic that fixing software faults after the code has been deployed can be up to 100 times more expensive than finding and fixing them during development [8].
Besides path analysis and code coverage tools, automatic testing tools should be used. Human testers cannot hope to match the computer on indefatigability or thoroughness. In large systems, if testing is not automated, it is not done, or done rarely. For example, regression testing, used in systems undergoing modification and evolution, is essential to ensure that errors are not injected into code undergoing change, a very common problem in complex systems. Without automation, the process is onerous and time-consuming, so it rarely gets done, if at all.
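A minimal sketch of an automated regression suite using Python's standard `unittest` module: outputs recorded from a known-good build are replayed against the current code on every change. The `calibrate` function and its baseline values are hypothetical.

```python
import unittest


def calibrate(raw: int) -> float:
    """Hypothetical module under maintenance: 12-bit sensor count to volts."""
    if not 0 <= raw <= 4095:
        raise ValueError("raw count out of range")
    return round(raw * 3.3 / 4095, 4)


# Outputs recorded from a known-good release (illustrative baseline).
BASELINE = {0: 0.0, 1000: 0.8059, 2048: 1.6504, 4095: 3.3}


class CalibrateRegression(unittest.TestCase):
    def test_matches_recorded_baseline(self) -> None:
        for raw, expected in BASELINE.items():
            with self.subTest(raw=raw):
                self.assertAlmostEqual(calibrate(raw), expected, places=4)

    def test_rejects_out_of_range_input(self) -> None:
        for raw in (-1, 4096):
            with self.subTest(raw=raw):
                with self.assertRaises(ValueError):
                    calibrate(raw)
```

If the code were saved as `regression_calibrate.py` (a hypothetical file name), `python -m unittest regression_calibrate` would run it; wiring that command into every build is what keeps regression testing from being skipped.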
Developing quality code is not simple or easy. It requires discipline and rigor, state-of-the-art tools, and enlightened managers willing to support developers by paying up-front costs, such as giving developers more time to automate and test their code. Developers take pride in their work. When they get the support they need, they know that their managers want them to produce quality code. This makes the work satisfying and rewarding.
Managing and limiting complexity and promoting clarity are fundamental to developing large software systems. The key ingredient is a robust architecture. The conceptual integrity of the architecture, its elegance and clarity, depends on a single mind. Developers build upon the architecture and ensure its robustness by rigorous application of basic software engineering principles and best practices in their code development.