Scope
The relentless
pressure to keep up with "Internet Time" results in
most organizations using ad hoc approaches to survive on a daily
basis, with no time or energy left for long-term investments
in surviving the coming months and years. While such an approach
can be made to work in the short term, it is inherently inadequate
for addressing trends over the span of years or decades. Instead,
it is vital that a concerted effort be made to prepare for downstream
problems in a number of key areas.
The long-term
scope will evolve as appropriate to address the hard, long-term
problems facing us. Current areas include:
- Use
of off-the-shelf components:
Most systems now rely heavily on the use of commercial off-the-shelf
(COTS) technology for hardware and/or software for reasons of
cost and time to market. Many current approaches to creating
dependable systems assume complete control and understanding
of system components---an assumption that is simply not representative
of the majority of systems that must be built. And, even if
complete understanding of components were possible, the marketplace
is such that components become obsolete and are replaced many
times over during the production and deployment life of many
critical systems. New techniques are urgently needed to create
highly dependable systems from "black-box" components
that continually change. Previously useful approaches and simpler forms of analysis (e.g., old notions
of creating components based on separation of concerns and creating
systems based on synthesis rather than component composition)
no longer work for every situation.
- Use
of complex, non-dependable components:
Achieving high confidence is becoming more difficult as systems
become more complex. Today's trends of large-scale use of component technology, increased integration,
continuous evolution, and larger scale are yielding more complex
systems. Furthermore, such systems are often built from complex
components that are not inherently dependable. Not only is it
difficult to get such systems to work in the first place, but
they also frequently exhibit unpredictable emergent
behaviors at inopportune moments. New ways to create dependable
systems from complex components are urgently needed.
- Hostile
operating environments:
Lacking adequate protection, today's information and communications
systems are being subjected to numerous malicious attacks. New
and advanced techniques are needed to achieve the required levels
of system integrity and availability. Protection against both
active and insider threats must be developed. Methods are needed
for system monitoring, detection, response, and recovery.
- Embedded
systems:
Embedded computer systems are arguably both more difficult to
make dependable, and more in need of complete dependability.
Because they often do not have a human operator acting as a
safety net, embedded systems must achieve absolutely bulletproof
operation over years or decades of time. But, because the actual
amount of computational power used is small, such systems are
often perceived as easy to build and are often created by engineers
or technicians with no formal training in software engineering
or critical system design. Whereas desktop computers are built
in the tens of millions per year, embedded microcontrollers
are produced in the billions---soon to be tens of billions per
year. The challenge is how to scale high assurance methods down
to the budgets, timelines, and skill sets prevalent in the embedded
system world.
- Ubiquitous
critical systems:
The days of critical systems being a niche market are over.
Many everyday safety-critical systems already have, or will soon
have, software in them. Consider, for example, a domestic hot
water heating system, which can cause scalding burns if it drifts
even a few degrees higher than its set point. Or, consider an
Internet-based stock trading system that can bankrupt a user
who (foolishly) depends on typical response times being available
during a stock market meltdown. As we entrust our lives and
livelihoods to computers, many systems will effectively become
critical. A challenge here is how to proliferate good practice
in highly dependable system design to everyday practitioners
rather than a few select critical system designers in niche
fields such as nuclear power and aerospace applications.
- Indirectly
critical systems:
As computer systems are becoming highly complex, so is our society.
While the number of critical systems is growing, the number
of indirectly critical systems also grows. For example, the
software that routes messages for a personal pager system becomes
indirectly critical when it transmits the page for an emergency
room physician to respond to a crisis. Similarly, database software
becomes indirectly critical when it identifies owners of vehicles
subject to an urgent recall notice or is used to look up emergency
contact information. Even a simple word processor can become
mission critical if it crashes a few minutes before the courier
pickup deadline for a proposal submission. It is vital that
even everyday, seemingly non-critical, applications be raised
to a higher level of dependability to reduce the enormous hidden
costs their unreliability levies on businesses and individuals.
- International
markets:
The U.S. is not alone in its growing dependence on computing
throughout industries having safety-critical aspects. This is
especially true in transportation, health care, energy, and
manufacturing sectors. However, many areas do not have the technical
and labor infrastructures to support critical system operation.
It will be imperative to create dependable systems that can
operate properly even with shortages of repair parts, scarce
availability of skilled operators/maintainers, and erratically available infrastructure
support.
Activities
Six
research and education activities will contribute to the strategic
goals of the High Dependability Computing Consortium (HDCC):
1. Provide
a sound theoretical, scientific and technological basis for
assured construction of safe, secure systems.
To meet
this goal, the research agenda
must:
- achieve
the capability to specify, compose,
analyze, and assess system behavioral properties,
- furnish
the capability to enforce specific behavioral properties,
- and furnish
the capability to be more predictably tolerant of specified
behavioral failures including malicious attack.
These are still
hot topics in universities despite the general acceptance of
C (and perhaps, someday, Java) as do-everything programming
languages. Ultimately, the proper and reliable functioning
of a system depends upon people describing their designs in
a formal specification, namely a language. When the language
is shaky, the entire edifice will be built on a soft foundation.
Special areas of interest include applications of logic, techniques
for designing and implementing programming languages, and formal
specification and verification of hardware and software systems.
It is important to apply these techniques to problems of realistic
scale and complexity, for example: implementation of high speed
network communication software and application of type theoretic
principles in the construction of compilers for proof carrying
code. For Carnegie Mellon activities in principles of programming
see http://www.cs.cmu.edu/Groups/pop/pop.html
2. Develop
hardware, software, and system engineering tools that incorporate
ubiquitous, application-based, domain-based, and risk-based
assurance.
To meet
this goal the HDCC research agenda must:
- furnish
the methods, tools, and environments necessary for the design,
construction, and evaluation of behavioral enforcement mechanisms;
- and establish
indicators and characteristics of overall system confidence
in the achieved behavioral properties gained through the application
of such methods, tools and environments.
Software
Engineering has grown into a field of Computer Science in its
own right. Its aim is that systems constructed from software
can attain the same reliability and predictability as bridges
and other symbols of engineering excellence. At Carnegie Mellon
much of the research and education in this field is conducted
by the Institute for Software Research (http://www.isri.cs.cmu.edu/)
and the Software Engineering Institute (http://www.sei.cmu.edu/).
3. Reduce
the effort, time, and cost of assurance and quality certification
processes.
To meet
this goal, the HDCC research agenda must:
- furnish
the means to improve the productivity of information system
design, development, and analysis,
- while simultaneously
improving the levels of confidence that can be achieved through
such productivity enhancements.
The industrial
use of system analysis and verification tools has been limited,
but university researchers have made considerable progress in
producing tools that find bugs in real hardware and software.
So far, most of the success has been in hardware where complexity
is lower and specifications cleaner; but there have been promising
successes in software as well.
For Carnegie Mellon activities
in formal systems see
http://www.cs.cmu.edu/Groups/formal-methods/formal-methods.html
4. Understand
the human problems in creating, maintaining, and using computer
systems.
This has
become a vital area of research as computers have become ubiquitous.
Seat-of-the-pants design might have been sufficient when the
users of computers were engineers, scientists, and programmers;
but now a deep understanding of human capabilities must be built
into design because the users are often very different from
the designers. "Pilot error" is the most frequently
cited cause of airline mishaps, and "programmer error"
is similarly often the purported cause of software defects,
except in the frequent case in which problems are blamed on
"user error". We need to understand and account for
the capabilities of both the designers and end users of systems.
For Carnegie Mellon activities in human-computer interaction
see http://www.hcii.cmu.edu/.
5. Provide
measures of results.
To meet
this goal, the HDCC research agenda must:
- develop
measures of performance and measures of effectiveness for use
in quantifying and qualifying the progress of improvements in
system-level confidence that can be achieved through the application of HDCC
technologies.
- Further,
the agenda must show through such measures that the benefits
achieved are cost effective.
One reason
to perform system fault discovery is to obtain a metric. Fault discovery
is only somewhat helpful as a debugging technique---it is much
more powerful as a quality assurance technique in support of
building dependable systems. For some Carnegie Mellon research
in this area see
http://www.ices.cmu.edu/ballista
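As a rough illustration of fault discovery used as a metric rather than as a debugging aid, the following minimal sketch feeds a set of exceptional inputs to a function and reports the fraction that cause an ungraceful failure. It is not taken from the essay or from the research cited above: the function names, the input list, and the choice of which exception types count as "graceful" are all assumptions made for this example.

```python
from typing import Any, Callable, Iterable, Tuple, Type


def robustness_failure_rate(
    func: Callable[[Any], Any],
    inputs: Iterable[Any],
    graceful: Tuple[Type[BaseException], ...] = (ValueError, TypeError),
) -> float:
    """Return the fraction of inputs on which func fails ungracefully.

    Assumption for this sketch: raising one of the `graceful` exception
    types counts as an acceptable rejection of a bad input; any other
    exception counts as a robustness failure.
    """
    total = 0
    failures = 0
    for value in inputs:
        total += 1
        try:
            func(value)
        except graceful:
            pass  # bad input was rejected in an expected way
        except Exception:
            failures += 1  # unexpected exception: counts against the metric
    return failures / total if total else 0.0


def fragile_reciprocal(x):
    # Hypothetical function under test; it does not validate its input.
    return 1 / x


# Hypothetical set of exceptional and ordinary test values.
EXCEPTIONAL_INPUTS = [0, 1, -1, None, "7", 2.5]

rate = robustness_failure_rate(fragile_reciprocal, EXCEPTIONAL_INPUTS)
print(f"robustness failure rate: {rate:.3f}")
```

A single number of this kind can be tracked across releases or compared between components, which is what makes it more useful for quality assurance than for locating any individual bug.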
6. Promote
software engineering education.
Currently,
de facto software engineers coming from universities are emerging
from departments of computer science and engineering. Unfortunately
the computer scientists are often too theoretical while the
engineers are often too hardware-oriented. What is needed is
professional education akin to what medical doctors receive,
but nobody is doing it. Both software engineering research
and education must have strong connections to practice:
education needs a practical setting to develop skill, and research
needs access to real problems that expose the deep issues involved
in real-world development.
We should
create an institution that serves software engineering as a
teaching hospital serves medicine. Students would learn in the
context of real cases. Clinical faculty would both practice
and teach. Research would exploit access to real cases and data.
We would provide a development laboratory in which real software
developers produce real software for real clients. Developers
would interact with researchers to infuse the research agenda
with visibility into real problems, and developers could take
advantage of research results. Students would learn through
direct experience in a real---not just "realistic"---setting.
Clinical faculty would be skilled professional software developers
and have significant responsibilities for both teaching and
software production.
Reprinted
with kind permission from
Jim Morris, Dean, Carnegie Mellon School of Computer
Science, from his
essay on a High Dependability Computing Consortium in which
he suggests that universities, government, and industry should
initiate a long-term research and education program to make
computing and communication systems dependable enough for people
to trust with their everyday lives and livelihoods.