ISO/IEC JTC 1/SC 22/OWGV N 0155
Proposed Vulnerability Description on Concurrency
Contributed by Steve Michell
30 September 2008
6.Y   Concurrency [CGW]
6.Y.1 Description of Application Vulnerability
Concurrency, or parallel programming, includes
  - Multiple programs executing on a single CPU or on multiple
  CPUs,
  
 - Multiple tasks (or threads) executing in the same process
  on a single CPU or on multiple CPUs,
  
 - Multiple tasks (or threads) executing in multiple processes
  on a single CPU or on multiple CPUs, and
  
 - A single-threaded task that handles interrupts or events
  asynchronously.
 
Concurrency presents a significant challenge to program correctly;
it offers a large number of ways for failures to occur, quite a
few known attack vectors, and many possible but as yet undiscovered
attack vectors.
Concurrency is a major challenge because the analysis of concurrent
programs is so difficult. In the general case there are exceedingly
large numbers (on the order of 10**100 even for small programs)
of possible interleavings of the parallel portions. Static analysis
of the general case is intractable because the complexity is so
high and because such analysis requires modelling the complete
program at once, an NP-hard problem. Dynamic analysis of the general
case is fruitless because a concurrent program will never behave
in exactly the same way twice, even with identical inputs.
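As a small illustration of why interleavings matter, the following
sketch (in C with POSIX threads, chosen here only for illustration;
the names worker and N are invented for this example) lets two
threads increment a shared counter with no synchronization. Because
each increment is a separate read-modify-write sequence, the final
value frequently differs from the expected total and may vary from
run to run.

  #include <pthread.h>
  #include <stdio.h>

  /* Shared counter with no synchronization: the load, add, and store
     of counter++ can interleave between the two threads, so updates
     may be lost and the final value may vary between runs. */
  static long counter = 0;
  #define N 100000

  static void *worker(void *arg)
  {
      (void)arg;
      for (long i = 0; i < N; i++)
          counter++;               /* read-modify-write, not atomic */
      return NULL;
  }

  int main(void)
  {
      pthread_t t1, t2;
      pthread_create(&t1, NULL, worker, NULL);
      pthread_create(&t2, NULL, worker, NULL);
      pthread_join(t1, NULL);
      pthread_join(t2, NULL);
      printf("counter = %ld (expected %ld)\n", counter, 2L * N);
      return 0;
  }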
The previous discussion should not lead one to despair, because
there are ways of taming concurrency, and there are various paradigms
that are known to be analyzable:
  - Cases where there is no dependency in storage, code base, or
  time between the concurrent portions, and where all portions
  terminate before the final result is collected.
  
 - CSP-compliant [Hoare] intertask communication and scheduling
  with complete storage independence on one-shot calculations
  (see the channel sketch after this list). Such programs have
  been shown to be formally provable, as long as it can be shown
  (right down to the hardware level) that the locking mechanisms
  used to implement the message channels are correct.
  
 - CSP-compliant intertask communication and scheduling with
  complete storage independence on some limited non-terminating
  programs (where formally provable using model checking), with
  the same caveats as above. Note, however, that timing may not
  be verifiable, because CSP tasks block on sending or receiving
  messages, not on time.
  
 - Some non-terminating programs, such as rate monotonic systems,
  where all task scheduling has a common basis (usually wakeup time
  as a function of urgency), intertask communication is nonblocking,
  using monitors or protected objects consistent with the ceiling
  priority protocol, unscheduled events are forbidden, and the
  lock management protocols for intertask communication have been
  shown to be correct.
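As referred to in the CSP item above, the following sketch shows
one possible rendering of a message channel over POSIX mutexes and
condition variables. It is illustrative only: the names channel_t,
chan_send, and chan_recv are invented for this example, and the
caveat about proving the underlying locking correct still applies.
It is a one-place channel; a full CSP rendezvous would additionally
make the sender wait until the value has been consumed.

  #include <pthread.h>
  #include <stdbool.h>
  #include <stdio.h>

  /* Illustrative one-place message channel over POSIX primitives.
     All shared state is confined to the channel, so the communicating
     tasks need no other shared storage. */
  typedef struct {
      pthread_mutex_t lock;
      pthread_cond_t  can_send;   /* signalled when the slot is empty */
      pthread_cond_t  can_recv;   /* signalled when the slot is full  */
      int             value;
      bool            full;
  } channel_t;

  static void chan_init(channel_t *c)
  {
      pthread_mutex_init(&c->lock, NULL);
      pthread_cond_init(&c->can_send, NULL);
      pthread_cond_init(&c->can_recv, NULL);
      c->full = false;
  }

  static void chan_send(channel_t *c, int v)
  {
      pthread_mutex_lock(&c->lock);
      while (c->full)                      /* wait until the slot is empty */
          pthread_cond_wait(&c->can_send, &c->lock);
      c->value = v;
      c->full  = true;
      pthread_cond_signal(&c->can_recv);
      pthread_mutex_unlock(&c->lock);
  }

  static int chan_recv(channel_t *c)
  {
      pthread_mutex_lock(&c->lock);
      while (!c->full)                     /* wait until a value arrives */
          pthread_cond_wait(&c->can_recv, &c->lock);
      int v = c->value;
      c->full = false;
      pthread_cond_signal(&c->can_send);
      pthread_mutex_unlock(&c->lock);
      return v;
  }

  static channel_t ch;

  static void *producer(void *arg)
  {
      (void)arg;
      for (int i = 1; i <= 5; i++)
          chan_send(&ch, i);               /* blocks if the slot is full */
      chan_send(&ch, -1);                  /* sentinel: no more data     */
      return NULL;
  }

  int main(void)
  {
      pthread_t t;
      chan_init(&ch);
      pthread_create(&t, NULL, producer, NULL);
      for (int v = chan_recv(&ch); v != -1; v = chan_recv(&ch))
          printf("received %d\n", v);
      pthread_join(t, NULL);
      return 0;
  }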
 
Research is ongoing into further paradigms, such as variations
of non-terminating programs on a single CPU using the Ravenscar
Tasking Profile.
The fact is, however, that we must work with concurrency in
more general settings, and most of these do not satisfy the
constraints listed above. Most concurrent programs are cyclic or
non-terminating, handle events from an OS or interrupts from
hardware, work with multiple levels of urgency (often called
priority) between components, have concurrency behaviours that
are coupled to the data being processed, and have non-trivial
data couplings between tasks. Most real-time systems are effectively
concurrent systems, but must also account for time, mixed urgencies,
and the consequences of failing to complete necessary computations.
It is not known whether attacks based on knowledge of a system's
concurrency can be more sophisticated than denial of service
based on deadlock or livelock.
For safety-related systems, the sheer number of possible failure
modes and unacceptable interactions creates almost intolerable
complexity. Many such systems handle this by using exclusively
time-based cyclic schedules, eliminating interrupts, and making
extensive use of dedicated CPUs. Some use of very restrictive
concurrency paradigms, such as Ravenscar and similar microkernel
approaches, has also been seen.
6.Y.2 Cross References
6.Y.3 Mechanism of Failure
The basic classifications of concurrency-related failures are:
  - Data corruption from one task writing shared data that is
  being accessed by another task.
  
 - Data corruption from multiple tasks accessing system services
  simultaneously.
  
 - Deadlock, where some or all tasks become blocked waiting
  for resources that cannot be released because the task responsible
  for releasing them is itself terminally blocked (see the sketch
  after this list).
  
 - Priority inversion, where a high-urgency task waits for
  execution resources while a lower-urgency task executes, to the
  point that calculations are wrong, late, or missed altogether.
  
 - Livelock, where one or more tasks consume all available
  computational resources doing meaningless work (such as busy-waiting
  on a lock that will never be released) so that tasks capable of
  doing meaningful work cannot execute.
  
 - Missed time constraints, caused by processing overruns,
  resulting in late and therefore incorrect results.
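The deadlock mechanism above is most often seen as two tasks
acquiring the same two locks in opposite orders, as in the following
sketch (again in C with POSIX threads; the names task_a, task_b,
lock1, and lock2 are invented for this example). Imposing a single
global lock-acquisition order removes this particular failure mode.

  #include <pthread.h>

  /* Classic deadlock pattern: two tasks acquire the same two locks in
     opposite orders.  If task_a holds lock1 while task_b holds lock2,
     each waits forever for the lock the other holds. */
  static pthread_mutex_t lock1 = PTHREAD_MUTEX_INITIALIZER;
  static pthread_mutex_t lock2 = PTHREAD_MUTEX_INITIALIZER;

  static void *task_a(void *arg)
  {
      (void)arg;
      pthread_mutex_lock(&lock1);
      pthread_mutex_lock(&lock2);          /* may block forever */
      /* ... use both resources ... */
      pthread_mutex_unlock(&lock2);
      pthread_mutex_unlock(&lock1);
      return NULL;
  }

  static void *task_b(void *arg)
  {
      (void)arg;
      pthread_mutex_lock(&lock2);          /* opposite acquisition order */
      pthread_mutex_lock(&lock1);          /* may block forever */
      /* ... use both resources ... */
      pthread_mutex_unlock(&lock1);
      pthread_mutex_unlock(&lock2);
      return NULL;
  }

  int main(void)
  {
      pthread_t a, b;
      pthread_create(&a, NULL, task_a, NULL);
      pthread_create(&b, NULL, task_b, NULL);
      pthread_join(a, NULL);               /* may never return */
      pthread_join(b, NULL);
      return 0;
  }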
 
6.Y.4 Applicable Language Characteristics
  - Languages that permit concurrency within the language, or
  that support libraries (such as POSIX threads) providing hooks
  for concurrency control.
  
 - Languages that interface with operating systems that deliver
  event-driven behaviour to individual subprograms, such that multiple
  subprograms could be executing concurrently in the same program,
  or even in different programs where shared data is being manipulated.
 
6.Y.5 Avoiding the Vulnerability or Mitigating its Effects
  - Use the simplest and highest level of concurrency abstraction
  available as a starting point.
  
 - Avoid low-level primitives such as semaphores, except to
  build higher-level abstractions such as monitors, protected
  objects, or CSP message channels (see the sketch after this
  list).
  
 - Keep all data either local to a single task or inside shared
  objects that are protected from concurrent access (such as a
  monitor or a message channel).
  
 - Ensure that all protected objects account for priority and
  provide safe access even when tasks of mixed priorities
  access the same data.
  
 - Ensure that all possible cases of priority inversion are
  identified and analysed, so that guaranteed bounds on the
  duration of priority inversion can be established.
  
 - Expose the proposed system to extensive human review. 
  
 - Where the number of concurrent items is low, and the concurrency-related
  states can be extracted from the larger program and are also
  relatively few, use model checkers [SPIN, UPPAAL] to verify safety,
  liveness, and timing properties. Model checkers such as SPIN cannot
  handle time, but timed-automata tools such as UPPAAL can reason
  about time.
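The following sketch (C with POSIX threads; protected_counter_t,
counter_init, counter_increment, and counter_read are invented names)
illustrates the monitor-style protection and priority-aware locking
recommended above: the shared datum is reachable only through
operations that take a mutex, and, where the implementation supports
the POSIX priority-protect option, the mutex is given a priority
ceiling in the spirit of the ceiling priority protocol.

  #include <pthread.h>
  #include <unistd.h>

  /* Monitor-style wrapper: the shared counter is reachable only through
     these operations, so every access is serialized by the mutex. */
  typedef struct {
      pthread_mutex_t lock;
      long            count;     /* protected data: never accessed directly */
  } protected_counter_t;

  int counter_init(protected_counter_t *p, int ceiling)
  {
      pthread_mutexattr_t attr;
      pthread_mutexattr_init(&attr);
  #if defined(_POSIX_THREAD_PRIO_PROTECT) && _POSIX_THREAD_PRIO_PROTECT > 0
      pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_PROTECT);
      pthread_mutexattr_setprioceiling(&attr, ceiling);
  #else
      (void)ceiling;             /* ceiling locking not available here */
  #endif
      p->count = 0;
      return pthread_mutex_init(&p->lock, &attr);
  }

  void counter_increment(protected_counter_t *p)
  {
      pthread_mutex_lock(&p->lock);
      p->count++;                /* the only place the data is modified */
      pthread_mutex_unlock(&p->lock);
  }

  long counter_read(protected_counter_t *p)
  {
      pthread_mutex_lock(&p->lock);
      long v = p->count;
      pthread_mutex_unlock(&p->lock);
      return v;
  }

  int main(void)
  {
      protected_counter_t c;
      counter_init(&c, 10);      /* ceiling value is illustrative only */
      counter_increment(&c);
      return counter_read(&c) == 1 ? 0 : 1;
  }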
 
6.Y.6 Implications for Standardization
  - Provide concurrency abstractions that support static analysis,
  such as one of the models that are known to have safe properties.
 
6.Y.7 Bibliography
  - Hoare, C.A.R., "Communicating Sequential Processes",
  Prentice Hall, 1985
  
 - Holzmann, G., "The SPIN Model Checker: Primer and
  Reference Manual", Addison-Wesley Professional, 2003
  
 - UPPAAL, available from www.uppaal.com
  
 - Larsen, Peterson, Wang, "Model Checking for Real-Time
  Systems", Proceedings of the 10th International Conference
  on Fundamentals of Computation Theory, 1995
  
 - Ravenscar Tasking Profile, specified in ISO/IEC 8652:1995 Ada
  with TC 1:2001 and AM 1:2007