ISO/IEC JTC 1/SC 22/OWGV N 0155
Proposed Vulnerability Description on Concurrency
Contributed by Steve Michell
30 September 2008
6.Y   Concurrency [CGW]
6.Y.1 Description of Application Vulnerability
Concurrency, or parallel programming, includes
  - Multiple programs executing on a single CPU or on multiple
  CPUs,
  
 - Multiple tasks (or threads) executing in the same process
  on a single CPU or on multiple CPUs,
  
 - Multiple tasks (or threads) executing in multiple processes
  on a single CPU or on multiple CPUs, and
  
 - A single-threaded task that handles interrupts or events
  asynchronously.
 
Concurrency presents a significant challenge to program correctly;
it offers a large number of ways for failures to occur, quite a
few known attack vectors, and many possible but as yet undiscovered
attack vectors.
Concurrency is a major challenge because the analysis of concurrent
programs is so difficult. In the general case there are exceedingly
large numbers (on the order of 10**100 even for small programs)
of possible interleavings of the parallel portions. Static analysis
of the general case is intractable because the complexity is so
high and because such analysis requires modelling the complete
program at once, an NP-hard problem. Dynamic analysis of the general
case is fruitless because a concurrent program will never behave
in exactly the same way twice, even with identical inputs.
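As a small illustration of why interleavings matter, the following
sketch (in C with POSIX threads, chosen here only for illustration;
the names worker and N are invented for this example) lets two
threads increment a shared counter with no synchronization. Because
each increment is a separate read-modify-write sequence, the final
value frequently differs from the expected total and may vary from
run to run.

  #include <pthread.h>
  #include <stdio.h>

  /* Shared counter with no synchronization: the load, add, and store
     of counter++ can interleave between the two threads, so updates
     may be lost and the final value may vary between runs. */
  static long counter = 0;
  #define N 100000

  static void *worker(void *arg)
  {
      (void)arg;
      for (long i = 0; i < N; i++)
          counter++;               /* read-modify-write, not atomic */
      return NULL;
  }

  int main(void)
  {
      pthread_t t1, t2;
      pthread_create(&t1, NULL, worker, NULL);
      pthread_create(&t2, NULL, worker, NULL);
      pthread_join(t1, NULL);
      pthread_join(t2, NULL);
      printf("counter = %ld (expected %ld)\n", counter, 2L * N);
      return 0;
  }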
The previous discussion should not lead one to despair, because
there are ways of taming concurrency, and there are various paradigms
that are known to be analyzable:
  - Cases where there is no dependency in storage, code base, or
  time between the concurrent portions, and where all portions
  terminate before the final result is collected.
  
 - CSP-compliant [Hoare] intertask communication and scheduling
  with complete storage independence on one-shot calculations
  (see the channel sketch after this list). Such programs have
  been shown to be formally provable, as long as it can be shown
  (right down to the hardware level) that the locking mechanisms
  used to implement the message channels are correct.
  
 - CSP-compliant intertask communication and scheduling with
  complete storage independence on some limited non-terminating
  programs (where formally provable using model checking), with
  the same caveats as above. Note, however, that timing may not
  be verifiable, because CSP tasks block on sending or receiving
  messages, not on time.
  
 - Some non-terminating programs, such as rate monotonic systems,
  where all task scheduling has a common basis (usually wakeup time
  as a function of urgency), intertask communication is nonblocking,
  using monitors or protected objects consistent with the ceiling
  priority protocol, unscheduled events are forbidden, and the
  lock management protocols for intertask communication have been
  shown to be correct.
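As referred to in the CSP item above, the following sketch shows
one possible rendering of a message channel over POSIX mutexes and
condition variables. It is illustrative only: the names channel_t,
chan_send, and chan_recv are invented for this example, and the
caveat about proving the underlying locking correct still applies.
It is a one-place channel; a full CSP rendezvous would additionally
make the sender wait until the value has been consumed.

  #include <pthread.h>
  #include <stdbool.h>
  #include <stdio.h>

  /* Illustrative one-place message channel over POSIX primitives.
     All shared state is confined to the channel, so the communicating
     tasks need no other shared storage. */
  typedef struct {
      pthread_mutex_t lock;
      pthread_cond_t  can_send;   /* signalled when the slot is empty */
      pthread_cond_t  can_recv;   /* signalled when the slot is full  */
      int             value;
      bool            full;
  } channel_t;

  static void chan_init(channel_t *c)
  {
      pthread_mutex_init(&c->lock, NULL);
      pthread_cond_init(&c->can_send, NULL);
      pthread_cond_init(&c->can_recv, NULL);
      c->full = false;
  }

  static void chan_send(channel_t *c, int v)
  {
      pthread_mutex_lock(&c->lock);
      while (c->full)                      /* wait until the slot is empty */
          pthread_cond_wait(&c->can_send, &c->lock);
      c->value = v;
      c->full  = true;
      pthread_cond_signal(&c->can_recv);
      pthread_mutex_unlock(&c->lock);
  }

  static int chan_recv(channel_t *c)
  {
      pthread_mutex_lock(&c->lock);
      while (!c->full)                     /* wait until a value arrives */
          pthread_cond_wait(&c->can_recv, &c->lock);
      int v = c->value;
      c->full = false;
      pthread_cond_signal(&c->can_send);
      pthread_mutex_unlock(&c->lock);
      return v;
  }

  static channel_t ch;

  static void *producer(void *arg)
  {
      (void)arg;
      for (int i = 1; i <= 5; i++)
          chan_send(&ch, i);               /* blocks if the slot is full */
      chan_send(&ch, -1);                  /* sentinel: no more data     */
      return NULL;
  }

  int main(void)
  {
      pthread_t t;
      chan_init(&ch);
      pthread_create(&t, NULL, producer, NULL);
      for (int v = chan_recv(&ch); v != -1; v = chan_recv(&ch))
          printf("received %d\n", v);
      pthread_join(t, NULL);
      return 0;
  }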
 
Research is ongoing into further paradigms, such as variations
of non-terminating programs on a single CPU using the Ravenscar
Tasking Profile.
The fact is, however, that we must work with concurrency in
more general settings, and most of these do not satisfy the
constraints listed above. Most concurrent programs are cyclic or
non-terminating, handle events from an OS or interrupts from
hardware, work with multiple levels of urgency (often called
priority) between components, have concurrency behaviours that
are coupled to the data being processed, and have non-trivial
data couplings between tasks. Most real-time systems are effectively
concurrent systems, but must also account for time, mixed urgencies,
and the consequences of failing to complete necessary computations.
It is not known whether attacks based on knowledge of a system's
concurrency can be more sophisticated than denial of service
based on deadlock or livelock.
For safety-related systems, the sheer number of possible failure
modes and unacceptable interactions creates almost intolerable
complexity. Many such systems handle this by using exclusively
time-based cyclic schedules, eliminating interrupts, and making
extensive use of dedicated CPUs. Some use of very restrictive
concurrency paradigms, such as Ravenscar and similar microkernel
approaches, has also been seen.
6.Y.2 Cross References
6.Y.3 Mechanism of Failure
The basic classifications of concurrency-related failures are:
  - Data corruption from one task writing shared data that is
  being accessed by another task.
  
 - Data corruption from multiple tasks accessing system services
  simultaneously.
  
 - Deadlock, where some or all tasks become blocked waiting
  for resources that cannot be released because the task responsible
  for releasing them is itself terminally blocked (see the sketch
  after this list).
  
 - Priority inversion, where a high-urgency task waits for
  execution resources while a lower-urgency task executes, to the
  point that calculations are wrong, late, or missed altogether.
  
 - Livelock, where one or more tasks consume all available
  computational resources doing meaningless work (such as busy-waiting
  on a lock that will never be released) so that tasks capable of
  doing meaningful work cannot execute.
  
 - Missed time constraints, caused by processing overruns,
  resulting in late and therefore incorrect results.
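The deadlock mechanism above is most often seen as two tasks
acquiring the same two locks in opposite orders, as in the following
sketch (again in C with POSIX threads; the names task_a, task_b,
lock1, and lock2 are invented for this example). Imposing a single
global lock-acquisition order removes this particular failure mode.

  #include <pthread.h>

  /* Classic deadlock pattern: two tasks acquire the same two locks in
     opposite orders.  If task_a holds lock1 while task_b holds lock2,
     each waits forever for the lock the other holds. */
  static pthread_mutex_t lock1 = PTHREAD_MUTEX_INITIALIZER;
  static pthread_mutex_t lock2 = PTHREAD_MUTEX_INITIALIZER;

  static void *task_a(void *arg)
  {
      (void)arg;
      pthread_mutex_lock(&lock1);
      pthread_mutex_lock(&lock2);          /* may block forever */
      /* ... use both resources ... */
      pthread_mutex_unlock(&lock2);
      pthread_mutex_unlock(&lock1);
      return NULL;
  }

  static void *task_b(void *arg)
  {
      (void)arg;
      pthread_mutex_lock(&lock2);          /* opposite acquisition order */
      pthread_mutex_lock(&lock1);          /* may block forever */
      /* ... use both resources ... */
      pthread_mutex_unlock(&lock1);
      pthread_mutex_unlock(&lock2);
      return NULL;
  }

  int main(void)
  {
      pthread_t a, b;
      pthread_create(&a, NULL, task_a, NULL);
      pthread_create(&b, NULL, task_b, NULL);
      pthread_join(a, NULL);               /* may never return */
      pthread_join(b, NULL);
      return 0;
  }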
 
6.Y.4 Applicable Language Characteristics
  - Languages that permit concurrency within the language, or
  that support libraries (such as POSIX threads) providing hooks
  for concurrency control.
  
 - Languages that interface with operating systems that deliver
  event-driven behaviour to individual subprograms, such that multiple
  subprograms could be executing concurrently in the same program,
  or even in different programs where shared data is being manipulated.
 
6.Y.5 Avoiding the Vulnerability or Mitigating its Effects
  - Use the simplest and highest level of concurrency abstraction
  available as a starting point.
  
 - Avoid low-level primitives such as semaphores, except to
  build higher-level abstractions such as monitors, protected
  objects, or CSP message channels (see the sketch after this
  list).
  
 - Keep all data either local to a single task or inside shared
  objects that are protected from concurrent access (such as a
  monitor or a message channel).
  
 - Ensure that all protected objects account for priority and
  provide safe access even when tasks of mixed priorities
  access the same data.
  
 - Ensure that all possible cases of priority inversion are
  identified and analysed, so that guaranteed bounds on the
  duration of priority inversion can be established.
  
 - Expose the proposed system to extensive human review. 
  
 - Where the number of concurrent items is low, and the concurrency-related
  states can be extracted from the larger program and are also
  relatively few, use model checkers [SPIN, UPPAAL] to verify safety,
  liveness, and timing properties. Model checkers such as SPIN cannot
  handle time, but timed-automata tools such as UPPAAL can reason
  about time.
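The following sketch (C with POSIX threads; protected_counter_t,
counter_init, counter_increment, and counter_read are invented names)
illustrates the monitor-style protection and priority-aware locking
recommended above: the shared datum is reachable only through
operations that take a mutex, and, where the implementation supports
the POSIX priority-protect option, the mutex is given a priority
ceiling in the spirit of the ceiling priority protocol.

  #include <pthread.h>
  #include <unistd.h>

  /* Monitor-style wrapper: the shared counter is reachable only through
     these operations, so every access is serialized by the mutex. */
  typedef struct {
      pthread_mutex_t lock;
      long            count;     /* protected data: never accessed directly */
  } protected_counter_t;

  int counter_init(protected_counter_t *p, int ceiling)
  {
      pthread_mutexattr_t attr;
      pthread_mutexattr_init(&attr);
  #if defined(_POSIX_THREAD_PRIO_PROTECT) && _POSIX_THREAD_PRIO_PROTECT > 0
      pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_PROTECT);
      pthread_mutexattr_setprioceiling(&attr, ceiling);
  #else
      (void)ceiling;             /* ceiling locking not available here */
  #endif
      p->count = 0;
      return pthread_mutex_init(&p->lock, &attr);
  }

  void counter_increment(protected_counter_t *p)
  {
      pthread_mutex_lock(&p->lock);
      p->count++;                /* the only place the data is modified */
      pthread_mutex_unlock(&p->lock);
  }

  long counter_read(protected_counter_t *p)
  {
      pthread_mutex_lock(&p->lock);
      long v = p->count;
      pthread_mutex_unlock(&p->lock);
      return v;
  }

  int main(void)
  {
      protected_counter_t c;
      counter_init(&c, 10);      /* ceiling value is illustrative only */
      counter_increment(&c);
      return counter_read(&c) == 1 ? 0 : 1;
  }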
 
6.Y.6 Implications for Standardization
  - Provide concurrency abstractions that support static analysis,
  such as one of the models that are known to have safe properties.
 
6.Y.7 Bibliography
  - Hoare, C.A.R., "Communicating Sequential Processes",
  Prentice Hall, 1985
  
 - Holzmann, G., "The SPIN Model Checker: Primer and
  Reference Manual", Addison-Wesley Professional, 2003
  
 - UPPAAL, available from www.uppaal.com
  
 - Larsen, Peterson, Wang, "Model Checking for Real-Time
  Systems", Proceedings of the 10th International Conference
  on Fundamentals of Computation Theory, 1995
  
 - Ravenscar Tasking Profile, specified in ISO/IEC 8652:1995 Ada
  with TC 1:2001 and AM 1:2007