Doc. no.	P10010R1
Date:	2018-03-16
Project:	Programming Language C++
Audience:	SG1 Parallelism and Concurrency
	Library Evolution Working Group
Reply to:	Alisdair Meredith <ameredith1@bloomberg.net>

Target Vectorization Policies from Parallelism V2 TS to C++20

Revision History

Revision 0
Revision 1

1 Introduction
2 Stating the problem

2.1 Conclusion of SG1 Review (Jacksonville 2018

3 Propose Solution
4 Other Directions
5 Formal Wording
6 Acknowledgements
7 References

Revision History

Revision 0

Original version of the paper for the 2018 Jacksonville meeting.

Revision 1

Revision of the paper following SG1 feedback at the 2018 Jacksonville meeting, proposing just the unsquenced policy for C++20.

1 Introduction

The Parallelism v2 PDTS has two new vector policies for the parallel algorithms that might better target C++20 directly.

We plan to send the Parallelism V2 working paper to PDTS ballot at the Jacksonville meeting. It seems reasonable to conclude that if we want feedback from the TS process, then anything in that TS vehicle will miss the merge deadline for C++20.

Of the (expected) three components to that TS, the Task Blocks feature depends on the exception_list class that is still underspecified for general use, and waiting feedback from such a TS process.

The simd library type has been going through rapid evolution and been subject to much change in the review process, so seems appropriate to seek feedback through the TS process.

Finally, there are two new vectorization policies for the parallel algorithms, which rely on the main <algorithm> header providing overloads for the new policies in the non-experimental std namespace in order to be usable. The algorithms can safely be implemented as reverting to serial or any other lesser parallel behavior without using the freedoms granted by the wavefront policies, so implementability is much less of a concern than QoI. There is no room in the design space for how we dispatch to the algorithms to be seeking TS feedback, so it is seems reasonable to suggest that if we want this feature, it would be more appropriate to target the C++20 standard directly, avoiding the awkward interaction with the std and experimental namespaces. We are still early enough in the C++20 cycle to respond to unexpected implementer feedback, and the current state of the C++17 parallel algorithms is that most standard library vendors are still starting or in the middle of their first implementation of the parallel algorithms, so it would be easier to tackle these extra policies now along with the rest of that work.

2.1 Conclusion of SG1 Review (Jacksonville 2018

Merging the unsequenced policy was entirely non-controversial, and is recommended for C++20 by SG1. It fills an important missing hole in the original specifciation.

Moving ahead with the vector wavefront policy seemed premature to the group. There is a desire to see that the new policy is generally useful, and provides a real optimization opportunity in practice. There are particular concerns that it appears useful in only a small subset of the parallel algorithms, and a more targeted approach for those few special cases might be more appropriate.

3 Propose Solution

Merge the wording for the unsequenced execution policy into C++20, but leave it in place in the Parallelism V2 TS as it may be useful as a fallback policy for algorirthms implementing the vector executiuon policy.

4 Other Directions

The original proposal recommended merging both vectorization policies, and their associated machinery, into C++20, and removing them from the Parallelism V2 TS. SG1 rejected this approach as there is a desire to see real benefits from the implementation of the vector wavefront policy in shipping libraries before adopting in the standard. There was a particular concern that very few existing algorithms are expected to see a benefit from this policy, other than the new algorithms specifically added in the TS for that policy. We may want to revisit the idea of whether that policy is a generic execution policy like the others, or whether we add just a few specific overloads explicitly taking this policy for the few algorithms that are expected to profit.

5 Formal Wording

23.19.2 Header synopsis [execution.syn]
namespace std {
  // 23.19.3, execution policy type trait
  template<class T> struct is_execution_policy;
  template<class T> inline constexpr bool is_execution_policy_v = is_execution_policy<T>::value;
}

namespace std::execution {
  // 23.19.4, sequenced execution policy
  class sequenced_policy;

  // 23.19.5, parallel execution policy
  class parallel_policy;

  // 23.19.6, parallel and unsequenced execution policy
  class parallel_unsequenced_policy;

  // 23.19.7, unsequenced execution policy
  class unsequenced_policy;

  // 23.19.78, execution policy objects
  inline constexpr sequenced_policy seq{ unspecified };
  inline constexpr parallel_policy par{ unspecified };
  inline constexpr parallel_unsequenced_policy par_unseq{ unspecified };
  inline constexpr unsequenced_policy unseq{ unspecified };
}
23.19.7 Unsequenced execution policy [parallel.execpol.unseq]
class unsequenced_policy{ unspecified };
The class unsequenced_policy is an execution policy type used as a unique type to disambiguate parallel algorithm overloading and indicate that a parallel algorithm's execution may be vectorized, e.g., executed on a single thread using instructions that operate on multiple data items.

The invocations of element access functions in parallel algorithms invoked with an execution policy of type unsequenced_policy are permitted to execute in an unordered fashion in the calling thread, unsequenced with respect to one another within the calling thread. [ Note: This means that multiple function object invocations may be interleaved on a single thread. — end note ]

[ Note: This overrides the usual guarantee from 4.6 [intro.execution] that function executions do not overlap with one another. — end note ]

During the execution of a parallel algorithm with the execution::unsequenced_policy policy, if the invocation of an element access function exits via an uncaught exception, terminate() shall be called.

23.19.78 Execution policy objects [execpol.objects]
inline constexpr execution::sequenced_policy            execution::seq{ unspecified };
inline constexpr execution::parallel_policy             execution::par{ unspecified };
inline constexpr execution::parallel_unsequenced_policy execution::par_unseq{ unspecified };
inline constexpr execution::unsequenced_policy          execution::unseq{ unspecified };
The header <execution> declares global objects associated with each type of execution policy.

6 Acknowledgements

Thanks to all the troublemakers in Jacksonville who persuaded me there was time to write this late paper! Thanks to SG1 for finding the time to review it, and provide feedback for a more appropriate timetable :)

7 References

N4725 Working Draft, Technical Specification for C++ Extensions for Parallelism Version 2, Jared Hoberock
P0075R2 Template Library for Parallel For Loops, Pablo Halpern, Clark Nelson, Arch D. Robison, Robert Geva
P0076R4 Vector and Wavefront Policies, Arch Robison, Pablo Halpern, Robert Geva, Clark Nelson, Jens Maurer

Target Vectorization Policies from Parallelism V2 TS to C++20

Table of Contents

23.19.2 Header synopsis [execution.syn]

23.19.7 Unsequenced execution policy [parallel.execpol.unseq]

23.19.78 Execution policy objects [execpol.objects]