Updated <atomic> docs with three design options

git-svn-id: https://llvm.org/svn/llvm-project/libcxx/trunk@115791 91177308-0d34-0410-b5e6-96231b3b80d8
2024-11-23 20:09:41 +00:00 · 2010-10-06 16:15:10 +00:00 · 2010-10-06 16:15:10 +00:00 · 086b718734
commit 086b718734
parent e78d1f548b
4 changed files with 839 additions and 408 deletions
--- a/www/atomic_design.html
+++ b/www/atomic_design.html
@ -36,422 +36,22 @@
  <!--*********************************************************************-->

 <p>
-The <tt>&lt;atomic&gt;</tt> header is one of the most closely coupled headers to
-the compiler.  Ideally when you invoke any function from
-<tt>&lt;atomic&gt;</tt>, it should result in highly optimized assembly being
-inserted directly into your application ...  assembly that is not otherwise
-representable by higher level C or C++ expressions.  The design of the libc++
-<tt>&lt;atomic&gt;</tt> header started with this goal in mind.  A secondary, but
-still very important goal is that the compiler should have to do minimal work to
-faciliate the implementaiton of <tt>&lt;atomic&gt;</tt>.  Without this second
-goal, then practically speaking, the libc++ <tt>&lt;atomic&gt;</tt> header would
-be doomed to be a barely supported, second class citizen on almost every
-platform.
+There are currently 3 designs under consideration.  They differ in where most
+of the implmentation work is done.  The functionality exposed to the customer
+should be identical (and conforming) for all three designs.
 </p>

-<p>Goals:</p>
-
-<blockquote><ul>
-<li>Optimal code generation for atomic operations</li>
-<li>Minimal effort for the compiler to achieve goal 1 on any given platform</li>
-<li>Conformance to the C++0X draft standard</li>
-</ul></blockquote>
-
-<p>
-The purpose of this document is to inform compiler writers what they need to do
-to enable a high performance libc++ <tt>&lt;atomic&gt;</tt> with minimal effort.
-</p>
-
-<h2>The minimal work that must be done for a conforming <tt>&lt;atomic&gt;</tt></h2>
-
-<p>
-The only "atomic" operations that must actually be lock free in
-<tt>&lt;atomic&gt;</tt> are represented by the following compiler intrinsics:
-</p>
-
-<blockquote><pre>
-__atomic_flag__
-__atomic_exchange_seq_cst(__atomic_flag__ volatile* obj, __atomic_flag__ desr)
-{
-    unique_lock&lt;mutex&gt; _(some_mutex);
-    __atomic_flag__ result = *obj;
-    *obj = desr;
-    return result;
-}
-
-void
-__atomic_store_seq_cst(__atomic_flag__ volatile* obj, __atomic_flag__ desr)
-{
-    unique_lock&lt;mutex&gt; _(some_mutex);
-    *obj = desr;
-}
-</pre></blockquote>
-
-<p>
-Where:
-</p>
-
-<blockquote><ul>
+<ol type="A">
 <li>
-If <tt>__has_feature(__atomic_flag)</tt> evaluates to 1 in the preprocessor then
-the compiler must define <tt>__atomic_flag__</tt> (e.g. as a typedef to
-<tt>int</tt>).
+<a href="atomic_design_a.html">Minimal work for the library</a>
 </li>
 <li>
-If <tt>__has_feature(__atomic_flag)</tt> evaluates to 0 in the preprocessor then
-the library defines <tt>__atomic_flag__</tt> as a typedef to <tt>bool</tt>.
+<a href="atomic_design_b.html">Something in between</a>
 </li>
 <li>
-<p>
-To communicate that the above intrinsics are available, the compiler must
-arrange for <tt>__has_feature</tt> to return 1 when fed the intrinsic name
-appended with an '_' and the mangled type name of <tt>__atomic_flag__</tt>.
-</p>
-<p>
-For example if <tt>__atomic_flag__</tt> is <tt>unsigned int</tt>:
-</p>
-<blockquote><pre>
-__has_feature(__atomic_flag) == 1
-__has_feature(__atomic_exchange_seq_cst_j) == 1
-__has_feature(__atomic_store_seq_cst_j) == 1
-
-typedef unsigned int __atomic_flag__; 
-
-unsigned int __atomic_exchange_seq_cst(unsigned int volatile*, unsigned int)
-{
-   // ...
-}
-
-void __atomic_store_seq_cst(unsigned int volatile*, unsigned int)
-{
-   // ...
-}
-</pre></blockquote>
+<a href="atomic_design_c.html">Minimal work for the front end</a>
 </li>
-</ul></blockquote>
-
-<p>
-That's it!  Compiler writers do the above and you've got a fully conforming
-(though sub-par performance) <tt>&lt;atomic&gt;</tt> header!
-</p>
-
-<h2>Recommended work for a higher performance <tt>&lt;atomic&gt;</tt></h2>
-
-<p>
-It would be good if the above intrinsics worked with all integral types plus
-<tt>void*</tt>.  Because this may not be possible to do in a lock-free manner for
-all integral types on all platforms, a compiler must communicate each type that
-an intrinsic works with.  For example if <tt>__atomic_exchange_seq_cst</tt> works
-for all types except for <tt>long long</tt> and <tt>unsigned long long</tt>
-then:
-</p>
-
-<blockquote><pre>
-__has_feature(__atomic_exchange_seq_cst_b) == 1  // bool
-__has_feature(__atomic_exchange_seq_cst_c) == 1  // char
-__has_feature(__atomic_exchange_seq_cst_a) == 1  // signed char
-__has_feature(__atomic_exchange_seq_cst_h) == 1  // unsigned char
-__has_feature(__atomic_exchange_seq_cst_Ds) == 1 // char16_t
-__has_feature(__atomic_exchange_seq_cst_Di) == 1 // char32_t
-__has_feature(__atomic_exchange_seq_cst_w) == 1  // wchar_t
-__has_feature(__atomic_exchange_seq_cst_s) == 1  // short
-__has_feature(__atomic_exchange_seq_cst_t) == 1  // unsigned short
-__has_feature(__atomic_exchange_seq_cst_i) == 1  // int
-__has_feature(__atomic_exchange_seq_cst_j) == 1  // unsigned int
-__has_feature(__atomic_exchange_seq_cst_l) == 1  // long
-__has_feature(__atomic_exchange_seq_cst_m) == 1  // unsigned long
-__has_feature(__atomic_exchange_seq_cst_Pv) == 1 // void*
-</pre></blockquote>
-
-<p>
-Note that only the <tt>__has_feature</tt> flag is decorated with the argument
-type.  The name of the compiler intrinsic is not decorated, but instead works
-like a C++ overloaded function.
-</p>
-
-<p>
-Additionally there are other intrinsics besides
-<tt>__atomic_exchange_seq_cst</tt> and <tt>__atomic_store_seq_cst</tt>.  They
-are optional.  But if the compiler can generate faster code than provided by the
-library, then clients will benefit from the compiler writer's expertise and
-knowledge of the targeted platform.
-</p>
-
-<p>
-Below is the complete list of <i>sequentially consistent</i> intrinsics, and
-their library implementations.  Template syntax is used to indicate the desired
-overloading for integral and void* types.  The template does not represent a
-requirement that the intrinsic operate on <em>any</em> type!
-</p>
-
-<blockquote><pre>
-T is one of:  bool, char, signed char, unsigned char, short, unsigned short,
-              int, unsigned int, long, unsigned long,
-              long long, unsigned long long, char16_t, char32_t, wchar_t, void*
-
-template &lt;class T&gt;
-T
-__atomic_load_seq_cst(T const volatile* obj)
-{
-    unique_lock&lt;mutex&gt; _(some_mutex);
-    return *obj;
-}
-
-template &lt;class T&gt;
-void
-__atomic_store_seq_cst(T volatile* obj, T desr)
-{
-    unique_lock&lt;mutex&gt; _(some_mutex);
-    *obj = desr;
-}
-
-template &lt;class T&gt;
-T
-__atomic_exchange_seq_cst(T volatile* obj, T desr)
-{
-    unique_lock&lt;mutex&gt; _(some_mutex);
-    T r = *obj;
-    *obj = desr;
-    return r;
-}
-
-template &lt;class T&gt;
-bool
-__atomic_compare_exchange_strong_seq_cst_seq_cst(T volatile* obj, T* exp, T desr)
-{
-    unique_lock&lt;mutex&gt; _(some_mutex);
-    if (std::memcmp(const_cast&lt;T*&gt;(obj), exp, sizeof(T)) == 0)
-    {
-        std::memcpy(const_cast&lt;T*&gt;(obj), &amp;desr, sizeof(T));
-        return true;
-    }
-    std::memcpy(exp, const_cast&lt;T*&gt;(obj), sizeof(T));
-    return false;
-}
-
-template &lt;class T&gt;
-bool
-__atomic_compare_exchange_weak_seq_cst_seq_cst(T volatile* obj, T* exp, T desr)
-{
-    unique_lock&lt;mutex&gt; _(some_mutex);
-    if (std::memcmp(const_cast&lt;T*&gt;(obj), exp, sizeof(T)) == 0)
-    {
-        std::memcpy(const_cast&lt;T*&gt;(obj), &amp;desr, sizeof(T));
-        return true;
-    }
-    std::memcpy(exp, const_cast&lt;T*&gt;(obj), sizeof(T));
-    return false;
-}
-
-T is one of:  char, signed char, unsigned char, short, unsigned short,
-              int, unsigned int, long, unsigned long,
-              long long, unsigned long long, char16_t, char32_t, wchar_t
-
-template &lt;class T&gt;
-T
-__atomic_fetch_add_seq_cst(T volatile* obj, T operand)
-{
-    unique_lock&lt;mutex&gt; _(some_mutex);
-    T r = *obj;
-    *obj += operand;
-    return r;
-}
-
-template &lt;class T&gt;
-T
-__atomic_fetch_sub_seq_cst(T volatile* obj, T operand)
-{
-    unique_lock&lt;mutex&gt; _(some_mutex);
-    T r = *obj;
-    *obj -= operand;
-    return r;
-}
-
-template &lt;class T&gt;
-T
-__atomic_fetch_and_seq_cst(T volatile* obj, T operand)
-{
-    unique_lock&lt;mutex&gt; _(some_mutex);
-    T r = *obj;
-    *obj &amp;= operand;
-    return r;
-}
-
-template &lt;class T&gt;
-T
-__atomic_fetch_or_seq_cst(T volatile* obj, T operand)
-{
-    unique_lock&lt;mutex&gt; _(some_mutex);
-    T r = *obj;
-    *obj |= operand;
-    return r;
-}
-
-template &lt;class T&gt;
-T
-__atomic_fetch_xor_seq_cst(T volatile* obj, T operand)
-{
-    unique_lock&lt;mutex&gt; _(some_mutex);
-    T r = *obj;
-    *obj ^= operand;
-    return r;
-}
-
-void*
-__atomic_fetch_add_seq_cst(void* volatile* obj, ptrdiff_t operand)
-{
-    unique_lock&lt;mutex&gt; _(some_mutex);
-    void* r = *obj;
-    (char*&amp;)(*obj) += operand;
-    return r;
-}
-
-void*
-__atomic_fetch_sub_seq_cst(void* volatile* obj, ptrdiff_t operand)
-{
-    unique_lock&lt;mutex&gt; _(some_mutex);
-    void* r = *obj;
-    (char*&amp;)(*obj) -= operand;
-    return r;
-}
-
-void __atomic_thread_fence_seq_cst()
-{
-    unique_lock&lt;mutex&gt; _(some_mutex);
-}
-
-void __atomic_signal_fence_seq_cst()
-{
-    unique_lock&lt;mutex&gt; _(some_mutex);
-}
-</pre></blockquote>
-
-<p>
-One should consult the (currently draft)
-<a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3126.pdf">C++ standard</a>
-for the details of the definitions for these operations.  For example
-<tt>__atomic_compare_exchange_weak_seq_cst_seq_cst</tt> is allowed to fail
-spuriously while <tt>__atomic_compare_exchange_strong_seq_cst_seq_cst</tt> is
-not.
-</p>
-
-<p>
-If on your platform the lock-free definition of
-<tt>__atomic_compare_exchange_weak_seq_cst_seq_cst</tt> would be the same as
-<tt>__atomic_compare_exchange_strong_seq_cst_seq_cst</tt>, you may omit the
-<tt>__atomic_compare_exchange_weak_seq_cst_seq_cst</tt> intrinsic without a
-performance cost.  The library will prefer your implementation of
-<tt>__atomic_compare_exchange_strong_seq_cst_seq_cst</tt> over its own
-definition for implementing
-<tt>__atomic_compare_exchange_weak_seq_cst_seq_cst</tt>.  That is, the library
-will arrange for <tt>__atomic_compare_exchange_weak_seq_cst_seq_cst</tt> to call
-<tt>__atomic_compare_exchange_strong_seq_cst_seq_cst</tt> if you supply an
-intrinsic for the strong version but not the weak.
-</p>
-
-<h2>Taking advantage of weaker memory synchronization</h2>
-
-<p>
-So far all of the intrinsics presented require a <em>sequentially
-consistent</em> memory ordering.  That is, no loads or stores can move across
-the operation (just as if the library had locked that internal mutex).  But
-<tt>&lt;atomic&gt;</tt> supports weaker memory ordering operations.  In all,
-there are six memory orderings (listed here from strongest to weakest):
-</p>
-
-<blockquote><pre>
-memory_order_seq_cst
-memory_order_acq_rel
-memory_order_release
-memory_order_acquire
-memory_order_consume
-memory_order_relaxed
-</pre></blockquote>
-
-<p>
-(See the
-<a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3126.pdf">C++ standard</a>
-for the detailed definitions of each of these orderings).
-</p>
-
-<p>
-On some platforms, the compiler vendor can offer some or even all of the above
-intrinsics at one or more weaker levels of memory synchronization.  This might
-lead for example to not issuing an <tt>mfense</tt> instruction on the x86.
-</p>
-
-<p>
-If the compiler does not offer any given operation, at any given memory ordering
-level, the library will automatically attempt to call the next highest memory
-ordering operation.  This continues up to <tt>seq_cst</tt>, and if that doesn't
-exist, then the library takes over and does the job with a <tt>mutex</tt>.  This
-is a compile-time search &amp; selection operation.  At run time, the
-application will only see the few inlined assembly instructions for the selected
-intrinsic.
-</p>
-
-<p>
-Each intrinsic is appended with the 7-letter name of the memory ordering it
-addresses.  For example a <tt>load</tt> with <tt>relaxed</tt> ordering is
-defined by:
-</p>
-
-<blockquote><pre>
-T __atomic_load_relaxed(const volatile T* obj);
-</pre></blockquote>
-
-<p>
-And announced with:
-</p>
-
-<blockquote><pre>
-__has_feature(__atomic_load_relaxed_b) == 1  // bool
-__has_feature(__atomic_load_relaxed_c) == 1  // char
-__has_feature(__atomic_load_relaxed_a) == 1  // signed char
-...
-</pre></blockquote>
-
-<p>
-The <tt>__atomic_compare_exchange_strong(weak)</tt> intrinsics are parameterized
-on two memory orderings.  The first ordering applies when the operation returns
-<tt>true</tt> and the second ordering applies when the operation returns
-<tt>false</tt>.
-</p>
-
-<p>
-Not every memory ordering is appropriate for every operation.  <tt>exchange</tt>
-and the <tt>fetch_<i>op</i></tt> operations support all 6.  But <tt>load</tt>
-only supports <tt>relaxed</tt>, <tt>consume</tt>, <tt>acquire</tt> and <tt>seq_cst</tt>.
-<tt>store</tt>
-only supports <tt>relaxed</tt>, <tt>release</tt>, and <tt>seq_cst</tt>.  The
-<tt>compare_exchange</tt> operations support the following 16 combinations out
-of the possible 36:
-</p>
-
-<blockquote><pre>
-relaxed_relaxed
-consume_relaxed
-consume_consume
-acquire_relaxed
-acquire_consume
-acquire_acquire
-release_relaxed
-release_consume
-release_acquire
-acq_rel_relaxed
-acq_rel_consume
-acq_rel_acquire
-seq_cst_relaxed
-seq_cst_consume
-seq_cst_acquire
-seq_cst_seq_cst
-</pre></blockquote>
-
-<p>
-Again, the compiler supplies intrinsics only for the strongest orderings where
-it can make a difference.  The library takes care of calling the weakest
-supplied intrinsic that is as strong or stronger than the customer asked for.
-</p>
+</ol>

 </div>
 </body>
--- a/www/atomic_design_a.html
+++ b/www/atomic_design_a.html
@ -0,0 +1,126 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
+          "http://www.w3.org/TR/html4/strict.dtd">
+<!-- Material used from: HTML 4.01 specs: http://www.w3.org/TR/html401/ -->
+<html>
+<head>
+  <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
+  <title>&lt;atomic&gt; design</title>
+  <link type="text/css" rel="stylesheet" href="menu.css">
+  <link type="text/css" rel="stylesheet" href="content.css">
+</head>
+
+<body>
+<div id="menu">
+  <div>
+    <a href="http://llvm.org/">LLVM Home</a>
+  </div>
+
+  <div class="submenu">
+    <label>libc++ Info</label>
+    <a href="/index.html">About</a>
+  </div>
+
+  <div class="submenu">
+    <label>Quick Links</label>
+    <a href="http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev">cfe-dev</a>
+    <a href="http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits">cfe-commits</a>
+    <a href="http://llvm.org/bugs/">Bug Reports</a>
+    <a href="http://llvm.org/svn/llvm-project/libcxx/trunk/">Browse SVN</a>
+    <a href="http://llvm.org/viewvc/llvm-project/libcxx/trunk/">Browse ViewVC</a>
+  </div>
+</div>
+
+<div id="content">
+  <!--*********************************************************************-->
+  <h1>&lt;atomic&gt; design</h1>
+  <!--*********************************************************************-->
+
+<p>
+This is more of a synopsis than a true description.  The compiler supplies all
+of the intrinsics as described below.  This list of intrinsics roughly parallels
+the requirements of the C and C++ atomics proposals.  The C and C++ library
+imlpementations simply drop through to these intrinsics.  Anything the platform
+does not support in hardware, the compiler arranges for a (compiler-rt) library
+call to be made which will do the job with a mutex, and in this case ignoring
+the memory ordering parameter.
+</p>
+
+<blockquote><pre>
+<font color="#C80000">// type can be any pod</font>
+<font color="#C80000">// Behavior is defined for mem_ord = 0, 1, 2, 5</font>
+type __atomic_load(const volatile type* atomic_obj, int mem_ord);
+
+<font color="#C80000">// type can be any pod</font>
+<font color="#C80000">// Behavior is defined for mem_ord = 0, 3, 5</font>
+type __atomic_store(volatile type* atomic_obj, type desired, int mem_ord);
+
+<font color="#C80000">// type can be any pod</font>
+<font color="#C80000">// Behavior is defined for mem_ord = [0 ... 5]</font>
+type __atomic_exchange(volatile type* atomic_obj, type desired, int mem_ord);
+
+<font color="#C80000">// type can be any pod</font>
+<font color="#C80000">// Behavior is defined for mem_success = [0 ... 5],</font>
+<font color="#C80000">//   mem_falure &lt;= mem_success &amp;&amp; mem_failure != [3, 4]</font>
+bool __atomic_compare_exchange_strong(volatile type* atomic_obj,
+                                      type* expected, type desired,
+                                      int mem_success, int mem_failure);
+
+<font color="#C80000">// type can be any pod</font>
+<font color="#C80000">// Behavior is defined for mem_success = [0 ... 5],</font>
+<font color="#C80000">//   mem_falure &lt;= mem_success &amp;&amp; mem_failure != [3, 4]</font>
+bool __atomic_compare_exchange_weak(volatile type* atomic_obj,
+                                    type* expected, type desired,
+                                    int mem_success, int mem_failure);
+
+<font color="#C80000">// type is one of: char, signed char, unsigned char, short, unsigned short, int,</font>
+<font color="#C80000">//      unsigned int, long, unsigned long, long long, unsigned long long,</font>
+<font color="#C80000">//      char16_t, char32_t, wchar_t</font>
+<font color="#C80000">// Behavior is defined for mem_ord = [0 ... 5]</font>
+type __atomic_fetch_add(volatile type* atomic_obj, type operand, int mem_ord);
+
+<font color="#C80000">// type is one of: char, signed char, unsigned char, short, unsigned short, int,</font>
+<font color="#C80000">//      unsigned int, long, unsigned long, long long, unsigned long long,</font>
+<font color="#C80000">//      char16_t, char32_t, wchar_t</font>
+<font color="#C80000">// Behavior is defined for mem_ord = [0 ... 5]</font>
+type __atomic_fetch_sub(volatile type* atomic_obj, type operand, int mem_ord);
+
+<font color="#C80000">// type is one of: char, signed char, unsigned char, short, unsigned short, int,</font>
+<font color="#C80000">//      unsigned int, long, unsigned long, long long, unsigned long long,</font>
+<font color="#C80000">//      char16_t, char32_t, wchar_t</font>
+<font color="#C80000">// Behavior is defined for mem_ord = [0 ... 5]</font>
+type __atomic_fetch_and(volatile type* atomic_obj, type operand, int mem_ord);
+
+<font color="#C80000">// type is one of: char, signed char, unsigned char, short, unsigned short, int,</font>
+<font color="#C80000">//      unsigned int, long, unsigned long, long long, unsigned long long,</font>
+<font color="#C80000">//      char16_t, char32_t, wchar_t</font>
+<font color="#C80000">// Behavior is defined for mem_ord = [0 ... 5]</font>
+type __atomic_fetch_or(volatile type* atomic_obj, type operand, int mem_ord);
+
+<font color="#C80000">// type is one of: char, signed char, unsigned char, short, unsigned short, int,</font>
+<font color="#C80000">//      unsigned int, long, unsigned long, long long, unsigned long long,</font>
+<font color="#C80000">//      char16_t, char32_t, wchar_t</font>
+<font color="#C80000">// Behavior is defined for mem_ord = [0 ... 5]</font>
+type __atomic_fetch_xor(volatile type* atomic_obj, type operand, int mem_ord);
+
+<font color="#C80000">// Behavior is defined for mem_ord = [0 ... 5]</font>
+void* __atomic_fetch_add(void* volatile* atomic_obj, ptrdiff_t operand, int mem_ord);
+void* __atomic_fetch_sub(void* volatile* atomic_obj, ptrdiff_t operand, int mem_ord);
+
+<font color="#C80000">// Behavior is defined for mem_ord = [0 ... 5]</font>
+void __atomic_thread_fence(int mem_ord);
+void __atomic_signal_fence(int mem_ord);
+</pre></blockquote>
+
+<p>
+If desired the intrinsics taking a single <tt>mem_ord</tt> parameter can default
+this argument to 5.
+</p>
+
+<p>
+If desired the intrinsics taking two ordering parameters can default
+<tt>mem_success</tt> to 5, and <tt>mem_failure</tt> to <tt>mem_success</tt>.
+</p>
+
+</div>
+</body>
+</html>
--- a/www/atomic_design_b.html
+++ b/www/atomic_design_b.html
@ -0,0 +1,247 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
+          "http://www.w3.org/TR/html4/strict.dtd">
+<!-- Material used from: HTML 4.01 specs: http://www.w3.org/TR/html401/ -->
+<html>
+<head>
+  <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
+  <title>&lt;atomic&gt; design</title>
+  <link type="text/css" rel="stylesheet" href="menu.css">
+  <link type="text/css" rel="stylesheet" href="content.css">
+</head>
+
+<body>
+<div id="menu">
+  <div>
+    <a href="http://llvm.org/">LLVM Home</a>
+  </div>
+
+  <div class="submenu">
+    <label>libc++ Info</label>
+    <a href="/index.html">About</a>
+  </div>
+
+  <div class="submenu">
+    <label>Quick Links</label>
+    <a href="http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev">cfe-dev</a>
+    <a href="http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits">cfe-commits</a>
+    <a href="http://llvm.org/bugs/">Bug Reports</a>
+    <a href="http://llvm.org/svn/llvm-project/libcxx/trunk/">Browse SVN</a>
+    <a href="http://llvm.org/viewvc/llvm-project/libcxx/trunk/">Browse ViewVC</a>
+  </div>
+</div>
+
+<div id="content">
+  <!--*********************************************************************-->
+  <h1>&lt;atomic&gt; design</h1>
+  <!--*********************************************************************-->
+
+<p>
+This is a variation of design A which puts the burden on the library to arrange
+for the correct manipulation of the run time memory ordering arguments, and only
+calls the compiler for well-defined memory orderings.  I think of this design as
+the worst of A and C, instead of the best of A and C.  But I offer it as an
+option in the spirit of completeness.
+</p>
+
+<blockquote><pre>
+<font color="#C80000">// type can be any pod</font>
+type __atomic_load_relaxed(const volatile type* atomic_obj);
+type __atomic_load_consume(const volatile type* atomic_obj);
+type __atomic_load_acquire(const volatile type* atomic_obj);
+type __atomic_load_seq_cst(const volatile type* atomic_obj);
+
+<font color="#C80000">// type can be any pod</font>
+type __atomic_store_relaxed(volatile type* atomic_obj, type desired);
+type __atomic_store_release(volatile type* atomic_obj, type desired);
+type __atomic_store_seq_cst(volatile type* atomic_obj, type desired);
+
+<font color="#C80000">// type can be any pod</font>
+type __atomic_exchange_relaxed(volatile type* atomic_obj, type desired);
+type __atomic_exchange_consume(volatile type* atomic_obj, type desired);
+type __atomic_exchange_acquire(volatile type* atomic_obj, type desired);
+type __atomic_exchange_release(volatile type* atomic_obj, type desired);
+type __atomic_exchange_acq_rel(volatile type* atomic_obj, type desired);
+type __atomic_exchange_seq_cst(volatile type* atomic_obj, type desired);
+
+<font color="#C80000">// type can be any pod</font>
+bool __atomic_compare_exchange_strong_relaxed_relaxed(volatile type* atomic_obj,
+                                                      type* expected,
+                                                      type desired);
+bool __atomic_compare_exchange_strong_consume_relaxed(volatile type* atomic_obj,
+                                                      type* expected,
+                                                      type desired);
+bool __atomic_compare_exchange_strong_consume_consume(volatile type* atomic_obj,
+                                                      type* expected,
+                                                      type desired);
+bool __atomic_compare_exchange_strong_acquire_relaxed(volatile type* atomic_obj,
+                                                      type* expected,
+                                                      type desired);
+bool __atomic_compare_exchange_strong_acquire_consume(volatile type* atomic_obj,
+                                                      type* expected,
+                                                      type desired);
+bool __atomic_compare_exchange_strong_acquire_acquire(volatile type* atomic_obj,
+                                                      type* expected,
+                                                      type desired);
+bool __atomic_compare_exchange_strong_release_relaxed(volatile type* atomic_obj,
+                                                      type* expected,
+                                                      type desired);
+bool __atomic_compare_exchange_strong_release_consume(volatile type* atomic_obj,
+                                                      type* expected,
+                                                      type desired);
+bool __atomic_compare_exchange_strong_release_acquire(volatile type* atomic_obj,
+                                                      type* expected,
+                                                      type desired);
+bool __atomic_compare_exchange_strong_acq_rel_relaxed(volatile type* atomic_obj,
+                                                      type* expected,
+                                                      type desired);
+bool __atomic_compare_exchange_strong_acq_rel_consume(volatile type* atomic_obj,
+                                                      type* expected,
+                                                      type desired);
+bool __atomic_compare_exchange_strong_acq_rel_acquire(volatile type* atomic_obj,
+                                                      type* expected,
+                                                      type desired);
+bool __atomic_compare_exchange_strong_seq_cst_relaxed(volatile type* atomic_obj,
+                                                      type* expected,
+                                                      type desired);
+bool __atomic_compare_exchange_strong_seq_cst_consume(volatile type* atomic_obj,
+                                                      type* expected,
+                                                      type desired);
+bool __atomic_compare_exchange_strong_seq_cst_acquire(volatile type* atomic_obj,
+                                                      type* expected,
+                                                      type desired);
+bool __atomic_compare_exchange_strong_seq_cst_seq_cst(volatile type* atomic_obj,
+                                                      type* expected,
+                                                      type desired);
+
+<font color="#C80000">// type can be any pod</font>
+bool __atomic_compare_exchange_weak_relaxed_relaxed(volatile type* atomic_obj,
+                                                    type* expected,
+                                                    type desired);
+bool __atomic_compare_exchange_weak_consume_relaxed(volatile type* atomic_obj,
+                                                    type* expected,
+                                                    type desired);
+bool __atomic_compare_exchange_weak_consume_consume(volatile type* atomic_obj,
+                                                    type* expected,
+                                                    type desired);
+bool __atomic_compare_exchange_weak_acquire_relaxed(volatile type* atomic_obj,
+                                                    type* expected,
+                                                    type desired);
+bool __atomic_compare_exchange_weak_acquire_consume(volatile type* atomic_obj,
+                                                    type* expected,
+                                                    type desired);
+bool __atomic_compare_exchange_weak_acquire_acquire(volatile type* atomic_obj,
+                                                    type* expected,
+                                                    type desired);
+bool __atomic_compare_exchange_weak_release_relaxed(volatile type* atomic_obj,
+                                                    type* expected,
+                                                    type desired);
+bool __atomic_compare_exchange_weak_release_consume(volatile type* atomic_obj,
+                                                    type* expected,
+                                                    type desired);
+bool __atomic_compare_exchange_weak_release_acquire(volatile type* atomic_obj,
+                                                    type* expected,
+                                                    type desired);
+bool __atomic_compare_exchange_weak_acq_rel_relaxed(volatile type* atomic_obj,
+                                                    type* expected,
+                                                    type desired);
+bool __atomic_compare_exchange_weak_acq_rel_consume(volatile type* atomic_obj,
+                                                    type* expected,
+                                                    type desired);
+bool __atomic_compare_exchange_weak_acq_rel_acquire(volatile type* atomic_obj,
+                                                    type* expected,
+                                                    type desired);
+bool __atomic_compare_exchange_weak_seq_cst_relaxed(volatile type* atomic_obj,
+                                                    type* expected,
+                                                    type desired);
+bool __atomic_compare_exchange_weak_seq_cst_consume(volatile type* atomic_obj,
+                                                    type* expected,
+                                                    type desired);
+bool __atomic_compare_exchange_weak_seq_cst_acquire(volatile type* atomic_obj,
+                                                    type* expected,
+                                                    type desired);
+bool __atomic_compare_exchange_weak_seq_cst_seq_cst(volatile type* atomic_obj,
+                                                    type* expected,
+                                                    type desired);
+
+<font color="#C80000">// type is one of: char, signed char, unsigned char, short, unsigned short, int,</font>
+<font color="#C80000">//      unsigned int, long, unsigned long, long long, unsigned long long,</font>
+<font color="#C80000">//      char16_t, char32_t, wchar_t</font>
+type __atomic_fetch_add_relaxed(volatile type* atomic_obj, type operand);
+type __atomic_fetch_add_consume(volatile type* atomic_obj, type operand);
+type __atomic_fetch_add_acquire(volatile type* atomic_obj, type operand);
+type __atomic_fetch_add_release(volatile type* atomic_obj, type operand);
+type __atomic_fetch_add_acq_rel(volatile type* atomic_obj, type operand);
+type __atomic_fetch_add_seq_cst(volatile type* atomic_obj, type operand);
+
+<font color="#C80000">// type is one of: char, signed char, unsigned char, short, unsigned short, int,</font>
+<font color="#C80000">//      unsigned int, long, unsigned long, long long, unsigned long long,</font>
+<font color="#C80000">//      char16_t, char32_t, wchar_t</font>
+type __atomic_fetch_sub_relaxed(volatile type* atomic_obj, type operand);
+type __atomic_fetch_sub_consume(volatile type* atomic_obj, type operand);
+type __atomic_fetch_sub_acquire(volatile type* atomic_obj, type operand);
+type __atomic_fetch_sub_release(volatile type* atomic_obj, type operand);
+type __atomic_fetch_sub_acq_rel(volatile type* atomic_obj, type operand);
+type __atomic_fetch_sub_seq_cst(volatile type* atomic_obj, type operand);
+
+<font color="#C80000">// type is one of: char, signed char, unsigned char, short, unsigned short, int,</font>
+<font color="#C80000">//      unsigned int, long, unsigned long, long long, unsigned long long,</font>
+<font color="#C80000">//      char16_t, char32_t, wchar_t</font>
+type __atomic_fetch_and_relaxed(volatile type* atomic_obj, type operand);
+type __atomic_fetch_and_consume(volatile type* atomic_obj, type operand);
+type __atomic_fetch_and_acquire(volatile type* atomic_obj, type operand);
+type __atomic_fetch_and_release(volatile type* atomic_obj, type operand);
+type __atomic_fetch_and_acq_rel(volatile type* atomic_obj, type operand);
+type __atomic_fetch_and_seq_cst(volatile type* atomic_obj, type operand);
+
+<font color="#C80000">// type is one of: char, signed char, unsigned char, short, unsigned short, int,</font>
+<font color="#C80000">//      unsigned int, long, unsigned long, long long, unsigned long long,</font>
+<font color="#C80000">//      char16_t, char32_t, wchar_t</font>
+type __atomic_fetch_or_relaxed(volatile type* atomic_obj, type operand);
+type __atomic_fetch_or_consume(volatile type* atomic_obj, type operand);
+type __atomic_fetch_or_acquire(volatile type* atomic_obj, type operand);
+type __atomic_fetch_or_release(volatile type* atomic_obj, type operand);
+type __atomic_fetch_or_acq_rel(volatile type* atomic_obj, type operand);
+type __atomic_fetch_or_seq_cst(volatile type* atomic_obj, type operand);
+
+<font color="#C80000">// type is one of: char, signed char, unsigned char, short, unsigned short, int,</font>
+<font color="#C80000">//      unsigned int, long, unsigned long, long long, unsigned long long,</font>
+<font color="#C80000">//      char16_t, char32_t, wchar_t</font>
+type __atomic_fetch_xor_relaxed(volatile type* atomic_obj, type operand);
+type __atomic_fetch_xor_consume(volatile type* atomic_obj, type operand);
+type __atomic_fetch_xor_acquire(volatile type* atomic_obj, type operand);
+type __atomic_fetch_xor_release(volatile type* atomic_obj, type operand);
+type __atomic_fetch_xor_acq_rel(volatile type* atomic_obj, type operand);
+type __atomic_fetch_xor_seq_cst(volatile type* atomic_obj, type operand);
+
+void* __atomic_fetch_add_relaxed(void* volatile* atomic_obj, ptrdiff_t operand);
+void* __atomic_fetch_add_consume(void* volatile* atomic_obj, ptrdiff_t operand);
+void* __atomic_fetch_add_acquire(void* volatile* atomic_obj, ptrdiff_t operand);
+void* __atomic_fetch_add_release(void* volatile* atomic_obj, ptrdiff_t operand);
+void* __atomic_fetch_add_acq_rel(void* volatile* atomic_obj, ptrdiff_t operand);
+void* __atomic_fetch_add_seq_cst(void* volatile* atomic_obj, ptrdiff_t operand);
+
+void* __atomic_fetch_sub_relaxed(void* volatile* atomic_obj, ptrdiff_t operand);
+void* __atomic_fetch_sub_consume(void* volatile* atomic_obj, ptrdiff_t operand);
+void* __atomic_fetch_sub_acquire(void* volatile* atomic_obj, ptrdiff_t operand);
+void* __atomic_fetch_sub_release(void* volatile* atomic_obj, ptrdiff_t operand);
+void* __atomic_fetch_sub_acq_rel(void* volatile* atomic_obj, ptrdiff_t operand);
+void* __atomic_fetch_sub_seq_cst(void* volatile* atomic_obj, ptrdiff_t operand);
+
+void __atomic_thread_fence_relaxed();
+void __atomic_thread_fence_consume();
+void __atomic_thread_fence_acquire();
+void __atomic_thread_fence_release();
+void __atomic_thread_fence_acq_rel();
+void __atomic_thread_fence_seq_cst();
+
+void __atomic_signal_fence_relaxed();
+void __atomic_signal_fence_consume();
+void __atomic_signal_fence_acquire();
+void __atomic_signal_fence_release();
+void __atomic_signal_fence_acq_rel();
+void __atomic_signal_fence_seq_cst();
+</pre></blockquote>
+
+</div>
+</body>
+</html>
--- a/www/atomic_design_c.html
+++ b/www/atomic_design_c.html
@ -0,0 +1,458 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
+          "http://www.w3.org/TR/html4/strict.dtd">
+<!-- Material used from: HTML 4.01 specs: http://www.w3.org/TR/html401/ -->
+<html>
+<head>
+  <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
+  <title>&lt;atomic&gt; design</title>
+  <link type="text/css" rel="stylesheet" href="menu.css">
+  <link type="text/css" rel="stylesheet" href="content.css">
+</head>
+
+<body>
+<div id="menu">
+  <div>
+    <a href="http://llvm.org/">LLVM Home</a>
+  </div>
+
+  <div class="submenu">
+    <label>libc++ Info</label>
+    <a href="/index.html">About</a>
+  </div>
+
+  <div class="submenu">
+    <label>Quick Links</label>
+    <a href="http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev">cfe-dev</a>
+    <a href="http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits">cfe-commits</a>
+    <a href="http://llvm.org/bugs/">Bug Reports</a>
+    <a href="http://llvm.org/svn/llvm-project/libcxx/trunk/">Browse SVN</a>
+    <a href="http://llvm.org/viewvc/llvm-project/libcxx/trunk/">Browse ViewVC</a>
+  </div>
+</div>
+
+<div id="content">
+  <!--*********************************************************************-->
+  <h1>&lt;atomic&gt; design</h1>
+  <!--*********************************************************************-->
+
+<p>
+The <tt>&lt;atomic&gt;</tt> header is one of the most closely coupled headers to
+the compiler.  Ideally when you invoke any function from
+<tt>&lt;atomic&gt;</tt>, it should result in highly optimized assembly being
+inserted directly into your application ...  assembly that is not otherwise
+representable by higher level C or C++ expressions.  The design of the libc++
+<tt>&lt;atomic&gt;</tt> header started with this goal in mind.  A secondary, but
+still very important goal is that the compiler should have to do minimal work to
+faciliate the implementaiton of <tt>&lt;atomic&gt;</tt>.  Without this second
+goal, then practically speaking, the libc++ <tt>&lt;atomic&gt;</tt> header would
+be doomed to be a barely supported, second class citizen on almost every
+platform.
+</p>
+
+<p>Goals:</p>
+
+<blockquote><ul>
+<li>Optimal code generation for atomic operations</li>
+<li>Minimal effort for the compiler to achieve goal 1 on any given platform</li>
+<li>Conformance to the C++0X draft standard</li>
+</ul></blockquote>
+
+<p>
+The purpose of this document is to inform compiler writers what they need to do
+to enable a high performance libc++ <tt>&lt;atomic&gt;</tt> with minimal effort.
+</p>
+
+<h2>The minimal work that must be done for a conforming <tt>&lt;atomic&gt;</tt></h2>
+
+<p>
+The only "atomic" operations that must actually be lock free in
+<tt>&lt;atomic&gt;</tt> are represented by the following compiler intrinsics:
+</p>
+
+<blockquote><pre>
+__atomic_flag__
+__atomic_exchange_seq_cst(__atomic_flag__ volatile* obj, __atomic_flag__ desr)
+{
+    unique_lock&lt;mutex&gt; _(some_mutex);
+    __atomic_flag__ result = *obj;
+    *obj = desr;
+    return result;
+}
+
+void
+__atomic_store_seq_cst(__atomic_flag__ volatile* obj, __atomic_flag__ desr)
+{
+    unique_lock&lt;mutex&gt; _(some_mutex);
+    *obj = desr;
+}
+</pre></blockquote>
+
+<p>
+Where:
+</p>
+
+<blockquote><ul>
+<li>
+If <tt>__has_feature(__atomic_flag)</tt> evaluates to 1 in the preprocessor then
+the compiler must define <tt>__atomic_flag__</tt> (e.g. as a typedef to
+<tt>int</tt>).
+</li>
+<li>
+If <tt>__has_feature(__atomic_flag)</tt> evaluates to 0 in the preprocessor then
+the library defines <tt>__atomic_flag__</tt> as a typedef to <tt>bool</tt>.
+</li>
+<li>
+<p>
+To communicate that the above intrinsics are available, the compiler must
+arrange for <tt>__has_feature</tt> to return 1 when fed the intrinsic name
+appended with an '_' and the mangled type name of <tt>__atomic_flag__</tt>.
+</p>
+<p>
+For example if <tt>__atomic_flag__</tt> is <tt>unsigned int</tt>:
+</p>
+<blockquote><pre>
+__has_feature(__atomic_flag) == 1
+__has_feature(__atomic_exchange_seq_cst_j) == 1
+__has_feature(__atomic_store_seq_cst_j) == 1
+
+typedef unsigned int __atomic_flag__; 
+
+unsigned int __atomic_exchange_seq_cst(unsigned int volatile*, unsigned int)
+{
+   // ...
+}
+
+void __atomic_store_seq_cst(unsigned int volatile*, unsigned int)
+{
+   // ...
+}
+</pre></blockquote>
+</li>
+</ul></blockquote>
+
+<p>
+That's it!  Compiler writers do the above and you've got a fully conforming
+(though sub-par performance) <tt>&lt;atomic&gt;</tt> header!
+</p>
+
+<h2>Recommended work for a higher performance <tt>&lt;atomic&gt;</tt></h2>
+
+<p>
+It would be good if the above intrinsics worked with all integral types plus
+<tt>void*</tt>.  Because this may not be possible to do in a lock-free manner for
+all integral types on all platforms, a compiler must communicate each type that
+an intrinsic works with.  For example if <tt>__atomic_exchange_seq_cst</tt> works
+for all types except for <tt>long long</tt> and <tt>unsigned long long</tt>
+then:
+</p>
+
+<blockquote><pre>
+__has_feature(__atomic_exchange_seq_cst_b) == 1  // bool
+__has_feature(__atomic_exchange_seq_cst_c) == 1  // char
+__has_feature(__atomic_exchange_seq_cst_a) == 1  // signed char
+__has_feature(__atomic_exchange_seq_cst_h) == 1  // unsigned char
+__has_feature(__atomic_exchange_seq_cst_Ds) == 1 // char16_t
+__has_feature(__atomic_exchange_seq_cst_Di) == 1 // char32_t
+__has_feature(__atomic_exchange_seq_cst_w) == 1  // wchar_t
+__has_feature(__atomic_exchange_seq_cst_s) == 1  // short
+__has_feature(__atomic_exchange_seq_cst_t) == 1  // unsigned short
+__has_feature(__atomic_exchange_seq_cst_i) == 1  // int
+__has_feature(__atomic_exchange_seq_cst_j) == 1  // unsigned int
+__has_feature(__atomic_exchange_seq_cst_l) == 1  // long
+__has_feature(__atomic_exchange_seq_cst_m) == 1  // unsigned long
+__has_feature(__atomic_exchange_seq_cst_Pv) == 1 // void*
+</pre></blockquote>
+
+<p>
+Note that only the <tt>__has_feature</tt> flag is decorated with the argument
+type.  The name of the compiler intrinsic is not decorated, but instead works
+like a C++ overloaded function.
+</p>
+
+<p>
+Additionally there are other intrinsics besides
+<tt>__atomic_exchange_seq_cst</tt> and <tt>__atomic_store_seq_cst</tt>.  They
+are optional.  But if the compiler can generate faster code than provided by the
+library, then clients will benefit from the compiler writer's expertise and
+knowledge of the targeted platform.
+</p>
+
+<p>
+Below is the complete list of <i>sequentially consistent</i> intrinsics, and
+their library implementations.  Template syntax is used to indicate the desired
+overloading for integral and void* types.  The template does not represent a
+requirement that the intrinsic operate on <em>any</em> type!
+</p>
+
+<blockquote><pre>
+T is one of:  bool, char, signed char, unsigned char, short, unsigned short,
+              int, unsigned int, long, unsigned long,
+              long long, unsigned long long, char16_t, char32_t, wchar_t, void*
+
+template &lt;class T&gt;
+T
+__atomic_load_seq_cst(T const volatile* obj)
+{
+    unique_lock&lt;mutex&gt; _(some_mutex);
+    return *obj;
+}
+
+template &lt;class T&gt;
+void
+__atomic_store_seq_cst(T volatile* obj, T desr)
+{
+    unique_lock&lt;mutex&gt; _(some_mutex);
+    *obj = desr;
+}
+
+template &lt;class T&gt;
+T
+__atomic_exchange_seq_cst(T volatile* obj, T desr)
+{
+    unique_lock&lt;mutex&gt; _(some_mutex);
+    T r = *obj;
+    *obj = desr;
+    return r;
+}
+
+template &lt;class T&gt;
+bool
+__atomic_compare_exchange_strong_seq_cst_seq_cst(T volatile* obj, T* exp, T desr)
+{
+    unique_lock&lt;mutex&gt; _(some_mutex);
+    if (std::memcmp(const_cast&lt;T*&gt;(obj), exp, sizeof(T)) == 0)
+    {
+        std::memcpy(const_cast&lt;T*&gt;(obj), &amp;desr, sizeof(T));
+        return true;
+    }
+    std::memcpy(exp, const_cast&lt;T*&gt;(obj), sizeof(T));
+    return false;
+}
+
+template &lt;class T&gt;
+bool
+__atomic_compare_exchange_weak_seq_cst_seq_cst(T volatile* obj, T* exp, T desr)
+{
+    unique_lock&lt;mutex&gt; _(some_mutex);
+    if (std::memcmp(const_cast&lt;T*&gt;(obj), exp, sizeof(T)) == 0)
+    {
+        std::memcpy(const_cast&lt;T*&gt;(obj), &amp;desr, sizeof(T));
+        return true;
+    }
+    std::memcpy(exp, const_cast&lt;T*&gt;(obj), sizeof(T));
+    return false;
+}
+
+T is one of:  char, signed char, unsigned char, short, unsigned short,
+              int, unsigned int, long, unsigned long,
+              long long, unsigned long long, char16_t, char32_t, wchar_t
+
+template &lt;class T&gt;
+T
+__atomic_fetch_add_seq_cst(T volatile* obj, T operand)
+{
+    unique_lock&lt;mutex&gt; _(some_mutex);
+    T r = *obj;
+    *obj += operand;
+    return r;
+}
+
+template &lt;class T&gt;
+T
+__atomic_fetch_sub_seq_cst(T volatile* obj, T operand)
+{
+    unique_lock&lt;mutex&gt; _(some_mutex);
+    T r = *obj;
+    *obj -= operand;
+    return r;
+}
+
+template &lt;class T&gt;
+T
+__atomic_fetch_and_seq_cst(T volatile* obj, T operand)
+{
+    unique_lock&lt;mutex&gt; _(some_mutex);
+    T r = *obj;
+    *obj &amp;= operand;
+    return r;
+}
+
+template &lt;class T&gt;
+T
+__atomic_fetch_or_seq_cst(T volatile* obj, T operand)
+{
+    unique_lock&lt;mutex&gt; _(some_mutex);
+    T r = *obj;
+    *obj |= operand;
+    return r;
+}
+
+template &lt;class T&gt;
+T
+__atomic_fetch_xor_seq_cst(T volatile* obj, T operand)
+{
+    unique_lock&lt;mutex&gt; _(some_mutex);
+    T r = *obj;
+    *obj ^= operand;
+    return r;
+}
+
+void*
+__atomic_fetch_add_seq_cst(void* volatile* obj, ptrdiff_t operand)
+{
+    unique_lock&lt;mutex&gt; _(some_mutex);
+    void* r = *obj;
+    (char*&amp;)(*obj) += operand;
+    return r;
+}
+
+void*
+__atomic_fetch_sub_seq_cst(void* volatile* obj, ptrdiff_t operand)
+{
+    unique_lock&lt;mutex&gt; _(some_mutex);
+    void* r = *obj;
+    (char*&amp;)(*obj) -= operand;
+    return r;
+}
+
+void __atomic_thread_fence_seq_cst()
+{
+    unique_lock&lt;mutex&gt; _(some_mutex);
+}
+
+void __atomic_signal_fence_seq_cst()
+{
+    unique_lock&lt;mutex&gt; _(some_mutex);
+}
+</pre></blockquote>
+
+<p>
+One should consult the (currently draft)
+<a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3126.pdf">C++ standard</a>
+for the details of the definitions for these operations.  For example
+<tt>__atomic_compare_exchange_weak_seq_cst_seq_cst</tt> is allowed to fail
+spuriously while <tt>__atomic_compare_exchange_strong_seq_cst_seq_cst</tt> is
+not.
+</p>
+
+<p>
+If on your platform the lock-free definition of
+<tt>__atomic_compare_exchange_weak_seq_cst_seq_cst</tt> would be the same as
+<tt>__atomic_compare_exchange_strong_seq_cst_seq_cst</tt>, you may omit the
+<tt>__atomic_compare_exchange_weak_seq_cst_seq_cst</tt> intrinsic without a
+performance cost.  The library will prefer your implementation of
+<tt>__atomic_compare_exchange_strong_seq_cst_seq_cst</tt> over its own
+definition for implementing
+<tt>__atomic_compare_exchange_weak_seq_cst_seq_cst</tt>.  That is, the library
+will arrange for <tt>__atomic_compare_exchange_weak_seq_cst_seq_cst</tt> to call
+<tt>__atomic_compare_exchange_strong_seq_cst_seq_cst</tt> if you supply an
+intrinsic for the strong version but not the weak.
+</p>
+
+<h2>Taking advantage of weaker memory synchronization</h2>
+
+<p>
+So far all of the intrinsics presented require a <em>sequentially
+consistent</em> memory ordering.  That is, no loads or stores can move across
+the operation (just as if the library had locked that internal mutex).  But
+<tt>&lt;atomic&gt;</tt> supports weaker memory ordering operations.  In all,
+there are six memory orderings (listed here from strongest to weakest):
+</p>
+
+<blockquote><pre>
+memory_order_seq_cst
+memory_order_acq_rel
+memory_order_release
+memory_order_acquire
+memory_order_consume
+memory_order_relaxed
+</pre></blockquote>
+
+<p>
+(See the
+<a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3126.pdf">C++ standard</a>
+for the detailed definitions of each of these orderings).
+</p>
+
+<p>
+On some platforms, the compiler vendor can offer some or even all of the above
+intrinsics at one or more weaker levels of memory synchronization.  This might
+lead for example to not issuing an <tt>mfence</tt> instruction on the x86.
+</p>
+
+<p>
+If the compiler does not offer any given operation, at any given memory ordering
+level, the library will automatically attempt to call the next highest memory
+ordering operation.  This continues up to <tt>seq_cst</tt>, and if that doesn't
+exist, then the library takes over and does the job with a <tt>mutex</tt>.  This
+is a compile-time search &amp; selection operation.  At run time, the
+application will only see the few inlined assembly instructions for the selected
+intrinsic.
+</p>
+
+<p>
+Each intrinsic is appended with the 7-letter name of the memory ordering it
+addresses.  For example a <tt>load</tt> with <tt>relaxed</tt> ordering is
+defined by:
+</p>
+
+<blockquote><pre>
+T __atomic_load_relaxed(const volatile T* obj);
+</pre></blockquote>
+
+<p>
+And announced with:
+</p>
+
+<blockquote><pre>
+__has_feature(__atomic_load_relaxed_b) == 1  // bool
+__has_feature(__atomic_load_relaxed_c) == 1  // char
+__has_feature(__atomic_load_relaxed_a) == 1  // signed char
+...
+</pre></blockquote>
+
+<p>
+The <tt>__atomic_compare_exchange_strong(weak)</tt> intrinsics are parameterized
+on two memory orderings.  The first ordering applies when the operation returns
+<tt>true</tt> and the second ordering applies when the operation returns
+<tt>false</tt>.
+</p>
+
+<p>
+Not every memory ordering is appropriate for every operation.  <tt>exchange</tt>
+and the <tt>fetch_<i>op</i></tt> operations support all 6.  But <tt>load</tt>
+only supports <tt>relaxed</tt>, <tt>consume</tt>, <tt>acquire</tt> and <tt>seq_cst</tt>.
+<tt>store</tt>
+only supports <tt>relaxed</tt>, <tt>release</tt>, and <tt>seq_cst</tt>.  The
+<tt>compare_exchange</tt> operations support the following 16 combinations out
+of the possible 36:
+</p>
+
+<blockquote><pre>
+relaxed_relaxed
+consume_relaxed
+consume_consume
+acquire_relaxed
+acquire_consume
+acquire_acquire
+release_relaxed
+release_consume
+release_acquire
+acq_rel_relaxed
+acq_rel_consume
+acq_rel_acquire
+seq_cst_relaxed
+seq_cst_consume
+seq_cst_acquire
+seq_cst_seq_cst
+</pre></blockquote>
+
+<p>
+Again, the compiler supplies intrinsics only for the strongest orderings where
+it can make a difference.  The library takes care of calling the weakest
+supplied intrinsic that is as strong or stronger than the customer asked for.
+</p>
+
+</div>
+</body>
+</html>