Talk:Read-copy-update/Archive 1

2006 discussion edit

OK, I will bite. What needs to be fixed in the Read-copy-update article? PaulMcKenney 22:11, 4 May 2006 (UTC)Reply

I don't know what was on the mind of the guy who marked it, but for my money, RCU needs to be fully defined in the first paragraph, not in Section 3 after a load of irrelevant detail about an API. Maybe something like:
"In RCU, an object is copied before being updated, allowing concurrent readers to continue reading the old version without any locking or checking. However, RCU updates can be expensive as they must verify that all readers are done with a version before reclaiming it; RCU is thus used in situations where updates are rare." --Chris Purcell 21:26, 4 May 2006 (UTC)Reply

This is indeed the sequence that gave RCU its name, but it turns out to be rather rare in practice. So how about the following?

"RCU allows concurrent readers to access a shared object without any locking, atomic operations, or checking. RCU updates are divided into removal and reclamation phases, where the removal phase avoids disturbing concurrent readers (who can continue accessing the removed object), and the reclamation phase is deferred until all readers are done with the removed object. Because this two-phase update can be expensive, RCU is most heavily used in situations where updates are rare." PaulMcKenney 22:12, 4 May 2006 (UTC)Reply
That doesn't give me any idea how objects are changed, though. Am I wrong in thinking that the sequence you are describing is RCU as I described it, except large portions of different versions do not need to be copied if they are not changed? i.e. the object is broken down into a tree, and any branch of the tree which is unchanged can be kept: only nodes that are changed (and their ancestors) are copied and updated? Is that wildly off-base? --Chris Purcell 21:26, 4 May 2006 (UTC)Reply

That is one way RCU can be used, but there are others. You might delete a portion of the tree, in which case there would be no replacement, you might insert a new subtree, in which case there might be no old version. The most common uses of RCU just do inserts and deletes in this manner; the "replace with new copy" is rather rare. There are other uses as well, for example, dynamically changing NMI (Non-Maskable interrupt) handlers, implementing lazy barriers, and so on. Hence the less-specific description. -- PaulMcKenney 22:10, 4 May 2006 (UTC)Reply

Well, you've been more helpful in this paragraph than I find the article is in several sections. That's what the problem is for my money. The less-specific description is too unspecific. --Chris Purcell 07:05, 5 May 2006 (UTC)Reply


So what do you suggest? A description of the most common uses, with a teaser saying that other uses are possible? PaulMcKenney 15:08, 5 May 2006 (UTC)Reply

Try it, let's see. I find the tree-based description the most helpful, by the way, it gives concreteness that's otherwise lacking. --Chris Purcell 20:31, 5 May 2006 (UTC)Reply


Done. The new complaint is no doubt that the introduction is too long.  :-) If so, I would revert back to the original, given that the intro for locking is the quite succinct "In computer science, a lock is a synchronization mechanism for enforcing limits on access to a resource in an environment where there are many threads of execution. Locks are one way of enforcing concurrency control policies. ". Someone who was not already somewhat familiar with concurrency would not get much from the locking intro, right? PaulMcKenney 23:43, 5 May 2006 (UTC)Reply

I fear your list example has problems with linearizability: if one writer updates one location, then updates a later one, a concurrent reader may see the see the second update but not the first, and thus cannot be linearized. RCU as I understood it requires that the entire top of the tree (or, in this degenerate case, a list) must be changed, not just bits halfway down, so as to avoid this problem. Or is it simply that care is taken to ensure that this problem does not have bad consequences? --Chris Purcell 10:26, 6 May 2006 (UTC)Reply

(Sorry for the delay, was in a locale in which Wikipedia is unavailable.) Ah, linearizability! First, not all applications/algorithms require linearizability. Those that do not should not be required to pay the large cost (both in terms of complexity and in terms of performance) often associated with it, which might be what you were getting at in your last sentence. Second, linearizing the updates is not sufficient; the readers must be linearized as well, which greatly increases the cost of reading. Therefore, it you need good performance and simplicity, you will be motivated to avoid the need for linearizability. In other words, rather than my example having problems with linearizability, I assert that linearizability has problems with both CPU/system architecture and software-engineering considerations -- although linearizability might have been defensible in the mid-1980s, it is beyond the pale here in 2006. Perhaps heretical, but, if so, extremely valuable and hard-won heresy! PaulMcKenney 18:42, 22 May 2006 (UTC)Reply

Firstly, and most importantly, I like what you've done with the introduction to the page. It's very nice. Great work.
I assert that in the example, your reads are not atomic operations. Forget linearizability for a moment: if you can find a sensible way of defining "atomic" that covers them, I'll be much more inclined to believe that "linearizability has problems". (None of the ad-hoc definitions of "atomic" I have ever seen have actually not been linearizable, merely confusing and unhelpful.) --Chris Purcell 08:34, 23 May 2006 (UTC)Reply

Glad you like it!


I agree that the reads are not atomic operations. This is a good thing!!! Atomic operations are quite expensive, and making complex operations appear to be atomic is quite complex (in absence of locking, which is expensive and imposes high realtime latencies).

But, since you asked for a definition of "atomic" that is non-linearizable, I will give you an example. Consider a per-thread variable used as a counter, so that the value of the counter is the sum of the per-thread variables. I assert that adding a set of fixed (non-unity) values to the counter is non-linearizable, since different threads might disagree on the intermediate values of the counter, although they would all agree on the final value. So, is incrementing this counter atomic? If not, why not? PaulMcKenney 16:59, 23 May 2006 (UTC)Reply

The writes are atomic. They are also linearizable. The reads are clearly neither, since they can be visibly "divided" by concurrent writes.
I agree in a theoretical sense, but not from a practical viewpoint. The theoretical linearizability of writes is visible only to an oracle, no real sequence of read-side operations can observe the linearizability. So what does it really mean to say that the writes are linearizable?
In this case, it means a read that is not divided by a concurrent write will see a view of preceding writes that conforms with a linearizable history. Further, a read that is divided by concurrent writes will see a value that is constrained by a linearizable history of preceding writes, even though it cannot itself be linearized.
My objection to this is that there is no useful way to determine whether or not a given read has been divided by concurrent writes. OK, no useful way that I know of.
Locking. Any out-of-line mechanism would be fine, in fact. Version counters. Whatever. I agree that it's not terribly useful — you asked about theory :D
Touché!!!
If you can call the writes "atomic" but not the reads, you can equally call the writes "linearizable" but not the reads. I cannot see a way of redefining "atomic" that is not just "linearizable" in sheep's clothing.
More on this at the end -- didn't put it here in interest of nesting level.
I'm sure you agree with that statement, but I'm going to give a quick illustration anyway. Imagine some tangible constraint imposed on the counter: perhaps it must take only values which are square numbers, and the threads achieve this by incrementing it in turn: the first adds 1; the seconds adds 3 when it sees the counter has reached 1; the third adds 5 when it sees it has reached 4; etc. Stupid, I know. Now a concurrent reader can quite happily miss the first addition but see the second, and thus see a non-square sum.
Understood.
On the other hand, if writes are only allowed to increment the counter, and then only by 1, it can be shown that reads are atomic/linearizable. Take the value held in each register when one started the read, and the value held at the end. Since each strictly increases, one must have read a value between the two extremes. Hence the final sum must also be between the starting sum (minimum) and finishing sum (maximum), and since all integer values between the start and end value must by construction have been taken at some point, we can linearize the read operation to that point. (Try arguing that without the concrete definition of linearizability to help you :)
Yep, increment by 1 is a special case. But sometimes you need to add values other than 1. And the analogous proof for non-unity additions seems pretty straightforward: (1) each counter is increasing in value monotonically. (2) Each read   will therefore give a value  , where   is the value of variable   at time  . (3) Therefore,  , QED. Why is this considered hard and how was linearizability helping? Or am I missing something here???
You missed the last bit — all integer values between the start and end value must by construction have been taken at some point. That doesn't hold if non-constant increments are supported, and the result is thus not linearizable.
I didn't miss it. My formula just says that any read will return a value between the starting and ending values, inclusive. It does not say that a pair of reads will agree on the order of writes. Of course, the writes don't care about the order of writes, which is another reason that your claim that the writes are linearizable seems irrelevant to me.  ;-)
I see, you weren't claiming it was still linearizable, just that it had a useful bound on its behaviour. Great!
"It is a service I provide."  ;-)
Linearizability allows us to pull the trick of saying "X linearizes here". Unless you give me an alternative definition of "atomic" — which you have yet to do — I must fall back on an alternative definition I have heard in the past (and on WikiPedia no less): "An operation is atomic if it is not divided by a concurrent update." How would you use that definition to show that the increment-by-1 case is atomic?
But by that definition, the increment-by-1 read case is not atomic. A read can easily be divided by concurrent writes. And in reality, the read case is not atomic. It cannot be proven to be non-atomic, in the sense that any value returned will have been taken on by some linearization of the updates. But with real machines, it is quite possible that a pair of increments happens on precisely the same clock cycle. In that case, you can linearize the sequence of updates, but the linearization will be quite artificial.
Yep, quite artificial: solely useful for reasoning by the user. The read case not being atomic by any amount of staring at that definition was my point: when does the operation start and end? In a locking system, it starts and ends when the locks are held and released, not when the function is called or returns, so what does "not divided" mean? It pretty much reduces to linearizable.
Again, see the end for a candidate non-linearizable definition of "atomic".
The same holds whenever one decides to throw overall atomicity to the wind: one must manually confirm no schedule will cause a constraint violation and upset the application. In the case of a complex object protected by RCU, it is much easier to think up plausible constraints that can be violated by seeing updates out-of-order. Given that the whole point of RCU is to make read operations cheap, the extra logic of having to check for isolation failure seems ludicrous, given that atomicity is so cheap to effect: simply swap out the top part of the tree, a technique described by Herlihy in the late '80s. Updates are rare anyway; a little extra cost allocating and freeing copies of the top bit of the tree is negligible compared to the price one has to pay implementing a barrier on all read threads!
Yep, you can do this, but if the tree is large, the cost of doing the full-tree copy and replace might be anything but negligible (consider for example the Linux-kernel directory-entry cache, which on my laptop has 43000 elements!!!). Copying and replacing the entire directory-entry cache every time someone removes a file is just not going to fly. And you really do not need per-read memory barriers if you update subtrees rather than the whole tree -- or am I misunderstanding what you are suggesting?
Sorry, by read barrier I meant the write operation having to use operating-system–level tricks to wait until all read operations have passed a barrier and the memory can be reclaimed. Compared with that, the cost of replacing the section of the tree leading to the modified file entry (not "the entire cache" as you have understood it from my description) is, as I have said, negligible.
But such a barrier (or, in RCU parlance, a wait for a grace period) need not be incurred by all reads. It need be incurred only for updates that remove something from the list.
Ah! I see your point now. How about updates that modify something in the list? Since they do not have transactional memory, presumably they need to swap out the entry they are modifying for a new one, and then barrier to release the memory. Is the cost of barriering amalgamated across many calls by delaying memory reclamation? I can see how keeping memory costs minimal would then be important.
Yep, if the modification itself is not atomic, then one approach is to swap the old element for a new/updated one. And, yes, and important optimization is to make a single grace period serve many updates (>1,000 of them, in one measurement on a heavily loaded Linux kernel) -- and, as you surmised, one does this by delaying some of the reclamation. One must temper this optimization in machines with small memories, and dynamically striking the right balance is the subject of ongoing work.
However, in some instances, such as when one is implementing a set, or a partial function, it is again easy to show that discarding atomicity on the whole tree does not prevent the individual set operations from being linearized. This is what I obliquely referred to above. If this kind of valid optimization is commonly used in RCU operations, that's great: the end result is atomic. It might be nice to mention the issue on the page, but I'm not too fussed.
RCU never linearizes the reads, by design. I am still choking on your description of updates being linearizable, but the corresponding reads not being linearizable -- if you cannot see it, and cannot take advantage of it, what meaning does it really have?
What do you mean by "linearizes (the reads)"? I have not encountered this term before. Do you mean "serializes"? If not, what do you mean?
Some time ago, you said "The writes are atomic. They are also linearizable. The reads are clearly neither, since they can be visibly "divided" by concurrent writes." So you have seen the term before.  ;-)
The usage is slightly different. I would say "RCU does not have linearizable reads", or "the reads are not linearizable", not "RCU does not linearize reads". I was wondering if you were trying to say "RCU does not serialize reads", which is a term I've heard before. Now I see what you mean.
More on this at the end as well.
P.S. I'm not sure if you are citing any particular authority when you claim "linearizability [is] beyond the pale here in 2006". Is this your own private opinion? Certainly I know of some authorities who would disagree with that sentiment: Doug Lea, who wrote java.util.concurrent, for instance. Yes, many functions in that library, especially iterators, are for good reasons not linearizable; however, they are always documented as equivalent to a sequence of linearizable operations. I think the users of juc's Map would be understandably annoyed to find a simple Lookup was returning a value they never placed in the hashtable — or returned one they had just deleted.
You just said it -- Doug Lea abandoned linearizability for important pieces of java.util.concurrent, for good reason, as you put it. The fact that no popular SMP CPU supports a sequentially consistent view of memory should also be considered -- sequential consistency is avoided by CPU designers for the same reason that linearizability should (often, not necessarily always) be avoided in SMP software. Finally, show me a linearizable implementation of the per-thread counter example we discussed above that permits non-unity increments, but without giving up the performance advantages.
The registers that compose that memory are linearizable, and are atomic. So are the basic operations on Lea's structures. In fact, I would go further, and claim that modern SMP computers still provide an entirely linearizable model of memory; the only thing that has changed is that the invocation and responses of successive operations by the same processor now overlap thanks to out-of-order execution. Memory barriers simply ensure certain responses occur before, and certain invocations occur after, the barrier. The memory as a whole is entirely linearizable. (I am willing to expound on this point if you like.)
I would instead say that modern SMP computers provide an entirely linearizable model of memory, but only one a word-by-word basis. Modern SMP computers do not necessarily provide linearizability for updates of unrelated pairs of words of memory -- as the per-CPU-counter example clearly demonstrates.
Nevertheless, they can also be considered linearizable: it's a matter of point-of-view, not correctness. Let's drop it.
Fair enough!
Claiming that atomicity is beyond the pale in 2006 is highly blinkered. There is serious consideration being given to implementing transactional memory on main-stream systems in the near future; and Lea recommends the use of a standard hashtable for anyone who cannot get by on the limits of the non-atomic iterators provided by his sets. (However, I note you have below retracted "beyond the pale" considerably, so perhaps you are, after all, not so blinkered ;)
Transactional memory can indeed have some nice properties. However, it usually also comes with interesting hardware limitations, e.g., cache geometry.
I will not present an equally-performant per-thread counter, as I know of none. The best I know of keeps the same update performance, but requires a double-scan of the array. It will also require repeated rereads to obtain a 100% accurate value if interrupted by concurrent updates, though it can bound the range of values the counter could have taken. I would point out that your implementation is essentially useless in many situations, as the value it takes can be arbitrarily inaccurate with no warning. The first application that springs to mind, measuring the occupancy of a hashtable for dynamic resizing, would be entirely inappropriate, as the counter could return a negative occupancy, or an occupancy a dozen times larger than ever seen. The hashtable would therefore be highly prone to stupid resizing, a huge price to pay for a negligible (see below) optimization opportunity.
If I guess correctly, in your double-scan approach, a reader could be indefinitely delayed by writers. Or do readers have some way to "shut down" writers? I could imagine some tricks to impose linearizability when readers needed it at modest cost to updaters, but why? The per-CPU counter is used quite heavily for statistical counters. For example, keeping a count of the number of bytes transmitted or received, where you increment the current CPU's counter by the length of the packet just sent or received. The accuracy is bounded by the equation I gave earlier, where the read starts at   and ends at   -- which is as much as you can ask for, since the value of counter had to have "passed through" the value returned by the read. In my view, it does not make sense to ask for more -- even if you used a single counter (presumably updated with atomic machine instructions), the value would be stale as soon as you read it, which would introduce further error. Linearizability isn't buying you anything in this case.
True.
It is possible to use variations on the per-CPU counter in situations where the counter is non-monotonic, as in your hash-table occupancy example, and to bound the maximum error, and such variations have been used in production.
However, we have strayed very far from the point, which was that atomic = linearizable, and that RCU not being atomic may not be a disadvantage. On the latter, you have convinced me you are sensible of the trade-offs, while the former you appear to agree to!
I remain uncomfortable with the notion of confining myself to linearizable algorithms, but since you agree that linearizability is not required in all cases, this should not be a problem. Given that you do not seem to be arguing that I should confine myself to either atomic or linearizable algorithms, I have a hard time getting too excited about the exact definition of either term.
Really, though, I would love for you to present me with a definition of "atomic" that is not linearizable. I would love to be wrong about the merge of the two pages. Please try!
See below -- but keep in mind that you should be careful what you wish for. Oh, so sorry, it is too late to give you that advice.  ;-)
Linearizability does not have to be slow. It does not demand that all operations have to be serialized, which I think you were trying to say above. Linearizable read operations can happily execute in parallel with no communication over the memory bus.
Again, consider per-thread counters with non-unity increments -- how do you linearize the reads for free? And I probably should have said something to the effect of "always requiring linearizability is beyond the pale" -- I freely admit that there are situations where linearizability is needed.
If I wanted cheap reads, I would store the counter in a single memory location. The cost of gaining even shared ownership on one cacheline per processor is very high on a multi-way machine, as it interrupts every updating thread for as long as it takes the cacheline to traverse the bus, be held in shared mode for a while (fairness reasons), then traverse back — a very long time. Optimizing performance is not a matter of reducing the number of read operations and avoiding CAS operations: much more significant is the number of shared cachelines that must be touched, and on that metric your algorithm is severely deficient.
I am extremely happy that you recognize that gaining shared (let alone exclusive) ownership of a cache line is quite expensive on modern systems. I had the privilege of learning this from a logic analyzer longer ago than I care to admit. But the fact is that if you had multiple updating threads, they were causing each other severe performance problems, so they might not notice the extra read!
By a twist of fate, I am well conversant with your particular example (a shared counter), and have in the past tested various algorithms, including yours, on a 64-way machine. The cost of synchronization hugely outweighs the cost of linearizability. There is no way to solve the counter problem cheaply: either reads will cost you painfully, or writes will, or a mixture of the two.
I disagree -- it is possible for both reads and updates to be cheap. I know, I have implemented such a beast. (Would you like me to tell you the trick, or do you want to work it out yourself?)
Do tell.
The trick is to make a controlled sacrifice in the accuracy of the values read. In many cases, a significant sacrifice is possible -- see our earlier discussion about the value of the counter being stale immediately after being read, for example.
One can then use per-CPU counters each of whose range is limited to the range from zero to some target value. There is an additional global counter that is manipulated with atomic adds and subtracts. CPUs (or, equivalently, threads) manipulate their counter non-atomically, but only when the result lies within the range that these counters are constrained to. If the result would overflow or underflow the range, the CPU atomically adjusts the global counter, optionally so as to leave the per-CPU variable at the center of its permissible range. If the fates are kind, a very large fraction of the updates need only manipulate the per-CPU counters, maintaining near-optimal performance. The readers simply return the value of the global counter (correcting by adding the number of CPUs times half the target), incurring a maximum error of the number of CPUs times half the target.
Hey, you asked!!! (BTW, you are free to use this in software licensed under GPL -- long story, sorry that I must throw this proviso in, but there you have it...) PaulMcKenney 15:32, 24 May 2006 (UTC)Reply
I may be ranting now, so I'll shush :D --Chris Purcell 19:59, 23 May 2006 (UTC)Reply
Wouldn't want to suppress a good rant! ;-) PaulMcKenney 20:59, 23 May 2006 (UTC)Reply
Oh good, I'll continue ;)
So, in conclusion: (1) I still maintain that atomicity = linearizability, and request a plausible definition that would contradict that. (2) I agree that there are places in which linearizability is impossible to achieve sensibly, but in which less strict isolation guarantees are possible and useful. I think that, in this, we only disagree on perhaps how important atomicity is. (3) I disagree that atomicity is the cost you claim it to be. Communication costs dominate. Atomicity is mainly a problem when the required communication costs of an operation are so large that maintaining strict isolation during them would be impractical, e.g. during a snapshot of an entire hashtable. However, even then, throwing out isolation completely is foolhardy. I suspect we agree on that, too.
For (1), it depends on what you meant earlier when you said that the per-CPU counter had linearizable updates, but not reads. For (2), we do agree that dispensing with linearizability can be useful, and we will probably continue to have difficulties identifying exactly what high-level abstraction is expensive, since the abstractions get difficult to tease out at the machine level, where I usually live. (3) Ditto on the identification problem, but it -is- possible to snapshot an entire hashtable -- or even an entire filesystem -- at relatively low cost. I agree that some sort of isolation guarantee is almost always needed -- but linearizability has often proven a very blunt instrument.
How would you do that low-cost snapshot? Would it be linearizable? (It would be awkward, for instance, if a file got moved and ended up appearing twice in the snapshot.)
The standard approach (old news even back when I started in this business) is to freeze the data structure, making further updates to an auxiliary structure. If you are sufficiently careful, this can be linearizable, but I believe that this makes the transition from updating the data structure itself to updating the auxiliary structure more expensive. This technique is most often used for data structures on mass storage (as opposed to data structures in RAM). You can of course also take the inverse approach, continuing to update the main data structure and maintaining an auxiliary data structure that tracks the differences, but it would be more difficult to linearize. Many times, people don't care about linearization, preferring the greater read-side performance of this second option. PaulMcKenney 15:32, 24 May 2006 (UTC)Reply
Let us return, therefore, to point (1) and set aside (2) and (3), assuming you have no objections to that. This discussion has been most rewarding thus far, thank you. --Chris Purcell 00:48, 24 May 2006 (UTC)Reply
The problem is that I don't have any particular vested interest in definitions of "linearizable" or "atomic". I do use them, for example when using a globally locked data structure, but I very often violate them. Coming up with a separate name for each different relaxed isolation constraint seems strange, but then again, I have never thought to try to name them, which is a consequence of my working down in the bits. One approach would be to start with the VAXCluster locking model: null, concurrent read, concurrent write, protected read, protected write, and exclusive. Use of the last three modes results in atomic/linearizable/whatever. Use of the first three modes result in weaker isolation modes. But this does not cover RCU, which has a fuzzy-barrier-like constraint between reads and reclamation.
And yes, interesting discussion, here's hoping to further illumination! PaulMcKenney 05:10, 24 May 2006 (UTC)Reply
I concede, you have convinced me that your example is correct :) RCU is just fine with "read committed" isolation. Might be nice to mention the fact on the page.
If you have no suggestion regarding a better definition of "atomic" than "linearizable", we appear to have nothing more to argue over. (Rats.) If so, we should probably note that on Talk:Linearizability for the record. I would then appreciate help making the Linearizability page more accessible, merging in the kind of picture-book descriptions found on Atomic operation, if you would be interested. --Chris Purcell 09:21, 24 May 2006 (UTC)Reply
One more time, see below... PaulMcKenney 15:32, 24 May 2006 (UTC)Reply

Now that the fog of jet lag is receding a bit (leaving me only with the omnipresent fog of late middle age) here is a candidate definition of "atomic" that is not linearizable. To give credit where it is due, this definition derives from ongoing discussions on RCU semantics with Jon Walpole, Robert Bauer, Tom Hart, Angela Demke, and others.

"An algorithm can be considered atomic if all updates are linearizable, though possibly with multiple candidate linearizations, and if any given read gives a result that is consistent with at least one of the candidate linearizations. There is no requirement that concurrent reads give results that are consistent each others' linearizations."

This describes the semantics of the per-CPU counter. Each read will return a result that is consistent with some linearization of the updates, but different concurrent reads might see different, mutually incompatible, linearizations. In hindsight, this definition has been quite useful to me.

Going back to the VAXCluster locking modes, this definition is consistent with the concurrent read, protected read, protected write, and exclusive locking modes, but not with the null or the concurrent write locking modes. But the definition is a bit looser than that -- it is consistent with multiple data structures being protected by corresponding multiple locks. It is the semantics of data locking, as near as I can tell.

Thoughts? PaulMcKenney 15:32, 24 May 2006 (UTC)Reply

I've got a few points to make.
Controlled sacrifice: I like the idea of "controlled sacrifice". It's nice. I can't see how it can be under the GPL, since that's a copyright license, not a patent license, so can only protect code, not ideas — is the idea patented?
'Fraid so...
Weaker atomicity: This definition of atomic is interesting. It fails to describe the semantics of the per-CPU counter, as the counter doesn't respect orderings between processor updates. In the monotone example I gave above (square numbers), there is only one possible linearization of the updates, as they occur serially. The read operation doesn't return a value that this linearization held. Hence, it's not atomic by your definition.
Almost. Your monotone example fails to exercise the full ability of the counter to misorder updates.  ;-) Suppose that one writer is repeatedly adding the value 2 to its counter, and another writer is repeatedly adding the value 11 to its counter, with the initial value of both counters being zero. The set of possible values over time is then { 0, 2, 4, 6, 8, 10, 11, 12, 13, ... }. Suppose also that a pair of readers (A and B) concurrently sums the two counters. Given a linearizable algorithm, if reader A gives 2, then reader B is prohibited from returning 11. However, the per-CPU counter algorithm could in fact have reader A giving 2 and reader B giving 11, as permitted by my weaker notion of atomicity. The sequence of events leading to this result is as follows:
Reader A reads the second counter, which also has value of zero.
Reader B reads the first counter, which has value of zero.
Writer 1 adds 2 to the first counter.
Writer 2 adds 11 to the second counter.
Reader A reads the first counter, which has a value of 2.
Reader B reads the second counter, which has a value of 11.
Yes, reader A has read the counters out of order, but this is a definite possibility for modern superscalar CPUs, to say nothing of modern optimizing compilers. Nonetheless, both readers A and B have given values consistent with linearized views of the updates -- but the two linearized views are inconsistent with each other.
You can't give me a single example where it is atomic and claim it's therefore atomic always. The algorithm is either "atomic" in all valid histories or it's not "atomic" at all. Imagine trying to sell an algorithm to a bunch of programmers that incremented a counter by 1 "in this exact interleaving here". They'd chase you up a tree and set fire to it. (Douglas Adams reference there, in case you miss it :)
They first would have to get in line, and second would have to avoid my chasing them instead.  ;-) Besides, I have sold this particular algorithm to quite a few programmers, some of whom might well have read Douglas Adams. And many others sold it before me -- I did not invent this algorithm.  ;-)
The "atomic always" part is straightforward. Informally: (1) Each increment of a given per-CPU variable is atomic, since the variables are private (hence we never have two CPUs attempting to increment the same variable concurrently, or ever, for that matter). (2) Each reader will therefore see the old value of a given per-CPU counter or the new value; a reader cannot see a half-incremented per-CPU variable, whatever that might mean. (3) A given reader's traversal of the per-CPU variables will therefore be a "cut" through the history of increments, therefore representing a valid linearization of this history. (4) A given reader's sum of the per-CPU variables is therefore an atomic read of the full counter given the corresponding linearization. This is admittedly a very bizarre definition of "atomic", but not nearly as bizarre as many other things I have done.
(3) is false, sorry. If you'd used the phrase "therefore representing a valid sequential reordering of this history", I'd agree, but then you're just claiming that atomic = read committed, and you might as well do away with the complex definition and use the pre-existing one.
So your claim is that my (3) would be valid for sequential consistency, but not for linearization?
Of course, other concurrent readers might end up with different linearizations, which was what I was illustrating with the above example. This situation is analogous to relativistic physics, which might seem strange, but is completely natural given the non-negligible latencies required for data to move from one CPU to another in modern systems. Any assumption that there will be only one "real" linearization in a given thread of execution is an anachronism. This assumption was defensible when the CPU-to-CPU latency was negligible (i.e., small compared to the time required to execute a single instruction), but is unacceptable today.
I have given you a valid history that is not atomic by your definition. It may be simple — just a single thread doing updates — but it's still valid. Hence the algorithm as a whole is not atomic.
Your single thread would be incrementing a single counter, hence the reads would always read a square number. A more interesting example would use a lock to allow multiple threads/CPUs to increment the counter monotonically, so that a reader thread could then read out a non-square number. But this example is quite silly -- if you are going to protect the updates with a lock, you gain nothing by spreading the counter over the per-CPU variables; you should instead use a single variable for the counter. In other words, if you have a situation where you are capable of forcing a given linearization, you should not be using per-CPU counters in the first place.
Whoops, my bad. The example I gave wasn't single-threaded, that's the one below. The example I gave had ordering between the updates caused by making them conditional on the counter. However, you can't call a usage "silly". Either the thing is atomic in all situations or it's not. And it's not.
Of course I can call a usage "silly". I can validly claim that a hammer will drive a nail, even though it probably won't drive a steel nail into a block of titanium. It is perfectly valid to restrict the usage domain of a given algorithm, and then make statements about the properties of that algorithm in that restricted domain.
It also fails to describe the semantics of the RCU system. Imagine updates are made serially by a single processor. A concurrent read may see later updates but not earlier ones — seeing a file be duplicated when it should have simply moved, for instance. However, there is only one linearization of the updates possible.
That is an overly constrained example. Instead, consider a hash table, where each hash chain is protected by a separate lock, being updated by multiple concurrent threads. In this case, there could be multiple linearizations of the updates, and different reading CPUs could see different linearizations, particular given superscalar CPUs and optimizing compilers.
Again, all I need is a single counterexample. Thread A removes key 1, then inserts key 2. A read by thread B sees both keys. Not atomic by your definition. Q.E.D.
Not atomic by 'your' definition, that is! I did not claim that pairs of reads were atomic, only single reads. And you will likely reply that this is further evidence of "atomic" == "linearizable". As near as I can tell, you are imposing the additional constraint that the atomicity and linearizability be evaluated at the same level of abstraction. I was not assuming this constraint, and, as you may have noticed, am predisposed to think of atomicity of single operations and then linearizability of groups of these operations.
So let me get this straight. You're claiming that the "reads" you were referring to are not the "read" as in "read-copy-update", namely a single read of the entire structure by a user, but one single read primitive? The atomicity of a single operation is guaranteed by the hardware. If you're considering individual reads, not groups of them, then they're linearizable! No need for this horrific new definition. Just use the existing term: read committed isolation.
So you agree that "atomic" applies to a specific primitive, but "linearizable" applies to the group? In that case, they are not the same.
There's no precedent for this idea that "atomic" should refer to the hardware primitives used in constructing an implementation, but "linearizable" must meet a higher standard. It's not a "constraint" I've suddenly imposed without warning. If you say an operation on a data structure is atomic, you must be making a claim about the composite, not the individual bits it's made from.
Again, it sounds like you agree that "atomic" applies to an individual operation (perhaps a composite, but in that case, the composite is itself treated as an individual operation), while "linearizable" applies to a group of operations. Again, this would imply that "atomic" and "linearizable" are different concepts.
Do you have any example where something is atomic by this looser definition, but not by linearizability? Or any motivation why it should be called "atomic", since it has lost any idea of "indivisible" reads in any absolute sense? (Also, we'd need to consider the maxims of Wikipedia:No_original_research before we could use it on Atomic operations.)
As noted earlier, I assert that the per-CPU counter algorithm running on super-scalar CPUs (or with optimizing compilers) and RCU with multi-locked data-locking algorithms meet this looser definition, but not the tighter one.
And this stuff has been in use for decades!!! It is not my fault that some researchers have failed to keep up!!!  ;-) Seriously, the per-CPU/per-thread counter was described in Y. C. Tay's 1987 textbook entitled "Locking Performance in Centralized Databases" -- that would be 19 years ago.
See above.
Isolation: One thing has just occurred to me: you seem to be arguing "atomic" should mean the same thing as READ COMMITTED isolation; that is, updates are only ever seen as single lumps, but a read may or may not see the effects of any given update performed during the read, essentially randomly. However, would you call a RCU read "atomic"? No: it can show visible evidence of being divided by updates. Only the updates are atomic, as a read will never see a partial update. Fair?
I am starting to think that "atomic" is orthogonal to "linearizable", "isolation", "consistency", "durability", and so on. You can have a linearizable sequence of operations that is non-atomic, right?
Not unless you are talking about atomicity as in databases, which is totally different. In concurrent programming, operating systems, etc., "atomic" is a statement about interactions between threads: about isolation.
Hmmm... The definitions for databases were there first, I believe.
I'm just conforming to standard use of the terms. Wikipedia isn't the place to start insisting that one group of guys got there first, and ignore standard usage in an entire field of computing simply because it happens to use the same word. (And hasn't "atomic operation" referring to isolation on the hardware level goes back as far as there have been multiple bits of hardware using the same bus?)
Are you meaning "atomicity" as used by Lamport, later obsoleted by "linearizability" by Herlihy and Wing? If so, shouldn't that use of "atomicity" be left out entirely as being obsolete? (OK, perhaps restricted to a footnote in the "linearizability" page...) Then "atomicity" can take on its more natural meaning.
But rather that quibble further about the definitions, let me give an example. Suppose a group of CPUs non-atomically increment a single variable. This is clearly non-atomic, in fact, increments can be lost. But the sequence of loads and stores is clearly linearizable (especially clear if you consider the nearly mythical sequentially consistent computer system). And yes, the individual loads and stores are indeed atomic, but the sequence of loads and stores is not atomic, despite being linearizable.
I have two objections here: (1) Suddenly you're talking about atomicity of a group of operations again, and suddenly linearizability refers to the individual operations. Your nomenclature is getting pretty confusing for me. (2) No, the sequence of loads and stores is also atomic. The implementation of incrementation that you are trying to achieve with those loads and stores is not atomic. It's important to be clear whether you are talking about the primitive operations or the logical ones you are trying to implement with them.
No. Individual operations are atomic or not. Groups of operations are linearizable or not. The increments are not atomic, but the load and store implementing the increment are atomic. The sequence of loads and stores are linearizable. The sequence of loads and stores is not atomic. I have used the terms consistently.
And then the converse example. On a weakly ordered machine, process A does { w = atomic_inc(&a); x = atomic_inc(&b); } while process B does { y = atomic_inc(&b); z = atomic_inc(&a); }, with a and b initially both zero, where atomic_inc() returns the value after the increment. One could end up with w = 2, x = 1, y = 2, and z = 1, since a weakly ordered CPU is within its rights to reorder the atomic operations. The individual operations are atomic, but there are valid execution traces for which there is no linearization that preserves program order.
As I said, this can be modeled as OOO processors overlapping invocation and response on the same processor, unless you specifically use barriers. Or you can think of it as multiple, individual linearizable registers, keeping OOO reordering separate in your head. Either way, you must keep "primitives are atomic" and "implementations are atomic" separate.
Don't forget the "preserves program order" at the end -- an OOO processor would not preserve program order, right?
In short, I am still not convinced that your definitions of "linearizable" and "atomic" are necessarily helpful in practice. If you hew to them at all closely, you lose sight of useful optimizations, a cardinal sin in my book. Not that I intend to chase you up a tree, let alone set fire to it.  ;-)
Unless you want to really confuse your users, you should be clear when an (implemented) operation is guaranteed to occur atomically, and when it's not. If you don't know what the word means, you can't do that. If they don't know what the word means, it won't help them, and they'll have to learn how your implementation works and how the isolation level will affect their use of it. Atomic operations are so unbelievably handy because they are fully isolated. The user doesn't need to worry. If you have valid reasons to discard it for optimization, that's fine, but you'll need to describe the isolation level explicitly.
Hey, you were the one originally asking for an alternative definition for atomic!!! So why are you now complaining that the alternative definition I gave you does not match the definition you are used to??? If it did match, what would be the point??? Just stick with the classic definition, and stop asking for alternative definitions!!!  ;-)
By the way, I fail to see why the referenced paper's "data locking", and indeed multiple data structures protected by multiple locks, is not plain ol' linearizable. --Chris Purcell 16:24, 24 May 2006 (UTC)Reply
If there are RCU readers, then it is not linearizable. Certainly not in presence of superscalar CPUs, weak memory models, and optimizing compilers. PaulMcKenney 18:59, 24 May 2006 (UTC)Reply
But the referenced paper, and indeed locking in general, is not using RCU. --Chris Purcell 19:31, 24 May 2006 (UTC)Reply
Yes, I should have stated "with RCU" when I originally called it out. PaulMcKenney 02:17, 25 May 2006 (UTC)Reply
This is still fun :) --Chris Purcell 09:29, 25 May 2006 (UTC)Reply
But time for resolution. It looks like we agree that the classic definition of "atomic" is different from the classic definition of "linearizable" -- groups of atomic operations will normally be linearizable, but not all linearizable sequences of operations will be atomic. We agree that Lamport's (now obsolete) use of "atomicity" is identical to "linearizable" as defined by Herlihy and Wing. I assert that using "atomicity" to mean "linearizable" is bogus, given the potential for confusion between Lamport's "atomicity", the classic database definition of "atomicity", the classic definition of "atomic", and the classic English meaning of the "ity" suffix. Fair enough? PaulMcKenney 15:41, 25 May 2006 (UTC)Reply

Time to reindent. Alright, the nomenclature I am used to, spelled out very clearly.

Good idea -- as you may have noticed, I am not big on worrying much about nomenclature. Yes, you can think without language!

One can talk about either a primitive object (usually a register) or an implementation of a logical object using primitive object(s). When one says something is "linearizable", it is usually implicit, but can be made explicit, whether one is talking about the primitive operations or the implementation. Similarly, when one says "atomic", one may be talking about either; however, it is easier to clarify with the word "atomic" - an atomic operation versus an atomic group of operations.

"Atomic" is indeed quite fuzzy. For example, many people are used to thinking of all the operations happening in a critical section as being "atomic" -- which they are, but only from the perspective of another critical section that is mutually exclusive with the first critical section.

Unfortunately, the word "operation" is ambiguous: it can be used to refer to a group of primitive operations implementing a single logical operation. To clarify, one can use the qualifier "primitive" or "logical". This qualifier can be omitted if one is talking about an "object" rather than a particular implementation of that object, just to be even more confusing. In general, I use the word "operation" to mean a logical operation, not a primitive one. This is consistent with the literature, I believe.

There are not enough words to go around. There never will be. People creating new artifacts always must create new words/phrases to describe them. People who are good at creating new artifacts are not necessarily competent creators of new words or phrases. Life is like that!
However, there is hope, given that most people can successfully parse sentences of the form "Did she make it to the make-up test, or did making her make-up make her late?" or "Since he has the right, it is right that he write that the right rite is to the right." Although the latter sentence should be spoken rather than written for maximum effect.

A history can be linearizable. If all histories are linearizable, the object is said to be linearizable. And if the object is linearizable, the operations of the object are atomic. I've played fast and loose with this above, talking about linearizable operations and atomic histories. My apologies.

No problem! The first two sentences I accept. The third I believe to be generally true, but I expect that there is some degenerate case or another that falsifies it. For one potential (stupid) example, long-running no-ops.

One can call some of the operations of an object atomic, and not others, by considering the same object but without the non-atomic operations, extended with any other logical operations necessary to later re-implement the non-atomic ones that were removed. If this new object is linearizable, the operations can be considered atomic. Hence single-word memory primitives are atomic, and multi-word ones (like operations on double-word floats) can be described using a simple algorithm of single-word primitives (e.g. read first word, read second).

However, there is no circumstance I am aware of in which the operations of an object are atomic but the object as a whole is not linearizable, nor in which non-atomic operations cannot be described in terms of (logical!) atomic ones.

If the operations are sufficiently bizarre (e.g., quantum computing or even plain old analog computing), one could probably find a counter-example. But I agree that these definitions/observations should have widespread applicability.

The simplest way to explain what "atomic" means is to use the alternative definition of linearizability: an operation is atomic if it appears to all other operations to occur indivisibly at some instant of time between its invocation and its response. An operation is non-atomic if it appears to consist of a sequence of (logical, not primitive) atomic operations which may be divided from each other by concurrent (logical) operations. Typically, a non-atomic update operation is highly unsafe without external mutual exclusion, while a non-atomic read operation is often simply harder to describe and use safely. There are exceptions to both.

Fair enough!

I'm sorry if I've been terribly unclear above, and I hope this states things clearly enough that you can decide whether or not to agree with me. We can then proceed to rectify any horrors I've perpetrated on linearizability.

Hey, if necessity is the mother of invention, lack of clarity is the father, or at least the concerned neighbor!

Finally, I would say that, outside of databases, "atomic" does not mean "either occurs completely or not at all". It is used throughout a large body of literature, both old and new, to mean a statement about isolation, especially the phrase "atomic operation". Thus, whether you agree with the above or not, we do need to associate the term "atomic operation" with isolation somewhere (e.g. on the linearizability page), or we are not creating an encyclopedia, regardless of how nice it might be to unilaterally change the terminology.

Agree, database nomenclature is slightly different. Non-database "atomic operation" is essentially "indivisible"; not convinced "isolation" completely captures it, but it is certainly related.

(We could probably kill atomicity (disambiguation) and merge it with atomic (disambiguation), however, as I don't think "atomicity" is used outside of databases, just "atomic".)

Agreed. Lamport's use of "atomicity" needs to be forgotten in favor of "linearizable".

Many thanks for your patience!

P.S. You wrote "So why are you now complaining that the alternative definition I gave you does not match the definition you are used to???" I was hoping to understand why you wrote on Talk:Linearizability that "an atomic operation need not be linearizable to be useful". I'm still not sure if there is a useful, pre-existing definition of "atomic operation" (c.f. "atomic transaction") that would back up your statement, and wasn't intellectually satisfied by your non-standard alternative, for the reasons I gave: it doesn't appear that either an RCU read (that is, a group of primitive read operations executed in sequence on an RCU structure) or a per-thread counter read (again, the group of primitive reads) is (logically) atomic by the new definition, unless one arbitrarily constrains the sequence of updates one can perform so much that it hardly seems worth using the word "linearizable" at all in the definition, since you've ruled out the possibility of establishing inter-process orderings. The main problem with the string of discussion, from which I just quoted your reply, was that we each assumed contradictory facts about whether the examples given were or were not atomic by your new definition. That is, I wasn't complaining that it did not match, I was complaining it was useless ;) Threaded conversations are temporally confusing. Let us say no more about it. --Chris Purcell 18:48, 25 May 2006 (UTC)Reply

Given your nomenclature, I should have said "a sequence of operations need not be linearizable to be useful" or some such. And I agree that it is probably more confusing than useful to call an RCU read "atomic". Then again, as one of my colleagues used to say, "Only those who have gone too far can possibly know how far you can go". And this conversation was not all that bad. The very early conversations about RCU (before the concept of "grace period" had been codified) were something else altogether!  ;-)
In any case, thank you for putting up with my consistent abuse of your nomenclature! PaulMcKenney 01:10, 26 May 2006 (UTC)Reply

An example of a linearizable sequence of non-atomic operations is as follows:

Thread 1 non-atomically increments variable A.
Thread 2 non-atomically increments variable B.

This is a degenerate case, as there is no shared data, but it does meet the letter of the definition.

liburcu edit

This can be used not only in kernel, but in userspace too. liburcu `a5b (talk) 17:56, 5 November 2009 (UTC)Reply

I fixed this, adding librcu to both the intro and the list of references. PaulMcKenney (talk) 20:28, 18 April 2010 (UTC)Reply

rcu_dereference vs. rcu_fetch_pointer edit

I changed the rcu_fetch_pointer back to rcu_dereference in order to be consistent with all known RCU implementations. I do sympathize with the desire for a different name, however, there is an rcu_access_pointer API going into the kernel. It gives you the value of the pointer, but does not make it safe to dereference it. So a self-explanatory API might be something like rcu_fetch_pointer_and_make_safe_for_dereferencing.

In light of this example, I believe that we can all agree on the relative merits of just plain rcu_dereference. PaulMcKenney (talk) 20:28, 18 April 2010 (UTC)Reply

Citations need cleanup edit

The (new?) citations on this page need some cleanup to conform to WP:RS:

  1. "Introduction to RCU" is apparently self-published and thus not a reliable source by WP standards
  2. Citation format for some others needs tweaking.

There is a complete list on Wikipedia:Reliable_sources/Noticeboard#RCU_page_reliably_sourced?, which I went to first as I'm not au fait with WP's policies (I've seen complaints elsewhere about self-published and journal citations being "not reliable" recently). --Chris Purcell (talk) 12:21, 27 September 2010 (UTC)Reply

I updated to address most of the comments, however, Active Banana reverted the works. Suggestions? PaulMcKenney (talk) 20:43, 28 September 2010 (UTC)Reply

Hmm, I appear to have accidentally called down aggressive gnomes... sorry about that. I'd say:

  1. Keep discussing on WP:RSN, if there's any fruitful avenue to pursue there. I was under the impression anything which has been peer reviewed is by WP's rules acceptable, so journals and theses would be fine.
  2. AB's revert appears on the surface far too aggressive; I don't see why citations of works not even contributed to by yourself should have been removed.
  3. I assume the COI is Wikipedia:COI#Citing_oneself, which says the problem is "excessive" self-citation. Are there other works (perhaps ones you have referenced in your own papers) that could be cited to avoid claims of conflict of interest?

--Chris Purcell (talk) 08:23, 29 September 2010 (UTC)Reply

No problem... These things happen.

  1. I will keep monitoring WP:RSN. If there are no responses, my thought is for either you or I to post what we agree is the appropriate resolution.
  2. I also left a response on Active Banana's user page, no response yet. I will keep checking, and if there is no response by Friday Pacific time, I will take whatever action the two of us agree on.
  3. Good point! I could add a section on uses of RCU, which could pull in a number of other authors. The "history" section is a bit old, and updating that section would pull in a few more references. In addition, some of the entries have multiple publications, and I could cite all of them rather than the most important.

Seem reasonable?

PaulMcKenney (talk) 13:00, 29 September 2010 (UTC)Reply

Sounds reasonable to me --Chris Purcell (talk) 13:27, 29 September 2010 (UTC)Reply

Your rework looks good. --Chris Purcell (talk) 19:40, 1 October 2010 (UTC)Reply

Thank you, glad it helped! PaulMcKenney (talk) 20:09, 1 October 2010 (UTC)Reply

Patent number 5,727,528? edit

The article indicates that RCU is somehow covered by US Patent 5,727,528, but that patent's title is "Control apparatus and control method of internal combustion engine" and seems unlikely to be related.

I suspect the right number might be 5,727,209. It is referenced by one of the later patents. — Preceding unsigned comment added by 66.129.246.4 (talk) 00:09, 7 February 2013 (UTC)Reply

Your suspicion is correct, and I updated the page accordingly. PaulMcKenney (talk) 02:04, 9 March 2013 (UTC)Reply

Too Technical edit

I addressed this by adding diagrams of how RCU insertion and deletion work. PaulMcKenney (talk) 02:05, 9 March 2013 (UTC)Reply

Contradiction per RFST edit

User RFST asserted a contradiction: "contradiction-inline|reason=an implementation that entirely disregards requirements about atomicity and memory semantics makes nonportable assumptions about the underlying architecture, so this is probably misrepresented or misinterpreted (requirements are far cheaper and maybe implicitly met but not entirely absent), and it is contradicted by more accurate information later in the article|date=August 2013". This contradiction tag is attached to the following sentence: "Concurrent RCU readers can then continue accessing the old versions, and can dispense with the atomic operations, memory barriers, and cache misses that are so expensive on modern SMP computer systems, even in absence of lock contention."

The problem is not with the article, but rather with RFST's jump from a lack of "atomic operations, memory barriers, and cache misses" to "an implementation that entirely disregards requirements about atomicity and memory semantics". Instead, it is in fact RCU's careful handling of requirements regarding atomicity and memory semantics that permits it to dispense with atomic operations, memory barriers, and cache misses. The IEEE TPDS paper by Desnoyers et al. covers this in detail. In lieu of paraphrasing several pages of that paper into this Wikipedia article, I am simply removing the "contradiction" tag. If someone has other ideas about how to approach this, please let me know. PaulMcKenney (talk) 02:47, 27 August 2013 (UTC)Reply

Reading a pointer that can be written concurrently by other threads requires an atomic operation (in the basic sense that all bytes are transferred indivisibly), otherwise it's a data race. That reads of aligned variables are inherently atomic on most architectures is very convenient, because it means that atomic reads don't have to do anything special (except if you also want specific memory semantics), but that doesn't make them non-atomic ("dispense with"), and I would even expect software analysis tools to flag a data race if the read were not marked atomic. The text should instead state precisely what is being dispensed with, presumably specific costly memory semantics and/or read-modify-write operations (a third possible aspect of atomic operations); currently "memory barriers" is listed separately so probably the author meant specifically read-modify-write here? Similarly, it would seem that "consume" memory semantics are implicit on "modern CPUs" (discounting Alpha), but it's still formally considered something that might theoretically require a barrier, at least until a revision of the relevant standard texts reflects that and deprecates its use (which I expect to become an even bigger mess than use or non-use of acquire and release by programmers who only consider the x86 architecture). But what I meant most of all is the contradiction with "It also executes any needed memory-barrier instructions for a given CPU architecture." for rcu_dereference_pointer() further down, and also with its example implementation that clearly shows the use of memory barriers (implicit on x86, but explicit on other modern CPUs). There may have been no contradiction in the mind of the author, but an encyclopaedic text should not require previous expert knowledge and/or telepathic abilities, so that's where I still locate the problem. RFST (talk) 19:48, 31 August 2013 (UTC)Reply

I made three changes to emphasize "atomic read-modify-write instructions" rather that "atomic operations" and "memory-barrier instructions" rather than "memory barriers" (as 204.57.119.28, forgot to log in). Does that help? PaulMcKenney (talk) 21:32, 20 September 2013 (UTC)Reply

The former change is informative, the latter not so much (read-modify-write is implicitly sequentially consistent on x86). But let me ask you first: what's the role of "smp_rmb()" instead of "smp_read_barrier_depends()" in "rcu_dereference_pointer()" elsewhere in the article? It seems to imply acquire semantics, but that also requires a separate memory barrier instruction on Power Architecture (lwsync or perhaps isync), so that contradiction is still not resolved. I've found what appears to be a Linux kernel patch by you (http://lwn.net/Articles/94241/), with rcu_dereference() using smp_read_barrier_depends(). So which is it, and is the rcu_dereference() in that patch code merely the name used in Linux kernel code for the concept rcu_dereference_pointer() or is there a significant difference? RFST (talk) 06:09, 21 September 2013 (UTC)Reply

Ah, note the comment on the "smp_mb()" in rcu_dereference(), which says that it is not needed on most CPUs. Nevertheless, I updated the code sample to use smp_read_barrier_depends() and also added verbiage noting that this is an empty macro on all CPUs but DEC Alpha.

I am not sure where the rcu_dereference_pointer() came from. Either I made a typo or I missed yet another attempt to change the API by editing the wikipedia page. Sigh... I have changed them back to rcu_dereference(). And the wikiblame site shows that it was a typo on my part when reverting someone's earlier edit (and that earlier edit made it into the curriculum of at least one university, believe it or not). PaulMcKenney (talk) 18:48, 22 September 2013 (UTC)Reply

OK, so the explicit code below was not "more accurate" like I thought at first. I believe that clears up this issue. How about reverting the change from "memory barriers" to "memory-barriers instructions" (sic), because those barriers are likely to be implied on an architecture like x86? RFST (talk) 22:32, 22 September 2013 (UTC)Reply