Talk:Pairing heap
This article is rated Start-class on Wikipedia's content assessment scale. It is of interest to the following WikiProjects: | ||||||||||||||||||||||||||||
|
It is requested that a computing diagram or diagrams be included in this article to improve its quality. Specific illustrations, plots or diagrams can be requested at the Graphic Lab. For more information, refer to discussion on this page and/or the listing at Wikipedia:Requested images. |
I think some explanations need refinement
edit- Isn't the definition supposed to be: empty heap | root + list of non empty pairing-heaps?
- Not specifying that the sub-heaps in the children-list cannot be empty-heap is kinda misleading.
- seems that merge is a symmetric operation, but not associative. so if we merge a list of sub-heaps in groups of 3 (or more), and then merge the resulting list - it would matter which direction we did each part. but merging in pairs is symmetric, so what does it matter in merge-pairs() whether we go left-to-right or otherwise on each level?
Open Question is Already Solved
editThe article states that:
"Moreover, it is an open question whether a o(log n) amortized time bound for decrease-key and a O(1) amortized time bound for insert can be achieved simultaneously.[8]"
This question appears to be solved. For example, Amr Elmasry in his TALG'17 paper "Toward Optimal Self-Adjusting Heaps" gives a data structure with O(1) insert and O(log log n) decrease-key.
--2001:62A:4:413:5D51:AF10:173B:A57C (talk) 12:41, 15 November 2018 (UTC)
- Thanks; the article has been updated. 167.88.12.231 (talk) 05:09, 2 July 2019 (UTC)
The "Often faster than array-based heaps" claim
editIn the text it is currently stated "that pairing heaps are often faster in practice than array-based binary heaps and d-ary heaps". When I read this yesterday I found this to be quite a strong and surprising claim so I checked the 3 cited sources. My conclusions was that the claim seems not justified and I removed it. Unfortunately this change was reverted.
Here I want to describe again in more detail why I believe that the current claim is unsourced.
Lets me focus on the most recent paper by Larkin, Sen, and Tarjan (2014) first because it seems to be of excellent quality, up to date and most relevant.
If you read that paper you see that the pairing heap wins only 9 out of 41 of the performed benchmarks while the implicit d-ary heaps wins 17 out of 41. At the end of the paper they just conclude that different data structures are favorable in different situations. Our strong statement certainly cannot be justified based on that.
The reverter of my change suggested the wording "sometimes" instead of "often" as a compromise. I see one more reason though why we should just remove the claim entirely: What does "array-based" mean exactly and is the implementation of the implicit heaps in the sources of that form?
In the last paper I think not because it actually uses pointers to individally managed memory nodes. This is done to keep track of nodes in the data structure and to support the "decrease-key" operation on them at a later point. However this obviously negates some of the cache related benefits and I argue that this is not truely "array-based" anymore. We don't define what this means in the Wikipedia article but I am pretty sure most people imagine an entirely implicit data structure without auxillary data and pointers. The linked source also benchmarks "truely" implicit array-based d-ary heaps (which they called "implicit_simple_n") but they can only participate in some benchmarks in which pointer stability is not needed. However in ALL 9 benchmarks this variant paricipated it was also the fastest and in particular faster than pairing heaps. Sometimes significantly so like in the heap sorting benchmark.
I also checked out the two other linked sources. The second one by Moret and Shapiro does indeed state that pairing heaps were faster than binary heaps for them. However this was 30 years ago when the computing landscape was quite different and we do not know to which extent their implementation was "array-based". In contrast to the newer paper they did not publish source codes or timings.
Therefore I propose to remove the "pairing heaps are often faster in practice than array-based binary heaps and d-ary heaps" statement again entirely.
2A02:810D:4BC0:33B8:64EA:C348:8F07:DFBB (talk) 22:53, 2 September 2020 (UTC)
- I don't agree with removing the statement entirely. The paper by Moret and Shapiro is somewhat old, but is not contradicted by more recent works I'm aware of. Larkin found mixed results, and we should say that; they also found that the best results came from dropping the node-tracking functionality needed for things like decrease-key, and we should definitely say that, because dropping that functionality is incompatible with a lot of the 'normal' things one would use a heap for, like Dijkstra's algorithm. It's clear that if you're doing a bunch of inserts and delete-mins, an "implicit simple" binary heap will beat all contenders. But in circumstances where you need to decrease or delete nodes, the latter two papers both report that pairing heaps often won out. Sneftel (talk) 11:47, 3 September 2020 (UTC)
- Oh, incidentally, I found nothing worth reporting in the Stasko&Vitter paper, and it sounds like you came to the same conclusion. Independent of everything else, I'd suggest we remove the reference to it. Sneftel (talk) 12:06, 3 September 2020 (UTC)
- Also also, just to further muddy the pond, note https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.117.9380&rep=rep1&type=pdf . They were quite negative about binary heaps, particularly small ones (but in 1987, meaning they're even more out of touch with modern CPUs than Moret&Shapiro). Nevertheless, their conclusion - "It depends on whether you need decrease-key" - is more clearly articulated. Sneftel (talk) 12:33, 3 September 2020 (UTC)
- I agree that that dropping node-tracking/decrease-key functionality is a key point in the whole performance discussion and we should state that. The "array-based" wording is my biggest concern with the current statement. I don't disagree with the later statement on Wikipedia that pairing heaps are "almost always" faster than pointer-based heaps. "Almost always" is again quite a strong wording but compared to pairing heaps the paper does show that they are pretty competitive.
- I am not an expert specifically in the usage of heaps but I question to which extent decrease-key is required to do 'normal' heap things. The heap implementations I have personally used in C++ do not support this and in particular the STL's "std::priority_queue" does not. Regarding Dijkstra's algorithm you can trivially reword it to work without node-tracking and/or decrease-key functionality. This can be done by always adding new nodes instead of adjusting the old ones and simply ignoring later occurrances when popping. Unfortunately this is not used in the Larkin, Sen and Tarjan paper nor is it explicitely described on the Wikipedia article on Dijkstra's algorithm as far as I can see. There is just a small unspecific note "Other data structures can be used to achieve even faster computing times in practice." with a reference to a paper in which the authors found that Dijkstra's algorithm was fastest with binary trees and no decrease-key. I cite: "Overall, our results show that using a standard priority queue without the decrease-key operation results in better performance than using one with the decrease-key operation in most cases."
- This paper also says about the binary heap without pointer stability/decrease key: "This is a completely array-based heap." This implies that according to their opinion the "implicit_n" implementation by Larkin, Sen and Tarjan isn't really "array-based".
- I agree that the paper by Stasko and Vitter is of little relevance to the statement. We probably should reference your paper and https://www3.cs.stonybrook.edu/~rezaul/papers/TR-07-54.pdf instead.
- I decided to add a short description of the decrease-key-less approach to the Dijkstra's algorithm article where it very vaguely just mentioned the "other data structures" before. 2A02:810D:4BC0:33B8:5408:DD09:5FC0:F54F (talk) 20:18, 3 September 2020 (UTC)
- In the case of Dijkstra's, admittedly the inferior asymptotic time consideration from not supporting decrease-key is largely theoretical; "normal" graphs don't lead to many decrease-keys. There are other algorithms like Bentley-Ottmann where decrease-key is more frequent, and that's where I've personally seen pointer-based heaps show a clear benefit over binary heaps.
- I think we're pretty much on the same page at this point... I put together an edit, what do you think? Sneftel (talk) 09:34, 4 September 2020 (UTC)
- I decided to add a short description of the decrease-key-less approach to the Dijkstra's algorithm article where it very vaguely just mentioned the "other data structures" before. 2A02:810D:4BC0:33B8:5408:DD09:5FC0:F54F (talk) 20:18, 3 September 2020 (UTC)
- The changes do look good to me. Thanks for doing the work. I would maybe mention the general d-ary heaps explicitely in the first sentence too but it's not crucially important to the point. 2A02:810D:4BC0:33B8:B8E0:D0DD:67A8:DFB4 (talk) 15:14, 4 September 2020 (UTC)
Pointer-based implementation
editThe statement "representing the children of a node by a singly-linked list: a pointer to the node's first child, one to its next sibling, and one to its previous sibling (or, for the leftmost sibling, to its parent)" is contradictive. Nodes in singly-linked lists do not have pointers to their previous member. Either, we change the wording to "doubly-linked list" or we switch the description with that of the alternative with the end-of-list flag. We might also add a link to Left-child right-sibling binary tree since this covers the idea in more detail. Albrecht J. (talk) 08:22, 5 November 2020 (UTC)
- I looked into who would add such blatantly self-contradictory information, and whaddaya know, it was me. I'd favor your first idea - switching to "doubly-linked list" in that sentence and then describing the alternative, as the paper does. Note that Left-child right-sibling binary tree is not *quite* enough: there has to be a parent pointer somewhere in order to implement decrease-key. Sneftel (talk) 11:07, 5 November 2020 (UTC)
- I changed "singly-linked" to "doubly-linked" because that's a no-brainer, and added a mention of the Left-child right-sibling binary tree. I think it could use some better wording from a more experienced editor though, since when talking about representing one tree as a different tree it's hard to avoid confusion. Olus2000 (talk) 07:55, 28 March 2024 (UTC)