Talk:High-availability cluster

Difference between Mirroring, Failover, Clustering and Load balancing?


What is the difference between Disk mirroring, Failover, High-availability clusters (a.k.a. Clustering?) and Load balancing?

I would like to see a concise comparison of concepts across all of these pages, as all of these concepts seem tightly bound, and perhaps some of them are identical.

--Eptin 22:00, 9 September 2007 (UTC)

They are all different but overlapping to some degree; a clear explanation would help. Let me review all the articles and see where to begin. Georgewilliamherbert 20:40, 10 September 2007 (UTC)

I added a Node reliability section to the article here on HA clustering, which should address some of your questions. To explain a bit here in summary:

Disk mirroring is a technology to help keep computers from failing. It's usually used in HA clusters, to make the individual computer nodes used as reliable as possible. It's often used by non-clustered systems, for the same reason.
Failover is the term for the process when a service is moved to another system, either due to a crash or due to manual intervention. HA clusters use failover as the way to get around system or application crashes.
Load balancing clusters are a different type of cluster, where the application can be run with many copies at the same time (like webservers) safely, and some type of system in front of it parcels out the work to be done. HA clusters are for when only one copy of the application can be running - if the one computer running the application fails, then it's moved elsewhere. Load balancing has several copies running at once. Any of them can serve a request, and if one crashes others will keep going.
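
As a rough illustration of the failover vs. load balancing distinction above, here is a toy sketch in Python. It is not how any real cluster manager works; the class names, node names, and the round-robin policy are all made up for the example.

```python
class FailoverCluster:
    """HA style: exactly one node runs the service at any time."""

    def __init__(self, nodes):
        self.healthy = list(nodes)
        self.active = self.healthy[0]  # the single running copy

    def fail(self, node):
        # Mark a node as crashed; if it was the active one, fail over.
        self.healthy.remove(node)
        if node == self.active:
            self.active = self.healthy[0] if self.healthy else None

    def handle(self, request):
        if self.active is None:
            raise RuntimeError("no healthy node left to run the service")
        return f"{self.active} handled {request}"


class LoadBalancedCluster:
    """Load balancing style: every healthy node runs a copy at once."""

    def __init__(self, nodes):
        self.healthy = list(nodes)
        self._counter = 0  # drives a simple round-robin choice

    def fail(self, node):
        # A crashed copy is just dropped; the others keep serving.
        self.healthy.remove(node)

    def handle(self, request):
        if not self.healthy:
            raise RuntimeError("no healthy node left to serve requests")
        node = self.healthy[self._counter % len(self.healthy)]
        self._counter += 1
        return f"{node} handled {request}"
```

In the first class a crash moves the one running copy to another node; in the second, requests simply stop being sent to the crashed copy.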

I hope this is helpful. Georgewilliamherbert 21:09, 11 September 2007 (UTC)

Article semiprotected for 1 week


New and IP editors are currently disabled from editing the article, as they keep restoring the spam-ful vendors section.

This is not OK behavior on Wikipedia - see WP:SPAM Georgewilliamherbert (talk) 02:46, 30 September 2010 (UTC)

Actually the article seems to have been copied from some Linux-HA manuals and is just too narrow in focus. It also has errors, like much of the series on cluster computing. History2007 (talk) 21:59, 28 December 2011 (UTC)

I think you may have the attribution backwards for Linux-HA manuals, though I haven't compared them to the article here recently. It was not written drawing from them.

I'm a major author of the article here and I've been doing HA clusters since the late 1990s, originally in Veritas VCS and Sun Cluster, and through to the current day with current HA technology. I've been a systems architect for longer than that. Can you specify what errors you think you see here and where the focus is too narrow? Georgewilliamherbert (talk) 19:46, 29 December 2011 (UTC)

I may get to do more on this at some point, but not immediately. However, at the very least this article should mention and explain terms such as "fencing", "checkpointing", "MTTR", "STONITH", etc. and their role in availability management. The term "heartbeat" appears in the article just once without any explanation of the concept whatsoever. Statements such as "many clusters consist of many more, sometimes dozens of nodes" just seem strange, given that many clusters now have thousands of nodes. Statements such as "HA clusters usually utilize all available techniques to make the individual systems and shared infrastructure as reliable as possible" are just not true, given that different clusters may use different techniques. And of course, there are "zero references". I think one can start by directly referencing Robertson's early paper, show the evolution of the ideas and then build up to modern HA architectures. And of course, inline WP:RS references are essential in Wikipedia. But I will stop now. History2007 (talk) 04:22, 31 December 2011 (UTC)

There are certainly more than zero references now. Inline references are nice, but somewhat of an academic exercise IMHO. That doesn't mean someone shouldn't add them, of course. The style guideline which over-emphasizes the requirement for them was written by an academic editor, and while not spurned or scorned by everyone else, is not slavishly followed.

I would caution against approaching this overly much as an academic exercise. Typical readers aren't CS students or industry practitioners.

That said, explaining terms would be a significant value-add to the article, the now-apparently-vanished explanation of heartbeat networks should reappear in some form, and the statement you cited above about "utilizing all available..." certainly leaves something to be desired. Most clusters implement slightly more than the minimum required techniques; very good ones implement well more than the minimum requirements but still don't strive for "all available", which would require numerous experimental and in many cases architecturally redundant or conflicting concepts.

Georgewilliamherbert (talk) 04:33, 31 December 2011 (UTC)
