Talk:Standard RAID levels/Archive 1

Latest comment: 8 years ago by Dsimic in topic Reducing seek time

Statistically independent

The article said that, due to the shared environment, the chance of failure for discs in a RAID would not be statistically independent. That's not true: though both will suffer the same impact on their failure rate, the chance of a single disc failing is still independent of another one failing.

Only if failure of one disc would increase the chance of failure in the other, that would make it statistically dependent.

For example, in standard conditions the chance of failure would be 5%, and in a damp environment it would be 10%. Then the chance of both discs failing would be 0.25% and 1% respectively. --Bur (talk) 14:30, 3 September 2008 (UTC)
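(For what it's worth, the arithmetic above can be checked in a few lines of Python; the 5% and 10% figures are the example's hypothetical rates, not measured values.)

```python
# Independent failures: P(both fail) = p * p, regardless of whether a
# shared environment raises each disk's individual failure rate.
p_standard = 0.05  # hypothetical per-disk failure chance, standard conditions
p_damp = 0.10      # hypothetical per-disk failure chance, damp environment

both_standard = p_standard ** 2  # 0.0025, i.e. 0.25%
both_damp = p_damp ** 2          # 0.01, i.e. 1%

print(f"{both_standard:.2%} {both_damp:.2%}")  # 0.25% 1.00%
```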

Actually the article mentions environmental conditions as one possible shared cause of failure, and lists a reference for it. Other things that were mentioned, and which you deleted, are common origin of the disks, and so forth. I believe deleting this paragraph was a mistake.

Hpa (talk) 23:32, 3 September 2008 (UTC)

Introduction

Admittedly, this page is likely to be accessed only by those with some prior knowledge of the subject, but even so, the page ought to begin with a more substantial introduction to orient the reader more clearly to the subject, in the manner of an encyclopedia entry. R.braithwaite 20:47, 30 December 2006 (UTC)

Granted this article is very lacking in an introduction, but doesn't RAID cover the basics of what RAID is? Cburnett 06:23, 31 December 2006 (UTC)
The article was split off in the first place because the original article was 213213246513 kb long! Sheesh. The whole point was to cut them down, not make each one as long as the last. Summarizing the section from the orig. article should be adequate.// 3R1C 15:20, 25 January 2007 (UTC)
I whole-heartedly disagree. Each RAID level could easily be made into its own article, and doing so should be encouraged. Each one made into its own article can then be summarized in a paragraph on this page with a {{main}} link to the article. Cburnett 15:35, 25 January 2007 (UTC)

I've added a draft for the RAID 4 missing section. Hernandes 21:14, 23 January 2007 (UTC)

Mixing RAID modes

Is it possible to mix RAID modes? I remember reading about RAID 1+0. Is it possible to use RAID 0 with any other mode? It seems like you'd have twice as many drives to support the addition of RAID 0. (So RAID 6+0 would need a minimum of eight drives?) --72.202.150.92 05:54, 30 January 2007 (UTC)

Absolutely. See nested RAID levels. Cburnett 06:01, 30 January 2007 (UTC)

RAID-6 Resource

I personally found the description of RAID-6 presented here to be rather lacking on how the double parity works. A PDF at http://www.infortrend.com/3_support/pdf/RAID6_White%20Paper.pdf seems to go into a more detailed explanation, but I'm not informed enough to add it here. Maybe someone with more knowledge would be able to help? If nothing else, it is a nice resource for others. Quad341 20:18, 27 February 2007 (UTC)

The infortrend link above seems broken; but this seems like a good resource for the dual parity concept: http://kernel.org/pub/linux/kernel/people/hpa/raid6.pdf. — Preceding unsigned comment added by 171.66.169.117 (talk) 01:06, 25 May 2012 (UTC)

left-symmetric and left-asymmetric

While setting up a RAID system recently, I wondered whether I should choose the "left-symmetric" vs. "left-asymmetric" option.

A quick Google led me to "Left-symmetric offers the maximum performance on typical disks with rotating platters." [1] which answered my immediate question, but didn't really explain what it is. Later I found an illustration that seems to explain the difference: "Left-symmetric and left-asymmetric algorithm are demonstrated in Figures" [2].

Using our current Wikipedia notation, "left-symmetric" and "left-asymmetric" look like this:

left-asymmetric:

 A1 A2 A3 Ap
 B1 B2 Bp B3
 C1 Cp C2 C3
 Dp D1 D2 D3
 E1 E2 E3 Ep

left-symmetric:

 A1 A2 A3 Ap
 B2 B3 Bp B1
 C3 Cp C1 C2
 Dp D1 D2 D3
 E1 E2 E3 Ep

If I understand correctly, our RAID 5 illustration currently shows the "left-asymmetric" variant, which apparently no one really uses; everyone uses "left-symmetric" because of its greater performance.

(By inspection, I can see that "left-symmetric" has better performance, because if you read any 4 consecutive blocks -- for example, if you read A2, A3, B1, B2 -- left-symmetric can read them all simultaneously from all 4 disks, while left-asymmetric maps 2 of them to the same disk -- in this case, A2 and B2 -- so the read from B2 must wait until A2 is done).
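(A small Python sketch of the two layouts, matching the tables above; the mapping function is my own reconstruction from the diagrams, so treat it as illustrative rather than authoritative.)

```python
def disk_of(block, n, symmetric):
    """Map a sequential data block to a physical disk for RAID 5 "left"
    layouts. n = number of disks; each row holds n-1 data blocks + parity."""
    row, k = divmod(block, n - 1)     # k-th data block within its stripe row
    parity = (n - 1 - row) % n        # "left": parity rotates right to left
    if symmetric:
        # left-symmetric: data starts just after the parity disk, wrapping
        return (parity + 1 + k) % n
    # left-asymmetric: data fills disks left to right, skipping parity
    return k if k < parity else k + 1

n = 4
consecutive = [1, 2, 3, 4]  # the blocks A2, A3, B1, B2 from the example
sym = [disk_of(b, n, True) for b in consecutive]
asym = [disk_of(b, n, False) for b in consecutive]
print("left-symmetric :", sym, "distinct" if len(set(sym)) == 4 else "collision")
print("left-asymmetric:", asym, "distinct" if len(set(asym)) == 4 else "collision")
```

Running this shows the left-symmetric mapping spreads the four consecutive blocks over four distinct disks, while the left-asymmetric one maps A2 and B2 to the same disk, exactly as argued above.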

Could someone confirm for me that everyone that uses RAID5 really uses "left-symmetric"? (And if not, why would anyone use anything else?)

Could someone update the RAID-5 section to illustrate the type of RAID5 that, in practice, everyone uses?

--68.0.120.35 01:29, 11 March 2007 (UTC)

RAID 6 explanation

The figure used in the explanation of RAID 6 seems to be very confusing, as it mentions two parity blocks per row, although the second syndrome is definitely not parity, as explained in HPA's paper cited below.

MJ 18:12, 29 May 2007 (UTC)

Some of the content in the RAID 6 section is copied verbatim from the following:

http://support.dell.com/support/edocs/storage/RAID/PERC6ir/en/HTML/en/chapter1.htm#wp1053533 —Preceding unsigned comment added by 97.81.31.51 (talk) 18:22, 13 July 2008 (UTC)

RAID 5 Performance

The article says that raid 5 performance is nearly as good as a raid-0 array with the same number of disks. This isn't quite accurate.

In theory at least, a RAID 5 array should give the same performance as a RAID 0 array with one disk fewer (which works out to the same amount of usable storage). Using a 4-disk RAID 5 array as an example, every 4th block on a disk is a parity block, which must be skipped when reading. Skipping a block takes the same amount of time as reading it; after all, the disk still has to spin the same amount. This means each disk spends only 3/4 of its time reading. Multiplied across all 4 disks (parallel reading), that works out to 3x single-disk speed, or the same as a 3-disk RAID 0 set.
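(The arithmetic in the paragraph above, sketched in Python with an assumed single-disk speed; this is a simplified model that ignores seek time and caching.)

```python
# Simplified model of the claim above: in an n-disk RAID 5, each disk
# spends (n-1)/n of its time reading data and 1/n skipping parity, so
# aggregate sequential read speed is about (n-1) single-disk speeds.
single_disk_mb_s = 100          # hypothetical sustained read speed (MB/s)
n = 4                           # disks in the RAID 5 array

raid5_read = n * ((n - 1) / n) * single_disk_mb_s   # 4 disks at 3/4 duty
raid0_read = (n - 1) * single_disk_mb_s             # 3-disk RAID 0 set

print(raid5_read, raid0_read)   # both work out to 300 MB/s
```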

If anyone can verify this it should be added to the page.

85.133.27.147 13:33, 15 June 2007 (UTC)

You are 100% right about reads, but RAID 5 has always had lacking write performance. When you are writing a small amount of data (less than a full stripe), RAID 5 has to: read the old data blocks, calculate parity, and write the new data blocks and the new parity block. In the disk arrays I've tested (EMC Cxx, IBM DS4xxx, Sun FLX), such writes have a practical 50% to 65% performance impact in terms of writes/second. Full-stripe writes are neither theoretically nor practically affected. --Kubanczyk 06:23, 25 June 2007 (UTC)
This section needs to be made consistent with the main RAID article. There, the performance problems of RAID5 are debunked as a "myth". Are there really no data available on this? Ketil (talk) 08:28, 9 February 2011 (UTC)
I would not be so quick to draw that conclusion, since the problem that writes may require multiple reads still exists; it is just somewhat masked by the use of cache to avoid reading blocks for parity generation. In a system with no cache, or only a limited one, write performance is poor. Depending on the algorithm used to calculate the parity, performance can get worse as more disks are used. Even though it is possible to recalculate the parity block from just the changed block and the previous parity block, in which case only two blocks need to be read and two written, reading and then writing the same block usually costs a full extra spin of the drive per commit compared to a regular single drive. So from an average of about 0.5 spins to write a block we get to 1.5 spins, basically tripling write access time! In practical applications this problem can be partially eliminated by using large amounts of cache (which is why Kubanczyk is not seeing any performance issues on the platforms he mentions!). Write-back cache (which should always go with a battery backup for the cache) further helps to eliminate these problems! --Fgwaller (talk) 04:56, 12 May 2011 (UTC)
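(A minimal sketch of the read-modify-write parity update described in this thread; the block values are arbitrary 4-bit examples.)

```python
# Small RAID 5 write: new_parity = old_parity XOR old_data XOR new_data,
# so only the target block and the parity block need to be read and
# rewritten (2 reads + 2 writes), not the whole stripe.
old_data, new_data = 0b1010, 0b0110
other_blocks = [0b1100, 0b0011]   # untouched data blocks in the stripe
old_parity = old_data ^ other_blocks[0] ^ other_blocks[1]

new_parity = old_parity ^ old_data ^ new_data   # read-modify-write path

# Cross-check against recomputing parity from the whole updated stripe.
full_parity = new_data ^ other_blocks[0] ^ other_blocks[1]
assert new_parity == full_parity
print(bin(new_parity))  # 0b1001
```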

Origin of watercooler raid.jpg?

Does anyone know the copyright status of this image? It's an easy to understand illustration of RAID levels, and might make a great addition to the article, possibly in a redone/computer generated format. Is it accurate too? http://www.edvt.net.nyud.net:8080/Pictures/raid.jpg Family Guy Guy (talk) 23:05, 14 February 2008 (UTC)

RAID 1 vs 0

The section on RAID 1 seems to imply that there is a data-access performance improvement for a RAID 1 configuration. I just built a new computer and configured RAID 0, believing that would yield the best performance (albeit with some degradation in reliability). Data access seems to be very fast, but would RAID 1 have been just as fast, with higher reliability? From the data published here, it seems like the best approach would be to add a third drive and move to RAID 5. Is that correct?

Dadman 5 (talk) 21:03, 22 April 2008 (UTC)

0 & 1 provide no data integrity.
5 does parity so you can at least detect data corruption (no requisite that periodic integrity checks actually be done though), but 5 comes with a computational cost (particularly for software raid).
What is "correct" depends on your goal. You speak of performance and reliability which are, [very] generally speaking, exclusive. Wikipedia is also not a help forum so I suggest finding a forum somewhere... Cburnett (talk) 00:01, 23 April 2008 (UTC)
Are you sure? Level 1 and 5 provide similar (i.e. no) integrity: they can both detect data corruption, but neither will point out where the corruption took place, or what the correct data should be. RAID 6 can do this, but data corruption is not high on anyone's agenda yet. Rightfully, disk failure has been the main concern, as losing a disk has much direr consequences than losing a bit. It would be easy and desirable though to have raid 6 solutions that periodically run integrity checks and fix corrupt data. See an informal discussion on so-called bit-rot here. This extra level of protection will make raid 6 more and more popular I hope.

116.89.224.136 (talk) 07:35, 3 October 2008 (UTC)

RAID 1 failure rate

Both disks failing within 3 years is not equivalent to data loss. For data loss, both disks must fail *at the same time*. Otherwise, the failing disk can be replaced and the RAID restored.

In summary, to calculate the probability of data loss, one has to make some other assumptions:

  • Time to replace a failed disk and restore the RAID: 24 hours
  • Failure within 3 years: 5% (uniform)
  • Disk failure independence

Giving:

  • P(overlap) = 2 / (365*3)
  • P(data loss) = 5% * 5% * P(overlap) = 0.0005%

For simplification, I considered the probability of a disk failing twice to be zero. —Preceding unsigned comment added by 85.62.99.137 (talk) 13:48, 25 February 2009 (UTC)
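(The figures above reproduce as follows in Python; every input is the poster's stated assumption, not a measured value.)

```python
# RAID 1 data loss requires the second disk to fail inside the other
# disk's rebuild window, per the assumptions listed above.
p_fail_3y = 0.05        # per-disk failure probability over 3 years
rebuild_days = 1        # time to replace a failed disk and rebuild
days_3y = 365 * 3

# Probability the two failures overlap: either disk may fail first,
# each opening a 1-day window out of the 3-year period.
p_overlap = 2 * rebuild_days / days_3y

p_data_loss = p_fail_3y * p_fail_3y * p_overlap
print(f"{p_data_loss:.6%}")  # ~0.000457%, i.e. roughly the 0.0005% quoted
```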

RAID controller

Does RAID require a dedicated RAID controller? I know BIOS support is a must, but I'm unsure about the controller part. --Ramu50 (talk) 04:18, 11 July 2008 (UTC)

RAID 5

How many additional drives are needed to implement RAID 5 on the system, if the implementation method of RAID requires an operating system to be in place on drive C:? --Kagemaru the Ninja of the Shadows (talk) 03:12, 6 December 2008 (UTC)

Source problems

I've done some reworking and cleanup of the section on "RAID 5 disk failure rate" because, while cleaning up other issues like letter-case and unnecessarily long sentences crammed with technical details, I discovered significant problems with several sources which seem to be endemic to this article (and indeed to many Wikipedia articles). I've fixed a few, but there remains much else to be done. Besides reformatting some of the sources in order to be able to see the prose well enough to clarify it:

  • I removed most of the latter part of the second paragraph (discussing "the relatively stagnant unrecoverable read-error rate") because it was hard to understand the point being made without reading all the sources. Those, in turn, left something to be desired: two blogs and two anonymous discussion-group postings. I left the first two in, as they may be salvageable or replaceable. But discussion-group postings are nothing but anonymous opinions and assertions and cannot be used as references. (See WP:RS.) In place of the second portion, I rewrote what I believe to be the point of that half-paragraph, and stuck {{fact}} tags on my totally unsourced assertions. (After all, officially I'm no more reliable than any other unpublished writer.)
  • I rewrote the text on STEC's Zeus SSDs because it talked about transactional performance in a section on failure rates. In fact, I completely changed the promotional sense of the info on STEC's Zeus (based on the corrected source for that info) because, ironically, much faster performance should lead to much smaller MTBF (at the same error rate, which the source suggests is the case). More fact-tags here as well. I'll leave the question of comparing sector failures (STEC's specs) to bit-read failures (Western Digital's specs) for someone else to write (backed by reliable, independent sources). I suspect the answer isn't as simple as a mere unit conversion.
  • Actually, now that I think about it, I didn't notice any of the comparisons made in our article between HDs and SSDs (related to RAID-5 failure rates) in the sources, only some single-drive performance comparisons from STEC. (I wasn't as thorough on this, so I could be wrong.) It's not enough to source the raw data; we need to source the conclusions drawn (including mine!) or delete them as unsourced original research.
  • I removed a comment stuck on one sub-heading:
===RAID 5 performance=== <!------[RAID] links here-------->
I couldn't figure out what this was supposed to mean. First of all, [RAID] is not a link. If they meant that [[#RAID]] (an internal link) elsewhere in the article is linked to this heading, they're wrong – it links to the heading/anchor "RAID", which doesn't exist. (I didn't find any such link anyway.) If they meant that one or more other articles were linking to this section (the usual rationale for such a comment), they should have made this clearer, so editors here would reconsider any attempt to change the heading. If they meant to insert another link to this heading (using a <span id> or <div id>), they forgot to add the tag (or someone deleted it without deleting the comment). In any case, an HTML anchor for "RAID" is a bad idea because the entire article is about "RAID", not just this section. Any link to this section can use regular wiki syntax: [[Standard RAID levels#RAID 5 performance]].

I also did some checking on references and cleaning them up, since bare links and external links labeled only with a rephrasing of the content are almost guaranteed to fail eventually. (Always include the title of the content, verbatim, as well as the work it belongs to, the website name or publisher, the author, and the date. Many of these are missing from web pages, but putting in as much of this as you can will help others (maybe even yourself) later when some of these links break.) But I've just run out of time after all the other research, so I'll have to leave it to other editors. Feel free to post to my talk page if you have any questions. ~ Jeff Q (talk) 01:40, 28 June 2009 (UTC)

Arggh! I just noticed that the section on RAID-5 performance has significant overlap with "failure rate", too! ~ Jeff Q (talk) 01:41, 28 June 2009 (UTC)

There is a problem with how raid 5 failure is described. in Section "RAID 5 usable size" it is stated 'One caveat is that unlike RAID 1, as the number of disks in an array increases, the chance of data loss due to multiple drive failures is decreased. This is because there is a reduced ratio of "losable" drives (the number of drives which may fail before data is lost) to total drives.' This in fact is wrong. The chance of data loss due to multiple drive failures is actually increased as you can only have 1 drive fail in a set (until the set is rebuilt) before the set fails. Hence a larger number of drives results in a larger chance of more than one drive failing before the set is rebuilt. Incidentally this also means there is an increase in potential failure of the set as the element size increases too. The statement containing 'ratio of "losable" drives' will also need to be rewritten I believe. 58.6.5.175 (talk) 00:15, 23 November 2011 (UTC) David Godfrey

It already says precisely what you have said. If there is a problem it is one of comprehension. Crispmuncher (talk) 13:58, 23 November 2011 (UTC).
One version says "increased" and the other "decreased", so how come you see these as saying the same thing, Crispmuncher? Of course the problem is one of comprehension; however, the issue is how to make that which is communicated clearer. LookingGlass (talk) 14:15, 13 May 2012 (UTC)

Raid 5 Gotchas

Now that disks are relatively large (1TB is not uncommon), there exists a substantial chance that while an array is being rebuilt, at least one single-byte error will occur on one of the remaining disks. The raid controller will then throw the 2nd disk out of the array. As a result, instead of silently corrupting a single byte (which we probably wouldn't even notice in most cases), the array will totally self-destruct.

This happened to me; it's especially problematic with a group of disks all from the same batch (as when one dies, the rest are potentially failing, and doing a complete read may trigger an error).

My suggestion is to use only Raid 1 or Raid 6; Raid 5 is now too risky for large arrays. For more, see eg: http://www.maximumpc.com/article/news/raid5_may_soon_be_obsolete —Preceding unsigned comment added by 87.194.171.29 (talk) 15:33, 29 July 2009 (UTC)

Error in RAID 5 calculation?

I'm really not an expert in RAID, but... The RAID 5 section states:

RAID 1 or RAID 1+0, which yield redundancy, give only s / 2 storage capacity, where s is the sum of the capacities of n drives used. In RAID 5, the yield is s x (n - 1).

The first part looks incorrect to me. If I understand RAID 1 right, it's the size of the smallest unit that should be considered, not the total size; that statement assumes equal-size disks. Instead, it should probably be something like

... give only s storage capacity, where s is the capacity of the smallest disk (or whatever word is right, since it also discusses RAID 1+0 - maybe array would be better than disk?)

Then the s formula of RAID 1 and the s * (n-1) formula of RAID 5 are consistent: with n = 2, then RAID 1 and RAID 5 have the same capacities.

Did I get all this right? (I didn't make the change in the text because I wanted someone else to confirm that I'm not crazy :P) --67.22.228.34 (talk) 16:10, 22 September 2009 (UTC)

I think there's an error in the calculation for capacity for RAID 5 in the initial summary. If s is the sum of the capacities, then four 1 TB drives will have s=4. Going by the calculation, the yield will be 4 *(4-1) = 12 TB! It's not s * (n-1) but (capacity of single disk) * (n-1). Since I'm not familiar with RAID very much, I'm not yet making a change. But I'm still fairly certain that I have it right. Opinions anyone? Bhagwad (talk) 16:25, 26 October 2009 (UTC)
I've made the changes... Bhagwad (talk) 16:29, 27 October 2009 (UTC)
Good change. --Enric Naval (talk) 01:24, 30 October 2009 (UTC)
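(The corrected formula from this thread, sketched in Python with hypothetical drive sizes; `raid1_capacity` and `raid5_capacity` are made-up helper names for illustration.)

```python
# Usable capacity is governed by the smallest drive in the set, not the
# sum of capacities: RAID 1 keeps one drive's worth, RAID 5 keeps
# (n - 1) drives' worth, spending one drive's worth on parity.
def raid1_capacity(drives):
    return min(drives)                      # n mirrors, one copy usable

def raid5_capacity(drives):
    return min(drives) * (len(drives) - 1)  # one drive's worth of parity

drives = [1000, 1000, 1000, 1000]  # four hypothetical 1 TB drives, in GB
print(raid5_capacity(drives))      # 3000 GB, not the 12000 the old text implied
print(raid1_capacity(drives))      # 1000 GB
```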

Outdated Information

According to the page, "JEDEC is preparing to set standards in 2009 for measuring UBER (uncorrectable bit error rates) and "raw" bit error rates (error rates before ECC, error correction code)". What was the result? —Shanesan (contribs) (Talk) 20:43, 21 April 2011 (UTC)

Orthography.

I don't know much about RAID levels so I came here to learn. However, it seems that it needs a little bit of work on the grammatical side. I'm not an expert so I'd rather leave it for someone who knows more about the subject. I just wanted to point it out. — Preceding unsigned comment added by 58.164.218.183 (talk) 05:39, 23 June 2011 (UTC)

RAID 5 latency formula

In the section on RAID-5 performance a formula is given to calculate the rotational latency of a raid-5 write request in terms of number of disks in the set. I added a "citation needed" since the formula is not commonly known (and conflicts with others I have seen). What is the origin of this formula? Can the author post the derivation? — Preceding unsigned comment added by 128.221.197.57 (talk) 12:46, 12 September 2011 (UTC)

The previous formula (1-2^-n) was incorrect: it calculates the relative frequency at which none of the platters require more than half a revolution, not the expectation value (mean) of the number of revolutions. I have corrected the formula, and added the formula for the median which is also quite informative. I don't have any textbooks handy so I couldn't find a reference, but these formulae _are_ well known, and easy to verify by numerical simulation even if you don't understand the probability & statistics. So, I don't think this formula really needs a citation—but it certainly wouldn't hurt if somebody comes across one. (It should be a textbook though: the incorrect formula has been on Wikipedia for some time now. That misinformation has probably propagated.) I'll keep an eye out for a reliable source. Gerweck (talk) 03:25, 23 April 2012 (UTC)

The N/(N+1) fraction you present is only correct for full stripe writes, which are uncommon in most computing systems, particularly so in OLTP workloads.
For the most common case, non-full-stripe writes, the formula you present is incorrect. The N/(N+1) fraction is the latency for an arbitrary RAID level where the ganged write size is N, not just RAID-5. For instance, in RAID-5 you must update 2 physical disks for 1 logical write, so N here is 2. Thus the average latency is 2/(2+1), or 2/3 of a revolution. For RAID-6 the ganged write size is 3, as 3 disks must be updated for each logical write, and so the average latency is 3/(3+1), or 3/4. This is the reason an average RAID-6 write is always slower than its RAID-5 counterpart, even on an idle array: R6 is slower (longer) by 3/4 - 2/3, or 1/12 of a revolution, about 0.67 ms on a 7200 RPM drive. In either case there is absolutely no dependence on the number of disks in the array; the latency depends only on the ganged write size (1 for RAID 0 and single disks, 2 for RAID-4, RAID-5, and RAID-1, 3 for RAID-6).
The exception is the very uncommon RAID-3 where N is the disk count in the array, because in an R3 write, each disk must always be updated, i.e. all writes are full stripe writes. Non-standard mirroring (RAID-1, e.g. in OPENVMS) allowing more than 2 disks is also an exception for the same reason and in the same way, i.e. N is the number of mirrors. — Preceding unsigned comment added by 168.159.213.53 (talk) 19:10, 14 June 2013 (UTC)
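(The N/(N+1) figure discussed in this thread is the expected maximum of N independent uniform random variables: if each of the N ganged disks independently needs a uniform fraction of a revolution to reach the target sector, the write completes when the slowest one arrives. A quick Monte Carlo check in Python; `mean_max_latency` is a made-up helper name.)

```python
import random

def mean_max_latency(n_disks, trials=200_000, seed=42):
    """Estimate E[max of n_disks uniform(0,1) latencies] by simulation."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(rng.random() for _ in range(n_disks))
    return total / trials

# RAID 5 updates 2 disks per logical write; RAID 6 updates 3.
for n in (2, 3):
    print(n, round(mean_max_latency(n), 3), "vs N/(N+1) =", round(n / (n + 1), 3))
```

The simulated means land on 2/3 and 3/4 of a revolution, agreeing with the closed-form N/(N+1) figure rather than the old 1-2^-n formula.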

Striping /RAID 0 failure

This section has been unsourced since March 2010. Anyone have a source? - CompliantDrone (talk) 23:54, 19 September 2011 (UTC)

Propose moving RAID 6/Double parity here from /wiki/Non-standard_RAID_levels

RAID 6 has become very common. Propose moving it here from Non-standard_RAID_levels. Thoughts? Ktoth04 (talk) 21:29, 8 September 2012 (UTC)

Image keys needed

Somebody desperately needs to add a sentence or three to each "RAID x" section, explaining what the "A1", "B3", etc, in each of the images mean exactly. Apparently the meaning is slightly different for each RAID level, but that's not obvious. Currently there are no explanations of the letter+number image labels anywhere on the page. 99.102.75.166 (talk) 19:20, 22 April 2013 (UTC)

Changing the RAID 0 and RAID 1 diagrams

I agree with a prior post that the numbering scheme on the diagrams varies subtly. I'm willing to fix this (and the textual references to the diagrams), but am a bit worried by the fact that many other docs point to the same diagrams. What is the best way to undertake the fix? Upload new files with different names? What happens to all the translated pages?

Thanks,

Alan G. Yoder, Ph.D. 23:13, 22 April 2013 (UTC) — Preceding unsigned comment added by Alanyoder (talkcontribs)

Is this really RAID0?

"RAID 0 can also be used with a single disk in cases where it is desired to have no RAID properties, but the disk controller requires mounting drives with RAID." – While this very well can be called RAID0 by the controller software, is it really? You could call it RAID1 just as well: striping without stripes is just like mirroring without a mirror. It's a matter of definition, but is it encyclopedic? No offense meant, I'm open for discussion. Zac67 (talk) 21:16, 25 June 2013 (UTC)

Somebody added that line a mere hour before you saw it. Since I'm inclined to agree with you and they did not cite their sources, I've removed the text. – voidxor (talk | contrib) 05:00, 26 June 2013 (UTC)

RAID-0 performance

The statement that RAID 0 yields negligible performance benefits in games is incorrect. Games regularly access large chunks of data, not just during level loads but also during gameplay; Oblivion is a perfect example. The difference in loading times for new area content between RAID 0 and non-RAID would most likely be more than 20 seconds each time, meaning it's essentially the difference between smooth, uninterrupted gaming and sitting and waiting for content to load, hardly "minimal". —Preceding unsigned comment added by 203.214.17.51 (talk) 05:19, 8 September 2007 (UTC)

Get more RAM. One point that seems to come up pretty frequently in these types of discussions is that RAID-0 boosts overall performance when there is a virtual memory bottleneck. However, you can solve this problem with RAM, which is cheaper, more robust, and easier to setup. Afterwards, the virtual memory bottleneck goes away and RAID-0 once again has no measurable effect. Of course, if you have reliable sources to the contrary, share them. Ham Pastrami 09:36, 8 September 2007 (UTC)
The 4th reference on the page is obsolete in my opinion. It's 5 years old, many things have happened in the industry since then, and RAID-0 is still a very good choice performance-wise (e.g. when high-density-platter hard drives or solid-state disks with a SATA-II interface are used). —Preceding unsigned comment added by Nashu2k (talkcontribs) 21:19, 15 June 2009 (UTC)
It is NOT a good choice. RAID 0 performance gains were a MYTH, and the only real winners in that setup were the HDD manufacturers.
One could argue that any RAID config is obsolete by today's standards (mid 2011). We now have a wide variety of external hard drives on the market, sky drives, multiple TB drives, SSD drives, instant streaming/syncing for backup possibilities, hard drives on our mobile devices, etc. People were initially setting up these RAID configs *BEFORE* this type of technology was around, in order to combat and deal with the complete lack of backup sources at the time.
But even back a decade ago, the whole RAID 0 performance gains were ridiculously minimal. Maybe 1 out of every 10 games you had would load you into a new map a few seconds sooner than the rest of the server.. and then you'd have to run all sorts of synthetic benchmarks to convince yourself you actually gained something out of that little venture.
Not to mention RAID 0 doubles your risk of data loss, since all you need is ONE of the two drives to fail, and you're out 100% of your data. If there is even a SINGLE person still interested in a RAID 0 configuration, I'd be dying to know WHY.
The answer to your question is: RAID 10 and RAID 100. — Preceding unsigned comment added by 208.85.211.196 (talk) 19:43, 26 April 2012 (UTC)
RE: RAID 0 for gaming: if you played games with large complex maps or a wide range of textures loaded often (in particular MMO titles, but also more demanding multiplayer FPS games like Battlefield), RAID 0 delivered a huge boon in loading performance, and it was certainly not a myth (quite apart from first-hand knowledge, it was well written up, with benchmarks, by major tech sites). It was most noticeable if you had a high-end CPU and plenty of RAM, as the bottleneck to level loading would then more obviously be the storage volume. I concur in thinking the section is inaccurate and also out of date now; with the advent of much more affordable SSD drives I don't think many gamers do it anymore (and that extends to non-gaming scenarios too). — Preceding unsigned comment added by 78.144.248.168 (talk) 18:29, 29 September 2013 (UTC)

RAID 0 statistics

There is unfortunately no evidence provided to suggest that the lifespan of a disk shortens simply because it is in an array, as the article would have you believe. The formula supplied suggests that with just two disks the chance of failure nearly doubles! Where does this pearl of wisdom come from? MTTF is MEAN time to failure, i.e. an average, and was never claimed to be a specific value.

Consider this: the MTTF for a two-disk array should in fact be double, based on the work done by the drives; each drive should read and write HALF of what a double-capacity single disk would have to do, since the work is split between them. At worst, the reliability of the RAID 0 drives will be the average of their original respective reliabilities, since MTTF is very specifically concerned with averages. There is absolutely no support at all for the baloney theorem proposed, and it's curious but not surprising that this article has been contested the way it has. —Preceding unsigned comment added by 75.156.38.133 (talk) 02:26, 22 March 2011 (UTC)

There is, without doubt, much wrong with this article. However, there is even more wrong with this person's argument. He clearly doesn't understand probability theory. 81.136.202.93 (talk) 13:18, 2 October 2013 (UTC)
I agree; he is incorrectly applying elementary arithmetic to probability theory. – voidxor (talk | contrib) 05:36, 4 October 2013 (UTC)
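(For reference, the standard independence calculation behind the "nearly doubles" claim that the replies above defend, using an assumed per-disk failure probability.)

```python
# RAID 0 fails if ANY member disk fails; assuming independent disks,
# P(array fails) = 1 - (1 - p)^n, which for small p is roughly n * p.
p = 0.05  # assumed per-disk failure probability over some fixed period
for n in (1, 2, 4):
    p_array = 1 - (1 - p) ** n
    print(n, round(p_array, 4))  # 1 -> 0.05, 2 -> 0.0975, 4 -> 0.1855
```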

RAID 1

The article claims that most RAID 1 implementations do not read from all mirrors simultaneously to improve performance. This seems incorrect. I know of no implementation that does not have better read performance with RAID 1 than non-RAID. The APPLE cite no longer exists. — Preceding unsigned comment added by 64.199.166.14 (talk) 15:19, 6 January 2014 (UTC)

Since the content was not cited, other than the broken link which I could not locate a copy of, I removed the dubious content. Thanks for the heads up! – voidxor (talk | contrib) 07:31, 9 January 2014 (UTC)

Summary table

I like the idea of a summary table, but what does "Increased Space?" mean? Shouldn't it be "Amount of Lost Space", as a percentage, maybe? Also, shouldn't "Improved Speed" be split into read and write speed? --Pmsyyz (talk) 15:10, 22 January 2014 (UTC)

I removed this table, as one just like it (but cited, more evolved, and less vague) appears in the RAID article. I don't know why we have two articles that compete with each other to list RAID levels. Also, several numbers in this table contradicted what's in the other table. – voidxor (talk | contrib) 06:30, 23 January 2014 (UTC)

Dynamic disk pooling

I think a discussion of dynamic disk pooling belongs in this article. This basically uses the RAID 5/6 technique on units smaller than a disk, for example with blocks assigned in 8+2 RAID 6 arrangement across a much larger number of physical disks. The blocks are somewhat randomly assigned to the disks, making rebuilding faster because the lost blocks on the failed drive can be distributed to multiple drives during the rebuild. Warren Dew (talk) 04:19, 15 April 2014 (UTC)

technical errors

The RAID-0 section has some errors such as saying that the seek time will be divided by the number of drives if you're reading less than the stripe size. It also is discussing an arbitrary array and suddenly starts assuming two disks. It sort of looks like someone who didn't understand the existing text added some sentences at the end of a previously good paragraph. — Preceding unsigned comment added by 62.232.55.118 (talk) 16:47, 5 June 2007 (UTC)

Over a single disk

'over a single disk' at the end - shouldn't this be 'acting/appearing as a single disk to the OS' or something? I didn't edit it myself as I wasn't confident I wouldn't miss some obscure edge case. Also, the 'reliability' link is of tenuous relevance at best. Grumpypierre (talk) 12:10, 21 May 2014 (UTC)

I fixed both issues. Thanks for pointing these out! – voidxor (talk | contrib) 05:14, 22 May 2014 (UTC)

Space efficiency of RAID 1+0

I changed the Space efficiency of RAID1+0 from 2/n to 0.5 and that of RAID1 from 1/n to 0.5, because with an increasing number of discs, the efficiency does not decrease. Wbloos (talk) 11:49, 12 November 2014 (UTC)

Hello! Actually, space efficiency for RAID 1 decreases with more disks added into the same RAID set; for example, space efficiency is 1/3 for a RAID 1 over three disks, and 1/2 in case of a more standard setup with two disks. Thus, 1/n is the right formula for RAID 1's space efficiency, where n is the total number of drives involved.
With RAID 1+0, which basically involves striped RAID 1 sets, things are just a bit more complicated, please allow me to explain. For the most basic RAID 1+0 setup, which involves a total of four drives arranged in two striped RAID 1 sets, space efficiency is 2/4; for a more complicated RAID 1+0 setup with a total of six drives arranged in three striped two-disk RAID 1 sets, space efficiency is 3/6. At the same time, nothing forbids three-disk RAID 1 sets to be striped; in that case, a total of six drives would yield a 2/6 space efficiency. Thus, the correct space efficiency formula for RAID 1+0 would be stripes/n.
Went ahead and edited the summary table that way. Hope you agree. — Dsimic (talk | contribs) 01:28, 13 November 2014 (UTC)
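The formulas discussed above (1/n for RAID 1, stripes/n for RAID 1+0) can be checked with a short sketch; the function names here are made up for illustration:

```python
def raid1_efficiency(n_disks):
    """Space efficiency of an n-way RAID 1 mirror: one disk's worth
    of usable capacity out of n total disks."""
    return 1 / n_disks

def raid10_efficiency(stripes, n_disks):
    """Space efficiency of RAID 1+0: one usable disk's worth of
    capacity per mirror set, i.e. stripes / n."""
    return stripes / n_disks

# Examples from the discussion above:
print(raid1_efficiency(2))      # two-way mirror -> 0.5
print(raid1_efficiency(3))      # three-way mirror -> 1/3
print(raid10_efficiency(2, 4))  # two striped 2-disk mirrors -> 0.5
print(raid10_efficiency(3, 6))  # three striped 2-disk mirrors -> 0.5
print(raid10_efficiency(2, 6))  # two striped 3-disk mirrors -> 1/3
```

Note how the six-drive layouts differ: three two-way mirrors yield 1/2, while two three-way mirrors yield only 1/3, which is why stripes/n rather than a constant is the right formula for RAID 1+0.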
Dsimic, you sound like an expert! Could I coax you into looking to where you originally learned such things, and sprinkling some much-needed references into the comparison table? – voidxor (talk | contrib) 07:04, 13 November 2014 (UTC)
Well, thanks, I know a thing or two. :) Over time I've played with different RAID setups and associated things such as speed/space efficiencies, so it's quite hard to provide a pointer to an isolated source of information. Just as you've pointed out, our comparison table chapter needs references badly, and I've tried to find good ones – but, for some reason, it's proving a difficult job. Maybe this chapter from the Operating Systems: Three Easy Pieces book could be used? I haven't read that chapter yet, but the few other chapters from that book I have read turned out to be quite good. — Dsimic (talk | contribs) 07:26, 13 November 2014 (UTC)
Just went quickly through that RAID chapter, and unfortunately it lacks a few things. For example, it assumes that a RAID 1 set always involves two drives, while, for example, Linux mdadm can use more than two:
A RAID1 array can work with any number of devices from 1 upwards (though 1 is not very useful). There may be times which you want to increase or decrease the number of active devices. Note that this is different to hot-add or hot-remove which changes the number of inactive devices.
I'll keep looking for good references, or better said, references that wouldn't require too much WP:SYNTH. — Dsimic (talk | contribs) 08:15, 13 November 2014 (UTC)

RAID 0 data redundancy

I just removed link to data redundancy because this has nothing to do with the redundancy in database design. If there are some storage experts, can you try to modify this text a bit more so that someone else does not link this incorrectly again. I am a database administrator, not a storage expert. Emilijaknezevic (talk) 17:14, 16 January 2015 (UTC)

Hello! Well, IMHO the trouble lies within the Data redundancy article itself, as there are no reasons to restrict the concept of data redundancy only to databases. For example, if there are two identical files on an SD card or wherever, that's still a clear example of data redundancy. That article should be called something like Redundancy (databases), and Data redundancy should be a disambiguation page. Thoughts? — Dsimic (talk | contribs) 18:03, 16 January 2015 (UTC)
Thanks for pointing this out – I've started adding a bit of general lead to the article, so we should be able to link it again. -- Zac67 (talk) 18:28, 16 January 2015 (UTC)
Thank you both Zac and Dragan. This is much better now. As for the "data redundancy" (noticed also changes there), in my opinion data redundancy in terms of database denormalization is a logically different thing than data redundancy for error correction. I would also add the third meaning of redundancy: duplication (i.e. mirroring, in storage terms). Disambiguation page or just sub-titles within the page will do. I might try to write something later; although I am rather reactive and not proactive in editing Wikipedia. Emilijaknezevic (talk) 20:52, 16 January 2015 (UTC)
Exactly, data redundancy in databases isn't there for the purpose of recovering from data loss; instead, it's a kind of anomaly in a database design, which may also sometimes be intentional. That should be clearly noted in the Data redundancy article. — Dsimic (talk | contribs) 21:02, 16 January 2015 (UTC)

Misleading terminology

This article uses misleading terminology for the levels. The standard, cited in reference [1], uses Raid-0, Raid-1, etc. While all three options (Raid 0, Raid0, and Raid-0) can be found, I think it is appropriate to follow the original standard. In checking for proper usage, I came first to Wikipedia and saw "Raid 0", but that is the least common usage in my experience; thankfully, Wikipedia had the link to the original standard, so I was easily able to check it and see that it uses "Raid-0". While this is a nit, we should propagate the correct usage. Rlhess (talk) 15:53, 14 September 2015 (UTC)

Hm, I wouldn't call that "misleading terminology". However, the non-hyphenated variants (RAID 0, RAID 1, etc.) are much more widely used, and that supports the decision to use them in the article. — Dsimic (talk | contribs) 01:19, 15 September 2015 (UTC)
Agreed there is nothing misleading in not using a hyphen and OBTW, it should always be RAID, never Raid. Tom94022 (talk) 05:36, 15 September 2015 (UTC)
WP:COMMONNAME says to use the most popular name and spelling. That, by far, would be "RAID 0". See the Google results lists: "RAID-0" (8.6 million results; this catches "RAID 0" too, which is how the overwhelming majority of results format it) and "RAID0" (a mere 0.9 million results). – voidxor 19:08, 15 September 2015 (UTC)

Reducing seek time

Reducing seek time is a measurable effect of RAID-0 (or just about any RAID level). If you look at a single disk:

seek - read - seek - read

and a RAID-0 with two disks simultaneously doing:

seek - read
seek - read

both seek times overlap and are thus just counted half (two seeks in the time window of a single one).

Plus I agree with Tom, it should always be RAID --Zac67 (talk) 16:49, 15 September 2015 (UTC)

Since RAID 0 is striping, I don't think there is any reduction in seek time, and if the disks are not synchronized it might actually increase access time. All disks in a RAID 0 implementation must complete their seeks, rotate, and transfer data before the access is complete. In the example above:
  • A single disk, one read:
  1. seek - rotate - read
Access complete----------^
  • A RAID-0 with two disks, one read:
  1. seek - rotate - rd
  2. seek - rotate ------ rd
Access complete-------------^
The illustration depicts unsynchronized RAID 0 taking longer than a single disk. Depending upon the degree of synchronization and the data transfer length, the time to complete the access might be longer or shorter than for a single disk.
RAID 1, mirroring, on the other hand can overlap seeks and will reduce access time for reads. Write performance will depend upon synchronization, buffering rules, etc. Tom94022 (talk) 17:16, 15 September 2015 (UTC)
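The rotational-latency argument above can be made concrete with a small Monte Carlo sketch. It assumes Tom's model: a striped access completes only when the slowest of the unsynchronized disks finishes, and each disk's latency is an independent uniform fraction of a revolution. The function name and trial counts are made up for illustration.

```python
import random

def mean_latency(n_disks, trials=200_000, seed=1):
    """Average rotational latency (in revolutions) when an access must
    wait for ALL of n unsynchronized disks, each with an independent
    uniform latency in [0, 1) revolutions."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(rng.random() for _ in range(n_disks))
    return total / trials

print(round(mean_latency(1), 2))  # single disk: ~0.50 revolutions
print(round(mean_latency(2), 2))  # 2-disk unsynchronized RAID 0: ~0.67
```

Under this model the expected wait for n disks is n/(n+1) revolutions, so a two-disk unsynchronized stripe waits about 2/3 of a revolution on average versus 1/2 for a single disk, matching the point that the lack of spindle synchronization adds latency to a single access.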
Right, RAID 0 can, at least in theory, reduce seek times if the data to be read is contained within a single chunk because it is the scenario in which it can overlap seeks. However, any interleaved requests for data not contained within a single chunk would prevent those benefits in seek times. It may sometimes reduce seek times for larger reads as well, but that wouldn't be guaranteed as something purely dependent on the positioning of drive heads and relative positioning of the platters. — Dsimic (talk | contribs) 19:34, 15 September 2015 (UTC)
Tom, why does the second read only take place after the first one? This wasn't even true for parallel SCSI (with disconnect/reconnect) and certainly isn't with SATA, SAS, or FC. My "seek" included the rotational latency; this isn't really significant in this case. Of course, you'd need overlapped I/O and a decent queue depth. Chances are that when one I/O has been initiated and is pending, another I/O can be started for the 2nd (3rd, ...) disk and both (all) seeks can overlap, which does reduce average seek time. Spindle synchronization is totally negligible once you have a decent amount of cache and command queueing. Dsimic, you're right in that RAID-1 can reduce seek times in any case while RAID-0 can only reduce it when the workload fits: a 2 disk array results in a 50% chance for a successive random access to go to the other disk, thus 100% seek time (first I/O) + 50% seek time + 50% no extra time (second I/O) → 150% seek time ÷ 2 I/Os = 75% seek time overall. A 3 disk array with 3 I/Os: 100% seek time (1st disk) + 33% seek time + 67% no time (2nd disk) + 67% seek time + 33% no time (3rd disk) → 200% seek time ÷ 3 = 67% seek time overall, and so on. For sequential disk access, seek times are (nearly) irrelevant; only throughput matters. --Zac67 (talk) 21:30, 15 September 2015 (UTC)
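The back-of-envelope figures above can be checked against a rough simulation. This sketch assumes the model described in the thread: each random I/O goes to a random disk; an I/O to an idle disk overlaps with pending seeks at no extra cost, while an I/O to an already-busy disk pays a full additional seek. The function name and parameters are invented for illustration, and this models a single burst of n I/Os, not a sustained queue.

```python
import random

def avg_seek_fraction(n_disks, n_ios, trials=100_000, seed=7):
    """Average fraction of a full seek time paid per I/O, assuming an
    I/O to an idle disk overlaps (no extra time) while an I/O to an
    already-busy disk must wait out a full additional seek."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        busy = set()
        cost = 0
        for i in range(n_ios):
            d = rng.randrange(n_disks)
            # The first I/O always pays a full seek; later ones pay
            # only when their target disk is already busy.
            if i == 0 or d in busy:
                cost += 1
            busy.add(d)
        total += cost
    return total / (trials * n_ios)

print(round(avg_seek_fraction(2, 2), 2))  # ~0.75, the two-disk figure above
```

The two-disk result reproduces the 75% figure exactly; for larger arrays, the exact expectation depends on how the successive collisions are modelled, so the simulation and the quick mental arithmetic can diverge slightly.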
I was trying to illustrate the effect of the lack of spindle synchronization on a single access. Overlapped I/O and queuing multiple accesses should increase overall system performance (IOPS), but they really can't do anything but hurt any one access. Assuming the same drives at the same location at the start of an access, the seek times will be essentially the same and the read time in the RAID 0 will be 1/2 that of the non-RAID single drive (same amount of data, two paths). Drive 1 of RAID 0 will have the "same" average latency (1/2 revolution) as the non-RAID Drive 1 (so I illustrated them identically), but if Drive 1 and Drive 2 of the RAID 0 system are not synchronized then on average Drive 2 will be 1/2 a revolution away from Drive 1, giving the additional latency to its read. Since rotational latency is long compared to seek time, this can be big. If one queues a number of accesses on either system, seek ordering reduces access time but, as I think they say in the communications space, only God can reduce latency :-). Even read-ahead doesn't reduce the latency; it just makes the read time even smaller. If someone has implemented seeking to an angular spot on a track (so as to truly minimize seek and latency) and the spindles are not synchronized, then it gets really interesting and longer, since the access mechanisms might not be going to the same place. Tom94022 (talk) 22:25, 15 September 2015 (UTC)
While command queueing and the resulting on-device optimization/reordering of command execution usually improve the overall performance, they also take away any reasonable chance of doing some math on the expected performance. Here are some benchmarks that compare the latency, IOPS and bandwidth of various RAID layouts: http://louwrentius.com/benchmark-results-of-random-io-performance-of-different-raid-levels.html. As visible in this chart, RAID 0 has worse latency than RAID 10, which pretty much translates to the associated seek times. — Dsimic (talk | contribs) 02:50, 16 September 2015 (UTC)
Thanks for the charts; I'll look at them later. FWIW, I suspect it is really hard with intelligent interfaces to separately measure seek time and latency time. I suppose you could do it statistically by repeating the same pairs of reads (having long seek distances and short data lengths) and ascribing the difference to rotational latency. The more I think about it, the more I conclude there is no seek overlap in any RAID except RAID 1 (and its variants), where the actuators can operate independently on reads. In all the other RAIDs, don't the actuators operate in synchronization? They all move simultaneously to their target blocks and then transfer in parallel. More later :-) Tom94022 (talk) 05:38, 16 September 2015 (UTC)
The test result "RAID 10 seems to have the upper hand in all read benchmarks" comports with my analysis that a mirror, in this case a striped mirror (a three-stripe mirror perhaps), can overlap the two stripes during reads while all other configurations have synchronized actuators. As expected, seek and rotational latencies were measured together. I don't think this was a particularly good test since it is a software RAID through a file system; as the author says, too many variables. Note the write I/O latency graph: the RAID 10 (two sets of data written) is about twice the time of RAID 0 (only one set of data written), which makes some sense. The effect of the software RAID is shown by RAID 5 and RAID 6 being even higher than RAID 10, when you would expect them to be somewhere in between except for the time to calculate the parity block(s). And the periodic spiking of RAID 6 I/O latency makes no sense from a RAID perspective; as the author says, "... with RAID 6, so there must be something wrong". Tom94022 (talk) 16:05, 16 September 2015 (UTC)
Actually, even RAID 5 and 6 have the possibility of doing some reads partially in parallel, as they leave one or two drives idle during each read operation, respectively, and those idle drives could be doing some other read operations at the same time. However, that's more of a theoretical possibility for doing reads in parallel, and equates to a real-life situation slightly similar to that of RAID 0; though, RAID 5 and 6 can't do any complete reads from the idle drive(s) no matter how small those reads are (so there's no possibility of reduced latency), which RAID 0 is capable of for in-chunk reads. Speaking of the benchmark, I agree that measuring everything through a file system mangles things a bit, but we should still get a good relative comparison of different RAID levels. Also, as you've noted, having a battery-backed controller might help in improving the performance of RAID 5 and 6, compared to the performance of a software RAID implementation that doesn't assume use of any special controllers. — Dsimic (talk | contribs) 16:39, 16 September 2015 (UTC)
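The "idle drive" observation above can be illustrated by computing which disk holds the parity block for each stripe. This sketch assumes the common "left-symmetric" parity rotation; actual layouts vary by implementation, and the function name is made up for illustration.

```python
def raid5_parity_disk(stripe, n_disks):
    """Disk holding the parity block for a given stripe, using the
    common 'left-symmetric' rotation (an assumption; RAID 5 layouts
    vary by implementation). That disk is idle during a read of the
    stripe's data and could in principle serve another request."""
    return n_disks - 1 - stripe % n_disks

# With 4 disks, the parity (idle-on-read) disk rotates 3, 2, 1, 0, ...
print([raid5_parity_disk(s, 4) for s in range(6)])
```

Because the parity position rotates across stripes, the idle drive is a different one for each stripe, which is why the potential for overlapping reads depends on which stripes the pending requests happen to touch.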