Talk:Storage virtualization

Latest comment: 2 years ago by Peter Flass in topic Seems very muddled

Article Improvements

  • Suggested structure.... deleted as it's now in place in the article ... Notice the complete lack of vendor and product references. Some of the storage virtualization technologies are still emerging and others are waning, so it would be hard to give a fair treatment to all. If anyone wants to take that on, be my guest! Plowden 20:57, 19 August 2006 (UTC)
I'll happily start drafting some content as outlined above. It may need some grammatical checking, but content-wise it's not a problem (I've been working in storage virtualization for 5+ years). I agree that leaving vendor specifics and product references out is a good idea. An unbiased view of the implementation approaches is also needed, as each vendor tends to sell theirs as the 'correct' approach. Baz whyte 11:38, 25 February 2007 (UTC)
Sure, go ahead. It looks quite good. I may also chip in sometimes. :) --soumসৌমোyasch 02:42, 26 February 2007 (UTC)
  • OK, so I've made a stab at this. Hopefully it's a vast improvement on what was there, but it still needs a few bits filled in, and most definitely needs grammatical checking / wikifying! Baz whyte 20:37, 26 February 2007 (UTC)
  • I have undone the edits that treated "data" as a plural noun. In the general case "data is" is almost always used rather than the strange-looking "data are". I understand that a single piece of data is a datum, but this is almost never used. "Data" as both singular and plural is the more common use in the IT industry, and as such "data are" does not read well. Baz whyte 18:52, 29 July 2007 (UTC)
  • A subjective statement such as '"data are" does not read well' doesn't stand up well against objective facts of grammar. In fact, to many, "data is" is just plain grating and distracting. Then again, I also tend to use predicate nominatives correctly, which sounds odd to some folks. In any case, most technical journals require that "data" be used as a plural noun (see the Wikipedia article on Data), and this is a technical article. Rknasc 20:08, 30 July 2007 (UTC)
  • OK, so I bow down. I've corrected all meta-data instances to use the correct grammar and replaced the plain "data is" with "information is". Hopefully everyone is happy :) Baz whyte 20:44, 3 August 2007 (UTC)
  • I see some issues with this article. I will try to classify them:

1. Section "Implementation Approaches" -> "Storage-device based" -> "Cons" "Storage utilisation optimised only across the connected controllers" - That's a pretty obvious fact, and it cannot really be called a disadvantage.

** What I meant here, taking the USP-V as an example, is that you cannot virtualize other storage connected to the SAN, only controllers connected to the internal switch ports of the USP. Baz whyte (talk) 21:04, 25 February 2009 (UTC)

If you do not connect a controller to the hosts by some means (FC SAN, iSCSI LAN, proprietary connection etc.), it will not be available anyway. I would say instead: "Depending on the specific storage virtualization product, some controllers may not be virtualizable because of interface incompatibility or design limitations". Most cases in which a storage controller cannot be virtualized fall into the following categories:

- The controller has a proprietary or obsolete interface not supported by the storage virtualization product (i.e. something other than Fibre Channel or Ethernet)
- The controller requires host-based drivers / management software for fail-over or other purposes
- The controller is "locked" by design to interoperate only with a specific (usually same-vendor) host (e.g. HP NonStop Modular I/O System, IOAM / FCDM)

"Replication and data migration only possible across the connected controllers and same vendors device for long distance support" - Absolutely not true. Most storage virtualization solutions allow all kinds of remote replication services in a purely heterogeneous environment. I would say this is one of their greatest benefits. To name just some of them: DataCore SANMelody / SANSymphony, Hitachi Data Systems USP V Enterprise Storage Controllers, EMC Invista, IBM SVC etc.

    • Again, I guess I meant that you need one of those controllers on the other side (in the case of the USP-V), whereas with DataCore/SVC-type appliances you just need the appliance. But I guess the difference is minimal. Baz whyte (talk) 21:04, 25 February 2009 (UTC)

"Downstream controller attachment limited to vendors support matrix" - That's a valid statement, but it belongs with the discussion of proprietary and non-industry-standard controllers. The phrase "downstream controller" is also a bit confusing, because the underlying controllers service both the downstream and upstream flows of data ;-). Storage virtualization vendors usually use the term "back-end storage".

"I/O Latency, non cache hits require the primary storage controller to issue a secondary downstream I/O request" - This is generally true, but the same happens in every storage controller on a cache miss: the I/O is requested from the slow hard drives. The main reason for the added latency is that there are more "hops" between the host and the "downstream" (back-end) storage controller. The I/O request goes from the host through the SAN switch to the virtualization appliance / server / controller. If its cache misses, a secondary I/O request is generated that goes through the SAN switch to the back-end storage controller, which checks its own cache, and if that misses as well, the request is serviced by the hard drives. The response returns through the storage virtualization appliance / server / controller. On the other hand, the added latency is negligible in 99% of cases (a few microseconds per I/O cache miss) compared with the typical I/O response time, which is in the range of milliseconds.

It is important to note another fact: the DataCore software runs on any x86 server and uses up to 80% of the available RAM for block-level caching. The latest 32-bit SANSymphony supports 32GB of cache, and this cache is an order of magnitude cheaper than the storage controllers' cache memory (10-20 times lower price in USD per gigabyte) and also considerably faster (e.g. DDR2 1066MHz vs. SDRAM 266MHz).

So, in a real-world scenario, virtualization usually improves overall performance considerably in terms of response time, IOPS and throughput (MB/s).

2. Section "Implementation Approaches" -> "Storage-device based" -> "Specific Examples"

- IBM SVC is not exactly "storage-device based", because it is an appliance that can be added to any SAN and can use a wide range of 3rd party storage controllers
- DataCore products are purely software and can run on any industry-standard x86 server with the Windows 2003 Server OS. They can use an extremely wide range of 3rd party storage controllers AND appliances, including any internal storage present in the x86 server they are running on. The supported interfaces to the back-end storage controllers are Fibre Channel (1/2/4/8Gb), iSCSI (10/100/1000/10G Ethernet) and InfiniBand 4x. The storage compatibility matrix is limited only to storage controllers capable of presenting LUNs to Windows 2003 Server through any of the previously mentioned interfaces, which practically means almost every storage controller on the market I can think of.
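The I/O latency argument under item 1 above (an extra hop on every request, offset by an extra cache in front of the back-end controller) can be made concrete with a small expected-value model. This is only a sketch: the `Stage` type, the function name and every number in it are illustrative assumptions, not measurements from any product.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """One caching stage on the read path (hypothetical model)."""
    name: str
    access_us: float   # cost paid by every request that reaches this stage
    hit_ratio: float   # fraction of those requests served from its cache


def expected_read_latency_us(stages, disk_us):
    """Expected latency of one read that walks the stages in order.

    A request pays each stage's access cost when it reaches that stage.
    On a cache hit it stops there; if every stage misses, it also pays
    the disk service time.
    """
    expected = 0.0
    p_reach = 1.0  # probability the request gets this far
    for stage in stages:
        expected += p_reach * stage.access_us
        p_reach *= 1.0 - stage.hit_ratio
    return expected + p_reach * disk_us


# Illustrative numbers only (assumptions, not vendor data).
backend = Stage("back-end controller", access_us=200.0, hit_ratio=0.5)
appliance = Stage("virtualization appliance", access_us=100.0, hit_ratio=0.6)
cacheless = Stage("appliance without cache", access_us=100.0, hit_ratio=0.0)

direct = expected_read_latency_us([backend], disk_us=5000.0)
virtualized = expected_read_latency_us([appliance, backend], disk_us=5000.0)
extra_hop_only = expected_read_latency_us([cacheless, backend], disk_us=5000.0)

if __name__ == "__main__":
    print(direct, virtualized, extra_hop_only)
```

With these made-up inputs the cacheless appliance is strictly slower than direct attachment, while the caching appliance is faster on average. Which case applies in practice depends entirely on the real hit ratios and hop costs.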

3. Other suggestions for improvement

- A more detailed explanation of the Thin Provisioning / Dynamic Provisioning features provided by most storage virtualization products (more than 10 vendors currently). Thin provisioning is cute, but it has drawbacks as well: deleting files from a virtual volume (VLUN etc.) DOES NOT reclaim / free up data blocks on the back-end storage. It can't by design, because all solutions on the market work below the host OS level and DO NOT understand what is going on in the file system of a particular host. If you delete a file, only the file system of the host knows which blocks are logically free. The fact that some file systems or 3rd party utilities can zero out the blocks once occupied by deleted files does not help. A block full of zeroes is still a block with data. Unfortunately, there is no "NULL" or "EMPTY" value in mathematics, and no file system can write a "NULL" or "EMPTY" value to a storage block. This is done by the various RDBMS...
- Addition to the "Capabilities" section - Interface bridging. Almost all available solutions can bridge Fibre Channel SANs to iSCSI LANs regardless of the hosts and the back-end storage controllers.
- Addition to the "Capabilities" section - Vendor / storage array generation bridging. Try to implement replication of volumes between IBM DS400 and EMC CLARiiON storage controllers, for example. No way, unless you virtualize them first...
- Addition to the "Capabilities" section - Policy-based Quality of Service. Hitachi and DataCore support it in their latest products. Pooling is one thing; setting IOPS and throughput QoS policies per VLUN / storage pool is a much bigger thing.
- Addition to the "Capabilities" section - Continuous Data Protection / Storage Transaction Journaling. Currently this feature is truly supported only by the DataCore Traveller add-on. This allows in-band synchronous replication through Fibre Channel to a CDP server, which logs all block-level transactions in order to a LOG database (just like Oracle or MS SQL RDBMS do). This gives you the possibility to select any past moment in time (even years ago, to within a minute) and then generate a 100% consistent virtual representation of a CDP-protected VLUN containing the data as they looked at the selected moment. To achieve consistency, the system looks for the last successful commit flag. This works for any VLUN, regardless of the file system or the host that uses the LUN...
- There are even more things to include, but I am running out of time now.
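The thin provisioning drawback described in item 3 can be illustrated with a toy model. This is a minimal sketch assuming an allocate-on-first-write volume that sees only block writes and never the host's file system metadata. `ThinVolume` and `HostFileSystem` are hypothetical names, not any product's API.

```python
class ThinVolume:
    """Toy thin-provisioned volume: a back-end block is allocated on first write."""

    def __init__(self, virtual_blocks, block_size=512):
        self.virtual_blocks = virtual_blocks
        self.block_size = block_size
        self._allocated = {}  # virtual block number -> stored bytes

    def write(self, lba, data):
        if not 0 <= lba < self.virtual_blocks:
            raise IndexError("block address outside the virtual volume")
        if len(data) != self.block_size:
            raise ValueError("writes must be exactly one block long")
        # The volume cannot tell "real" data from zero fill: it just stores it.
        self._allocated[lba] = bytes(data)

    def read(self, lba):
        if not 0 <= lba < self.virtual_blocks:
            raise IndexError("block address outside the virtual volume")
        # Never-written blocks read back as zeroes without consuming space.
        return self._allocated.get(lba, bytes(self.block_size))

    def allocated_blocks(self):
        return len(self._allocated)


class HostFileSystem:
    """Toy host-side file system: deletion only updates its own metadata."""

    def __init__(self, volume):
        self.volume = volume
        self.files = {}  # file name -> list of block numbers

    def create(self, name, blocks, fill=b"\xaa"):
        for lba in blocks:
            self.volume.write(lba, fill * self.volume.block_size)
        self.files[name] = list(blocks)

    def delete(self, name, zero_out=False):
        blocks = self.files.pop(name)  # the host now considers these free
        if zero_out:
            for lba in blocks:
                self.volume.write(lba, bytes(self.volume.block_size))


if __name__ == "__main__":
    vol = ThinVolume(virtual_blocks=1000)
    fs = HostFileSystem(vol)
    fs.create("report.doc", range(4))
    fs.delete("report.doc", zero_out=True)
    print(vol.allocated_blocks())  # still counts the zero-filled blocks
```

The sketch models only the behavior described in the comment: the volume has no way to learn that the host considers the blocks free, and a zero-filled block is stored like any other write.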

Please provide some feedback, and I can then make some edits to the article.

10th of May 2008 - Vasko (preceding unsigned comment added by 213.240.236.130 (talk) 21:53, 9 May 2008 (UTC))

Needing work

Host based - needs more detail on the various styles of host-based storage virtualization
  • Almost all commercial OSs ship with a stripped-down version of Veritas (Symantec) Volume Manager, which enables very basic storage virtualization at the host level, with capabilities including, but not limited to, RAID and volume management.
  • Block vs NAS - needs completing
    NAS
  • NAS, or Network Attached Storage, is a dedicated file-serving computer whose primary function is to store and retrieve files in response to clients, which most often connect through an Ethernet LAN.
  • Since the NAS box serves files, most NAS boxes run a file system locally, such as WAFL in NetApp appliances.
  • Mostly implemented in environments where file sharing is a dominant feature and file storage consolidation becomes a priority.
  • NAS is preferred when applications are concerned primarily with file-level operations and have an inherent tolerance for delays in file operations.
  • Relatively cheaper than a Fibre Channel SAN solution because it leverages existing Ethernet LAN infrastructure for transport.
  • SAN

    Will add soon :)

    John.joji (talk) 16:05, 31 August 2008 (UTC)

    Thoughts?

    Vasko, I added a couple of comments above to clarify, but otherwise feel free to make amendments.

    I am concerned that this page is reverting to its old state, which just listed all of the vendors doing SV products. The point of the rewrite was to remove ANY vendor-specific references and provide an independent view of what storage virtualization is and what approaches can be taken to implement it, NOT a list of products that can do it. I propose we remove the spam that is the lists of products. Otherwise it ends up with vendor X coming along, saying "we do that", and adding a reference to their product. If nobody objects I will remove them in a week or so. Baz whyte (talk) 21:04, 25 February 2009 (UTC)

    Storage virtualization is the apparent pooling of data from multiple storage devices, even different types of storage devices, into what appears to be a single device that is managed from a central console.
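    The pooling idea in the definition above can be sketched as an address mapping. This is a minimal illustration assuming simple concatenation of devices into one virtual block range. Real products may stripe, mirror or tier instead, and `StoragePool` and the device names are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


class StoragePool:
    """Presents several devices as one contiguous virtual block range."""

    def __init__(self, devices):
        # devices: iterable of (name, size_in_blocks) pairs, in pool order
        self.devices = list(devices)
        # starts[i] is the first virtual block that lives on device i;
        # the final entry is the total pool size.
        self.starts = [0] + list(accumulate(size for _, size in self.devices))

    @property
    def total_blocks(self):
        return self.starts[-1]

    def resolve(self, virtual_lba):
        """Map a virtual block number to (device name, block on that device)."""
        if not 0 <= virtual_lba < self.total_blocks:
            raise IndexError("virtual block outside the pool")
        index = bisect_right(self.starts, virtual_lba) - 1
        name, _ = self.devices[index]
        return name, virtual_lba - self.starts[index]


if __name__ == "__main__":
    pool = StoragePool([("array_a", 100), ("array_b", 50), ("jbod_c", 200)])
    print(pool.total_blocks, pool.resolve(120))
```

    The host addresses one device of `total_blocks` blocks, and only the pool knows which physical device actually holds a given block.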

    Seems very muddled

    This article seems a bit confused. It never gives a good overview or introduction to the topic. There's a lot of material here, but its organization is unclear. Peter Flass (talk) 15:59, 11 May 2021 (UTC)

    Other concepts

    This page talks about virtualizing subsystems and networked devices. Is there a place to talk about virtualizing a single device? CMS minidisks were an early example of making a single physical disk look like multiple smaller devices. I would think that PC-DOS partition tables (Disk partitioning) are later examples of the same. Then there are RAM drives, which also implement a virtual disk. Peter Flass (talk) 15:57, 11 May 2021 (UTC)
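    The single-device case raised above is the inverse mapping of pooling: one physical disk is carved into several smaller virtual disks. The sketch below assumes contiguous extents allocated in order. It is not the actual CMS minidisk or PC-DOS partition table format, and `VirtualDisk` and `carve` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualDisk:
    """A contiguous slice of one physical disk, addressed from block 0."""
    name: str
    start: int    # first physical block of the slice
    length: int   # number of blocks in the slice

    def to_physical(self, block):
        """Translate a block number inside the slice to a physical block."""
        if not 0 <= block < self.length:
            raise IndexError("block outside this virtual disk")
        return self.start + block


def carve(disk_blocks, sizes):
    """Split a disk of `disk_blocks` blocks into slices, in the order given.

    `sizes` maps a slice name to its length in blocks.
    """
    slices = {}
    next_free = 0
    for name, length in sizes.items():
        if next_free + length > disk_blocks:
            raise ValueError("not enough space left on the physical disk")
        slices[name] = VirtualDisk(name, next_free, length)
        next_free += length
    return slices


if __name__ == "__main__":
    disks = carve(1000, {"user_a": 300, "user_b": 500})
    print(disks["user_b"].to_physical(0))
```

    Each user of a slice sees a small disk that starts at block 0, and the bounds check keeps one slice from reaching into its neighbour.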