Backup was a good articles nominee, but did not meet the good article criteria at the time. There are suggestions below for improving the article. Once these issues have been addressed, the article can be renominated. Editors may also seek a reassessment of the decision if they believe there was a mistake.
September 10, 2007: Good article nominee, not listed
WikiProject Computing / Software / Hardware / Security (Rated C-class, Mid-importance)
This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as C-Class on the project's quality scale.
This article has been rated as Mid-importance on the project's importance scale.
This article is supported by WikiProject Software (marked as High-importance).
This article is supported by Computer hardware task force (marked as High-importance).
This article is supported by WikiProject Computer Security (marked as High-importance).
WikiProject Digital Preservation (Rated C-class)
This article is within the scope of WikiProject Digital Preservation.
This article has been rated as C-Class on the quality scale.



"Archive file" as a term used in this article

The old lack-of-common-terminology problem in this article, especially in the "Enterprise client-server backup" section, has surfaced again. I added, as the second paragraph in the lead of the Archive file article, the following (I have changed the formatting of the refs): "The word "archive" in this term of art goes back in computing at least as far as Multics. However, that use of the word conflicts with its non-computer definition: "A place or collection containing records, documents, or other materials of historical interest: old land deeds in the municipal archives". What is stored in an archive file is not necessarily old or of historical interest. Consistent with the non-computer definition, several enterprise client-server backup applications (which do not use the term "archive file") use the term "archiving" to describe a backup operation that deletes data from a client source once the data's backup is complete.[3][4][5]".

JohnInDC didn't like that paragraph; he deleted it with the Edit Summary "Rv GF edit - article is about what an archive file is, not what it isn't; this doesn't belong in the lead in any case". I had put the paragraph in that article because its first two sentences would IMHO be just as intrusive as the second paragraph in the lead of this article. AFAICT this article is the only one in Wikipedia to use "archive file" to mean the destination of a backup—rather than as a file formatted in special ways that go beyond the features in the OS's normal filesystem. The creators of Multics needed a jargon term to distinguish that special formatting, so they concocted "archive file"—disregarding a couple of thousand years of use of the word "archive" in its non-computer sense.

The jargon word "archive" persists in computer systems mainly abbreviated as 'ar' in filename extensions such as .tar and .jar. However AFAICT no backup applications use "archive file" as the term for the destination of their backups. The old version of Code42's CrashPlan application used to use the term when CrashPlan could back up to a folder on a computer or an external drive, but the term was apparently dropped when CrashPlan was limited to backing up to cloud destinations. The term is also used in MS Outlook, and in a securities application named TradeStation, but neither of these is a backup application.

Moreover the destinations for backups by Apple's Time Machine, a backup application that is very widely used and is mentioned in this article, are not specially-formatted files but normal macOS filesystem files. DovidBenAvraham (talk) 06:20, 27 December 2018 (UTC)

There's no confusion. An "archive" is a place for old documents, records, etc. of historical interest. An "archive file" - a different term - is where backups or other data are stored, whether or not they're important, or historical, or what have you. Twain said, “The difference between the almost right word and the right word is really a large matter. ’Tis the difference between the lightning bug and the lightning.” Which is true; but it's funny because in everyday use, people know the difference (and don't need it pointed out to them). JohnInDC (talk) 15:48, 27 December 2018 (UTC)
What you mean is that in your opinion there should be no confusion. However I think you are failing to consider that at least the first 7 pages of this article are supposed to be written for readers who are not already familiar with computer jargon, and the third paragraph of my comment that starts this section adequately demonstrates that "archive file" is sparsely-used computer jargon.
If that concept is difficult to understand, think about a reader of your Twain quotation who has grown up in a big city and has never seen a "lightning bug". I grew up in the suburbs, but I've lived in Manhattan for 50 years—which is less time than since my brief encounter with the predecessor of Multics—and I don't think I've ever seen a "lightning bug" there. If you look at references 24 and 25 for the article linked to in the first sentence of this paragraph, you'll see (my Portuguese is sketchy, but my knowledge of French enables me to get the general sense of reference 24) "Our findings suggest that light pollution is likely to adversely impact firefly populations ...." per reference 25. It may not be the result of light pollution, but AFAICT "archive"—in its non-computer sense—is likely to be much more familiar to the average reader of the article than "archive file".
That's why I intend to turn the first through third sentences of my rejected Archive file paragraph into a note following the term "archive file" in the first sentence of this article. I'll leave out the fourth sentence, since that's already in a note in the lead of the "Enterprise client-server backup" section of the article. DovidBenAvraham (talk) 21:45, 27 December 2018 (UTC)
This isn't Wiktionary. We don't have to define ordinary words for ordinary users. Please don't clutter this, or any other, article with this superfluous clarification. We don't need it any more than we need a note saying that the "lightning" in "lightning bug" isn't actually electrical or dangerous. Even though many of our readers are from cities. JohnInDC (talk) 22:07, 27 December 2018 (UTC)
Ah, I see you went ahead and added the note anyhow. I disagree that it's needed at all; but if it's going to be included it only needs to make the simple point that "archive files" aren't an "archive". We don't need to know about Multics or any of the rest of it. JohnInDC (talk) 23:22, 27 December 2018 (UTC)
Your simplification is fine; it gets the necessary distinction across. Thanks, and Happy New Year. DovidBenAvraham (talk) 01:11, 28 December 2018 (UTC)

I've been wondering for weeks what idiot 😉 introduced the term "archive file" in the first sentence of this article's lead. I just searched View History, and the idiot was me on 03:24, 13 May 2018 (UTC). Since we seem to be stuck with "archive file" as the best non-proprietary term for what a backup is copied into (I'd prefer "media set", but only Retrospect and MS SQL Server use that term), I've changed other terminology to that wherever appropriate, mostly in the "Enterprise client-server backup" section. DovidBenAvraham (talk) 04:29, 2 January 2019 (UTC)

‎Changed "data repository" to "archive file" in the "Manipulation of data and dataset optimization" section only; in other sections "data repository" means something larger than a single archive file. BTW, that "something larger" in this article doesn't correspond to what's defined in the "Data repository" article; however I'll let someone else deal with that problem—which is another example of the unreferenced received wisdom as of 2007 which pervaded the first 7 pages of this article. DovidBenAvraham (talk) 05:48, 15 January 2019 (UTC)

What I said about Time Machine in the last paragraph of the first comment in this section turns out not to be strictly true. Further research shows it depends on local destination disks allowing hard links to directories, which is a not-standard-Unix capability of Apple's old HFS+ filesystem but not of its new APFS filesystem. So I guess you could say that Time Machine, too, uses specially-formatted "archive files" as local disk destinations. DovidBenAvraham (talk) 16:33, 23 April 2019 (UTC)

In the article, I globally changed the term "data repository" to "information repository". When Austinmurphy inserted "data repository" in this article around 2006, he couldn't have known that it would later be made a synonym for data library—apparently around 4 July 2017 by JakobVoss. What Austinmurphy did in 2007 is a textbook example of Making Stuff Up as it used to be done on WP. The two refs he gives under Backup#Managing the information repository do not, as far as Google Books will let me see them, use the term "data repository". DovidBenAvraham (talk) 16:09, 29 April 2019 (UTC)

"Information repository", too, turns out to be an example of Making Stuff Up in 2007, for the apparent purpose of selling a (proposed at that time?) product. Both references in that article are dead, but I looked them up on the Wayback Machine. The first ref is just a session listing on a conference agenda, but the second one actually gives a session agenda detail. The session instructor is "Mark Armstrong, President, SoleraTec". The editor who created that article is User:SoleraTec, who has since had an article deleted "because the article appears to be a clear copyright infringement." And would you believe that SoleraTec LLC sells surveillance-oriented products, which are "based on the Phoenix Information Repository [my bolding], an active, tiered secondary-storage environment comprising a mixed set of storage resources"?
Maybe it would add to the value of the link from this article if I add a lead paragraph to the information repository article, quoting and referencing this 2005 definition of "repository" by Margaret Rouse. Do you think SoleraTec LLC and/or Mark Armstrong, who is its founder and CEO, will object if I do that? DovidBenAvraham (talk) 17:31, 30 April 2019 (UTC)
I added that lead paragraph to the information repository article, and so far have had no objection. Let me clarify that nobody seems to have a better term than "information repository" for the unstructured/full-imaging/incremental/differential/reverse-delta/CDP organizing strategy described in the information repository models subsection of this article. The Kissell ref talks about "varieties" of "archives", but that would be confusing considering the way this article uses the term "archive file". DovidBenAvraham (talk) 03:48, 5 May 2019 (UTC)
I re-arranged the Information repository models sub-section lead paragraph for clearer distinction between repository organization and backup rotation scheme, and inserted or substituted "organization" in each of the type paragraphs. There doesn't seem to be any other group noun for the unstructured/full-imaging/incremental/differential/reverse-delta/CDP organizing strategy, and "strategy" didn't seem to quite fit. DovidBenAvraham (talk) 10:50, 9 May 2019 (UTC)
Now that I've inserted or substituted "repository organization", it would be very easy to change "organization" to "model", which may be what Austinmurphy wanted back in 2004. "Model" sounds even fuzzier to me; what do you editors think? DovidBenAvraham (talk) 00:27, 11 May 2019 (UTC)
I substituted "backup method" for "organization" in most occurrences in Information repository models, thus shifting to more-standard terminology and avoiding confusion between sense 1 of the noun "organization" as used in this section and sense 2 as used elsewhere in the article. Feel free to revise this edit, as I'm still struggling with lack of standard terminology. DovidBenAvraham (talk) 01:21, 18 May 2019 (UTC)

Is Rmokadem's added reference in the "Performance Impact" paragraph of the "Limitations" subsection spamming?

(adapted from a section I put into Rmokadem's Talk page)

I noticed his 23 February 2019 addition to the "Performance Impact" paragraph of the "Limitations" subsection. To be frank, I don't understand the relevance of that added reference to the text of the subsection—but then I only have a non-PhD-track Master in Computer Science degree. I see that his contributions on 23 February to other WP articles also consist of references to academic papers of which someone with the name Riad Mokadem is a co-author.

Is he just spamming non-relevant references to those papers into WP articles to promote his career? If he doesn't explain the relevance of his added reference in this section of the Talk page, I'm going to revert his edits to the article.

His "Disk Backup Through Algebraic Signatures in Scalable Distributed Data Structures" paper says in the Abstract that it is about backing up the RAM on each storage node onto the local disk. However the Backup article's lead is fairly clear—and I've now made its first sentence clearer—that it is about backing up data already on disk into an archive file. That is why I don't think his article is relevant. DovidBenAvraham (talk) 07:15, 25 February 2019 (UTC)

I moved the Rmokadem ref in this article to the Clustered_file_system article, which is where it seems to be applicable. DovidBenAvraham (talk) 05:20, 3 March 2019 (UTC)

Rewrite of Continuous data protection sub-section by User:Pi314m

Yesterday User:Pi314m merged the former Continuous data protection article into this one. He thereby wiped out a separate article without any prior discussion on that article's Talk page. I believe that's a violation of WP rules, and I intend to be up his tuchus about that.

But what really bothers me is that, after I spent about 5 hours editing his inserted sub-section into early this morning, Pi314m reverted all my editing. My editing was necessary because the Continuous data protection article left out an inconvenient fact about many recent "CDP" backup applications, was poorly worded in places, and had references from 2007 - 2012 that were basically marketing blurbs for software that no longer exists—in one case written by a marketer whose software company went out of business after an uncontested fine for bribery.

The inconvenient fact is that many recent backup applications that call themselves "CDP" are really "near-CDP", meaning that they are actually doing incremental backup at short intervals to track changes—well-known examples being Apple Time Machine and CrashPlan. My editing included a paragraph that revealed the inconvenient fact, but Pi314m reverted that paragraph out. Instead he substituted "An alternative is snapshots, a bear-continuous [sic] solution, whereby restore points are periodically created to track changes", which links to a non-existent sub-section of the article using a non-standard definition of "snapshots".

As to poorly worded in places, let's first consider "Ideal continuous data protection is that the recovery point objective is unlimited in content". My edit changed that to "In true CDP the recovery point objective is zero", which is consistent with the definition in the WP article I linked to. Let's next consider "CDP differs from RAID, replication, or mirroring by enabling rollback to any point in time. A related technique is journaling." My editing changed that to "CDP is often done by saving byte or block-level differences rather than file-level differences, making it dependent on journaling."

The references that were basically marketing blurbs included those by Bezad Behtash, Posey, and the infosectoday article by the uncredited Pat Hanavan. The eWeek article's author, Bobby Crouch, was in a class by himself; in 2010, when he wrote the article, he was the Product Marketing Manager at FalconStor Software, which in 2012 agreed to pay $5.8 million in fines for bribery. The company was further charged with falsifying its corporate books and records associated with the bribery. Reliable sources, indeed! DovidBenAvraham (talk) 19:57, 22 May 2019 (UTC)

As to Continuous Data Protection, my 17:42, 6 June 2019 (UTC) comment below is a later and clearer explanation of how, and IMHO why, Pi314m messed up my revisions after his initial merge-in. DovidBenAvraham (talk) 02:08, 7 June 2019 (UTC)
Here's the procedure that Pi314m should have followed. I think the merger itself was uncontroversial, so it needn't have been discussed on the merged-in article's Talk page. It's Pi314m's wholesale reversion of my edits afterwards that should have been discussed on this Talk page. In the first paragraph of my section-starting comment I've summarized 3 types of fault in the merged-in article—ones that my edits attempted to correct. I've now discovered a 4th problem with Pi314m's post-merger edits; he moved the pre-existing "Create synthetic full backups" paragraph from the "Performance" sub-section to a new named paragraph in the "Incremental" sub-sub-section. Pi314m evidently didn't understand "from one archive file to another" in the first sentence of the paragraph he moved, which explains why the paragraph—with a clarification of its second sentence that keeps the refs—belongs underneath the "Enterprise client-server backup" section. The single-sentence paragraph just above the paragraph, in its current position, describes an operational technique used by such non-enterprise backup applications as Apple's Time Machine for condensing a single archive file. The moved paragraph OTOH describes an enterprise backup administrator facility for creating a copy of an archive file, such as a longer-term tape copy of a disk archive file. This second copy is typically created to satisfy legal retention requirements, and may therefore intentionally omit some backups—either because there is no need to retain them or because retaining them would violate regulations such as the European GDPR Right_to_erasure. I intend to move a clarified version of the moved paragraph back to the "Performance" sub-section, while enhancing the single-sentence paragraph just above it in the "Incremental" section to explain that it refers to an operational technique for condensing a single archive file. DovidBenAvraham (talk) 00:21, 26 May 2019 (UTC)
I did what I said I intended to do in the last sentence of the preceding comment, but Pi314m promptly messed that up by moving my clarified version of the paragraph from "Performance" under "Enterprise client-server backup" up-article to "Synthetic full backup" under "Storage, the basis of a backup system". This shows conclusively that Pi314m hasn't read enough of the article to understand one basic thing: the first seven screen pages were written starting around 2007 for a person who needs to know enough to set up backup for his/her individual computer, but the last 2.5 pages were written (primarily by me) for a person who needs to set up backup for his/her enterprise. So Pi314m shouldn't have moved the clarified paragraph from the back to the front of the article, but he tried to make up for that with a cutesy-poo trick: he wrapped the enterprise-applicable part of the moved paragraph in "ref" tags, which didn't identify it as a Note because he omitted "Group=note" from the lead tag. His having not read the article is further demonstrated by his beginning that moved paragraph with "Tapes of disk archives ..."; if he had read the first sentence of the article he would have seen that it establishes "archive file" as the term (used consistently throughout) for the output of a backup, and "Tapes of disk ..." sounds like Pi314m is still mentally stuck in the days of IBM System/370. And, BTW, Pi314m totally wiped out the "Automated data grooming" paragraph because he couldn't logically move it up front under "Backup types". DovidBenAvraham (talk) 05:05, 27 May 2019 (UTC)
I've put a copy of the original "Create synthetic full backups" paragraph in front of the copy of the original "Automated data grooming" paragraph that was already in this Talk sub-section below. I did it that way in order to have only one place for Notes and References. DovidBenAvraham (talk) 17:19, 28 May 2019 (UTC)
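For readers following the "synthetic full backup" discussion above, here is a minimal sketch of the idea in Python. It is an illustration only, not how any named application implements it. The names `Backup` and `synthesize_full`, the in-memory dictionaries standing in for archive files, and the use of `None` to mark a deleted file are all assumptions made for the example. It replays a full backup plus later incrementals, oldest first, so that the result holds the same data a fresh full backup would, and it optionally applies an exclusion rule of the kind the paragraph mentions.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical model: one backup maps a file path to its saved content;
# None marks a file that was deleted since the previous backup.
Backup = Dict[str, Optional[bytes]]


def synthesize_full(
    backups: List[Backup],
    exclude: Callable[[str], bool] = lambda path: False,
) -> Dict[str, bytes]:
    """Merge a full backup and later incrementals (oldest first) into one image."""
    merged: Dict[str, bytes] = {}
    for backup in backups:
        for path, content in backup.items():
            if content is None:
                merged.pop(path, None)   # file was deleted at the source
            else:
                merged[path] = content   # newest version wins
    # Apply the (assumed) exclusion rule while writing the destination copy.
    return {path: data for path, data in merged.items() if not exclude(path)}


chain = [
    {"a.txt": b"1", "b.txt": b"2", "tmp/x": b"9"},  # full backup
    {"a.txt": b"3"},                                 # incremental: a.txt changed
    {"b.txt": None, "c.txt": b"4"},                  # incremental: b.txt deleted, c.txt added
]
full_copy = synthesize_full(chain)
filtered_copy = synthesize_full(chain, exclude=lambda p: p.startswith("tmp/"))
```

The filtered call corresponds to the case raised in this thread, where the second copy intentionally omits some data, for example for retention or right-of-erasure reasons.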

I just discovered another thing that Pi314m did that's definitely a violation of Wikipedia rules. He "merged" the first paragraph of "Information Repository" into this article (before "Backup types", from which he later deleted the "Unstructured" paragraph), and then deleted that entire article. As you can see from this previous version, and also from the WikiVisually copy (made before I demoted the previous contents to a Federated Information Repository section with modernized refs and added a new lead, which is the only part that Pi314m kept), Pi314m wasn't entitled to delete the article under rule 4 of the Wikipedia:Deletion policy. That's because IMHO the article does have "relevant or encyclopedic content", even though it describes a system that is a superset of what SoleraTec had developed by around 2008. Pi314m's tuchus is likely to be very populated, especially after I inform SoleraTec LLC of what Pi314m has done. DovidBenAvraham (talk) 06:12, 26 May 2019 (UTC)

I've put a copy of the edited-out "Federated Information Repository" section of the "Information Repository" article in back of the copy of the original "Automated data grooming" paragraph that was already in this Talk sub-section below. I did it that way in order to have only one place for Notes and References. DovidBenAvraham (talk) 20:43, 28 May 2019 (UTC)

I'm about fed up with pi314m's "my way or the highway, even if I don't understand what I'm editing and violate WP rules" approach to this article. If I don't get a response from him by 3 p.m. EDT this afternoon on this Talk page, I'm going to file for a 3O. DovidBenAvraham (talk) 05:30, 27 May 2019 (UTC)

This is not a response to the anatomy-attacking and other threats, but just to highlight that "Automated data grooming" (which perhaps I should have worked on earlier) is now ahead of "Consolidation." Explanation? The flow/sequence is now Deletion ("Automated ..."), then consolidation, followed by compression, etc. Pi314m (talk) 07:11, 27 May 2019 (UTC)
Sorry, but Pi314m's messed-up response (including refs that aren't to the articles intended) is not nearly good enough to make me put off the 3 p.m. deadline. He seems to have a conceptual problem with the basic sequence of the article, which has been—for over 1.5 years—features needed for individual backup (first 7 screen pages) followed by features needed for enterprise backup (last 2.5 screen pages). For a reason I can't understand, Pi314m has decided "Automated data grooming" is a feature needed for individual backup—which AFAIK it isn't and is thus absent in its formerly-described form from applications intended for that purpose.
DovidBenAvraham (talk) 12:54, 27 May 2019 (UTC)

In creating the promised Third Opinion Active Disagreement statement tonight, I discovered that what I had described in this section's beginning paragraph as a "cutesy-poo thing" is not that at all. It is instead simply a repeat of what's on a WP user's Talk page. I have therefore revised that beginning paragraph to eliminate some snark, while leaving in the phrase "up his tuchus", which refers to my questioning what a user is allowed to do via the moving-an-article facility (we'll have to let the 3O sort that question out, particularly in regard to Pi314m's subsequent move of "Information Repository" which wiped out all but the lead two sentences of that article). I belatedly apologise to pi314m for my unjustified snark. DovidBenAvraham (talk) 04:05, 28 May 2019 (UTC)

I accept the words "I belatedly apologise to pi314m" as is, and don't see a need for "belatedly". Pi314m (talk) 07:59, 28 May 2019 (UTC)

Early this morning I revised the 3O description of the dispute. I've replaced Pi314m's and my "handles" with "Editor #1" and "Editor #2", and made it—I hope—a bit more dignified and less whiny. DovidBenAvraham (talk) 15:14, 29 May 2019 (UTC)

In regard to "automated data grooming", Pi314m's crypto-Note "usually implemented as a customizable feature" is an incorrect and totally inadequate substitute for the descriptive paragraph that was in the "Performance" sub-section—which you will find I've copied into this Talk sub-section below. Personal backup applications usually don't have this as a customizable feature (CrashPlan was an exception, but that turned out to be a designed enterprise "push" application that for a few years was also marketed as a personal backup application). OTOH enterprise backup applications have to have this as a very customizable feature, because each enterprise has its own "regulatory requirements"—as stated in the descriptive paragraph Pi314m wiped out in creating the inadequate sentence in the front part of the article. If you want to know what "very customizable" means, read the Kaczorek and Jain and Dorion references for that paragraph. DovidBenAvraham (talk) 02:57, 3 June 2019 (UTC)

We haven't had a reply from Pi314m yet, and my thoughts turned to Windows File History—which someone recommended yesterday on an Ars Technica thread. Let's look at the key reference for that WP article sub-section. It says "... its closest analog is Mac OS X’s Time Machine .... The basic function of File History is to periodically [my emphasis] back up your Libraries (your documents, music, pictures, videos) to another hard drive. These backed up files are saved as versions, which you can easily browse through and restore with a couple of clicks". That reference goes on to say "By default, File History backs up a version of your files every hour. If you head into Advanced Settings, you can change this to a value between “Every 10 minutes” and Daily; personally, I opted for every 10 minutes (and even then, it would be nice to have an option for every 60 seconds — maybe it’s possible via a registry hack)." BTW Apple's Time Machine backs up once an hour, and only an independently-written add-on can change that.

So that covers the two most-readily-available examples of "continuous data protection" backup software; we see that neither of them "allows restoring data to any point in time". In my 06:51, 22 May 2019 (UTC) edit to Pi314m's original merge-in, I wrote (refs omitted) "However, because of the performance penalty imposed by necessary tight integration with the filesystem, a frequently-encountered alternative is near-CDP (often wrongly referred to as "CDP"), wherein restore points are created at short intervals to track changes. Nevertheless, given the proper precautions for live data, changes captured by near-CDP can provide fine granularities of restorable objects ranging from crash-consistent images to logical objects such as files, databases and logs."

My edit to the first part of the "Backup" article is certainly sufficient for the user of a personal backup application, and it includes a caution needed by the user of an enterprise backup application. But Pi314m reverted that edit, leaving "Continuous data protection ... refers to backup of computer data by automatically saving a copy of every change made to that data, essentially capturing every version of the data that the user saves. It allows restoring data to any point in time"—referenced by what I have characterized up-section as "marketing blurbs". This is a blatant example of what I said in the Talk section below this, which is "The overall picture that emerges is of Pi314m deciding without any discussion to consolidate a whole series of related articles into a single article that conforms to his concept of the subject matter." DovidBenAvraham (talk) 17:42, 6 June 2019 (UTC)

[Copied, with changes from "you" to "Pi314m", from a portion of a 05:39, 2 June 2019 (UTC) comment I made here on Pi314m's personal Talk page] Pi314m also seems to have a strong urge to merge and simplify descriptions, but accompanied by a willingness to sacrifice the precision of those descriptions. An example is what Pi314m did for the "Continuous_data_protection" subsection of the article. The reason I called the references there "marketing" is that they all basically say "it's nice to have backups at more frequent intervals than is normally done with scheduled scripts", but they don't talk about any performance hit. But if you look at the 2017 Carbonite reference Pi314m left in (Mozy has been merged into Carbonite), it says "we noticed no performance hit at all while using Carbonite to back up about 0.5GB worth of frequently-changing files ... That's probably because it's not actually done in real time, just on a tight schedule (okay, so maybe there is scheduling): 10 minutes if a file is saved once, 24 hours if it's save[d] more than once." By contrast my 2010 ComputerWeekly reference Pi314m deleted says "Because true CDP copies all delta changes, a system can be restored to any point in time required. This can be especially useful if you need to roll back to a point before a corruption event took place, for example. [new paragraph] Because they depend on fixed-interval copies, near-CDP/snapshots only allow you to roll back to a given point in time. For this reason, true CDP offers a recovery point objective (RPO) of zero [my emphasis], while the equivalent for near-CDP/snapshots is the last time a copy took place."
Pi314m's link for "snapshots" at the end of the sub-section doesn't go anywhere, which is just as well because a correct WP link to "snapshots" goes to an article on "the state of a system at a particular point in time"—a capability used for near-CDP backups instead of a kind of backup (the ComputerWeekly author also got the terminology wrong in 2010). Isn't this, as I suspect, too technical for Pi314m—so he considers it too technical for any article reader? DovidBenAvraham (talk) 15:26, 17 June 2019 (UTC)
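To make the true-CDP versus near-CDP distinction in this thread concrete, here is a small Python sketch. It is only an illustration under stated assumptions: times are plain numbers (say, minutes), the function names `latest_restore_point` and `lost_changes` are made up for the example, and "true CDP" is modelled simply as having a restore point at every change. It lists which changes cannot be recovered after a failure, which is what a non-zero recovery point objective means in practice.

```python
from typing import List, Optional


def latest_restore_point(restore_points: List[float], failure_time: float) -> Optional[float]:
    """Return the most recent restore point at or before the failure, if any."""
    candidates = [t for t in restore_points if t <= failure_time]
    return max(candidates) if candidates else None


def lost_changes(changes: List[float], restore_points: List[float], failure_time: float) -> List[float]:
    """Changes made before the failure but after the last usable restore point."""
    last = latest_restore_point(restore_points, failure_time)
    return [c for c in changes if c <= failure_time and (last is None or c > last)]


changes = [3, 47, 94]          # times at which the user saved data
failure = 95                   # time of the corruption event

# Near-CDP (assumed hourly schedule): restore points only at fixed intervals.
near_cdp_lost = lost_changes(changes, [0, 60, 120], failure)

# True CDP (modelled as one restore point per change): nothing is lost.
true_cdp_lost = lost_changes(changes, changes, failure)
```

With these inputs the hourly schedule loses the change made at minute 94, while the per-change model loses nothing, which matches the quoted statement that true CDP offers an RPO of zero and near-CDP only reaches back to the last copy.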

Descriptions in the two paragraphs as they originally were in the "Performance" sub-section, plus the parts of the "Information_repository" and "Continuous data protection" articles that were edited out

"Performance" subsection of "Backup" article

Create synthetic full backups
For example, onto tapes from existing disk archive files—by copying multiple backups of the same source(s) from one archive file to another. This is termed a "synthetic full backup" because, after the transfer, the destination archive file contains the same data it would after a full backup.[1][2][3] One application can exclude[note 1] files and folders from the synthetic full backup.[4]
Automated data grooming
Frees up space on disk archive files by removing out-of-date backup data—usually based on an administrator-defined retention period.[5][6][1][7][8][9][note 2] One method of removing data is to keep the last backup of each day/week/month for the last respective week/month/specified-number-of-months, permitting compliance with regulatory requirements.[10] One application has a "performance-optimized grooming" mode that only removes outdated information from an archive file that it can quickly delete.[11] This is the only mode of grooming allowed for cloud archive files, and is also up to 5 times as fast when used on locally stored disk archive files. The "storage-optimized grooming" mode reclaims more space because it rewrites the archive file, and in this application also permits exclusion compliance with the GDPR "right of erasure" [12] via rules[note 1]—that can instead be used for other filtering.[13]
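The retention rule described in the grooming paragraph (keep the last backup of each day for the last week, of each week for the last month, and of each month for a specified number of months) can be sketched as follows. This is a hedged illustration, not any application's actual grooming code. The function name `backups_to_keep`, the 7-day and 30-day cut-offs, and the approximation of a month as 30 days are assumptions chosen for the example.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple


def backups_to_keep(timestamps: Iterable[datetime], now: datetime, months: int = 12) -> List[datetime]:
    """Pick the backups that survive grooming under a day/week/month retention rule."""
    latest_in_bucket: Dict[Tuple, datetime] = {}
    for ts in sorted(timestamps):            # ascending, so later backups overwrite earlier ones
        age = now - ts
        if age <= timedelta(days=7):
            key = ("day", ts.date())                          # one survivor per calendar day
        elif age <= timedelta(days=30):
            key = ("week",) + tuple(ts.isocalendar()[:2])     # one per ISO (year, week)
        elif age <= timedelta(days=30 * months):
            key = ("month", ts.year, ts.month)                # one per calendar month
        else:
            continue                                          # past retention: groomed out
        latest_in_bucket[key] = ts
    return sorted(latest_in_bucket.values())


now = datetime(2019, 6, 1, 12, 0)
stamps = [
    datetime(2017, 1, 1, 2, 0),    # older than 12 months
    datetime(2019, 3, 5, 2, 0),
    datetime(2019, 3, 20, 2, 0),
    datetime(2019, 5, 14, 2, 0),
    datetime(2019, 5, 16, 2, 0),
    datetime(2019, 5, 30, 2, 0),
    datetime(2019, 6, 1, 1, 0),
    datetime(2019, 6, 1, 9, 0),
]
kept = backups_to_keep(stamps, now)
```

Everything not in `kept` would be a candidate for removal. How the space is actually reclaimed (the "performance-optimized" versus "storage-optimized" modes in the paragraph) is a separate matter that this sketch does not model.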

"Federated information repository" section of "Information Repository" article

A federated information repository is an easy way to deploy a secondary tier of data storage that can comprise multiple, networked data storage technologies running on diverse operating systems, where data that no longer needs to be in primary storage is protected, classified according to captured metadata, processed, de-duplicated, and then purged, automatically, based on data service level objectives and requirements. In federated information repositories, data storage resources are virtualized as composite storage sets and operate as a federated environment.[14]

Federated information repositories were developed to mitigate problems arising from data proliferation and eliminate the need for separately deployed data storage solutions because of the concurrent deployment of diverse storage technologies running diverse operating systems. They feature centralized management for all deployed data storage resources. They are self-contained, support heterogeneous storage resources, support resource management to add, maintain, recycle, and terminate media, track off-line media, and operate autonomously.[15]

Automated data management

Since one of the main reasons for the implementation of a federated information repository is to reduce the maintenance workload placed on IT staff by traditional data storage systems, federated information repositories are automated. Automation is accomplished via policies that can process data based on time, events, data age, and data content. Policies manage the following:

  • File system space management
  • Irrelevant data elimination (mp3, games, etc.)
  • Secondary storage resource management

Data is processed according to media type, storage pool, and storage technology.

Because federated information repositories are intended to reduce IT staff workload, they are designed to be easy to deploy and offer configuration flexibility, virtually limitless extensibility, redundancy, and reliable failover.

Data recovery

Federated information repositories feature robust, client-based data search and recovery capabilities that, based on permissions, enable end users to search the information repository, view information repository contents, including data on off-line media, and recover individual files or multiple files to either their original network computer or another network computer.[15]

Edited-out portions of "Continuous data protection" article

CDP runs as a service that captures changes to data to a separate storage location. There are multiple methods for capturing the continuous changes involving different technologies that serve different needs. CDP-based solutions can provide fine granularities of restorable objects ranging from crash-consistent images to logical objects such as files, mail boxes, messages, and database files and logs.[16]

Differences from traditional backup

Continuous data protection is different from traditional backup in that it is not necessary to specify the point in time to recover from until ready to restore. Traditional backups only restore data from the time the backup was made. Continuous data protection has no backup schedules. When data is written to disk, it is also asynchronously written to a second location, usually another computer over the network. This introduces some overhead to disk-write operations but eliminates the need for scheduled backups.
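
As a rough illustration of why no restore point has to be chosen in advance, the following Python sketch keeps a journal of every write and replays it up to whatever moment is requested at restore time. The class `WriteJournal` and its methods are hypothetical names for this example only, and a real CDP product would journal at the block or byte level rather than whole-file contents.

```python
class WriteJournal:
    """Toy model of a CDP journal: every write is logged with a timestamp."""

    def __init__(self):
        self._entries = []  # (timestamp, path, data) tuples in arrival order

    def record(self, timestamp, path, data):
        """Log a write as it happens; no backup schedule is involved."""
        self._entries.append((timestamp, path, data))

    def restore(self, point_in_time):
        """Rebuild the state as of `point_in_time` by replaying the journal."""
        state = {}
        for timestamp, path, data in sorted(self._entries, key=lambda e: e[0]):
            if timestamp > point_in_time:
                break  # later writes are ignored for this restore
            state[path] = data
        return state
```

For example, after recording writes at times 1, 5 and 7, calling `restore(4)` yields only the effect of the first write, while a traditional backup taken at time 7 could only return the final state.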

Continuous vs near continuous

Some solutions marketed as continuous data protection may only allow restores at fixed intervals such as one hour or 24 hours. Such schemes are not universally recognized as true continuous data protection, as they do not provide the ability to restore to any point in time. These solutions are often based on periodic snapshots, an example of which is CDP Server, disk-based backup software that periodically creates restore points using a snapshot and volume filter device driver to track disk changes.

There is debate in the industry as to whether the granularity of backup must be "every write" to be CDP, or whether a solution that captures the data every few seconds is good enough. The latter is sometimes called near continuous backup. The debate hinges on the use of the term continuous: whether only the backup process must be continuous, which is sufficient to achieve the benefits cited above, or whether the ability to restore from the backup also must be continuous. The Storage Networking Industry Association (SNIA) uses the "every write" definition.

Differences from RAID, replication or mirroring

Continuous data protection differs from RAID, replication, or mirroring in that these technologies only protect one copy of the data (the most recent). If data becomes corrupted in a way that is not immediately detected, these technologies simply protect the corrupted data with no way to restore an uncorrupted version.

Continuous data protection protects against some effects of data corruption by allowing restoration of a previous, uncorrupted version of the data. Transactions that took place between the corrupting event and the restoration are lost, however. They could be recovered through other means, such as journaling.

Backup disk size

In some situations, continuous data protection requires less space on backup media (usually disk) than traditional backup. Most continuous data protection solutions save byte or block-level differences rather than file-level differences. This means that if one byte of a 100 GB file is modified, only the changed byte or block is backed up. Traditional incremental and differential backups make copies of entire files.
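
The block-level saving described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the 4096-byte block size and the helper names `block_hashes` and `changed_blocks` are made up for the example, and comparing SHA-256 digests of fixed-size blocks is just one possible way to detect changed blocks, not necessarily what any CDP product does.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for this example


def block_hashes(data, block_size=BLOCK_SIZE):
    """Return one SHA-256 digest per fixed-size block of `data`."""
    return [
        hashlib.sha256(data[start:start + block_size]).digest()
        for start in range(0, len(data), block_size)
    ]


def changed_blocks(old, new, block_size=BLOCK_SIZE):
    """Map block index -> new block contents for blocks that differ from `old`."""
    old_hashes = block_hashes(old, block_size)
    changed = {}
    for index, start in enumerate(range(0, len(new), block_size)):
        block = new[start:start + block_size]
        is_new_block = index >= len(old_hashes)
        if is_new_block or hashlib.sha256(block).digest() != old_hashes[index]:
            changed[index] = block
    return changed
```

If a single byte of a large file changes, only the one block containing it appears in the result, whereas a file-level scheme would copy the whole file again. The sketch does not handle a file that shrinks; a real implementation would also need to record truncation.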

Risks and disadvantages

The protection afforded by continuous data protection is often heralded without consideration of the disadvantages and challenges that it can present. Specifically, the continuous bandwidth usage can adversely affect network performance, especially in operations where file sizes are large, such as multimedia and CAD design environments. To mitigate this risk, companies employ throttling techniques that prioritize network traffic to reduce the impact of backup on day-to-day operation.[17]

Notes

  1. ^ a b Exclusion and/or inclusion is done with Selectors in the Windows variant; this misleading term has been changed to Rules in the Macintosh variant.
  2. ^ Some backup applications—notably rsync and CrashPlan—term removing backup data "pruning" instead of "grooming".[1][2]

References

  1. ^ a b "New EMC Dantz Retrospect 7 Improves Data Protection for SMBs and the Distributed Enterprise". DellEMC [current]. EMC Corp. [orig. publisher]. 31 January 2005. Retrieved 23 November 2016.
  2. ^ "About synthetic backups". Veritas Support. Veritas Technologies LLC (US). 25 September 2017. Retrieved 18 November 2017.
  3. ^ "Symantec Backup Exec: About the synthetic backup feature". Helpmax.net. HelpMax Software Help & Shop Inc. Retrieved 13 January 2018.
  4. ^ "Retrospect ® 12 Windows User's Guide" (PDF). Retrospect. Retrospect Inc. 2017. pp. 30-31(deduplication via "Snapshots"—a Retrospect term which predates and is distinct from Snapshot_(computer_storage)), 31-32(Dashboard), 41-43(removable disk drives), 216-218(selector as subset filter for synthetic full backups), 230-233(Scripted Verification), 280(Multiple Executions), 369(Duplicate Execution Options), 420(Startup Preferences—Launcher for auto-launch), 426-427(E-mail), 433-434(Open File Backup Tips—VSS snapshot at natural pause), 530-544(SQL Server Agent—coordinating VSS snapshot), 545-566(Exchange Server Agent—coordinating VSS snapshot). Retrieved 2 September 2018.
  5. ^ Preimesberger, Chris (31 March 2017). "World Backup Day 2017: 'We Don't Know the Day Nor the Hour'". eWeek. QuinStreet. Ian Wood of Veritas. Retrieved 11 November 2017.
  6. ^ Fernando, Sal (30 April 2008). "Combine disk, tape benefits to protect data". ZDNet. Retrieved 13 November 2017.
  7. ^ Kaczorek, Mariusz (15 August 2015). "NetBackup Storage Lifecycle Policy (SLP): Overview". Settlersoman. Settlersoman. Retrieved 2 February 2018.
  8. ^ Jain, Hemant (14 April 2015). "VOX Knowledge Base: Data Protection Knowledge Base: Data Protection". VOX. Veritas Technologies LLC. Retrieved 13 January 2018. Employee [of Veritas]
  9. ^ Dorion, Pierre (January 2007). "IBM Tivoli Storage Manager vs. traditional backup". TechTarget. Tech Target Inc. Backup versions. Retrieved 30 October 2018.
  10. ^ "Retrospect ® 12.0 Mac User's Guide" (PDF). Retrospect. Retrospect Inc. 2015. pp. 8-9(Improved Grooming). Retrieved 28 December 2017.
  11. ^ Schmitz, Agen (5 March 2016). "Retrospect 13". TidBITS. TidBITS Publishing Inc. Retrieved 27 October 2016.
  12. ^ "Support: Knowledge Base". Retrospect. Retrospect Inc. 24 April 2019. #Resources (Auto Launching Guide ..., ... difference between "Backup" and "Duplicate", Avid Support ..., Instant Scan FAQ, Can't use Open File Backup ...), #Email Backup, #Top Articles (BackupBot – Deep Dive into ProactiveAI, How to Set Up Remote Backup, GDPR – Deep Dive into Data Retention Policies, Deep Dive - Components [and phases] of a Retrospect Backup, How to Set Up the Management Console, Management Console - How to Use Shared Scripts, How to Use Storage Groups, Support End-of-Life Announcement for Mac OS X 10.3, 10.4, and 10.5, Retrospect Compatibility with Apple File System (APFS)), #Hooks (Script Hooks: External Scripting with Event Handlers, Script Hooks: How to Protect MongoDB with Retrospect, Script Hooks: How to Protect MySQL with Retrospect, Script Hooks: How to Protect PostgreSQL with Retrospect). Retrieved 4 May 2019.
  13. ^ Schmitz, Agen (28 May 2018). "Retrospect 15.1.1". TidBITS. TidBITS Publishing Inc. Retrieved 20 June 2018.
  14. ^ Armstrong, Mark (9 August 2007). "Benefits of a Federated Information Repository as a Secondary Storage Tier". SNIA Enterprise Information World 2007 Conference. Storage Networking Industry Association (SNIA). Retrieved 1 May 2019.
  15. ^ a b "Area Under Surveillance". SoleraTec. SoleraTec LLC. 2019. Phoenix RSM: (Record, Store, Manage), Surveillance Video Management (information repository), Ultra-fast Search and Playback (content-based search queries). Retrieved 6 May 2019.
  16. ^ "An Overview of Continuous Data Protection". Infosectoday.com. Retrieved 2011-11-12.
  17. ^ Off-Site Backup - The Bandwidth Hog Archived 2011-07-07 at the Wayback Machine

DovidBenAvraham (talk) 20:43, 28 May 2019 (UTC)

3O

Now you've actually started talking to each other, can we remove the article from the 3O page? Or do you still need help? Satyris410 (talk) 19:20, 2 June 2019 (UTC)

No, don't remove it—because Pi314m hasn't started talking to me (other than one comment accepting my apology for a snarky erroneous criticism that I have now removed from the first sentence that started the section above). Last night I made the latest of several comments (starting 31 May UTC) on Pi314m's personal Talk page, "a set of comments intended to convince you that your trying to "help out" with the "Backup" article is not in fact helpful, because you didn't ask anybody on the Talk page what would be helpful." Pi314m hasn't yet made any reply to those comments either.

I can think of only two possible reasons:

The first is that Pi314m's religious sensibilities were immediately offended because I used the phrase "up his tuchus" in the first comment of the section above. Since I only used the phrase to catch his attention—I'm usually very careful not to use offensive language, I'd be happy to apologize for using it.

But I think the real reason is that Pi314m does have a "my way or the highway, even if I don't understand what I'm editing and violate WP rules" approach. Read the preceding sections on his personal Talk page, and then sample his recent contributions. If there's any indication of his interacting with other WP editors, I didn't see it.

Satyris410, given what I've just said, will I need to repost the article on the 3O page in two days? I'd really like to handle this problem through a 3O, but if necessary I'll proceed to an RfC, because I consider this problem to be very serious for the article. DovidBenAvraham (talk) 20:32, 2 June 2019 (UTC)

If I do have to repost the article on the 3O page, or proceed to an RfC, I think I'd have to use the word "vandalism", which is a WP-fraught synonym for substantial deletion of useful text. I don't want to do that, because of the future consequences for Pi314m as a WP editor if I can justify it (which IMHO I can). DovidBenAvraham (talk) 21:41, 2 June 2019 (UTC)

So, it seems there are several issues here. First, it may do well to lower the temperature a bit. I think both of you are acting in good faith, and just disagree. It's a lot easier to talk to one another, instead of past one another, if you start with that view, rather than the view that the person who disagrees is malicious. I don't believe either of you are acting maliciously. That aside, it is permitted to move, merge, or redirect an article, and it is just as permissible to reverse such an action if you disagree with it. If it comes to that point, discuss the issue, and if need be involve other editors; a request for comment can be placed to bring in some fresh eyes. The prior continuous data protection was rather thin on sourcing, so it may not hurt for that information to be here for a while. If more sources come about later, and the section grows too large and unwieldy to fit here, it can always be split again at that point. Also, there seems to be some substantial disagreement over whether certain sources are or are not reliable. The best thing to do with that is to ask at the reliable sources noticeboard; some of our best at analyzing sources watch and participate there, so you should get some good advice there on the reliability of the sources you're proposing to use. Seraphimblade Talk to me 22:15, 4 June 2019 (UTC)

Last night I looked at what Pi314m did to the "Outsourcing" article from January to May 2019, and IMHO I've discovered both the motivation and the main technique for what he does with existing articles. His motivation is "my redefinition of the article's subject or the highway". His technique is merging related articles into the article he has chosen to redefine, and then deleting any part of the merged-in article that doesn't fit his definition of the merged-into article's subject.

Let's first see what happened in January 2019 after Pi314m merged the "Insourcing" article—without any discussion on "Talk:Insourcing" (hatnote—what hatnote?)—into "Outsourcing". He then expanded the "Insourcing" sub-section on that subject from 0.25 screen-pages to 0.75 screen pages—mostly referenced by magazine articles, but cut the "Standpoint of government" section from 2 screen-pages to 1 screen page—with cites of a 2006 semi-academic article by Richard Baldwin cut from 6 to 1. Pi314m commented "This article is not meant [my emphasis] to be at the PhD level, nor is it meant [my emphasis] to be about unemployment. Now the word unemployment only appears five times, two from a NYTimes financial writer."; that's what I mean by redefinition of an article's subject.

In February 2019 Pi314m deleted a paragraph beginning "Further, the label outsourcing has been found to be used for too many different kinds of exchanges often in confusing ways." from the "Insourcing" sub-section, moving it into the article A. Aneesh about the author of its main reference. Sorry, the fact that the "outsourcing-based market model fails to explain why these [global software] development projects are jointly developed, and not simply bought and sold in the marketplace" contradicts Pi314m's redefinition of the article's subject to fit the outsourcing-based market model.

In March and April 2019 Pi314m merged "Engineering process outsourcing", "Business process outsourcing", "Information technology outsourcing", and "Farmshoring" into the "Outsourcing" article, all without any discussion on their Talk pages. In all of these cases he soon deleted all or most of the merged-in articles' text.

In March and April 2019 Pi314m also merged "Regional insourcing", "Homeshoring", "Personal offshoring", and "Nearshoring" into the "Outsourcing" article. However in these cases he seems to have kept at least the key definitions from the merged-in articles, so maybe the merging-in of those articles—which again was done without any discussion on their Talk pages—didn't actually delete much text.

The overall picture that emerges is of Pi314m deciding without any discussion to consolidate a whole series of related articles into a single article that conforms to his concept of the subject matter. He can technologically get away with this flouting of Wikipedia rules, apparently because he is doing a copy-paste of the merged-in article's text followed by replacing that text with a redirect to the moved-to article. The only reason I caught Pi314m is that he did copy-pastes between a sub-section of the "Enterprise client-server backup" section and preceding sections of the same "Backup" article. As is his custom, he did not discuss these "merges" on that article's Talk page; if he had, I would have carefully explained (as I now have on Pi314m's personal Talk page as well as in the section above this) that application feature descriptions in the last section of that article may seem like duplicates of the same-named feature descriptions in preceding sections of the article, but they're not.

IMHO the underlying problem is that—as I've shown in the preceding paragraphs—Pi314m believes that Wikipedia gives him the right to be the sole decider of the subject and contents of an article, even when it's partly about subject matter of which he knows nothing. That would explain why he has not responded to my subject-related comments in the Talk page section above, and why he reverted my 26 May edits that put back the two feature-description paragraphs he had deleted from the "Performance" subsection of the "Enterprise client-server backup" section; he considers that I have no right to second-guess his decisions. It's debatable whether a 3O will be sufficient to change Pi314m's belief; I now think that it will require an RfC at the least. But good luck as we follow the prescribed process, Seraphimblade! DovidBenAvraham (talk) 06:28, 5 June 2019 (UTC)

In regard to the next-to-last paragraph of my 06:28, 5 June 2019 (UTC) comment, I later found from this section on his personal Talk page that Pi314m was cautioned in January 2017 by Diannaa not to do "cut_and_paste moves". That caution didn't stop him; it's what he's continued to do both on the "Outsourcing" article and the "Backup" article. I wouldn't be surprised if Pi314m likes the idea that, as Diannaa said, "it splits the page history, which is legally required for attribution." In any case, it didn't stop him from doing another "cut_and_paste move" almost exactly a year later. In that January 2018 case Matthiaspaul said on Pi314m's personal Talk page "As I told you already, don't carry out such edits without prior discussion or against consensus, as you did twice already. If you continue these kinds of edits, they will have to be regarded as vandalism which may led [sic] to a block [my emphasis]." DovidBenAvraham (talk) 01:22, 9 June 2019 (UTC)

_un-discussed_ text-destroying "merging-in", both of other articles and of paragraphs within this article, by Pi314m

Should Pi314m be permitted to, without prior or subsequent discussion, merge other articles into this one, merge paragraphs from the rear "Enterprise client-server backup" section of this article into preceding sections—and then immediately delete most of the text from what has been merged-in?

What Pi314m initially did to this article, without discussion, from 21 May through 27 May is described at the beginning of the preceding "Rewrite ..." section of the article's Talk page: merging another article, cutting much of what was merged in, and then immediately reverting my edits responding to that merging-in. What he did to the article's pre-existing text during that same time period is also described in the preceding "Rewrite ..." section of the article's Talk page: essentially destroying two paragraphs in the article's "Enterprise client-server backup" section by trying to merge them, grossly simplified, into one of the article's preceding sections that deals with personal backup applications. In this section of Pi314m's personal Talk page, I supplemented an invitation to him to discuss the change on the article's Talk page with an explanation of the two-audience-level structure of this article; Pi314m never gave any indication that he had read any of what I had written there.

In addition Pi314m merged in a second article, except that after that merge-in he deleted all but the lead two sentences of the merged-in article. That, as I pointed out to him (also in this section of Pi314m's personal Talk page), is something he is not entitled to do under rule 4 of the Wikipedia Deletion policy, because IMHO that article had "relevant or encyclopedic content" that had nothing to do with the "Backup" article. I have preserved the deleted content of that article (preceded by the content of the two "Enterprise client-server backup" paragraphs before Pi314m grossly simplified them while "internally merging" them) in the preceding "Rewrite ..." section of the article's Talk page.

Pi314m has a distinct fondness for "cut-and-paste moves" that he calls "mergers", which he does without community consensus, frequently starting in January. He did one in January 2017, and was rebuked for it by Diannaa. He did another one in January 2018, and was warned about it by Matthiaspaul, who said "If you continue these kinds of edits, they will have to be regarded as vandalism which may led [sic] to a block." On the Talk page for that second article, Matthiaspaul said "No, this is not how it works! It is good that you are trying to be constructive, but your edits are not. You are already edit-warring over it and if you continue to try to force your undiscussed changes into the articles, this may led [sic] to a block. [new paragraph] Such changes require prior discussion and won't be carried out unless the outcome of such a discussion (after a reasonable amount of time for other editors to see, think about it and react - typically months) would be consensus for a merge." Pi314m quoted part of that on his personal Talk page, saying "I too can and hopefully will learn from what you said on the article talk page". However, starting in January 2019 he did another series of "cut-and-paste moves" without community consensus, which nobody caught him doing, so I have described them starting in the 9th paragraph of the preceding "3O" section of this article's Talk page.

The preceding "3O" section of this article's Talk page was named "3O" by Satyris410 because it was supposed to be where some third editor (Seraphimblade, eventually) would provide his/her Third Opinion. However Pi314m never responded on the preceding "Rewrite ..." section of the article's Talk page, which was the section I had listed in my request for a Third Opinion, other than to graciously thank me for my apology for a later-removed bit of (as I later found out, unjustified) snark I had put into the first sentence of that section.

The title of this section is Request for comment on _un-discussed_ text-destroying "merging-in", both of other articles and of paragraphs within this article, by Pi314m. Sorry I didn't know I'd have to repeat it myself in this section.

DovidBenAvraham (talk) 01:46, 12 June 2019 (UTC)

The RfC process is not to be used for discussing the conduct of another user, see WP:RFC#About the conduct of another user. --Redrose64 🌹 (talk) 07:06, 12 June 2019 (UTC)
Are you saying that I should delete the fourth paragraph, the one beginning "Pi314m has a distinct fondness for "cut-and-paste moves" that he calls "mergers", ..." from the comment beginning this section? My purpose in filing this RfC is to stop Pi314m's oversimplifications of the "Backup" article, since it is clear—despite my explanation on his personal Talk page—that he doesn't understand that the "Enterprise client-server backup" section is written for a different audience than the preceding sections of this article. One solution (which reinstates something I thought of doing in September 2017) would be to create a separate "Enterprise client-server backup" article. But the problem is that Pi314m, as shown in the fourth paragraph of this section, has an obsession with merging smaller articles into larger articles that have a related subject. So if I created that separate article, it is highly likely that I'd be back in the same situation a week from now—after Pi314m merged the new separate article back into the "Backup" article and over-simplified its merged-in contents. So this RfC inescapably deals with Pi314m's conduct, because what he's done to the "Backup" article is the continuation of a behavioral pattern that has lasted for at least two years with other articles. What Wikipedia process other than an RfC would you suggest I use, and wouldn't that—at least until we've tried an RfC—be overkill? DovidBenAvraham (talk) 10:04, 12 June 2019 (UTC)
If you hold an RfC it should be purely about the content of the article, and the statement should be neutral and brief; but as it stands, it's neither brief nor neutral. In fact Pi314m is mentioned no fewer than twelve times before my post - and that's not counting your reply to me. RfC is not for discussing user conduct, for which other avenues are available. --Redrose64 🌹 (talk) 12:55, 12 June 2019 (UTC)
In the cold light of mid-day, you're right about both the neutrality and the briefness; I thought only the initial paragraph had to be neutral and brief. I'll prepare another RfC based on most of the content of this earlier section of the article Talk page, omitting anything having to do with the body of the former "Information Repository" article that Pi314m deleted when he "merged" that article into this one. As for that deletion, I'll make it, plus at least a couple of the content-deleting "mergers" into Outsourcing (which I mentioned in a paragraph in the preceding "3O" section of this Talk page), the subject of several Administrators' Noticeboard complaints. These, especially the "Information Repository" deletion of content unrelated to "Backup", IMHO really amount to vandalism, a WP-fraught synonym for substantial deletion of useful text. As I pointed out in the fourth paragraph of this section, Pi314m was previously warned about that in 2017 and 2018, so the consequences for him this time may be severe. I was trying to avoid that, but (as I said in a non-neutral quote of myself I immediately deleted from this section, which I can put back in now that you've removed the template) IMHO the underlying problem is that Pi314m believes that Wikipedia gives him the right to be the sole decider of the subject and contents of an article, even when it's partly about subject matter of which he knows nothing. The fact that he essentially never responded on the preceding "Rewrite ..." section of the article's Talk page, which was the section I had listed in my request for a Third Opinion, reinforces my reluctant acceptance that the severe consequences may be necessary. DovidBenAvraham (talk) 16:25, 12 June 2019 (UTC)

should a new-to-subject editor be allowed to damage two-audience-level structure of article because of his urge to "simplify"

Should a new-to-the-subject editor be allowed to damage the two-audience-level structure of the "Backup" article simply because of his irresistible urge to "simplify"?

The title of this section is Request for comment on whether new-to-subject editor is allowed to damage two-audience-level structure of article because of his urge to "simplify". DovidBenAvraham (talk) 05:51, 13 June 2019 (UTC)

Mainly between 2007 and 2011, the first 7 screen-pages of the article evolved as a comprehensive summary of what every computer-using person should know about backing up his/her data. In November 2017 I moved the description of certain backup features from another article to a new 2-screen-page "Enterprise client-server backup" section at the end of the article. The lead of that section clearly says it is about "a class of software applications that back up data from a variety of client computers centrally to one or more server computers, with the particular needs of enterprises in mind." The section goes on to describe special features typically incorporated in that class of applications, with an explanation of the enterprise need for each feature.

On 21 May 2019 an editor new to the article, whose contributions list since August 2016 shows no edits to articles dealing with IT less than 25 years old, did a "cut-and-paste move" of the "Continuous data protection" article into the "Backup" article. The Talk page for that article shows there was no previous discussion of the "merge", and the only non-bot comment on that Talk page said in 2011 "It would be good to have a section discussing real-world implementations of CDP: which companies provide such a service, which tools they use to provide it, etc.". I pointed out here in this Talk page that most backup applications that say they do CDP really do near-CDP via incremental backups every few minutes, but the new-to-the-subject editor reverted the edits I had done after his "merge" that pointed out this "inconvenient fact".

On 22 May 2019 the same editor new to the article did another "cut-and-paste move" of the "Information Repository" article into the "Backup" article. Again the Talk page for that article shows there was no previous discussion of the "merge", and this time the new-to-the-subject editor immediately deleted the entire "merged" article contents, except for the new lead I had added on 1 May. I have copied the body of the "Information Repository" article here in this Talk page; you can see that the new editor deleted the article body because it had nothing to do with backup. Couldn't the new editor have been satisfied with what I did to the "Backup" article on 1 May, which was to put in a link to the "Information Repository" article in order to use the backup-related lead I had just added to that article?

The fact that he wasn't satisfied is why I have written the two preceding illustrative paragraphs. Together they show that the new editor has an IMHO irresistible urge to simplify several Wikipedia articles into a single one. He carried this urge further on 22 May 2019 when he "merged" two paragraphs from the "performance" sub-section of the "Enterprise client-server backup" section forward into the personal sections of the article. I have copied the original versions of those paragraphs here in this Talk page; you can see that the feature descriptions in both paragraphs explain their importance to enterprise backup administrators. The new editor promptly simplified those moved descriptions so that they would fit into the worldview of a reader needing to know about personal backup applications, and deleted my attempts to add clarified versions of the original feature descriptions back into the "Enterprise client-server backup" section. You can see that the personal backup versions of the descriptions of the two features, here and here, have been pruned of so much information as to be essentially useless.

Since the new-to-the-subject editor has previously contributed to WP articles about IT hardware and software used 25 years ago by enterprises, he surely has some basic understanding that the backup needs of an enterprise are more extensive than those of a personal computer user, for legal and business continuity reasons. We could split off the "Enterprise client-server backup" section into a separate article. But it's highly probable that, based on what the same editor did to the cluster of "Outsourcing"-related articles as described in another section of this Talk page, the new-to-the-subject editor would in a week or two succumb to his urge to "merge" the split-off "Enterprise client-server backup" article into the "Backup" article, repeating the same "simplifications". DovidBenAvraham (talk) 05:51, 13 June 2019 (UTC)

  • Worst RfC Ever - This RfC is sooooooooo bad. Seriously dude.... take a chill pill. You gotta relax. You'd find RfC's are much more effective if you simply and neutrally state the question. You could have done this entire RfC by simply asking "Should 'Continuous data protection' be split into its own article?". Instead, you spent several paragraphs raging out at someone that no one cares about. Just chill. NickCT (talk) 19:51, 13 June 2019 (UTC)
I totally agree, but my previous RFC attempt was even worse—because I didn't suppress my rage. My ideal question would be "Should 'Enterprise client-server backup' be split into its own article, and—if I do that—can you editors lay down a set of comments that will persuade 'new-to-the-subject editor' not to merge it back in to 'Backup' and dumb it down again?" The problem is that, IMHO for a combination of psychological and cultural reasons, "new-to-the-subject editor" simply won't listen to anyone's comments. I initiated a 3O, and he simply refused to respond to the Third Opinion editor. I'm hoping an RfC will have more influence on him, but I'm reluctantly prepared to go to Administrator's Noticeboard or Arbitration. I'd prefer not to get "new-to-the-subject editor" banned, because I think contributing to WP is an important part of his life, but I think that some of his conduct in connection with this and other articles would support doing so. As you can see it's a very tricky situation—which WP no longer permits dealing with directly via an RfC, and I'd appreciate any advice you editors can offer. DovidBenAvraham (talk) 21:08, 13 June 2019 (UTC)
  • Retry RfC? I'd be glad to give my opinion like many other wikipedians if this RfC were to be done right. It's too inconveniencing for anyone who wants to help. However it sounds like this editor has a WP:ITSCRUFT problem perhaps. Information relevant to backup should stay on backup and it can refer to other articles with useful information if necessary. --NikkeKatski [Elite] (talk) 15:38, 15 June 2019 (UTC)
@DovidBenAvraham: Yeah as pointed out by redrose you should recreate the RfC and make it more neutral and aimed at the actual article rather than the person. If we gain consensus for the obviously superior your version of the article then any attempts to revert it can probably be considered edit warring (if it wasn't considered that already) and would be more easily punishable. --NikkeKatski [Elite] (talk) 15:57, 15 June 2019 (UTC)

Removed the template from this sub-section; my third try is below in a new sub-section. NikkeKatski [Elite], based on what the other editor did on 22 May 2019, as described in this same subsection in the paragraph beginning "The fact that he wasn't satisfied ...", the other editor again wouldn't let a version of the article improved by me exist long enough for any of you editors to see it—that's just the way he has demonstrated he operates. DovidBenAvraham (talk) 00:04, 16 June 2019 (UTC)

Request for comment on whether "Enterprise client-server backup" should be split from "Backup" into a stand-alone article, and if so how to protect it from "simplifying" re-merger

There is a clear consensus that the "Enterprise client-server backup" section at the end of the article should be separated from the "Backup" article into a stand-alone article. The content is now at Enterprise client-server backup.

Cunard (talk) 00:18, 28 July 2019 (UTC)

The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.

Should the "Enterprise client-server backup" section at the end of the article be separated from the "Backup" article into a stand-alone article? If so, how would I protect that separate article from an instant "simplifying" re-merger into "Backup"? DovidBenAvraham (talk) 05:19, 16 June 2019 (UTC)

Survey

  • No split necessary for any section per WP:CONSPLIT and could be considered a redundant WP:CFORK. --NikkeKatski [Elite] (talk) 14:28, 18 June 2019 (UTC)

I find it unnecessary to take away such a section from backup. The article is pretty well-built with it. I have no doubt that it COULD make a stand-alone article but IMHO I see no significant reason to make this split especially if it were to be merged again. I would revise my !vote if Pi314m were to make a statement denying that he would re-merge the article in the future without reaching consensus in this talk page. --NikkeKatski [Elite] (talk) 16:02, 18 June 2019 (UTC)
  • Split Since User:Pi314m is clearly not trying to split in bad faith and believes that a working 'History' of the topic is achievable I do think we should go through with a split. --NikkeKatski [Elite] (talk) 23:26, 18 June 2019 (UTC)
  • Maybe? - I don't see a WP:CONSPLIT rationale, but if Enterprise client-server backup meets WP:GNG it could reasonably be spun out into a separate article. NickCT (talk) 15:53, 18 June 2019 (UTC)
  • Split; reluctantly, see Discussion for my reason. In the 06:51, 22 May 2019 version—before the specific other editor started vandalizing it performing a WP-fraught synonym for substantial deletion of useful text—there were 24 second-party references (as opposed to cites of the refs) out of 54 for that section (the other 30 refs were first-party from backup application developers). Therefore it could stand alone, although there would need to be many links from it to the "Backup" article and vice-versa. DovidBenAvraham (talk) 19:51, 18 June 2019 (UTC)
  • Agree. Split - hands off, while the author of the (new) Enterprise article obtains the courtesy and full opportunity that comes with "In use"/Under construction" (honoring it, whether or not it's physically there). Shortly thereafter, With other editors contributing (including myself somewhere down the road, particularly in a HISTORY section) there would be no "urge to merge." As for the present Backup article, I'd also be hands-off for a while, to facilitate his work. Is this the statement you're seeking? Pi314m (talk) 21:11, 18 June 2019 (UTC)

Discussion

Mainly between 2007 and 2011, the first 7 screen-pages of the "Backup" article evolved as a comprehensive summary of what every computer-using person should know about backing up his/her data. In November 2017 I moved the description of certain backup features from another article to a new 2-screen-page "Enterprise client-server backup" section at the end of the article. The lead of that section clearly says it is about "a class of software applications that back up data from a variety of client computers centrally to one or more server computers, with the particular needs of enterprises in mind." The section goes on to describe special features typically incorporated in that class of applications, with an explanation of the enterprise need for each feature. Should that end section be separated from the "Backup" article into a stand-alone article?

There is evidence that having a single article with sections aimed at audiences with different levels of IT knowledge is confusing for some readers. On 26 May 2019 another editor deleted these two paragraphs from the "Performance" sub-section of the "Enterprise client-server backup" section. He then inserted greatly simplified descriptions of the same features into the "Backup types" sub-section and "Manipulation of data and dataset optimization" section of personal backup sections of the article, evidently to fit those features into his own knowledge of the backup process. Unfortunately his knowledge, because of his evident unfamiliarity with enterprise IT much beyond the early 1990s, does not encompass the application features described in "Enterprise client-server backup"—all of which were developed sometime after 2005 as a result of advances in hardware and operating systems. The fact that his personal backup level of knowledge will be shared by many Wikipedia readers argues for splitting the "Enterprise client-server backup" section off into a separate article, with sufficient two-way linking to guide interested readers to the enterprise features while explaining those features as sophisticated extensions of their simpler roots.

Unfortunately that particular other editor has a distinct fondness for "cut-and-paste moves" that he calls "mergers", which he does—frequently starting in the month of January—without community consensus. He did one in January 2017, and was rebuked for it by Diannaa. He did another one in January 2018, and was warned about it by Matthiaspaul—who said "If you continue these kinds of edits, they will have to be regarded as vandalism which may led to a block." On the Talk page for that second article, Matthiaspaul said "No, this is not how it works! It is good that you are trying to be constructive, but your edits are not. You are already edit-warring over it and if you continue to try to force your undiscussed changes into the articles, this may led [sic] to a block. [new paragraph] Such changes require prior discussion and won't be carried out unless the outcome of such a discussion (after a reasonable amount of time for other editors to see, think about it and react - typically months) would be consensus for a merge." The other editor quoted part of that on his personal Talk page, saying "I too can and hopefully will learn from what you said on the article talk page". However starting in January 2019 he did another series of 9 "cut-and-paste moves" without community consensus into the "Outsourcing" article, which nobody caught him doing; I have described them starting in the 9th paragraph of the "3O" section of this article's Talk page. Starting in late May 2019 that other editor without community consensus did "cut-and-paste moves" of two other articles into the "Backup" article, after the second of which he deleted the entire contents of the merged-in article except for the two-sentence lead—because the body of that second article (which I've copied here) discussed an application not directly related to backup.

NikkeKatski [Elite] suggested above "If we gain consensus for the obviously superior your version of the article then any attempts to revert it can probably be considered edit warring (if it wasn't considered that already) and would be more easily punishable." The problem with that approach is that, based on his history I've noted in the preceding paragraph, this particular other editor will do a "cut-and-paste move" of a separate "Enterprise client-server backup" article back into the "Backup" article almost instantly. So there would be no time to gain any community consensus—edit war (which I'd rather avoid) or no edit war. In any case this particular other editor doesn't pay any attention to any other editor; he refused to respond to either me or Seraphimblade in the Third Opinion section of this Talk page. So how would I protect a separate "Enterprise client-server backup" article from an instant "simplifying" re-merger? DovidBenAvraham (talk) 04:38, 16 June 2019 (UTC)

DovidBenAvraham I don't see any reason to remove that section per WP:CONSPLIT and one could argue that it would be giving it WP:UNDUE weight. Also potentially you could purposely start an RfC asking "should we split" (RfC doesnt always have to be worded in your favor, but your !vote would be in your favor) and if we get enough involved wikipedians to participate in the RfC any attempt to go against consensus directly could justify punishment especially considering past behavior. Theres always the chance that wikipedians may agree with the split but hey if that happens then just let it be for the time being. --NikkeKatski [Elite] (talk) 22:07, 16 June 2019 (UTC)

In case you haven't noticed, Elitematterman (talk · contribs), this section of the article Talk page in fact starts such an RfC—my third attempt to produce one that's reasonably neutral and doesn't mention the "handle" of the particular other editor. As I said two paragraphs above this, "let it be for the time being" would mean that other editor would instantly "merge" a separate "Enterprise client-server backup" article back into the "Backup" article. And, as I didn't repeat there but said in the fourth through sixth paragraphs of this section of the Talk page, once he "merged" the separate article back in he would instantly delete any text that didn't agree with his own level of IT knowledge. So, unless some editor contributes to this RfC a brilliant suggestion on how to prevent the particular other editor from "doing his thing", I'm going to have to go to the Administrators' Noticeboard tomorrow with a request that the particular other editor be banned—either totally or from editing particular articles. DovidBenAvraham (talk) 02:43, 17 June 2019 (UTC)
You do have a good case to take to the noticeboard and it would probably fix the problem alone even if they decide not to ban him as in my experience they usually end up with something like "no, but if he persists..." And as for RfCs your last three seemed more like lengthy discussions rather than the traditional RfCs i've encountered where the format is something more like..
Request for Comments on Deleting fortnite article
(RfC template)
Fortnite is just a bad game and isnt WP:NOTABLE anymore -PUBGFAN123 (The proposed change)
(!votes below)
  • no its still a worldwide phenomenon -Jonesy
  • yes per WP:BADGAME -EpicGames
  • Close RfC this request is dumb. -NikkeKatski
Discussion
where actual discussions happen and we more thoroughly explain our !votes. And an RfC AFAIK doesn't have to go this way but this is usually how people reach consensus on a matter. the proposed change also is usually neutral (completely) and merely states the proposed change where as your case can be entirely confined in the designated discussion section. -Nikke
dude why are you talking to yourself, also fortnite is bad per WP:GRAPHICS. -TotallyDovidLol
ok im done with this reenactment. I'm probably spending too much time doing this when its most likely unnecessary. But taking it to the noticeboards is a better idea like you stated because it can prevent all future/similar actions too. I think you should go forward with that first. --NikkeKatski [Elite] (talk) 11:59, 17 June 2019 (UTC)

I've now copied the edited-out portions of the "Continuous data protection" article into an earlier section of this Talk page. The editing-out is what happened immediately after the first of the particular other editor's "cut-and-paste moves" of two other articles into the "Backup" article; I only alluded to that move in the last sentence of the third paragraph of this "Discussion" sub-section. If you compare what was edited-out to this sub-section of the article, you'll realize that the particular other editor's level of IT knowledge has left him unable to deal with the idea that all "CDP" personal backup applications are in fact "near-CDP". A "near-CDP" backup application in fact does un-scripted incremental backups once every 10 minutes to once an hour, which is sufficient for a personal user but not for an enterprise administrator (think about the consequences if the backup administrators at Target had only been able to restore cash-register transactions to 10 minutes before their nationwide system failure a couple of days ago). The SNIA's "every write" definition quoted here effectively means that a "true-CDP" backup application must be tied into a virtual machine, an approach currently available in several applications which only became available at great expense 10 years after the particular other editor's presumed departure from enterprise IT. FYI, I've done the copying merely as additional evidence for the Administrators' Noticeboard; I thought you other editors might be interested in reading it. DovidBenAvraham (talk) 02:14, 18 June 2019 (UTC)

It helps both from a WP:AN perspective and an article upkeep perspective ;p --NikkeKatski [Elite] (talk) 12:46, 18 June 2019 (UTC)

  • Comment - Echoing some of User:Elitematterman's thoughts, I'm not seeing the WP:CONSPLIT rationale for splitting. Can the nominator in two sentences or less, explain why he thinks the article should be split? That explanation shouldn't be "b/c I don't like the editor who merged the topics". NickCT (talk) 13:43, 18 June 2019 (UTC)
Pi314m: Could you participate in the RfC if you intend to split any part of this article? If not please reveal whether or not you do intend to split content. Thanks~ (Also re-ping of @NickCT: .. incase he's still interested.) --NikkeKatski [Elite] (talk) 14:24, 18 June 2019 (UTC)
There really is enough material "out there" on Enterprise (backup) to justify a standalone article, especially if a history section is given its due. Pi314m (talk) 15:40, 18 June 2019 (UTC)
@Pi314m: - Bit confused. DovidBenAvraham said you were merging these articles. Your comment suggests they should be separate. You feel as though these topics should be separate articles? NickCT (talk) 15:54, 18 June 2019 (UTC)
@NickCT: I can understand why you were confused as I was under the same assumption for a time during his first two attempts at RfC. What Dovid is trying to claim is that Pi314m has a tendency to split sections from articles without consensus and then re-merge them while simplifying the section. His problem (which i would want to avoid too) is that he does not wish for said section to be simplified as it tarnishes the article. --NikkeKatski [Elite] (talk) 16:09, 18 June 2019 (UTC)
@Elitematterman: - Thanks for that explanation. These two seem like a few too many chips went into the cookie.
Pi314m - If we split this section out, can you state simply that you do not plan to merge it later? NickCT (talk) 16:14, 18 June 2019 (UTC)
I'd be glad to revise my !vote to be more neutral if he were to agree to not merge it later without consensus in this talk page. --NikkeKatski [Elite] (talk) 16:20, 18 June 2019 (UTC)
I reluctantly think the article must be split, so that via the Administrator Noticeboard I can get the particular other editor (I'm not going to mention his "handle", but it begins with the letter 'P' and he's commented above) banned from editing the split "Enterprise client-server backup" article. Given what the particular other editor did to the "Outsourcing" article after promising not to do such things—all described in the third paragraph of this "Discussion" section—I consider his promise to temporarily act as if such a ban were in effect on a par with the promises of Old Golden Boy. DovidBenAvraham (talk) 21:51, 18 June 2019 (UTC)
@Elitematterman: You've misunderstood what I have been saying about the habits of the particular other editor. He doesn't split related articles; instead he finds other articles which he considers related to one article, and merges-in the other articles—discarding without community consensus any text that he considers unrelated to or in conflict with the article he's merged them into. DovidBenAvraham (talk) 22:11, 18 June 2019 (UTC)
That explains a couple things i've seen, but I still would only allow this if the standalone article wouldn't get merged into anything else (what else would it be merged into anyways other than backup lol). Also for future reference we should usually abide to results in WP:AN down to the letter. If they asked you to stop referring to the "handle" (big If) we usually don't try to hint at it. --NikkeKatski [Elite] (talk) 23:20, 18 June 2019 (UTC)
All the particular other editor is promising in the "Survey" sub-section is "Agree. Split - hands off, while the author of the (new) Enterprise article obtains the courtesy and full opportunity that comes with 'In use'/'Under construction' .... Shortly thereafter, With other editors contributing (including myself somewhere down the road, particularly in a HISTORY section) there would be no 'urge to merge.'". That's a promise worthy of Old Golden Boy, because it'll take me at most one day to establish the links to the "Backup" article—at which point the "under construction" conceptual sign would have to come off because the section as of 12:02, 20 May 2019 would be fine as a separate article! My counter-proposal is that the particular other editor shall be banned from editing the split-off article unless and until he finds an RS for a "History" section. He's unlikely to find that RS because, as I explained in the second paragraph of my 04:31, 31 May 2019 (UTC) comment on his personal Talk page, "There followed an epic-but-friendly three month battle with editor JohnInDC, during which I found terminology for the equivalent features in two competing enterprise client-server backup applications—and was therefore able to make the new section reasonably application-independent by adding references to those equivalent features." Of the 5 applications currently referenced in the "Enterprise client-server backup", 4 have WP articles with a "History" section—but each of those sections is couched in the terminology of its subject application. BTW nobody identifiable as an administrator asked me not to use the particular other editor's "handle"; I just thought not using it would seem more NPOV. DovidBenAvraham (talk) 03:55, 19 June 2019 (UTC)
@DovidBenAvraham: - Can we quit the talk of banning all together? It's challenging WP:CIVILITY. It looks like the other editor mostly agrees with your position at this point. Just do the split that you want to do. If there's another errant merger attempt, call NikkeKatski [Elite] and I back here, and we'll take care of it. Simple. No need to bicker. NickCT (talk) 12:46, 19 June 2019 (UTC)
Echoing NickCT but also Right now Pi314m is our friend. We should be accepting his help as much as we can and if any issue should come up we can deal with it when we cross that bridge. For now we should work on contributing to an encyclopedia. These principles such as WP:ASSUMEGOODFAITH and WP:NPOV are the foundation of wikipedia. Yes we will have to deal with a couple bad eggs but there's a reason many admins are so reluctant to take harsh action. --NikkeKatski [Elite] (talk) 13:21, 19 June 2019 (UTC)
@NickCT and Elitematterman: I think you editors (are you administrators?) should do a close re-read of the fourth paragraph here. The particular other editor promised on his personal Talk page in January 2018 "... at the end of this discussion (typically after several months), there will be consensus for a merge, we can carry it out. Otherwise, the articles stay separate", and then did 9 such un-discussed merge-ins to another article exactly a year later—followed in May 2019 by the two he did into the "Backup" article. I find WP:ASSUMEGOODFAITH rather difficult to follow in his case. Besides, if you look at his 21:11, 18 June 2019 (UTC) comment in the Survey sub-section, he says "With other editors contributing (including myself somewhere down the road, particularly in a HISTORY section) there would be no 'urge to merge.' As for the present Backup article, I'd also be hands-off for a while, to facilitate his work." That amounts to the particular other editor's saying "If you allow me to mess up both the split-off article and the "Backup" article, then maybe I won't feel the urge to re-merge them again." Good faith or not, I simply don't feel he has complete control of his editing impulses; that's why I prefer to go to the Administrators' Noticeboard for at least a selective ban rather than humor him until January 2020 has passed—and IMHO so should you. DovidBenAvraham (talk) 15:51, 19 June 2019 (UTC)
If this were to happen again we are all on top of this already. You, me, and NickCT are all here and I'm pretty sure we have it under control. Any attempt at a re-merger can and Will be opposed by one or all of us. Such action after all this would also hasten decision making when brought up to WP:AN. It may go un-discussed but it won't go unnoticed. Also this split can and will be carried out however it does not need to stay split. If the splitoff article is unsatisfactory we can discuss restoring backup to its former glory (without simplification). If it really is bad faith then I can wait. --NikkeKatski [Elite] (talk) 16:15, 19 June 2019 (UTC)
Acceptable with an additional condition that the particular other editor will have to agree to, which is that I'll recreate the "Continuous data protection" and "Information Repository" articles exactly as they were before he deleted them—and he'll have to agree not to edit them (unless he can add RS'ed "History" sections, which is unlikely). Let me add that I have no connection with SoleraTec, which is still marketing a "federated information repository" application—as is IBM from what I can tell. I'll bet that the particular other editor won't agree to that condition, because IMHO one of his guiding principles is "if I can't understand this article, then no other readers should be allowed to see it on Wikipedia" (see for example his ""This article is not meant [my emphasis] to be at the PhD level ...." comment about the "Outsourcing" article I've recounted here). DovidBenAvraham (talk) 18:15, 19 June 2019 (UTC)
@Pi314m: Perhaps you should start working on a draft and potentially recruit people to help you. Mainly focusing on the history of the enterprise section. And IMO Dovid's requests seem 'reasonable' if you ignore the context. You could probably pull it off. Just avoid being too primary source heavy on everything you work on. --NikkeKatski [Elite] (talk) 18:22, 19 June 2019 (UTC)
I've just discovered a further complication with implementing this additional condition. On 22 May, when the specific other editor started his merge-ins to the "Backup" article, he also did something rather sneaky to the "Data repository" article. Although the lead sentence of that article says "A data library, data archive, or data repository is a collection of numeric and/or geospatial data sets for secondary use in research", the editor inserted an "Information repository" sub-section. That sub-section simply lists three uses of the term "information repository" for "sets" of data that is neither numeric nor geospatial in New York Times articles, two referencing IBM and Microsoft applications in 1989 and 2003 and one referencing the Mount Vernon NY Public Library in 1996. I can move these back to the re-established "Information Repository" article, but my inclination is simply to delete at least the Mount Vernon one from where it is now as sheer obfuscatory cruft. FYI the "Backup" article originally used the term "data repository", but—as noted here on this Talk page on 16:09, 29 April 2019 (UTC)—I had to change that term to "information repository" because in 2017 Jacob Voss made the former term a synonym for "data library". DovidBenAvraham (talk) 00:26, 20 June 2019 (UTC)
I recreated the "Continuous data protection" and "Information Repository" articles exactly as they were before the particular other editor deleted them. I was able to recover the revision history for "Information Repository", because User:Christian75 had edited-in {{R with history}} and {{R to section}} after the original re-direct. However I was unable to recover the revision history for "Continuous data protection", even though I reverted my recreation after the fact and put in the templates. I moved the specific other editor's sneaky additional section and sub-section in the "Data repository" article back to the re-established "Information Repository" article, including the one referencing the Mount Vernon NY Public Library. DovidBenAvraham (talk) 02:34, 20 June 2019 (UTC)
I created the "Enterprise client-server backup" article from the section of the "Backup" article as of 20 May, and updated it with two later enhancements. Still to come: (1) Update any links from "Enterprise client-server backup" to "Backup". (2) Update any links from "Backup" to "Enterprise client-server backup". (3) Create new mentions with links from "Enterprise client-server backup" to "Backup". (4) Create new mentions with links from "Backup" to "Enterprise client-server backup". DovidBenAvraham (talk) 04:14, 20 June 2019 (UTC)
I rewrote the "Continuous data protection" sub-section of the "Backup" article using the particular other editor's references, but showing them distinguishing between true CDP and near-CDP per the second through fifth paragraphs here. I left out the ref by Bobby Crouch, which is a pure marketing blurb from a company that pleaded guilty to bribery. DovidBenAvraham (talk) 04:31, 21 June 2019 (UTC)
I rewrote the "Automated data grooming" paragraph of the "Backup" article, restricting the capability to that found in personal backup applications. The particular other editor's references were all to EMC Retrospect Windows 7.5, which was the first version of that application with enterprise client-server features. DovidBenAvraham (talk) 09:58, 21 June 2019 (UTC)
I clarified the difference between creating a synthetic full backup for a single archive file and creating a second archive file from a first. This eliminated the need for the separate "Synthetic full backup" sub-sub-section within the "Incremental" sub-section of the "Backup" article that the particular other editor had moved from the "Enterprise client-server backup" section. DovidBenAvraham (talk) 20:33, 21 June 2019 (UTC)
I deleted the "Enterprise client-server backup" section, since it is now a separate article. As they say at Hogwarts, "Mischief managed". DovidBenAvraham (talk) 20:42, 21 June 2019 (UTC)
I reverted the "Information repository models" sub-section lead—with clarified wording—and sub-sub-section structure to what they were as of 12:02, 20 May 2019, re-inserted the "Unstructured" method in "Backup methods", and replaced the Biersdorfer NYTimes ref in "Remote backup service" with a re-cite of the Forbes ref. IMHO what the particular other editor had done was mostly out of his zeal to merge-in the "Information repository" article—which I have now re-established per my 02:34, 20 June 2019 (UTC) comment, for whatever peculiar reasons. We may not like "Unstructured" as a repository organization method, but it is in fact what a lot of users of "personal" backup applications start out doing. DovidBenAvraham (talk) 04:21, 28 June 2019 (UTC)
The article split has already produced one unfortunate consequence, and IMHO may in future produce another. The first unfortunate consequence is that the combined average weekday pageviews of the split articles (eyeballed by me as 570 +15 = 585) have decreased from the average weekday pageviews of the un-split article exactly a year ago (eyeballed by me as 670) by about 13%. If the particular other editor were editing a periodical for a publishing company, and made changes to the periodical's content that decreased the average readership by 13%, I think he would shortly be out of a job. The second unfortunate consequence may result from my having kept the full "Continuous data protection" sub-subsection that he introduced, only rewording it and quoting the particular other editor's own references. In doing that I have taken pains to preserve the distinction between "true CDP" and "near-CDP" that is in the references when they are read in full. IMHO we shouldn't describe "true CDP" in the "Backup" article, only mention it with a link to a description in the "Enterprise client-server backup" article. That is because, as part of the split, I added a sentence to the first paragraph of the lead "This article focuses on features found even in personal backup applications, as opposed to features found only in enterprise client-server backup applications." As my improved version of the sub-subsection states—conclusively referenced, "true CDP backup must in practice be run in conjunction with a virtual machine—which may rule it out for ordinary personal backup applications." An unfortunate consequence would result if the particular other editor then took offense at my having deleted part of his treasured sub-subsection from the "Backup" article, and pulled the kind of shenanigans he did when he did a "same-article-merge-in" of two paragraphs in the "Performance" subsection of the "Backup" article into earlier sections—with very-inaccurate simplifications—in late May 2019. 
Are you trusting enough to think his "With other editors contributing (including myself somewhere down the road, particularly in a HISTORY section) there would be no 'urge to merge'" end-of-the-preceding-Survey-section pledge would deter the particular other editor from doing that? DovidBenAvraham (talk) 04:01, 1 July 2019 (UTC)
Started dealing with the second unfortunate consequence by adding several "Backing up interactive applications via true Continuous Data Protection" paragraphs to the new "Enterprise client-server backup" article. These paragraphs are excerpted from the particular other editor's "Backup" sub-subsection (as improved by me), but omit any substantial mention of "near-CDP" backup. DovidBenAvraham (talk) 20:00, 2 July 2019 (UTC)
Continued dealing with the second unfortunate consequence by enhancing the "Continuous Data Protection" article with text and references I had previously added to the Backup#CDP sub-subsection. DovidBenAvraham (talk) 06:36, 3 July 2019 (UTC)
Finished dealing with the second unfortunate consequence by changing sub-subsection name from "CDP" to "Near-CDP", moving it under "Incremental" because that's what it's a variety of, and getting rid of all discussion of true CDP except explanation of why this is near-CDP. The particular other editor wants easier-to-understand, so that's what this is. DovidBenAvraham (talk) 03:01, 5 July 2019 (UTC)

Those who haven't (including Pi314m) should cast !votes under survey section if they have come to a conclusion as to what their opinion on the matter is. --NikkeKatski [Elite] (talk) 15:49, 18 June 2019 (UTC)


The discussion above is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.

Simplifying lead's readability for average encyclopedia reader, taking out value judgments and advice

I put in a request for advice from Melcous, who put in the {{incomprehensible}} tag. She replied on her Talk page "my opinion would be that it currently comes across as more of a "how to" rather than an encyclopedia article", and was kind enough to go so far as to make a first-pass revision to the article. I like what she's done, although Pi314m may not. My only revisions so far are—in the article lead—to add back the Note and Kissell ref on "archive file"—because that term needs definition for its use in the rest of the article, to add back the specification that backed-up data must already be in secondary storage—to prevent a repeat of the Rmokadem maneuver, and to add back a link to the "Enterprise client-server backup" article—per the outcome of the RfC. I won't go into the reasons for these add-backs, because they were discussed ad nauseam—as linked to—on this Talk page when I originally added them to the article months ago. DovidBenAvraham (talk) 01:02, 10 August 2019 (UTC)

After simplifications to "Near-CDP", clarified what it can and can't do for complex files and applications. DovidBenAvraham (talk) 19:43, 11 August 2019 (UTC)

In going over article after my latest edits, discovered an old Tom's Hardware article is no longer accessible directly and has been removed from the Wayback Machine. This is a disturbing trend I already encountered for another publication ref'd in the Continuous Data Protection article. DovidBenAvraham (talk) 04:30, 15 August 2019 (UTC)

Melcous has now made more changes that in general make the article easier to understand. What she has edited out is mostly text that was added from 2004-2011, or by Pi314m back in May 2019. However there are a couple of exceptions to this, which I'll have to remedy one way or another. One exception is that, by deleting "True" as the first word of the "Near-CDP" sub-section, she concealed a distinction I was at pains to make between true CDP—a feature which is very expensive but vital for high-intensity interactive applications—and near-CDP—a feature which at most is adequate for low-intensity interactive applications (as I stated in part of the "Live data" sub-section text she deleted). The other exception is that the text she deleted included old mentions of enterprise client-server backup features, but also links I had added. Once I originally established "Enterprise client-server backup" as the last section of the "Backup" article in late 2017, those old mentions no longer needed to be there. However I'm worried about pageviews statistics showing fewer people than I think should be are looking at that section since it was split off into a separate article. I've therefore used those mentions to put in appropriate links to the split-off article, but she has now deleted several of those links—leaving no place to put them back in. I'll discuss these problems on the article Talk page, as I think she has suggested; other editors may have suggestions. DovidBenAvraham (talk) 19:06, 18 August 2019 (UTC)

I took a shot at remedying the exceptions. I added maybe 10 screen lines, without disturbing the beauty of Melcous's editing. DovidBenAvraham (talk) 04:02, 19 August 2019 (UTC)

Somehow, in all these years, nobody ever put into the article any mention of the versioning backup feature—other than a mention of a versioning filesystem—that even most personal backup applications have! Motivated at least in part by Pi314m's "internal merge" of "automated data grooming", I have now put in the proper mention—and its justification of a user-initiated backup and restore objective. DovidBenAvraham (talk) 09:59, 22 August 2019 (UTC)

A couple of comments, DovidBenAvraham. I don't pretend to have expert knowledge on this subject, but in terms of making it readable and accessible, I don't think the use of italics for emphasis is often helpful (see MOS:ITALICS), and so would suggest rewriting sentences where this has been done to avoid this. I also think the repeated links to the same page (Enterprise client server backup) add to the clutter and confusion, and think in many cases this content could be left out rather than needing to be mentioned here. Melcous (talk) 13:00, 22 August 2019 (UTC)
If Melcous is concerned about the use of italics in my 09:59, 22 August 2019 (UTC) comment above, I did so because I thought the italicized words needed emphasis—and I still think so. OTOH I don't use italics very often in actual articles. DovidBenAvraham (talk) 04:00, 23 August 2019 (UTC)
I belatedly realized that what Melcous is concerned about is my use of True as the first word in the "Near-CDP" sub-subsection of the article. If she reads the "Continuous vs near_continuous" section of the "Continuous Data Protection" article, she will realize that what the original author and I have had to deal with is the perversion of a term for an IT feature to refer to a different feature. The only way I can think of to specify that "CDP" is being used in its original meaning is to prefix its use with true. I could instead use underlining, but I know that's not allowed in articles. If she can think of another easy-to-understand way to specify that "CDP" is being used in its original meaning, I'd love to hear it. DovidBenAvraham (talk) 04:04, 24 August 2019 (UTC)
I think Melcous has been led astray by her web browser as to "the repeated links to the same page (Enterprise client server backup)". Those links are mostly links to sections within that article. Because I hadn't converted the paragraphs within those sections into sub-sections, I had to supplement the links with names of paragraphs to clue the reader as to which specific paragraph to read. I have now converted those paragraphs to sub-sections, and revised the links within the "Backup" article to point directly to those sub-sections. (Note that I haven't used any italics within this comment, even though IMHO some emphases would have made it easier to read.) DovidBenAvraham (talk) 04:00, 23 August 2019 (UTC)