Wikipedia talk:Close paraphrasing/Archive 1


Intent & intervention

Great idea, Dcoetzee, and much needed. :) I've added a bit on intent, as my observations suggest that most contributors who do this are not aware that they are doing something improper. I'm wondering if we should add something about where to go for help if somebody encounters someone who persists in doing this or is clearly doing it intentionally. For that matter, I'm wondering where they would go for help. I've addressed a few cases of this through WT:CP, but I'm not sure if that's the best venue since it gets limited response. --Moonriddengirl (talk) 00:10, 31 January 2009 (UTC)

Thanks. :-) Hmm, that's a tough question - I see WP:CP as more for specific instances of close paraphrasing, whereas a user with a persistent issue with it is best treated as a problematic user to be reported at Commons:Administrators' noticeboard/User problems following efforts to rehabilitate them. If their action is intentional and egregious, it might qualify as vandalism, but it'd have to be a pretty bad case for me to make that call. Dcoetzee 02:22, 31 January 2009 (UTC)

Proposed as guideline

Withdrawn: I'm noting that I've withdrawn the proposal to make this a guideline, as I think it needs more time, content, and review to develop. I still appreciate any input you have to offer, however! Dcoetzee 03:32, 24 February 2009 (UTC)

I'm proposing to make this essay into a guideline, so that it will have clear force and consensus, especially since it's linked from the cleanup template {{close paraphrase}}. Please let me know if you support this or if you have any suggestions for improvements (or just be bold and implement them). Thanks! Dcoetzee 20:37, 13 February 2009 (UTC)

  • Oppose as written. You've identified a real problem but the solution will be elusive. "Close paraphrasing" with respect to copyright is very tricky. More than one court case has been won or lost over some judge or jury's interpretation of this. To make matters worse, context is everything. In some contexts, the exact wording is the only copyright-protected element, and a close paraphrase is perfectly legit. In other contexts, such as a newspaper editorial, short story, or book, the ideas and their "overall presentation" are also subject to copyright and a "close paraphrase" would likely be a copyright-violating derivative work. Our policies and guidelines should not impose any burden higher than copyright law plus the requirement to cite references. I'm afraid the best we can do as a guideline is to encourage people to not be lazy when they "rewrite things in their own words" but rather rewrite the content so that it doesn't "sound like" the original if that's at all possible. — Preceding unsigned comment added by Davidwr (talk • contribs)
    • If it's tricky, I think that emphasizes the need for us to explain it somewhere, because some contributors really don't understand the problem - I should emphasize that we're really only concerned with copyright violation and attribution, not purely ethical concerns, so I don't think this is adding new requirements that aren't set by law (this point deserves clarification). I'd also like to incorporate examples of cases where "the exact wording is the only copyright-protected element" to clarify when this is okay, could you suggest some? Dcoetzee 23:49, 13 February 2009 (UTC)
      • I think it's best explained in an essay rather than a guideline. davidwr/(talk)/(contribs)/(e-mail) 00:55, 14 February 2009 (UTC)
    • I made some revisions to try to address some of your concerns, let me know what you think. Dcoetzee 00:27, 14 February 2009 (UTC)
      • I also made a revision. I still think an essay is the way to go with this. davidwr/(talk)/(contribs)/(e-mail) 00:55, 14 February 2009 (UTC)
    • I could write any number of intermediate versions of the example, and there would be some which would be ambiguous. I could change the tone. Or I could change the sentence structure and the order of sentences. There is no clear line between where it's plagiarism and not. A good case could in fact be made that the 2nd sample is sufficiently distinctive; a good case could also be made that the third sample isn't distinctive enough. This is different from the RW, where copying someone's ideas without attribution is plagiarism, not just copying their words. Here, we don't contribute our original ideas, but we present other peoples', so we don't have the benefit of that part of the meaning.
The sentences also are not that good a choice because this is the lowest level of factual information, and there is no way of writing a single general paragraph on the cat that does not essentially match some previously written paragraph. There are cases that are even harder: the initial sentences of most biographies will inherently be very similar to descriptions in the CV: there are a limited number of permutations, and they all are similar.
I think this needs more careful presentation with a range of examples, perhaps quoted from a range of authoritative sources for such things, with attribution, but it needs considerable thought in how to adapt it to our specific circumstances. I unfortunately do not have the chance to do it this weekend, or I would. DGG (talk) 00:39, 14 February 2009 (UTC)
Hi DGG, thanks for your helpful comments. I'd love to get some better examples - can you recommend sources for this? Thanks again. Dcoetzee 22:06, 16 February 2009 (UTC)
  • Question. I would like to see it get a bit of traction before we discuss promoting it to a guideline. Template:Close paraphrase is only 4 weeks old. At this moment it is posted on four articles, but that leaves unanswered the question of how many articles have already been cleaned up. How many has it been used on over its life? --Hroðulf (or Hrothulf) (Talk) 05:30, 14 February 2009 (UTC)
    • I can't give you a number, but I've used it on several--say, three to five? (I'm guessing.) I'm not comfortable letting such remain, though, so if it isn't addressed within a day or two I go back and clean it up. --Moonriddengirl (talk) 12:13, 14 February 2009 (UTC)
  • Oppose pending input from the Foundation's legal counsel on the extent to which this is an issue worth worrying about. We are not lawyers, and there is quite enough copyright paranoia on Wikipedia as it is. Skomorokh 01:54, 24 February 2009 (UTC)

Merge into Plagiarism

If the Plagiarism proposed guideline gets promoted, merge this into that. davidwr/(talk)/(contribs)/(e-mail) 00:46, 14 February 2009 (UTC)

So far as I can tell, there's no rush on that one. --Moonriddengirl (talk) 00:50, 14 February 2009 (UTC)

Issues

I have the following issues with the essay (and certainly with it as a guideline):

  • No distinction is made with regard to length. A close paraphrase of a sentence is totally different from a close paraphrase of two long paragraphs.
  • Facts cannot be copyrighted. The sentence "Daniel Smith received the Nobel Peace Prize on February 12, 2008", for example, requires no rewording, synthesis of multiple sources, or anything else in order to avoid copyright violations; there is nothing in that sentence that is actually capable of being copyrighted.
  • The proposal that "The right way to use [a] source would be to read it, read other sources about cats, internalize the information, and then write original content without looking at the structure of the sources" would, if actually followed, end Wikipedia as we know it. 99% of present editors lack the time to meet this standard, even if it were necessary (which, I argue above, it is not).

If Wikipedia were actually being repeatedly sued, even if unsuccessfully, for copyright violations, then maybe we'd need this essay as a guideline. But in fact Wikipedia isn't being sued. And the problem we know that we're having, with regard to copyright violations, is of the cut-and-paste variety, of large amounts of text, rather than close paraphrasing. And that problem falls under the topic of plagiarism. -- John Broughton (♫♫) 21:52, 16 February 2009 (UTC)

Thanks for your comments. It is true and useful to note that length is a factor in determining fair use (which I'll note), and that factual sentences which cannot be substantially reworded are fine. There are extensive problems with close paraphrasing actually occurring; this is not theoretical. And to say that it's fine as long as we're not getting sued - well, we could say the same thing about all our copyright policy; it's not a long-term viable practice for creating a free encyclopedia. The allusion to plagiarism is just confusing; cut-and-paste of large amounts of text from a non-free copyrighted source, with or without attribution, is already addressed by our copyright policy and handled adequately by existing mechanisms. Dcoetzee 22:04, 16 February 2009 (UTC)
(edit conflict) Actually, we get many articles reported at WP:CP for close paraphrase (sometimes intermingled with cut-and-paste infringement). For a single example, see Amalie von Wallmoden, Countess of Yarmouth, where the author of the original source at Oxford Dictionary of National Biography himself complained. Distinction with respect to length could be useful, as might other information, but close paraphrasing can all too easily cross the line into copyright infringement. This is why I am of the opinion that guidance on such concerns is much needed, whether as an essay or a guideline. --Moonriddengirl (talk) 22:05, 16 February 2009 (UTC)
Thanks for the example. [I note, in passing, that the articles on Falkes de Breauté and William Tresham may still be problematical with regards to close paraphrasing of the ODNB; I see no deletions or substantial changes to the significant (otherwise unsourced) content added by User:Ironholds in October 2008.] -- John Broughton (♫♫) 18:55, 17 February 2009 (UTC)
Thanks for reminding me. I had overlooked those in addressing the articles raised at that time. I'm somewhat challenged by cleaning these, because I don't have access to the ODNB originals, but I'll see if I can get the originals for comparison so that I can take care of that. Meanwhile, I'm in a massive clean-up effort of close paraphrasing & cut-and-paste infringement at User:Moonriddengirl/sandbox. (Fortunately, I've had help; and I have to note for clarification that some of those do not cross the line into infringement and are easily resolved, while some--like Joe B. Fields--went way into the land of infringement.) --Moonriddengirl (talk) 19:12, 17 February 2009 (UTC)

Crepuscular vs. nocturnal

I think I detected a distracting error in the example. IIRC, "crepuscular" doesn't mean "mainly active at night" as the exemplar paraphrase indicates, but "mainly active in the twilight hours, that is dusk & dawn". Nocturnal means "mainly active at night". (Note: I don't have my dictionary at hand, so I'm relying on my memory here, so it may actually mean "solely active at night". This is one of the places where a dictionary is invaluable: for understanding nuances of words.) I think it is important -- whether this becomes another policy, or it remains an essay -- to make sure the information in the example is correct. -- llywrch (talk) 18:23, 23 February 2009 (UTC)

Another example

In mind of DGG's comments above, I've crafted an original passage on an imaginary fashion model that I thought might make a good example. I want to post it here to see if others feel it might serve this essay.

  • Example: source

Felice Phillipe knew from the time she was six years old that she was interested in a career in modeling. Throughout her childhood, she pored over fashion magazines, paying attention not to the fashions, but the deportment of the models. But she was not simply attempting to cultivate their stances, their expressions, not, in a word, simply mimicking these women: Felice was trying to understand what magic they possessed, trying to identify what it was that made a supermodel. In this, she seems to have been successful. Emile Aiton of the Elite Agency, who responded to the portfolio Felice submitted to Elite when she was only 14, says that it was not Felice's appearance that caught his eye, but the way she effortlessly owned her skin. "Beauty is nothing," he says. "We get dozens of beautiful girls every day. I do not care for beauty; I care for charisma. It is charisma I want. Felice, she has that in abundance." The rest of the fashion world seemed to agree. Felice's debut campaign, an international splash print ad for Calvin Klein jeans that was featured in all the top fashion magazines, created an immediate buzz. Aiton remembers, with a chuckle, "The phone was ringing off of the hook. They all wanted to know her name. They all wanted Felice." And the rest, as they say, is history.

  • Example: close paraphrase

Phillipe knew from the time she was six years old she was interested in modeling. When she was a teenager, she pored over fashion magazines. She wasn't trying to mimic the models' stances and expression, but to understand what magic they possessed, the deportment of the models. She did. Emile Aiton of the Elite Agency, who got the portfolio she submitted when she was only 14, says she has charisma in abundance. When her debut campaign ran in all the top fashion magazines--an international splash ad for Calvin Klein jeans--everybody started talking. Aiton chuckled as he remembered how the phone at Elite rang off the hook with people trying to hire her. The rest is history.

  • Example: good paraphrase

From the time she was six, Phillipe was studying fashion models in magazines to help her become a success in the modeling career she already knew she wanted. Elle suggests this may have contributed to her hiring at age 14 by the Elite Agency's Emile Aiton, who told the magazine he had been drawn by her charisma. When Calvin Klein featured her in an international print campaign promoting its jeans, her first ad, she achieved widespread notice in the industry.

I'm trying to get my "close paraphrase" far enough away that it might not be immediately tagged {{copyvio}}, but close enough that the problem is obvious. I'd like my example to fail the "substantial similarity" test without being straightforward copy-paste such that it wouldn't even be considered a "paraphrase" in any legitimate sense of the word. :) If others feel that this example might be useful, I'm more than open to alterations in the text to help achieve those goals. :) Thoughts? --Moonriddengirl (talk) 19:03, 23 February 2009 (UTC)

Newspaper examples

I thought I'd present some examples of paraphrasing from newspapers here. Newspaper reporters are, it occurred to me, in a similar position to WP editors. If they write an article about history or science, they are not usually writing as historians or scientists in their own right, but as reporters summarising the work of others. In other words, like us, they are not expected to contribute original research to match the research of the people they are citing.

This is a key difference between academia and Wikipedia. An aspiring academic is accused of plagiarising when they do not add any of their own original thought to their essay or thesis and rely solely on the work of others. Wikipedians, on the other hand, are told to avoid adding their own analysis and original thought, and to confine themselves to reproducing the ideas of others.

So when it comes to how much paraphrasing we should do, and what plagiarism means in WP, as opposed to academia, we should take our cue from newspaper articles rather than the instructions given to undergraduates. Note that in all the examples below, the author is named explicitly in the text. I searched the papers' websites with search strings such as "in his book" and "wrote that". I have focused on cases where the paper paraphrased. In the majority of cases, the papers gave full quotations marked with quotation marks.


Examples from The Independent

The Independent: Tom Flanagan, a former campaign director for Mr Harper, called it an eloquent speech in his book, Harper's Team, and wrote that they printed out the speech in pamphlet form and sent out thousands of copies. [1] (fittingly, an article about plagiarism)

Source: Stephen responded on 20 March with an eloquent speech in the House of Commons … [2] We printed the speech in pamphlet form and mailed out thousands of copies. [3]

Comments: Book title named as well as the author. Unmarked verbatim and close paraphrase.


The Independent: Steel wrote about the union movement's treatment of the Liverpool dockers. (Steel wrote that Morris's version of trade unionism was little more use than the Yellow Pages, as they both seem to be able to help you get insurance and credit cards: “If your union cannot defend you against a ruthless employer, there is no point ringing Direct Line,” he wrote.) [4]

Source: Union leaders ponder how to stop the decline in union membership, and produce insurance schemes and credit cards to attract members. The trouble is you can get these things anyway with the help of Yellow Pages, but if your union cannot defend you against a ruthless employer there is no point in ringing Direct Line instead. [5]

Comments: Author named in text. Fairly close paraphrase and a verbatim clause ("If your union ...") marked by quotation marks.


The Independent: Directed towards biology by A.G. Lowndes, his teacher at Marlborough College, Young later wrote that he loved it from the first day, dissecting a rabbit before breakfast. He went up to Magdalen College, Oxford, as a Demy, to read zoology ... [6]

Source: He realized that I was no good at chemistry and suggested that I study biology. I loved it from the first day, dissecting a rabbit before breakfast. […] After one year under the guidance of Lowndes, I became a Demy (scholar) of Magdalen College, Oxford, and there studied zoology … [7]

Comments: Author named. Unmarked verbatim, arguably involving creative expression.


Examples from the New York Times

New York Times: In a new biography of the former President, a historian, Herbert S. Parmet, wrote that as Mr. Quayle became the butt of jokes, Mr. Bush "did everything he could to show his confidence in the younger man." But in his diary, Mr. Bush wrote, "It was my decision, and I blew it, but I'm not about to say that I blew it." [8]

Source: Dan Quayle came out of New Orleans as the easy butt of jokes, an implausible vice president. Bush, however, did everything he could to show his confidence and loyalty to the younger man, gaining, in the process, admiration for standing by him. In the privacy of his diary, however, Bush wrote at the end of that convention week that "it was my decision, and I blew it, but I'm not about to say that I blew it." [9]

Comments: Author and book named. "Butt of jokes" not marked as verbatim. Very close paraphrase of the diary clause. Marked verbatim, but incorrectly reproduced ("confidence and loyalty to").


New York Times: She wrote that Mr. Stratemeyer was bitterly disappointed with the first one, saying the heroine was "too flip." But it and the next two books were huge financial successes … [10]

Source: Mr. Stratemeyer expressed bitter disappointment when he received the first manuscript, The Secret of the Old Clock, saying the heroine was much too flip and would never be well received. On the contrary, when the first three volumes hit the market they were an immediate cash-register success for the syndicate. [11]

Comments: Author named. Close paraphrase, with one brief marked verbatim.


New York Times: Mr. Sudoplatov, now 87 and in poor health in a Moscow hospital, wrote that Dr. Oppenheimer and his colleagues were fierce opponents of violence who believed that sharing the secrets of atomic energy would create a balance of power to avert nuclear war. [12]

Source: Oppenheimer, Bohr and Fermi were fierce opponents of violence, they would seek to prevent a nuclear war, creating a balance of power through sharing the secrets of atomic energy. This would be a crucial factor in establishing the new world order after the war, and we took advantage of this. [13] (p. 195, visible using amazon search inside function).

Comments: Author and book named. Multiple close paraphrases.

Further thoughts

As I understand it, there are several aspects that impinge upon the appropriateness or otherwise of close paraphrase.

  1. First, we have the principle that ideas are not copyrighted, only their creative expression is. Note the word creative. A sentence like "Doris Lessing won the Nobel Prize for Literature in 2007" does not involve creative expression. It is a mere description of fact.
  2. Fair use is partly dependent on what purpose the use serves. In the case of newspapers, the purpose is to generate income for the publisher, while in the case of Wikipedia, the purpose is non-profit educational use. Non-profit educational use is more likely to support a fair-use rationale than use to generate income.
  3. The effect on the original author's ability to derive income from their work is a key element in assessing fair use. In the case of Wikipedia, a close paraphrase of a handful of sentences from a book is unlikely to diminish the market for that book. Indeed, it is likely to serve as an advertisement for the book, especially if the author and/or title is named in the article text as well as in the footnote reference. On the other hand, if a WP article copies an entire web page or an entire journal article or book, using close paraphrasing, this might very well diminish the item's financial success.

Looking at the above examples from UK and US professionals, I believe a close paraphrase of a short passage in a book is perfectly acceptable in Wikipedia if the author and/or the work has been named in the text. This being Wikipedia, a footnote reference citing the page number should of course be added as well. Jayen466 19:36, 2 May 2009 (UTC)

I share your opinion (or what I believe to be your opinion) that we can paraphrase more closely when the author is named in the text. Your point one is quite correct. However, point 2 is problematic for Wikipedia. Our purpose is to create a reference work that can be reused widely, even commercially. This is why even though we are a non-profit organization we do not accept material licensed for non-commercial use only. This is also why we have deliberately adopted a standard for handling non-free material that is more strict than the United States definition of fair use. (WP:NFC explicitly states so with respect to images, but NFC also applies to text.) With respect to competition, fair use (of course) is a four-pronged test. A close paraphrase of a handful of sentences from a book is unlikely to represent substantial taking of the book (though there are likely to be rare cases where it may). But if the source is non-free, it's always likely to come back to the test at NFC: "It's used for a purpose that can't be fulfilled by free material (text or images, existing or to be created)." Close paraphrase is seldom necessary, since we can use limited direct quotation as necessary and revise as not. --Moonriddengirl (talk) 19:52, 2 May 2009 (UTC)
Thanks for pointing out the problem with point 2, which is now readily apparent. As for point 3 and close paraphrase being seldom necessary, I only agree with you half-heartedly. Sometimes yet another direct quote feels too much. It is too intrusive, and the grammar just doesn't flow. Now, if it can be rephrased beautifully, brilliant. But there are also potential downsides. You may lose punchiness by non-close paraphrasing. The result may feel contrived. Some things have simply been said well, the easiest way they can be said. Why reinvent the wheel? Or sometimes you may end up misrepresenting your author, subconsciously and inadvertently adding interpretations that are entirely your own. I see value in the occasional close paraphrase, as in some of the above examples. We shouldn't come across as saying "Thou shalt not use close paraphrase." More like, "Try to do without it, but if you've named and cited your author, we won't hang you for it, either." Jayen466 20:19, 2 May 2009 (UTC)
(after e/c, maybe moot now) Jayen, your examples above all seem to involve descriptions of direct statements by other people, followed immediately by a cite to the actual work. I'd call all of those acceptable paraphrasing. Is there a dispute that such sourced wording would not be acceptable in a wiki-article? There are only so many adjectives in the English language, seems to me that mandatory use of a thesaurus is not envisioned here. (And agree with MRG that GFDL content can be used for commercial purposes) Franamax (talk) 20:21, 2 May 2009 (UTC)
Unfortunately I've seen an experienced editor use this argument in a content dispute a few days ago, first deleting a close paraphrase (with the author named in the text and his work cited in the footnote), as "copyvio" and then later on claiming it was still too "close to WP:Plagiarism." Jayen466 20:56, 2 May 2009 (UTC)
(Reply to Jayen; threading and all. :)) This may not be as much of a problem when closely paraphrasing non-copyrighted material, although with free text there's also no good reason not to directly quote if the original words are so pithy that they can't be sufficiently modified. However, it's a serious problem when dealing with copyrighted material. (I'm not disputing your judgment on the acceptability of the paraphrases above, Franamax. :) I'm talking general philosophy.) Naming and citing your author won't make a valid defense if we substantially reproduce another source to the point of substantial similarity. I have seen some contributors who are too cautious about paraphrasing, but far more who lean the other direction and would think that changing 3 words out of a stretch of 30 would be an acceptable paraphrase. It's not, even if it starts with "John said...." (For an example of an article with close paraphrasing concerns using a non-free source, see [14]. I'm linking to an old revision because the examples have been redacted.) --Moonriddengirl (talk) 20:31, 2 May 2009 (UTC)
Again, I think it is a question of how much of the source is reproduced in this way. If it's 30 words from a 300-page book, commenting on someone's assessment of a person or event, is that a problem? It does not seem to have been a problem for the above papers. If it is three paragraphs from a website, reproduced verbatim and then edited with a thesaurus, like here, I readily agree it is an entirely different matter. Jayen466 20:56, 2 May 2009 (UTC)
Jayen, that is the lamest example of "paraphrasing" I've seen yet! "elongated snout" --> "long nose"? "similar" --> "like"? My purple butt. :) Franamax (talk) 21:42, 2 May 2009 (UTC)
Even if the above paraphrases were problematic, I'd have to answer your "does not seem to have been a problem" with a tweaked WP:OTHERSTUFF. :) Unless we had a reliable reference guide to journalism that cited those as examples of acceptable paraphrase, they aren't useful to us in determining how journalistic ethics apply to paraphrase. For all we know, the AP author of that first article was fired for plagiarism the following day. We'd also need a larger sample size to even begin to argue what is common journalist standard. But all this is theoretical, working off the "if the above paraphrases were problematic". They aren't. All three of them make very clear that they are not only citing a source, but paraphrasing directly that source's speech, with markers like "wrote" and "called it". But leaving aside that, I'm afraid that I don't see what your point is here. This essay indicates pretty clearly that "If a non-free copyrighted source is being used, it is recommended to use original language and direct quotations, to clearly separate source material from original material; nevertheless, limited close paraphrasing may be acceptable under fair use in some cases." Are you proposing to add a specific example of this? It certainly might be worth noting that when attributed within the text as a clearly noted paraphrase of a quotation, close paraphrasing is less of an issue. But I think that your examples are overwhelming to the overall length of the essay. --Moonriddengirl (talk) 21:16, 2 May 2009 (UTC)
Relax ... no intention to get these examples into this essay. :) And your points are well taken. As for specific improvement suggestions, the lede makes close paraphrasing sound categorically like a crime:

Copyright law (see Wikipedia:Copyrights) forbids Wikipedia contributors from copying information directly from other sources except in limited cases and with attribution. Close paraphrasing unsuccessfully attempts to circumvent these restrictions by copying and superficially modifying information from another source.

If I were to paraphrase this (!), I might write: "Close paraphrasing is when someone is breaking copyright law and is trying to hide the fact they are doing it." That may be too harsh. Jayen466 22:17, 2 May 2009 (UTC)
We have the sentence "When using a close paraphrase legitimately, citing a source is highly recommended, and in some cases, required." I would suggest changing this to "When using a close paraphrase legitimately, it is recommended that the author be expressly named in the text and that the source be cited in a footnote." (or something equivalent). Jayen466 22:28, 2 May 2009 (UTC)
I think that sounds like a good change. The opening may need a bit of thought. I'm afraid I'm sneaking off from cooking supper even to write this. :) --Moonriddengirl (talk) 22:34, 2 May 2009 (UTC)

Feist v. Rural section misleading?

I read Feist v. Rural in October 2005 when someone with a POV to push nominated an article I started as a {{copyvio}}.

I think I read it pretty closely. Either I didn't, or the section of this essay devoted to the SCOTUS ruling is pretty darn misleading.

The SCOTUS ruling stated, in general, that lists of "facts" aren't copyrightable. Then it spelled out that there were exceptions, where a list could be copyrighted, because, for instance, there was something remarkable about how the entries were ordered.

The section that mentions Feist v. Rural reverses the sense of the SCOTUS ruling. I would be very disturbed if this essay were to be turned into a guideline while this section remained so misleading... Geo Swan (talk) 06:29, 28 June 2009 (UTC)

I have also read it very closely, and I don't see it as misleading. Perhaps examining the particulars will help. The passage says:

Note, however, that closely paraphrasing extensively from a non-free source may be a copyright problem, even if it is difficult to find different means of expression. In Feist Publications v. Rural Telephone Service, the United States Supreme Court noted that factual compilations of information may be protected with respect to "selection and arrangement, so long as they are made independently by the compiler and entail a minimal degree of creativity," as "[t]he compilation author typically chooses which facts to include, in what order to place them, and how to arrange the collected data so that they may be used effectively by readers"; the Court also indicated that "originality is not a stringent standard; it does not require that facts be presented in an innovative or surprising way" and that "[t]he vast majority of works make the grade quite easily, as they possess some creative spark, 'no matter how crude, humble or obvious' it might be."[15]

As I read this, it indicates that the original, creative elements in factual compilations are copyright protected and, accordingly, one cannot closely paraphrase extensively from them. This seems very much what the Court indicated. You can use facts, but you cannot use creative words and structure. "Close paraphrasing" is all about appropriating creative words and structure.
Perhaps you are remembering the conclusions of the Court more strongly than they were actually set out? Rural was found to have no copyright protection in an alphabetical directory of names and addresses because such a directory exhibited no creativity in arrangement or language. But the Court did not reverse the long-standing copyright protection on "lists of facts". It merely clarified it. It set out the crux of the issue here: "This case concerns the interaction of two well-established propositions. The first is that facts are not copyrightable; the other, that compilations of facts generally are. Each of these propositions possesses an impeccable pedigree." (my emphasis added, since this seems to be the opposite of what you assert above). After establishing this, it explained what elements in compilations of facts are copyrightable and why. A few additional quotes, "Notwithstanding a valid copyright, a subsequent compiler remains free to use the facts contained in another's publication to aid in preparing a competing work, so long as the competing work does not feature the same selection and arrangement"; "Thus, if the compilation author clothes facts with an original collocation of words, he or she may be able to claim a copyright in this written expression. Others may copy the underlying facts from the publication, but not the precise words used to present them"; "As applied to a factual compilation, assuming the absence of original written expression, only the compiler's selection and arrangement may be protected; the raw facts may be copied at will"; "A factual compilation is eligible for copyright if it features an original selection or arrangement of facts, but the copyright is limited to [499 U.S. 340, 351] the particular selection or arrangement. In no event may copyright extend to the facts themselves."
If after review you still find the passage misleading, we may need to find different language to convey what is meant: that you cannot appropriate the creative language and structure from copyrightable expression, even if you may use the facts. --Moonriddengirl (talk) 12:07, 28 June 2009 (UTC)

Possible suggestions

I was accused of this (probably rightly so) and so I went and read about it on the web. It's a tricky matter, and I had to read quite a bit before actually finding useful suggestions on how to avoid it. Most articles had examples similar to our "Felis catus" one, which did not help me. I tend to write about low notability subjects, and the source article will only have 1-3 sentences on the subject. It's easy to paraphrase 5 paragraphs into 5 sentences without plagiarizing, but it's hard to paraphrase 3 sentences into 2 sentences without doing it too closely (for low notability subjects, you want to grab all the info you can). Anyway, after reading lots of unhelpful examples (with recommendations like "read other sources about cats", when they possibly don't exist), I found three recommendations to be useful.

  • Put away the source material while paraphrasing, and don't look at it.
  • Read over your paraphrase as if you were your own worst enemy, looking for plagiarism.
  • If there's no good way to paraphrase, direct quote it.

I was hoping we could add something like this to the article, but I figure discussing it first is a good idea. - Peregrine Fisher (talk) (contribs) 04:37, 19 August 2009 (UTC)

Original research

The (cat) source says:"It has been associated with humans for at least 9,500 years." You recommend conveying this by: "The earliest known cat lived over 9,500 years ago" which is a completely different statement, & one which needs a "domesticated" to make it true, I would imagine. Johnbod (talk) 02:33, 6 October 2009 (UTC)

  • The general problem with this essay is that in Wikipedia, close paraphrasing is required to avoid original research concerns. Trying to convey the source's statements with other words will very often change the meaning. II | (t - c) 22:04, 18 July 2010 (UTC)
  • It may be difficult, but it must be done; US copyright law requires sufficient alteration of copyrighted content to avoid copyright infringement. --Moonriddengirl (talk) 22:09, 18 July 2010 (UTC)
I've altered the proposed second paragraph. For one thing, we must indeed sacrifice style if the only reason to closely paraphrase a copyrighted work is that the original flows better. That said, changing from active voice to passive voice is seldom going to be sufficient rewriting to avoid close paraphrasing anyway. I've made other changes to clarify; we can't very closely paraphrase meaning--we can only closely paraphrase language, and if that language is copyrighted this is forbidden by our copyright policy and by US copyright law. I believe I may have conveyed the intended message by altering:

Thus the meaning conveyed by the article must be very closely paraphrased to the source's meaning. Replacing words used by the source with words which seem synonymous to the Wikipedia editor may unacceptably change the meaning. Using quotes around key words or phrases is one way to avoid this issue.

to

Thus the meaning conveyed by the article must follow the source's meaning. Language should be selected with care to avoid unacceptably changing the meaning.

If you disagree, we can always restore the status quo while we discuss other possibilities. --Moonriddengirl (talk) 22:32, 18 July 2010 (UTC)
Your changes are fine - actually a bit more succinct. I still worry about people sacrificing style, but I suppose it can't be helped. Were John's comments (#Issues) introduced into the article? If John is correct that facts which cannot be reworded cannot be copyrighted - and you seemed to say he was - then this resolves some of the issue. But this does not appear to be in the article, and I suspect that John may not be quite correct in regard to plagiarism. II | (t - c) 22:52, 18 July 2010 (UTC)
Derrick seems to have incorporated something about scale in response, here. The second point is touched upon in the section that begins "It is also permitted when there are only a limited number of ways to say the same thing." Personally, I think if we try to explain that much further, we're going to need to be very careful with our language. It's very true that non-creative language is not protected, though close paraphrasing issues may still exist even then if the structure is appropriated. The Supreme Court has deliberately set the threshold for creativity very low, as the Court reminded in Feist v. Rural:

As mentioned, originality is not a stringent standard; it does not require that facts be presented in an innovative or surprising way. It is equally true, however, that the selection and arrangement of facts cannot be so mechanical or routine as to require no creativity whatsoever. The standard of originality is low, but it does exist.

I work with copyright problems a lot on Wikipedia, and I suspect most people would be astonished what people try to argue is non-creative. :/ Explaining it is pretty difficult, since there's no objective standard.
As to his third point, I'm not sure I entirely grasp it. :) He seems to have suggested that Wikipedia contributors do not properly research their work, which may be true. Doesn't change the fact that it's the best approach to the problem, tho. --Moonriddengirl (talk) 15:19, 19 July 2010 (UTC)
All of this is correct as far as I know - please feel free to revise it. It's very hard to come up with a good illustrative example of close paraphrasing and correct rewording because a paragraph is such a limited context as compared to a complete article, but I'm sure there's room for improvement in this. In the third point, I was making the point that "blind copy pasting" of text from an external source that you haven't completely read or understood can result in content that has policy problems (for example, a lot of the EB1911 articles had tone problems that had to be corrected). Dcoetzee 15:29, 19 July 2010 (UTC)

Difference: Copyright and Plagiarism Issues

I think the essay has it totally wrong on permitting close paraphrasing from PD sources: The copyright issue does not arise in such case but it is still plagiarism, and therefore unethical. The only acceptable instance is attribution not via a <ref> tag but in the style of {{1911}}. The difference is that this template actually admits that possibly entire paragraphs have been taken over unchanged. The common meaning of a footnote does not do that. Detailed reasoning here. --Pgallert (talk) 10:48, 4 November 2010 (UTC)

PS: Wikipedia is not in the business of educating its contributors, and does not generally concern itself with subjective ethical concerns
Seems to contradict Jimbo's statement here. --Pgallert (talk) 11:05, 4 November 2010 (UTC)
I agree and your comment is a good summary - the guidelines on plagiarism at Wikipedia:Plagiarism are quite clear about this, and anything that I wrote that contradicts this should be disregarded or modified. Dcoetzee 11:33, 4 November 2010 (UTC)
I've modified the essay, which was created before Wikipedia:Plagiarism was "voted" into a guideline. Better? Worse? :) --Moonriddengirl (talk) 12:03, 4 November 2010 (UTC)
Much better than before but not yet there. ;) The only permitted close paraphrasing can be if you do not only acknowledge the source of the idea but also the source of the text itself. Example: (partly cross-posting from Wikipedia talk:Plagiarism) There is a huge difference between {{cite DNB}} and {{DNB}} (below): One template acknowledges that you got the idea from there. The other one mentions the possibility that you got the text from there (the idea+the wording). That's exactly the difference between plagiarising and not plagiarising. --Pgallert (talk) 07:12, 5 November 2010 (UTC)
  • {{cite DNB}}:   Dictionary of National Biography. London: Smith, Elder & Co. 1885–1900.
  • {{DNB}}:   This article incorporates text from a publication now in the public domain: Dictionary of National Biography. London: Smith, Elder & Co. 1885–1900.
Yes, but all that is still set out at Wikipedia:Plagiarism, unless I'm mistaken, which is why I wrote "if a contributor closely paraphrases public domain or copylefted content, he or she should explicitly acknowledge that" and "The procedure for properly attributing close paraphrasing of public domain or compatibly licensed content is the same as that of attributing copied public domain or compatibly licensed content. See specifically Wikipedia:Plagiarism#Sources under copyleft and Wikipedia:Plagiarism#Public domain sources for the procedure." Sending them to the guideline for the procedure helps avoid this essay going out of date if the guideline recommendations are ever refined or changed. --Moonriddengirl (talk) 10:44, 5 November 2010 (UTC)
"[I]f a contributor closely paraphrases public domain or copylefted content, he or she should explicitly acknowledge that" -- I know that you know that :) What is not clear to many, is what exactly the that, the last word in this quote, means. Many[dubious ] believe putting an ordinary reference, without any additional comment, would be fine. Should possibly be accompanied by an example that does not use any template to make clear that this is a general requirement, not something that only applies to EB or DNB. --Pgallert (talk) 12:32, 5 November 2010 (UTC)

Example:
Some closely paraphrased text.<ref>This paragraph has been closely paraphrased from the following Public Domain source: {{cite book|title=Some title|and-so-on}}</ref>

You're right; I have seen many examples in my time of things I thought obvious that others did not. :) For now, I've just expanded to clarify what "that" means. --Moonriddengirl (talk) 12:57, 5 November 2010 (UTC)

Feist v. Rural

I think it is a bit odd that in Wikipedia:Close_paraphrasing#When_is_close_paraphrase_permitted? we are using a decision that concerned telephone directories and compilation lists as a basis for discussing close paraphrasing. It is true that the quoted decision quotes some other pertinent decisions, but the overall thrust seems to be slightly off the mark. We would not want to paraphrase an alphabetical telephone directory (which the Supreme Court found was not copyrightable, by the way). Some of the points made may be relevant to whatever guidelines we have for our list articles (i.e. we must not copy the precise selection and arrangement of published list), but close paraphrasing usually occurs when we cite narrative sources, rather than lists.

The present wording in the essay is the following:

If a non-free copyrighted source is being used, it is recommended to use original language and direct quotations, to clearly separate source material from original material; nevertheless, limited close paraphrasing may be acceptable under fair use in some cases. It is also permitted when there are only a limited number of ways to say the same thing. In general, sentences like "Dr. John Smith earned his medical degree at State University" can be rephrased "John Smith earned his M.D. at State University" without copyright problems. Note, however, that closely paraphrasing extensively from a non-free source may be a copyright problem, even if it is difficult to find different means of expression. In Feist Publications v. Rural Telephone Service, the United States Supreme Court noted that factual compilations of information may be protected with respect to "selection and arrangement, so long as they are made independently by the compiler and entail a minimal degree of creativity," as "[t]he compilation author typically chooses which facts to include, in what order to place them, and how to arrange the collected data so that they may be used effectively by readers"; the Court also indicated that "originality is not a stringent standard; it does not require that facts be presented in an innovative or surprising way" and that "[t]he vast majority of works make the grade quite easily, as they possess some creative spark, 'no matter how crude, humble or obvious' it might be."[1]

Now, if someone has produced a compilation, e.g. a business directory, containing the names and telephone numbers of plumbers, dentists and estate agents, then I am not allowed to reprint and sell exactly the same directory listing myself. This is what is meant where the essay says, factual compilations of information may be protected with respect to "selection and arrangement, so long as they are made independently by the compiler and entail a minimal degree of creativity," as "[t]he compilation author typically chooses which facts to include, in what order to place them, and how to arrange the collected data so that they may be used effectively by readers."

But the wording, as presented, seems to me to fudge the issue that raw facts are never copyrightable. They can always be copied at will. The way it stands in the essay, it's apt to give the reader the impression that a source acquires some copyright over facts, by dint of having selected them for inclusion.

The relevant passage from Feist v. Rural is as follows (I've highlighted the parts that comment on the non-copyrightable nature of facts in themselves):

Factual compilations, on the other hand, may possess the requisite originality. The compilation author typically chooses which facts to include, in what order to place them, and how to arrange the collected data so that they may be used effectively by readers. These choices as to selection and arrangement, so long as they are made independently by the compiler and entail a minimal degree of creativity, are sufficiently original that Congress may protect such compilations through the copyright laws. Nimmer 2.11[D], 3.03; Denicola 523, n. 38. Thus, even a directory that contains absolutely no protectible written expression, only facts, meets the constitutional minimum for copyright protection if it features an original selection or arrangement. See Harper & Row, 471 U.S., at 547 . Accord, Nimmer 3.03.
This protection is subject to an important limitation. The mere fact that a work is copyrighted does not mean that every element of the work may be protected. Originality remains the sine qua non of copyright; accordingly, copyright protection may extend only to those components of a work that are original to the author. Patterson & Joyce 800-802; Ginsburg, Creation and Commercial Value: Copyright Protection of Works of Information, 90 Colum.L.Rev. 1865, 1868, and n. 12 (1990) (hereinafter Ginsburg). Thus, if the compilation author clothes facts with an original collocation of words, he or she may be able to claim a copyright in this written expression. Others may copy the underlying facts from the publication, but not the precise words used to present them. In Harper & Row, for example, we explained that President Ford could not prevent others from copying bare historical facts from his autobiography, see 471 U.S. at 556-557, but that he could prevent others from copying his "subjective descriptions and portraits of public figures." [499 U.S. 340, 349] Id. at 563. Where the compilation author adds no written expression, but rather lets the facts speak for themselves, the expressive element is more elusive. The only conceivable expression is the manner in which the compiler has selected and arranged the facts. Thus, if the selection and arrangement are original, these elements of the work are eligible for copyright protection. See Patry, Copyright in Compilations of Facts (or Why the "White Pages" Are Not Copyrightable), 12 Com. & Law 37, 64 (Dec. 1990) (hereinafter Patry). No matter how original the format, however, the facts themselves do not become original through association. See Patterson & Joyce 776.
This inevitably means that the copyright in a factual compilation is thin. Notwithstanding a valid copyright, a subsequent compiler remains free to use the facts contained in another's publication to aid in preparing a competing work, so long as the competing work does not feature the same selection and arrangement. As one commentator explains it: "[N]o matter how much original authorship the work displays, the facts and ideas it exposes are free for the taking. . . . [T]he very same facts and ideas may be divorced from the context imposed by the author, and restated or reshuffled by second comers, even if the author was the first to discover the facts or to propose the ideas." Ginsburg 1868.
It may seem unfair that much of the fruit of the compiler's labor may be used by others without compensation. As Justice Brennan has correctly observed, however, this is not "some unforeseen byproduct of a statutory scheme." Harper & Row, 471 U.S., at 589 (dissenting opinion). It is, rather, "the essence of copyright," ibid. and a constitutional requirement. The primary objective of copyright is not to reward the labor of authors, but "[t]o promote the Progress of Science and useful Arts." Art. I, 8, cl. 8. Accord, Twentieth Century Music Corp. v. Aiken, 422 U.S. 151, 156 (1975). To this end, copyright assures authors the right to their original [499 U.S. 340, 350] expression, but encourages others to build freely upon the ideas and information conveyed by a work. Harper & Row, supra, at 556-557. This principle, known as the idea/expression or fact/expression dichotomy, applies to all works of authorship. As applied to a factual compilation, assuming the absence of original written expression, only the compiler's selection and arrangement may be protected; the raw facts may be copied at will. This result is neither unfair nor unfortunate. It is the means by which copyright advances the progress of science and art.

As far as I can see, any and all facts described in a source are free for the taking. Selection and arrangement are protected when the matter concerns the production of a compilation, like for example the Yellow Pages, with category names, and specific items of information selected for listing under specific categories. However, the fact that President Ford "selected" certain facts for inclusion in his autobiography, and omitted others, does not mean that any of the raw facts included in the autobiography thereby become unfree. Any and all of them are free for the taking. --JN466 13:58, 10 November 2010 (UTC)

You have yourself highlighted the passage of concern: "a subsequent compiler remains free to use the facts contained in another's publication to aid in preparing a competing work, so long as the competing work does not feature the same selection and arrangement." I don't have time to embark on another long round of conversation today, so I'll see if I can explain this succinctly. If I write a biography of a famous individual and choose facts from his life which I feel are pertinent to his development, I am using creativity in my selection of what facts to present. The facts are not protected; my selection and arrangement of them (including what I prioritize) is. If you come along and write a biography focusing on the same facts I have focused on and do not otherwise clear "fair use", you are misappropriating my creative work. When the facts are such that anyone would present them and in the same order, there is no creativity; when they are not, there is. Facts are free; my creative structuring of them is not. --Moonriddengirl (talk) 14:10, 10 November 2010 (UTC)
The reason I mentioned this is that Hans Adler for example appeared to argue the other day that we must not present too many detailed facts from a source, as doing so might result in substantial similarity, and thus a copyright infringement. If that was his argument, then I believe he was mistaken; while any creative expression and structuring of such facts is protected, the facts themselves are not. --JN466 14:14, 10 November 2010 (UTC)
He's right, for the reasons I just set out. That the car was black, that they went to a specific supermarket, that they switched to a certain other color car, these are creative assignments of weight to facts by the writer. I don't have time to find the link, because I've got copyright cleanup to do, but my note to you there about Michael Jackson has bearing here. --Moonriddengirl (talk) 14:22, 10 November 2010 (UTC)
That the car was black, that they went to a specific supermarket, that they switched to a certain other color car, these are facts. They do not belong to the person who reported them. "No matter how original the format, however, the facts themselves do not become original through association." "[N]o matter how much original authorship the work displays, the facts and ideas it exposes are free for the taking. . . . [T]he very same facts and ideas may be divorced from the context imposed by the author, and restated or reshuffled by second comers, even if the author was the first to discover the facts or to propose the ideas." --JN466 14:30, 10 November 2010 (UTC)
This begins to feel very circular to me. :/ "a subsequent compiler remains free to use the facts contained in another's publication to aid in preparing a competing work, so long as the competing work does not feature the same selection and arrangement." --Moonriddengirl (talk) 14:34, 10 November 2010 (UTC)
The judge was talking about compilations here. I could include all the phone numbers I find in the Yellow Pages, for example, to create a numerical listing of business numbers in my town, without breaking the Yellow Pages' copyright. What I cannot do, though, is print my own version of the Yellow Pages, using all the same category names (plumbers, dentists ...), and including exactly the same businesses in each category. --JN466 17:39, 10 November 2010 (UTC)
You are failing to generalize. The fact that they're discussing a compilation does not mean that the entire conversation refers only to phone books. But never mind that, I linked below to that IP attorney who speaks of news broadcasts. --Moonriddengirl (talk) 15:28, 11 November 2010 (UTC)
(ec) And I still maintain that. The numbers in an indiscriminate telephone directory as such cannot be copyrighted. But if someone issues a book "The 1000 most important telephone numbers in New York", then it will of course have copyright, including the selection of numbers ("even a directory that contains absolutely no protectible written expression, only facts, meets the constitutional minimum for copyright protection if it features an original selection or arrangement"). It's the same principle as for photographs of old art. A photograph of a painting does not usually have copyright because there is no creative act involved, only technicalities of getting the representation right. If I take a photograph of an ancient Greek statue I do get copyright.
A biographer of Albert Einstein has to make lots and lots of non-obvious decisions. How much space do we give to his love for music? Should we mention his violin teachers? All of them? Should we sketch their backgrounds? If the archives contain a composition of his, should that be mentioned? Analysed? How should one characterise him? As a physicist? A mathematician? A scientist? A patent clerk? How about nationality and ethnicity? (German, Jewish, American, Swabian, ...?) There are literally millions of little decisions to be made.
If you get ten years and a huge archive full of all information about Einstein with the exception of all biographies of him, then I guarantee you that the biography you will write will look nothing like Britannica's. It will be a very rare paragraph that corresponds in content to a Britannica paragraph. A tiny fraction of your sentences will be similar to Britannica's, but nowhere near 10%, and there won't be entire paragraphs that look as if you had just rephrased each Britannica sentence separately. You will write a very different biography, on all scales: Different choice and relative weight of topics, different choice of organisation, different choice of illustrative details, and often a very different, obviously independent choice of words.
If, by contrast, you take all the facts in the Britannica article and reorganise them according to new criteria, creating a completely new structure and expressing everything in your own words, then you are basically on the safe side. But that's in part because you can't do that without changing the facts and their weight as well. Maybe you will introduce errors, maybe you will hint at aspects of Einstein's life that were missing in Britannica. But not even the bare facts will be the same. Hans Adler 14:48, 10 November 2010 (UTC)
I don't have time to pull out any books right now, but let me drop a link to a random IP lawyer's website, where he attempts to explain: [16]. Note particularly paragraph 3. You can have the facts, but if you follow the "unique expression and arrangement of facts" (to borrow his terms) you risk infringing copyright. The best solution here, as I noted at one of the other discussions on this matter, is to use multiple sources that focus on different facts and in different ways. That helps prevent your relying too heavily on the "unique expression and arrangement of facts" of one. --Moonriddengirl (talk) 15:00, 10 November 2010 (UTC)
Facts are not copyrightable, but note what your random IP lawyer also says: "News broadcasting, even when major media outlets lead with the same story and the same basic facts, is copyrightable as a unique expression and arrangement of facts, if the broadcast is simultaneously recorded." This shows how low the bar for originality is. Hans Adler 19:49, 10 November 2010 (UTC)
We don't always have the luxury of several sources to cover the same detail. The important point is that the facts are free to take, that this is not affected by their having been selected for inclusion in a source, and that their unique expression and arrangement must not be reproduced. If there is only a single source containing the detail in Hans's example, "The gangsters entered their black Volvo and raced with it to a nearby Sainsbury's supermarket car park, where they switched to a dark blue Mercedes", Wikipedians are free to report all of these facts, e.g. like so: "The perpetrators left the scene in a black Volvo, and then exchanged their vehicle for a dark blue Mercedes at a Sainsbury's supermarket nearby."
We cannot institute a principle whereby Wikipedians are prevented from reporting facts only available in a single source. --JN466 17:53, 10 November 2010 (UTC)
We don't have such a principle. Wikipedians are welcome to report facts only available in a single source so long as they don't follow the "unique expression and arrangement of facts" in that source. --Moonriddengirl (talk) 18:22, 10 November 2010 (UTC)
You left out the "car park", but it's not a serious omission because everybody would have guessed that anyway. That's precisely the kind of thing we should do, both to be on the safe side and for encyclopedic brevity. Some editors try to mention every little detail that happens to be in the source, but some of them are just accidents of the way something was expressed. In fact, the source might have started with your sentence and rephrased it as mine. In that case it's possible that the dark blue Mercedes was parked in front of an inner city Sainsbury's rather than in the car park, but the author of our source didn't know this and introduced a mistake when he paraphrased someone else. That's why it's always a good idea to leave out the more trivial details when we only have a single source. Hans Adler 19:46, 10 November 2010 (UTC)

I was asked to comment here, but I can't see which parts of the discussion refer directly to what's in the essay. Regarding the "unique expression and arrangement of facts" prohibition: how much text would have to be involved before it became a problem? SlimVirgin talk|contribs 15:04, 11 November 2010 (UTC)

As with all copyright matters, there's no hard line, I'm afraid. Generally, the more creative the content and the more extensive the borrowing (in proportion to the source and the article which uses it), the more likely a problem is to exist. I'm not quite sure what Jayen's sudden panic is over this point; I have yet to see any confusion over it at WP:CP. The text that's alarming him has been in this essay since February 2009. --Moonriddengirl (talk) 15:24, 11 November 2010 (UTC)
Moonriddengirl, my sudden panic is related to the incident involving a prominent arbitrator who has recently resigned and exercised his right to vanish. I found the whole thing a very depressing affair, and very bad for project morale. His edits were clearly problematic and not best practice, but in the opinion of some, including Jimbo, they have also been mischaracterised and been described in overly harsh terms. I am afraid we will have people gunning for prominent editors on the basis of close paraphrasing and copyvio. Some of these criticisms will be perfectly valid, but in such a situation it is all the more important that we are very clear in our guidelines and policies what exactly is a problem, and what is not.
Now, we have this passage: If a non-free copyrighted source is being used, it is recommended to use original language and direct quotations, to clearly separate source material from original material; nevertheless, limited close paraphrasing may be acceptable under fair use in some cases.
It is also permitted when there are only a limited number of ways to say the same thing. In general, sentences like "Dr. John Smith earned his medical degree at State University" can be rephrased "John Smith earned his M.D. at State University" without copyright problems. Note, however, that closely paraphrasing extensively from a non-free source may be a copyright problem, even if it is difficult to find different means of expression. In Feist Publications v. Rural Telephone Service, the United States Supreme Court noted that factual compilations of information may be protected with respect to "selection and arrangement, so long as they are made independently by the compiler and entail a minimal degree of creativity," as "[t]he compilation author typically chooses which facts to include, in what order to place them, and how to arrange the collected data so that they may be used effectively by readers"; the Court also indicated that "originality is not a stringent standard; it does not require that facts be presented in an innovative or surprising way" and that "[t]he vast majority of works make the grade quite easily, as they possess some creative spark, 'no matter how crude, humble or obvious' it might be."[1]
I am concerned about the highlighted sentence, and the way some of the rationale that follows, from Feist v. Rural, has been used. If an editor takes facts from a single source, facts that use technical terms that are irreplaceable, they may well open themselves up to charges of close paraphrasing and copyvio, even if they have done their human best to avoid any duplication of creative expression and structure.
I should be very happy if my panic turns out to be unfounded. We do have plagiarism and copyvio problems, and I appreciate all the work you do to address them, but there is also a potential for damaging the project here. I just want to make sure that we protect editors' legitimate editing practices as much as possible to avoid false accusations that cause upset in the community, while detracting from the very real and very considerable work that needs to be done on the plagiarism and copyvios that we do have. --JN466 15:59, 11 November 2010 (UTC)
That arbitrator, whom I quite liked, violated our copyright policy. His problem was not related to the use of facts from sources, but copying content from sources. See [17]. Given your new understanding of close paraphrasing, I trust you'll see the problem here. Content like "On September 28, 1997, PhraMaha Taweepong Tawiwongso, PhraMaha Putthachak Buddhisaro, and PhraMaha Saman Methawee were invited to a local celebration of the "Sarth Thai" Day and to visit the place of interest. All monks agreed that was a perfect place for a monks' residence and a meditation center" ([18]) is not a bald statement of facts, but quite clearly creative expression. He had, unfortunately, been misunderstanding our copyright policy for years. See [19], compared to [20]. This was no false accusation; the only problem here is that people kept calling it "plagiarism" instead of identifying it for what it really was. That he left the project was his choice. We have had many contributors who have failed to understand copyright policy and have even gone through WP:CCI but have continued contributing constructively.

The sentence "Note, however, that closely paraphrasing extensively from a non-free source may be a copyright problem, even if it is difficult to find different means of expression" is true. The other material is there to help explain why it's true. It's there because the problem I do see on Wikipedia quite routinely is people asserting that since information is free, they can closely follow non-fiction sources with impunity. It's a pain in the neck to rewrite content in your own words sometimes, but our desire to share information doesn't free us from having to do it. --Moonriddengirl (talk) 16:12, 11 November 2010 (UTC)
I think we all know plagiarism when we see it. Even though hard cases are discussed on talk pages, as a matter of practice we do know instinctively when an edit looks deceptive, and that's what plagiarism boils down to, whether intended to be deceptive or not.
So the trick when writing guidelines is to find words that capture that common sense instinct. If we get too detailed and legalistic, either people will ignore the guideline, or they'll take it seriously and leave.
As for copyright issues, I'm unsure when they would kick in, but surely not until a substantial amount has been copied (unlike plagiarism which can involve just a few words). Or am I wrong about that? SlimVirgin talk|contribs 16:37, 11 November 2010 (UTC)
This isn't a guideline; it's an essay. In spite of what Jayen said when he approached you at your talk page, there has been no movement towards making this page a guideline since February 2009, a movement that was withdrawn by the author. Beyond that, it depends what you mean by "substantial amount". In terms of copyright infringement, "substantial" is based on a number of factors, including the length of the original source, the length of the item making use of the content and the centrality of the content to either. If the use is not proportionally substantial or central to either, it's more likely to be dismissed as de minimis. If it is, mitigating factors are considered before a determination of whether the use rises to the level of copyright infringement. --Moonriddengirl (talk) 16:56, 11 November 2010 (UTC)
Do you have a real example from WP of text that might be a copyright violation in a counter-intuitive way? SlimVirgin talk|contribs 18:23, 11 November 2010 (UTC)
Not without knowing what's intuitive to you. :) That's kind of the problem. We could take the example already linked above, for an example. That's a clear issue to me. It may be a clear issue to you. Presumably, it was not a clear issue to the contributor who placed it here. --Moonriddengirl (talk) 18:31, 11 November 2010 (UTC)
(edit conflict) The example you link to is a problem, because it extensively copies the source's diction for no good reason. It is quite similar to this example from the world of journalism, which caused a respected journalist to lose his job. (I am actually in favour of making WP:Close paraphrasing a guideline eventually, and from what you wrote yesterday, I thought you were desirous of that too.) --JN466 19:07, 11 November 2010 (UTC)
No, I was just noting that if it ever does become a guideline, this would be the place to elaborate on it. It's useful enough to me as an essay; I refer to it when I run into people who produce content like the above. It is useful in giving them guidance on how to rewrite. The underlying idea--that you can't follow sources too closely for copyright reasons--is already covered in policy at WP:C. --Moonriddengirl (talk) 19:12, 11 November 2010 (UTC)

Oh, here's an example. See Talk:Eden Gardens. That article was marked as a copyvio of two different sources by an IP in March 2009. Contributors had removed the notice before the listing came due. Note the impassioned claims that "Did you notice that the text is basically on hard facts on the stadium. How can this be a copyright infringement? " and "I doubt if there is a Copyright Infringement. Facts are facts ... You cannot copy them, can you? I just don't see the need to list this page for copyright infringement." I'm comparing the most egregious examples of the text in question to one of the sources after the supposed cleanup:

The present Eden Gardens Cricket Club came in existence some time in the year 1864.... The first ever first-class match to be played here was in 1917-18, while the first Test match was against Douglas Jardine's MCC in January 1934.

The source

The present Eden Gardens Cricket Club came in existence some time in the year 1864....The first ever first-class match to be played here was in 1917-18, while the first Test match was against Douglas Jardine's MCC in January 1934.

To us, it may be obvious that such content is not handled properly according to WP:C and WP:NFC, but to people of different cultural backgrounds, this isn't at all clear. To them, this is "hard facts" and reusable exactly as is. (User:SGGH deleted and replaced the article.) --Moonriddengirl (talk) 19:06, 11 November 2010 (UTC)

I think we have to explain to people, in non-legalistic language, the difference between facts and their creative expression, and explain that while facts are not copyrightable, the threshold for creative expression is actually extremely low. --JN466 19:11, 11 November 2010 (UTC)
I'm open to improving the language, though I believe we need to keep the underlying law visible at least in footnotes. As in the above conversation, people hear things ("facts can't be copyrighted") and misunderstand what that means. I have had people tell me that non-fiction is not copyrightable for that reason. --Moonriddengirl (talk) 19:17, 11 November 2010 (UTC)
By the way, I agree that the problem with our arbitrator appears to have been one of systematic copyright violation, rather than plagiarism – he did cite his sources, and never tried to pretend that his material did not come from those sources – and that it was a disservice all round for the issue to come to be described as one of plagiarism rather than copyright violation in community discussions. --JN466 19:28, 11 November 2010 (UTC)
The above is an example of plagiarism, even with a source cited in the footnote; it's just not the most deceptive kind. I'm wondering what makes it also a copyright violation, i.e. not covered by fair use. SlimVirgin talk|contribs 20:24, 11 November 2010 (UTC)
Use of creative expression and creative structure? --JN466 21:35, 11 November 2010 (UTC)
It's a copyright violation: a violation of our copyright policy. We have to rewrite creative content. Only a court of law can determine if it is a copyright violation--a legal offense. --Moonriddengirl (talk) 21:42, 11 November 2010 (UTC)
Okay, so it's not a copyright violation in any meaningful sense, and probably not in the legal sense either given how few words are involved. That's why I'm confused as to why copyright issues are being raised in this context at all. Only a court can decide whether something is a violation; only copyright lawyers can decide whether it's likely to be one in any given jurisdiction. Our job is only to stop the most obvious examples (someone posting huge screeds of material from a book, an entire newspaper article, or an entire poem, for example). Otherwise our main concern is the moral one, plagiarism. SlimVirgin talk|contribs 21:48, 11 November 2010 (UTC)
Wikipedia:Plagiarism is a guideline; Wikipedia:Copyrights and Wikipedia:Copyright violations are policy. Our job is to make sure that content complies with policy. (Your "most obvious examples" is far at variance with Wikipedia's treatment of non-free content.) --Moonriddengirl (talk) 21:52, 11 November 2010 (UTC)
But this is not the copyright violation policy. This is an essay about close paraphrasing, which is closely related to plagiarism, and most of the time will only be distantly related to copyright. I didn't understand your point about my most obvious examples.
What can't be allowed to develop is paranoia about plagiarism dressed up as copyright violation concerns. We don't want to end up with an approach to text that mirrors our approach to images, with Wikipedians encouraged to run around trying to catch editors out. So please keep the two issues separate—plagiarism, a moral and editorial concern that we can rightly deal with; and copyright, a legal concern that most Wikipedians know very little about, and which doesn't matter at all for the purposes of this essay. The reason it doesn't matter is if you help people not to plagiarize (broadly construed) when paraphrasing, you're ipso facto helping them not to violate copyright. SlimVirgin talk|contribs 22:04, 11 November 2010 (UTC)
This essay is related to copyright just as much as it is to plagiarism; close paraphrasing is also a copyright concern. A close paraphrase plagiarism issue on Wikipedia can be dealt with simply by attributing; attribution does not efface copyright infringement. --Moonriddengirl (talk) 22:06, 11 November 2010 (UTC)
Close paraphrasing is usually not going to be a copyright concern, because of the small amount of material we tend to use, and the extent to which it is a copyright concern on any given occasion can only be judged by lawyers. So there's honestly no point in us getting bogged down by it here, and no need either. Plagiarism we can judge more easily. SlimVirgin talk|contribs 01:59, 12 November 2010 (UTC)
Close paraphrasing is frequently a copyright concern. It shows up at CP all the time. Copyright may not be your concern, but it's a serious concern. --Moonriddengirl (talk) 02:05, 12 November 2010 (UTC)
Can you give me a real example of where close paraphrasing on WP was confirmed as a copyvio, but not plagiarism, and where it wasn't an obvious case (e.g. more or less copying a whole article)? SlimVirgin talk|contribs 02:08, 12 November 2010 (UTC)
Go check out the listings at WP:CP. They're all archived. --Moonriddengirl (talk) 02:11, 12 November 2010 (UTC)
I wouldn't know what to look for. As this is something you're involved in, I'd appreciate seeing a real example, so I know what kind of actual practices you have in mind. SlimVirgin talk|contribs 02:16, 12 November 2010 (UTC)
I've already given you a real example; it's in the link higher on the page...the one where some other website did the work for us. If you want more, you can go to CP and look for the words "close paraphrase." For example, there are two tagged here, although they haven't come current yet so I have no idea if there's any merit to the tags. Frequently they don't tell you the source, which does make it a bit challenging. Cleaning them up involves (a) finding the source (if it hasn't been closely identified); (b) figuring out if there is close paraphrasing; (c) figuring out if the close paraphrasing is substantial enough to require complete rewriting or simpler cleanup. Oh, yeah, and determining if the text is copyrighted. --Moonriddengirl (talk) 02:26, 12 November 2010 (UTC)
Sorry, I'm not making myself clear. The example you linked to was plagiarism. It was also a copyvio, but that's a separate issue. What I'm looking for is a real example from WP of close paraphrasing that was a confirmed copyvio, but was not also an example of plagiarism, i.e. something that could only be dealt with by looking at the legal aspect, and not the editorial. SlimVirgin talk|contribs 02:34, 12 November 2010 (UTC)
Oh, I see. I don't evaluate for plagiarism. However, the resolution for plagiarism is attribution. If I put a template on the page that said, "Content is copied from this source", that would eliminate the problem per plagiarism. Doing so would not eliminate the copyvio. The only examples that I could come up with that would be copyvio but not also plagiarism would be when content is copied from websites that are incompatibly licensed: such as when people properly attribute content copied or closely paraphrased from non-commercial sources that we cannot retain. However, copyright concerns trump plagiarism concerns; it is a separate issue--a much more urgent one. --Moonriddengirl (talk) 02:47, 12 November 2010 (UTC)
"Wikipedians encouraged to run around trying to catch editors out". That's happening already. Used as an ad-hominem argument in an AfD. --JN466 02:21, 12 November 2010 (UTC)
Just to add some relevant info, Wikipedia:Plot-only_description_of_fictional_works#Avoiding_violating_copyright describes the case Twin Peaks Productions vs. Publications International in which the source material (a television series) was summarized in detail by a book. This is not only paraphrasing but a change in medium. Nevertheless it was found to be copyright infringement and ineligible for fair use. Dcoetzee 22:26, 11 November 2010 (UTC)

Why plagiarism is relevant to copyvios

In response to SlimVirgin further up (because I missed it when that dialogue happened): Moonriddengirl's Eden Gardens example is a good example why the plagiarism question is actually relevant to the copyvio question. The example is a copyvio in the legal sense for the following reasons:

  • The wording of two sentences is absolutely identical with that in the source, and it is very unlikely that someone has come up independently with these two sentences. (Moreover, our page history proves that that's not how the two sentences arose.)
  • There was no licence to use material from the source. This leaves fair use as the only way we could legally use such material.
  • See wikisource:Convention for the Protection of Literary and Artistic Works/Articles 1 to 21#Article 10 for the restrictions on fair use: "permissible to make quotations from a work [...], provided that their making is compatible with fair practice, and their extent does not exceed that justified by the purpose". Now this example can't be fair use for the following reasons:
    • It's not a quotation. It's incorporating text as if it was our own.
    • Whatever "fair practice" means, plagiarism is extremely unlikely to fall under the definition. The fair use exception is clearly not intended to allow plagiarism.
    • The extent of copying "justified by the purpose" of plagiarism is nothing at all.

Hans Adler 08:45, 16 November 2010 (UTC)

And my take on why, from all the lessons I've learned from watching:
  • If you can frame it as a copyvio, then do so. It is a much simpler issue (surprisingly enough), gets dealt with through a much speedier process, and editors are better able to accept the concerns. I rate it closer to getting a speeding ticket in a school zone, "yes officer I was driving too fast". Contrast to:
  • Don't lead out with a suggestion of plagiarism. You are suggesting a moral crime akin to "you almost killed my child when you drove past", which almost inevitably prompts a response of "the hell I did and anyway if I did I just won't drive my car anymore you stupid cow, are you happy now?".
That's very much a qualitative assessment but I've seen this happen one or two times now, we identified early on at [WP:Plagiarism] that copyvio needed to be dealt with separate from plagio and copyvio had to come first. Vast amounts of drama could be avoided if people would prioritize the issues. Sorry if this is not relevant to this thread, I was twigged by the heading to say something I've been thinking of for a while. Franamax (talk) 13:00, 16 November 2010 (UTC)
Very well put, and I agree. :) --Moonriddengirl (talk) 13:03, 16 November 2010 (UTC)
I also agree, actually. I meant this as a late contribution to a technical discussion further up, but I see why you responded the way you did, and you certainly convinced me to keep discussions on the level of copyright in the future. Hans Adler 13:25, 16 November 2010 (UTC)

Attribution

The essay contains the sentence, "The sources should also, of course, be attributed", where "attributed" links to Wikipedia:Attribution, a policy/guideline that was never adopted.

Recently, I have seen it argued that attributing text to a source is different from citing a source: attribution, so the argument goes, is mentioning the source writer in-text, while citing is just adding a footnote. I presume we mean the latter, and would suggest putting "cited" instead of "attributed", wikilinked to WP:Citing sources. --JN466 23:34, 11 November 2010 (UTC)

Yes that was the intention. Linked the wrong thing. Dcoetzee 01:09, 12 November 2010 (UTC)

Recent lead re-write

Hi. I'm afraid that the alterations here shifted the focus of this essay too much into plagiarism and too far away from copyright. We delete articles that are closely paraphrased from Wikipedia every day; people need to understand that. The original paragraph makes that clearer, I think:

Copyright law (see Wikipedia:Copyrights) forbids Wikipedia contributors from copying information directly from other sources except in limited cases and with attribution. Close paraphrasing unsuccessfully attempts to address these restrictions by copying and superficially modifying information from another source.

The altered paragraph, removing the second sentence, implies that copy-paste is the end.

Also, this is likely to be confusing:

Judicious quoting (with or without quotation marks) or closely paraphrasing source material is appropriate—so long as it is limited and does not breach copyright—if accompanied by in-text attribution that makes clear whose words or ideas are being used (e.g. "John Smith wrote that ...").

Policy requires that direct quotations, in which non-free content is used verbatim, be explicitly marked with quotation marks or some other form of noting quotations. You cannot introduce copyrighted content by saying:

Montaigne wrote that all motion discovers us: the very same soul of Caesar, that made itself so conspicuous in marshaling and commanding the battle of Pharsalia, was also seen as solicitous and busy in the softer affairs of love and leisure.[21]

If Montaigne were not public domain, this would violate our non-free content policy.

Certainly, though, the content should be clarified. I'll see if I can take a stab at doing so without the issues that concern me. --Moonriddengirl (talk) 10:45, 1 June 2011 (UTC)

Draft of Close Paraphrasing changes

I copied this essay into my workspace and took a stab at drafting (here) some changes from a "newbie's" - like me - perspective, with the intention to:

  • help the reader understand what is the "right way" to approach paraphrasing source information
  • minor edits to format to make it a little easier to scan the article
  • provide a little information about the "see also" links to help understand the terms or what type of information might be found at the link
  • under the assumption that the people that most need to understand the language, terms and concepts are early contributors (who can get confused just by the terms, which then makes it hard to understand the concepts)

Thankfully all the information that was needed was actually in the article, external links or see also links - it was just a matter of grouping it so that hopefully there would be less digging required to get the salient points!

I'm hoping for feedback on the usefulness of the added information:

  • Why is that important? section
  • How to write acceptable content? section
  • An additional example
  • See also information expanded a bit and put into a table

Questions:

  • I formatted some of the "Wikipedia:" links for readability, but wonder is that against style guidelines for essays?
  • Some of the information for the "How to write acceptable content" section I first gathered from the external links, but I also saw it in the "Signpost dispatch" on plagiarism (GREAT page). Should the content be cited?
  • Any thoughts about additions, changes that would make this more useful for the reader?--CaroleHenson (talk) 02:16, 30 June 2011 (UTC)
  Done edits to the article were made on a user page and copied into the existing article. If interested, discussion about those changes can be found at User talk:CaroleHenson/Edited version of Close Paraphrasing.--CaroleHenson (talk) 21:03, 4 July 2011 (UTC)

RfC for the explicit auditing of DYKs for compliance with CP policy

An RfC has been launched to measure community support for requiring the explicit checking of DYK nominations for compliance with basic WP policies—including Close paraphrasing policy—and to improve the management of the nominations page through the introduction of a time-limit after which a nomination that does not meet requirements is archived. Tony (talk) 04:15, 23 July 2011 (UTC)

Sorry to repeat, but I think we should not make a habit of calling close paraphrasing "CP". I've seen people elsewhere on Wikipedia using "CP" to mean "child pornography", and if we have people miscellaneously mentioning that someone broke the "CP" policy eventually we'll have some really embarrassing misunderstanding. The more-or-less euphemism "close paraphrasing" itself is uncommon, and when I saw the abbreviation "CP" I really had no idea what it stood for here until looking at the text. Wnt (talk) 16:10, 31 July 2011 (UTC)
I usually take CP to mean "copyright problem", as that's the Wikipedia nickname: WP:CP. --Moonriddengirl (talk) 12:49, 1 August 2011 (UTC)

Too strict a standard

You say it is unacceptable to say

"They insisted they would refuse to leave until they had met with Carson"

When the original says

"The employees say they will not be leaving until they meet with Mr. Carson"

Because "the sentence structure is the same".

Now in the context of other more egregious violations, maybe this is a small contributing factor; but as a standalone case (which it seems to be described as in the right hand column) I don't see it as an issue. I mean, just to write this fact in the most straightforward way "in my own words", namely: "The employees said they wouldn't leave until they met with Mr. Carson" is at least as bad; even if I deliberately alter the structure to go far from the original - "Carson's employees refused to leave until they'd had their meeting with him" is scarcely any 'better' than the version you give in your example. More to the point, if I were editing a section that contained that last version, it might end up being rephrased "in my own words", i.e. in a way you found unacceptable, even if I'd never read the source.

To get to the broader picture here, I think that a rewrite like this example shows a good faith effort to rewrite the text rather than just plagiarize it. I don't think that should be a Wikicrime. Legally I don't know how it would play, but I don't think there's really any copyright suit over one sentence, or over any section of text that is short enough to use as a Fair Use quote. Wnt (talk) 16:21, 31 July 2011 (UTC)

Typically a good paraphrase will condense the original text, presenting the main idea, but in fewer words. The example above shows the idea rewritten in the same number of words, when better would be something like "The employees wanted a meeting with Mr. Carson". There's really no reason to keep the rest of the sentence.
It's best to keep in mind that paraphrasing and summarizing tends to distill and shrink down material that then is worded in a different manner than the original, while still keeping the original idea. It's not always easy. Truthkeeper88 (talk) 16:30, 31 July 2011 (UTC)
But that doesn't keep the original idea. There's a huge difference between holding a sit-in and simply saying you want to meet with someone. Wnt (talk) 16:42, 31 July 2011 (UTC)
Then write "The employees decided to hold a sit-in". Truthkeeper88 (talk) 17:04, 31 July 2011 (UTC)
Yeah, but that's "original research". Just because they refused to leave doesn't mean they called it a sit-in. Maybe they stood the whole time. Maybe it wasn't a planned protest. Trying to follow the information in a single source faithfully puts the editor on a very narrow line here. Wnt (talk) 21:59, 31 July 2011 (UTC)
We have editorial judgment, and if the wiggle room is too small then I always go to direct quote that's attributed to the author. Most of the time I find enough wiggle room to reword, and that's the goal, to reword in our own words without taking the author's words and sentence structure. I don't deny it's hard, but the more it's done the easier it is, and truly for me, as I said below, perspective in terms of time helps a lot. There's a good resource here. Truthkeeper88 (talk) 22:46, 31 July 2011 (UTC)
The issue isn't just whether it's hard, but whether it's something we should demand of editors. The standard, even in the essay, is "put it in your own words". Not "put it in words you've wracked your brain to make sound as different from the author as you possibly can". If that's what you want for your essay, then say that. But I don't think it's what most of us want, and I don't think it's something we would demand. Wnt (talk) 02:24, 1 August 2011 (UTC)
Then this needs to be rewritten, and it was just rewritten by an editor who found it confusing. Perhaps it's still confusing. If text is not rewritten in your own words, whether wracked out of the brain, or however they get onto the page, then we run the risk of violating copyright. Whether we want it is irrelevant - that not doing it causes problems is the issue. This is no different than having to cite a statement or provide sources - it's quite honestly easier to write without having everything cited, but we have to cite per WP:V. This isn't policy, but I think WP:Plagiarism is. I need to look at that. Truthkeeper88 (talk) 02:34, 1 August 2011 (UTC)
The question is, what's "very few changes", as the guideline puts it? I'd say 5 words the same out of 13 in this example is not "very few changes". Wnt (talk) 03:10, 1 August 2011 (UTC)
I think you've raised good points and that the guideline needs to be rewritten. There are plenty of style guides that address paraphrasing vs. close paraphrasing - I'll have a look at what I can find tomorrow. Generally style guides recommend that a paraphrase by necessity will be more concise and shorter than the original - it's really an issue of rewriting a sentence or two vs. summarizing a few pages or more, but the concept is similar. The aim shouldn't be to plug in different words, word by word, but to distill the idea in one's own words, if that makes any sense. Truthkeeper88 (talk) 06:29, 1 August 2011 (UTC)

The problem is seldom a single sentence; it's in aggregate. What would be an acceptable paraphrase if only one sentence were involved may quickly rise to a level that risks (or potentially crosses the line) into infringement as the size of the close paraphrase increases. This is addressed under "Why is it a problem", which notes that "Close paraphrasing rises to the level of copyright infringement when taking is substantial. Depending on the context and extent of the paraphrasing, limited close paraphrase may be permitted under the doctrine of fair use; close paraphrase of a single sentence is not as much of a concern as an entire section or article." The difficulty, I guess, is in producing an example that illustrates that, when an individual sentence from the example might be too strict a standard if it were alone but not if it is clustered with others. What we need to avoid is what the courts have described (very loosely paraphrased, because I don't have time to dig it up :D) as clumsy efforts to make writing look original--writers who are not skilled in paraphrase often do this in good faith. They think that a thesaurus is all it takes to write original content, but this is obviously not so (or it would be legal for us to translate content from one language to another, since literally every word is changed). :/ --Moonriddengirl (talk) 12:48, 1 August 2011 (UTC)

I'm fairly busy today but made a quick and clumsy edit to that section because I think it implies sentence structure regarding a single sentence instead of a cluster of sentences. I have style guides and grammar books and want to see how they address this issue, and then make an attempt to fix that section a little. I think it's important your points are emphasized MRG. Truthkeeper88 (talk) 13:19, 1 August 2011 (UTC)
Just to add to above, I've had time to look at one writer's reference book. The paraphrasing section is clear that close paraphrasing is problematic in situations where the wording is too close to the original in a string of sentences, as MRG mentions. The example the book gives is an entire paragraph, and although the sentences have been reworded, the density of words identical or similar to the original is what makes something closely paraphrased. The example shows the paragraph completely rewritten. I think we should do something like that here. I'm not familiar with editing guideline or policy pages - can I make an assertion without backing it up with a source? Certainly I think the single sentence example should be tweaked or removed. Also, I think the examples in the table might be slightly misleading and am wondering if we can do without it altogether. Truthkeeper88 (talk) 22:26, 1 August 2011 (UTC)
It isn't a solitary sentence, though; it's part of the paragraph. :) It's only in that context that (imo) it would be a problem--a cumulative similarity issue. Perhaps we should clarify that? If we keep the example. I think we do need an example, though it's always been hard to come up with. People ask for examples. :/ This is the one we used to have, which was evidently a problem for people as well. --Moonriddengirl (talk) 12:58, 2 August 2011 (UTC)
I understand that it's part of a para, but the annotation in the box/ table referred only to the sentence and not the para level, which might have caused confusion. I don't mind the cat example, but if people are saying it causes confusion then I suppose we shouldn't use it. Probably the best thing would be to write our own example. Truthkeeper88 (talk) 13:10, 2 August 2011 (UTC)
Fine by me, if you want to take a shot at it. :) Alternatively, we could clarify in the annotations that it is an aggregate problem and not at a single sentence level. --Moonriddengirl (talk) 14:02, 2 August 2011 (UTC)
I would be absolutely terrible at it. It's hard to break training and I'm trained not to paraphrase closely. I'm thinking we can use a piece of PD text, maybe something 19th century, suitably dry, and I might give it a shot, but cannot guarantee the outcome at all. I have edited the page to indicate the sentence is problematic within the context of the paragraph, although I'm not sure it's sufficiently clear. Truthkeeper88 (talk) 14:06, 2 August 2011 (UTC)

The Goose Model

I think that more effort should be given to explain the procedural difference between "close paraphrasing" and good editing in Wikipedia. To begin with, normally Wikipedia should be an encyclopedia, not just a regurgitation of the sources, so we shouldn't often need to copy every scrap of information from a source article, and there should be multiple source articles. But there are a few cases where we want to, and in that case, both I and the "close paraphraser" might start by doing the same thing, namely, copying the article text into the edit box (though yes, this is a red flag, and there's some risk of inadvertently putting a copyright violation into the article history if you hit the wrong button, and certainly it's not something to do often). But here's the difference:

When the Close Paraphraser goes to work on that text, I think he looks on it all as a blurb he has to change. So he goes over it here and there and changes some words, until it seems "scrambled up enough", but maybe he didn't get everything, and probably it has the same linear order etc.

When I go over it, there's a big white space between the copied text and my text. In that space there lives a "virtual goose", and as I go along writing my text, I decide where I want to start, what I want next... and the "goose" proceeds to devour the matching part of the source material. Then it gets, em, deposited in my own words in the section I'm writing, and periodically I cut out the text I've done from the original. Eventually the goose turns up its nose at what's left, and I delete it all.

Now, perhaps this is too picturesque or picaresque a way to describe the desired technique, but I think when we read the real DYK examples we see that the real problem is that people are doing things the first way instead of the second. Wnt (talk) 16:41, 31 July 2011 (UTC)

Unless someone is really good at paraphrasing, and it's something that takes a fair amount of skill, the best thing to do is read the material, walk away from it and think about it, and then at a later time write it down without looking at the original. Usually that results in a fairly good paraphrase. I can't paraphrase if the text is in front of me because the words stick in my mind and make their way into my writing - I have to step away from it. I also make notes, in sandboxes, capturing the ideas from the sources, which at a later time I rewrite and add to a page. All of this takes time and patience, which maybe is the biggest deterrent. Dunno. Truthkeeper88 (talk) 17:09, 31 July 2011 (UTC)
I think the best techniques may be based an awful lot on people's experience with paraphrasing and their natural gift for it. Since I work at WP:CP, I mostly see the people who didn't do it well, and I suspect an awful lot of them begin with copying the article. The challenge for them then is that they will frequently attempt to put the material into their own words sentence by sentence, which is nearly impossible to do. You wind up retaining the same structure as the original, and creativity in writing is not limited to language but also to selection and organization of content. I think most of our readers would be better served by taking notes--distilling out the facts that are essential--ideally of several sources and then blending those facts together without referring to the originals. It's always a good idea to do a final comparison of the end result with the sources, just to make sure. --Moonriddengirl (talk) 12:58, 1 August 2011 (UTC)
Just to add to this, I think some of our citation requirements are often misunderstood, with the belief that a piece of text has to be cited from one source, and the next piece from another source, and so on. Actually when I first started writing here, I thought that; I couldn't tell you where I got that impression, but it makes for very difficult writing. And I can see how it could add to the pitfalls. I think using WP:CITEBUNDLE is useful and gives even more flexibility in terms of rewriting and rephrasing sources. Truthkeeper88 (talk) 12:49, 2 August 2011 (UTC)
Agreed that sentence by sentence rewriting is not a good thing. The goose must be free to wander around the grass, eating just what it's hungry for at the moment. ;) Wnt (talk) 20:48, 2 August 2011 (UTC)

A possible example to use

To Moonriddengirl and others: here's a possible example I've created, using a paragraph from a book published in 1907. I'm not sure it works, but I've tried to re-use the language and flow of the original paragraph.

Original: (Hardie, Martin, (1907). English Coloured Books. page ix.)

The colour-plates with which this volume is illustrated have been executed with great care and skill, and are admirable examples of successful three-colour work. It is only right, however, to emphasise the fact that, while they give a faithful rendering of pictorial quality they are simply process translation, and naturally cannot reproduce the technique and texture of the originals. The illustration representative of Kate Greenaway's work has been printed from the original wood-blocks by kind permission of Messrs. Warner.

Rewrite = close paraphrasing:

The volume has been illustrated with colour-plates with great care and skill and are wonderful examples of good three-colour work. They give a faithful rendering of pictorial quality but it's important to emphasize that they are only a process translation. Naturally they cannot reproduce the technique and appearance of the originals. Kate Greenaway's illustration has been printed with original wood-blocks by permission of the printer.

Let me know. I can try others as well and eventually we'll end up with a good example. Truthkeeper88 (talk) 15:22, 3 August 2011 (UTC)

I think that's a pretty good example. :) --Moonriddengirl (talk) 12:44, 20 August 2011 (UTC)
Hi MRG - I saw this zoom by on my watchlist and then promptly forgot about it! Sorry about that - if you think it works, I'll copy it into the page to replace the current example. Or should we use it as a second example? Truthkeeper (talk) 16:50, 23 September 2011 (UTC)

Honestly I've lost the thread here, but I see this issue coming up here and there in the project. I'd like to use the example above from the 1907 book but can't remember why we decided we needed another example, nor can I decide where to add it. Had we decided to replace the existing example in the box? Suggestions? Truthkeeper (talk) 13:37, 5 November 2011 (UTC)

Tools for detecting close paraphrasing and copy/paste

I could have sworn I saw a link to an online tool which scans a Wikipedia article and its references automatically, producing a diff upon detection. Is there such a thing? Seems not to be mentioned here or in major WP articles about detecting copy/paste...

There are commercial tools listed in Category:Plagiarism detectors, and free and commercial ones listed in Plagiarism detection, but man, I thought we had one... --Lexein (talk) 07:09, 20 August 2011 (UTC)

Sorry for disappearing! I don't log in as often as I used to, and I sometimes miss things on my watchlist. Are you thinking about the duplication detector? If so, it's really good for direct copying but can be misleading for close paraphrasing. :) --Moonriddengirl (talk) 12:43, 20 August 2011 (UTC)
You know, for us, it's only been a few hours. How long has it been for you? Yes, the Duplication Detector sounds good. I'd like to gather all such tools which work well, and make sure they are listed in documentation, throughout the various projects. Is it sensible to list the DD in the mainspace Plagiarism detection? --Lexein (talk) 13:09, 20 August 2011 (UTC)
LOL! I was actually referring more to missing the note above than this one. :D Yes, I think it would be sensible to include that. :) I'd probably note that it might have limitations for detecting close paraphrasing and also that the number of tandem words it checks for can be adjusted. --Moonriddengirl (talk) 13:12, 20 August 2011 (UTC)
Though the Duplication Detector exists, it's not quite as automatic as I'd hoped. The Earwig Copyvio Detector purports to be automatic, but so far it hasn't generated the results I (perhaps naively) expected. I should learn Python. --Lexein (talk) 13:21, 20 August 2011 (UTC)
It's because of a change in Yahoo's allowances. :/ All of our automatic web comparison tools are broken at the moment. I understand that User:Coren is attempting to work something out with Google. --Moonriddengirl (talk) 13:28, 20 August 2011 (UTC)
Yes, I just spotted that news in User talk:Coren. Hrm. Makes me think whingey thoughts about running scripts locally to do this. --Lexein (talk) 13:47, 20 August 2011 (UTC)
Hi Lexein. I wrote Duplication Detector and don't think it should be included in the mainspace Plagiarism detection, because it is designed to work on a single document and not at large scale like commercial/professional tools. It is also designed to serve a different, orthogonal purpose from Coren's bots: they find pages that are likely sources for copyvios, and Duplication Detector points out exactly which material in the article is copied. They work together. Extending it to also handle close paraphrasing is a good idea, but it's not clear how to do this (maybe some kind of word-level Levenshtein distance computation).
I'm aware that Coren's bots are having trouble due to Yahoo's terms of service, and I don't have an easy solution for this, but have been discussing it by e-mail with another user. Basically no search engine allows automated queries right now (even Google Search API and Yahoo's BOSS require in their terms of service that all queries are based on manual input). But frankly, I don't think they'll care if we break their TOS, as long as we pay their fees for using their service APIs. There are also ways we can decrease the number of queries required by using statistical models to estimate the number of results various queries will produce. Dcoetzee 06:33, 21 August 2011 (UTC)
Excellent. Right - I see your point about publicizing DD. I hope it all works out somehow. It's ironic that I'd get interested in autodetecting plagiarism at about the same time as the Yahoo trouble. I'm looking through the other tools in Plagiarism detection and others. --Lexein (talk) 07:38, 21 August 2011 (UTC)
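As a rough illustration of the word-level Levenshtein idea raised above, here is a minimal Python sketch. It is not how Duplication Detector or Coren's bots work; the function names (tokenize, word_levenshtein, similarity) and the way the score is normalised are assumptions made up for this illustration. It compares two passages word by word rather than character by character, so a rewrite that only swaps a few words still scores as very similar. The sample sentences are fragments of the 1907 Hardie paragraph and its close paraphrase from the section above.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (hypothetical helper)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_levenshtein(a, b):
    """Edit distance between two token lists.

    Inserting, deleting, or substituting one whole word each costs 1.
    """
    previous = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        current = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            current.append(min(
                previous[j] + 1,         # delete word_a
                current[j - 1] + 1,      # insert word_b
                previous[j - 1] + cost,  # keep or substitute
            ))
        previous = current
    return previous[-1]


def similarity(source, rewrite):
    """Score from 0.0 (nothing aligned) to 1.0 (identical word sequence).

    The normalisation by the longer passage is an assumption of this sketch.
    """
    a, b = tokenize(source), tokenize(rewrite)
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - word_levenshtein(a, b) / longest


if __name__ == "__main__":
    original = ("they are simply process translation, and naturally cannot "
                "reproduce the technique and texture of the originals")
    paraphrase = ("they are only a process translation. Naturally they cannot "
                  "reproduce the technique and appearance of the originals")
    print(round(similarity(original, paraphrase), 2))
```

A high score only flags a passage for human review. Any cut-off would need tuning, and a measure like this cannot tell whether the shared wording is creative expression or simply one of a limited number of ways to state a fact.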

Extent of the legal issues

Were the findings in Macmillan Co. v. King ever upheld in any other copyright case? I ask because our article on Macmillan states that chapters of a textbook abridged into paragraphs were still found to be infringing. That is not only far different than the definition of close paraphrasing given in this essay, but it's extremely disturbing and troubling. This essay suggests that merely the use of synonyms beyond rephrasing will get us out of legal trouble, but how can that possibly be the case when single paragraph abridgments of entire chapters have been held to infringe copyright? Can we please get an actual copyright lawyer to help figure this out? 69.171.160.37 (talk) 08:54, 9 September 2011 (UTC)

Where does this essay suggest that? That's certainly not its intent. :/ It's supposed to be recommending that people "review information from reliable sources, extract the salient points, and use your own words, style and sentence structure to draft verbiage for an article", and if it is somewhere giving a false impression, that needs work. --Moonriddengirl (talk) 10:09, 9 September 2011 (UTC)

Note that Macmillan Co. v. King says that the case was decided under now-replaced statutes. So it seems that this article mentions no applicable case law at all. In particular, no example or other evidence is provided that a small amount of close paraphrasing (say, a few sentence fragments) is a problem under copyright law. Zerotalk 02:45, 11 December 2011 (UTC)

That may be true, but we can't let the editors know ;-) Compared to doing the job properly, close paraphrasing is so quick and easy that, unless restrained, its use would spread widely; and then it would be a problem.  —SMALLJIM  10:11, 16 December 2011 (UTC)

Verbiage

This essay uses the word "verbiage" six times, but it's not le mot juste: verbiage redirects to verbosity. Wikt:verbiage agrees ("1. Overabundance of words") and notes that its intended meaning here is secondary and US-based ("2. (US) The manner in which something is expressed in words"). Other dictionaries (including OED) agree. To me, a Brit, the word has a purely negative connotation and seeing it made me chuckle while reading the essay. Can we change it? Wiktionary suggests using "diction," "phrasing," etc.  —SMALLJIM  13:39, 25 November 2011 (UTC)

Did it myself.  —SMALLJIM  09:46, 10 December 2011 (UTC)

Request clarification in Example: close paraphrasing repaired

Hi! Could somebody help me understand how the final paragraph does not closely paraphrase the related sentences from the first and second sources in the example where the close paraphrasing has been repaired?

  • In the 1930s a Works Progress Administration (WPA) project, called Federal Writers' Project... (final paragraph) and These narratives were collected in the 1930s as part of the Federal Writers' Project of the Works Progress Administration (WPA)... (first source)
  • having been one to fifty years of age when they obtained their freedom in 1865 (final paragraph) and ranged in age from one to fifty at the time of emancipation in 1865 (second source)

Many thanks in advance. --Tinpisa (talk) 14:09, 12 December 2011 (UTC)

I'm not an expert, but it may be acceptable under "When there are a limited number of ways to say the same thing" - per When is close paraphrase permitted? Or maybe the example is pointing out that by mixing two (or more) different sources, the incorporation of closer wording from each is less unacceptable. Just a guess, really.  —SMALLJIM  09:56, 16 December 2011 (UTC)
In that case, the example should be changed into something that illustrates complete removal of paraphrasing for the benefit of editors. Or, the example needs to be improved to set a high standard for editors to follow and remove any possibility of confusion in the minds of the editors. Leaving it as such is incorrect, then. Tinpisa (talk) 10:28, 16 December 2011 (UTC)
You may be right, I'm not an expert, as I said. But I do know that good paraphrasing is an art, so there won't be one right answer. How would you reword it?  —SMALLJIM  11:03, 16 December 2011 (UTC)

problematic passage

This makes no sense to me:

"Finally, close paraphrasing can also become problematic when a contributor closely paraphrases a source without understanding it; consequently, the contributor does not possess the ability to assess whether an article conforms to our policies, particularly Wikipedia:Neutral point of view, or to repair it if it does not. The result is frequently content that has a bias similar to the bias of the source."

If an editor doesn't understand a source, then a greater distance between the source and the inserted text increases the chance of error. That is, close paraphrasing is less likely to introduce errors than major paraphrasing. In the case of highly technical language, only an expert may be capable of rewording without changing the meaning. However, this seems to be a different issue from that of NPOV. The need for neutral presentation is clear, but I can't see how it is relevant to this page. The last sentence appears to be in contradiction to WP:SUBSTANTIATE and WP:V. We are supposed to report sources accurately, not to sanitize them. Zerotalk 07:54, 10 December 2011 (UTC)

I agree with you. The intent of that paragraph is not at all clear. Shall we take it out, or does it express an important concept that can be clarified by rewriting it?  —SMALLJIM  09:40, 16 December 2011 (UTC)
The intention of this passage was to say, among other things, that when a user closely paraphrases from a biased source without understanding it, they are more likely to reproduce its bias unwittingly, which is an NPOV violation. If you can think of a way to rephrase this I'm all for it. Dcoetzee 10:17, 16 December 2011 (UTC)
Well, it shouldn't say that, since it would violate policy. It is not POV to preserve the bias of a source, it is required by WP:V and WP:NOR. Zerotalk 11:04, 16 December 2011 (UTC)
You're misunderstanding my argument. NPOV policy says to attribute biased viewpoints to the speaker making them (e.g. the author of the source). Copying text may make the article appear to directly espouse the bias of the source instead. Dcoetzee 00:14, 17 December 2011 (UTC)
So it's not a problem that's actually caused by close paraphrasing, it's just that one of the circumstances under which the problem can arise is when a close paraphrase is carried out carelessly or maliciously. What about replacing that para with something like this:
"Another potential problem arises when a contributor copies or closely paraphrases a biased source either purposefully or without understanding the bias. This can make the article appear to directly espouse the bias of the source, which violates our neutral point of view policy."
Does that cover it?  —SMALLJIM  16:06, 17 December 2011 (UTC)
It avoids the misunderstanding, but I don't know why it belongs in this essay. Failure to attribute opinions correctly happens for all types of reporting of sources. Is it a particular problem for close paraphrasing, more than for non-close paraphrasing? I don't think so. Zerotalk 06:16, 18 December 2011 (UTC)
Perhaps you prefer guidance to stick to the main points, whereas Dcoetzee thinks it should cover wider possibilities? As a temporary measure I've replaced that para as above - it's clearer, at least. I'll let you two decide whether it, or something like it, should remain or not.  —SMALLJIM  10:34, 21 December 2011 (UTC)