Note that the bot's maintainer and assistants (Thing 1 and Thing 2) can go weeks without logging in to Wikipedia. The code is open source and interested parties are invited to assist with the operation and extension of the bot.

Before reporting a bug, please note: Addition of DUPLICATE_xxx= to citation templates by this bot is a feature. When there are two identical parameters in a citation template, the bot renames one to DUPLICATE_xxx=. The bot is pointing out the problem with the template. The solution is to choose one of the two parameters and remove the other one, or to convert it to an appropriate parameter.

Or, for a faster response from the maintainers, submit a pull request with appropriate code fix on GitHub, if you can write the needed code.
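The DUPLICATE_xxx= behaviour described in the note above can be illustrated with a minimal Python sketch. It is not the bot's actual code. It assumes the citation template has already been parsed into an ordered list of (name, value) pairs, and it assumes that the later occurrence is the one renamed (the note only says that "one" is renamed). The helper name `flag_duplicates` is hypothetical.

```python
def flag_duplicates(params):
    """Rename repeated citation parameters to DUPLICATE_<name> (hypothetical helper).

    `params` is a list of (name, value) pairs in template order. Assumption:
    the first occurrence keeps its name and later repeats are renamed, so an
    editor can see both values and decide which one to keep.
    """
    seen = set()
    result = []
    for name, value in params:
        key = name.strip()
        if key in seen:
            # Second (or later) copy of the same parameter: flag it for a human.
            result.append((f"DUPLICATE_{key}", value))
        else:
            seen.add(key)
            result.append((key, value))
    return result


if __name__ == "__main__":
    cite = [("title", "A"), ("year", "2000"), ("title", "B")]
    print(flag_duplicates(cite))
    # [('title', 'A'), ('year', '2000'), ('DUPLICATE_title', 'B')]
```

The fix on the article side is then what the note says: keep one of the two values and delete the DUPLICATE_ parameter, or move its value into a more appropriate parameter.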

Convert semanticscholar links to use s2cid=

Status: new bug
Reported by: Headbomb {t · c · p · b} 15:32, 23 April 2020 (UTC)
What should happen
We can't proceed until
Feedback from maintainers

The parameter is now supported as of the last module update. Pinging @Nemo bis: and @Pintoch:, since OAbot (talk · contribs) should also make use of this parameter, with |s2cid-access=free when appropriate. Headbomb {t · c · p · b} 15:27, 23 April 2020 (UTC)

Personally I vastly prefer direct links to but I know this will go ahead anyway. So sad. :( Nemo 17:20, 23 April 2020 (UTC)
Note to self: API is AManWithNoPlan (talk) 00:31, 24 April 2020 (UTC)
For transparency, I'm Sebastian and I work on Semantic Scholar, a free/non-profit academic search and discovery engine (additional details about us are available here). It would be great if, as part of this bug fix, we could bring back the logic (see code change) that was reverted ~2 months ago pending creation of the s2cid citation template parameter. This change will insert the s2cid on references that don't have an open access link, adding links to Semantic Scholar for publisher-licensed content (using the "is_publisher_licensed" flag in our API). This should enrich the links available to Wikipedia users without cluttering the references, and enable users to explore the referenced content on Semantic Scholar (including supplemental and extracted content such as links to code libraries, clinical trials, citations and references, and more). Sebaskohl (talk) 16:56, 1 May 2020 (UTC)
@Sebaskohl: in my humble opinion, it was a mistake to request the addition of |s2cid= for this reason: paradoxically, having support for this parameter means that links to SemanticScholar are going to be migrated away from the title itself to a cluttered identifier list (because Citation bot does that, and I am not sure it is a good thing). Perhaps the free-to-read locks will encourage readers to click on them, but still, I would not expect the change to increase the number of clicks from Wikipedia to SemanticScholar. In terms of reader experience, I think it is wrong to show SemanticScholar IDs to users, as they are not established bibliographic identifiers that people could use elsewhere, like DOIs or ISSNs. I am all for linking to SemanticScholar, but I think this change was detrimental to that aim (sorry that I did not flag this earlier in the process). In retrospect, I think the same of the addition of |citeseerx= (the identifier has no bibliographic value; we just want the link). − Pintoch (talk) 08:00, 15 May 2020 (UTC)
Everyone has their preferred website; adding all the identifiers allows everyone to win (lose?). I honestly think |citeseerx= exists simply to remove those links from a place of primacy. Parameters reduce any Wikipedia responsibility for linking to copyrighted works by linking to the abstract page instead of directly to the PDF. AManWithNoPlan (talk) 13:37, 15 May 2020 (UTC)

The dedicated S2CID parameter greatly simplifies maintenance, reduces edit-window clutter, and, combined with |s2cid-access=free, is the better solution. And if it's a version of record, we can have bots [or the templates] automatically link to it per the ongoing RFC. Headbomb {t · c · p · b} 14:27, 15 May 2020 (UTC)

TODO: Convert url to s2cid and cite web to cite journal. Get the DOI based on the ID and add it. AManWithNoPlan (talk) 20:10, 31 May 2020 (UTC)
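The first half of the TODO above (move a Semantic Scholar URL into |s2cid= and switch cite web to cite journal) can be sketched in Python. This is a hypothetical illustration, not the bot's implementation. It assumes the URL already carries the numeric corpus ID in the form https://api.semanticscholar.org/CorpusID:NNN. Paper-page URLs that contain only a hash would need an API lookup, and the DOI lookup from the TODO is also out of scope, so both are left undone. The names `convert_s2_url` and `CORPUS_ID_URL` are made up for this example.

```python
import re

# Assumption: only URLs that already contain the numeric corpus ID are handled.
# Other semanticscholar.org URL shapes would need an API lookup (not attempted here).
CORPUS_ID_URL = re.compile(
    r"^https?://api\.semanticscholar\.org/CorpusID:(\d+)/?$", re.IGNORECASE
)


def convert_s2_url(template_name, params):
    """Hypothetical helper: move a corpus-ID URL from |url= into |s2cid=.

    Returns (template_name, new_params). The input dict is not modified.
    """
    params = dict(params)
    match = CORPUS_ID_URL.match(params.get("url", "").strip())
    if match is None or params.get("s2cid"):
        # Not a corpus-ID URL, or |s2cid= already set: leave the citation alone.
        return template_name, params
    params["s2cid"] = match.group(1)
    del params["url"]
    # Per the TODO, a cite web that points at a paper becomes a cite journal.
    if template_name.strip().lower() == "cite web":
        template_name = "cite journal"
    return template_name, params
```

For example, `convert_s2_url("cite web", {"title": "X", "url": "https://api.semanticscholar.org/CorpusID:12345"})` would return `("cite journal", {"title": "X", "s2cid": "12345"})` under these assumptions.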

Request to add link to Semantic Scholar s2cid when an open access link is not available

Disclosure for full transparency: I'm Sebastian and I work on Semantic Scholar, a free/non-profit academic search and discovery engine (additional details about us are available here).

Following this Citation Bot discussion and this discussion, the Semantic Scholar Corpus ID has been added as a new citation template parameter. Since the s2cid is now a supported parameter, I'm putting out a request to bring back the logic (see code change) that was reverted ~2 months ago pending creation of the s2cid citation template parameter. This change will insert the s2cid on references that don't have an open access link, adding links to Semantic Scholar for publisher-licensed content (using the "is_publisher_licensed" flag in our API). This should enrich the links available to Wikipedia users without cluttering the references, and enable users to explore the referenced content on Semantic Scholar (including supplemental and extracted content such as links to code libraries, clinical trials, citations and references, and more). I'm opening this up for discussion, and we are happy to explore alternative options to make this as useful as possible for the Wikipedia community. Sebaskohl (talk) 16:25, 1 May 2020 (UTC)

FYI, see the Semantic Scholar entry on the Wikipedia:Reliable sources/Perennial sources list: if I understand correctly, only open access Semantic Scholar references are deemed reliable in Wikipedia (because of the unclear copyright situation of the others – if that is no longer applicable, this bot talk page is anyhow not the place to change the current "reliability" assessment). --Francis Schonken (talk) 16:33, 1 May 2020 (UTC)
@Sebaskohl: see also User_talk:Citation_bot#Convert_semanticscholar_links_to_use_s2cid= above. Headbomb {t · c · p · b} 16:38, 1 May 2020 (UTC)
Anyhow, I opened a discussion at Wikipedia talk:Reliable sources/Perennial sources#Semantic Scholar about the copyright aspect. --Francis Schonken (talk) 16:49, 1 May 2020 (UTC)
@Headbomb: Thanks! Let me add this request to User_talk:Citation_bot#Convert_semanticscholar_links_to_use_s2cid= and we can archive/close this request (I didn't want to overload the other request). @Francis Schonken: For this request, we would only be linking to publisher-licensed content and not to any indexed crawled content (we have licensed content from 550+ publishers and academic societies; additional details are available here). Sebaskohl (talk) 16:50, 1 May 2020 (UTC)
Again, it is not me you have to impress, and certainly not on this talk page: I suggest you make your voice heard at Wikipedia talk:Reliable sources/Perennial sources#Semantic Scholar, which may have to go elsewhere in view of earlier discussions, but it's a start on the copyright issues. --Francis Schonken (talk) 17:18, 1 May 2020 (UTC)
That page doesn't determine anything. For instance arXiv is listed as a terrible source (which is ridiculous) only because some people are not able to consult it properly, but there's ample consensus for linking it whenever possible from an existing citation. Nemo 17:44, 1 May 2020 (UTC)
There is no inherent issue of reliability with SemanticScholar. There are some concerns about copyrights in certain situations, but that's already addressed above. Headbomb {t · c · p · b} 17:50, 1 May 2020 (UTC)
If arXiv is considered always suitable for addition, even automatically, it shouldn't be. Many new arXiv preprints have not yet undergone any peer review. More to the point, for an existing peer-reviewed publication, many (I would venture to guess most) arXiv preprints have not been updated to include all the changes made during peer review, so they may remain unreliable even when the actual publication version is reliable, or may not even contain claims sourced to the reliable version. —David Eppstein (talk) 18:33, 1 May 2020 (UTC)
arXiv links are links of convenience and are clearly marked as arxiv links. Useful to have when versions of records are down, or no free access version exists. They aren't added as URLs though, so they don't display any identifier-of-record or other versions of record. Headbomb {t · c · p · b} 18:37, 1 May 2020 (UTC)

This looks neat. Maybe time to update the perennial source discussion... but imo arXiv links are always useful as links of convenience -- and their permalinked versions make fine identifiers if needed. A pity they're not added as URLs. Contra David's hypothesis above, I would venture that in fact most of them are indeed updated after review, at least in math + physics -- authors often want the fully-public version of their work to be as correct as possible. – SJ + 20:24, 1 May 2020 (UTC)

It has already been proven «that the text contents of the scientific papers generally changed very little from their pre-print to final published versions» and «The high level of DOIs indicates that authors are returning to their records to update the information after publication», «As the figure below shows, most articles on arXiv are updated 90 days prior to publication or later. On average, arXiv versions are updated two months before the date of publication.» The version of record rarely adds anything to the citation, it's just a (poor) signal of quality for those who cannot judge the content by themselves. Nemo 21:01, 1 May 2020 (UTC)
The version of record is a reliable source by Wikipedia standards. The preprint version is not. If you adhere to such fundamental disagreement with how Wikipedia defines reliability of sourcing, you should not be one of the developers of a bot that makes mass changes to Wikipedia sourcing. —David Eppstein (talk) 06:56, 2 May 2020 (UTC)

Edit-warring continues


Immediately after the User talk:Citation bot/Archive 20#3RR topic was archived, the bot resumed edit-warring, on the same article on which the 3RR warning was issued. Suggestions? --Francis Schonken (talk) 19:11, 28 April 2020 (UTC)

I suggest that, after you have been told the previous time how to prevent the bot from changing your citations, you actually pay attention this time. —David Eppstein (talk) 19:18, 28 April 2020 (UTC)
No such advice was given. My advice to the bot operators was to not run unapproved tasks. AManWithNoPlan's comment on the edit-warring edit performed by the bot, at User talk:Citation bot/Archive 20#3RR, was that its only substantive part was "questionable" (the rest of the bot edit was purely cosmetic). --Francis Schonken (talk) 19:25, 28 April 2020 (UTC)
Your claim that "No such advice was given" is a falsehood. From the first reply to your earlier thread: It appears that you have not looked at this bot's user page, User:Citation bot. If you had, you would have seen the section "Stopping the bot from editing" on steps you could have easily taken to preserve your dubious citation formatting preferences on the article in question. David Eppstein (talk) 20:10, 28 April 2020 (UTC)
Francis Schonken, if you think the particular page in question requires special attention, you can find how to prevent the bot from editing at User:Citation_bot#Stopping_the_bot_from_editing and User:Citation_bot/use#..._the_bot_made_a_mistake?. That is what you were told: it is explained on the bot's user page. As for the edit warring, again, the bot is not doing it by itself, as you seem to imply. Someone activates the bot, which can be seen in the edit summary. The bot is not a fully automated process that actively disagrees with you. It doesn't seem to me that anyone but you has disagreed with the edit itself so far, and it also seems to have been explained why the JSTOR URL was changed to the parameter. I think AManWithNoPlan specifically referred to the fact that the JSTOR URL was non-free, not to the edit itself. Multiple other bot operators and administrators responded in the discussion and did not seem to take up your advice about "unapproved tasks". This is what I gathered from the previous discussion: it was archived because no action needed to be taken, so no changes were made to the code. Also, issuing a 3RR warning in a dispute where you are an involved party who disagrees, and who also keeps reverting the bot, seems a little dubious. I say again, the bot is not a person carrying on without discussing; it continues because it doesn't know any better. The operators responded and did not seem to find any problems with the bot's edits. Redalert2fan (talk) 20:17, 28 April 2020 (UTC)
Again, the bot should not be performing unapproved tasks. --Francis Schonken (talk) 20:21, 28 April 2020 (UTC)
Also, again, this is not about denying the bot access to certain pages. The bot is, afaik, allowed to perform approved tasks on any page. --Francis Schonken (talk) 20:24, 28 April 2020 (UTC)
So how does one stop the bot from performing unapproved tasks? Other than that, please don't explain to me things I already know. Tx. --Francis Schonken (talk) 20:27, 28 April 2020 (UTC)
Francis Schonken, how do you know the bot is running unapproved tasks? What are the supposed unapproved tasks? It is hard for anyone to discuss this if you won't share it.
Also, I disagree with the idea that "I can't explain things you already know". According to the documentation of Template:Cite journal, the edit the bot is making is in fact correct, as far as I can read. There was some talk in the previous thread about whether the JSTOR URL in question was usable, but I think the operation of changing it to a jstor parameter is not, by itself, a bug.
Information on bot issues can be found at WP:BOTISSUE, and judging by what you want, the section "Major malfunctions and complaints" may be helpful; but as suggested there, it might be a good idea to try to discuss it here first with the bot operators. Redalert2fan (talk) 20:47, 28 April 2020 (UTC)
Re. "How do you know the bot is running unapproved tasks?" – I don't, but whenever I raise the issue here, the topic is archived before the question is answered. So, is the bot running unapproved tasks? --Francis Schonken (talk) 20:52, 28 April 2020 (UTC)
That's too vague a question to be useful, meaningful, or answerable. What specific tasks do you see the bot performing that you think might be unapproved? —David Eppstein (talk) 21:26, 28 April 2020 (UTC)
The substantive part of the one I reverted, that is the one of which AManWithNoPlan said it was "questionable" (see above, see previous discussion linked to above). --Francis Schonken (talk) 05:35, 29 April 2020 (UTC)
Except it's really not. Headbomb {t · c · p · b} 14:47, 29 April 2020 (UTC)
Then maybe try to sort it with AManWithNoPlan, since you seem to be contradicting one another. --Francis Schonken (talk) 15:32, 29 April 2020 (UTC)
I'm not responsible for someone else's remarks, nor your interpretation of them. Especially since the "questionable" part was clearly put in quotes, indicating that it was dubious to call that questionable to begin with. Headbomb {t · c · p · b} 15:42, 29 April 2020 (UTC)
Whether it is questionable or "questionable" or neither: is it a task approved for the bot? --Francis Schonken (talk) 16:01, 29 April 2020 (UTC)
It is, yes, since 2008. See Wikipedia:Bots/Requests_for_approval/DOI_bot_2 for the url conversion thing (in that one, for DOIs, but there's no difference between a conversion of DOI urls to DOI parameters, and JSTOR urls to JSTOR parameters). Headbomb {t · c · p · b} 16:08, 29 April 2020 (UTC)
Thanks for explaining. I was skimming through the bot approvals (again) and couldn't find anything that would support the edit you had the bot perform.
First, in the 8th approval I found: "If something like <ref>[ JMA Miller (2000)], [the bot] would leave it alone, since the URL is not bare." Of course, in a cite template,
  • |title=Kein Bach-Autograph: Die Handschrift Brüssel, Bibliothèque Royale, II. 4093 (Fétis 2960) |url=
would be the equivalent of
  • <ref>[ Kein Bach-Autograph: Die Handschrift Brüssel, Bibliothèque Royale, II. 4093 (Fétis 2960)]
... so I'd expect the bot to leave it alone (based on what was explained as the bot's approved 8th task).
Second, regarding the 2nd approval: no, it does not seem to cover this action, neither for dois nor for jstors. Even if it did for dois, that is not the same as performing such a task on jstors: especially when the doi is defined in the template and directs to a jstor location, there's no point in adding a jstor when the doi reads like |doi=10.2307/932505 (emphasis added), while in that case the jstor reads |jstor=932505 (emphasis added), a.k.a. doubling info – I see no added value from that information. So, the bot would need approval for that task anyhow. @Smith609: what are your thoughts on this? --Francis Schonken (talk) 16:55, 29 April 2020 (UTC)
See Wikipedia:Bots/Requests_for_approval/DOI_bot_2#Trial where this was explicitly tested and approved. Extending the task to other identifiers, like JSTOR, is a trivial extension to an already approved task, which does not require subsequent individual approval. Headbomb {t · c · p · b} 17:00, 29 April 2020 (UTC)
(edit conflict) It was approved for the {{cite doi}} template, where that action makes sense. Not for {{cite journal}}, where it is better to have a clickable link for the article title. Anyhow, it seems best to challenge this not-so-trivial-as-you-seem-to-think task expansion. --Francis Schonken (talk) 17:19, 29 April 2020 (UTC)
Francis Schonken, where is it stated that it is better to have a clickable link for cite journal templates? Or is that personal preference? The cite journal documentation, under Identifiers, states: When an URL is equivalent to the link produced by the corresponding identifier (such as a DOI), don't add it to any URL parameter but use the appropriate identifier parameter, which is more stable and may allow to specify the access status. The url parameter or title link can then be used for its prime purpose of providing a convenience link to an open access copy (as in, at least accessible to everyone for free) which would not otherwise be obviously accessible. A JSTOR ID is an identifier (as is also documented), so the change is correct. Redalert2fan (talk) 17:43, 29 April 2020 (UTC)
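The documentation rule quoted in the previous comment (use the identifier parameter rather than an equivalent URL) can be sketched for JSTOR in Python. This is a hypothetical illustration only: it does not reproduce the bot's actual logic, and it takes no position on the approval or CITEVAR questions argued in this thread. It assumes JSTOR stable URLs have the form https://www.jstor.org/stable/<digits>; other URL shapes are ignored. The name `url_to_jstor` is made up for this example.

```python
import re

# Assumption: a JSTOR "stable" URL is jstor.org/stable/<digits>, optionally
# followed by a trailing slash, query string, or fragment.
JSTOR_STABLE_URL = re.compile(
    r"^https?://(?:www\.)?jstor\.org/stable/(\d+)/?(?:[?#].*)?$", re.IGNORECASE
)


def url_to_jstor(params):
    """Hypothetical helper: replace a JSTOR stable URL in |url= with |jstor=.

    Returns a new dict. The citation is left unchanged if |url= is not a JSTOR
    stable URL, or if |jstor= already holds a different value.
    """
    params = dict(params)
    match = JSTOR_STABLE_URL.match(params.get("url", "").strip())
    if match is None:
        return params
    jstor_id = match.group(1)
    existing = params.get("jstor")
    if existing and existing != jstor_id:
        return params  # conflicting values: leave for a human to resolve
    params["jstor"] = jstor_id
    del params["url"]
    return params
```

With the ID discussed above, `url_to_jstor({"title": "T", "url": "https://www.jstor.org/stable/932505"})` would give `{"title": "T", "jstor": "932505"}`. Whether that conversion is desirable on a given article is the separate question debated below.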
Redalert2fan, "a JSTOR is an identifier": that's incorrect. JSTOR does not provide identifiers, just stable URLs (actually, they offer their clients DOI registration as well, but most publishers have stopped using these because they prefer to use their own). This is unlike the DOI and HDL systems, which provide namespaced unique identifiers that can be looked up in a directory and dereferenced to a URL (which may or may not change over time, and may or may not resolve to a stable JSTOR URL). These are entirely different beasts. Also, please keep in mind that the CS1/CS2 template documentation does not supersede WP:CITEVAR (citation style is a matter for local consensus on each article). Neither does local consensus here (or blanket assertion) determine what is a "trivial" difference in terms of bot approval under WP:BOTPOL: that's what we have WP:BAG and WP:BOTREQ for. --Xover (talk) 18:21, 29 April 2020 (UTC)
@Xover: JSTOR does provide JSTOR identifiers. Headbomb {t · c · p · b} 18:57, 29 April 2020 (UTC)
@Headbomb: Citation needed. --Xover (talk) 19:27, 29 April 2020 (UTC)
Identifier: "An identifier is a name that identifies (that is, labels the identity of) either a unique object or a unique class of objects". JSTOR 103033 refers to a unique document hosted on the JSTOR repository. No different than doi:10.1093/mnras/staa248 which refers to a unique document, which is resolved via Headbomb {t · c · p · b} 19:33, 29 April 2020 (UTC)
This is what I meant by it being an identifier. As far as I can read from the template, going by this, the edit is correct. That WP:CITEVAR supersedes this on an article is good to know, since there clearly is a disagreement about the style that is or should be used, which should be discussed. But I am only interested in whether, apart from personal preference, the edit (changing to a jstor parameter) was okay by itself. Redalert2fan (talk) 19:47, 29 April 2020 (UTC)
See also CS1/CS2 documentation which prefers dedicated parameters to hard-coded URLs, and WP:DEADHORSE. Headbomb {t · c · p · b} 17:08, 29 April 2020 (UTC)
Only to make room for a link to a freely accessible version of the document. As it happens, free access is possible via the offered link, and as it already duplicates the doi link, no further duplication is necessary. Click on title = access to the most accessible version of the article, as found by a Wikipedian, is a principle respected here, and no longer respected after the bot edit. I think we're done here: if you want it otherwise for that article, take it up on its talk page. --Francis Schonken (talk) 18:19, 29 April 2020 (UTC)
How can we be done when you have still not taken any steps on the article to ensure that the bot respects your formatting preferences and will no doubt come back whining here when the bot happens across the article again? —David Eppstein (talk) 18:53, 29 April 2020 (UTC)
Again, the bot is welcome but should not run unapproved tasks. As it happens, the task was unapproved: it was a yet-unapproved extrapolation of a previously approved task. Plus what Xover said above, including that CITEVAR needs to be discussed on the article's talk page. The same is true regarding excluding bots from certain pages: if you want that for this mainspace article, then discuss it on its talk page. Anyhow, content discussions regarding individual articles don't belong here. Further, if approval of the actual task is sought, this is not the place to discuss that, so in any case, here, on this talk page, we appear to be done. Unless you have something to add that is relevant here? --Francis Schonken (talk) 19:11, 29 April 2020 (UTC)
The bot has been doing this for all CS1/CS2 supported identifiers since time immemorial, from arxiv links, to SSRN links, to OCLC links, to DOI links, to Bibcode links, to JSTOR links. There is nothing special about JSTOR urls/identifiers that makes this suddenly contentious or controversial. Headbomb {t · c · p · b} 19:53, 29 April 2020 (UTC)
For scale |jstor= is used on about 65,000 articles. Whereas equivalent raw JSTOR links in CS1/2 templates are used on ~50 articles (and most of those are misused in |lay-url=). Headbomb {t · c · p · b} 20:06, 29 April 2020 (UTC)
No use soliciting me on this page for an approval of multiple extrapolations of a somewhat related task approved 12 years ago, which could likewise be seen as extrapolations of a task the bot would not be performing according to clarifications given nine years ago. This is not the place where such approval can be given, nor am I (nor, likely, any of the other participants here) the one who would be handing out the final approval, if any. Look, it has been nine years since any additional task of the bot was approved, and it seems likely these tasks have more than doubled since. Time for a refresh of the approvals anyhow, I'd say, even if they were totally uncontroversial (which they're not). Also, guidance may have changed in the meantime, like CITEVAR, which has been rewritten and fine-tuned quite a few times in the last decade if I'm not mistaken: things that were OK 12 years ago may no longer be so now. --Francis Schonken (talk) 20:19, 29 April 2020 (UTC)
The bot edits in line with the most up-to-date template documentation. The bot does not switch citation styles (CS1 to CS2 and vice versa), convert un-templated citations to templated citations, or otherwise violate WP:CITEVAR. Headbomb {t · c · p · b} 20:34, 29 April 2020 (UTC)
That template documentation is not kept up to date, and anyhow has no more value than an essay in WP:CONSENSUSLEVEL context. Bot tasks that are only OK under a somewhat peculiar interpretation of such documentation would need approval anyhow. --Francis Schonken (talk) 20:55, 29 April 2020 (UTC)
Then take it to Help:CS1 and get consensus to change that documentation. Because that's longstanding, and has consensus. Headbomb {t · c · p · b} 21:25, 29 April 2020 (UTC)
Francis Schonken, the fact that CITEVAR needs to be discussed on the article's talk page instead does not mean that the comments made here about differing citation-style preferences can be waved away as invalid or ignored. You have read here that at least some people think the changes the bot made are okay; this is apart from the disagreement about whether the bot is allowed to make them by approval. I think the change itself is okay, and I would make the same change myself. Now I won't, because I have read here that you do not think it is the preferred format. Should I just change it to the jstor parameter myself because, technically, you posted your objection only here? No, that would not be right. But I, and everyone else, can copy the exact same messages over to the talk page if that is "required". The fact that something should be discussed elsewhere does not mean that whatever was posted here before CITEVAR was mentioned suddenly becomes moot. Redalert2fan (talk) 20:03, 29 April 2020 (UTC)
The bot runs unapproved tasks. You saying they are "OK" (as you did above, and now repeat as if someone else said it) means nothing on a page where the approval of these tasks can not be handed out. --Francis Schonken (talk) 20:19, 29 April 2020 (UTC)
Again, it is not. Headbomb {t · c · p · b} 20:29, 29 April 2020 (UTC)
Seems like a state of denial: anyhow everything here has been said... twice over. If the bot is not running unapproved tasks (as you seem to be contending), getting an approval for all these extrapolations would be a mere formality, no? So, why aren't you going to the place where that should be handled, which would leave me satisfied? Or is this just trying to postpone the inevitable? --Francis Schonken (talk) 20:55, 29 April 2020 (UTC)
"Getting an approval for all these extrapolations would be a mere formality" it already has approval, so there is no need to seek it again. Headbomb {t · c · p · b} 21:23, 29 April 2020 (UTC)
& continuing the state of denial... --Francis Schonken (talk) 03:43, 30 April 2020 (UTC)
Francis Schonken, now you are saying something I didn't say. I am not saying the bot should run unapproved tasks or that running unapproved tasks is okay. I am not asking for permission for the bot to run any tasks. I have no idea why you think this or try to turn my words into this. I do not run the bot. I am saying the change itself is okay. The change made was correct: the exact change of letters that comprises all the parameters and URLs is, in my opinion, correct. You keep hammering on the fact that the bot made it. I am telling you that is not my point. Redalert2fan (talk) 20:41, 29 April 2020 (UTC)
You left out the part of the disagreement about citation style under the guise that I approve unsanctioned bot editing.
There is also a disagreement about whether the tasks are approved or not; I cannot change anything for you about this. Redalert2fan (talk) 20:46, 29 April 2020 (UTC)
As said, the article's talk page is where to assess whether the edit (as such) is right for that page or not: please go there with your comments. That has nothing to do with the discussion we're having here. It is anyhow not up to you to say whether edits are (in absolute terms) "correct" or not. Please show a bit of modesty in your opinions: everyone's entitled to have their opinion, not everyone gets to decide what "is" correct and what isn't. --Francis Schonken (talk) 03:43, 30 April 2020 (UTC)
Francis Schonken, I guess this is your reply to my removed comment on your talk page trying to resolve our disagreement? In that case, I see no further need to continue. Redalert2fan (talk) 10:13, 30 April 2020 (UTC)
Unless and until you edit the article itself to tell the bot not to change certain aspects of the citation style, the article's talk page will be irrelevant. And as long as you continue to argue here while refusing to make that edit, I will continue to draw the conclusion that you prefer drama-mongering to being a useful contributor to the encyclopedia. —David Eppstein (talk) 04:25, 30 April 2020 (UTC)
Again, the bot is welcome to edit the page but should not be running unapproved tasks. Not on that page, not anywhere else. So placing a tag preventing the bot in one single article is *not* what this is about. Headbomb was now running AWB to do tasks which were recently disapproved for Citation bot. Placing a tag that prevents Citation bot is of no use against AWB, so, David Eppstein, your repeated comment is of no help at all. Further, the bot is welcome on any page I know or have edited: I give a lot of attention to correct references, but that is a complex job, leading to errors every now and then, in which case I'm all too glad a bot catches errors like that and corrects them. So no, I am not going to place tags preventing a bot, and your insistence that I should do so is highly unhelpful. --Francis Schonken (talk) 04:37, 30 April 2020 (UTC)
The citation bot shouldn't be used to edit war like that. Headbomb, please make sure it doesn't edit that page again. It's changing Google Books links and removing issue numbers. And some editors prefer to place jstor links in url=. SarahSV (talk) 04:44, 30 April 2020 (UTC)
SlimVirgin, the changing to of the Google Books URL is just cleanup; it doesn't change the page where you end up. It's a case of the department of the redundancy department. ? is all that is required. You can follow both versions of the link and you will end up at the same book. Are we on the same article regarding the issue-number removal? On BWV Anh. the bot is actually trying to add an issue number. Redalert2fan (talk) 10:22, 30 April 2020 (UTC)
Redalert2fan, thank you for the response. If you follow the links, it appears that Google changes them back again to, e.g. So an editor adds the link obtained from Google; a bot changes it; then Google changes it back. Why not leave out the bot step?
As for adding jstor to url=, one benefit is that you can click on it from the diff. Re: issue number, you're right, I misread the diff. The bot is adding it. SarahSV (talk) 20:11, 1 May 2020 (UTC)
SlimVirgin, personally I don't know the exact reason, but it might save a little space on the wiki? I don't think it's done as a separate edit, since in this case the addition of an issue number and the jstor (whether preferred or not) are the main part of the edit, at least to my knowledge. Now I'm actually a little interested in why it is done. Redalert2fan (talk) 20:46, 1 May 2020 (UTC)

Other edit-war


Meanwhile Headbomb has teamed up with Citation bot (steered by himself, no less) to "win" an edit war here (about the same type of edit, i.e. jstor conversion coupled with cosmetic edits). IMHO this is disruptive behaviour far beyond the scope of this talk page, but I want to give Headbomb yet another chance to commit to ceasing and desisting from now on, so that we can return to our regular editing (which is: actually improving the encyclopedia instead of programming bots to perform marginal deteriorations). Per WP:NOTBATTLEGROUND, disputes are not won by editors teaming up with the bots they steer. If you think that is a useful edit on that page, then take it up on that article's talk page. --Francis Schonken (talk) 06:35, 30 April 2020 (UTC)

Or, you know, you could follow the instructions given to you repeatedly by David Eppstein and others. Or convince the world that best practices are actually 'deteriorations'. I don't monitor, nor do I care to monitor, which pages you decide to personally WP:OWN and impose your idiosyncratic styles on, against conventions. Headbomb {t · c · p · b} 10:47, 30 April 2020 (UTC)
Not up to you to decide whether this conversation is over. --Francis Schonken (talk) 15:53, 1 May 2020 (UTC)
You've got just about the worst case of WP:IDIDNTHEARTHAT in the world. This discussion is over, because you'll be talking to yourself. Headbomb {t · c · p · b} 15:54, 1 May 2020 (UTC)
Still, not up to you to decide. --Francis Schonken (talk) 15:56, 1 May 2020 (UTC)

Above, I pinged the bot's maintainers, not up to you to decide whether they should answer or not. --Francis Schonken (talk) 15:58, 1 May 2020 (UTC)

Further, I'm still thinking whether, and if so how, I'd respond to your first contribution to this sub-section: I won't be time-pressured on that one, tx. --Francis Schonken (talk) 16:02, 1 May 2020 (UTC)

I don't think I've ever seen Citation bot being used like this. @Headbomb and Smith609: Citation bot shouldn't be used to impose citation preferences repeatedly if an editor reverts. See WP:CITEVAR. If Francis prefers to place jstor links in url=, that's okay and in fact beneficial in a couple of ways. Ditto with the other factors. What can be done to stop this? SarahSV (talk) 20:41, 1 May 2020 (UTC)
SlimVirgin, the bot's user page explains how to prevent the bot from operating on a single page, or on a per-citation basis, if this is needed because of a bug etc. This has been mentioned multiple times in the discussion. Citation bot itself can't know what style is preferred on a page or by a user, or whether it is edit warring. It either can edit a page, or it can't because the bot is denied or because the person who wants to activate the bot, or the bot itself, is blocked. It seems that Francis Schonken has so far not wanted to do this (deny the bot on the specific pages), since the parties involved disagree about whether the bot is approved to do this and should be doing it in the first place, and whether the preferred way is also the correct way. This message is not meant to speak for anyone, just to summarize the current situation. There are multiple solutions to the problem that either side, or anyone else, could apply. Redalert2fan (talk) 21:00, 1 May 2020 (UTC)
SlimVirgin, it might be obvious, but as you can see, no one so far has wanted to take action, since each side thinks it is right and the other is (extremely) wrong. Because of this, it seems that probably no one is willing to change or do anything about it. Francis prefers X and Headbomb prefers Y; maybe you or I prefer Z. What should we do now? Clearly it should be discussed in some way, but so far that has led to an apparent stalemate. Redalert2fan (talk) 21:17, 1 May 2020 (UTC)
Redalert2fan, no one should be edit warring over style preferences, per WP:CITEVAR and ArbCom. And no one should be using a bot to edit war over anything, also per ArbCom (as I recall). So we have a double whammy. Therefore, it's not about stopping the bot from editing one or two pages but about making sure it isn't used this way. I don't know anything about citation bot, so I don't know how access to it is regulated. SarahSV (talk) 22:43, 1 May 2020 (UTC)
SlimVirgin, Of course I wholly agree that an edit war should not be what is happening. I want to clearly stress to everyone that I do not support edit wars. I already got accused here of supporting unsanctioned bot editing before :(
I understood your question as about stopping the bot from accessing the pages since there was a dispute about style preference, or stopping the jstor change in general on every single page. My mistake. Your intention is to prevent the bot from being used in edit wars, regardless of who is right or what the preference is?
The way it works is that anyone with an account can direct the bot to operate on a page in multiple ways, explained here: User:Citation bot/use. Blocked/banned users cannot use it. This is the extent of what I know, since I'm not an operator. Redalert2fan (talk) 23:06, 1 May 2020 (UTC)
There is no "edit war" here. But the bot will eventually edit those pages again when someone asks to run the bot on those. If you don't want the bot to touch that page, then tell the bot to not touch it. Headbomb {t · c · p · b} 00:23, 2 May 2020 (UTC)
Headbomb, regardless of opinion about whether this is an edit war, who does it, or who is right, would it be practically possible to implement a feature for Citation bot that recognizes edit wars by a single activator, or even by multiple different activators, against a "reverter"?
I'm not sure any bot on Wikipedia is able to recognize at the moment whether it is "edit warring" itself or is being forced by someone to do so. Redalert2fan (talk) 09:57, 2 May 2020 (UTC)
It is impossible for bots to recognize edit wars. Considering that people often cannot see them, I doubt a bot ever could. AManWithNoPlan (talk) 12:54, 2 May 2020 (UTC)
AManWithNoPlan, yeah, it would have to become self-aware; I just wanted to confirm that it would not be possible. Since the bot is only used as a tool at the moment, I guess the activator in question should be careful with their edits when opposed on a page, apart from just checking for bugs as normal. Redalert2fan (talk) 13:05, 2 May 2020 (UTC)
Also, do not just keep running it on a category or page over and over again. In such cases run it in tool mode (non-bot mode) and explain in detail why you are making the edits (perhaps split into multiple edits) in the edit summary. I for one welcome our bot overlords....... AManWithNoPlan (talk) 13:28, 2 May 2020 (UTC)
It's a bot, the edit summary does the explaining. If the "reverter" wants to stop the bot from editing, there's instructions linked at the top of the page for how to do so. Headbomb {t · c · p · b} 14:31, 2 May 2020 (UTC)
(edit conflict)x2 I'd rather suggest the bot cease edits whose benefit is marginal and/or questionable and/or liable to be counterproductive. The question when programming a bot instruction should be: "would the resulting edits be a benefit under all circumstances with zero, or at least near to zero, false positives?" For instance, the WP:CONSENSUSLEVEL policy says that template documentation is a local consensus, not exceeding the consensus level of an essay. Any time an edit exclusively based on such base-level guidance is performed, it can easily be overturned by a local consensus established on the article's talk page. Bots should stay far away from performing edits exclusively based on local (or other base-level) consensuses. Note also that such local (or base-level) guidance can easily change (it doesn't even need a wide consensus to change it): that is another reason for not letting bots engage in edits exclusively based on such guidance. Policies, on the other hand, are more stable in expressing a long-term approach to certain issues: if it isn't directly in a policy, a bot can only perform a task that is expressly approved as beneficial to the encyclopedia, for which there is the BRFA process. I think the situation has gone a bit off the rails with Citation bot: too many tasks with marginal benefit, or otherwise controversial; lack of responsiveness by the bot maintainers (see the number of issues archived without the problem actually being addressed because it was not perceived as a "bug", while indeed, not following the WP:CONSENSUS policy is *truly* not a "bug" – it is a bigger problem); etc. "Has been going on for a long time" maybe explains why such issues have been under the radar for so long, but is not a justification for continuing them. --Francis Schonken (talk) 13:30, 2 May 2020 (UTC)
And the answer to "would the resulting edits be a benefit under all circumstances with zero, or at least near to zero, false positives?" is "yes". Having one person throw a big stink for WP:IDONTLIKEIT reasons does not change that. Headbomb {t · c · p · b} 14:31, 2 May 2020 (UTC)
Removing jstor links and converting to the parameter is not a local consensus, but a wikipedia-wide consensus. You are not supposed to link non-free copies directly, and jstor access is almost never ever a free for all. URL cleaning and removing user-specific parts is a wikipedia policy, not just a personal preference. In fact, there is a task force dealing with this. You are not supposed to link to Google Books unless the relevant parts are free, otherwise just use ISBN. That again is a wikipedia-wide consensus, not just some template documentation. Lastly, when I say "wikipedia", I mean this one, who knows about the German one. AManWithNoPlan (talk) 13:24, 4 May 2020 (UTC)
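The conversion being argued about (a JSTOR stable link in |url= replaced by |jstor=) can be sketched as below. This is a minimal Python sketch operating on a citation's parameters as a dict; the function name and the restriction to numeric stable IDs are assumptions, it is not Citation bot's implementation, and it makes no judgement about the WP:CITEVAR questions raised in this thread.

```python
import re

# Assumption: only plain numeric JSTOR stable links are handled here;
# real JSTOR IDs can take other forms, which this sketch ignores.
JSTOR_STABLE_RE = re.compile(r"^https?://(?:www\.)?jstor\.org/stable/(\d+)/?$")


def move_jstor_url(params: dict) -> dict:
    """Return a copy of a citation's parameters with a JSTOR stable url= moved to jstor=.

    Hypothetical helper for illustration; leaves the citation unchanged if url=
    is not a JSTOR stable link or if jstor= is already filled in.
    """
    out = dict(params)
    match = JSTOR_STABLE_RE.match(out.get("url", "").strip())
    if match and not out.get("jstor"):
        out["jstor"] = match.group(1)  # keep the identifier...
        del out["url"]                 # ...and drop the now-redundant link
    return out


print(move_jstor_url({"title": "Example", "url": "https://www.jstor.org/stable/932416"}))
# {'title': 'Example', 'jstor': '932416'}
```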
Maybe don't use expressions like "wikipedia policy" too liberally. There are rather clear distinctions:
Wikipedia policy
Afaik there are only a very few, but essential, rules about the (re-)formatting of references in Wikipedia policy. That is, afaik, in the WP:V core content policy, which mandates "inline citation" for content that is (likely to be) challenged. And that sources need to be cited "clearly and precisely (specifying page, section, or such divisions as may be appropriate)". Could you direct me to the Wikipedia policy that mandates "URL cleaning and removing user-specific parts"? Afaik such actions are only covered by Wikipedia policy if they contribute to precision and/or clarity of the reference, not in general.
Wikipedia guidelines
According to Wikipedia's policy on consensus levels, Wikipedia guidelines trump Wikipedia essays and template documentation. The most important Wikipedia guideline on (re-)formatting of references is, afaik, WP:REFERENCES, containing sections such as WP:SAYWHEREYOUGOTIT (which means, e.g., that url cleanup which would make less clear where the content was gotten is not OK) and WP:CITEVAR (basically, respect how major content contributors format the references in an article – which, if I see some of the comments above, is easily misrepresented).
Wikipedia essays, and essay-level guidance such as template documentation
Again, according to Wikipedia's policy on consensus levels, Wikipedia essays and template documentation are trumped by Wikipedia guidelines (and of course also Wikipedia policies) any time. Essay-level guidance, including template documentation, is, per the above-mentioned policy on consensus levels, considered to be a "local consensus", which means that any talk page consensus can challenge template instructions, if there is a sound reason for it, e.g. based in Wikipedia policies or guidelines. For that reason, bots should never perform tasks based on template documentation exclusively. A minimum is an additional successful WP:BRFA, which gets the approval of the task above "local consensus" level – and which makes the task challengeable if it would not, or no longer (e.g. when guidelines change), conform to guidelines and policies.
"How long" a bot has been performing a task is about the most irrelevant consideration in that respect. As said, guidelines may change, or even, just the sensibilities regarding certain policies and guidelines, whether or not their text actually changed, may change over time. That's the same for anything in Wikipedia. For example, I have some experience with categorisation guidelines (wrote some many years ago): although the actual text of these guidelines changed very little, say over the last 10 years, categories that were perfectly viable according to these guidelines ten years ago, were recently, after a third CfD, voted down. I mean: consensus is not only a Wikipedia policy, it is also more than just following the same rules in a mechanical way for a long period of time. --Francis Schonken (talk) 04:33, 5 May 2020 (UTC)
"which means, e.g., that url cleanup which would make less clear where the content was gotten is not OK". Do you have any examples of the bot doing that? AManWithNoPlan (talk) 23:35, 5 May 2020 (UTC)
E.g. here – where the bot changed:
  • Berkahn, Jonathan (2006). Wrestling with the German Devil: Five Case Studies in Fugue After J.S. Bach (Thesis). Victoria University of Wellington.
  • Berkahn, Jonathan (2006). Wrestling with the German Devil: Five Case Studies in Fugue After J.S. Bach (Thesis). Victoria University of Wellington. hdl:10063/1038.
The second layout rather suggests that the source used for providing the referenced content would be behind a paywall, would require subscription, or would, e.g., have been a hard copy in a library – none of which was the case: the source is freely available for everyone who clicks the link, and that freely accessible web source was used as a base of the referenced content. (PS: compare User talk:Citation bot/Archive 20#Stop imposing hdl parameter, which was initiated as a consequence of the bot performing such edits) --Francis Schonken (talk) 10:15, 6 May 2020 (UTC)
WP:SAYWHEREYOUGOTIT also says "So long as you are confident that you read a true and accurate copy, it does not matter whether you read the material using an online service like Google Books; using preview options at a bookseller's website like Amazon; on an e-reader (except to the extent that this affects page numbering); through your library; via online paid databases of scanned publications, such as JSTOR; using reading machines; or any other method." WP:LINKSTOAVOID discourages links to places like JSTOR unless the copy is a full free copy (not the same as a JSTOR link). URL shrinking is encouraged in CS1. The removal of user-specific parts is a policy of Wikipedia, which is one reason we remove a lot of stuff from Google Books URLs. There was an RFC that said (I cannot find it, since it was archived) you should not link to Google Books unless the page needed is free; but it also said that you should not remove such a link without some reason. AManWithNoPlan (talk) 23:50, 5 May 2020 (UTC)
Well, seems somewhat confused:
In particular, the EL guidance on preferring "free content", which is contained in, for example, the 6th and 15th item of WP:LINKSTOAVOID (direct links: WP:LINKSTOAVOID#EL6 and WP:LINKSTOAVOID#EL15) and WP:ELREG, has, in each case where such guidance is mentioned, the same explicit caveat, which reads:

This guideline does not restrict linking to websites that are being used as sources to provide content in articles.

So, unless Citation bot can discern links to "websites that are being used as sources to provide content in articles" (note that "sources" is a link to the WP:REFERENCES guideline) from external links that are not used for the purpose of referencing content, it should not modify any of these links. The presumption should be, if the bot can not make the distinction between "references for content" and other external links, that citation and cite templates are primarily used for referencing content, thus these links should not be modified. --Francis Schonken (talk) 06:17, 6 May 2020 (UTC)
As for user identifiable information, they actually block you from adding URLs with some proxies. There are other bots that are approved to do this also. AManWithNoPlan (talk) 23:56, 5 May 2020 (UTC)
Re "they actually block you from adding URLs with some proxies" – but for each of these proxies a consensus (i.e. a consensus above "local" level) has been established that they should not be used; for the bots preventing the links (in addition to what has already been excluded by blacklisting), they need a BRFA. It is not up to Citation bot's developers to de facto extend the number of "blacklisted" or otherwise excluded website links, based on "local consensus" (which then would be superseding the higher level consensus of the WP:EL caveats), and then impose that local preference by bot ... without even a successful BRFA. That would be stretching it beyond boundaries of what can be done without a suitably broad consensus. --Francis Schonken (talk) 06:17, 6 May 2020 (UTC)

"This guideline does not restrict linking to websites that are being used as sources to provide content in articles." and that's exactly why we have |jstor= and other (often non-free) identifiers. Headbomb {t · c · p · b} 10:35, 6 May 2020 (UTC)

This discussion is way beyond the scope of this bot. The bot's actions are approved, and the bot obeys the exclusion rules and even allows citation-parameter-specific exclusion. As a person who has regularly clicked on links only to find them to be "Open your wallet", or very often with Google Books "Here's nothing" links, I do not understand how fooling people into thinking they will get something quick and easy is useful, unlike that jstor link, which lets you know "probably an abstract, or pay up for more". AManWithNoPlan (talk) 12:29, 6 May 2020 (UTC)
Re. "The bots actions are approved,..." – as it happens, no, not all of the bot's actions are approved. Other than that, I appreciate your candour in stating your personal preferences. I have personal preferences too: they overlap largely with yours, and where not I suppose mine are closer to Wikipedia's policies and guidelines. But I don't go around Wikipedia changing to my preferences, which wouldn't even be allowed per WP:CITEVAR and other guidance.
The bot operates on a thin line between what is allowed and what isn't, and is more than once on the wrong side of that line, whether you like that appreciation or not. On the ground of the matter: for external links, in WP:EL sense, when there is a tension between accessibility and quality of external data, maybe the accessibility aspect comes first. For references, and, e.g., further reading items, discographies, lists of works (by an author), etc. it's always the quality of the content linked to that should come first, whether or not that material is very accessible. That is my stance, and that stance is completely conforming to core content policies such as WP:V, and subsidiary guidance such as WP:REFERENCES. I'd rather not catch the bot again performing tasks that run contrary to such guidance, like the bot edits that set off this subsection on this talk page: so I trust the bot's editing patterns will be modified enough to prevent that happening again. --Francis Schonken (talk) 05:01, 8 May 2020 (UTC)
They will not be, and the bot will eventually edit that page again, per consensus. Headbomb {t · c · p · b} 10:35, 8 May 2020 (UTC)
The consensus can be broadly split into three levels: "only humans should do it" (changing all citations to one specific style), "most people agree on it, so bots can do it, but people should be able to block them with a nobots flag" (what Citation bot does), and "bots should even ignore the nobots flag" (anti-vandalism bots). AManWithNoPlan (talk) 11:51, 8 May 2020 (UTC)
{{cbignore}} is useful for flagging individual citations to be skipped without nobotting the whole page. -- GreenC 02:13, 14 May 2020 (UTC)
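The two exclusion levels mentioned in this thread (denying the bot on a whole page, and {{cbignore}} on individual citations) can be sketched roughly as follows. This is a simplified Python sketch of the {{nobots}} / {{bots|allow=...}} / {{bots|deny=...}} convention, not Citation bot's actual code; the function names are hypothetical, it assumes {{cbignore}} is placed immediately after the citation it applies to, and it only handles flat templates without nested braces.

```python
import re

BOT_NAME = "Citation bot"

# Flat citation templates only ({{cite ...}} or {{citation|...}}), optionally
# followed by {{cbignore}}; nested templates inside a citation are not handled.
CITE_RE = re.compile(
    r"\{\{\s*(?:cite\s|citation\s*\|)[^{}]*\}\}(\s*\{\{\s*cbignore[^{}]*\}\})?",
    re.IGNORECASE,
)
BOTS_RE = re.compile(r"\{\{\s*bots\s*\|\s*(allow|deny)\s*=\s*([^}]*)\}\}", re.IGNORECASE)


def page_allows_bot(wikitext: str, bot: str = BOT_NAME) -> bool:
    """Simplified {{nobots}} / {{bots|allow=...}} / {{bots|deny=...}} check."""
    if re.search(r"\{\{\s*nobots\s*\}\}", wikitext, re.IGNORECASE):
        return False
    for match in BOTS_RE.finditer(wikitext):
        kind = match.group(1).lower()
        names = [name.strip().lower() for name in match.group(2).split(",")]
        listed = "all" in names or bot.lower() in names
        if kind == "deny" and listed:
            return False
        if kind == "allow":
            return listed  # allow=<list> excludes every bot not on the list
    return True


def editable_citations(wikitext: str):
    """Yield citation templates that are not immediately followed by {{cbignore}}."""
    for match in CITE_RE.finditer(wikitext):
        if match.group(1) is None:
            yield match.group(0)
```

The page-level check is all-or-nothing, which is why GreenC's per-citation {{cbignore}} is the finer-grained option when only one reference is disputed.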

More sciencedirect URL cleanup

new bug
Reported by
Headbomb {t · c · p · b} 19:05, 5 May 2020 (UTC)
What should happen
We can't proceed until
Feedback from maintainers

Remove duplicate citations

feature request
Reported by
RayScript (talk) 22:28, 11 May 2020 (UTC)
What happens
I noticed there are articles with duplicate citations. It seems like it would make sense to merge these duplicate citations (as the ReFill bot does) instead of leaving them separate. There are some cases (such as different pages of books) where it makes sense to cite one source many times. However, I can't think of a case where it is useful to have the exact same citation two times. Here's an example where references 32 and 33 were identical and could be collapsed.
We can't proceed until
Feedback from maintainers

I believe AWB does this already, but only on pages where refs are already "re-used". Headbomb {t · c · p · b} 22:45, 11 May 2020 (UTC)
I've also seen that AWB does this. But I don't have Windows. I can't find it now but I've seen a few pages with many many duplicate references and then it's easy to cleanup with search and replace but seems like it would be worth automating via a tool like this. RayScript (talk) 22:54, 11 May 2020 (UTC)
I have seen ReFill fail spectacularly on this task (very destructive). I feel that this task would be a heavy lift of testing/writing at this time. It's a feature I have actually wanted for years, but I'm not sure it's worth the effort since other tools do it. AManWithNoPlan (talk) 00:25, 12 May 2020 (UTC)
The bot actually used to do this, adding names to references based on authors and years of duplicated citations. I believe that there was a bureaucratic kickback: perhaps someone objected that a separate BRFA was required but had not been obtained? The advantage of Citation Bot performing it is that it can detect identical references that differ in, e.g., white space only. Martin (Smith609 – Talk) 15:35, 12 May 2020 (UTC)
If the bot's purpose is to help clean up citations, removing duplicates seems like it would be a great fit, since duplicates are so visibly annoying and an easy mistake for new users to make. Do you happen to remember when/where that discussion may have taken place? Before posting this I did a little searching in the archives of this talk page but didn't turn up anything directly related to removing duplicate citations. In regard to AManWithNoPlan's point, I'm not sure about the work required to make this change, but if other tools do it but are buggy/destructive (ReFill) or only available on Windows (AWB), then perhaps it could be useful to look at how the other programs are doing it and implement something similar (given the licenses permit). However, I am respectful of the developers' time, and if this isn't something they're interested in doing, that's okay. I wanted to put it out there to share that it's a feature that I would find helpful. — Preceding unsigned comment added by RayScript (talk · contribs) 16:05, 13 May 2020 (UTC)
WikiCleaner also does this semi-automatically. It detects exact duplicates and suggests they be "fixed". AWB does it automatically, but needs another ref on the page to already use "ref name" (i.e. refs must already be re-used). Jonatan Svensson Glad (talk) 19:38, 13 May 2020 (UTC)
I think everyone is in favor of doing this. If we implement it, then we would have to get a lot of test cases. Have it only run in tool mode to start. Phase 2: combine citations that have the same parameters but in a different order, including blank ones. Phase 3: existing refs with different names. AManWithNoPlan (talk) 11:49, 21 May 2020 (UTC)
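The detection side of this plan (Smith609's point about references that differ only in white space, plus phase 2's reordered parameters) can be sketched as a normalization step followed by grouping. This is a minimal Python sketch, not Citation bot's code: the function names are hypothetical, it only looks at unnamed `<ref>...</ref>` tags holding one flat citation template, and it treats blank parameters as absent, which is one reading of "including blank ones". It only reports duplicate groups; merging them into a named, re-used ref (and phase 3) is left out.

```python
import re
from collections import defaultdict

# Unnamed <ref>...</ref> tags only; named refs are left to a later phase.
REF_RE = re.compile(r"<ref>(.*?)</ref>", re.DOTALL)


def normalize_citation(body: str, ignore_order: bool = False) -> str:
    """Canonical form of a ref body, used as a grouping key.

    Phase 1: collapse white space, including around braces, pipes and equals signs.
    Phase 2 (ignore_order=True): also sort parameters and drop blank ones.
    Naive: splitting on "|" breaks on nested templates or piped wikilinks.
    """
    text = re.sub(r"\s+", " ", body).strip()
    text = re.sub(r"\s*([{}|=])\s*", r"\1", text)
    if not (ignore_order and text.startswith("{{") and text.endswith("}}")):
        return text
    name, *params = text[2:-2].split("|")
    params = sorted(p for p in params if not p.endswith("="))
    return "{{" + "|".join([name.lower(), *params]) + "}}"


def find_duplicate_refs(wikitext: str, ignore_order: bool = False) -> list:
    """Return groups (lists of 0-based ref positions) whose bodies normalize identically."""
    groups = defaultdict(list)
    for position, match in enumerate(REF_RE.finditer(wikitext)):
        groups[normalize_citation(match.group(1), ignore_order)].append(position)
    return [positions for positions in groups.values() if len(positions) > 1]
```

Running it first with `ignore_order=False` and only later with `ignore_order=True` mirrors the phased rollout suggested above, so the riskier matching is tested separately.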