User talk:Citation bot/Archive 17


Caps: iJournal

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 20:01, 4 July 2019 (UTC)
What should happen
[1]
We can't proceed until
Feedback from maintainers


APIs not working: PubMed, PMID

Status
{{wontfix}} Tool Server Bug
Reported by
Eastmain (talkcontribs) 00:27, 9 June 2019 (UTC)
What happens
error messages suggesting that an API is broken on article XXXY syndrome
> Using pubmed API to retrieve publication details: 
  ! Error in PubMed search: No response from Entrez server
> Using Zotero translation server to retrieve details from URLs.
> Using Zotero translation server to retrieve details from identifiers.

> Expand individual templates by API calls

> Checking CrossRef database for doi. 
> Searching PubMed...  nothing found.
> Checking AdsAbs database
  > AdsAbs search 7087/25000:
      title:"XXXY syndrome"
> Searching PubMed... 
  ! Unable to do PMID search
  ! Unable to do PMID search nothing found.
What should happen
Citation bot should be able to connect to PubMed search and PMID search
We can't proceed until
Feedback from maintainers


I have not figured out pubmed yet. AManWithNoPlan (talk) 02:04, 15 June 2019 (UTC)
We have been blacklisted by PubMed. I have emailed them. AManWithNoPlan (talk) 21:41, 16 June 2019 (UTC)
A bug report has also been filed. Apparently the Wikimedia tool server which hosts many of these citations tools has been blocked from accessing the PubMed name server. Wikimedia Cloud Services has contacted the NIH with a request to lift the block. Boghog (talk) 16:54, 22 June 2019 (UTC)

Seems to be working now. Headbomb {t · c · p · b} 07:23, 24 June 2019 (UTC)

Tool server is still blocked. Boghog (talk) 12:08, 24 June 2019 (UTC)
I have a test page and one expanded and one didn’t. It changes while running the page!!! AManWithNoPlan (talk) 13:06, 24 June 2019 (UTC)
I have seen that too. With the citation filling tool that also downloads data from PubMed and runs on the tool server, it occasionally works, but the vast majority of time, it doesn't. Boghog (talk) 14:45, 24 June 2019 (UTC)
Can't they just give the toolserver an /etc/hosts file or backup DNS to 8.8.8.8? Is it really that hard? AManWithNoPlan (talk) 18:45, 28 June 2019 (UTC)

include cite news and thesis in comma, colon, semicolon removal

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 23:24, 17 June 2019 (UTC)
What should happen
[2] [3], [4]
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/1900 AManWithNoPlan (talk) 23:57, 5 July 2019 (UTC)

Fails to add volume/page (Zookeys)

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 03:25, 1 July 2019 (UTC)
What should happen
[5] (ignore the publisher thing)
We can't proceed until
Feedback from maintainers


Possibly issue/page instead of volume/page Headbomb {t · c · p · b} 03:43, 1 July 2019 (UTC)
that’s what happens when the title is massively different than crossref. AManWithNoPlan (talk) 03:59, 1 July 2019 (UTC)
I have to say I'm consistently puzzled by this logic of not adding missing information based on an already-provided DOI because of a title mismatch. I get not adding a missing DOI based on a title mismatch, but once the DOI is provided, it should be used. Headbomb {t · c · p · b} 04:09, 1 July 2019 (UTC)
because all people are imperfect and careless and a larger source of gigo than we want to deal with. I will think about perhaps if the title is a subset 🤔. AManWithNoPlan (talk) 04:16, 1 July 2019 (UTC)
This code will attempt to remove stuff after roman numerals from titles before doing the comparison. https://github.com/ms609/citation-bot/pull/1898 AManWithNoPlan (talk) 15:46, 5 July 2019 (UTC)
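
As a rough illustration of the comparison being discussed (not the bot's actual code, which is not written in Python), a sketch might cut each title off after the first standalone Roman numeral and then accept an exact or subset match. The function names and the example titles are made up for illustration.

```python
import re

# Hypothetical helpers; a sketch of the idea only.
# Caveat: single capital letters such as "I" or "C" also count as Roman
# numerals here, so a real implementation would need to be more careful.
_UP_TO_ROMAN = re.compile(r"^(.*?\b[IVXLCDM]+\b)")


def strip_after_roman(title: str) -> str:
    """Keep the title up to and including the first standalone Roman numeral."""
    match = _UP_TO_ROMAN.match(title)
    return match.group(1) if match else title


def normalize(title: str) -> str:
    """Lowercase and collapse punctuation so small differences are ignored."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def titles_match(wiki_title: str, crossref_title: str) -> bool:
    """True if the trimmed titles are equal or one is contained in the other."""
    a = normalize(strip_after_roman(wiki_title))
    b = normalize(strip_after_roman(crossref_title))
    if not a or not b:
        return False
    return a == b or a in b or b in a
```

With this sketch, "Studies on Afrotropical beetles. Part III. New species from Kenya" and "Studies on Afrotropical beetles. Part III" would be treated as the same work, while unrelated titles would still be rejected.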

Remove trailing &nbsp;

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 18:22, 5 July 2019 (UTC)
What should happen
[6]
We can't proceed until
Feedback from maintainers


Hardcoded or softcoded ones. Headbomb {t · c · p · b} 18:22, 5 July 2019 (UTC)

what do you mean by soft and hard? Are you referring to the html thingy and the actual UTF-8 character? AManWithNoPlan (talk) 21:23, 5 July 2019 (UTC)
Both "&nbsp;" (the HTML entity) and the literal non-breaking space character (U+00A0). Headbomb {t · c · p · b} 21:45, 5 July 2019 (UTC)
https://github.com/ms609/citation-bot/pull/1899 AManWithNoPlan (talk) 23:51, 5 July 2019 (UTC)
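
For readers unfamiliar with the two forms mentioned above (the HTML entity and the actual character), a minimal sketch of stripping either one from the end of a parameter value could look like this. It is an illustration only, not the code from the pull request, and the function name is hypothetical.

```python
import re

# Matches any run of "&nbsp;" entities, literal U+00A0 characters and
# ordinary whitespace at the very end of the string.
_TRAILING_NBSP = re.compile(r"(?:&nbsp;|\u00a0|\s)+$")


def strip_trailing_nbsp(value: str) -> str:
    """Remove trailing non-breaking spaces (entity or character)."""
    return _TRAILING_NBSP.sub("", value)
```

Note that this sketch also removes ordinary trailing whitespace, and that non-breaking spaces in the middle of a value are left alone.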

Adding URL when there's a DOI

See here where the bot adds a URL to a reference that has a doi and a pmid. I thought that in such cases a URL was not desired. --Randykitty (talk) 09:33, 6 July 2019 (UTC)

PS: just noted that a little bit lower, the bot removes a URL. I'm puzzled. --Randykitty (talk) 09:34, 6 July 2019 (UTC)
The URL points to the same place as the DOI, so it's redundant. Where it adds a url, it should be a free full version of the article. Headbomb {t · c · p · b} 10:04, 6 July 2019 (UTC)
I don't see any redundant URL added by the bot in this diff, can you be clearer? ucl.ac.uk and caltech.edu are institutional repository URLs, which are not redirected to by the DOI resolved with doi.org. Nemo 13:05, 6 July 2019 (UTC)

{{notabug}}, since FREE links are added. Links that are the same as an identifier are removed. AManWithNoPlan (talk) 17:21, 6 July 2019 (UTC)

If you find a dead link or pay link or DOI equivalent link added, then please report that. We can feed it back to the free DOI system and possibly blacklist it. AManWithNoPlan (talk) 17:22, 6 July 2019 (UTC)

Title = Loading

Status
{{fixed}}
Reported by
Redalert2fan (talk) 10:31, 6 July 2019 (UTC)
What happens
Title = Loading
Relevant diffs/links
[7]
We can't proceed until
Feedback from maintainers


It seems that this once was a website for the station but now it redirects to multiple spam websites. "Loading" nevertheless does not seem like a title we should accept in other cases as well. --Redalert2fan (talk) 10:31, 6 July 2019 (UTC)

Japanese titles removed while they appear to be correct

Status
{{fixed}} — also added some UTF-8 tests based upon this to make sure that multibyte characters don’t get gooned in the future
Reported by
Redalert2fan (talk) 11:48, 6 July 2019 (UTC)
What happens
Japanese titles are being removed while they appear to be correct
What should happen
title should not be removed
Relevant diffs/links
[8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19]
We can't proceed until
Feedback from maintainers


title = マートン&ゴメス大暴れ 先制3ランだダメ押し打だ
title = 阪神ドラ2石崎が仮契約151キロ超えだ
title = JNR/JR 25年の大アルバム
title = トラ番担当記者コラム

and many more

--Redalert2fan (talk) 11:52, 6 July 2019 (UTC)

I would add to this. The bot also adds redundant similarly named |newspaper=The Japan Times Online when |publisher=The Japan Times; see this edit. The correct action here is to rename |publisher= to |newspaper= and not add |publisher=.
When |title= is primarily CJK script, in the best of all possible worlds, replace |title= with |script-title=<language code>:<title text>. Yeah, this is a best of all possible worlds thing because it isn't always easy or even possible to know what the language is. At the next release of Module:Citation/CS1, |script-title= will require a valid language code for non-Latin scripts (a limited list) so writing |script-title= without the language code will just result in a profusion of errors.
Trappist the monk (talk) 12:15, 6 July 2019 (UTC)
the utf-8 stuff is the problem, i will get the patch added ASAP. Then I will add a test to make sure this never occurs again. AManWithNoPlan (talk) 12:58, 6 July 2019 (UTC)

Caps: Drug Des Devel Ther

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 20:40, 6 July 2019 (UTC)
What should happen
https://en.wikipedia.org/w/index.php?title=Psoriasis&type=revision&diff=905097227&oldid=904126189
We can't proceed until
Feedback from maintainers


Caps: ASAIO J/ASAIO J./ASAIO Journal

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 22:42, 6 July 2019 (UTC)
What should happen
[20]
We can't proceed until
Feedback from maintainers


Slavic names

The bot is incorrectly capitalizing non-English journal names (as here). The correctly formatted Ekolist: revija o okolju and Acta geographica Slovenica were changed to the incorrect Ekolist: Revija O Okolju and Acta Geographica Slovenica. Doremo (talk) 05:34, 8 July 2019 (UTC)

I don't really know the rules for Slovenian, but at the very least the O in "Ekolist: Revija o Okolju" should be lowercase. Latin should be capitalized however. Headbomb {t · c · p · b} 06:00, 8 July 2019 (UTC)
{{fixed}} by adding relevant words to the list of non-English words. AManWithNoPlan (talk) 17:22, 8 July 2019 (UTC)
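
To illustrate the general approach of a word list, here is a small sketch of title-casing a journal name while leaving listed words in lowercase. The function name and the contents of the list are assumptions chosen to reproduce the Ekolist example; the bot's real list and logic may differ.

```python
def capitalize_journal(name: str, keep_lower: set) -> str:
    """Capitalize each word unless it is in the keep_lower list.

    The first word is always capitalized.
    """
    result = []
    for index, word in enumerate(name.split(" ")):
        if index > 0 and word.lower() in keep_lower:
            result.append(word.lower())
        else:
            result.append(word[:1].upper() + word[1:])
    return " ".join(result)


# Hypothetical entries, for illustration only.
NON_ENGLISH_LOWERCASE = {"revija", "o", "okolju"}
```

Without the list, "Ekolist: revija o okolju" becomes "Ekolist: Revija O Okolju"; with the three words listed, it is left as "Ekolist: revija o okolju".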

Pubmed not available right now -- not a bot bug though

To discuss go to https://phabricator.wikimedia.org/T226088

I've seen the bot add pmid/pmc today... wonder if this is fixed, or just a hiccup. Headbomb {t · c · p · b} 19:50, 10 July 2019 (UTC)
Yup, T226088 is closed as resolved! Headbomb {t · c · p · b} 19:50, 10 July 2019 (UTC)

I pointed them to the root cause and they {{fixed}} it. AManWithNoPlan (talk) 23:44, 10 July 2019 (UTC)

removal of url

Status
{{notabug}}
Reported by
Rayhartung (talk) 21:32, 9 July 2019 (UTC)
What happens
Citation bot removed url parameter when making other changes
Relevant diffs/links
Special:diff/883914749
We can't proceed until
Feedback from maintainers


When a url is not a free copy, then it must be removed IF there is another identifier, according to Wikipedia style guides (we don’t do this with google books, but we should). Also, if the url matches the doi, then it should be removed. AManWithNoPlan (talk) 21:46, 9 July 2019 (UTC)

Postscript thing in cite news

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 17:56, 10 July 2019 (UTC)
What should happen
[21]
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/1915 AManWithNoPlan (talk) 17:09, 11 July 2019 (UTC)

API tweaks: Put diff | history in a fixed location

If you do a multiple bot run, you will have a list of stuff like

Written to Hoyt Vandenberg diff | history
...
Written to Hubert Winthrop Young diff | history
...
Written to Humanity and Paper Balloons diff | history


So to review the diffs, you search for "diff | history" in the page, and you press Ctrl+G (in Firefox) to jump around. However, because Title in

Written to Title diff | history

isn't of fixed length, you need to spend time aligning your mouse with the diff link. Now this isn't the worst thing in the world, but if you have a list of 100 diffs, that's making a task that could take 20 seconds take 5 minutes. So instead, I suggest either of

[diff | history] Written to Title
Written [diff | history] to Title
Written to Title
[diff | history]

as a better presentation that would allow for more efficient reviewing of multiple diffs. Headbomb {t · c · p · b} 18:09, 10 July 2019 (UTC)

However, see also User talk:Citation bot/Archive_17#API: Batch run summaries below, which may be a better way of doing this. Headbomb {t · c · p · b} 19:41, 10 July 2019 (UTC)
This is easy. https://github.com/ms609/citation-bot/pull/1914 AManWithNoPlan (talk) 16:21, 11 July 2019 (UTC)

{{fixed}}

API tweaks: Redundancy elimination

Instead of

> Expanding 'Jakob Ackeret'; will commit edits.
---------------
[17:56:00] Processing page 'Jakob Ackeret' — edithistory

This could be combined in one single line

---------------
[17:56:00] Processing page 'Jakob Ackeret' — edithistory; will commit edits

Headbomb {t · c · p · b} 18:17, 10 July 2019 (UTC)

Good catch. It was already fixed in the other API interfaces. https://github.com/ms609/citation-bot/pull/1914 AManWithNoPlan (talk) 16:36, 11 July 2019 (UTC)

{{fixed}}

GIGO journal stuff?

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 18:29, 10 July 2019 (UTC)
What happens
[22], which adds |journal=The Unsolved Mystery of Kaspar Hauser. Jeffrey Moussaieff Masson Translator and Introduction. New York: The Free Press, 1996. 254 Pp.. Psychoanal Q
What should happen
[23]
We can't proceed until
Feedback from maintainers


This is possibly GIGO. Headbomb {t · c · p · b} 18:29, 10 July 2019 (UTC)

https://github.com/ms609/citation-bot/pull/1912 Zotero works and then doesn't work. They change too often for us to support. Will add to blacklist. AManWithNoPlan (talk) 15:52, 11 July 2019 (UTC)

Caps JR for journal

Status
{{fixed}} journal titles with more than 8 UTF-8 characters get skipped now
Reported by
Redalert2fan (talk) 18:29, 10 July 2019 (UTC)
What happens
Journal= 週刊 歴史でめぐる鉄道全路線 国鉄・JR is changed to 週刊 歴史でめぐる鉄道全路線 国鉄・jr
What should happen
Caps should stay as JR
Relevant diffs/links
[24] [25] [26] [27] [28] [29] [30] [31] [32] [33] [34] [35] [36] [37] [38]
We can't proceed until
Feedback from maintainers


JR is in this case short for Japan Rail. --Redalert2fan (talk) 18:29, 10 July 2019 (UTC)

So it seemed to be fixed before for some time today, but just now I got numerous of the same changes again. --Redalert2fan (talk) 21:01, 11 July 2019 (UTC)
[39] [40] [41] [42] [43] [44] [45] [46] [47] --Redalert2fan (talk) 21:05, 11 July 2019 (UTC)

Support OL

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 18:33, 10 July 2019 (UTC)
What should happen
[48]
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/1911 AManWithNoPlan (talk) 15:46, 11 July 2019 (UTC)

Ignore Template:full and Template:Deadlink in otherwise bare url refs

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 23:45, 10 July 2019 (UTC)
What should happen
[49]
We can't proceed until
Feedback from maintainers


headbomb Why deadlink? That seems like asking for trouble. Dead links often end up pointing to the wrong thing. FindArticles.com and such. AManWithNoPlan (talk) 16:10, 11 July 2019 (UTC)

https://github.com/ms609/citation-bot/pull/1913 AManWithNoPlan (talk) 16:11, 11 July 2019 (UTC)
Please note that {{full}} is only an alias/redirect of {{Full citation needed}}. (t) Josve05a (c) 16:22, 11 July 2019 (UTC)
thanks, I added the full template name also. Today's lesson in irony is that {{full}} is not the full name.... AManWithNoPlan (talk) 16:28, 11 July 2019 (UTC)
Well, I was mostly thinking if you find something like https://(...)/10.1234/987654321{{deadlink}}, it could maybe be parsed as a DOI link were {{deadlink}} not there, even if the full url didn't resolve. I figured that if the link was dead, there would be nothing to be parsed and it wouldn't expand. Maybe I'm wrong there. Headbomb {t · c · p · b} 18:14, 11 July 2019 (UTC)
I will look at DOIs. Links are marked as dead often when the title has changed to “girls girls girls!!!!!” and such. AManWithNoPlan (talk) 19:46, 11 July 2019 (UTC)
Same for other identifiers if possible. It's not the most critical of things, so not toooo much thought needs to be put on this. But I figured if a link could be parsed when a deadlink template wasn't there, it'd be nice to have the bot do something with the link when the template was there. Headbomb {t · c · p · b} 20:57, 11 July 2019 (UTC)
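
As a sketch of the suggestion in this thread, one could drop a trailing {{deadlink}} template from a bare reference and then look for a DOI-shaped string in what remains, without needing the URL to resolve. The regular expressions and the function name are assumptions for illustration, and example.org stands in for the elided host above.

```python
import re

# Loose pattern for a DOI embedded in a URL: "10.", a registrant code, a
# slash, then everything up to whitespace or wikitext delimiters.
_DOI = re.compile(r'10\.\d{4,9}/[^\s{}|<>"]+')

# Matches {{deadlink}} / {{dead link|...}} in any letter case.
_DEADLINK = re.compile(r"\{\{\s*dead ?link[^{}]*\}\}", re.IGNORECASE)


def doi_from_bare_ref(ref_text: str):
    """Return a DOI found in a bare-URL reference, or None if there is none."""
    text = _DEADLINK.sub("", ref_text)
    match = _DOI.search(text)
    if match is None:
        return None
    return match.group(0).rstrip(".,;")
```

For example, `doi_from_bare_ref("https://example.org/10.1234/987654321{{deadlink}}")` yields `10.1234/987654321`. Whether that DOI then expands correctly is a separate question, as noted above.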

GIT problems

Not sure which fixes are making it to tool server at this time. AManWithNoPlan (talk) 17:55, 11 July 2019 (UTC)

fatal: unable to look up current user in the passwd file: No such file or directory 🤔 AManWithNoPlan (talk) 19:38, 11 July 2019 (UTC)
{{fixed}}

ScienceDirect stuff

Status
{{notabug}}, some type of throttling
Reported by
Headbomb {t · c · p · b} 00:03, 10 July 2019 (UTC)
What happens
If it's doing [50]
What should happen
Why isn't it doing [51] (I had to strip those to bare urls before running the bot)
We can't proceed until
Feedback from maintainers


It looks like you got unlucky. AManWithNoPlan (talk) 00:56, 10 July 2019 (UTC)

It's happening on other articles too. There's this sequence for example, [52] + [53]. It's possibly the ?via=ihub that throws things off. Headbomb {t · c · p · b} 02:24, 10 July 2019 (UTC)
Not sure. It works sometimes. AManWithNoPlan (talk) 04:14, 10 July 2019 (UTC)
Very hit and miss. For now, I'm just running it multiple times until it finds nothing else to do. Headbomb {t · c · p · b} 04:17, 10 July 2019 (UTC)
I had just noticed it in another article where the expansion then succeeded as I entered the DOI manually. Did you submit many articles with sciencedirect.com URLs at once? Maybe we got throttled? Nemo 07:45, 10 July 2019 (UTC)
I suspect the expansion would have worked the next time because ?via=ihub was stripped from the URL in the previous bot's run. Headbomb {t · c · p · b} 07:53, 10 July 2019 (UTC)
Not sure. Right now I can't test because the tool doesn't have any spare capacity. Nemo 08:59, 10 July 2019 (UTC)
As in it's going through too many requests? I could hold on for a bit, it's at the end of ~100 article run or so. Headbomb {t · c · p · b} 09:03, 10 July 2019 (UTC)
not really sure, we are probably just getting throttled some place. AManWithNoPlan (talk) 14:18, 10 July 2019 (UTC)
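
Whether or not ?via=ihub was the cause here, stripping such a tracking parameter before looking up a URL is straightforward. The following is a sketch using Python's standard library; the function name, the default list of parameters to drop, and the placeholder article identifier are assumptions, not the bot's behaviour.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_query_params(url: str, drop=("via",)) -> str:
    """Return url without the query parameters named in drop."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in drop
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

For instance, `https://www.sciencedirect.com/science/article/pii/S0000000000000000?via=ihub` (a made-up identifier) comes back without the query string, while other parameters on a URL are kept.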

Bad Title

Status
{{fixed}}
Reported by
Redalert2fan (talk) 23:53, 11 July 2019 (UTC)
What happens
title = Article expired
What should happen
title should not be added
Relevant diffs/links
[54] [55] [56] [57] [58] [59] [60] [61]
Replication instructions
run on: http://www.japantimes.co.jp/news/2016/11/19/business/jr-hokkaido-says-cant-maintain-half-railways/#.WDjkB9J96Ul
We can't proceed until
Feedback from maintainers


What happens is that when a page is no longer available on japantimes.co.jp you get redirected to https://www.japantimes.co.jp/article-expired/ which states: "The article you have been looking for has expired and is not longer available on our system. This is due to newswire licensing terms." and has the title "Article expired". This is clearly not the title we are looking for. --Redalert2fan (talk) 23:53, 11 July 2019 (UTC)
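
One simple way to guard against placeholder titles like this is a small reject list checked before a fetched title is accepted. The sketch below is only an illustration; the function name is hypothetical and the list contains just two placeholder titles reported on this page, not the bot's actual list.

```python
# Placeholder titles reported on this page; comparison is case-insensitive.
BAD_TITLES = {"loading", "article expired"}


def is_usable_title(title: str) -> bool:
    """Reject empty titles and known placeholder titles."""
    cleaned = title.strip().lower()
    return cleaned != "" and cleaned not in BAD_TITLES
```

A list like this only catches exact placeholders that have already been seen, so new ones still need to be reported.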

Character encoding issue for author name "Fürst"

Status
{{not a bug}}
Reported by
Ich (talk) 20:38, 12 July 2019 (UTC)
What happens
"replacement character in |last4= at position 2" for article https://link.springer.com/article/10.1007%2FBF00562648
What should happen
Journal template should be filled out with "last4=Fürst"
Replication instructions
Run the citation bot on a page containing the following text: {{cite journal |doi=10.1007/BF00562648 }}
We can't proceed until
Feedback from maintainers


GIGO. Literally nothing we can do. We have complained to crossref and the publisher and they promised to fix the data someday. AManWithNoPlan (talk) 20:45, 12 July 2019 (UTC)

Bad title

"OpenId transaction in progress" diff --Redalert2fan (talk) 20:49, 12 July 2019 (UTC)

{{fixed}}

journal/publisher dupes with apostrophes

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 10:17, 12 July 2019 (UTC)
What should happen
[62]
We can't proceed until
Feedback from maintainers


"Catpostrophe"

Status
{{notabug}}
Reported by
Geographyinitiative (talk) 03:32, 14 July 2019 (UTC)
What happens
At [63] Shih’s was changed to Shih's, but the original source has Shih’s
What should happen
This part of the edit should be reverted and some method for differentiating between intentional use of ’ and ' should be made
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=Hu_Shih&diff=905460950&oldid=905110158
We can't proceed until
Feedback from maintainers


This is proper behavior since Wikipedia style guides mandate non-fancy punctuation be used. AManWithNoPlan (talk) 03:39, 14 July 2019 (UTC)

@AManWithNoPlan: In a citation of a source, you can't go around changing the apostrophes as they are found in the original source. Geographyinitiative (talk) 04:02, 14 July 2019 (UTC)
Both are apostrophes, curly vs straight is a stylistic typographic change, not a semantic one. On Wikipedia, we mandate straight quotes and apostrophes. Even in quoted material. Even in citations. Headbomb {t · c · p · b} 04:29, 14 July 2019 (UTC)
@AManWithNoPlan and Headbomb: Thanks for your reply. You may be right, but if English Wikipedia outright bans all of the curly apostrophes, the readers will never get a chance to find out on their own whether or not there is a stylistic, semantic or other difference between the two types of apostrophes. Don't be so quick to assume things that you don't know for a fact. Just because they are similar doesn't mean they are the same- in fact, calling them 'similar' implies that they are 'different', otherwise we would call them 'identical'. No, I'm sorry, you can't change the name of cited sources randomly. I strongly believe that you are dead, dead wrong on this one- you don't know what you are talking about in fact. Why have the two code points if there's no difference? I have to strongly rebuke you here otherwise you might not realize the error you are perpetrating on English Wikipedia. Thanks for your time. Geographyinitiative (talk) 04:51, 14 July 2019 (UTC)
This will not do, this will not do at all. [64] Let the author give you the apostrophes they want to. What is this garbage? Geographyinitiative (talk) 05:00, 14 July 2019 (UTC)
@AManWithNoPlan and Headbomb: No semantic difference, eh? Alright buddy, you look at this edit and tell me there's no semantic or stylistic difference: [65]. The authors are using curly apostrophes that curl inward from both directions. That's the author's way of writing in English. The author doesn't need your fascist hand to come down on them when someone uses this citation bot. So everything has to be simplified now- what is this, 1984? Just let the apostrophes alone. Geographyinitiative (talk) 05:03, 14 July 2019 (UTC)
Right, there is no semantic or stylistic difference there. The difference in orthography does not change the meaning or style of the word. Nothing "fascist" about these edits, and it is entirely inappropriate to refer to other editors in that way - please do not do that. And follow the consensus even if you do not agree with it, until and unless you have been able to change the existing consensus (which at the moment seems unlikely). Regards, --bonadea contributions talk 08:54, 14 July 2019 (UTC)
Some things are the way they are. Leave them alone. That's my opinion. Thanks for your work here. Geographyinitiative (talk) 05:06, 14 July 2019 (UTC)
Wait. The curly apostrophe isn't even there in the source linked in the article (this diff from the original report above) - it is a translation of the actual Chinese title. The "author" thus appears to be yourself, since you were the one to add the link with the translated title. --bonadea contributions talk 10:01, 14 July 2019 (UTC)
For reference the relevant manual of style can be found here: WP:MOSCQ. --Redalert2fan (talk) 11:09, 14 July 2019 (UTC)
There is a similar-looking character, ʻOkina (U+02BB), commonly used in Polynesian languages. That character should not be converted to typewriter apostrophe.
Trappist the monk (talk) 14:37, 14 July 2019 (UTC)
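
To make the distinction concrete, here is a sketch of straightening curly quotes and apostrophes with an explicit mapping, so that look-alike characters such as the ʻOkina (U+02BB) are left untouched because they are simply not in the table. The choice of exactly these four code points is an assumption for illustration.

```python
# Only these four curly characters are converted; anything else, including
# U+02BB (ʻOkina), passes through unchanged.
CURLY_TO_STRAIGHT = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark / curly apostrophe
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
})


def straighten_quotes(text: str) -> str:
    """Replace curly quotes and apostrophes with their straight forms."""
    return text.translate(CURLY_TO_STRAIGHT)
```

With this mapping, "Shih’s" becomes "Shih's", while a word spelled with an ʻOkina keeps its original character.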

partial removal of "subscription" and "via" parameters

Looking at this diff, the bot appears to be removing the via= and subscription= parameters from citations. I personally find those useful, but I'm not too fussed about it. However, the bot has only carried out a partial removal; other citations that include the first of these have not been touched. Is this intentional? Vanamonde (Talk) 23:05, 13 July 2019 (UTC)

They get removed when the associated url is removed and they no longer serve a valid purpose. URLs that duplicate doi are removed in accordance with style guides. AManWithNoPlan (talk) 23:23, 13 July 2019 (UTC)

{{notabug}}

nature.com down for the bot

Up until an hour ago, running the bot on bare URLs from nature.com (such as https://www.nature.com/articles/nature05769) worked. Now we get "Operation timed out after 10000 milliseconds with 0 bytes received". Have they blocked/blacklisted/throttled us? See for example with existing identifiers in template and bare URL. (t) Josve05a (c) 20:13, 14 July 2019 (UTC)

Citoid in Visual Editor does not seem to have any issues at all with this link. (t) Josve05a (c) 20:13, 14 July 2019 (UTC)
{{notabug}} seems to be working now. (t) Josve05a (c) 21:44, 14 July 2019 (UTC)

Caps :SCR. --> Scr.

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 00:25, 15 July 2019 (UTC)
What should happen
[66]
We can't proceed until
Feedback from maintainers


Ovid / pmid redundancy

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 01:12, 12 July 2019 (UTC)
What should happen
[67]
We can't proceed until
Feedback from maintainers


Works fine in the case of a doi redundancy [68], btw. Headbomb {t · c · p · b} 01:13, 12 July 2019 (UTC)

Ovid is a pain, since they have two websites that you get to choose from, so nothing is ever redundant. I will have to write specific code. IF(doi && pmid=OvidUrl)THEN drop url. AManWithNoPlan (talk) 14:12, 12 July 2019 (UTC)
https://github.com/ms609/citation-bot/pull/1923 AManWithNoPlan (talk) 14:56, 15 July 2019 (UTC)

Better handling of edit conflicts

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 01:25, 12 July 2019 (UTC)
What happens
When the bot runs into an edit conflict, it stops
What should happen
The bot should automatically restart
We can't proceed until
Feedback from maintainers


This would be particularly helpful for batch runs. Headbomb {t · c · p · b} 01:25, 12 July 2019 (UTC)

https://github.com/ms609/citation-bot/pull/1924 AManWithNoPlan (talk) 14:54, 15 July 2019 (UTC)
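
The requested behaviour amounts to a retry loop: on a conflict, fetch the page again, redo the expansion and try to save once more. The sketch below shows that shape only. Every name in it (`EditConflict`, `save_with_retry`, and the three callables) is hypothetical, and it says nothing about how the pull request above actually does it.

```python
class EditConflict(Exception):
    """Raised by the (hypothetical) save function when the page changed."""


def save_with_retry(fetch, expand, save, max_attempts=3):
    """Fetch, expand and save a page, starting over after an edit conflict.

    fetch() returns the current page text, expand(text) returns the new
    text, and save(text) writes it or raises EditConflict.
    Returns True on success, False if every attempt hit a conflict.
    """
    for _ in range(max_attempts):
        text = fetch()  # always work from the latest revision
        try:
            save(expand(text))
            return True
        except EditConflict:
            continue  # someone else edited in between; try again
    return False
```

Capping the number of attempts keeps a busy page from stalling a batch run indefinitely.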

Title: Download Limited Exceeded (CiteSeerX)

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 20:17, 15 July 2019 (UTC)
What happens
[69]
What should happen
[70]
We can't proceed until
Feedback from maintainers


Bad title

Status
{{fixed}}
Reported by
Redalert2fan (talk) 20:26, 15 July 2019 (UTC)
What happens
title = 【お知らせ】Url(アドレス)が変わりました。|小田急電鉄 was added. This translates to: 【Notice】 Url (address) has changed. Odakyu Electric Railway. The page linked to is a dead link and redirects to https://www.odakyu.jp/404.html which gives: 404エラー お探しのページは見つかりませんでした。which translates to: 404 error The page you were looking for was not found.
What should happen
【お知らせ】Url(アドレス)が変わりました。|小田急電鉄 should not be added as a title
Relevant diffs/links
[71] [72] [73]
We can't proceed until
Feedback from maintainers


duplicate ieeexplore URLs

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 22:19, 15 July 2019 (UTC)
What should happen
[74], [75]
We can't proceed until
Feedback from maintainers


Looks like we need to deal with https://ieeexplore.ieee.org/document/5671934?reload=true&arnumber=5671934 vs without all the extra stuff. AManWithNoPlan (talk) 22:35, 15 July 2019 (UTC)

IEEE is very annoying: some of their URLs redirect and some others don't. Their rate limits are also horrendous and their staff even boasts about how mean they are towards their users. Nemo 10:27, 16 July 2019 (UTC)
in that case https://github.com/ms609/citation-bot/pull/1928 AManWithNoPlan (talk) 15:00, 16 July 2019 (UTC)
Nemo bis I am interested in learning more about your comment about IEEE staff boasting about being mean to users. Do you know of links documenting this? —David Eppstein (talk) 19:25, 16 July 2019 (UTC)
David Eppstein, try and ask some university administrator who has received some "education" from IEEE. Nemo 19:30, 16 July 2019 (UTC)
I fail to see how copyright education is being 'hostile'. Headbomb {t · c · p · b} 21:03, 16 July 2019 (UTC)

Broken Bx-y-z Elsevier URLs

All these URLs are broken: insource:"www.sciencedirect.com" insource:/science\/article\/B....-.{7}-../. Nemo 19:07, 16 July 2019 (UTC)

{{fixed}} I think
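
For anyone wanting to test URLs against the search pattern above outside of the wiki search box, the regex part translates directly to Python, since it only uses "." and "{7}". The function name and the sample URLs are made up for illustration.

```python
import re

# Same shape as the insource: regex above: "B", four characters, a hyphen,
# seven characters, a hyphen, then two more characters.
OLD_ELSEVIER_URL = re.compile(r"science/article/B....-.{7}-..")


def is_old_style_sciencedirect_url(url: str) -> bool:
    """True if url is a sciencedirect.com link in the old Bx-y-z format."""
    return "www.sciencedirect.com" in url and bool(OLD_ELSEVIER_URL.search(url))
```

A URL such as `http://www.sciencedirect.com/science/article/B6T1B-4FJXJV5-1/2/0123abcd` (invented) would match, whereas a `/science/article/pii/...` URL would not.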

User:Marianne Zimmerman

{{notabug}}

Cross-posted here, at User talk:Marianne Zimmerman, Wikipedia:Bots/Noticeboard, and at User talk:Smith609

This account has made tens of thousands of edits by proxy using the Citation bot. It is still ongoing while I'm writing this. The account itself has made only 11 edits so far.

It is obvious that this 'Marianne Zimmerman' account is a bot, since it is working around the clock, 24/7. The account is not labeled as such, and has not been authorized by the Bot Approvals Group. In itself not a big deal, because the account has been making only positive edits and has not caused disruption. Still, it is technically violating policy, and I'm wondering why a bot would use another bot to make bot edits. That seems rather silly. I hope the author of the 'Marianne bot' can come forward so that we can work things out. Cheers, Manifestation (talk) 12:04, 14 July 2019 (UTC)

The one thing I wonder about is whether the user checks their edits for possible bugs or mistakes, but seeing not a single revert of Citation bot by this user makes me believe they absolutely do not. This means that it would be quite possible that actual bad edits are made... If you can run 24/7 and check your edits, in the end it might technically not be a problem (policy aside), but I can tell you that I spend quite some time checking every edit made by the bot under my request and then posting bug reports here, even on 1000-page category runs.
A further point: I wonder how they run Citation bot in an automatic way. They either must use a very large input of pages via the web interface (pages separated with |) or have made some sort of script that interacts with the web interface; it is clear from the edit summaries that category mode is not used. In any case, running an unauthorized bot aside, it is possible that bad edits have been and will be made, unless Marianne can kindly prove that they check the bot's edits as is requested. Redalert2fan (talk) 12:15, 14 July 2019 (UTC)
@Redalert2fan: Yeah, I think you're right. This 'Marianne Zimmerman' account must be blocked, at least for now, even though the owner seems to be acting in good faith. There is a reason why Wikipedia bots have a trial period. But more importantly, this 'Zimmerman' bot seems a bit redundant. It appears to just roam around, randomly cleaning up articles it encounters. Can't the Citation bot itself do that? Cheers, Manifestation (talk) 12:24, 14 July 2019 (UTC)
@Manifestation: Well, currently Citation bot does not operate by itself and is only user activated, so without activation nothing will happen. "Editors who activate this bot should carefully check the results to make sure that they are as expected. While the bot does the best it can, it cannot anticipate the existing misuse of template parameters or anticipate bad/incomplete metadata from citation databases." is clearly stated on the bot's user page. Since you activate it yourself, you should check the edits made. As for why Citation bot does not operate in automatic mode, I suspect that is exactly the reason currently. Maybe the maintainers can further explain this, since I'm not totally clear on that. Of course Citation bot has long passed its trial period, but you can see many pages of bug reports in the archives just because things change on the internet and the maintainers/operator cannot predict every variation in templates, characters, languages etc. Which is why it is so important to check these edits. Anyone can run the bot for any reason, including random runs or just pages/categories of interest, so that's not a problem as long as edits are checked, in my opinion. Thanks, Redalert2fan (talk) 12:37, 14 July 2019 (UTC)
@Redalert2fan: Oh, I didn't see that. I believed the Citation bot made relatively simple changes, so I thought it wasn't a big deal if someone makes mass-edits with it. But you're right, this may not be a good idea after all. I've reported 'Marianne Zimmerman' to WP:ANI. Thanks, Manifestation (talk) 12:58, 14 July 2019 (UTC)
it was pretty conclusively decided in the last discussion that the bot is responsible for its edits. I am not the operator. AManWithNoPlan (talk) 13:22, 14 July 2019 (UTC)

Black list?

@Smith609: would it be possible to have an onwiki page of blacklisted users for citation bot? Now that activation requires authentication, having an admin-editable page of "blacklisted" users would help in situations like this (while not requiring a full block of the activator). — xaosflux Talk 13:16, 14 July 2019 (UTC)

That would be a good idea, but it would still require us to check the edits/activations manually. Perhaps you could build in some kind of limit per user, with an internal warning being triggered if that user surpasses it. I've been scrolling through the thousands and thousands of edits of the Citation bot commissioned by the Marianne bot, and the earliest activation by the Marianne bot I could find was at 20:24, 24 June 2019. Go here, press Ctrl/Cmd + F, and search for "Marianne". Save for a few pauses, the bot had been running non-stop for 20 days straight, with no one noticing until now. - Manifestation (talk) 16:35, 14 July 2019 (UTC)
Oh, and maybe captcha would be a good idea? - Manifestation (talk) 16:40, 14 July 2019 (UTC)
“no one noticed” — actually I noticed and I am sure many others noticed too. Just no one cared. AManWithNoPlan (talk) 18:48, 14 July 2019 (UTC)
The edits I checked looked harmless enough. And if (as in this case) a sock is really just running the bot on randomly selected articles, I don't see the problem. But we do want to block activations from blocked users, to head off more problematic behavior like stalking other users (using the bot to send the message that the user is being stalked, and by whom) or to mask bad edits (by running the bot afterwards to hide the edits from watchlists and make them harder to roll back). —David Eppstein (talk) 19:02, 14 July 2019 (UTC)
I noticed a lot came from the Marianne account, didn't really consider them harmful, although the volume is more than you'd expect from manual activations. Blacklisting is a good feature to have, although I have no real opinion on whether it needs to be deployed here. Headbomb {t · c · p · b} 19:49, 14 July 2019 (UTC)
I, too, had sampled a number of those diffs (a few thousands, I think; an addictive game which consumed many hours of my time). I reported the issues I found, which were very few. It would be nice to have server-side runs on larger sets of "safe" articles (such as bare refs) so that the bot would become a no-op on those. Nemo 22:39, 14 July 2019 (UTC)
Alternatively, you could use https://tools.wmflabs.org/iabot as a model. It automatically blocks all on wiki blocked users and sysops here have admin privileges on the web interface. They can block users there too. Everything is permissions based, and new users simply cannot do as much as established users and still much less than admins. See https://tools.wmflabs.org/iabot/index.php?page=metainfo&wiki=enwiki for info. Not to mention plenty of other abuse counter measures in place.—CYBERPOWER (Chat) 13:47, 15 July 2019 (UTC)
blocked users are blocked by the bot already. Better controls might be in order though. AManWithNoPlan (talk) 14:15, 15 July 2019 (UTC)
{{notabug}}

Markup in linked title

Status
{{notabug}}
Reported by
Trappist the monk (talk) 14:46, 17 July 2019 (UTC)
Relevant diffs/links
This edit at Real options valuation
We can't proceed until
Feedback from maintainers


This:

[http://www.intechopen.com/books/aerospace-technologies-advancements/a-real-options-approach-to-valuing-the-risk-transfer-in-a-multi-year-procurement-contract A Real Options Approach to valuing the Risk Transfer in a Multi-Year Procurement Contract]. Arnold, Scot, and Marius Vassiliou (2010). Ch. 25 in Thawar T. Arif (ed), Aerospace Technologies Advancements. Zagreb, Croatia: INTECH. {{ISBN|978-953-7619-96-1}}
A Real Options Approach to valuing the Risk Transfer in a Multi-Year Procurement Contract. Arnold, Scot, and Marius Vassiliou (2010). Ch. 25 in Thawar T. Arif (ed), Aerospace Technologies Advancements. Zagreb, Croatia: INTECH. ISBN 978-953-7619-96-1

Becomes this:

{{cite journal|url=http://www.intechopen.com/books/aerospace-technologies-advancements/a-real-options-approach-to-valuing-the-risk-transfer-in-a-multi-year-procurement-contract|title= A Real Options Approach to valuing the Risk Transfer in a Multi-Year Procurement Contract]. Arnold, Scot, and Marius Vassiliou (2010). Ch. 25 in Thawar T. Arif (ed), Aerospace Technologies Advancements. Zagreb, Croatia: INTECH. {{ISBN|978-953-7619-96-1}}|journal= Aerospace Technologies Advancements|doi= 10.5772/7170|date= January 2010|last1= Vassiliou|first1= Marius S.|last2= Arnold|first2= Scot A.}}
Vassiliou, Marius S.; Arnold, Scot A. (January 2010). "A Real Options Approach to valuing the Risk Transfer in a Multi-Year Procurement Contract]. Arnold, Scot, and Marius Vassiliou (2010). Ch. 25 in Thawar T. Arif (ed), Aerospace Technologies Advancements. Zagreb, Croatia: INTECH. [[ISBN (identifier)|ISBN]] [[Special:BookSources/978-953-7619-96-1 |978-953-7619-96-1]]". Aerospace Technologies Advancements. doi:10.5772/7170. {{cite journal}}: URL–wikilink conflict (help); templatestyles stripmarker in |title= at position 229 (help)

Trappist the monk (talk) 14:46, 17 July 2019 (UTC)

There are other errors in that edit; see in particular @ Don Chance and @ D. Mauer.
Trappist the monk (talk) 14:51, 17 July 2019 (UTC)
These are not errors introduced by the bot, it was all manual work. (Thank you for finishing it, I was tired.) Nemo 15:20, 17 July 2019 (UTC)

better edit summary when bot deliberately removes the citation url

Status
{{fixed}}
Reported by
Rayhartung (talk) 12:29, 18 July 2019 (UTC)
What happens
when bot removed the url parameter, the edit summary was "Add: issue. Removed accessdate with no specified URL. Removed parameters. | You can use this bot yourself. Report bugs here. | User-activated." which appears to be saying that before changes, the citation did not have a url (leading to the earlier "not a bug")
What should happen
better edit summary (did not mention removal of URL)
Relevant diffs/links
Special:diff/883914749 and User talk:Citation bot/Archive 17#removal of url
We can't proceed until
Feedback from maintainers


The edit summary the bot produced was:
"Add: issue. Removed accessdate with no specified URL. Removed parameters. | You can use this bot yourself. Report bugs here. | User-activated."
Perhaps before "Removed accessdate ..." could be added something like "Removed URL that matched DOI." or "Removed nonfree URL." (or at least "Removed URL."). Rayhartung (talk) 12:29, 18 July 2019 (UTC)

Thanks for the suggestion. That edit is 5 months old, this was already fixed in March. Now the edit summary states "Removed URL that duplicated unique identifier" (example). Nemo 12:41, 18 July 2019 (UTC)

Redundant chapter-url and other url types should also be removed

Status
{{fixed}}
Reported by
Nemo 18:01, 16 July 2019 (UTC)
What happens
chapter-url not forgotten at least for Elsevier (unless it's a temporary glitch)
What should happen
special:diff/906566638
Relevant diffs/links
special:diff/906568062, special:diff/906571154
We can't proceed until
Feedback from maintainers


Ambiguous edit summary

Please see this diff: [76] where the edit summary is: "Add: date. Removed parameters." The actual change made was: publication-date=August 2018 was changed to date=August 2018. While a part of the parameter name was removed, no full parameters were removed and no new date was added, making the summary a bit inaccurate. If possible, could the summary for edits like these be changed to something that describes the specific action more closely? --Redalert2fan (talk) 20:20, 18 July 2019 (UTC)

we walk a thin line between logging everything in horrible detail and not describing everything. We might want a “parameter name changed” at some point. AManWithNoPlan (talk) 21:10, 18 July 2019 (UTC)
Mostly {{fixed}}. Now warns users some add/dels are actually changes. AManWithNoPlan (talk) 00:51, 19 July 2019 (UTC)

Running bot 2 times gives more changes

The bot ran 2 times on the following pages; diff 1 and diff 2 on this page only dates were added. diff 3 and diff 4 multiple actions were performed on this 2nd page. --Redalert2fan (talk) 20:58, 18 July 2019 (UTC)

website parsing goes through a separate server process and sometimes that times out. AManWithNoPlan (talk) 21:13, 18 July 2019 (UTC)
{{wontfix}}

Figure out missing archive date

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 09:26, 12 July 2019 (UTC)
What should happen
[77]
We can't proceed until
Feedback from maintainers


The date can easily be determined through the webarchive url. Headbomb {t · c · p · b} 09:26, 12 July 2019 (UTC)
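
A minimal sketch of the idea above, assuming the archive URL follows the usual Wayback Machine layout with a 14-digit timestamp after `/web/`. The function name and the regular expression are hypothetical illustrations, not the bot's actual code:

```python
import re
from datetime import datetime

# Assumed layout: https://web.archive.org/web/YYYYMMDDhhmmss/<original url>
# An optional two-letter modifier such as "id_" may follow the timestamp.
WAYBACK_RE = re.compile(r"^https?://web\.archive\.org/web/(\d{14})(?:[a-z]{2}_)?/")


def archive_date_from_url(archive_url):
    """Return the snapshot date as YYYY-MM-DD, or None if it cannot be read."""
    match = WAYBACK_RE.match(archive_url)
    if match is None:
        return None
    try:
        snapshot = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    except ValueError:
        # Fourteen digits that do not form a real timestamp
        return None
    return snapshot.date().isoformat()
```

The returned ISO date would still need converting to whatever date style the article uses before being written to |archive-date=.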

archive-date added while archivedate is present

Status
{{fixed}}
Reported by
Redalert2fan (talk) 14:59, 19 July 2019 (UTC)
What happens
archive-date was added while archivedate is present
What should happen
archive-date should not be added when archivedate is present or change archivedate to archive-date if that is preferred
Relevant diffs/links
[78] [79] + more
We can't proceed until
Feedback from maintainers


Started since https://github.com/ms609/citation-bot/pull/1947 was merged. --Redalert2fan (talk) 14:59, 19 July 2019 (UTC)

GRRRRR. I checked for "archive-date" and "archive-date". Will be fixed soon. AManWithNoPlan (talk) 15:41, 19 July 2019 (UTC)
Ah I just came to report this. [80] Nemo 15:46, 19 July 2019 (UTC)

Thanks for the quick fix! --Redalert2fan (talk) 16:05, 19 July 2019 (UTC)

If/when deployed, it's worth re-running on all pages in Category:Pages with citations having redundant parameters, which swelled a bit today (thanks Trappist the monk for reporting). Nemo 16:05, 19 July 2019 (UTC)

if they are identical, now the bot removes the extra one. AManWithNoPlan (talk) 18:13, 19 July 2019 (UTC)
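
To illustrate the "remove the extra one when identical" rule, here is a rough sketch that treats a citation as a plain dict of parameter names to values. Which of the two spellings the bot actually keeps is not stated above, so keeping `archive-date` here is purely an assumption, and the function name is hypothetical:

```python
def drop_identical_alias(params, canonical="archive-date", alias="archivedate"):
    """Remove `alias` only when both spellings are present with the same value.

    Assumption: the hyphenated spelling is the one to keep.
    """
    if canonical in params and alias in params:
        if params[canonical].strip() == params[alias].strip():
            return {name: value for name, value in params.items() if name != alias}
    # Different values (or only one spelling present): leave for a human to sort out
    return params
```

When the two values differ, nothing is removed, since the bot cannot know which date is right.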

Proxy subzero.lib.uoguelph.ca

Status
{{fixed}}
Reported by
Nemo 16:04, 19 July 2019 (UTC)
What happens
special:diff/887657108
What should happen
special:diff/906976400
Replication instructions
Links like http://www.sciencedirect.com.subzero.lib.uoguelph.ca/ are not identified as proxies.
We can't proceed until
Feedback from maintainers


Should be generalized to cover every http(s)://www.sciencedirect.com(proxycrap)/ possible. Headbomb {t · c · p · b} 16:11, 19 July 2019 (UTC)
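
One way the generalization could look, as a sketch only: treat any host that starts with `www.sciencedirect.com.` followed by more labels as a proxy and rewrite it to the bare publisher host. The helper name is made up for illustration, and proxies that rewrite dots to dashes (for example `www-sciencedirect-com.…`) are not covered by this check:

```python
from urllib.parse import urlsplit, urlunsplit

PUBLISHER_HOST = "www.sciencedirect.com"


def strip_proxy_suffix(url):
    """Rewrite http(s)://www.sciencedirect.com.<proxy host>/... to the publisher host."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    prefix = PUBLISHER_HOST + "."
    if parts.scheme in ("http", "https") and host.startswith(prefix) and len(host) > len(prefix):
        # Keep path, query and fragment; any proxy port is deliberately dropped
        return urlunsplit((parts.scheme, PUBLISHER_HOST, parts.path, parts.query, parts.fragment))
    return url
```

The same prefix test could be repeated for other publisher hosts, since these proxies are presumably used for more than ScienceDirect.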

LIPIcs

Status
{{fixed}} with booktitle and book-title code
Reported by
David Eppstein (talk) 18:41, 19 July 2019 (UTC)
What happens
Adds incorrect and incorrectly capitalized journal= parameter to conference proceedings (book) citation in LIPIcs book series
What should happen
Special:Diff/906995033 (ignore the unrelated sciencedirect url in a different citation). More generally Citation bot should never add a journal= parameter to a citation with incompatible parameters (in this case booktitle).
Relevant diffs/links
Special:Diff/906960341
We can't proceed until
Feedback from maintainers


The "problem" here is that in {{cite journal}} the "journal" field is really used to mean serial. I doubt we even have templates to precisely replicate all the FRBR and host/components hierarchies. Nemo 19:36, 19 July 2019 (UTC)
Well, this is a {{cite conference}} with a |series= already present. That should be enough to figure out that adding a |journal= to that likely doesn't make much sense. Headbomb {t · c · p · b} 20:36, 19 July 2019 (UTC)

Re 'ref' and 'mode' parameters

Could the bot drivers be requested to not add line breaks where |ref= or |mode= are on the same line with (that is, immediately following) "{{cite xxx" or "{{citation"? These parameters change the behavior of those templates in very significant ways, effectively changing the template. Having these parameters deeper into the argument makes them less visible, and creates confusion. Where an editor sees fit to put them on the same line, that should be respected. ♦ J. Johnson (JJ) (talk) 22:10, 18 July 2019 (UTC)

Could you point out a case where that was done? AManWithNoPlan (talk) 22:51, 18 July 2019 (UTC)
The instance I have at hand was actually InternetArchiveBot's doing, whereas the instance I thought(?) was Citation_bot's doing is not readily at hand. Okay, maybe not a problem here. ♦ J. Johnson (JJ) (talk) 23:20, 19 July 2019 (UTC)
{{notabug}} for now. If you find one, then bring it up again. AManWithNoPlan (talk) 23:28, 19 July 2019 (UTC)

Vol/Issue cleanup

Status
{{wontfix}}
Reported by
Headbomb {t · c · p · b} 19:46, 14 July 2019 (UTC)
What should happen
[81]
We can't proceed until
Feedback from maintainers


See the |volume=24/2 --> |volume=24 + |issue=2 type of stuff. Headbomb {t · c · p · b} 19:46, 14 July 2019 (UTC)

Might be too tricky to separate from cases like |volume=18/19 --> |volume=18–19 however. Headbomb {t · c · p · b} 23:13, 14 July 2019 (UTC)
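
The ambiguity can be made concrete with a small sketch. The heuristic below (refuse to split when the second number is exactly one more than the first, because that may be a combined volume) is only an assumption for illustration. It also refuses a genuine volume 1, issue 2, which shows why this is hard to automate safely:

```python
import re


def split_volume_issue(volume):
    """Return (volume, issue) for values like '24/2', or None when unsure."""
    match = re.fullmatch(r"\s*(\d+)\s*/\s*(\d+)\s*", volume)
    if match is None:
        return None
    first, second = int(match.group(1)), int(match.group(2))
    if second == first + 1:
        # '18/19' may be a combined volume 18–19 rather than volume 18, issue 19
        return None
    return (match.group(1), match.group(2))
```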

cite theses vs cite document for same link

At [82] cite web was changed to cite thesis. Also type = Thesis was added. But at [83] Cite web was changed to Cite document. As far as I can see the only difference before was c vs C in cite. Furthermore, do we need type = Thesis if we have cite thesis? --Redalert2fan (talk) 19:47, 19 July 2019 (UTC)

|type=thesis differentiates it from |type=dissertation. (tiny difference). AManWithNoPlan (talk) 21:49, 19 July 2019 (UTC)
the difference comes from the bot not getting consistent meta data for odd reasons. AManWithNoPlan (talk) 21:56, 19 July 2019 (UTC)
{{wontfix}} but odd. AManWithNoPlan (talk) 22:58, 20 July 2019 (UTC)

DOIs that CrossRef does not resolve, but some other provider resolves

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 06:55, 20 July 2019 (UTC)
What should happen
[84]
We can't proceed until
Feedback from maintainers


I have encountered and manually corrected a few of these too. Another common pattern is Wiley DOIs losing a central <> element like "<839::AID-NME423>" and DOIs truncated after a dot or missing dots between digits. Nemo 08:34, 20 July 2019 (UTC)
even worse, DOIs that end with a dot where the dot is part of the DOI. AManWithNoPlan (talk) 18:34, 20 July 2019 (UTC)
Never saw those. Smith, J. (2015). "Article". Journal. doi:10.1234/0123456789 (inactive 2019-07-21).{{cite journal}}: CS1 maint: DOI inactive as of July 2019 (link) throws an error, and there are no such errors found on Wikipedia. Headbomb {t · c · p · b} 18:51, 20 July 2019 (UTC)
The evil period ending doi required some magic to avoid the error (it might have been encoding OR the URL was used, but both the url and doi fields had bot stopping comments added). I do not remember what I did. AManWithNoPlan (talk) 20:05, 20 July 2019 (UTC)
https://github.com/ms609/citation-bot/pull/1962 The problem is not the period, the code already fixed that. The problem was that the DOI is not in CrossRef, so that code did not realize that dropping the period fixed it. AManWithNoPlan (talk) 20:05, 20 July 2019 (UTC)
Also, add this which is extra aggressive, since period ending dois generate errors. https://github.com/ms609/citation-bot/pull/1966 AManWithNoPlan (talk) 21:31, 20 July 2019 (UTC)
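
A sketch of the trailing-period logic described in this thread, written so that the resolution check is not tied to CrossRef. The `resolves` argument is a hypothetical callable (it could query doi.org, CrossRef, or anything else); nothing here is taken from the linked pull requests:

```python
def fix_trailing_period(doi, resolves):
    """Drop trailing periods from a DOI when only the shortened form resolves.

    `resolves` is an assumed callable taking a DOI string and returning True
    when some provider (not necessarily CrossRef) can resolve it.
    """
    if not doi.endswith("."):
        return doi
    if resolves(doi):
        # Rare case: the period really is part of the registered DOI
        return doi
    trimmed = doi.rstrip(".")
    if trimmed and resolves(trimmed):
        return trimmed
    # Neither form resolves: leave the value untouched
    return doi
```

For a quick offline check, a set's membership test can stand in for the resolver, for example `fix_trailing_period("10.1234/abc.", {"10.1234/abc"}.__contains__)`.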

Question about API output / double work?

Hello, I just ran the bot on this revision of TRAPPIST-1 giving these results. In the API output this section caught my eye;

> Remedial work to prepare citations
> Trying to convert ID parameter to parameterized identifiers.
> Trying to convert ID parameter to parameterized identifiers.
  ~ Renamed "date" -> "CITATION_BOT_PLACEHOLDER_date"
  ~ Renamed "CITATION_BOT_PLACEHOLDER_date" -> "date"
> Trying to convert ID parameter to parameterized identifiers.
  ~ Renamed "year" -> "CITATION_BOT_PLACEHOLDER_year"
  ~ Renamed "CITATION_BOT_PLACEHOLDER_year" -> "year"
  ~ Renamed "date" -> "CITATION_BOT_PLACEHOLDER_date"
  ~ Renamed "CITATION_BOT_PLACEHOLDER_date" -> "date"

In the end no dates were changed or added. Is this intended behavior or is there some accidental double work going on? --Redalert2fan (talk) 11:43, 20 July 2019 (UTC)

You have just seen inside the machine where the sausage is being made. We have to temporarily move some things out of the way and then put them back during some API calls. AManWithNoPlan (talk) 14:11, 20 July 2019 (UTC)
https://github.com/ms609/citation-bot/pull/1961 AManWithNoPlan (talk) 14:18, 20 July 2019 (UTC)
{{fixed}}

Remove truncated Elsevier URLs

Status
{{fixed}}
Reported by
Nemo 13:08, 20 July 2019 (UTC)
What happens
URLs such as http://www.sciencedirect.com/science/article/pii/ (without any ID in the final path, or a truncated ID) 404 and are therefore not removed.
What should happen
If the DOI is functioning, just remove them: special:diff/907094949
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/1964 AManWithNoPlan (talk) 20:02, 20 July 2019 (UTC)
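
For readers who want the rule spelled out, a rough sketch follows. It assumes an unformatted Elsevier PII is 17 characters long and treats anything shorter in the final path segment as truncated; that length, the helper names, and the dict representation of a citation are all assumptions for illustration, not what the pull request does:

```python
import re

PII_URL_RE = re.compile(
    r"^https?://www\.sciencedirect\.com/science/article/pii/([^/?#]*)", re.IGNORECASE
)
EXPECTED_PII_LENGTH = 17  # assumption: one letter followed by 16 characters


def is_truncated_pii_url(url):
    """True for ScienceDirect PII URLs whose ID is missing or too short."""
    match = PII_URL_RE.match(url)
    if match is None:
        return False
    return len(match.group(1)) < EXPECTED_PII_LENGTH


def drop_truncated_url(params, doi_works):
    """Remove |url= only when a working DOI is present and the URL is truncated."""
    if params.get("doi") and doi_works and is_truncated_pii_url(params.get("url", "")):
        return {name: value for name, value in params.items() if name != "url"}
    return params
```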

Had to run the bot twice

Status
new bug
Reported by
(tJosve05a (c) 14:30, 20 July 2019 (UTC)
What happens
Had to run the bot twice on the same page to get the PMID
What should happen
The PMID should have been found with the first edit
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=Pestalotiopsis&diff=prev&oldid=907103550
We can't proceed until
Feedback from maintainers


Running the bot another time (3rd time), it also finds PMC ID: https://en.wikipedia.org/w/index.php?title=Pestalotiopsis&diff=prev&oldid=907103709 (tJosve05a (c) 14:31, 20 July 2019 (UTC)

pubmed is acting funny right now. AManWithNoPlan (talk) 15:32, 20 July 2019 (UTC)
annoyingly {{notabug}} on our part. AManWithNoPlan (talk) 16:26, 20 July 2019 (UTC)

API: Batch run summaries

Currently when a run is completed you get:

  • Done all 100 pages in Category:X.

Could the number of pages edited also be added to this? Giving:

  • Done all 100 pages in Category:X. Made changes to 25 pages.

Or some sort of a variation of that?

--Redalert2fan (talk) 18:24, 10 July 2019 (UTC)

Or a summary with diffs

Suppressing the | You can use this bot yourself. Report bugs here. | Activated by User:Username part of the edit summary. Headbomb {t · c · p · b} 19:37, 10 July 2019 (UTC)

I suppose you mean suppressing it in the API only and not the actual posted edit summary by the bot? This would massively help with checking the edits so support for this. But if it's quick to implement my original suggestion at least helps a bit already in my opinion. Redalert2fan (talk) 19:52, 10 July 2019 (UTC)
Yes, in the API only. Whoever activated the bot knows they activated the bot and that it's possible for them to do so. Headbomb {t · c · p · b} 20:05, 10 July 2019 (UTC)
{{fixed}} a lot for now. AManWithNoPlan (talk) 23:30, 19 July 2019 (UTC)
@AManWithNoPlan: still missing for multiple articles (e.g. Page 1|Page 2|Page 3|...|Page N). Headbomb {t · c · p · b} 06:53, 20 July 2019 (UTC)

{{fixed}} AManWithNoPlan (talk) 13:27, 20 July 2019 (UTC)

@AManWithNoPlan: the summary diffs don't contain the oldids and you get stuff like [85]. Headbomb {t · c · p · b} 18:59, 20 July 2019 (UTC)
https://github.com/ms609/citation-bot/pull/1963 copy/pasted code leads to a variable not being defined. AManWithNoPlan (talk) 20:06, 20 July 2019 (UTC)
{{fixed}}

"Removed URL that duplicated unique identifier"

[86] I'm getting these all the time now and I think they arguably make the citation sections worse. There's no way that [edit: general readers] know to click on the linked "doi" when the citation's title itself is unlinked. I'll note that the {{cite journal}} documentation examples keep the url parameter even when a doi is provided.

Where is the consensus to make this edit en masse? czar 13:26, 20 July 2019 (UTC)

I'm going out now but I'll leave a quick answer to one of your points: a lot of people do, in fact, know to click the DOI. We know for sure from CrossRef data: https://www.crossref.org/blog/https-and-wikipedia/ https://www.crossref.org/blog/real-time-stream-of-dois-being-cited-in-wikipedia/ Nemo 13:36, 20 July 2019 (UTC)
unless the url is free to download without logging in, you should not add them unless there are no other links out. AManWithNoPlan (talk) 13:39, 20 July 2019 (UTC)
there is even movement afoot to remove the automatic linking of titles when a PMC is present. AManWithNoPlan (talk) 13:47, 20 July 2019 (UTC)
My question was where this consensus has been established, or if this is just a practice localized to editors using this bot/tool. czar 15:17, 20 July 2019 (UTC)
I don’t have time to look it up, but the links are in the talk archives somewhere—hopefully someone not in an auto parts store can respond better. AManWithNoPlan (talk) 15:31, 20 July 2019 (UTC)
The general idea is that these links are redundant with the DOI/other identifiers, which are clear about where they take you (doi: version of record, jstor = jstor repository, etc.; if you don't know what those are, we have the wikilinks). |url= is then freed up to be used for freely-available full text versions-of-record of the paper hosted on an author's website, or similar. If the DOI version is free, you can use |doi-access=free to mark it as free, etc. Headbomb {t · c · p · b} 17:29, 20 July 2019 (UTC)

Please see the usage page for why {{notabug}} AManWithNoPlan (talk) 15:50, 21 July 2019 (UTC)

103 more proxies

Proxies for www.sciencedirect.com which we currently link somewhere: quarry:query/37794. Nemo 00:00, 21 July 2019 (UTC)

{{fixed}} got them all. AManWithNoPlan (talk) 15:50, 21 July 2019 (UTC)
Thanks. Only for ScienceDirect itself though? I suppose these proxies are used for other publishers too. Nemo 15:54, 21 July 2019 (UTC)
ouch, that hurts. 😂🤣AManWithNoPlan (talk) 15:55, 21 July 2019 (UTC)
https://github.com/ms609/citation-bot/pull/1975 this will get most of those and many others. AManWithNoPlan (talk) 16:53, 21 July 2019 (UTC)
{{fixed}}

Adds two weird DOIs

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 07:12, 21 July 2019 (UTC)
What happens
[87]
We can't proceed until
Feedback from maintainers


I checked an older version and it did it back in the day too. Will investigate. AManWithNoPlan (talk) 16:54, 21 July 2019 (UTC)

https://github.com/ms609/citation-bot/pull/1977 AManWithNoPlan (talk) 19:17, 21 July 2019 (UTC)

Weird and capitalization

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 07:20, 21 July 2019 (UTC)
What happens
[88]
What should happen
[89]
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/1974 AManWithNoPlan (talk) 16:54, 21 July 2019 (UTC)

Privacy settings is not a good title

Status
{{fixed}}
Reported by
Redalert2fan (talk) 21:22, 21 July 2019 (UTC)
What happens
[90]
What should happen
[91]
We can't proceed until
Feedback from maintainers


Comes from a (redirect to a) cookie consent popup. --Redalert2fan (talk) 21:22, 21 July 2019 (UTC)

Caps

Status
{{fixed}}
Reported by
Doremo (talk) 02:44, 23 July 2019 (UTC)
We can't proceed until
Feedback from maintainers


The bot is incorrectly changing non-English capitalization (as here, where društva za should be lower case). Doremo (talk) 02:44, 23 July 2019 (UTC)

https://github.com/ms609/citation-bot/pull/1989 AManWithNoPlan (talk) 11:32, 23 July 2019 (UTC)

Bad title: WebCite query result

Status
{{fixed}}
Reported by
(tJosve05a (c) 11:24, 23 July 2019 (UTC)
What happens
|title=WebCite query result
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/1990 AManWithNoPlan (talk) 11:31, 23 July 2019 (UTC)

Handle dates that are ranges of two dates

Status
{{fixed}}
Reported by
Keith D (talk) 15:29, 24 July 2019 (UTC)
What happens
The BOT adds date ranges in ISO format - which is not a valid date format. |date=2011-10-01 - 2011-12-17
What should happen
Should add in a valid format date in accordance with the use templates if present in the article. Should have added |date=1 October – 17 December 2011 or |date=October 1 – December 17, 2011
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=Take_That_discography&diff=prev&oldid=907658837
We can't proceed until
Feedback from maintainers
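
A sketch of the conversion requested in this report, turning an ISO-style range into either of the two formats given above. The function name, the `style` switch, and the handling of same-month and cross-year ranges are assumptions added for illustration. Only the same-year case is taken directly from the report:

```python
import re
from datetime import date

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def format_date_range(iso_range, style="dmy"):
    """Convert 'YYYY-MM-DD - YYYY-MM-DD' to a dmy or mdy range, or None."""
    match = re.fullmatch(r"(\d{4}-\d{2}-\d{2})\s*[-–]\s*(\d{4}-\d{2}-\d{2})",
                         iso_range.strip())
    if match is None:
        return None
    try:
        start = date.fromisoformat(match.group(1))
        end = date.fromisoformat(match.group(2))
    except ValueError:
        return None  # e.g. month 13
    sm, em = MONTHS[start.month - 1], MONTHS[end.month - 1]
    same_year = start.year == end.year
    same_month = same_year and start.month == end.month
    if style == "dmy":
        if same_month:
            return f"{start.day}–{end.day} {sm} {start.year}"
        if same_year:
            return f"{start.day} {sm} – {end.day} {em} {start.year}"
        return f"{start.day} {sm} {start.year} – {end.day} {em} {end.year}"
    if same_month:
        return f"{sm} {start.day}–{end.day}, {start.year}"
    if same_year:
        return f"{sm} {start.day} – {em} {end.day}, {start.year}"
    return f"{sm} {start.day}, {start.year} – {em} {end.day}, {end.year}"
```

The `style` value would come from the article's "use dmy dates" or "use mdy dates" template when one is present.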


SSRN link cleanup

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 16:14, 25 July 2019 (UTC)
What should happen
[92]
We can't proceed until
Feedback from maintainers


Untitled_new_bug

Status
{{fixed}}
Reported by
Keith D (talk) 12:14, 26 July 2019 (UTC)
What happens
Making cosmetic changes that do not affect output which is against BOT rules. Especially as this appears to be part of a large bulk run.
What should happen
Skip the article in question.
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=Dewsbury_(UK_Parliament_constituency)&curid=1821124&diff=907913824&oldid=907646499
We can't proceed until
Feedback from maintainers


Probably time to fix that blank publisher removal. AManWithNoPlan (talk) 12:20, 26 July 2019 (UTC)

Probably could be general to all blank removals. If only blank stuff is done, skip. Headbomb {t · c · p · b} 12:41, 26 July 2019 (UTC)
I found a couple places we called “tidy” on blank parameters. Will update soon. The only blank thing we remove then will be some postscript parameters when meaningless and the empty via parameter since its presence is rare and leads to misuse and parameters that duplicate set parameters (remove blank year if date is set) AManWithNoPlan (talk) 13:10, 26 July 2019 (UTC)
May be worth keeping removal of empty deprecated parameters such as |coauthor=. Keith D (talk) 17:17, 26 July 2019 (UTC)
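
The "if only blank stuff is done, skip" rule could be checked roughly as below. This is a sketch under the assumption that a citation can be compared as a dict of parameter names to values before and after the bot's pass. The function names are hypothetical:

```python
def non_blank(params):
    """Keep only parameters that carry a value, with whitespace trimmed."""
    return {name: value.strip() for name, value in params.items() if value.strip()}


def is_cosmetic_only(before, after):
    """True when the only differences are blank parameters or whitespace.

    A page whose every changed citation is cosmetic-only would be skipped
    rather than saved.
    """
    return non_blank(before) == non_blank(after)
```

Exceptions such as empty deprecated parameters would need to be whitelisted separately, since this comparison treats every blank removal as cosmetic.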

Support template:Cite LSA

Status
{{wontfix}} for such a lame template
Reported by
Nemo 16:17, 26 July 2019 (UTC)
What should happen
special:diff/907985037
We can't proceed until
Feedback from maintainers


I just undid your edit. Seeing as the template does not support |doi=. AManWithNoPlan (talk) 17:17, 26 July 2019 (UTC)

Converted them to {{citation}} in that article. No reason to use such a feature-poor template on a non-linguistics related article. Headbomb {t · c · p · b} 19:46, 26 July 2019 (UTC)

Title = Untitled

Status
{{fixed}}
Reported by
Redalert2fan (talk) 18:18, 26 July 2019 (UTC)
What happens
title=Untitled-2
What should happen
nothing
Relevant diffs/links
[93] [94]
We can't proceed until
Feedback from maintainers


Convert cite web to journal and add DOI

Status
{{notabug}}, just grumpy. If this keeps up, then we add PII API support some day
Reported by
Nemo 06:01, 27 July 2019 (UTC)
What should happen
special:diff/908065540
Replication instructions
I'm not sure whether it's about cite web without DOI being left alone altogether, or ScienceDirect acting up as usual.
We can't proceed until
Feedback from maintainers


Yet another reason to drop urls and use DOIs instead as discussed above. Probably science direct being grumpy. AManWithNoPlan (talk) 16:22, 27 July 2019 (UTC)

Same with {{cite article}} and {{cite conference}}: are they supposed to be left alone or did something go wrong? special:diff/908234528. Nemo 11:17, 28 July 2019 (UTC)

Bracketed issues

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 22:47, 28 July 2019 (UTC)
What should happen
[95]
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/2034 AManWithNoPlan (talk) 11:51, 30 July 2019 (UTC)

CAPS: JAMA

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 22:57, 29 July 2019 (UTC)
What should happen
[96]
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/2031 AManWithNoPlan (talk) 02:14, 30 July 2019 (UTC)

Science (New York, N.y.)

Status
{{fixed}}
Reported by
Nemo 07:18, 30 July 2019 (UTC)
What happens
Citation bot gets journal name with places from pmid:6623089
What should happen
special:diff/908518681
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/2033 AManWithNoPlan (talk) 11:51, 30 July 2019 (UTC)

This should just be streamlined to Science, yup. I could find other similar cases too. Headbomb {t · c · p · b} 12:07, 30 July 2019 (UTC)
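
For the similar cases, a conservative sketch: strip a trailing parenthetical only when it contains a comma, as in "(New York, N.y.)", on the assumption that this pattern marks a place qualifier from PubMed. The regular expression and function name are illustrative guesses, not the rule used in the pull request:

```python
import re

# Trailing "(City, Region)" style qualifier; parentheticals without a comma are left alone
PLACE_SUFFIX_RE = re.compile(r"\s*\([^()]*,\s*[^()]*\)\s*$")


def strip_place_qualifier(journal):
    """Remove a trailing place qualifier such as ' (New York, N.y.)'."""
    return PLACE_SUFFIX_RE.sub("", journal)
```

Requiring the comma keeps names such as "Journal (Series A)" intact, at the cost of missing single-city qualifiers.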

Consider linkinghub.elsevier.com always redundant

Status
{{fixed}}
Reported by
Nemo 07:28, 30 July 2019 (UTC)
What happens
Some linkinghub.elsevier.com URLs redundant with DOIs do not get removed, either because Elsevier acts up or because they contain extra parentheses and stuff in their IDs.
What should happen
special:diff/908519421: just remove them all when there is a DOI; this domain is never the final destination for the user, so all these links were added by some automatic tool when the users provided the DOI or other input.
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/2032 AManWithNoPlan (talk) 11:51, 30 July 2019 (UTC)

Thanks, it works! Now I know 3300 articles on which the bot should be run. :) Nemo 20:23, 30 July 2019 (UTC)

journal/series clash

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 04:16, 27 July 2019 (UTC)
What should happen
[97]
We can't proceed until
Feedback from maintainers


This is mostly due to the (Clifton, NJ) thing in one but not the other. Probably should be a hardcoded exception/equivalence. Headbomb {t · c · p · b} 04:16, 27 July 2019 (UTC)

Isn't that really a series of books so the citation should be something more like:
{{Cite book |last=Laustsen |first=Anders |last2=Bak |first2=Rasmus O. |editor=Yonglun Luo |date=2019 |title=CRISPR Gene Editing: Methods and Protocols |chapter=Electroporation-Based CRISPR/Cas9 Gene Editing Using Cas9 Protein and Chemically Modified sgRNAs |location=New York |publisher=Springer |pages=127–134 |doi=10.1007/978-1-4939-9170-9_9 |pmid=30912044 |isbn=978-1-4939-9169-3}}
Laustsen, Anders; Bak, Rasmus O. (2019). "Electroporation-Based CRISPR/Cas9 Gene Editing Using Cas9 Protein and Chemically Modified sgRNAs". In Yonglun Luo (ed.). CRISPR Gene Editing: Methods and Protocols. New York: Springer. pp. 127–134. doi:10.1007/978-1-4939-9170-9_9. ISBN 978-1-4939-9169-3. PMID 30912044.
Trappist the monk (talk) 00:53, 31 July 2019 (UTC)
Yes, with |series=Methods in Molecular Biology and |volume=1961. —David Eppstein (talk) 01:38, 31 July 2019 (UTC)
Also, if the 'fix' is to keep the template as {{cite journal}}, the next release of Module:Citation/CS1 suite will require {{cite journal}} to have |journal=.
Trappist the monk (talk) 00:56, 31 July 2019 (UTC)

Minor change which should not be done as a stand-alone edit

Status
{{fixed}}
Reported by
Jonatan Svensson Glad (talk) 19:55, 30 July 2019 (UTC)
What happens
https://en.wikipedia.org/w/index.php?title=Laver_table&diff=prev&oldid=908605409
We can't proceed until
Feedback from maintainers


See also User talk:Citation bot/Archive 17#Untitled new bug. Jonatan Svensson Glad (talk) 19:55, 30 July 2019 (UTC)

Caps: AAPS

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 11:29, 31 July 2019 (UTC)
What should happen
[98]
We can't proceed until
Feedback from maintainers


Caps: Série A et B / Series A & B / Series A and B

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 12:16, 31 July 2019 (UTC)
What should happen
[99]
We can't proceed until
Feedback from maintainers


Fails on Soil

Status
{{wontfix}} just a time out. I ran in non-slow mode successfully. Ran on test cluster in slow mode successfully (that won’t commit though)
Reported by
Headbomb {t · c · p · b} 20:02, 31 July 2019 (UTC)
What happens
The bot crashes
What should happen
No crash
We can't proceed until
Feedback from maintainers


Not sure i'd call it crashing, but it does quit before it finishes the page. Is the page too big at 389kb? When I tried, it got about half way through to checking AdsAbs for the citation of DOI 10.1111/j.1438-8677.1971.tb00715.x   — Chris Capoccia 💬 01:39, 1 August 2019 (UTC)
600 references is quite near the point where I've often seen citation bot fail. Nemo 06:55, 1 August 2019 (UTC)

Full run (sadly tests can not save) Time: 43.14 minutes, Memory: 42.01MB AManWithNoPlan (talk) 14:54, 2 August 2019 (UTC)

Does it go down if we reduce the timeout? 20 seconds multiplied by 500 makes for a very high upper limit. Nemo 15:35, 2 August 2019 (UTC)

It can work if the article is split in half and run the bot two times for each half. QuackGuru (talk) 17:18, 2 August 2019 (UTC)

Foreign language capitalization

Status
{{fixed}}
Reported by
Doremo (talk) 07:29, 1 August 2019 (UTC)
We can't proceed until
Feedback from maintainers


The bot is incorrectly capitalizing non-English journal names, as here, where razgledi should not be capitalized. Doremo (talk) 07:29, 1 August 2019 (UTC)

Do not truncate google.com/search/ URLs

Status
{{fixed}}
Reported by
Nemo 20:45, 1 August 2019 (UTC)
What should happen
nothing i.e. special:diff/908914340
We can't proceed until
Feedback from maintainers


PMC cleanup

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 13:54, 31 July 2019 (UTC)
What should happen
[100]
We can't proceed until
Feedback from maintainers


Compatible license

See this edit. I would like the bot to automatedly do this without having to summon the bot.

This is not a bot bug. Is it possible to program the bot to automatedly restore the required proper attribution in accordance with WP:MEDCOPY? If this bot can't be programmed to do this then which bot on Wikipedia can be programmed to do this? QuackGuru (talk) 17:34, 2 August 2019 (UTC)

The link is not required by the license: what matters is that you name the authors and the license. So, personally I prefer to leave the URL in the citation. Nemo 17:42, 2 August 2019 (UTC)
Spanning over multiple templates is beyond the scope of this bot. Not really sure what the distinction between referencing something and adding a separate "we stole a bunch of text from this freely copyable source" template also is. The extra template relisting the exact same information is quite ugly. AManWithNoPlan (talk) 17:50, 2 August 2019 (UTC)
Seems like this would be better: Smith, G. C.; Pell, J. P. (2006). "Parachute use to prevent death and major trauma related to gravitational challenge: Systematic review of randomised controlled trials". The International Journal of Prosthodontics. 19 (2): 126–8. doi:10.1136/bmj.327.7429.1459. PMC 300808. PMID 16602356.   This article incorporates text available under the CC BY 4.0 license. AManWithNoPlan (talk) 17:53, 2 August 2019 (UTC)
That is missing the authors and a link to the full paper. Proper and full attribution is required for each license. See Template:CC-notice. QuackGuru (talk) 18:31, 2 August 2019 (UTC)

BU RoBOT disabled. There may be something useful to salvage. QuackGuru (talk) 20:05, 2 August 2019 (UTC)

{{wontfix}} this bot is a poor choice. AManWithNoPlan (talk) 20:23, 4 August 2019 (UTC)

More references with DOI but no template

Can the bot be slightly more comprehensive in catching references with unstructured citations like this? (Where I had to manually remove everything and replace with cite journal + doi.) Nemo 16:23, 12 July 2019 (UTC)

the bot does this kind of thing, when it sees that citation templates dominate over non-citation templates. We avoid running a bulldozer over citevar AManWithNoPlan (talk) 16:46, 12 July 2019 (UTC)
Yes, I'm asking if it would be fine to catch a case like this one I linked. If so, I could submit a patch. Nemo 17:05, 12 July 2019 (UTC)
I verified that the specific case you mention is not supported. A patch would be good. AManWithNoPlan (talk) 17:40, 12 July 2019 (UTC)
{{tl|wontfix}} that's just too complicated and risky for an automated process. AManWithNoPlan (talk) 13:35, 19 July 2019 (UTC)

Sorry, dearchived because I'm still looking for good examples to treat: special:diff/907215103, special:diff/907215476, special:Diff/907219641.

<ref>[https://doi.org/10.5752/P.2175-5841.2011v9n22p396 Abumanssur, Edin Sued. 2011. “A conversão ao pentecostalismo em comunidades tradicionais.” Horizonte 9 (22): 396–415. DOI: doi.org/10.5752/P.2175-5841.2011v9n22p396 <span></span>].</ref>
<ref>M.C. Curthoys, M. C., and H. S. Jones, "Oxford athleticism, 1850–1914: a reappraisal." ''History of Education'' 24.4 (1995): 305–317. [http://www.tandfonline.com/doi/abs/10.1080/0046760950240403?journalCode=thed20 online]</ref>
<ref>Aday, S. (2010), "[http://www3.interscience.wiley.com/journal/123303811/abstract Chasing the bad news: An analysis of 2005 Iraq and Afghanistan war coverage on NBC and Fox News channel]", ''Journal of Communication'' 60 (1), pp. 144–164</ref>
* Brockliss, Laurence W B, ''The University of Oxford: A History'', [[Oxford University Press]] (Oxford, 2016); 11th century to present; {{doi|10.1093/acprof:oso/9780199243563.001.0001}} online

We can also send an entire line to Citoid and it will use the CrossRef service to get suggestions on what that might be. Sometimes the result is far off, but we can try and make sure it's similar enough. Nemo 10:09, 21 July 2019 (UTC)

{{wontfix}} because of risks of deleting notes, etc. and CITEVAR rules. AManWithNoPlan (talk) 15:23, 5 August 2019 (UTC)

Do not use <br> in title=

Status
{{fixed}}
Reported by
Redalert2fan (talk) 12:29, 31 July 2019 (UTC)
What happens
When converting to cite web and adding title=, <br> was added twice in title=
What should happen
Do not add <br> in title= / strip <br> from the rest of the title
Relevant diffs/links
[101] [102]
We can't proceed until
Feedback from maintainers


A rather odd instance: the reference seems to use two titles because the press release covers multiple things, so the actual given title is "Bombardier Announces Financial Results for the Third Quarter Ended September 30, 2015 <br><br>Government of Québec Partners with Bombardier for $1 billion in C Series as Certification Nears". However, I think this is clearly unwanted because it adds unnecessary blank lines in the reflist. --Redalert2fan (talk) 12:29, 31 July 2019 (UTC)
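One way the requested stripping could look, as a rough Python sketch rather than the bot's actual code: runs of <br> tags are collapsed into a single separator. The function name and the choice of a plain space as the separator are assumptions made for this example.

```python
import re

# Matches one or more consecutive <br>, <br/>, or <br /> tags plus surrounding whitespace.
BR_RUN = re.compile(r"(?:\s*<br\s*/?>\s*)+", re.IGNORECASE)

def clean_title(title):
    """Replace each run of <br> tags with a single space and trim the result."""
    return BR_RUN.sub(" ", title).strip()

print(clean_title("Results for the Third Quarter <br><br>Government of Québec Partners"))
```

A different separator (a colon or a dash, say) might read better for two-part press-release titles; that is an editorial choice the sketch does not settle.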

More minor changes that should not be done as single edit

Status
{{fixed}} by just removing work parameter that is useless in cite web
Reported by
Redalert2fan (talk) 14:12, 31 July 2019 (UTC)
What happens
empty work = is changed to empty website =
What should happen
Do not do this as the only edit, and if made in addition to other changes empty work = should be removed instead of changed
Relevant diffs/links
[103] [104]
We can't proceed until
Feedback from maintainers


These edits are done to prevent future errors. The better parameter is website not work for this citation, so we fix it now. AManWithNoPlan (talk) 14:25, 31 July 2019 (UTC)

Wouldn't it be better then to remove it completely in cases like these citations when it is empty? That would remove possibilities for future errors. --Redalert2fan (talk) 14:36, 31 July 2019 (UTC)
That assumes the website is empty on purpose, rather than by omission. Headbomb {t · c · p · b} 19:53, 31 July 2019 (UTC)
In my experience a large fraction of {{cite web}} templates should really be {{cite journal}}, {{cite magazine}}, {{cite news}}, etc., and their |work= parameters should really be the title of the journal, magazine, or newspaper. Calling it a website makes a stupid use of the wrong template even stupider, and will no doubt encourage users to fill it in with the url or hostname instead of the actual title of the collective work. I think switching the name of the parameter in this way is a bad idea. —David Eppstein (talk) 01:51, 1 August 2019 (UTC)

Adding chapter-url identical to URL

Status
{{fixed}}
Reported by
Nemo 09:59, 4 August 2019 (UTC)
What should happen
Maybe special:diff/909274087 ?
We can't proceed until
Feedback from maintainers


Never seen that before. Probably should also detect and fix this too. AManWithNoPlan (talk) 20:31, 4 August 2019 (UTC)

Redundant URL with suffix not removed

Status
{{fixed}}
Reported by
Nemo 17:57, 4 August 2019 (UTC)
What happens
Nothing
What should happen
special:diff/909325211
Replication instructions
Variants of the URL to which the DOI resolves, which just differ by a prefix like /abs or a suffix like /epdf, seem not to be matched as redundant at the moment.
We can't proceed until
Feedback from maintainers
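The matching described in the replication instructions could be approximated by ignoring decorative path segments before comparing. This is a hedged Python sketch, not the bot's implementation: the function names, the example.org URLs, and the list of ignored segments beyond /abs and /epdf are all assumptions.

```python
from urllib.parse import urlsplit

# Path segments treated as decoration. Only "abs" and "epdf" come from the report;
# the others are assumed for illustration.
DECORATION = {"abs", "epdf", "full", "pdf"}

def core_path(url):
    """Return (host, path) with decorative segments removed."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s and s.lower() not in DECORATION]
    return parts.netloc.lower(), "/".join(segments)

def is_redundant(url, doi_target):
    """True when url differs from the DOI's resolved URL only by decoration."""
    return core_path(url) == core_path(doi_target)

print(is_redundant("https://example.org/doi/abs/10.1000/xyz",
                   "https://example.org/doi/10.1000/xyz"))
```

A real check would also need to decide how to treat query strings and host variants such as a leading "www."; the sketch leaves those alone.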


garbage publisher

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 03:25, 27 July 2019 (UTC)
What should happen
[105]
We can't proceed until
Feedback from maintainers


This is because people think journals with a PMC/PMID entry are published by 'National Center for Biotechnology Information, U.S. National Library of Medicine'. Headbomb {t · c · p · b} 03:25, 27 July 2019 (UTC)

There are a lot of things where that is the valid publisher. Will have to think about it. AManWithNoPlan (talk) 21:28, 5 August 2019 (UTC)
It won't be a legit publisher for a journal, though. Headbomb {t · c · p · b} 21:30, 5 August 2019 (UTC)
https://github.com/ms609/citation-bot/pull/2051 AManWithNoPlan (talk) 14:16, 6 August 2019 (UTC)
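To illustrate the distinction drawn above (a valid publisher for some things, but not for a journal), here is a small Python sketch. It is not taken from the linked pull request; the function name, the dict representation of a template, and the substring match are assumptions.

```python
NCBI_MARKER = "national center for biotechnology information"

def drop_ncbi_publisher(template_name, params):
    """Remove an NCBI publisher value from cite journal only; leave other templates alone."""
    if template_name.strip().lower() != "cite journal":
        return params
    if NCBI_MARKER not in params.get("publisher", "").lower():
        return params
    return {k: v for k, v in params.items() if k != "publisher"}

journal = {
    "journal": "BMJ",
    "publisher": "National Center for Biotechnology Information, U.S. National Library of Medicine",
}
print(drop_ncbi_publisher("cite journal", journal))
```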

If title = journal, TNT both and refill

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 04:17, 27 July 2019 (UTC)
What should happen
[106]
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/2054 AManWithNoPlan (talk) 14:54, 6 August 2019 (UTC)

If a title is in allcaps (and long?), TNT and reget

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 08:11, 1 August 2019 (UTC)
What should happen
[107]
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/2054 AManWithNoPlan (talk) 14:54, 6 August 2019 (UTC)
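The "allcaps (and long?)" condition could be tested along these lines. This is an illustrative Python sketch, not the code from the pull request; the function name and the 20-letter threshold are assumptions, since the report leaves the length cut-off open.

```python
def is_shouting(title, min_letters=20):
    """True when a title has at least min_letters letters and all of them are upper case."""
    letters = [c for c in title if c.isalpha()]
    return len(letters) >= min_letters and all(c.isupper() for c in letters)

print(is_shouting("A STUDY OF SOIL MICROBIAL COMMUNITIES"))
print(is_shouting("NASA"))
```

The length threshold keeps short acronym-only titles from being reset.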

More JSTOR

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 19:39, 5 August 2019 (UTC)
What should happen
[108], [109], [110]
We can't proceed until
Feedback from maintainers


Only stable JSTORs that match specific patterns are processed. I will have to look into adding more Regex. AManWithNoPlan (talk) 21:27, 5 August 2019 (UTC)

https://github.com/ms609/citation-bot/pull/2052 AManWithNoPlan (talk) 14:16, 6 August 2019 (UTC)

Broken AcademiaEdu and Silverchair URLs

Status
{{fixed}}
Reported by
Nemo 13:43, 6 August 2019 (UTC)
What happens
There are a few hundred horrid links to watermark.silverchair.com and s3.amazonaws.com/academia.edu.documents/ which are broken.
What should happen
special:diff/909610935
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/2053 AManWithNoPlan (talk) 14:28, 6 August 2019 (UTC)

URL change requests can also be made at WP:URLREQ which can do many URL-specific issues like URLs located outside of CS1|2 templates, {{webarchive}} templates, archive-url additions and deletions, fixing bad encoding, updating IABot database, URLs on Commons, etc.. -- GreenC 14:47, 6 August 2019 (UTC)
Thank you. However, last I understood, it doesn't handle outright removal of URLs which are pure garbage. Adding Wayback Machine links to those garbage URLs only multiplies the garbage for no benefit. Nemo 16:48, 6 August 2019 (UTC)

Ugh there's another one, onlinelibrarystatic.wiley.com/store/ Nemo 18:15, 6 August 2019 (UTC)

MR Support

I find the bot an extremely good idea; many thanks to its developers! Here is one feature request: the bot should also apply to snippets such as

{{Citation|mr=MR0258885}}

by referring to MathSciNet (in this case to ([111]) and retrieve the information from there (or possibly retrieve the doi from there and then proceed as usual). Thanks for considering this extension! Jakob.scholbach (talk) 09:56, 23 July 2019 (UTC)

The big issue is that MathSciNet is subscription only, and will likely not allow bots to scrape the data. Headbomb {t · c · p · b} 11:03, 23 July 2019 (UTC)
we already support this, but explicitly do not enable this feature because you have to manually do a captcha. AManWithNoPlan (talk) 11:12, 23 July 2019 (UTC)
OK, I am entirely ignorant about how the bot works, but if I am able to access the mathscinet content in my browser, is there no way of giving the same access to the bot?
Of course I would not expect it to work for someone without a subscription. Jakob.scholbach (talk) 11:42, 23 July 2019 (UTC)
I just saw at the github page that the bot is able to pull data from JSTOR. JSTOR is also behind a pay-wall, so what precisely is the difference here? Jakob.scholbach (talk) 11:44, 23 July 2019 (UTC)
jstor is not behind a paywall. You can visit it all day and night. Secondly, the meta-data is per my request not protected by any captcha on jstor. AManWithNoPlan (talk) 11:57, 23 July 2019 (UTC)
I have a couple ideas and will look again. I must admit I was wrong. MR is NOT CAPTCHA protected. AManWithNoPlan (talk) 11:22, 25 July 2019 (UTC)
I feel you will likely get hit by the MR banhammer for bots, but it's at least worth investigating using existing MR to complete the rest of the information. I think the confusion happened because Zbl is captcha protected. You won't be able to search MR though, that's a subscription only thing. Headbomb {t · c · p · b} 13:17, 25 July 2019 (UTC)

Am I seeing it in action? special:diff/908301084 at least fixed the case. Thanks, Nemo 21:00, 28 July 2019 (UTC)

nope. AManWithNoPlan (talk) 00:25, 7 August 2019 (UTC)
I will look at it. For example, some even have DOI information https://mathscinet.ams.org/mathscinet-getitem?mr=22222 AManWithNoPlan (talk) 01:10, 7 August 2019 (UTC)
Will now see if the MR linked page has a DOI and will add that. {{fixed}} as far as we can, since refs are mostly free format. AManWithNoPlan (talk) 19:21, 7 August 2019 (UTC)

Bot down?

I can't seem to make the bot edit for a few hours now. Headbomb {t · c · p · b} 01:52, 7 August 2019 (UTC)

Very much so. I have asked for a reboot. AManWithNoPlan (talk) 14:03, 7 August 2019 (UTC)
Webservice restarted and bot appears operational. Martin (Smith609 – Talk) 15:24, 7 August 2019 (UTC)

{{fixed}}

Better batch/queuing handling

When you ask the bot to run on, say, Category:Foobar, the entire category will enter the job queue and get processed. So it means that if you have something like

Extended content
  • 10:00:00 am Category:Foobar A is requested to be processed by User:A
  • 10:00:01 am Foobar B is requested to be processed by User:B
  • 10:00:02 am Category:Foobar C is requested to be processed by User:C
  • 10:00:03 am Foobar D is requested to be processed by User:D

You could very well have

10:00 Foobar A1 is being processed
10:01 Foobar A2 is being processed
10:02 Foobar A3 is being processed
10:03 Foobar A4 is being processed
10:04 Foobar A5 is being processed
10:05 Foobar A6 is being processed
10:06 Foobar A7 is being processed
10:07 Foobar A8 is being processed
10:08 Foobar A9 is being processed
10:10 Foobar A10 is being processed
10:11 Foobar A11 is being processed
10:12 Foobar A12 is being processed
10:13 Foobar A13 is being processed
10:14 Foobar A14 is being processed
10:15 Foobar A15 is being processed
10:16 Foobar A16 is being processed
10:17 Foobar B is being processed
10:18 Foobar C1 is being processed
...
13:14 Foobar C235 is being processed
13:15 Foobar D is being processed

Leading to massive delays for User B and User D. A fairer queuing process would be to put each request into a bin

  • Bin A [16 articles]
  • Bin B [1 article]
  • Bin C [235 articles]
  • Bin D [1 article]

And cycle between active 'bins' until each get empty. So you'd have a queue that looks like

10:00 Foobar A1 is being processed
10:01 Foobar B1 is being processed
10:02 Foobar C1 is being processed
10:03 Foobar D1 is being processed
10:04 Foobar A2 is being processed
10:05 Foobar C2 is being processed
10:06 Foobar A3 is being processed
10:07 Foobar C3 is being processed
...
10:35 Foobar A16 is being processed
10:36 Foobar C16 is being processed
10:37 Foobar C17 is being processed
10:38 Foobar C18 is being processed
10:39 Foobar C19 is being processed
...
13:15 Foobar C235 is being processed

Headbomb {t · c · p · b} 21:27, 29 June 2019 (UTC)
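The binning idea described above amounts to a round-robin over per-request queues. Below is a minimal Python illustration, not the bot's PHP code; the function name is hypothetical and the bin sizes (16, 1, 235, and 1 pages) are simply copied from the example.

```python
from collections import deque

def round_robin(bins):
    """Yield (requester, page) pairs, taking one page from each non-empty bin in turn."""
    queues = deque((name, deque(pages)) for name, pages in bins.items() if pages)
    while queues:
        name, pages = queues.popleft()
        yield name, pages.popleft()
        if pages:  # bins that still hold pages rejoin the back of the cycle
            queues.append((name, pages))

bins = {
    "A": [f"Foobar A{i}" for i in range(1, 17)],
    "B": ["Foobar B"],
    "C": [f"Foobar C{i}" for i in range(1, 236)],
    "D": ["Foobar D"],
}
order = [page for _, page in round_robin(bins)]
print(order[:6])
```

With this ordering the single-page requests from users B and D are served within the first four slots instead of waiting behind the two category runs.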

I would have to think about that. Your description of the current mode of operation is off, but there could be improvements made. AManWithNoPlan (talk) 22:58, 29 June 2019 (UTC)

Whatever the current logic is, the taxonbar run right now is blocking anyone else from requesting edits. Similar things happen whenever I requested category runs. Headbomb {t · c · p · b} 23:30, 29 June 2019 (UTC)
There is no logic. Tasks are processed—for the most part—as received. Category and multiple page runs are treated as multiple tasks. AManWithNoPlan (talk) 02:02, 30 June 2019 (UTC)
The bot just does the entire category in one PHP request which has no knowledge of other people waiting. I see two possibilities: 1) ask Toolforge sysadmins how to get it to handle more requests concurrently; 2) split up the category job in multiple requests, e.g. by making the category page redirect to the process page URL for one title which will process just one page and redirect to the next and so on.
Arguably, the fundamental problem with category runs is that they mostly encounter pages which don't need to be treated. This run presumably went through over 1000 pages, checking all their URLs and identifiers and everything, but was only needed for less than 200. On the other hand, the whole point of the bot is that it saves time to a human who would otherwise have to do the hard work, such as selecting the pages which need some edits. Nemo 06:10, 30 June 2019 (UTC)
Refill2 uses celery to manage workers. If we go that type of route, then the category API would be changed to a list generator that then calls the page API with a list. Single point of entry. AManWithNoPlan (talk) 15:12, 30 June 2019 (UTC)
Yes, with a tiny bit of additional complexity (preferably handled by some external library) the multi-page editing could be handled much better. Nemo 15:47, 30 June 2019 (UTC)

On top of binning, there could be some parallel processing of some kind, like having multiple instances of Citation bot running on the tool server, and when one of them was ready to make an edit, it would get queued. This way if you run on an article that takes ~10 minutes to process, other articles could still get dealt with. Headbomb {t · c · p · b} 23:19, 2 July 2019 (UTC)

Job Arrays work on Toolforge, it will run 16 slots at a time filling in empty slots until the submitted job queue is done (unlimited size). Requires something like ZOT to do file locking on disk writes, or application-level file locking. -- GreenC 01:17, 3 July 2019 (UTC)
I have to say that with the way the bot is being used right now, this really, really would help. I made a ~50 article request last night that took something like 3-4 hours to process. Would have been nice to be able to use the bot on select articles while the large run was going on. Headbomb {t · c · p · b} 15:51, 3 July 2019 (UTC)
Since usage has gone up recently by quite a bit this would definitely help (obviously). This actually would enable more people to use the bot at the same time, or if a single person splits their request to do said request faster. We do have to then look out for people who might (ab)use this by just submitting 10x the jobs and taking everything for themselves either by accident or lack of patience but it seems that that might be mitigated by keeping some "slots" free for "Expand citations" via the toolbar, AFCH templates and single page request via the api/interface if that would be possible. --Redalert2fan (talk) 12:50, 14 July 2019 (UTC)
Usage should go down substantially now the Marianne account is blocked. I and a few others will still make big requests, but at least it won't be constant. Headbomb {t · c · p · b} 20:03, 14 July 2019 (UTC)
Definitely better since the block, yes. However, when 2 users run (like you and I at the moment of posting) there is already a noticeable delay. While not constant right now, it could become a problem if even more people use the bot. In my opinion it would be better to "future proof" the bot. I understand this takes a lot of work, but again, in my opinion, if it can be done it would be helpful. --Redalert2fan (talk) 17:21, 19 July 2019 (UTC)
Yes, even small-ish ~100 article runs are nightmares to do sometimes. I found that asking for more than that often leads to large delays and timeout errors. Which is a shame because you can find articles in need of highly probable cleanup/tidying that the bot could do (like running on all pages that contain url=https://www.tandfonline.com/doi/...), but those often number in the thousands. Headbomb {t · c · p · b} 17:30, 19 July 2019 (UTC)
I just waited an hour for a batch of about 10 to start and got a 504 timeout, not very encouraging to operate. I have no problem with waiting and running again but others might not and lose interest. Productivity is being lost sadly. I'm not particularly looking for the bot to be quicker, when one person runs it the time it takes to check is fine, what I would be looking for is that multiple users can use it at the same time. Would it be possible to run more instances at the same time? Yes the bot might have to throttle its edits but that's better than having user jobs not starting within a reasonable time. Redalert2fan (talk) 18:42, 19 July 2019 (UTC)

I suggest you do not run it in slow mode. Running without slow mode disables AdsAbs and Zotero. AManWithNoPlan (talk) 18:05, 19 July 2019 (UTC)

Not sure what more would be lost, so I'd rather run the full gamut of fixes. Also ADSABS is very desirable. Headbomb {t · c · p · b} 18:18, 19 July 2019 (UTC)
Speaking of AdsAbs, seems we might run out of uses again today. Redalert2fan (talk) 18:42, 19 July 2019 (UTC)
The easiest solution is to further reduce the timeout on individual requests to Zotero and others: it helps avoid traffic jams when there are too many requests and/or a single URL inside a page is especially slow.
Let me also remind that when citation bot doesn't make you happy you can always spend some time on Wikipedia:OABOT! Nemo 18:38, 19 July 2019 (UTC)
OABot is good after cleanup has been done usually. Headbomb {t · c · p · b} 20:39, 19 July 2019 (UTC)
Yes, and many articles are now ready for an OAbot run: there are about 30k articles in the queue as of now, with 35k link suggestions. Nemo 08:53, 20 July 2019 (UTC)
I added a note on the bot's userpage, letting people know of OAbot. Feel free to tweak it. Headbomb {t · c · p · b} 19:32, 20 July 2019 (UTC)
Surprised to see that nobody blocked OABot for adding CiteSeerX links. Go figure. — kashmīrī TALK 03:49, 27 July 2019 (UTC)

Some statistics on the busiest months, just for context:

Extended content
MariaDB [enwiki_p]> select substr(rev_timestamp, 1, 6) as date, count(rev_id) AS count from revision_userindex where rev_actor=307 group by date having count > 2000;
+--------+-------+
| date   | count |
+--------+-------+
| 200810 |  2260 |
| 200812 | 47504 |
| 200903 |  2963 |
| 200904 | 16344 |
| 200905 |  7279 |
| 201001 |  4072 |
| 201003 |  4356 |
| 201012 |  4251 |
| 201103 |  2818 |
| 201105 |  2398 |
| 201302 |  2453 |
| 201303 |  2244 |
| 201403 |  2935 |
| 201410 |  2059 |
| 201708 |  6116 |
| 201805 |  3245 |
| 201808 |  6531 |
| 201809 | 10076 |
| 201810 |  7365 |
| 201811 | 11001 |
| 201812 | 10289 |
| 201901 | 19332 |
| 201902 | 47795 |
| 201903 | 28010 |
| 201906 |  7557 |
| 201907 | 26383 |
+--------+-------+
26 rows in set (5 min 21.82 sec)

Nemo 09:33, 22 July 2019 (UTC)

Stupid large runs hogging all the resources... a category with 5K+ articles is not great. Headbomb {t · c · p · b} 23:38, 29 July 2019 (UTC)
It used to have 16k! When either you or Chris are using the bot for batch runs, I just go do something else. :) Nemo 06:54, 30 July 2019 (UTC)
Yeah, well, not much choice. I limit mine to bunches of 100 usually; this way any other request made will not be delayed for too long and won't time out. But it would be nice to just be able to say "Alright, deal with those X thousand pages with this stuff that's completely fixable". Headbomb {t · c · p · b} 12:05, 30 July 2019 (UTC)

This is getting really, really annoying to have request constantly timeout for hours because large categories are being requested. Please prioritize this. Headbomb {t · c · p · b} 09:09, 5 August 2019 (UTC)

I have no idea how the tool servers handle multiple requests. It seems as if they all run in parallel and the tool server only gives so much CPU to the tools as a whole. AManWithNoPlan (talk) 12:06, 5 August 2019 (UTC)
@Smith609 and Kaldari: any ideas/feedback here? Headbomb {t · c · p · b} 12:35, 5 August 2019 (UTC)
I am willing to bet money that this is 95% zotero/citoid and 5% the bot. I have an idea. AManWithNoPlan (talk) 17:34, 6 August 2019 (UTC)
I've implemented Nemo's suggestion for citation bot and restarted the webservice. Let me know if it makes any difference. Martin (Smith609 – Talk) 15:54, 7 August 2019 (UTC)
Well, so far, nope. But someone just requested Category:Living people to be processed, with 900K articles in it. Please kill that run! Headbomb {t · c · p · b} 17:40, 7 August 2019 (UTC)

Today it feels better for me: I managed to use the gadget with very good response times even as Headbomb was doing some batches. Nemo 12:57, 8 August 2019 (UTC)

The tool does feel better/faster. However, I've yet to see different batches run alternatively. Nemo's success is possibly due to breaks in my requests (I ask for ~100 articles at a time which gives the bot a chance to catch up on other requests without timeouts). I do recall being able to use the citation helper script while the bot was doing a batch run though. Headbomb {t · c · p · b} 13:56, 8 August 2019 (UTC)
Nope, I mean I get speedy responses from the gadget in the midst of one of your runs, in the same minutes when I see the bot perform several edits. The requests for single pages are much more efficient than batch requests, yes. Nemo 14:08, 8 August 2019 (UTC)
I confirm speedy response via the Citations of the gadget. Doesn't work through toolbar link/API however. Headbomb {t · c · p · b} 16:23, 8 August 2019 (UTC)
Many small requests work better than few huge ones, if you want I can write you a small script to do it efficiently. Email me to have it in your inbox. Nemo 17:21, 8 August 2019 (UTC)
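The "many small requests" approach boils down to splitting a long title list into fixed-size batches. What follows is a generic Python sketch, not the script offered above; the function name is hypothetical, and the batch size of 100 mirrors the figure Headbomb mentions using.

```python
def batches(titles, size=100):
    """Yield consecutive slices of at most `size` titles."""
    for start in range(0, len(titles), size):
        yield titles[start:start + size]

titles = [f"Article {n}" for n in range(1, 251)]
print([len(batch) for batch in batches(titles)])
```

Each batch would then be submitted as its own request, leaving gaps in which other users' requests can be served.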


Flagging as {{fixed}} for now. Will loop back as needed. Continue discussion under white list topic as needed. AManWithNoPlan (talk) 15:07, 9 August 2019 (UTC)

Links to search.proquest.com

What's the point of all those search.proquest.com links? When I click one from an otherwise complete citation template, I'm not even presented with a title for the resource, so I can't be sure whether the link points to something else entirely. I see they're sometimes pasted as part of some ready made textual citation with a "Retrieved from" link, so I doubt the editors were actually interested in keeping such links. Are they fine to remove? Nemo 18:12, 12 July 2019 (UTC)

if you are at the library (or have a library card), you can login and get them. Also, the link sometimes leads to a preview. Often when logged in with my library card, I can get a preview. AManWithNoPlan (talk) 20:44, 12 July 2019 (UTC)
But how does one verify the link leads to the correct resource, without access? Nemo 21:40, 12 July 2019 (UTC)
I think it might be better to use Template:ProQuest within |id= instead of placing the paywalled URL in |url=, but agree with AManWithNoPlan that there’s no need to remove the link altogether. Umimmak (talk) 20:48, 12 July 2019 (UTC)
Using |id= seems indeed superior to me. Is that something we can do systematically? Nemo 21:40, 12 July 2019 (UTC)
some other bot needs to change all the proquest.umi.com links into the equivalent search.proquest.com urls too (the document numbers are not the same 🙄) AManWithNoPlan (talk) 03:08, 14 July 2019 (UTC)
the bot now does extensive pro quest url cleanup. The umi.com ones are now fixed and most proxies and session specific information should be removed. AManWithNoPlan (talk) 12:49, 20 July 2019 (UTC)

{{fixed}} AManWithNoPlan (talk) 15:13, 9 August 2019 (UTC)

If title ends with 'on JSTOR', TNT title and reget

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 02:21, 8 August 2019 (UTC)
What should happen
[112]
We can't proceed until
Feedback from maintainers


To be clear, this isn't simply stripping 'on JSTOR' from the title, but rather resetting it entirely. Headbomb {t · c · p · b} 02:21, 8 August 2019 (UTC)

https://github.com/ms609/citation-bot/pull/2060 AManWithNoPlan (talk) 13:51, 8 August 2019 (UTC)
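The difference between stripping the suffix and resetting the title can be shown in a few lines of Python. This is an assumption-laden sketch, not the code from the pull request: the function names and the dict representation of a citation are invented, and the actual refetch step is left out.

```python
# Suffixes that mark a title as scraped junk; only 'on JSTOR' comes from this report.
JUNK_SUFFIXES = ("on JSTOR",)

def needs_title_reset(title):
    """True when the title ends with a known junk suffix."""
    return title.strip().endswith(JUNK_SUFFIXES)

def reset_junk_title(params):
    """Blank the whole title (rather than trimming it) so a later step can refetch it."""
    if not needs_title_reset(params.get("title", "")):
        return params
    cleared = dict(params)
    cleared["title"] = ""
    return cleared

print(reset_junk_title({"title": "Some Article on JSTOR", "jstor": "1234567"}))
```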

title / encyclopedia duplication in cite encyclopedia

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 09:16, 8 August 2019 (UTC)
What happens
[113]
We can't proceed until
Feedback from maintainers


Code did not realize that encyclopeAdia was an alias. AManWithNoPlan (talk) 15:08, 9 August 2019 (UTC)

10.5555 / Global Plants DOI invalid

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 09:31, 8 August 2019 (UTC)
What happens
[114]
We can't proceed until
Feedback from maintainers


10.5555 is a test doi prefix and will never resolve. On Wikipedia, the vast majority of them are for JSTOR Global Plants. In fact, nearly all 10.5555/... DOIs can probably be removed and converted to |url=https://plants.jstor.org/stable/10.5555/.... They should check if that url resolves however, since there are some 10.5555 DOIs that are tests for other things. Headbomb {t · c · p · b} 09:31, 8 August 2019 (UTC)

Any other 10.5555 DOI should be removed if there's a working URL provided. In total, those account for 435 / 3,281 ≈ 13.26% of Category:Pages with DOIs inactive as of 2019 Headbomb {t · c · p · b} 09:40, 8 August 2019 (UTC)
https://github.com/ms609/citation-bot/pull/2059 AManWithNoPlan (talk) 13:38, 8 August 2019 (UTC)
@AManWithNoPlan: that should also remove |doi-broken-date=, e.g. [115] ... Headbomb {t · c · p · b} 16:10, 9 August 2019 (UTC)
Very true indeed. https://github.com/ms609/citation-bot/pull/2066 AManWithNoPlan (talk) 17:48, 9 August 2019 (UTC)
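Put together, the handling discussed in this thread might look like the Python sketch below. It is not the code from either pull request. The function names, the dict representation, and the sample DOI are hypothetical. The `url_resolves` callable is supplied by the caller, and an already-present |url= is assumed to work.

```python
def global_plants_url(doi):
    """Return the candidate JSTOR Global Plants URL for a 10.5555 DOI, else None."""
    doi = doi.strip()
    if not doi.startswith("10.5555/"):
        return None
    return "https://plants.jstor.org/stable/" + doi

def fix_test_doi(params, url_resolves):
    """Swap a 10.5555 DOI for a URL when one is available; drop doi-broken-date with it."""
    candidate = global_plants_url(params.get("doi", ""))
    if candidate is None:
        return params
    fixed = dict(params)
    if not fixed.get("url") and url_resolves(candidate):
        fixed["url"] = candidate
    if fixed.get("url"):  # assumption: an existing url is treated as working
        fixed.pop("doi", None)
        fixed.pop("doi-broken-date", None)
    return fixed

# Hypothetical citation; the resolver is stubbed out instead of making a network call.
cite = {"doi": "10.5555/al.ap.specimen.example", "doi-broken-date": "2019-08-08"}
print(fix_test_doi(cite, lambda url: True))
```

If the candidate URL does not resolve and no other URL is present, the citation is left untouched, which covers the 10.5555 DOIs that are tests for other things.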

If title ends with 'IEEE Xplore Document', TNT title and reget

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 08:29, 9 August 2019 (UTC)
What should happen
[116]
We can't proceed until
Feedback from maintainers


Not fixed. At least not fully. I had to manually TNT [117] + [118]. Headbomb {t · c · p · b} 18:39, 9 August 2019 (UTC)

Not sure what the delay was. AManWithNoPlan (talk) 18:51, 9 August 2019 (UTC)

Caps: IEEE/Acm → IEEE/ACM

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 08:31, 9 August 2019 (UTC)
What should happen
[119]
We can't proceed until
Feedback from maintainers


remove doi-broken-date if no doi

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 16:12, 9 August 2019 (UTC)
What should happen
[120]
We can't proceed until
Feedback from maintainers


Kinda duplicate with one above, but this should be generalized behaviour, not just specific to 10.5555 broken DOIs. Headbomb {t · c · p · b} 16:12, 9 August 2019 (UTC)

Already thought of that. https://github.com/ms609/citation-bot/pull/2066 AManWithNoPlan (talk) 17:47, 9 August 2019 (UTC)
Not fixed [121] + [122] Headbomb {t · c · p · b} 18:46, 9 August 2019 (UTC)
It can take a second for new source code to start executing. Slower than usual today (or you are faster). AManWithNoPlan (talk) 18:52, 9 August 2019 (UTC)
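The generalized behaviour requested here is a small rule: with no |doi=, a |doi-broken-date= has nothing to refer to. The Python sketch below illustrates that rule only; it is not the pull request's code, and the function name and dict representation are assumptions.

```python
def drop_orphan_doi_broken_date(params):
    """Remove doi-broken-date when the citation has no (non-blank) doi."""
    has_doi = bool(params.get("doi", "").strip())
    if has_doi or "doi-broken-date" not in params:
        return params
    return {k: v for k, v in params.items() if k != "doi-broken-date"}

print(drop_orphan_doi_broken_date({"title": "Example", "doi-broken-date": "2019-08-09"}))
```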

archive url not removed when chapter url was removed

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 18:37, 9 August 2019 (UTC)
What happens
[123]
What should happen
[124]
We can't proceed until
Feedback from maintainers