User talk:Citation bot/Archive 19


Unhelpful edit?

Please see this (diff) edit. Changing an empty url to chapter-url does not seem to be really useful like this, combined with the fact that the only other change with the edit was ISSN --> issn . --Redalert2fan (talk) 14:33, 26 October 2019 (UTC)

Fixing minor typos and encouraging people to do the right thing does have value. But perhaps the edit summary should not describe ISSN to issn as a removal and an addition. Also, how did you activate the script, since it says “other”? AManWithNoPlan (talk) 15:33, 26 October 2019 (UTC)
https://tools.wmflabs.org/citations/index.html using the process page feature. Redalert2fan (talk) 15:43, 26 October 2019 (UTC)
I should fix that! AManWithNoPlan (talk) 15:44, 26 October 2019 (UTC)
UCB_Other fix https://github.com/ms609/citation-bot/pull/2219 AManWithNoPlan (talk) 16:27, 26 October 2019 (UTC)
Overly exaggerated edit summary fix (ISSN to issn no longer called a removal and an addition) https://github.com/ms609/citation-bot/pull/2220 AManWithNoPlan (talk) 16:33, 26 October 2019 (UTC)
{{fixed}} for the better. AManWithNoPlan (talk) 22:09, 26 October 2019 (UTC)

Character � added

Status
{{fixed}}
Reported by
Redalert2fan (talk) 15:35, 26 October 2019 (UTC)
What happens
what�s
What should happen
what's
Relevant diffs/links
diff
We can't proceed until
Feedback from maintainers


That should be fixable. Odd use of a Unicode character. AManWithNoPlan (talk) 15:45, 26 October 2019 (UTC)

https://github.com/ms609/citation-bot/pull/2223 AManWithNoPlan (talk) 22:14, 26 October 2019 (UTC)

Accept Terms and Conditions on JSTOR

Can the bot be programmed to fix this:

{{Cite web|url=https://www.jstor.org/tc/accept?origin=%2Fstable%2Fpdf%2F1835935.pdf|title=Accept Terms and Conditions on JSTOR|website=www.jstor.org|access-date=2019-08-13}}

to be changed to

{{Cite journal|jstor = 1835935|title = South Russia in the Prehistoric and Classical Period|journal = The American Historical Review|volume = 26|issue = 2|pages = 203–224|last1 = Rostovtsev|first1 = M.|year = 1921|doi = 10.2307/1835935}}

Jonatan Svensson Glad (talk) 17:31, 26 October 2019 (UTC)

https://github.com/ms609/citation-bot/pull/2222 and https://github.com/ms609/citation-bot/pull/2221 AManWithNoPlan (talk) 18:30, 26 October 2019 (UTC)

{{fixed}}

adds another HDL

Status
new bug
Reported by
Jonatan Svensson Glad (talk) 22:05, 26 October 2019 (UTC)
What happens
Bot adds |url=http://hdl.handle.net/10438/12272 when |hdl=10419/186646 exists (note the different handles)
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=Management_entrenchment&diff=923179125&oldid=906965276
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/2225 AManWithNoPlan (talk) 23:51, 26 October 2019 (UTC)

Much better code now. Mostly {{fixed}} AManWithNoPlan (talk) 00:14, 27 October 2019 (UTC)
This will seal the deal https://github.com/ms609/citation-bot/pull/2226 AManWithNoPlan (talk) 00:21, 27 October 2019 (UTC)

support more RIS usages

Status
{{fixed}}
Reported by
Jonatan Svensson Glad (talk) 17:25, 26 October 2019 (UTC)
What happens
changed {{Cite web|title = Ladies of Soul on JSTOR|jstor = j.ctt2tv6sv.11}} to {{Cite book|jstor = j.ctt2tv6sv.11|title = [Part Three: Introduction]|pages = 103–105|last1 = Freeland|first1 = David|year = 2001|isbn = 9781578063314|publisher = University Press of Mississippi}}
What should happen
|chapter=[Part Three: Introduction]|title=Ladies of Soul
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=Philadelphia_International_Records&diff=923144547&oldid=885625069
We can't proceed until
Feedback from maintainers


TY  - CHAP
TI  - [Part Three: Introduction]
T2  - Ladies of Soul

https://github.com/ms609/citation-bot/pull/2228 AManWithNoPlan (talk) 00:03, 28 October 2019 (UTC)
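The mapping requested above can be sketched as follows. This is a minimal illustration, not the bot's actual code: the function names `parse_ris` and `ris_to_cite` are hypothetical, and only the CHAP case from this report is handled.

```python
def parse_ris(text):
    """Parse 'TAG  - value' lines into a dict (a later duplicate tag overwrites)."""
    fields = {}
    for line in text.splitlines():
        tag, sep, value = line.partition("  - ")
        if sep and len(tag.strip()) == 2:
            fields[tag.strip()] = value.strip()
    return fields


def ris_to_cite(fields):
    """Map RIS fields to cite-template parameters (hypothetical helper)."""
    if fields.get("TY") == "CHAP":
        # For a chapter record, TI is the chapter and T2 is the containing book.
        return {"chapter": fields.get("TI"), "title": fields.get("T2")}
    # Any other record type: treat TI as the title and add nothing else.
    return {"title": fields.get("TI")}


record = "TY  - CHAP\nTI  - [Part Three: Introduction]\nT2  - Ladies of Soul"
print(ris_to_cite(parse_ris(record)))
```

With the record quoted in this report, the chapter becomes "[Part Three: Introduction]" and the title becomes "Ladies of Soul", which is the outcome asked for under "What should happen".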

More JSTOR formats

Status
{{fixed}}
Reported by
Jonatan Svensson Glad (talk) 19:29, 27 October 2019 (UTC)
What should happen
replace |url=https://www.jstor.org/stable/pdfplus/10.2307/651152.pdf with |jstor=651152
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=French_Revolution&diff=923313734&oldid=923313461
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/2227/ AManWithNoPlan (talk) 23:46, 27 October 2019 (UTC)
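For illustration, a rough sketch of pulling the JSTOR identifier out of the URL shapes mentioned on this page (the /stable/pdfplus/ form here and the /tc/accept?origin= form reported earlier). The function name `jstor_id_from_url` and the regular expression are assumptions made for this example, not the bot's implementation.

```python
import re
from urllib.parse import urlparse, parse_qs

# Optional "pdf/" or "pdfplus/" segment, optional "10.2307/" DOI prefix,
# then the identifier, then an optional ".pdf" suffix.
_STABLE = re.compile(r"/stable/(?:pdf(?:plus)?/)?(?:10\.2307/)?([^/?#]+?)(?:\.pdf)?$")


def jstor_id_from_url(url):
    """Return the JSTOR id embedded in a jstor.org URL, or None."""
    parts = urlparse(url)
    if parts.netloc.lower() not in ("www.jstor.org", "jstor.org"):
        return None
    path = parts.path
    if path.startswith("/tc/accept"):
        # The real target is percent-encoded in the "origin" query parameter;
        # parse_qs decodes it for us.
        path = parse_qs(parts.query).get("origin", [""])[0]
    match = _STABLE.search(path)
    return match.group(1) if match else None


print(jstor_id_from_url("https://www.jstor.org/stable/pdfplus/10.2307/651152.pdf"))
```

A real implementation would still want to confirm that the extracted identifier exists before replacing |url= with |jstor=.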

New handle

Status
{{fixed}}
Reported by
Jonatan Svensson Glad (talk) 20:15, 27 October 2019 (UTC)
What should happen
Replace |url=http://repository.bilkent.edu.tr/bitstream/handle/11693/49114/Monarchists_Against_Their_Monarch_the_Rightists'_Criticism_of_Tsar_Nicholas_II.pdf?sequence=1 with |hdl=11693/49114
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/2227/ AManWithNoPlan (talk) 23:46, 27 October 2019 (UTC)
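A small sketch of the handle extraction requested here, assuming the repository URL follows the ".../handle/PREFIX/SUFFIX/..." path layout seen in the example. The name `hdl_from_url` is hypothetical.

```python
import re

# Handle = numeric prefix (possibly dotted), a slash, then a suffix segment.
_HANDLE_PATH = re.compile(r"/(?:bitstream/)?handle/(\d+(?:\.\d+)*/[^/?#]+)")


def hdl_from_url(url):
    """Return the handle found in a '/handle/...' style URL, or None."""
    match = _HANDLE_PATH.search(url)
    return match.group(1) if match else None


url = ("http://repository.bilkent.edu.tr/bitstream/handle/11693/49114/"
       "Monarchists_Against_Their_Monarch_the_Rightists'_Criticism_of_Tsar_Nicholas_II.pdf"
       "?sequence=1")
print(hdl_from_url(url))
```

Given the earlier "adds another HDL" report, it would be prudent to compare the extracted handle with any existing |hdl= before changing the citation.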

Are you a robot?

Status
{{fixed}}
Reported by
Jonatan Svensson Glad (talk) 01:24, 28 October 2019 (UTC)
What should happen
Blacklist |title=Bloomberg – Are you a robot?
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=User%3AJosve05a%2Fcite-sandbox&diff=prev&oldid=923357687
We can't proceed until
Feedback from maintainers


See https://en.wikipedia.org/w/index.php?search=insource%3A%2FAre+you+a+robot%2F&title=Special%3ASearch&go=Go Jonatan Svensson Glad (talk) 01:24, 28 October 2019 (UTC)

why yes we are 🤣😂. Will add to bad titles list. AManWithNoPlan (talk) 01:33, 28 October 2019 (UTC)
https://github.com/ms609/citation-bot/pull/2227 AManWithNoPlan (talk) 01:40, 28 October 2019 (UTC)

URL

Is there any way to decrypt |url=https://www.bloomberg.com/tosv2.html?vid=&uuid=367763b0-e798-11e9-9c67-c5e97d1f3156&url=L25ld3MvYXJ0aWNsZXMvMjAxOS0wNi0xMC9ob25nLWtvbmctdm93cy10by1wdXJzdWUtZXh0cmFkaXRpb24tYmlsbC1kZXNwaXRlLWh1Z2UtcHJvdGVzdA== to https://www.bloomberg.com/news/articles/2019-06-10/hong-kong-vows-to-pursue-extradition-bill-despite-huge-protest or at least blacklist extracting info from URLs with "tosv2.html"? Jonatan Svensson Glad (talk) 05:30, 28 October 2019 (UTC)

The other problem is that they prevent archive bots from archiving the page, so when the page dies there will be no archive. This might be better discussed at Village Pump technical to see if anyone has ideas for decryption or determining the underlying URL somehow. -- GreenC 17:51, 28 October 2019 (UTC)
https://github.com/ms609/citation-bot/pull/2228 this will block the bot from looking at them. AManWithNoPlan (talk) 17:54, 28 October 2019 (UTC)
It's BASE64 encoded. VERY easy to decode. AManWithNoPlan (talk) 17:55, 28 October 2019 (UTC)
https://github.com/ms609/citation-bot/pull/2234 AManWithNoPlan (talk) 21:40, 29 October 2019 (UTC)
{{fixed}} https://en.wikipedia.org/w/index.php?title=Tulsi_Gabbard&type=revision&diff=923664735&oldid=923664366 AManWithNoPlan (talk) 00:19, 30 October 2019 (UTC)
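As noted above, the url= parameter is plain Base64. A minimal sketch of the decoding step, using only the Python standard library; the function name `decode_tosv2` is made up for this example and the bot's own code may differ.

```python
import base64
from urllib.parse import urlparse, parse_qs


def decode_tosv2(url):
    """Recover the article URL from a '.../tosv2.html?...&url=<base64>' link, or None."""
    parts = urlparse(url)
    if not parts.path.endswith("/tosv2.html"):
        return None
    encoded = parse_qs(parts.query).get("url", [None])[0]
    if not encoded:
        return None
    # parse_qs turns '+' into a space; undo that before decoding.
    encoded = encoded.replace(" ", "+")
    try:
        path = base64.b64decode(encoded, validate=True).decode("utf-8")
    except ValueError:  # covers bad Base64 and bad UTF-8
        return None
    if not path.startswith("/"):
        return None
    return f"{parts.scheme}://{parts.netloc}{path}"


blocked = ("https://www.bloomberg.com/tosv2.html?vid=&uuid=367763b0-e798-11e9-9c67-c5e97d1f3156"
           "&url=L25ld3MvYXJ0aWNsZXMvMjAxOS0wNi0xMC9ob25nLWtvbmctdm93cy10by1wdXJzdWUt"
           "ZXh0cmFkaXRpb24tYmlsbC1kZXNwaXRlLWh1Z2UtcHJvdGVzdA==")
print(decode_tosv2(blocked))
```

Returning None on any decoding failure keeps the caller's fallback simple: leave the citation alone and do not extract metadata from the interstitial page.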

Caps: J Sch Nurs

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 18:19, 29 October 2019 (UTC)
What should happen
[1]
We can't proceed until
Feedback from maintainers


Editor is not an author

Status
{{fixed}} by rejecting the word editor and the whole name.
Reported by
Jonatan Svensson Glad (talk) 22:02, 16 October 2019 (UTC)
What happens
|last1=Editor
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=Anti-Turkism&diff=921633541&oldid=921610799
We can't proceed until
Feedback from maintainers


This should be a blacklisted author. Jonatan Svensson Glad (talk) 22:02, 16 October 2019 (UTC)

Probably a good thing that wasn't blacklisted. That flags this citation as having bad data. (If we had simply said, don't allow "Editor" through, then the first name would be wrong, as it includes "Diplomatic".) --Izno (talk) 23:40, 16 October 2019 (UTC)

ISBN-10 and not ISBN-13 from Amazon URL

Status
{{fixed}}
Reported by
Jonatan Svensson Glad (talk) 00:21, 28 October 2019 (UTC)
What happens
Bot fetches ISBN-10 from Amazon-links
What should happen
Converting from |url=https://www.amazon.com/Travelers-Third-Reich-Fascism-1919-1945/dp/1681777827/ should give an ISBN-13 and not an ISBN-10.
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=Weimar_Republic&diff=prev&oldid=923349916
(Running the bot again converts |isbn=1681777827 to |isbn=978-1681777825)
We can't proceed until
Feedback from maintainers


We don’t convert old ISBN. We don’t get the year until late in the process. I need to add checking ISBN again at the end. AManWithNoPlan (talk) 00:45, 28 October 2019 (UTC)

https://github.com/ms609/citation-bot/pull/2235 AManWithNoPlan (talk) 01:13, 30 October 2019 (UTC)
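For reference, the ISBN-10 to ISBN-13 conversion itself is standard arithmetic: prefix "978", drop the old check digit, and recompute the check digit with alternating weights 1 and 3. The sketch below shows only that step; the function name is hypothetical, and the year-dependent decision about when to convert (mentioned above) is not modeled.

```python
def isbn10_to_isbn13(isbn10):
    """Convert an ISBN-10 string to an unhyphenated ISBN-13 string."""
    digits = isbn10.replace("-", "").replace(" ", "")
    if len(digits) != 10 or not digits[:9].isdigit():
        raise ValueError(f"not an ISBN-10: {isbn10!r}")
    core = "978" + digits[:9]  # the old check digit (possibly 'X') is discarded
    total = sum(int(ch) * (1 if i % 2 == 0 else 3) for i, ch in enumerate(core))
    check = (10 - total % 10) % 10
    return core + str(check)


print(isbn10_to_isbn13("1681777827"))  # the Amazon example from this report
```

For the ISBN in this report the result is 9781681777825, matching the |isbn=978-1681777825 that the second bot run produced.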

Reuters x2

Status
{{fixed}}
Reported by
Jonatan Svensson Glad (talk) 05:36, 28 October 2019 (UTC)
What happens
Bot adds |newspaper=Reuters without removing |agency=[[Reuters]]
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=User:Josve05a/cite-sandbox&diff=923383058&oldid=923383056
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/2236 AManWithNoPlan (talk) 11:30, 30 October 2019 (UTC)

Ignore roman numeral 'parts' in title for title matching purposes

For instance, in doi:10.1017/S0080456800002751 the title is listed as XI.—On q-Functions and a certain Difference Operator. This should be treated as equivalent to On q-Functions and a certain Difference Operator. Headbomb {t · c · p · b} 04:33, 18 October 2019 (UTC)

https://github.com/ms609/citation-bot/pull/2240 AManWithNoPlan (talk) 15:11, 31 October 2019 (UTC)
{{fixed}} AManWithNoPlan (talk) 15:42, 31 October 2019 (UTC)
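One way to express the equivalence asked for above is to strip a leading "roman numeral, period, dash" prefix before comparing titles. This is a sketch under that assumption; the names `strip_roman_part` and `titles_match` are invented here and the pattern may be narrower or broader than what the bot actually uses.

```python
import re

# Leading roman numeral, a period, optional spaces, then an em dash, en dash or hyphen.
_ROMAN_PART = re.compile(r"^[IVXLCDM]+\.\s*[\u2014\u2013-]\s*")


def strip_roman_part(title):
    """Remove a leading 'XI.<dash>' style part number, if present."""
    return _ROMAN_PART.sub("", title, count=1)


def titles_match(a, b):
    """Compare two titles, ignoring case and any leading roman-numeral part."""
    def norm(t):
        return strip_roman_part(t.strip()).casefold()
    return norm(a) == norm(b)


print(strip_roman_part("XI.\u2014On q-Functions and a certain Difference Operator"))
```

Requiring both the period and the dash keeps ordinary titles that merely start with a capital I, V, X, L, C, D or M from being truncated.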

More betterly cleaning up of garbage volumes/issues

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 05:22, 28 October 2019 (UTC)
What should happen
[2]
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/2241 AManWithNoPlan (talk) 15:52, 31 October 2019 (UTC)

MOS:FOREIGNTITLE violations

Status
{{fixed}} although MOS:FOREIGNTITLE does not apply to journal titles. That’s a free for all with more standards than there should be
Reported by
David Eppstein (talk) 02:00, 7 November 2019 (UTC)
What happens
French journal names (whose spelling in the original source uses sentence case) are converted to English-style capitalization where all words are capitalized
What should happen
Per MOS:FOREIGNTITLE, "Retain the style of the original for modern works."
Relevant diffs/links
Special:Diff/924964896; note change to capitalization of journals TTR : traduction, terminologie, rédaction and Etudes irlandaises. I know less about Estonian capitalization rules but I suspect that changing the capitalization of Studia humaniora Estonica was also a mistake.
We can't proceed until
Feedback from maintainers


We capitalize Latin titles normally. The Études one would have been caught if the accent was used. The other ones should have their exceptions coded. Headbomb {t · c · p · b} 02:10, 7 November 2019 (UTC)

That said, TTR capitalizes itself normally (https://www.erudit.org/fr/revues/ttr/) Headbomb {t · c · p · b} 02:10, 7 November 2019 (UTC)
There are actually conflicting rules for this in Wikipedia's style guidance. I will add exceptions. AManWithNoPlan (talk) 02:13, 7 November 2019 (UTC)

Fix broken doi

Status
{{fixed}}
Reported by
Jonatan Svensson Glad (talk) 23:33, 7 November 2019 (UTC)
What should happen
https://en.wikipedia.org/w/index.php?title=Judith_Tonhauser&diff=925117879&oldid=925117790
We can't proceed until
Feedback from maintainers


Unrelated, but WOW! the doi and the jstor are the same, but point to different websites. And that my friends is why they are not redundant identifiers. AManWithNoPlan (talk) 00:00, 8 November 2019 (UTC)

As I understand it, theoretically at least, where the doi goes can depend on who is asking (so that if the same resource is offered by different publishers, then different readers could be directed to the ones for which they have subscriptions). Anyway, in this case the right thing to do seems obvious, but how are we to know that some crazy publisher won't put # characters into their dois? The ones with parentheses, angle brackets, colons, and semicolons are bad enough. Maybe, if we are to do this sort of processing, there should be some sanity check that the pre-fix doi is broken and the post-fix doi is not? —David Eppstein (talk) 02:08, 8 November 2019 (UTC)
we do lots of sanity checking. It’s nuts. Just added more. AManWithNoPlan (talk) 02:19, 8 November 2019 (UTC)
don’t forget DOIs with emojis in them 🤨 AManWithNoPlan (talk) 03:19, 8 November 2019 (UTC)
I wouldn't be surprised. —David Eppstein (talk) 05:57, 8 November 2019 (UTC)
The DOI always goes to the same URL for everyone on the official resolver. To get people to different URLs based on the DOI or other searches, universities use OpenURL or another resolver before the DOI.org resolution, or the publisher has its own DOI resolver after doi.org which might be doing anything (a few hundred publishers have one, and even CrossRef has no idea how many there are or what they're doing).
Additionally, Google Scholar has agreements with some universities to use/prefer their local URL resolver instead of the URL it would normally point to, for users connecting from institutional IP addresses. Hence, one might think they're clicking the "usual" publisher or GS-preferred link when they're actually clicking a link provided by the library. Nemo 07:37, 8 November 2019 (UTC)
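The sanity check suggested in this thread (only accept a repaired DOI when the original fails to resolve and the repaired one resolves) might look roughly like this. The resolver is passed in as a function so the sketch makes no claim about how the bot actually queries doi.org or CrossRef; all names and DOIs below are hypothetical.

```python
def accept_doi_fix(original, repaired, resolves):
    """Return the DOI to keep.

    `resolves` is any callable taking a DOI string and returning True or False
    (for example a wrapper around a resolver lookup; not shown here).
    """
    if resolves(original):
        return original   # nothing was broken, so do not touch it
    if resolves(repaired):
        return repaired   # broken before, working after: accept the fix
    return original       # the repair did not help: leave the citation alone


# Stand-in resolver for demonstration: pretend only this made-up DOI exists.
known = {"10.1234/example.5678"}
print(accept_doi_fix("10.1234/example.5678#page_scan", "10.1234/example.5678", known.__contains__))
```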

Replacement of URL with doi-parameter causes dead-link

Status
They actually {{fixed}} it!
Reported by
Rfassbind – talk 21:52, 18 October 2019 (UTC)
What happens
replacement of |url= with |doi= renders incorrect URL
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=1680_Per_Brahe&diff=872772143&oldid=849266221
We can't proceed until
Feedback from maintainers


Springer changes their URLs regularly, which is why DOIs are better in the long run. We have special code to make sure the above does actually work. SpringerLink lies to us. Will add more code. AManWithNoPlan (talk) 10:51, 19 October 2019 (UTC)

I have whined to springer and crossref AManWithNoPlan (talk) 11:54, 26 October 2019 (UTC)

Thank you. Since the broken springer-URL is referenced in hundreds of articles, how long do you expect me to wait for a fix to happen before I start restoring the original URL myself? Rfassbind – talk 04:41, 29 October 2019 (UTC)
Publishers are slow dinosaurs. I would wait at least a month before starting to worry. Nemo 16:03, 29 October 2019 (UTC)

Caps: Geologiska Föreningen i Stockholm Förhandlingar

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 18:54, 9 November 2019 (UTC)
What should happen
[3]
We can't proceed until
Feedback from maintainers


The Swedish word i is like the English word in. Jonatan Svensson Glad (talk) 20:44, 9 November 2019 (UTC)
a quick check of Reddit reveals Sweden does not exist https://www.reddit.com/r/finlandConspiracy/comments/8jceqb/finnish_propaganda_trying_to_get_us_to_think/?utm_source=amp&utm_medium=&utm_content=comments_view_all 🙄🙄🙄🙄🙄🙄. I will work on this. AManWithNoPlan (talk) 01:03, 10 November 2019 (UTC)
Nah, that's Norway that has disappeared. Jonatan Svensson Glad (talk) 03:14, 10 November 2019 (UTC)
https://github.com/ms609/citation-bot/pull/2246 AManWithNoPlan (talk) 16:53, 11 November 2019 (UTC)
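The capitalization exceptions discussed in the "Caps:" reports on this page can be pictured as two small lookup sets: short words that stay lowercase mid-title (such as the Swedish "i") and names that must be kept exactly as written. The sets and the function name below are illustrative assumptions only, not the bot's actual lists.

```python
SMALL_WORDS = {"i", "in", "of", "and", "the"}  # hypothetical list; "i" is Swedish for "in"
KEEP_AS_IS = {"RTÉ", "NeuroReport"}            # hypothetical list of fixed spellings


def journal_caps(name):
    """Capitalize each word, except small words after the first and fixed spellings."""
    out = []
    for pos, word in enumerate(name.split()):
        if word in KEEP_AS_IS:
            out.append(word)
        elif pos > 0 and word.lower() in SMALL_WORDS:
            out.append(word.lower())
        else:
            # Upper-case only the first letter; leave the rest untouched.
            out.append(word[:1].upper() + word[1:])
    return " ".join(out)


print(journal_caps("geologiska föreningen i stockholm förhandlingar"))
```

Leaving the remainder of each word untouched, rather than lower-casing it, is what avoids the "Rté News" kind of damage reported elsewhere on this page.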

Deliberate reference to review database erroneously converted into reference to the article it reviews

Status
ALL attempts to use MR data disabled—not sure if it will ever be turned back on given its propensity for having DOIs for the reviewed work. {{fixed}}
Reported by
David Eppstein (talk) 18:34, 13 November 2019 (UTC)
What happens
[4]
What should happen
Nothing
We can't proceed until
Feedback from maintainers


Don't provide the DOI/JSTOR then, since those are about the article it reviews, and not the review. Headbomb {t · c · p · b} 00:04, 14 November 2019 (UTC)
I didn't provide them. They were added erroneously by Citation bot in a pass a week earlier that I didn't catch until now [5].
That's the real bug then. Bot shouldn't add non-MR identifiers if |journal=Mathematical Reviews. Headbomb {t · c · p · b} 00:17, 14 November 2019 (UTC)
Or MathSciNet, or [[Mathematical Reviews]], or Zentralblatt, or who knows how many other ways people might choose to write the same thing or how many other non-mathematical ones there might be that I haven't heard of. Instead of adding special rules like that, how about noticing that the journal name and author are totally different from the article the added data is for and not changing citations that violate those expectations? —David Eppstein (talk) 00:22, 14 November 2019 (UTC)
I'm sort of wondering if {{cite journal}} is appropriate for use with |mr= in this application. The link created by |mr=870473 links to this thing that MathSciNet calls Relay Station. At the Relay Station, readers get bibliographic detail for a journal article and if available, a link to the online article via doi or whatever. There isn't any bibliographic detail there for the review which, I presume, is linked through the 'Username/Password Subscribers access MathSciNet here' link (it has the same identifier value). Perhaps this is a case for a {{cite mr}} template that accepts and requires only |mr= as an identifier along with the typical reviewing author, review title, review date, etc bibliographic details and links to the login page instead of the Relay Station; |mr= used in any other cs1|2 template continues to act as it does now.
Trappist the monk (talk) 00:51, 14 November 2019 (UTC) 12:45, 14 November 2019 (UTC) (withdrawn)
Wouldn't it be convenient if you could wish away your bugs by making other people do the work of choosing and using different templates for the buggy cases. Perhaps you are unaware, but for people with access the MR link goes to an actual published review, not just the relay thing that the unsubscribed see. It is more or less the same as for most subscription-only dois: people marked as subscribers by their IP address or cookies see the full content, and everyone else gets a weaker substitute. Also, reviews from that time period were published in a physical journal, titled Mathematical Reviews. It is only later that they were converted into database entries in the MathSciNet database. That's why the abbreviation is "MR". As such, they are content published in a journal, {{cite journal}} is appropriate, and |mr= is the correct way to link to the review. Also, the citations in question actually used {{citation}} so another cite-series template wouldn't have been appropriate. There is no login link that we should be directing people to instead, and your assumption that there is a different place to find the review, that should be linked differently, is false. —David Eppstein (talk) 01:11, 14 November 2019 (UTC)
PS sometimes the meta goes to even deeper levels. Here's an MR entry containing a review by Albert C. Lewis of a review by Victor Pambuccian of a book of Hilbert's lectures: MR3155342. —David Eppstein (talk) 01:29, 14 November 2019 (UTC)
David, the time has come for you to read those reviews and write a reply letter and add a level of meta. Added bonus points for deliberately sneaking a subtle error into your letter to leave open the possibility of a reply article. AManWithNoPlan (talk) 12:16, 14 November 2019 (UTC)
I never meta-joke that I didn't like. XOR'easter (talk) 17:13, 14 November 2019 (UTC)
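The general check proposed in this thread (do not merge fetched identifiers when the fetched journal or author plainly disagrees with what the citation already says) could be sketched like this. It is an illustration of the idea only; the field names follow cs1 parameters, while the function name and the sample values are hypothetical.

```python
def _norm(text):
    """Lower-case and drop everything except letters and digits."""
    return "".join(ch for ch in text.casefold() if ch.isalnum())


def safe_to_merge(existing, fetched, keys=("journal", "last1")):
    """Return False if any field present on both sides clearly disagrees."""
    for key in keys:
        old, new = existing.get(key), fetched.get(key)
        if old and new and _norm(old) != _norm(new):
            return False
    return True


# Hypothetical example: a citation of a review versus metadata for the reviewed paper.
citation = {"journal": "Mathematical Reviews", "last1": "Reviewer", "mr": "870473"}
metadata = {"journal": "Journal of Example Studies", "last1": "Author", "doi": "10.1234/example"}
print(safe_to_merge(citation, metadata))
```

Exact comparison after normalization is deliberately strict; a production check would probably need to tolerate journal abbreviations, which this sketch does not attempt.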

Adds weird journal for some IEEE conferences

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 03:33, 16 November 2019 (UTC)
What happens
[6]
What should happen
Not that
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/2248 AManWithNoPlan (talk) 19:03, 16 November 2019 (UTC)

Caps: RTÉ

Status
{{fixed}}
Reported by
Jonatan Svensson Glad (talk) 21:36, 19 November 2019 (UTC)
What happens
Rté News
What should happen
RTÉ News
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=Apple_TV%2B&diff=prev&oldid=927021304
We can't proceed until
Feedback from maintainers


Wrong authors for cite arXiv

Status
{{fixed}}
Reported by
176.61.33.191 (talk) 11:49, 23 November 2019 (UTC)
What happens
cite arXiv got the wrong authors
Relevant diffs/links
see https://en.wikipedia.org/w/index.php?title=Three-body_problem&diff=925658666&oldid=925544950 (the final change) and the correct author list at https://arxiv.org/abs/1910.07291
We can't proceed until
Feedback from maintainers


Likely a case of bad metadata. Headbomb {t · c · p · b} 14:21, 23 November 2019 (UTC)
arXiv changed their API. https://github.com/ms609/citation-bot/pull/2252 AManWithNoPlan (talk) 15:24, 23 November 2019 (UTC)
we will no longer use the multi search API. AManWithNoPlan (talk) 15:25, 23 November 2019 (UTC)

Bad publisher data from archive.org

Status
{{fixed}}
Reported by
Jonatan Svensson Glad (talk) 23:04, 23 November 2019 (UTC)
What happens
Bot adds both location and publisher in |publisher=, separated with  :
What should happen
Multiple things:
  1. Add them as separate parameters
    • Alternatively don't fetch this data from archive.org
  2. Blacklist  : in publisher data from archive.org
  3. Clean up current bad parameters where a [place][space]:[space][publisher] is found in |publisher= if the cite template has a link to archive.org
We can't proceed until
Feedback from maintainers


Internet Archive metadata is highly variable in format, completeness and reliability. I'd be super cautious about dumping their metadata into Wikipedia. -- GreenC 23:46, 23 November 2019 (UTC)
We are very selective about what we take from them. Adding this soon: https://github.com/ms609/citation-bot/pull/2254 AManWithNoPlan (talk) 21:32, 24 November 2019 (UTC)

Example publisher data from this book:

New York : Macmillan ; London : Collier Macmillan

One of many variations. The publisher can also appear in multiple locations on the page. This is basic code for extracting it from the HTML, but I know there are other varieties it misses.

  # itemprop="publisher">New York : Viking</span>
  if match(fp, "(?i)itemprop[ ]*[=][ ]*\"[ ]*publisher[ ]*\"[ ]*[>][^<]*[^<]", dest) > 0:
    gsub("(?i)itemprop[ ]*[=][ ]*\"[ ]*publisher[ ]*\"[ ]*[>]", "", dest)
    addKeyPairValue(iaTable, id, "IApub", strip(dest) )

  # >dc.publisher: Longmans Green And Co. Bombay<
  elif match(fp, "(?i)[>][ ]*dc[.]publisher[ ]*[:][^<]*[^<]", dest) > 0:
    gsub("(?i)[>][ ]*dc[.]publisher[ ]*[:]", "", dest)
    addKeyPairValue(iaTable, id, "IApub", strip(dest) )

  # >Publisher: The Clarendon Press; Oxford; 1909<
  elif match(fp, "(?i)[>][ ]*publisher[ ]*[:][^<]*[^<]", dest) > 0:
    gsub("(?i)[>][ ]*publisher[ ]*[:]", "", dest)
    addKeyPairValue(iaTable, id, "IApub", strip(dest) )

-- GreenC 23:40, 24 November 2019 (UTC)
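To illustrate the "[place] : [publisher]" convention discussed in this thread, here is a small sketch that splits such a string into location and publisher pairs. It only covers the " : " and " ; " separators seen in the example above, and the function name is invented; as noted, the real metadata has many other variations.

```python
def split_ia_publisher(raw):
    """Split 'Place : Publisher ; Place : Publisher' into (place, publisher) pairs.

    A chunk with no ' : ' separator is returned with place set to None.
    """
    pairs = []
    for chunk in raw.split(" ; "):
        place, sep, publisher = chunk.partition(" : ")
        if sep:
            pairs.append((place.strip(), publisher.strip()))
        else:
            pairs.append((None, chunk.strip()))
    return pairs


print(split_ia_publisher("New York : Macmillan ; London : Collier Macmillan"))
```

Strings in other shapes, such as "The Clarendon Press; Oxford; 1909", come back unsplit with no place, which a caller could treat as a signal to skip the field rather than guess.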

Caps: vir

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 06:17, 24 November 2019 (UTC)
What should happen
[7]
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/2253 AManWithNoPlan (talk) 21:27, 24 November 2019 (UTC)

Better ieeexplore support

Status
{{wontfix}} IEEE is too opaque
Reported by
Headbomb {t · c · p · b} 05:36, 15 November 2019 (UTC)
What should happen
[8]
We can't proceed until
Feedback from maintainers


This was achieved by replacing the ieeexplore.org url with the doi found on the corresponding ieeexplore.org page. Headbomb {t · c · p · b} 05:36, 15 November 2019 (UTC)

Sometimes it's possible to find the DOI from CrossRef or derivatives, looking for an URL which ends in "arnumber=8386824" or a DOI which ends in "8386824" (in the example). Nemo 08:08, 15 November 2019 (UTC)
If not actually parsing the page to search for the doi on the page, then make sure that the prefix is 10.1109 for IEEE journals. Headbomb {t · c · p · b} 09:16, 15 November 2019 (UTC)
IEEE takes pride in blocking bots. Sometimes we work sometimes we don’t. I will investigate reverse lookup of url in crossref. AManWithNoPlan (talk) 16:58, 15 November 2019 (UTC)
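The two checks suggested above (the DOI prefix should be 10.1109, and the DOI often ends in the article number) can be combined into a cheap plausibility test before accepting a reverse-lookup result. This is a sketch only: the function names, the "/document/NNN" path fallback, and the DOI used in the demonstration are assumptions, and no lookup is performed.

```python
import re
from urllib.parse import urlparse, parse_qs


def arnumber_from_url(url):
    """Return the IEEE article number from an 'arnumber=' query or '/document/NNN' path."""
    parts = urlparse(url)
    values = parse_qs(parts.query).get("arnumber")
    if values:
        return values[0]
    match = re.search(r"/document/(\d+)", parts.path)
    return match.group(1) if match else None


def plausible_ieee_doi(doi, arnumber):
    """True if the DOI has the IEEE prefix and ends with the article number."""
    return doi.startswith("10.1109/") and doi.endswith(arnumber)


number = arnumber_from_url("https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8386824")
print(number, plausible_ieee_doi("10.1109/EXAMPLE.2018.8386824", number))  # made-up DOI
```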

Removes URL for IUCN Red List

Status
{{notabug}}
Reported by
Umimmak (talk) 21:53, 20 November 2019 (UTC)
What happens
Bot removes URL when there is a DOI for IUCN Red List citations despite it being recommended to include both. Unfortunately, both the new DOI-based URLs and the old ID-based URLs are problematic. A DOI links to a permanent web page with a specific year's assessment that will never be updated, so when a new assessment is issued, a new DOI will be created and the old one will then point to the previous assessment. An ID-based URL should always link to the current assessment, but that URL is not guaranteed to work indefinitely. Thus, it is probably best to use both, and to use the ID-based URL if only one URL will be used.
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=Ba_humbugi&curid=55945782&diff=927146556&oldid=915761684
We can't proceed until
Feedback from maintainers


No, the static page is best, per WP:SAYWHEREYOUGOTIT, and per the information listed at the redlist at the time it was cited. If you follow the 'old' link, the page will mention there is an update, so if you need the updated information, you can check it then. Headbomb {t · c · p · b} 00:26, 21 November 2019 (UTC)
This was discussed at length before (example). I still didn't get confirmation of whether it's true that IUCN reuses the DOI for significantly different documents (i.e. that an assessment can change content without a new assessment being released, and that this results in a new ID in the URL but not a new DOI). Nemo 09:26, 21 November 2019 (UTC)
Let us continue this discussion, but in the mean time https://github.com/ms609/citation-bot/pull/2264 AManWithNoPlan (talk) 17:05, 26 November 2019 (UTC)
The DOIs are static. When there's a new version, it has a new doi, as can be seen in 10.2305/IUCN.UK.2012.RLTS.T195519A2383117.en. Headbomb {t · c · p · b} 18:38, 26 November 2019 (UTC)

Access date removal bug

Status
{{notabug}}
Reported by
Mark Schierbecker (talk) 05:39, 22 November 2019 (UTC)
What happens
archiveurl parameter not treated as url
What should happen
No edit needed when archiveurl specified
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=Rock_Hill,_Missouri&curid=123224&diff=927371471&oldid=923378647
We can't proceed until
Feedback from maintainers


So annoying when parameters are used wrong. AManWithNoPlan (talk) 11:51, 22 November 2019 (UTC)

The template also treats the access-date as wrong. Will look at fixing bad templates, but this is not a bug. AManWithNoPlan (talk) 16:53, 26 November 2019 (UTC)

JSTOR book meta data

Status
{{fixed}}
Reported by
Jonatan Svensson Glad (talk) 22:51, 23 November 2019 (UTC)
What happens
Multiple things
  1. Adding a |chapter= despite not being an actual chapter
  2. Adding a |chapter= which is included in the |title=
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=User%3AJosve05a%2Fcite-sandbox&diff=prev&oldid=927656726
We can't proceed until
Feedback from maintainers


What I see is this. So, they use the Chapter field for secondary sub-title when doing books. Will work on.

TY  - BOOK
TI  - Benevolent Assimilation
T1  - The American Conquest of the Philippines, 1899-1903

AManWithNoPlan (talk) 22:45, 25 November 2019 (UTC)

https://github.com/ms609/citation-bot/pull/2258 AManWithNoPlan (talk) 22:53, 25 November 2019 (UTC)

Fails to convert a JSTOR

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 04:44, 25 November 2019 (UTC)
What should happen
[9]
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/2255 AManWithNoPlan (talk) 22:25, 25 November 2019 (UTC)

Remove soft hyphens

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 05:50, 25 November 2019 (UTC)
What should happen
[10]
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/2257 AManWithNoPlan (talk) 22:41, 25 November 2019 (UTC)

Series: Advances in Pharmacology

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 14:59, 25 November 2019 (UTC)
What should happen
[11]
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/2256 AManWithNoPlan (talk) 22:28, 25 November 2019 (UTC)

Series: Inorganic Syntheses

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 07:42, 26 November 2019 (UTC)
What happens
[12]
What should happen
[13]
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/2262 AManWithNoPlan (talk) 12:14, 26 November 2019 (UTC)

ZooKeys issues

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 07:53, 12 November 2019 (UTC)
What happens
[14]
What should happen
Same + [15]
We can't proceed until
Feedback from maintainers


ZooKeys is like that. You can safely TNT |issue= every time for those. Headbomb {t · c · p · b} 07:53, 12 November 2019 (UTC)

What makes you think ZooKeys is unique with the issue=1 data entry error, or are you just saying that since ZooKeys has no volumes it is very unlikely to be correct? AManWithNoPlan (talk) 12:34, 16 November 2019 (UTC)
ZooKeys has issues, no volumes. Whenever you have a volume for ZooKeys, the bot should discard volume/issue/pages and re-populate the fields. Or something to that effect. Headbomb {t · c · p · b} 15:48, 16 November 2019 (UTC)
"we already blow away volumes." The issue isn't that you are not blowing away volumes, but rather that when you blow away volumes, you should also blow away issues. Otherwise you remove volumes, and more often than not leave an erroneous issue in. Headbomb {t · c · p · b} 20:55, 26 November 2019 (UTC)
https://github.com/ms609/citation-bot/pull/2266 AManWithNoPlan (talk) 21:44, 26 November 2019 (UTC)
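The rule described in this thread (if a ZooKeys citation carries a |volume=, discard volume, issue and pages together so they can be re-populated) can be written down as a short sketch. The function name and the sample values are hypothetical, and the re-population step is not shown.

```python
def reset_zookeys_numbering(cite):
    """Drop volume, issue and pages from a ZooKeys citation that has a volume."""
    if cite.get("journal", "").strip().casefold() != "zookeys":
        return cite
    if "volume" not in cite:
        return cite
    # Discard all three together; leaving |issue= behind is the bug described above.
    return {k: v for k, v in cite.items() if k not in ("volume", "issue", "pages")}


# Hypothetical citation data for demonstration.
before = {"journal": "ZooKeys", "volume": "100", "issue": "1", "pages": "1-10", "doi": "10.1234/example"}
print(reset_zookeys_numbering(before))
```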

Need to run twice?

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 16:55, 16 November 2019 (UTC)
What happens
[16]+[17]
What should happen
Should happen in the same edit
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/2265 AManWithNoPlan (talk) 20:48, 26 November 2019 (UTC)

Don’t remove rubbish URLs if someone grabbed an archive of it

Status
{{fixed}}
Reported by
Djm-leighpark (talk) 12:03, 24 November 2019 (UTC)
What happens
Leaves red text of broken syntax
Relevant diffs/links
[18]
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/2263 AManWithNoPlan (talk) 16:49, 26 November 2019 (UTC)

Caps: NeuroReport

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 07:09, 27 November 2019 (UTC)
What should happen
[19]
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/2265 AManWithNoPlan (talk) 12:56, 27 November 2019 (UTC)

Fails to add class to cite arxiv

Status
{{notabug}} since it works now
Reported by
Headbomb {t · c · p · b} 23:56, 29 November 2019 (UTC)
What should happen
[20]
We can't proceed until
Feedback from maintainers


Replaced ProQuest URL with ID field, leaving {{cite web}} with no URL field

Status
{{fixed}}
Reported by
Logan Talk Contributions 00:52, 30 November 2019 (UTC)
What happens
It replaced the ProQuest URL in the URL field of {{cite web}} in Digital distribution with the corresponding identifier in the ID field, which left the template in a broken (error message) state, since it requires a URL.
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=Digital_distribution&diff=prev&oldid=928548054
We can't proceed until
Feedback from maintainers


True. Should change template type also. AManWithNoPlan (talk) 02:04, 30 November 2019 (UTC)

doubly true since cite web was wrong to begin with. AManWithNoPlan (talk) 02:06, 30 November 2019 (UTC)
https://github.com/ms609/citation-bot/pull/2269 AManWithNoPlan (talk) 14:13, 30 November 2019 (UTC)

Remove redundant ingentaconnect.com/content/

Status
{{fixed}}
Reported by
Nemo 12:47, 30 November 2019 (UTC)
What happens
ingentaconnect.com/content/ is not removed because it 404s. All URLs under this path are simply an aggregator/syndicated database and never add anything to the DOI, so they should always be removed if the citation has a DOI. Most of them are also dead, presumably because the licenses from the publisher to the redistributor have expired. (This kind of database fell out of fashion in the early 2000s.)
What should happen
special:diff/928606820
Relevant diffs/links
special:diff/928606820
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/2268 AManWithNoPlan (talk) 13:59, 30 November 2019 (UTC)

support existing others

Don’t add more AManWithNoPlan (talk) 16:58, 29 November 2019 (UTC)

Don't add more what? Headbomb {t · c · p · b} 17:32, 29 November 2019 (UTC)
https://github.com/ms609/citation-bot/pull/2270 AManWithNoPlan (talk) 15:44, 30 November 2019 (UTC)
{{fixed}} AManWithNoPlan (talk) 19:37, 1 December 2019 (UTC)

lww.com is redundant with ovid.com

Status
{{fixed}}
Reported by
Nemo 11:58, 1 December 2019 (UTC)
What happens
A couple thousand redundant links to journals.lww.com URLs do not get removed because the DOI points to their alternative URL at ovid.com, for instance doi:10.1097/LGT.0b013e3181af30ef goes to [21] instead of the ancient [22].
What should happen
special:diff/928750782
Relevant diffs/links
special:diff/908979799
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/2272 AManWithNoPlan (talk) 23:21, 1 December 2019 (UTC)

meta.wkhealth.com is dead

Status
{{fixed}}
Reported by
Nemo 12:03, 1 December 2019 (UTC)
What happens
Links to meta.wkhealth.com, like [23], give an ERR_CONNECTION_RESET and are not removed (in addition they are probably slowing down any citation bot run which stumbles upon one).
What should happen
Remove all such URLs inside citation templates.
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/2272 AManWithNoPlan (talk) 23:21, 1 December 2019 (UTC)

Missed a DOI expansion when not already in a template, just wrapped in ref tags

Status
new bug
Reported by
Headbomb {t · c · p · b} 12:26, 30 November 2019 (UTC)
What should happen
[24]
We can't proceed until
Feedback from maintainers


That is not in CrossRef database oddly. AManWithNoPlan (talk) 14:00, 30 November 2019 (UTC)

Weird, shoving it in a {{cite journal}} with |doi= gave [25] the first time, and I had to add the title here. Headbomb {t · c · p · b} 14:14, 30 November 2019 (UTC)
maybe it’s back? Crossref is not perfect? AManWithNoPlan (talk) 15:18, 30 November 2019 (UTC)
A plain <ref>https://doi.org/10.1023/A:1008280705142</ref> still won't expand. So wondering if something's possible, like shove in {{cite journal |doi=10.1023/A:1008280705142}} or {{cite journal |url=https://doi.org/10.1023/A:1008280705142}} before trying to expand. Headbomb {t · c · p · b} 15:22, 30 November 2019 (UTC)
{{notabug}} we require some type of title to be found for a plain url to be replaced. AManWithNoPlan (talk) 01:05, 3 December 2019 (UTC)

GIGO with named references

Status
new bug
Reported by
Mikeblas (talk) 00:33, 3 December 2019 (UTC)
What happens
Citation bot causes a duplicate reference error after editing the page
What should happen
Bots should do no harm -- they should not create errors in pages
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=Kipunada&type=revision&diff=928662233&oldid=922763185&diffmode=source
Replication instructions
Looks like Citation bot modified a reference that was identically defined in the article and a template it used. When Citation bot made the modification to the reference definition, it didn't change the name of the reference, giving the reference a conflicting definition with the reference of the same name in the template. This resulted in big red error text in the "References" section of the rendered article: "Cite error: The named reference "KKSK" was defined multiple times with different content (see the help page)". While it's not great to have identical reference definitions, Wikipedia allows it and seems unlikely to change. Either Citation bot should handle this situation correctly (by renaming the definition it changes) or should avoid making changes.
We can't proceed until
Feedback from maintainers


{{notabug}} References from included page messes up things AManWithNoPlan (talk) 00:57, 3 December 2019 (UTC)

I don't think "notabug" is a correct evaluation. Before Citation bot edited the page, it had no errors. After Citation bot edited the page, it was in worse shape, showing user-visible red error text in the references section when there was none before. "Garbage in" also mis-characterizes the situation and I think demonstrates a lack of willingness to consider the situation fully. If we were to stipulate that the input was "garbage", then we should expect an automated process to either reject that input, not make things worse, or repair the bad input directly. -- Mikeblas (talk) 02:33, 3 December 2019 (UTC)

Strip Bloomberg URL

Status
{{fixed}}
Reported by
Jonatan Svensson Glad (talk) 13:15, 1 December 2019 (UTC)
What should happen
Remove ?utm_source=google&utm_medium=bd&cmpId=google from |url=https://www.bloomberg.com/news/articles/2019-10-03/trump-s-story-of-hunter-biden-s-chinese-venture-is-full-of-holes?utm_source=google&utm_medium=bd&cmpId=google
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/2273 AManWithNoPlan (talk) 21:10, 2 December 2019 (UTC)
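
For illustration only, a minimal Python sketch of the requested cleanup (not the bot's actual code). The function name and the parameter list are assumptions; only the three parameters from the report are listed.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list: only the three parameters named in the report above.
TRACKING_PARAMS = {"utm_source", "utm_medium", "cmpid"}


def strip_tracking(url: str) -> str:
    """Drop known tracking parameters from the query string, keep the rest."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    ]
    # An empty query makes urlunsplit omit the "?" entirely.
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

Parameters that are not in the list are kept, so a URL whose query string actually selects content would not lose it.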

Caps: USGS WRIR

Status
{{fixed}}
Reported by
Nemo 17:26, 3 December 2019 (UTC)
What should happen
Keep "USGS WRIR" uppercase?
Relevant diffs/links
special:diff/929098120
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/2275 AManWithNoPlan (talk) 19:32, 3 December 2019 (UTC)

author link and inventive editors

Status
{{fixed}} . now ignores these
Reported by
Trappist the monk (talk) 13:56, 4 December 2019 (UTC)
What happens
|author=[[Ian Freckelton{{!}}Freckelton, Ian]]|author=Ian Freckelton{{!}}Freckelton, Ian |author-link=Ian Freckelton{{!}}Freckelton, Ian
which gives:
Ian Freckelton|Freckelton, Ian (1 November 2005). "Madhouse: A Tragic Tale of Megalomania and Modern Medicine (Book review)". Psychiatry, Psychology and Law. 12 (2): 435–438. doi:10.1375/pplt.12.2.435. {{cite journal}}: Check |author1-link= value (help)
What should happen
Perhaps nothing. The citation worked before the bot edit. It is ok for |author= to be linked; the primary purpose of |author-linkn= is to link |lastn= / |firstn= pairs (this also applies to |editor=, |translator=, ...)
Relevant diffs/links
diff
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/2279 AManWithNoPlan (talk) 15:35, 4 December 2019 (UTC)

Leading zero in IEEE document numbers

Status
{{fixed}}
Reported by
Nemo 17:52, 4 December 2019 (UTC)
What happens
ieeexplore.ieee.org URLs are converted from a URL parameter format to a /document/N format, without changing the integer. A URL with a leading zero partially fails to load with this new format. This may explain why it's not removed even if it's redundant.
What should happen
Remove leading zero?
Relevant diffs/links
special:diff/929255457
We can't proceed until
Feedback from maintainers


I'll remove those 39 broken URLs later if nobody beats me at it. Nemo 17:55, 4 December 2019 (UTC)
https://github.com/ms609/citation-bot/pull/2279 AManWithNoPlan (talk) 19:54, 4 December 2019 (UTC)
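
A possible way to do the conversion while dropping the leading zero, sketched in Python rather than the bot's own code. The query parameter name arnumber and the function name are assumptions, since the report does not show the original URL format.

```python
import re
from urllib.parse import parse_qs, urlsplit


def ieee_document_url(url: str) -> str:
    """Rewrite a parameter-style IEEE Xplore URL to /document/N without leading zeros."""
    parts = urlsplit(url)
    if parts.netloc != "ieeexplore.ieee.org":
        return url
    # "arnumber" is assumed to be the parameter that carries the document number.
    numbers = parse_qs(parts.query).get("arnumber")
    if not numbers or not re.fullmatch(r"[0-9]+", numbers[0]):
        return url  # leave anything unexpected untouched
    # int() drops leading zeros: "0123456" becomes 123456.
    return "https://ieeexplore.ieee.org/document/" + str(int(numbers[0]))
```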

Dotted year cleanup

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 01:21, 5 December 2019 (UTC)
What should happen
[26]
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/2279 AManWithNoPlan (talk) 12:09, 5 December 2019 (UTC)

Series: Advances in Enzymology and Related Areas of Molecular Biology

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 11:59, 5 December 2019 (UTC)
What happens
[27]
What should happen
[28]
We can't proceed until
Feedback from maintainers


This is possibly caused by the hyphen difference in Advances in Enzymology and Related Areas of Molecular Biology and Advances in Enzymology - and Related Areas of Molecular Biology. Headbomb {t · c · p · b} 11:59, 5 December 2019 (UTC)

https://github.com/ms609/citation-bot/pull/2279 AManWithNoPlan (talk) 12:08, 5 December 2019 (UTC)

Regular expression failure when extracting Templates

Status
I managed to {{fixed}} it
Reported by
Redalert2fan (talk) 22:54, 5 December 2019 (UTC)
What happens
! Regular expression failure in McDonnell Douglas F-15E Strike Eagle when extracting Templates
Replication instructions
Run via web form on McDonnell Douglas F-15E Strike Eagle.
We can't proceed until
Feedback from maintainers


Cannot perfectly fix since it seems to be an out-of-memory bug, but I have an idea. AManWithNoPlan (talk) 15:43, 6 December 2019 (UTC)

Converts volumes to issues for books

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 23:33, 5 December 2019 (UTC)
What happens
[29]
What should happen
This should only be done for journals
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/2281 AManWithNoPlan (talk) 15:45, 6 December 2019 (UTC)

PMID website changing

Might want to have a look at this post and see if anything needs to change in Citation bot. Whatamidoing (WMF) maybe also for Citoid change? --Izno (talk) 03:36, 6 December 2019 (UTC)

This for now. https://github.com/ms609/citation-bot/pull/2281 AManWithNoPlan (talk) 15:30, 6 December 2019 (UTC)
{{fixed}} for now. AManWithNoPlan (talk) 20:49, 6 December 2019 (UTC)

Removal of html comments

Why does the bot remove html comments from references, as here? – Uanfala (talk) 12:45, 6 December 2019 (UTC)

I don't know, but isn't it unusual to have the comment "inside" the parameter name? Usually it's at the end of a parameter content (i.e. before the next pipe). Nemo 12:54, 6 December 2019 (UTC)
Sometimes horribly setup references do lead to such problems. {{notabug}}, since so rare. AManWithNoPlan (talk) 15:17, 6 December 2019 (UTC)
See for instance special:diff/929597433 where the comment to the non-empty parameter was left. Nemo 22:04, 6 December 2019 (UTC)

orphans |chapter-url-access=

Status
{{fixed}}
Reported by
Trappist the monk (talk) 14:59, 6 December 2019 (UTC)
What happens
removes |chapter-url= but fails to remove |chapter-url-access=; also fails to remove |access-date=
Relevant diffs/links
diff
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/2281 AManWithNoPlan (talk) 15:30, 6 December 2019 (UTC)
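
One way to avoid orphans is to remove dependent parameters together with the URL they describe. The following Python sketch is an illustration under stated assumptions, not the bot's implementation: the dependency table and function name are hypothetical, and the rule that |access-date= goes away once no URL is left is a simplification.

```python
# Hypothetical table: parameters that are meaningless without their parent.
DEPENDENTS = {
    "chapter-url": ["chapter-url-access"],
    "url": ["url-access"],
}


def remove_param(params: dict, name: str) -> dict:
    """Remove a parameter and anything that only makes sense alongside it."""
    result = dict(params)
    result.pop(name, None)
    for dependent in DEPENDENTS.get(name, []):
        result.pop(dependent, None)
    # Simplifying assumption: access-date is kept only while some URL remains.
    if not any(result.get(key, "").strip() for key in ("url", "chapter-url")):
        result.pop("access-date", None)
    return result
```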

Also remove empty month and day when date is set

Status
{{fixed}}
Reported by
Redalert2fan (talk) 20:22, 6 December 2019 (UTC)
What happens
year= was removed, while empty month= and day= were left when date=2019-08-22 was set
What should happen
also remove empty month= and day=
Relevant diffs/links
[30]
We can't proceed until
Feedback from maintainers


I was about to report this myself. Headbomb {t · c · p · b} 20:32, 6 December 2019 (UTC)
https://github.com/ms609/citation-bot/pull/2284 AManWithNoPlan (talk) 20:53, 6 December 2019 (UTC)
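
A small Python sketch of the requested behaviour, for illustration only (the function name is hypothetical and this is not the bot's code): once |date= has a value, empty |year=, |month= and |day= carry no information and can be dropped.

```python
def drop_empty_date_parts(params: dict) -> dict:
    """When |date= is set, drop empty |year=, |month= and |day= parameters."""
    if not params.get("date", "").strip():
        return dict(params)  # no date: leave everything as it is
    return {
        key: value
        for key, value in params.items()
        if not (key in ("year", "month", "day") and not value.strip())
    }
```

Non-empty values are deliberately left alone in this sketch, since deciding whether they duplicate |date= needs a real date comparison.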

Bad title

Status
{{fixed}}
Reported by
Redalert2fan (talk) 20:34, 6 December 2019 (UTC)
What happens
title = Aanmelden of registreren om te bekijken
What should happen
do not add this title
Relevant diffs/links
[31]
We can't proceed until
Feedback from maintainers


Translates to: Log in or register to view (from Dutch). Redalert2fan (talk) 20:34, 6 December 2019 (UTC)

Apart from Facebook not being a good ref to use, this might come up on other Dutch sites. Redalert2fan (talk) 20:36, 6 December 2019 (UTC)
https://github.com/ms609/citation-bot/pull/2283 AManWithNoPlan (talk) 20:48, 6 December 2019 (UTC)

Japanese characters in title

Status
{{fixed}}
Reported by
Redalert2fan (talk) 20:49, 6 December 2019 (UTC)
What happens
title= �$B=w$N9a$j!C�(BTBS�$B%F%l%S�(B
What should happen
title = 女の香り|TBSテレビ
Relevant diffs/links
[32]
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/2285 AManWithNoPlan (talk) 21:16, 6 December 2019 (UTC)
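
The garbled title looks like ISO-2022-JP text whose escape bytes were lost in display. Assuming each replacement character in the report stands for an ESC byte (0x1B), which is a guess and not something the report confirms, a Python sketch of the decoding would be:

```python
ESC = b"\x1b"

# Bytes rebuilt from the reported title, with ESC assumed wherever the
# report shows a replacement character.
raw = ESC + b"$B=w$N9a$j!C" + ESC + b"(BTBS" + ESC + b"$B%F%l%S" + ESC + b"(B"

# Decoding with the matching codec turns the escape sequences back into
# Japanese text instead of leaving them as ASCII residue.
fixed = raw.decode("iso2022_jp")
print(fixed)
```

The general point is that the page's declared or detected charset has to be used before the title is stored as UTF-8.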

International Astronomical Union Circular

Status
{{fixed}}
Reported by
Trappist the monk (talk) 12:22, 7 December 2019 (UTC)
What happens
converts {{cite web}} to {{cite journal}}; iauc is not really a journal but it is a periodical so perhaps better to convert to {{cite periodical}} as here; no |volume= for iauc and only one of |issue= or |number=; I don't really know if Green is the author of the circular items or is more an editor but editor seemed to me a better choice.
What should happen
diff
Relevant diffs/links
diff
We can't proceed until
Feedback from maintainers


Cite journal is fine. Headbomb {t · c · p · b} 13:25, 7 December 2019 (UTC)
https://github.com/ms609/citation-bot/pull/2287 AManWithNoPlan (talk) 21:24, 7 December 2019 (UTC)

Remove search.serialssolutions.com proxy links

Status
{{fixed}}
Reported by
Nemo 16:55, 7 December 2019 (UTC)
What happens
We only have about a hundred pages with links to search.serialssolutions.com, but they don't seem to add any value. They're like proxy links, for instance [33] which asks for credentials for https://library.nd.edu.au . Possibly not worth making a specific rule or anything, but if it fits in the list of proxies it might be ok.
What should happen
Removing the links?
Relevant diffs/links
special:diff/929430593 (link to bv8ja7kw5x.search.serialssolutions.com in cite journal without DOI is not removed)
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/2287 AManWithNoPlan (talk) 21:23, 7 December 2019 (UTC)

Remove broken www.informaworld.com/smpp/ when redundant

Status
{{fixed}}
Reported by
Nemo 19:42, 7 December 2019 (UTC)
What happens
All the www.informaworld.com/smpp/ are left present even if they redirect to a useless frontpage of T&F. They should be removed by citation bot when a DOI or other identifier is present. (Other bots will need to take care of the more incomplete citations.)
What should happen
Special:Diff/929720285
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/2287 AManWithNoPlan (talk) 21:23, 7 December 2019 (UTC)

Remove broken www.sciencedirect.com/science when redundant

Status
{{fixed}}
Reported by
Nemo 19:45, 7 December 2019 (UTC)
What happens
As above, every http://www.sciencedirect.com/science?_ob URL (notice the URL parameter) is broken and redirects to an error page. There is no need to check that it corresponds to the DOI before removing it, unlike with the /science/article/pii/ etc. URLs.
What should happen
special:diff/929191727
Relevant diffs/links
special:diff/929191727
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/2287 AManWithNoPlan (talk) 21:23, 7 December 2019 (UTC)

Thanks. I've not checked the code in depth, should perhaps the "?" be escaped if that's used in a regex? Nemo 21:59, 7 December 2019 (UTC)
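
To illustrate the concern about the "?" (a Python sketch; whether the bot uses a regex or a plain string comparison here is not stated above): in a regular expression an unescaped "?" makes the preceding character optional, so the pattern no longer matches the literal URL.

```python
import re

url = "http://www.sciencedirect.com/science?_ob=ArticleURL"
prefix = "sciencedirect.com/science?_ob"

# Unescaped: "e?" means "an optional e", so the pattern expects
# "science_ob" or "scienc_ob" and never sees the literal "?".
unescaped = re.search(prefix, url)

# Escaped: every character of the prefix is matched literally.
escaped = re.search(re.escape(prefix), url)

print(unescaped is None, escaped is not None)
```

A plain substring test (prefix in url) avoids the problem altogether when no pattern features are needed.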

Caps: SCH, JPN

Status
mostly {{fixed}}
Reported by
Headbomb {t · c · p · b} 04:02, 9 December 2019 (UTC)
What should happen
[34] (likewise for JPN, although I don't have a diff right now)
We can't proceed until
Feedback from maintainers


Probably a good idea to leave all SCH and JPN alone, either way. Headbomb {t · c · p · b} 04:02, 9 December 2019 (UTC)

Do not set missing article title to journal name for Google Books

Status
{{fixed}}
Reported by
Worldbruce (talk) 12:31, 9 December 2019 (UTC)
What happens
Bot adds |title=Indian Journal of Linguistics to a {{cite journal}} which already has |journal=Indian Journal of Linguistics and has a Google Books url.
What should happen
Don't add a title to {{cite journal}} unless you have a way to know the actual article title.
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=Bengali_dialects&diff=929970423&oldid=929912235
We can't proceed until
Feedback from maintainers


Caps: Biochimica et Biophysica Acta (BBA)

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 12:56, 9 December 2019 (UTC)
What happens
[35]
What should happen
[36]
We can't proceed until
Feedback from maintainers


ProQuest Dissertations Publishing

Status
{{fixed}}
Reported by
Nemo 07:50, 9 December 2019 (UTC)
What happens
"ProQuest Dissertations Publishing" is not a publisher (the university/whatever is), just a syndicator (and an aggregator among many, like https://theses.ai/ , http://www.dart-europe.eu/ , https://www.base-search.net/ ). It may be removed from |publisher= and maybe even |via=, in {{cite thesis}} too. There are only a hundred or so cases though, so feel free to wontfix.
What should happen
special:diff/929950066
We can't proceed until
Feedback from maintainers


Bad title

Status
{{fixed}}
Reported by
Redalert2fan (talk) 01:06, 11 December 2019 (UTC)
What happens
title= You are being redirected
Relevant diffs/links
[37]
We can't proceed until
Feedback from maintainers


Erroneous move of publication-place to location

Status
{{notabug}} unless the discussion elsewhere says that the standard has changed. If so, please come back here.
Reported by
Martin of Sheffield (talk) 09:24, 3 December 2019 (UTC)
What happens
It is still changing "publication-place" to "location" whereas Template:Citation §3.5.6 shows that using "location" is supported as a fall-back, the correct parameter is "publication-place". In effect the bot is downgrading the citation.
Relevant diffs/links
Diff here
We can't proceed until
Feedback from maintainers


You're right that Template:Citation currently lists "publication-place" first when naming the variants of this parameter, however I don't see any specific discussion of whether or why it should be preferred. Help:Citation_Style_1#Work_and_publisher quite clearly prefers "location" and Help:Citation Style 2 doesn't list "publication-place" among the intended differences, so it's not unreasonable to interpret that "location" is preferred or acceptable for both classes of templates.
If the consensus is different, of course, I'm sorry for the mistake. I found some 2007 and 2010 discussions indicating that "publication-place" was the earlier parameter and "location" was added later, and various discussions where there was some confusion about place of publication vs. place inside the work, but as recently as 2014 a need for documentation was expressed, specifically with regard to the unclear nature of "publication-place". The discussion went on to other topics but the lack of a clear answer back then suggests that there is no specific consensus preferring this form of the parameter over another, otherwise someone would have mentioned it immediately. I don't know if some other discussion happened more recently which made "publication-place" preferred. Nemo 11:05, 3 December 2019 (UTC)
I take your point about the two help pages, but when using a template I normally go to the template's documentation as authoritative. There is a subtle distinction between the two parameters:
  1. written at Haydock, "Terrible colliery explosion", Monmouthshire Merlin, Monmouth: William Christopher, 14 June 1878, retrieved 3 December 2019
  2. "Terrible colliery explosion", Monmouthshire Merlin, Monmouth: William Christopher, 14 June 1878, retrieved 3 December 2019
  3. "Terrible colliery explosion", Monmouthshire Merlin, Haydock: William Christopher, 14 June 1878, retrieved 3 December 2019
Citation (1) has both a location where the report was written, and a publication place where the newspaper went to press. In (2) the location has been deleted, and the citation formats correctly. In (3) the publication place has been omitted, and the location now is assumed to be the publication place, which in this case it isn't. Sorry I can't expand further, I've just had a text calling me away. Back tomorrow evening. Martin of Sheffield (talk) 13:23, 3 December 2019 (UTC)
The correct location to seek a clarification is Help talk:CS1, not to request the bot to do X/Y/Z. The template documentation is more-or-less not authoritative, especially when there are apparently ambiguities, though you might prefer otherwise. --Izno (talk) 13:27, 3 December 2019 (UTC)
related conversation started: Help talk:Citation Style 1 § publication-place, place, or location and their proper use
Trappist the monk (talk) 14:38, 3 December 2019 (UTC)
Actually I'm requesting that the bot DOES NOT undo the work of editors, not that it DOES anything. If you think the documentation for the template is wrong, perhaps that is the place to take the discussion and ask the template maintainers to modify the template? Would you like me to raise the issue there for you? Regards, Martin of Sheffield (talk) 23:06, 4 December 2019 (UTC)
Ttm started the discussion for you as to the validity of the |publication-place= parameter in toto. Please feel free to participate. --Izno (talk) 01:20, 5 December 2019 (UTC)
I've suggested a change to the documentation for {{citation}} to align it with your preferences. BTW, I didn't see anything there from TTM, isn't template talk:citation the correct place for documentation errors? Probably best to close off this bug report if it is the documentation at citation that is the problem. Regards, Martin of Sheffield (talk) 11:39, 6 December 2019 (UTC)
Umm, I answered that discussion at Template talk:Citation § Publication-place, location and Citation bot. Because I had already started another discussion about |publication-place=, |location=, and |place= at Help talk:Citation Style 1 § publication-place, place, or location and their proper use (mentioned in my post above) I think that the discussion at Template talk:Citation should be closed and made part of the earlier discussion at WT:CS1.
Trappist the monk (talk) 19:41, 8 December 2019 (UTC)

ProQuest

Status
{{notabug}}, please take conversation to https://en.wikipedia.org/wiki/Template_talk:ProQuest
Reported by
Keith D (talk) 18:47, 8 December 2019 (UTC)
What happens
Conversion of a URL to a {{ProQuest}} call loses the information from |url-access=
What should happen
Should retain information that subscription is required for the link.
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=Tickton&curid=2951971&diff=929806948&oldid=921797064
We can't proceed until
Feedback from maintainers


Not a bug; |url-access= applies to |url= which links |title=. Identifiers (|id= in this case) do not link |title= so |url-access= would be misapplied. Additionally, sources linked through identifiers are presumed to lie behind some sort of paywall / registration barrier. cs1|2 does not highlight the norm so retaining some sort of subscription information would be contrary to the way cs1|2 works.

Trappist the monk (talk) 19:35, 8 December 2019 (UTC)

also, there is work (or at least talk of work) of having the ProQuest template automatically rewrite URLs based upon being at a library. AManWithNoPlan (talk) 21:07, 8 December 2019 (UTC)
Based on some JavaScript, I suppose? HTML is cached and the same for all (unregistered) users, templates can't do much. Nemo 06:31, 9 December 2019 (UTC)
It's rare (and good) for url-access to be compiled in relation to ProQuest URLs, although it should be. If the issue is the loss of the lock icon, maybe we can continue at Template talk:ProQuest: I think it's never open access, so maybe we can just add the lock by default? Nemo 06:31, 9 December 2019 (UTC)

Treat en/em dashes as equivalent to hyphens for purpose of title matching

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 13:44, 11 December 2019 (UTC)
Relevant diffs/links
In [38], I had to TNT the title to get a journal match. Unplug-Don't and Unplug—Don't were considered too different to match. The same should apply for the html version &ndash; and &mdash; Also non-breaking hyphens, double/triple hyphen/dashes and minus signs if those aren't already considered equivalent.
We can't proceed until
Feedback from maintainers
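
A minimal Python sketch of the requested normalization, under the assumption that titles are compared after being reduced to a canonical form (the function names and the exact character list are illustrative, not taken from the bot):

```python
import html
import re

# Hyphen-minus, hyphen, non-breaking hyphen, figure dash, en dash,
# em dash, horizontal bar and minus sign.
DASHES = "\u002d\u2010\u2011\u2012\u2013\u2014\u2015\u2212"
_DASH_RUN = re.compile("[" + re.escape(DASHES) + "]+")


def normalize_title(title: str) -> str:
    """Reduce a title to a form where all dash variants compare equal."""
    text = html.unescape(title)      # &ndash; and &mdash; become characters
    text = _DASH_RUN.sub("-", text)  # any run of dash-like characters -> "-"
    return re.sub(r"\s+", " ", text).strip().casefold()


def titles_match(first: str, second: str) -> bool:
    return normalize_title(first) == normalize_title(second)
```

Collapsing runs also covers the double and triple hyphen case mentioned in the report.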


Invalid ISBN added

Status
{{notabug}}, and please look to https://en.wikipedia.org/wiki/Help:CS1_errors#bad_isbn as suggested
Reported by
Jonesey95 (talk) 00:46, 12 December 2019 (UTC)
What happens
Invalid ISBN was added for a conference proceedings
What should happen
Bot should never add an invalid ISBN
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=Dynamic_program_analysis&type=revision&diff=930212040&oldid=921483059
Replication instructions
Run the bot on the previous version of the article linked in the diff above.
We can't proceed until
Feedback from maintainers


It's technically invalid because the check digit fails the checksum, but ISBN 1595939934 is widely used and most booksources links happily return results, including Open Library and Karlsruhe. These conference proceedings often end up being poorly edited volumes, so it doesn't surprise me if this was printed with a wrong ISBN. There is no ideal solution here.
Help:CS1_errors#bad_isbn recommends adding |ignore-isbn-error=true in such a case if there is no alternative. I guess here we can use the alternate ISBN 1581139934 which is a bit more used. Nemo 07:45, 12 December 2019 (UTC)
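
For reference, the ISBN-10 check is a weighted sum modulo 11: the digits are multiplied by 10 down to 1, and the total must be divisible by 11. A short Python sketch (the function name is ours) applied to the two ISBNs mentioned above:

```python
def isbn10_is_valid(isbn: str) -> bool:
    """Check an ISBN-10 using the standard weighted sum modulo 11."""
    chars = isbn.replace("-", "").replace(" ", "").upper()
    if len(chars) != 10:
        return False
    total = 0
    for position, char in enumerate(chars):
        if char == "X" and position == 9:
            value = 10              # "X" stands for 10, last position only
        elif char in "0123456789":
            value = int(char)
        else:
            return False
        total += (10 - position) * value  # weights 10, 9, ..., 1
    return total % 11 == 0


print(isbn10_is_valid("1595939934"))  # weighted sum 304, 304 % 11 == 7
print(isbn10_is_valid("1581139934"))  # weighted sum 220, 220 % 11 == 0
```

So the first ISBN fails the check and the alternate one passes, which matches the discussion above.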

Missed a title

Status
{{notabug}} not in CrossRef
Reported by
Headbomb {t · c · p · b} 14:25, 13 December 2019 (UTC)
What should happen
[39]
We can't proceed until
Feedback from maintainers


Possibly a database issue. Headbomb {t · c · p · b} 14:25, 13 December 2019 (UTC)

CAPS NBER

Status
{{fixed}}
Reported by
Jonatan Svensson Glad (talk) 00:03, 13 December 2019 (UTC)
What happens
Nber
What should happen
NBER
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=Trump_tariffs&diff=prev&oldid=930507889
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/2304 AManWithNoPlan (talk) 19:16, 13 December 2019 (UTC)
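
The "Caps" reports on this page all come down to the same idea: after title-casing, known acronyms need to be restored. A hedged Python sketch of that idea follows; the table holds only acronyms reported above, and the function name and sample titles are made up for illustration.

```python
# Acronyms reported on this page; a real list would be much longer.
PROTECTED = {"nber": "NBER", "usgs": "USGS", "wrir": "WRIR"}


def restore_acronyms(title: str) -> str:
    """Put known acronyms back in their proper case after title-casing."""
    words = title.split(" ")
    return " ".join(PROTECTED.get(word.casefold(), word) for word in words)
```

Words that are not in the table pass through unchanged, so the function is safe to run on titles that contain no acronyms.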

Fix TODO

There are a couple in the code. Note not to forget them and the master build failure. AManWithNoPlan (talk) 23:56, 25 November 2019 (UTC)

and code coverage too AManWithNoPlan (talk) 01:47, 9 December 2019 (UTC)

{{fixed}}

Wrong title?

Status
{{fixed}}
Reported by
Redalert2fan (talk) 13:44, 14 December 2019 (UTC)
What happens
title= Ой!
What should happen
title= Верни мою любовь (сериал) or "Верни мою любовь" (2014) as per other edit below.
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=Cruel_Love&type=revision&diff=930723334&oldid=930717629
We can't proceed until
Feedback from maintainers


Now, I don't speak Russian but it seems to happen because a redirect happens from https://www.kinopoisk.ru/film/verni-moyu-lyubov-2014-846894 to https://www.kinopoisk.ru/film/846894/ . I think Ой! (Oh!) is being used as an error message here, or making it clear to wait for loading.

In this other edit regarding the same website no redirect happened and the correct title was added. Redalert2fan (talk) 13:49, 14 December 2019 (UTC)

class deprecated warning

AManWithNoPlan (talk) 20:14, 19 December 2019 (UTC)

soon {{fixed}} https://github.com/ms609/citation-bot/pull/2319 AManWithNoPlan (talk) 14:01, 20 December 2019 (UTC)

Caps: AIDS Patient Care and STDS; AIDS Patient Care STDS → AIDS Patient Care and STDs; AIDS Patient Care STDs

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 12:07, 18 December 2019 (UTC)
What should happen
STDs instead of STDS, e.g. [40].
We can't proceed until
Feedback from maintainers


|DOI= is a legitimate alias of |doi=

Status
{{fixed}}
Reported by
Trappist the monk (talk) 11:47, 20 December 2019 (UTC)
What happens
bot doesn't recognize |DOI=
Relevant diffs/links
dif
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/2318 AManWithNoPlan (talk) 14:04, 20 December 2019 (UTC)

fix master build PageTest::testBotExpandWrite test

AManWithNoPlan (talk) 19:48, 21 December 2019 (UTC)

{{fixed}} Travis IP addresses are blocked. Disable test. AManWithNoPlan (talk) 20:56, 21 December 2019 (UTC)

fix wikipediabottest::testCategoryMembers test

{{fixed}} category had been cleaned. Changed category we were checking in tests suite. AManWithNoPlan (talk) 19:48, 21 December 2019 (UTC)

incomplete edit summary

Status
{{fixed}}
Reported by
Redalert2fan (talk) 18:14, 20 December 2019 (UTC)
What happens
dash in pages= changed but not mentioned in edit summary
Relevant diffs/links
[41]
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/2321 AManWithNoPlan (talk) 02:16, 21 December 2019 (UTC)

Better series handling: Antibiotics and Chemotherapy

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 16:14, 25 December 2019 (UTC)
What happens
[42] (2nd thing the bot touched in the diff)
What should happen
[43]
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/2345 🎅🏻 AManWithNoPlan (talk) 17:17, 25 December 2019 (UTC)

Better series handling: Studies in Bilingualism

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 16:33, 25 December 2019 (UTC)
What should happen
[44]
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/2345 🎅🏻 AManWithNoPlan (talk) 17:19, 25 December 2019 (UTC)

JSTOR cleanup

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 19:55, 25 December 2019 (UTC)
What should happen
[45]
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/2367 AManWithNoPlan (talk) 01:15, 31 December 2019 (UTC)

Bot down

It fails on every page. Both gadget and bot itself. Hiccup? Bigger issue? Headbomb {t · c · p · b} 23:36, 26 December 2019 (UTC)

Same here. When I click on the Citations button it gives me "Error: Citations request failed". Trying to use the bot directly redirects to a 503 page. --Ihaveacatonmydesk (talk) 16:51, 27 December 2019 (UTC)
it appears that over Christmas those with power are on vacation. AManWithNoPlan (talk) 20:34, 27 December 2019 (UTC)
The development version is still live (https://tools.wmflabs.org/citations-dev/), but it doesn't seem to do everything that https://tools.wmflabs.org/citations does, as I was trying to use it to expand abbreviated journal titles, which it didn't. Seppi333 (Insert ) 01:39, 28 December 2019 (UTC)
I wouldn’t use that version. AManWithNoPlan (talk) 01:44, 28 December 2019 (UTC)
@Seppi333: also the bot never expanded abbreviated journals. Not on its own at least. Headbomb {t · c · p · b} 04:14, 28 December 2019 (UTC)
Then how were you doing it? Seppi333 (Insert ) 04:14, 28 December 2019 (UTC)
Deleting abbreviations manually and letting the bot fill them. Then taking care of what the bot didn't do. Headbomb {t · c · p · b} 04:38, 28 December 2019 (UTC)

That seems to be it. Sad. BernardoSulzbach (talk) 19:04, 28 December 2019 (UTC)

@DBarratt (WMF), Kaldari, Mattsenate, Maximilianklein, and Smith609: anything that can be done here? You're listed as contact people on the error message/toolabs page. Headbomb {t · c · p · b} 12:48, 30 December 2019 (UTC)
I think that maintane_files.php corrupted the files. I have removed that tool from the source tree so it cannot happen again. AManWithNoPlan (talk) 13:12, 30 December 2019 (UTC)
@AManWithNoPlan: I'm getting a 503 message whenever I try to run the bot; I don't think that fixed it, at least on my end. --Nathan2055talk - contribs 21:38, 30 December 2019 (UTC)
@AManWithNoPlan: Yup, still not fixed when I tried it today. Tgeorgescu (talk) 10:06, 31 December 2019 (UTC)

Please don't ping me, I am not an operator. I can’t reboot it. AManWithNoPlan (talk) 11:52, 31 December 2019 (UTC)

Seems like this incident shows it's time to extend reboot privileges to a few other people. --Ihaveacatonmydesk (talk) 17:38, 31 December 2019 (UTC)
It's rather important to have this running --Ozzie10aaaa (talk) 17:56, 1 January 2020 (UTC)

I asked Krenair to restart the service, so it should be working now, however, there is a syntax error somewhere causing the tool to kill itself. Jonatan Svensson Glad (talk)

I've tweaked it a bit, try now. If it doesn't work, or it breaks itself again, we should probably wait for a real maintainer of the tool to sort things out. --Krenair (talkcontribs) 01:03, 2 January 2020 (UTC)
After some more fiddling around I believe it's working without any more local hacks from me. It seems the tool on toolforge had a broken file from an automatic update mechanism that is being removed, FYI maintainers: I've reset the repository in public_html from ef1ea17a4d1d2bc0adbcce6032a768f91b53ec40 to 8d755d36a9e5e023c690c47be7bf10bd5422f00 to drop the automatic local commits to constants/capitalization.php. --Krenair (talkcontribs)
Can confirm things are running now. Headbomb {t · c · p · b} 02:37, 2 January 2020 (UTC)
Thanks for fixing it. Grimes2 (talk) 09:22, 2 January 2020 (UTC)

{{fixed}}

Bot down again

Same as before. Headbomb {t · c · p · b} 23:24, 6 January 2020 (UTC)

Odd. AManWithNoPlan (talk) 00:13, 7 January 2020 (UTC)
https://en.wikipedia.org/wiki/User_talk:Krenair#Citation_bot {{fixed}} AManWithNoPlan (talk) 01:29, 7 January 2020 (UTC)

Bloomberg

When the bot goes to the Bloomberg website, it returns the title "Are you a robot?" See https://en.wikipedia.org/w/index.php?title=Pankisi&oldid=882816353 Kaltenmeyer (talk)

{{fixed}} AManWithNoPlan (talk) 11:48, 7 January 2020 (UTC)

removing links to worldcat

Status
{{notabug}}
Reported by
🌿 SashiRolls t · c 20:09, 24 November 2019 (UTC)
What happens
access to /viewport is zapped.
What should happen
access to /viewport should not be zapped.
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=Tiffany_Midge&oldid=926539338
We can't proceed until
Feedback from maintainers


I suspect this is probably a feature rather than a bug, but I don't understand why this should be a feature... seems very counter-intuitive. The difference appears to be that the deleted url led directly to the full-text whereas the OCLC field does not lead to /viewport. (not sure where to click to get there either) 🌿 SashiRolls t · c 20:09, 24 November 2019 (UTC)

"Preview this book" right below the image. AManWithNoPlan (talk) 21:22, 24 November 2019 (UTC)
"deleted url led directly to the full-text " that is simply untrue. It leads to a limited google books preview. AManWithNoPlan (talk) 21:23, 24 November 2019 (UTC)
OK, I understand a bit better now. Clicking on preview this book, and then clicking on google preview is what I missed... because I thought it was a worldcat digitization. I only scrolled through the first fifteen to twenty pages, so did not realize it was partial. I have to say it's not very user friendly to have a link to (partial) full-text labelled 1113896227 instead of just directly linked from the reference title, but then I suppose we are expecting wiki-readers to be sufficiently geeky to know that 1113896227 will lead them to more info whereas the secret code 978-1-496-21803-2 leads nowhere useful (like the bluelinks to ISBN and OCLC). Thanks for looking into it and explaining the odd logic. :) 🌿 SashiRolls t · c 22:09, 24 November 2019 (UTC)
One of the objections to including links to Google Books is that what different readers will see varies unpredictably, and may change. These WorldCat digitized previews are stable, which is a major plus. They don't allow linking to the specific page, which we've come to do, but I think the stability can only be a plus. ISBNs and OCLC numbers lead to full bibliographic info, but the reader is still stymied if they can't get access to the book, which is quite common. (Interlibrary loan is very limited for readers in most places, and we can hardly expect readers to always buy a book, or even to be able to do so in whatever country they live in.) So where's the downside of also adding a link that guarantees they can scroll to the relevant page? In particular, it's hardly a duplication at all, especially since this OCLC link is largely unknown; I had no idea it existed. Yngvadottir (talk) 22:44, 24 November 2019 (UTC)
These WorldCat digitized previews are stable, which is a major plus. Not true. The OCLC viewport link is just a link to a Google Book preview. Google books did the scanning. Worldcat simply builds a little box and links to the google scan in that box. This is the same mechanism that other websites (unrelated to google maps) use to display a little box with google maps content. The problems with google books preview that you describe above are still there. My vote is to always remove worldcat links from |url= when there is a matching |oclc= identifier.
Trappist the monk (talk) 23:16, 24 November 2019 (UTC)
This is not a vote. Wikipedia already says to not link to google books, unless it is a complete and free preview. Can someone find that policy and link it here. These are worse than google book links. They point to some random page instead of a front page or a specifically chosen page. AManWithNoPlan (talk) 17:12, 25 November 2019 (UTC)
I agree that it would be good for someone (perhaps you?) to dig up this policy that you say you've seen, as it would directly contradict the Citing Sources guideline, which I'm more familiar with. (NB: it says quite clearly that the OCLC, ISBN, etc. can coexist with the link in the citation as of this writing). 🌿 SashiRolls t · c 19:59, 25 November 2019 (UTC)
It seems part of a wider problem of what exactly should be the algorithm for handling these identifiers and links, which may not be as perfect as they are sometimes made out to be ... should there be a policy of using only the restrictive and perfect identifiers available as per here, albeit with the result of denying access to those who have no such access to a source which is available elsewhere ... this seems in line with the url blue linking approach at Wikipedia:Bots/Noticeboard#IABot blue linking to Internet archive books and Wikipedia:Bots/Noticeboard#User:GreenC bot and edit filters where the GreenC approach is not to use the ol= identifier and to use the URL. I can see there may be reasons for the approach but I would like to see evidence of clear guidelines rather than people's opinions. It is not unknown for me to go to a library or purchase a resource so oclc has its uses. Thank you. Djm-leighpark (talk) 20:17, 25 November 2019 (UTC)
This is unfortunate because archive.org is 100% viewable for free (with 1-time registration) - which is not the case for Google where you only get a partial view. With archive.org you can link to any page within a book for a free 2-page preview (no registration), which is not the case with Google which can only preview certain pages. However, understood archive.org does not have every book that Google might. In my experience Google Book scans come and go, they are not a library and take books (or previews) offline for commercial reasons so no guarantee those scans will be accessible in the future. Also Wikipedia and archive.org are non-profits with close overlap of goals, while Google is a commercial book seller with different goals, we will favor non-profits over commercial given the choice. -- GreenC 20:45, 25 November 2019 (UTC)
In some ways I'd prefer to use "open library" rather than archive.org as archive.org is at least dual purpose: one is for storing/OCR'ing and provisioning either unrestricted free or by limited library lending; the other is for archival of web pages. There perhaps may be no clashing between these BOTs but, having had two articles where it has broken the syntax on me, I'm not confident everything is on the same page, and perhaps guidelines should be updated so the old algorithms can be written and checked against them? (I may have strayed from the original bug) Djm-leighpark (talk) 20:59, 25 November 2019 (UTC)
I think you're a bit confused: openlibrary.org is a collection of catalog records to aid in the discovery of books; archive.org is the actual digital library. The archival of web pages is at web.archive.org. Nemo 21:34, 25 November 2019 (UTC)
I am sorry I am somewhat confused and ask stupid questions. It is in my nature and training.Djm-leighpark (talk) 21:43, 25 November 2019 (UTC)
Don't be sorry! It's essential to surface such misunderstandings, otherwise we're just going to talk past each other. A lot of people are confused by archive.org vs. web.archive.org etc., almost as many as wikimedia.org vs. mediawiki.org. ;-) Nemo 22:06, 25 November 2019 (UTC)

For me personally, the links to worldcat.org are completely useless because they don't load any preview at all unless I allow a series of cookies and third-party resources. Links to the splash page leading to a full text (for instance on biodiversitylibrary.org) are often useful, but I've yet to encounter a case where worldcat.org is the best link available for a given content. Nemo 21:34, 25 November 2019 (UTC)

  • Ahrons, E. L. (1954). L. L. Asher (ed.). Locomotive and train working in the latter part of the nineteenth century". Vol. six. W Heffer & Sons Ltd. OCLC 606019549. OL 21457769M. {{cite book}}: Invalid |ref=harv (help) ? Djm-leighpark (talk) 22:26, 25 November 2019 (UTC)

Mobile web

Is it possible for the bot to replace links to mobile sites such as https://m.washingtontimes.com/news/2017/may/2/peter-newsham-confirmed-as-chief-of-dc-police/ with https://www.washingtontimes.com/news/2017/may/2/peter-newsham-confirmed-as-chief-of-dc-police/ (see Special:Diff/930509786&oldid=930509645)? Jonatan Svensson Glad (talk) 00:19, 13 December 2019 (UTC)

I think so, but that might be a better task for a different bot. AManWithNoPlan (talk) 11:54, 13 December 2019 (UTC)

{{notabug}} Best for a single mass run with a different bot. AManWithNoPlan (talk) 18:32, 8 January 2020 (UTC)
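For whichever bot ends up doing such a run, the rewrite requested above could be sketched roughly as follows. This is only an illustration: the `KNOWN_MOBILE_HOSTS` table and the `desktop_url` helper are hypothetical names, and an explicit allow-list is assumed because not every `m.` host serves the same path on `www.`.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical allow-list: only hosts known to serve the same path on "www."
KNOWN_MOBILE_HOSTS = {"m.washingtontimes.com": "www.washingtontimes.com"}


def desktop_url(url: str) -> str:
    """Return the desktop form of a known mobile URL, else the URL unchanged."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host in KNOWN_MOBILE_HOSTS:
        # Swap only the host; path, query and fragment are preserved.
        parts = parts._replace(netloc=KNOWN_MOBILE_HOSTS[host])
    return urlunsplit(parts)
```

Hosts that are not in the table are returned untouched, which keeps a mass run from guessing at sites whose mobile and desktop paths differ.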

Incorrect PMC added

Here the bot added PMC 3435945 to the existing citation for PMID 19741352, the PMC is for a different paper. The PMC may be for a reprint of the cited paper but is in a different journal (also different year, volume, pages) so should not be added. What validation is the bot doing to determine that a PMC (that presumably has been found from a keyword search of PMC database) is for the correct paper? Thanks Rjwilmsi 15:53, 31 December 2019 (UTC)

I have reported the error to the database. AManWithNoPlan (talk) 17:11, 31 December 2019 (UTC)
Interesting, I can't see any data issue on the pubmed side (PMID 19741352 and PMC 3435945) - what am I missing? Thanks Rjwilmsi 18:12, 31 December 2019 (UTC)
it’s in the DOI to open source resolver. We do have lots of checks, but when the title and other things match we get fooled. AManWithNoPlan (talk) 19:50, 31 December 2019 (UTC)

{{notabug}} that we can fix. AManWithNoPlan (talk) 18:32, 8 January 2020 (UTC)

Fails on Ion channel

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 13:36, 7 January 2020 (UTC)
What happens
Fails to edit.
We can't proceed until
Feedback from maintainers


Has something to do with Biorxiv / doi = 10.1101/... Headbomb {t · c · p · b} 13:36, 7 January 2020 (UTC)

I am currently adding hundreds of test cases to the code base and have removed several functions that are not called and fixed a half dozen minor bugs. The file containing the bioRxiv code is next. I will jump ahead to that file and fix this. AManWithNoPlan (talk) 12:29, 8 January 2020 (UTC)

! User is either invalid or blocked on en.wikipedia.org

Status
new bug
Reported by
Nessie (talk) 15:20, 8 January 2020 (UTC)
What happens
Unlike earlier today, I get an error
! User is either invalid or blocked on en.wikipedia.org
Relevant diffs/links
https://tools.wmflabs.org/citations/process_page.php?slow=on&edit=webform&page=Pipefish%7CPlatymantis+bayani%7CPrasinohaema+virens%7CPseudohaje+nigra%7CPuya+compacta%7CQuasipaa%7CSchefflera+bractescens%7CSea+Turtles+911%7CSenecio+antisanae%7CSenecio+iscoensis%7CSenecio+lamarckianus%7CShrine+of+Bayazid+Bostami%7CSifaka%7CSinogastromyzon%7CSmoky+mouse%7CSorbus+admonitor%7CSportive+lemur%7CSyagrus+macrocarpa%7CTatra+National+Park%2C+Slovakia%7CTonkin+weasel%7CTristramella+magdelainae%7CVespula+atropilosa%7CWagner%27s+viper%7CWildlife+of+Ethiopia%7CWildlife+of+R%C3%A9union%7CYermo+xanthocephalus&cat=
We can't proceed until
Feedback from maintainers


I will flag as {{fixed}}, since it seems to work now. Not really our problem since it points to a wiki server problem. AManWithNoPlan (talk) 18:37, 8 January 2020 (UTC)

If chapter/title are identical, TNT them

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 04:19, 2 January 2020 (UTC)
What should happen
If you have something like
  • Christopher Min K (2007). "Structure and Function of α‐Tocopherol Transfer Protein: Implications for Vitamin e Metabolism and AVED". Structure and function of alpha-tocopherol transfer protein: implications for vitamin E metabolism and AVED. Vitamins & Hormones. Vol. 76. pp. 23–43. doi:10.1016/S0083-6729(07)76002-8. ISBN 978-0-12-373592-8. PMID 17628170.

or

  • Christopher Min K (2007). "Structure and Function of alpha‐Tocopherol Transfer Protein: Implications for Vitamin e Metabolism and AVED". Structure and function of alpha-tocopherol transfer protein: implications for vitamin E metabolism and AVED. Vitamins & Hormones. Vol. 76. pp. 23–43. doi:10.1016/S0083-6729(07)76002-8. ISBN 978-0-12-373592-8. PMID 17628170.

where |chapter= = |title= in cite book, then TNT them and get

We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/2422 AManWithNoPlan (talk) 20:15, 15 January 2020 (UTC)
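The "chapter equals title" test in this request could be sketched as below. This is not the bot's actual code (the bot is PHP, and the linked pull request is the authority); `GREEK_NAMES`, `DASHES`, `normalize` and `chapter_duplicates_title` are hypothetical names, and the substitution tables are assumptions chosen to cover the two examples given.

```python
# Assumption: illustrative substitutions only, not the bot's real table.
GREEK_NAMES = {"α": "alpha", "β": "beta", "γ": "gamma"}
DASHES = "\u2010\u2011\u2012\u2013\u2014"  # hyphen and dash variants


def normalize(text: str) -> str:
    """Fold case, spell out Greek letters, and unify dashes and whitespace."""
    for greek, name in GREEK_NAMES.items():
        text = text.replace(greek, name)
    for dash in DASHES:
        text = text.replace(dash, "-")
    return " ".join(text.casefold().split())


def chapter_duplicates_title(chapter: str, title: str) -> bool:
    """True when |chapter= and |title= differ only in case, dashes or Greek spelling."""
    return normalize(chapter) == normalize(title)
```

With this normalization both example citations above compare as duplicates, so the two fields would be candidates for being cleared and refetched.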

Change year to date when it makes sense

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 14:38, 5 January 2020 (UTC)
What should happen
[46]
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/2422 AManWithNoPlan (talk) 20:14, 15 January 2020 (UTC)

Ignore diacritics for title matching

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 15:33, 6 January 2020 (UTC)
What should happen
[47]
We can't proceed until
Feedback from maintainers


These were after TNTing both the title and the journal. TNTing the journal alone isn't enough. Headbomb {t · c · p · b} 15:33, 6 January 2020 (UTC)

https://github.com/ms609/citation-bot/pull/2422 AManWithNoPlan (talk) 20:14, 15 January 2020 (UTC)
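A common way to ignore diacritics when comparing titles is Unicode decomposition followed by dropping combining marks. The sketch below shows that general technique in Python; whether the bot does it this way is not stated here, and `strip_diacritics` and `titles_match` are hypothetical names.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Decompose characters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def titles_match(a: str, b: str) -> bool:
    """Compare two titles ignoring case and diacritics."""
    return strip_diacritics(a).casefold() == strip_diacritics(b).casefold()
```

Letters such as "ø" have no Unicode decomposition, so they would still need an explicit mapping if they matter for a given journal.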

Removing intended colons from citation title

Status
{{fixed}} for many cases
Reported by
Knuthove (talk) 19:09, 15 January 2020 (UTC)
What happens
Your bot removed two colons from the title of a citation that I think were supposed to be there
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=Indian_School%2C_Al-Ghubra&type=revision&diff=935880647&oldid=935799427
We can't proceed until
Feedback from maintainers


I came across this edit just now, where the title of the cited page is ":: ISG ::" and your bot changed it to ":: ISG". Now ":: ISG ::" isn't a very good title for a citation, but I wanted to alert you to this behaviour.

https://github.com/ms609/citation-bot/pull/2422 AManWithNoPlan (talk) 20:14, 15 January 2020 (UTC)

Semantic Scholar

I think it's not fair to replace public repositories like Zenodo, or ones that belong to a university, with Semantic Scholar. Look at the privacy policy https://allenai.org/privacy-policy.html and the trackers. Regards,

LaMèreVeille (talk) 15:22, 16 January 2020 (UTC)

{{notabug}} we do not do that. AManWithNoPlan (talk) 16:13, 16 January 2020 (UTC)

Timing out, not processing URLs

I fed the bot 2017–18 Chelsea F.C. season with "Thorough mode" ticked, and got back:

> Using Zotero translation server to retrieve details from URLs.
   ! Operation timed out after 5001 milliseconds with 0 bytes received   For URL: http://www.statto.com/football/teams/chelsea/history
   ! Operation timed out after 5000 milliseconds with 0 bytes received   For URL: http://www.skysports.com/football/news/11668/10870337/billy-gilmour-completes-move-to-chelsea-from-rangers
   ! Operation timed out after 5001 milliseconds with 0 bytes received   For URL: https://metro.co.uk/2017/05/09/daishawn-redan-reportedly-agrees-to-join-chelsea-over-manchester-united-6625627/
   ! Operation timed out after 5001 milliseconds with 0 bytes received   For URL: http://www.chelseafc.com/news/latest-news/2017/07/new-nike-kits-available-now.html
   ! Operation timed out after 5001 milliseconds with 0 bytes received   For URL: http://www.chelseafc.com/news/latest-news/2017/07/caballero-signs.html
   ! Operation timed out after 5001 milliseconds with 0 bytes received   For URL: http://www.chelseafc.com/news/latest-news/2017/07/loan-return-for-palmer.html
   ! Giving up on URL expansion for a while
 

It's been doing this for all pages I feed it since yesterday.

Running the page through without "Thorough mode" ticked does nothing with the bare URLs - David Gerard (talk) 18:12, 16 January 2020 (UTC)

{{notabug}} just high usage. AManWithNoPlan (talk) 20:48, 16 January 2020 (UTC)
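The "Giving up on URL expansion for a while" line in the log above looks like a circuit-breaker pattern: after a run of failures, further lookups are skipped for a cooldown period. A minimal sketch of that pattern follows. The class name `LookupBreaker`, the limit of six failures (taken only from the six timeouts shown in the log) and the cooldown length are all assumptions, not the bot's real settings.

```python
import time


class LookupBreaker:
    """Skip URL lookups for `cooldown` seconds after `limit` consecutive failures."""

    def __init__(self, limit=6, cooldown=300.0, clock=time.monotonic):
        self.limit = limit          # assumed value, see lead-in
        self.cooldown = cooldown    # assumed value
        self.clock = clock          # injectable so the logic can be tested
        self.failures = 0
        self.open_until = 0.0

    def allowed(self) -> bool:
        """True when lookups may be attempted."""
        return self.clock() >= self.open_until

    def record(self, ok: bool) -> None:
        """Record the outcome of one lookup."""
        if ok:
            self.failures = 0
            return
        self.failures += 1
        if self.failures >= self.limit:
            # Too many failures in a row: stop trying until the cooldown ends.
            self.open_until = self.clock() + self.cooldown
            self.failures = 0
```

Under heavy load this keeps a page run from spending five seconds on every bare URL, at the cost of leaving those URLs unexpanded until a later run.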

Fails to decapitalize

Status
mostly {{fixed}}
Reported by
Headbomb {t · c · p · b} 01:40, 7 November 2019 (UTC)
What should happen
[48]
We can't proceed until
Feedback from maintainers


I had to whack on the bot to make this happen. It should have decapitalized FRONTIERS IN IMMUNOLOGY and BIOGERONTOLOGY on its own (adding the '(journal)' pipe was me, I don't expect the bot to do that). Headbomb {t · c · p · b} 01:40, 7 November 2019 (UTC)

I think you posted the wrong edit link. But it sounds like you want us to fix fully capitalized journal names like we do titles that are all caps. Is that correct? AManWithNoPlan (talk) 15:43, 7 November 2019 (UTC)
Yes that's the wrong link. However, we already decapitalize all caps journals usually, see e.g. [49]. Headbomb {t · c · p · b} 18:45, 7 November 2019 (UTC)
fixing links currently does not work via the gadget since the bot is not logged in to query the database. It should be possible to use curl to get the same information. AManWithNoPlan (talk) 00:04, 8 November 2019 (UTC)
https://github.com/ms609/citation-bot/pull/2422 AManWithNoPlan (talk) 20:16, 15 January 2020 (UTC)

Adds empty placeholder parameters

Status
new bug
Reported by
Headbomb {t · c · p · b} 17:33, 17 January 2020 (UTC)
What happens
[50]
We can't proceed until
Feedback from maintainers


As best as I can tell, the bot ran during a git update and had mixed state files. AManWithNoPlan (talk) 17:58, 17 January 2020 (UTC)

Should consider temporarily blocking the bot from running during an update. --Izno (talk) 19:05, 17 January 2020 (UTC)
{{fixed}} and block added AManWithNoPlan (talk) 13:50, 18 January 2020 (UTC)
This isn't really fixed. The bot will fail to edit if you have an empty |title=/|journal=, probably because it's still trying to add the placeholders, and gets blocked from doing so. Headbomb {t · c · p · b} 17:15, 18 January 2020 (UTC)

That’s now {{fixed}} also. AManWithNoPlan (talk) 21:34, 18 January 2020 (UTC)
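One simple way to keep a tool from running against half-updated files is a lock file that the updater holds for the duration of the update. The sketch below illustrates that idea only; how the block was actually implemented is not described in this thread, and `update_lock` and `can_run` are hypothetical names.

```python
import os
from contextlib import contextmanager


@contextmanager
def update_lock(lock_path):
    """Held by the updater; creation fails if another update is already running."""
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)


def can_run(lock_path):
    """The bot checks this before processing a page."""
    return not os.path.exists(lock_path)
```

The updater would wrap its file changes in `with update_lock(path):`, and the bot would refuse new jobs while `can_run(path)` is false.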

Fix apostrophe

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 17:55, 17 January 2020 (UTC)
What should happen
[51]
We can't proceed until
Feedback from maintainers


Not sure what character that is, but it should be fixed. At least for this journal if this doesn't generalize. Headbomb {t · c · p · b} 17:55, 17 January 2020 (UTC)

{{wontfix}} it is a greek mark being misused. We cannot just get rid of it. AManWithNoPlan (talk) 13:53, 18 January 2020 (UTC)
It's a random acute accent. Like I said, if this doesn't generalize, then at least for this journal. Headbomb {t · c · p · b} 16:53, 18 January 2020 (UTC)
Why? Is there some reason that people who reference this journal are character use impaired? AManWithNoPlan (talk) 21:35, 18 January 2020 (UTC)
It's in the database, so if you provide doi:10.1515/bchm2.1932.208.4.129 you will get |journal=Hoppe-Seyler´s Zeitschrift für physiologische Chemie. Headbomb {t · c · p · b} 22:08, 18 January 2020 (UTC)

Error processing Kray twins page

Status
{{fixed}}
Reported by
B.Rossow · talk 23:18, 18 January 2020 (UTC)
What happens
When trying to process the Kray twins page, the tool returns "Page is a redirect. Page 'Kray_twins' not found." The page exists and is not a redirect.
What should happen
Page should be processed as expected.
Relevant diffs/links
https://tools.wmflabs.org/citations/process_page.php?edit=toolbar&slow=1&page=Kray_twins
We can't proceed until
Feedback from maintainers


presentation of handles loses useful information

Someone has been running this bot over Queensland content, e.g. [52], and it is stripping out the name of the website (which denies the reader the knowledge that it comes from a reliable source -- The State Library of Queensland) in favour of making the rather ugly handle visible to the reader. I don't have a problem with the URL being replaced with a handle, but could we make the visible text of the handle the name of the handle naming authority (if available) or, alternatively, the website/publisher (State Library of Queensland), or simply retain the name of the website/publisher where provided? Thanks Kerry (talk) 07:49, 30 December 2019 (UTC)

Actually, it looks like there are too many primary sources and not enough secondary sources and these primary sources are missing more important info like author, work & publisher. Is the library that holds the records even important?  — Chris Capoccia 💬 11:40, 30 December 2019 (UTC)
If we were talking about a random library, I would agree it probably didn't matter, but when it is the library with the statutory obligation to collect and preserve Queensland content, then I think it is a matter of relevance/reliability that it is included in their collection. That's why (if it were technically possible) the name of the handle provider should be automatically included in the citation, but if that's not possible, then leaving the website/publisher in place is the 2nd best solution. Kerry (talk) 03:17, 2 January 2020 (UTC)
The website information is incorrect. The website is hdl.handle.net. So, the original template has the library in the wrong place. Should probably be in a different field. AManWithNoPlan (talk) 18:30, 8 January 2020 (UTC)

{{notabug}}

\ce garbage title from arxiv

Status
{{fixed}}
Reported by
Izno (talk) 19:37, 7 January 2020 (UTC)
What happens
\ce garbage title from arxiv
What should happen
Not entirely sure. I followed it up with [53] but I don't think that's a general solution. The bot might guess by unescaping the {} and then swapping the \ce for <chem>.
Relevant diffs/links
[54]
We can't proceed until
Feedback from maintainers
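The guess in "What should happen" (swap `\ce` for `<chem>`) could be sketched like this. It handles only the simple, un-nested form `\ce{...}`; anything with nested braces is left untouched for a human to look at. `CE_PATTERN` and `ce_to_chem` are hypothetical names, and this is not claimed to be the fix that was applied.

```python
import re

# Matches \ce{...} where the braces contain no further braces.
CE_PATTERN = re.compile(r"\\ce\s*\{([^{}]*)\}")


def ce_to_chem(title: str) -> str:
    """Replace simple \\ce{...} markup with <chem>...</chem>."""
    return CE_PATTERN.sub(r"<chem>\1</chem>", title)
```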


Better handling of things that should be archive-url

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 18:19, 9 January 2020 (UTC)
What should happen
[55]
We can't proceed until
Feedback from maintainers


Extra long list of bogus parameter changes

Status
{{fixed}}
Reported by
Nemo 21:18, 24 January 2020 (UTC)
What happens
The list of changes in the edit summary becomes so long that the edit summary is truncated.
What should happen
Don't list the parameters of a renamed template as altered/added, perhaps? Not sure what's going on.
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=Temuan_people&diff=prev&oldid=937386557
We can't proceed until
Feedback from maintainers


I will look into that. After years of people wanting the summary to add more and more; this is a new one where there seems a lot of extra stuff listed. AManWithNoPlan (talk) 18:51, 25 January 2020 (UTC)

Since it seems to be triggering a lot with my batches, maybe it has something to do with consecutive large category scans? --~ฅ(ↀωↀ=)neko-channyan 19:31, 25 January 2020 (UTC)
Here is another example. The bot adds two parameters, |publisher= and |type= (the type addition is somewhat dubious, by the way). The edit summary is:
"Alter: doi-broken-date, title, template type, author, url, id, pages. Add: type, publisher, title-link, bibcode, doi, archive-date, archive-url, pmid, pages, issue, volume, author-link, newspaper, year, url, chapter-url, date, title. Removed parameters. Formatted dashes. Some additions/deletions were actually parameter name changes."
It is impossible to guess what actually happened from this summary. —David Eppstein (talk) 20:19, 25 January 2020 (UTC)
It's only during category runs. I figured it out and will fix it soon. AManWithNoPlan (talk) 21:55, 25 January 2020 (UTC)
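For reference, a summary that does not report renamed parameters as both an addition and a removal could be computed roughly as below. This is a sketch under assumptions: `summarize` and its `renames` argument are hypothetical, and the real edit-summary code is in the bot's PHP source.

```python
def summarize(before, after, renames=None):
    """Diff two parameter dicts, treating known renames as the same parameter."""
    renames = renames or {}
    # Map old names to new names first so a rename is not counted twice.
    before = {renames.get(k, k): v for k, v in before.items()}
    added = sorted(k for k in after if k not in before)
    removed = sorted(k for k in before if k not in after)
    altered = sorted(k for k in after if k in before and before[k] != after[k])
    return {"add": added, "alter": altered, "remove": removed}
```

With `renames={"ISSN": "issn"}`, a citation whose only real change is a new `publisher` reports just that one addition.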

Incorrect jstor causes Citation bot to make the citation even worse

Status
{{fixed}}
Reported by
David Eppstein (talk) 20:26, 25 January 2020 (UTC)
What happens
For reasons unrelated to Citation bot, the jstor ids in Special:Diff/937558195 were incorrect (their final digit was truncated). Citation bot does not notice that the journal, authors, etc of the citations have nothing to do with the listed jstor ids, and replaces valid information with information drawn from the incorrect ids, making the citations even farther from correct than they were before.
What should happen
Citation bot should recognize that there's a problem here and either flag it as a problem or give up without attempting to fix the citations
We can't proceed until
Feedback from maintainers
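One possible form of the sanity check asked for here is to compare the existing title with the title returned for the identifier and refuse to overwrite when they are too far apart. The sketch below uses Python's `difflib`; the function name `looks_like_same_work` and the 0.8 threshold are assumptions, not the bot's actual test.

```python
from difflib import SequenceMatcher


def looks_like_same_work(existing_title, fetched_title, threshold=0.8):
    """True when two titles are similar enough to trust the fetched metadata."""
    a = " ".join(existing_title.casefold().split())
    b = " ".join(fetched_title.casefold().split())
    # ratio() is 1.0 for identical strings and approaches 0 for unrelated ones.
    return SequenceMatcher(None, a, b).ratio() >= threshold
```

When the check fails, the safer outcomes are the two named in the report: flag the citation, or leave it alone.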


No authorship indicated

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 05:55, 27 January 2020 (UTC)
What should happen
[56]
We can't proceed until
Feedback from maintainers


Pubmed weirdness

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 06:56, 27 January 2020 (UTC)
What happens
[57]
What should happen
Not this part [58].
We can't proceed until
Feedback from maintainers


ZooKeys issues, still not fixed

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 11:53, 28 November 2019 (UTC)
What happens
[59]
What should happen
[60]
We can't proceed until
Feedback from maintainers


We only change it if it’s set to one. The problem is that the existing data looks reasonable with 12. AManWithNoPlan (talk) 11:59, 28 November 2019 (UTC)

Zookeys will always have issues that match the bold part of 10.3897/zookeys.772.24410. Headbomb {t · c · p · b} 12:47, 28 November 2019 (UTC)
https://github.com/ms609/citation-bot/pull/2471 AManWithNoPlan (talk) 19:45, 23 January 2020 (UTC)
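If the "bold part" of the DOI above is the number directly after `zookeys.` (an assumption, since the bolding is not visible in this archive), the issue could be read off the DOI like this. `ZOOKEYS_DOI` and `zookeys_issue` are hypothetical names.

```python
import re

# Assumed DOI shape: 10.3897/zookeys.<issue>.<article id>
ZOOKEYS_DOI = re.compile(r"^10\.3897/zookeys\.(\d+)\.\d+$", re.IGNORECASE)


def zookeys_issue(doi):
    """Return the issue number embedded in a ZooKeys DOI, or None."""
    match = ZOOKEYS_DOI.match(doi.strip())
    return match.group(1) if match else None
```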

Fix apostrophes in links

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 21:43, 4 December 2019 (UTC)
What should happen
[61]
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/2473 AManWithNoPlan (talk) 20:13, 23 January 2020 (UTC)

Bbc.com

Status
{{fixed}}
Reported by
Jonatan Svensson Glad (talk) 01:10, 15 December 2019 (UTC)
What should happen
Remove |publisher=Bbc.com when adding |work=BBC News
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=National_Congress_%28Sudan%29&diff=prev&oldid=930800719
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/2472 AManWithNoPlan (talk) 19:44, 23 January 2020 (UTC)

Caps: Sch

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 15:21, 28 January 2020 (UTC)
What happens
Sch → SCH
What should happen
Sch → Sch: [62]
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/2519 AManWithNoPlan (talk) 17:57, 28 January 2020 (UTC)

Caps: iScience

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 16:23, 28 January 2020 (UTC)
What should happen
[63]
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/2519 AManWithNoPlan (talk) 17:57, 28 January 2020 (UTC)

Don't remove : in authorlink

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 00:32, 29 January 2020 (UTC)
What should happen
[64]
We can't proceed until
Feedback from maintainers


This breaks interwikilinks. Probably the same in title-link and other -link parameters. Headbomb {t · c · p · b} 00:32, 29 January 2020 (UTC)

https://github.com/ms609/citation-bot/pull/2523 AManWithNoPlan (talk) 16:09, 29 January 2020 (UTC)

Don't add ISSNs

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 00:37, 29 January 2020 (UTC)
What should happen
Don't do this part [65]
We can't proceed until
Feedback from maintainers


These add little to no value, and there is no consensus to add those by bots. Headbomb {t · c · p · b} 00:37, 29 January 2020 (UTC)

It shouldn't be, unless the Open-Access DOI resolved to a worldcat url, which would be an issn. That's my only thought without looking at the code. We do add ISSN when removing worldcat urls since we are not "adding" it. But, if the url came in as new the code does not detect that. It's probably something else since the code is odd at times. AManWithNoPlan (talk) 11:58, 29 January 2020 (UTC)

bbc titles

Also, is it possible to see if the bot can fetch a "clean" title if it ends with - BBC News as in |title=Omar al-Bashir: How Sudan's military strongmen stayed in power - BBC News Jonatan Svensson Glad (talk) 01:11, 15 December 2019 (UTC)

{{wontfix}} since it is too likely that the new title will not be any better, and in fact worse since time has passed. AManWithNoPlan (talk) 17:18, 30 January 2020 (UTC)

Bot edits cause articles to be added to cleanup category, despite being okay

Status
new bug
Reported by
Coolabahapple (talk) 02:04, 29 January 2020 (UTC)
What happens
creates reference errors by changing from cite web to cite document and not including a periodical name (CS1 errors: missing periodical)
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=Moreton_Central_Sugar_Mill_Cane_Tramway&type=revision&diff=936862532&oldid=931890117
We can't proceed until
Feedback from maintainers


Hi, came across a number of errors in the Australian project cleanup listing (see here under "New Articles"). Citation bot has changed cite web to cite document without including a periodical name here, here, here, here. (There are most likely more as the cleanup list has over 1400 "no periodical title" listed; I've noticed that the number of this error has increased over the last few months). Coolabahapple (talk) 02:04, 29 January 2020 (UTC)

So what's the issue? Periodicals aren't required. Headbomb {t · c · p · b} 02:13, 29 January 2020 (UTC)
From the update name of this bug, it sounds like something to take to Help talk:CS1. Headbomb {t · c · p · b} 03:17, 29 January 2020 (UTC)
{{cite document}} redirects to {{cite journal}} which does require |work= or one of its aliases. The edit the bot made is incorrect. --Izno (talk) 04:11, 29 January 2020 (UTC)
The issue is that the template needs fixing. It's a leftover/oversight from the mandatory periodical thing from a few months back. It's also why those 'errors' aren't visible; several aren't actually errors. Headbomb {t · c · p · b} 04:30, 29 January 2020 (UTC)
{{notabug}} please finish discussion in the citation template talk space. AManWithNoPlan (talk) 13:25, 30 January 2020 (UTC)

Google book

Status
{{fixed}}
Reported by
Redalert2fan (talk) 15:27, 1 February 2020 (UTC)
What happens
Website= Google book not removed. example page
What should happen
https://en.wikipedia.org/w/index.php?title=%27Ubadah_ibn_al-Samit&type=revision&diff=938646764&oldid=938646419
We can't proceed until
Feedback from maintainers


Website= Google Books gets automatically removed, but when it is (misspelled as) "Google book" the bot misses it. Redalert2fan (talk) 15:27, 1 February 2020 (UTC)

https://github.com/ms609/citation-bot/pull/2542 AManWithNoPlan (talk) 17:03, 1 February 2020 (UTC)
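Catching spelling variants like the one reported here usually comes down to normalizing the value before comparing it. A rough sketch, with a hypothetical alias set and function name (the actual list is in the pull request above):

```python
import re

# Hypothetical alias list, stored in normalized form (lowercase letters only).
GOOGLE_BOOKS_ALIASES = {"googlebooks", "googlebook", "booksgooglecom"}


def is_google_books_website(value):
    """True when a |website= value is some spelling of Google Books."""
    key = re.sub(r"[^a-z]", "", value.casefold())
    return key in GOOGLE_BOOKS_ALIASES
```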

Pages= null

Hello, could (or should) the bot auto remove pages= null as I did myself here? It does not seem particularly useful to include and is probably a mistake made while being imported by someone or some tool. Redalert2fan (talk) 17:16, 1 February 2020 (UTC)

Seems like unless the book is written by a geek that thinks they are cute, that would never be right. Probably a tool or database that turned NULL into a string. AManWithNoPlan (talk) 17:36, 1 February 2020 (UTC)
{{fixed}}. AManWithNoPlan (talk) 18:28, 1 February 2020 (UTC)
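The cleanup discussed here amounts to dropping a page parameter whose value is the literal string "null". A minimal sketch, assuming citation parameters are held in a plain dict (`drop_null_pages` is a hypothetical name, and the bot's own data structures differ):

```python
def drop_null_pages(params):
    """Return a copy of the parameters without |pages=null or |page=null."""
    cleaned = dict(params)
    for name in ("pages", "page"):
        value = cleaned.get(name)
        if value is not None and value.strip().casefold() == "null":
            del cleaned[name]
    return cleaned
```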

Proquest

This was one of the edits where it completely removed a publisher of content and dumped in a random date. Not quite sure why this was done, but I reverted the edit. - NeutralhomerTalk • 04:12 on February 4, 2020 (UTC)

That's because ProQuest is not the publisher. Headbomb {t · c · p · b} 06:11, 4 February 2020 (UTC)
Actually, here it is. That's very unusual, given it's 99%+ abused to mean something hosted in a ProQuest database, rather than being actually published by ProQuest LLC. The bot should avoid removing ProQuest LLC, given that's clearly not the database. Headbomb {t · c · p · b} 06:13, 4 February 2020 (UTC)
Now looks for "LLC" {{fixed}} AManWithNoPlan (talk) 14:59, 4 February 2020 (UTC)
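The rule described in the last two comments (remove a bare "ProQuest" publisher, keep "ProQuest LLC") can be written as a small predicate. This is a sketch of the stated rule only; `should_remove_proquest_publisher` is a hypothetical name and the real check may be stricter.

```python
def should_remove_proquest_publisher(publisher):
    """True for database-style 'ProQuest' values, False when 'LLC' is present."""
    name = publisher.casefold()
    return "proquest" in name and "llc" not in name
```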

online books

Status
new bug
Reported by
Topo122 (talk) 19:44, 19 December 2019 (UTC)
What happens
Changes template type
We can't proceed until
Feedback from maintainers


In Fabian S. Woodley Citation bot changed this:

To this:

It obscures the fact that the reference is to the on-line version of the Oxford Dictionary of National Biography. And the Oxford Dictionary of National Biography is not a journal. Topo122 (talk) 19:44, 19 December 2019 (UTC)

Should be a cite dictionary / cite book. Headbomb {t · c · p · b} 20:22, 19 December 2019 (UTC)
I think whether it's online or offline is immaterial. At the end of the day, it's still a dictionary/book (and happens to be available online which was the source accessed). I do agree that it shouldn't be cite journal, but similarly it shouldn't be cite web. --Izno (talk) 22:11, 19 December 2019 (UTC)

{{cite web}} is the wrong template; better is {{cite ODNB}} (which itself uses {{cite encyclopedia}}):

{{cite ODNB |last1=Courtney |first1=W. P. |last2=Hinings |first2=Jessica |title=Woodley, George |doi=10.1093/ref:odnb/29929}}
Courtney, W. P.; Hinings, Jessica. "Woodley, George". Oxford Dictionary of National Biography (online ed.). Oxford University Press. doi:10.1093/ref:odnb/29929. (Subscription or UK public library membership required.)

Trappist the monk (talk) 22:33, 19 December 2019 (UTC)

Given https://api.crossref.org/works/10.1093/ref:odnb/29929, can all items with a DOI of type "reference-entry" use {{cite encyclopedia}}, with whatever is the "container-title" going into |encyclopedia=? The manual says it shouldn't be used for just any book with multiple authors; on the other hand, not all reference works are books. Nemo 05:15, 20 December 2019 (UTC)
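The rule proposed above can be sketched in a few lines. This is only an illustration of the suggestion, not the bot's code: the function name is hypothetical, and the sample record is hand-written to resemble the "message" object of a CrossRef response rather than fetched from the API.

```python
def template_for_crossref_work(work):
    """Pick a template name and parameters from a CrossRef-style work record.

    Only the "type" and "container-title" keys are read. Anything that is
    not a "reference-entry" is left for the caller to decide (assumption).
    """
    params = {}
    if work.get("type") == "reference-entry":
        # CrossRef returns container-title as a list of strings.
        titles = work.get("container-title") or []
        if titles:
            params["encyclopedia"] = titles[0]
        return "cite encyclopedia", params
    return None, params


# Hand-written sample record, for illustration only.
sample = {
    "type": "reference-entry",
    "container-title": ["Oxford Dictionary of National Biography"],
}
print(template_for_crossref_work(sample))
```

Whether every "reference-entry" really belongs in {{cite encyclopedia}} is the open question raised above; the sketch only shows how mechanical the mapping would be if the answer is yes.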

Didn't know about {{cite ODNB}} - very useful - I'll use it in future! Topo122 (talk) 11:42, 21 December 2019 (UTC)

{{fixed}} AManWithNoPlan (talk) 19:27, 4 February 2020 (UTC)

Zombie DOI

Status
{{fixed}} for this journal
Reported by
Whywhenwhohow (talk) 21:21, 21 January 2020 (UTC)
What happens
it removes the doi-broken-date for a broken doi and it removes the url when the broken doi doesn't resolve properly.
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=Proton-pump_inhibitor&diff=936896597&oldid=935811681
Replication instructions
The DOI 10.1111/j.1572-0241.2006.00844.x resolves to a not found page
We can't proceed until
Feedback from maintainers


The doi should be tested to verify it resolves properly before removing doi-broken-date.

Consider verifying that the page reached via both the url and doi parameters agree before removing the url in the citation.

We do verify them, but the process is not 100% perfect. This is one reason why DOIs are superior to URLs: URLs move around, and DOIs eventually follow them, and a quick Google search finds them. These transition periods are annoying. AManWithNoPlan (talk) 13:37, 22 January 2020 (UTC)
in this case the doi is in crossref. The journal is on Elsevier, the doi is owned by Blackwell, and the wrong url is Springer! The real problem is that the doi is not inactive, but wrong! The doi needs removed and replaced with a comment. AManWithNoPlan (talk) 13:50, 22 January 2020 (UTC)
What is the verification? Why does it remove the doi-broken-date for a doi that resolves to a page with a 404 status? Whywhenwhohow (talk) 21:29, 23 January 2020 (UTC)
We verify several things. But, when CrossRef has it, we take that as almost gold. AManWithNoPlan (talk) 23:23, 23 January 2020 (UTC)
will look at 404 AManWithNoPlan (talk) 23:24, 23 January 2020 (UTC)
You got me thinking. This will help and maybe fully fix it. https://github.com/ms609/citation-bot/pull/2476 AManWithNoPlan (talk) 00:50, 24 January 2020 (UTC)
And now this: https://github.com/ms609/citation-bot/pull/2477 AManWithNoPlan (talk) 01:11, 24 January 2020 (UTC)
Thanks. Whywhenwhohow (talk) 03:52, 24 January 2020 (UTC)
Still trying to figure out how to treat this properly. I have reported the DOI problem. I have done this three times before. They all have gotten resolved. Hopefully this fourth DOI complaint gets fixed too. AManWithNoPlan (talk) 22:56, 31 January 2020 (UTC)

Caps: I/i

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 00:03, 3 February 2020 (UTC)
What should happen
[66]
We can't proceed until
Feedback from maintainers


Failed to fix linked caps

Status
{{fixed}} soon
Reported by
Headbomb {t · c · p · b} 00:30, 3 February 2020 (UTC)
What should happen
[67]
We can't proceed until
Feedback from maintainers


We don't fix links with more dead links at this time. AManWithNoPlan (talk) 17:46, 4 February 2020 (UTC)

Red links are irrelevant, the point is that those should be capitalized, just like everything else. There is nothing special about a linked term vs an unlinked one. Headbomb {t · c · p · b} 22:24, 4 February 2020 (UTC)

Better detection of websites in journal parameter please

Status
{{fixed}} mostly
Reported by
Headbomb {t · c · p · b} 11:22, 3 February 2020 (UTC)
What happens
|journal=ebi8.uniprot.org → |journal=Ebi8.uniprot.org
What should happen
|journal=ebi8.uniprot.org → |website=ebi8.uniprot.org

|journal=Ebi8.uniprot.org → |website=ebi8.uniprot.org
|website=Ebi8.uniprot.org → |website=ebi8.uniprot.org

Relevant diffs/links
[68]
We can't proceed until
Feedback from maintainers


bot adds |editor1-last= and |editor1-first= when |editor-last1= and |editor-first1= are already present

Status
{{fixed}}
Reported by
Trappist the monk (talk) 20:55, 4 February 2020 (UTC)
What happens
bot adds |editor1-last= and |editor1-first= when |editor-last1= and |editor-first1= are already present
What should happen
bot should have done nothing; |editor1-last= is an alias of |editor-last1= and |editor1-first= is an alias of |editor-first1=
Relevant diffs/links
diff
We can't proceed until
Feedback from maintainers


The CrossRef code has been broken for years. I just fixed it. That seems to have exposed the lack of support for the five bazillion aliases for the same thing. Will have a fix out soon. AManWithNoPlan (talk) 22:16, 4 February 2020 (UTC)
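
One way to picture the alias problem: both spellings can be reduced to a single form before checking whether a parameter is already present. The sketch below covers only the two editor aliases named in this report; the helper names are hypothetical and this is not the fix that was deployed.

```python
import re

# Matches the "digit before the suffix" spelling, e.g. "editor1-last".
_EDITOR_ALIAS = re.compile(r"^editor(\d+)-(last|first)$")


def canonical_name(param):
    """Rewrite editor1-last style names to the editor-last1 style."""
    m = _EDITOR_ALIAS.match(param)
    if m:
        number, part = m.groups()
        return f"editor-{part}{number}"
    return param


def is_already_present(existing_params, new_param):
    """True if new_param, or an alias of it, is already in the template."""
    existing = {canonical_name(p) for p in existing_params}
    return canonical_name(new_param) in existing


print(is_already_present(["editor-last1", "editor-first1"], "editor1-last"))
```

With a check like this, the bot would skip adding |editor1-last= whenever |editor-last1= is already set.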

Author field unlinking

Status
{{notabug}}
Reported by
SilkTork (talk) 16:34, 5 February 2020 (UTC)
What happens
bot unlinks names linked in author field
What should happen
nothing
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=Brewing&diff=929503239&oldid=928812653
We can't proceed until
Feedback from maintainers


It doesn't, it links them via |authorlink=. Headbomb {t · c · p · b} 18:11, 5 February 2020 (UTC)
this edit {{fixed}} the invalid COINS data. AManWithNoPlan (talk) 18:57, 5 February 2020 (UTC)
There was no invalid COinS metadata as a result of wikilinking |author=. The edit added |pmc=, |pmid=, and |doi= to an unrelated template so the edit as a whole was not a cosmetic edit. Metadata was never an issue with the |author-link= 'fixes'.
Trappist the monk (talk) 23:53, 5 February 2020 (UTC)

Changing ISBN to isbn

Status
{{notabug}}
Reported by
SilkTork (talk) 16:36, 5 February 2020 (UTC)
What happens
changes ISBN to isbn
What should happen
nothing
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=Brewing&diff=929503239&oldid=928812653
We can't proceed until
Feedback from maintainers


|isbn= is the canonical parameter, while |ISBN= isn't wrong, having lowercase identifier parameters is better and reduces problems for AWB routines and other bots. Headbomb {t · c · p · b} 18:14, 5 February 2020 (UTC)
they have also dropped some all caps aliases recently. AManWithNoPlan (talk) 19:00, 5 February 2020 (UTC)

Curious regarding Citation Bot unlinking and relinking and changing ISBN to isbn

Copied over from User:Smith609's talkpage:

Hi. I find your bot really useful in sorting out and updating cites. However, I'm curious regarding two things.

Why does Citation Bot unlink and then relink authors, as here: [69] when it changed this: "author=Roger Protz" to this: "author=Roger Protz|author-link=Roger Protz". I checked, and the end result is the same. I understand that some people use author-link because they feel that authors in cites should have their names displayed backwards, as in "author=Protz, Roger", which can't be linked, so author link is required. (I'm not clear as to why some editors do this as citations are not listed alphabetically, and it is harder to read and recognise someone's name when it is presented backwards, but they do and others copy, so be it.) But if a name is linked in the author field, it is generally because the name is presented with the words in the right order. Your bot would only need to make changes if the name was incorrectly linked, but your bot would not know that, as even if you asked it to detect if a comma was in the link, there are names which have commas, such as Prince Edward, Earl of Wessex (though I suppose your bot could check the link to see if it is working?). If there is a problem with directly linking authors in the author field it would be useful to know, so I can make adjustments. But if there is no problem then it might be worth having a think as article watchers could be having their watchlist light up and go check over what your bot has done pointlessly.

And the ISBN number. The bot changes ISBN to isbn, but both display on the page as ISBN, and link appropriately. Is there something unseen which means that some bots or some software will not function if isbn is shown as ISBN in the cite template? Again, it would be useful to know, so I could adjust my own editing to avoid causing problems. But if it serves no real purpose, then doing it can cause a mild nuisance to article watchers.

Regards, and thanks for the work you do. SilkTork (talk) 12:17, 3 February 2020 (UTC)

Re linking of author names, see the documentation for the cite templates, e.g. Template:Cite book#Authors, which say last: Surname of a single author. Do not wikilink—use author-link instead. I don't have an answer for the ISBN change, except to say that you do not need to use the lower-case form. – Jonesey95 (talk) 13:49, 3 February 2020 (UTC)
Your link is to when surname only is used - such as "Protz" (which wikilinking would rarely result in arriving at the correct article - Protz), not as in the example I gave above where the whole name is given - Roger Protz. Separating surname and first name is done by a number of editors, though it is not helpful, as it presents the author's name backwards in a non-alphabetical list. SilkTork (talk) 21:33, 3 February 2020 (UTC)
|author= is an alias of |last=, so the instructions apply to both parameters. Wikilinks should not be used in either parameter, or in |first=. – Jonesey95 (talk) 23:40, 3 February 2020 (UTC)
Cool. You sound as if you know something Jonesey95. Why shouldn't links be used in |author=? They can be used, and they do work. So what problems are being caused by using them that aren't caused by using them in author-link= ? SilkTork (talk) 11:56, 4 February 2020 (UTC)
There is a partial explanation at Template:Cite book#COinS. Tools that use author information from WP citations end up with bad data. – Jonesey95 (talk) 15:51, 4 February 2020 (UTC)
I'm not understanding the explanation in your link as it doesn't appear to relate to your view that it is OK to put a wiki link in author-link= but not in author=. Is there some special coding in the author-link= field that makes it OK, but that special coding is absent in author= which causes "bad data"? Surely the solution (if there is such a problem) would be to put the special coding in the author= field as well? My own understanding of why we have the author-link= field is not because it has special coding to allow a wiki-link but because a) a number of editors like to place the author's name backwards in a belief that this is how we display author information in citations, but a backwards name can't be linked without piping, and I understand piping is a problem in templates, and b) because some names are disambiguated; so a separate field was created. But if the author's name is not backwards or contains a disamb - such as Michael Jackson (writer), then it can be linked. I have done this for years without, to my knowledge, breaking the internet. But as a bot has been designed to undo perfectly correct links in author= and place them elsewhere in author-link=, I'd like to know - for certain, from someone who knows - if that is actually necessary because then I will stop putting links in the author= field. But if it's just a mistaken assumption that we can't link a name in the author= field that is correctly displayed, then this bot should be adjusted. If, Jonesey95 (or anyone else), you do know for certain that harm will be done by linking a correct article name in the author= field, please point it out to me. SilkTork (talk) 18:22, 4 February 2020 (UTC)
|lastn= and |firstn= render the author's name in surname given name order. This is very commonly used in bibliographic listings so that readers can quickly locate the source when the article uses short-form (Harvard) referencing. When this form is used, |author-linkn= wikilinks both names. While it is possible to separately wikilink both the surname and the given name, that is redundant so should be avoided. I have occasionally seen cs1|2 templates where editors have only wikilinked |lastn=. I know of no technical reason why this should not be allowed.
When using |authorn=, wikilinking the assigned value is allowed because Module:Citation/CS1 (the engine that drives the cs1|2 templates) is smart enough to extract the important bits from the wikilink, piped or no, for rendering and for the citation's metadata. In author, contributor, editor, interviewer, and translator name lists any of these forms is allowed:
|authorn=[[<author name>]]
|authorn=[[<author article link>|<author name>]]
|authorn=<author name> |author-linkn=<author article link>
I know of no technical reason to prefer any one of the above over the others.
Trappist the monk (talk) 19:27, 4 February 2020 (UTC)
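
To illustrate why the three forms listed above are equivalent for rendering and metadata, here is a small sketch that reduces each of them to the same (name, link) pair. It is not the Module:Citation/CS1 implementation (which is written in Lua); the function name and the regular expression are assumptions made for the example.

```python
import re

# [[target]] or [[target|label]], with nothing else in the value.
_WIKILINK = re.compile(r"^\[\[([^\[\]|]+)(?:\|([^\[\]|]+))?\]\]$")


def split_author(author, author_link=None):
    """Return (display name, link target or None) for an author value."""
    m = _WIKILINK.match(author.strip())
    if m:
        target, label = m.group(1), m.group(2)
        return (label or target, target)
    return (author.strip(), author_link)


print(split_author("[[Roger Protz]]"))
print(split_author("[[Michael Jackson (writer)|Michael Jackson]]"))
print(split_author("Roger Protz", author_link="Roger Protz"))
```

All three calls yield a plain name plus a link target, which is the point made above: the metadata can carry the bare name whichever form the editor typed.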
ISBN is an alias for isbn in the template, and there are other tools that do the same change for that reason. By the way, you should probably discuss bot related issues on the bot page. AManWithNoPlan (talk) 14:03, 3 February 2020 (UTC)
I assumed this bot was run by Smith609, so I thought I'd reach out here first as this isn't a bot broken report, more of a query regarding the bot's operation. At this point I don't know if it is a bot problem, or if I am doing something incorrect. But I will take your advice and copy this discussion to the bot page. Thanks. SilkTork (talk) 18:33, 4 February 2020 (UTC)
  • I've just noticed on the diff [70] that it says: Activated by User:Nemo bis. I'm not familiar with how this bot works - but is it likely that Nemo bis is the one who set up the instructions for the bot to delink "author=Roger Protz", and create "author-link=Roger Protz"? SilkTork (talk) 18:38, 4 February 2020 (UTC)
    • Like all bots, it is activated by someone or something. Nemo bis activated the bot, but Nemo has no control over what the bot does. AManWithNoPlan (talk) 18:56, 4 February 2020 (UTC)
    • It means I asked the bot to work on that page. (I'm no longer using the bot.) I do agree with those changes, especially changing parameter names from "ISBN" to "isbn": it has no visible effect I know of, but it reduces confusion and errors with some things which expect the standard parameter name. "Roger Protz" was not unlinked either (I can't see any occurrence which isn't a link): the link was just expressed in another way which is more compatible with some things and which is apparently recommended by the documentation. Nemo 18:56, 4 February 2020 (UTC)
Roger Protz was unlinked. It was unlinked from one field and then created in another link. Which apparently serves no purpose according to Trappist the monk, and I'm inclined to believe them as they seem to speak knowledgeably about the template. As the edits serve no purpose, they shouldn't be done as the bot is then just making work for no valid reason; but doing those unnecessary changes will prompt article page watchers to check over the edits to make sure nothing has been broken. So, as neither the ISBN change nor the author field change do anything necessary, per WP:COSMETICBOT, could someone adjust the bot so it stops tampering with those fields. SilkTork (talk) 16:31, 5 February 2020 (UTC)
I've filled in bot reports for the author field change and the ISBN change. I've probably worded it incorrectly, but I think the intent is clear. SilkTork (talk) 16:35, 5 February 2020 (UTC)
If these are cosmetic changes, then the edit you flagged is fine: "Such changes should not usually be done on their own, but may be allowed in an edit that also includes a substantive change". Nemo 17:31, 5 February 2020 (UTC)

These are not cosmetic to users of COINS information. Other all caps aliases have been removed and template simplification is a goal as for the ISBN change. AManWithNoPlan (talk) 18:56, 5 February 2020 (UTC)

{{notabug}} since COINS data is repaired. AManWithNoPlan (talk) 15:38, 6 February 2020 (UTC)

Transforms bad interlanguage links into worse interlanguage links

Status
{{fixed}} already
Reported by
CiaPan (talk) 13:56, 6 February 2020 (UTC)
What happens
When the citation template contains an interlanguage link e.g. in the 'authorlink' parameter (which should not happen, anyway it happened), then bot removes the leading colon, thus transforming the link into a corresponding-page interlanguage link. The example edit contains two such modifications, from ':fr:Christophe Galfard' to 'fr:Christophe Galfard' and from ':nl:Thomas Hertog' to 'nl:Thomas Hertog', which spoiled the 'Other languages' links to French and Nederlands Wikipedia.
What should happen
If the leading character is a colon, test whether the next few characters are ASCII lowercase letters followed by another colon; if so, remove the whole prefix. Possibly the check could use a dictionary or a list of known language prefixes, as well as prefixes for wiki domains (like wiktionary, wikispecies and others)...?
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=Stephen_Hawking&diff=prev&oldid=936095918
We can't proceed until
Feedback from maintainers


ClueBot III escapes status templates when archiving

The archive configuration includes many of the status templates (e.g., {{tl|Fixed}}) in the |archivenow= parm. As a result, ClueBot III turns {{tl|Fixed}} into {{Tl|Fixed}} when it archives. Why is this desirable/necessary? —[AlanM1(talk)]— 09:55, 7 February 2020 (UTC)

bot changes cite web to cite ODNB but leaves |work= parameter

Status
{{Fixed}}
Reported by
Trappist the monk (talk) 12:52, 7 February 2020 (UTC)
What happens
bot changes cite web to cite ODNB but leaves |work= parameter
What should happen
remove any |work= alias parameter except the pseudo-alias parameter |encyclopedia=
Relevant diffs/links
diff
We can't proceed until
Feedback from maintainers


{{cite ODNB}} is a wrapper template of {{cite encyclopedia}}. As such it sets certain parameters to default values so that editors don't have to. One of those is:

|encyclopedia={{{encyclopedia|[[Dictionary of National Biography#Oxford Dictionary of National Biography|Oxford Dictionary of National Biography]]}}}

|encyclopedia= is one of a few parameters that masquerade as periodicals but aren't. Someday there may be a fix for that in cs1|2.

Trappist the monk (talk) 12:52, 7 February 2020 (UTC)

File breaking

Status
{{notabug}} Looks like an autoed bug in https://en.wikipedia.org/wiki/Wikipedia:AutoEd
Reported by
- Sumanuil (talk) 19:04, 7 February 2020 (UTC)
What happens
Adds space before file extensions, breaking them.
What should happen
Not this.
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=Provisional_Government_of_the_French_Republic&diff=939626984&oldid=938091794
We can't proceed until
Feedback from maintainers


Why do you believe it was Citation bot that made the error here, Sumanuil? There were 3 or 4 tools used in the mix there. --Izno (talk) 19:12, 7 February 2020 (UTC)

Actually, I'm not sure. But this is where the 'report bug' link went. - Sumanuil (talk) 19:20, 7 February 2020 (UTC)

That is https://en.wikipedia.org/wiki/Wikipedia:AutoEd I am 99% sure. I use it often, but you really have to check it, since it is not reliable AManWithNoPlan (talk) 21:18, 7 February 2020 (UTC)

converts bare arxiv url to cite document when a bibcode is found

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 04:21, 6 February 2020 (UTC)
What happens
cite document [71]
What should happen
cite arxiv [72]
We can't proceed until
Feedback from maintainers


{{cite arxiv}} does not support |bibcode=. I wonder if simply dropping the extra and mostly useless bibcode is better. AManWithNoPlan (talk) 14:41, 8 February 2020 (UTC)

I see why that code does not always run. Fixing it now. https://github.com/ms609/citation-bot/pull/2596 AManWithNoPlan (talk) 14:53, 8 February 2020 (UTC)

Makes up URL for fatally incomplete cite web

Status
{{fixed}}
Reported by
Nemo 08:28, 6 February 2020 (UTC)
What happens
When "cite web" is used to cite a document which was accessed online but doesn't have an URL, citation bot converts the "website" field into an URL, which may or may not be relevant.
What should happen
The citation doesn't get any worse with the edit by citation bot, but having an URL which points to the root of the domain obscures the fact that the citation really has no URL. Maybe in the specific linked case a {{citation}} with the website in |via= would be more correct, but I'm not sure.
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=Kenny_G&diff=prev&oldid=939381574
We can't proceed until
Feedback from maintainers


Obviously, when |website=https://XXXX.YYY.ZZZ.zzz/DSFADS/SDFDSF/DSFAD then conversion to |url= makes sense. Probably, |via= makes sense for urls that are just the hostname. AManWithNoPlan (talk) 14:18, 8 February 2020 (UTC)

https://github.com/ms609/citation-bot/pull/2595 AManWithNoPlan (talk) 14:27, 8 February 2020 (UTC)
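
The distinction drawn above (a full URL goes to |url=, a bare site root goes to |via=) can be sketched as follows. This is a guess at the shape of such a rule, not the content of the pull request; the function name is hypothetical and example.org is a placeholder domain. Values without an http(s) scheme are simply left in |website=, which is an assumption of this sketch.

```python
from urllib.parse import urlsplit


def classify_website_value(value):
    """Suggest which parameter a |website= value belongs in."""
    parts = urlsplit(value.strip())
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return "website"  # not a URL at all, leave it alone
    if parts.path.strip("/") or parts.query:
        return "url"  # points at a specific document
    return "via"  # only the root of the domain


print(classify_website_value("https://XXXX.YYY.ZZZ.zzz/DSFADS/SDFDSF/DSFAD"))
print(classify_website_value("https://example.org/"))
print(classify_website_value("example.org"))
```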

Creates broken citations by adding urls to templates with title=none

Status
{{fixed}}
Reported by
David Eppstein (talk) 02:38, 8 February 2020 (UTC)
What happens
The citation template allows citations with |title=none, but only when no url is provided, because the title is what gets linked to the url. (Other kinds of links, like doi or pmid, are allowed.) Citation bot doesn't pay attention to this restriction and adds urls to citations that have |title=none. This does not actually work to add the url to the citation, and instead causes the citation template to produce an error message.
What should happen
I'm not sure what the best thing to do is here, but not this. Maybe just not add the urls when there is nothing to link them to. Another possibility would be to change the title from none to something else, taken from the metadata it is using to match the url to the citation.
Relevant diffs/links
Special:Diff/937566836
We can't proceed until
Feedback from maintainers


Only {{cite journal}}/{{citation}} in journal mode permits |title=none. --Izno (talk) 03:10, 8 February 2020 (UTC)

At June Barrow-Green, I've twice had to remove URLs added by Citation bot that point to the wrong place in addition to breaking the templates. The links were supposed to be to reviews of a book, but they pointed to a scanned copy of the doctoral thesis the book was made from. XOR'easter (talk) 04:37, 8 February 2020 (UTC)
Thanks for catching that. In the specific case of Barrow-Green, I've added a temporary exclusion for this bot until the problem is fixed. But that's a different bug: whatever algorithm the bot is using to match up these things is faulty in this case as well. By the way, if you were wondering why one might use title=none: Because none of these reviews really has its own separate title, and because making up something like "Review of [book title]" would be redundant (they are part of a list of reviews of that book labeled as such at the top of the list). jstor:237789, for instance, is labeled by jstor as "[Untitled]", labeled by doi.org as "Poincare and the Three-Body Problem. June Barrow-Green", or labeled on the actual journal page as "June Barrow-Green. Poincaré and the Three-Body Problem. (History of Mathematics, 11.) xvi + 272 pp., illus., figs., apps., bibl., index. Providence/London: American Mathematical Society/London Mathematical Society, 1997. $49." Which of those do you use as the title? Better just to omit it. And I have also seen similar examples where the big long listing of metadata is what you get as a title from doi.org, even including the price at the end. —David Eppstein (talk) 05:03, 8 February 2020 (UTC)
I have clamped down on the OA url adding and it now requires a higher match probability before adding. The unpaywall is sometimes overly optimistic. AManWithNoPlan (talk) 13:45, 8 February 2020 (UTC)
thanks for the note on title=none not allowing a url. https://github.com/ms609/citation-bot/pull/2593 AManWithNoPlan (talk) 14:14, 8 February 2020 (UTC)
Note, removing OAI-PMH matches by title and author would affect over 3 million records. This is really an issue about reviews (and bad cataloguing thereof by publishers), which is important but affects a tiny minority of those 3 million records. The best way to handle it is to report issues to Unpaywall (I've already reported this): they are very responsive and everyone can see and share their code and data. Nemo 23:53, 8 February 2020 (UTC)
If your bot consistently uses a source of data known to be bad for a certain class of citations, and consistently breaks those citations, the problem is with your bot and its choice of data to use. Do not pass it off to other people and make it other people's work to correct your mistakes. —David Eppstein (talk) 00:46, 9 February 2020 (UTC)

Removed "::" from a title (I have no idea why)

I've reverted this part of the automated edit in 939746337. BernardoSulzbach (talk) 13:02, 8 February 2020 (UTC)

Double colons will no longer be removed. {{fixed}} AManWithNoPlan (talk) 14:13, 8 February 2020 (UTC)

MOS:DASH

Status
{{fixed}} does not try to fix incorrectly used postscript parameters
Reported by
– Arms & Hearts (talk) 12:47, 11 February 2020 (UTC)
What happens
Citation bot removes non-breaking spaces in the postscript parameter of {{cite journal}}, causing formatting that runs counter to MOS:DASH and common sense.
What should happen
Citation bot should leave postscripts, spaces, dashes, etc. alone.
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=Whites,_Jews,_and_Us&curid=60146720&diff=940211421&oldid=928828452
We can't proceed until
Feedback from maintainers


Caps: Avtomatika I Telemekhanika → Avtomatika i Telemekhanika

Self-explanatory. Headbomb {t · c · p · b} 13:33, 11 February 2020 (UTC)

{{fixed}}

look at code coverage and TODO

AManWithNoPlan (talk) 22:31, 15 December 2019 (UTC)

{{fixed}} about a dozen bugs AManWithNoPlan (talk) 22:43, 12 February 2020 (UTC)

caps

  • Also Fizika Goreniya I Vzryva → Fizika Goreniya i Vzryva Headbomb {t · c · p · b} 13:35, 11 February 2020 (UTC)

{{fixed}}

Semantic scholar

Since when has Citation bot been authorized to add Semantic Scholar URLs to citations, as it did in Special:Diff/937558164? Semantic Scholar is a web scraper that sometimes (unintentionally) copies pirated copies of papers. Because its copies do not include any information about where it found its files, they cannot be checked for being free of copyright violations and it cannot be trusted as a source for automatically-generated links. See WP:RSN#Semantic Scholar clarification request. Please immediately stop adding these links. —David Eppstein (talk) 20:34, 25 January 2020 (UTC)

It's definitely been doing this for quite a while. I've noticed the addition for the past week or so, and never considered checking to see if it was supposed to. I probably should have made a ticket when someone messaged me to complain about it, instead of assuming the bot was infallible and the human mistaken. --~ฅ(ↀωↀ=)neko-channyan 20:43, 25 January 2020 (UTC)
To clarify the addition of Semantic Scholar links to Wikipedia citations: Semantic Scholar is a free, non-profit academic search and discovery engine developed by the Allen Institute for AI (AI2). Semantic Scholar is committed to providing high-quality results that respect copyright. We have licensing agreements to index scientific content from 550+ publishers, pre-print servers and academic societies and we are integrated with multiple data partners including PubMed, Microsoft Academic, Unpaywall and others that provide us with high-quality metadata for our results. As you mention, we also crawl the web for publicly accessible open-access PDFs, but we have procedures in place to address any copyright issues that may arise (please feel free to contact us at feedback@semanticscholar.org if you notice any issues).
Our goal in incorporating links to Semantic Scholar in Wikipedia citations is to provide an additional discovery entry point for Wikipedia users to explore our open literature graph and find additional relevant information that they are unlikely to find elsewhere. For example, we provide AI-based features such as citation classifications and high-quality supplemental content like videos, presentation slides, and links to code libraries (you can see an example here). If you have any additional questions or concerns please let us know, we are happy to provide additional information. Sebaskohl (talk) 00:50, 29 January 2020 (UTC)
To address similar concerns that were highlighted here and to satisfy copyright requirements for linking we plan to do the following:
1. Add an "is_publisher_licensed" boolean flag to the Semantic Scholar API to indicate when a paper has been licensed to us for indexing by one of our 550+ publisher and academic society partners via a signed indexing licensing agreement.
2. Add logic to only insert links via the Citation Bot if the flag is set to ensure that we are linking only to licensed content (this will prevent links to content that was crawled).
Let us know if this will address the concerns that have been raised in this discussion. Sebaskohl (talk) 17:31, 29 January 2020 (UTC)

{{fixed}} AManWithNoPlan (talk) 13:24, 30 January 2020 (UTC)

The fix is incorrectly coded, I've left some comments.[73] Nemo 14:17, 30 January 2020 (UTC)
please don't comment there. I will not read any more comments hidden on a merged pull request. They are hard to find. AManWithNoPlan (talk) 15:17, 30 January 2020 (UTC)
This update will reduce things quite a bit. https://github.com/ms609/citation-bot/pull/2532 AManWithNoPlan (talk) 15:37, 30 January 2020 (UTC)
Using Semantic Scholar as the URL for the citation is counter-intuitive and confusing. When I click on the title of the citation I end up at Semantic Scholar instead of the article. Folks need to know they need to click on the DOI to reach the actual article. The Semantic Scholar URL should be in a different field/parameter of the citation. Whywhenwhohow (talk) 20:04, 31 January 2020 (UTC)
Where is the discussion and consensus to use Semantic Scholar as the URL for a citation? Whywhenwhohow (talk) 20:08, 31 January 2020 (UTC)
Adding publicly available links was discussed as long as licensed. AManWithNoPlan (talk) 22:43, 31 January 2020 (UTC)
I would like to read the discussion. Can you provide a link? According to the cite journal doc, when a DOI is present the URL parameter is supposed to be used for its prime purpose of providing a convenience link to an open access copy which would not otherwise be obviously accessible. The Semantic Scholar pages are not open access copies of the articles. It takes multiple steps to reach the actual article for users that don't know to click the DOI link instead of the title link. If the Semantic Scholar links are useful they should be provided in a separate parameter. Whywhenwhohow (talk) 00:02, 1 February 2020 (UTC)
I don’t have time to find it. But, these links are not added when the open-access system reports that the publisher DOI is free nor does it get added if the doi is flagged in the template as free. AManWithNoPlan (talk) 01:46, 1 February 2020 (UTC)
I should add that if the CiteSeerX or PMC or arXiv is already present then it won’t add either. It’s a very last resort thing now that we filter them. AManWithNoPlan (talk) 01:50, 1 February 2020 (UTC)
If the DOI is unfree, it is very unlikely that the SemanticScholar pdf is an exact copy of the publisher journal version, free, and properly licensed. When I found a SemanticScholar copy of a paper that was otherwise paywalled a couple weeks back, and (indirectly) queried SemanticScholar about how they had obtained and licensed it, their immediate response was to take it down. So my strong impression is that any use you might make of direct links to their pdfs is likely to be inappropriate: either something free elsewhere, something they would take down if they only knew about it, or something that does not accurately represent the publication. It also does not appear to match the intent discussed by their representative above, of providing links to their indexing services. Such a link would only be provided by going to their landing page for a paper, rather than a direct link to the pdf, and could be useful even for paywalled papers that they do not provide pdfs for. But it would only make sense to link to this using an id, not through the url parameter of a citation. —David Eppstein (talk) 08:10, 4 February 2020 (UTC)
Much of this discussion does not reflect the current state of the bot code. It will not add the link if there is an existing arxiv, pmc, CiteSeerX, doi-free=yup, url, or if the OA database reports the publisher is free, or if the Semantic Scholar link is scraped instead of licensed. It’s actually rare to add one. AManWithNoPlan (talk) 15:20, 4 February 2020 (UTC)
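The conditions listed in the comment above can be read as a simple filter. The following is only a minimal sketch of that reading, not the bot's actual code (the bot itself is not written in Python). The function name, the dictionary representation of template parameters, and the two boolean inputs are assumptions made for illustration; `doi-access=free` is used here as the way a DOI is flagged as free in the template.

```python
def should_add_semantic_scholar_url(params, publisher_is_free, link_is_licensed):
    """Return True only if none of the blocking conditions from the thread apply.

    params            -- hypothetical dict of citation template parameters
    publisher_is_free -- assumed result of an open-access database lookup
    link_is_licensed  -- assumed flag: licensed link (True) or scraped (False)
    """
    # An existing identifier or URL already gives the reader a way in.
    for name in ("arxiv", "pmc", "citeseerx", "url"):
        if params.get(name, "").strip():
            return False
    # The DOI is already flagged as free in the template.
    if params.get("doi-access", "").strip().lower() == "free":
        return False
    # The open-access database says the publisher copy is free.
    if publisher_is_free:
        return False
    # Scraped (unlicensed) links are never added.
    return link_is_licensed
```

With this reading, a citation that has only a paywalled DOI and a licensed link is the one remaining case where a link would be added, which matches the "very last resort" description.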
The bot appears to have changed to link to the index page of Semantic Scholar rather than to bare pdfs (or maybe it always did this when no pdf is linked): see e.g. Special:Diff/939500674. I think the link additions in this diff are completely ok from the copyright point of view. However, it is an inappropriate use of the url parameter, which should only be for links from which readers can find the paper itself. I agree with Whywhenwhohow above that this is a problem, and I would like to repeat the question: where, in the bot approval process, was this bot approved to add links of this nature to citations? —David Eppstein (talk) 21:29, 6 February 2020 (UTC)
I agree that linking to a semantic scholar page that doesn't contain a link to a PDF is utterly pointless. Headbomb {t · c · p · b} 21:32, 6 February 2020 (UTC)
I wouldn't say completely pointless: you can use those pages to find other works that cite the source, for instance. But because it is not actually a link to the paper itself it belongs in the id parameter (for lack of a designated special parameter for these links) rather than in the url parameter. —David Eppstein (talk) 21:42, 6 February 2020 (UTC)
While I can see possible utility in linking to a page that doesn't contain a link to a PDF, and am more relaxed about the use of url=, yet I will join David in questioning why (and how?) this Semantic Scholar "feature" came about. ♦ J. Johnson (JJ) (talk) 21:45, 6 February 2020 (UTC)
It appears it was added as a result of this change request on Github. I think it would probably be a good idea for the bot operator and maintainers to always request community feedback on this talk page about possible implementations of "new sources" and similar, given past history. It's not okay that this was inserted entirely off Wikipedia and flies somewhat in the face of WP:Consensus, and if I didn't feel as INVOLVED as I do about citation bot I'd have blocked the bot by now. --Izno (talk) 21:54, 6 February 2020 (UTC)
Isn't the Bot Approval Group supposed to approve significant changes in functionality? Where is their approval for this change? —David Eppstein (talk) 05:39, 7 February 2020 (UTC)
Here are some recent examples
https://en.wikipedia.org/w/index.php?title=Calcium_supplement&diff=928593068&oldid=918812972
https://en.wikipedia.org/w/index.php?title=Ceftriaxone&diff=931067948&oldid=923108617
https://en.wikipedia.org/w/index.php?title=Crohn%27s_disease&diff=929682634&oldid=929461108
https://en.wikipedia.org/w/index.php?title=DPT_vaccine&diff=928756663&oldid=927109877
https://en.wikipedia.org/w/index.php?title=Fecal_occult_blood&diff=931031388&oldid=928936257
https://en.wikipedia.org/w/index.php?title=Influenza&diff=938504673&oldid=938461227
https://en.wikipedia.org/w/index.php?title=Isoniazid&diff=937406134&oldid=936739097
https://en.wikipedia.org/w/index.php?title=Laryngopharyngeal_reflux&diff=928608763&oldid=912819138
https://en.wikipedia.org/w/index.php?title=MMR_vaccine&diff=928730115&oldid=927586891
https://en.wikipedia.org/w/index.php?title=Nicotinamide&diff=928945223&oldid=921045063
https://en.wikipedia.org/w/index.php?title=Nifedipine&diff=931143486&oldid=917258760
https://en.wikipedia.org/w/index.php?title=Oseltamivir&diff=934660610&oldid=933206612
https://en.wikipedia.org/w/index.php?title=Peanut&diff=931014845&oldid=926127787
https://en.wikipedia.org/w/index.php?title=Psoriasis&diff=928537694&oldid=927675315
https://en.wikipedia.org/w/index.php?title=Tamoxifen&diff=928017090&oldid=927879423
https://en.wikipedia.org/w/index.php?title=Hand_sanitizer&diff=939563747&oldid=939034797
https://en.wikipedia.org/w/index.php?title=Zinc_pyrithione&diff=939564935&oldid=935650865
https://en.wikipedia.org/w/index.php?title=Acetic_acid&diff=939565429&oldid=938713061
Whywhenwhohow (talk) 03:32, 7 February 2020 (UTC)

So that's why there's suddenly been an increase in semantic scholar links. An obscure repository is nowhere to get consensus, that needs to be done on Wikipedia, and there clearly isn't support for blindly adding semantic scholar links willy nilly, especially when there's no freely accessible PDF at the end of it. Headbomb {t · c · p · b} 05:49, 7 February 2020 (UTC)

I agree it's a bug to add a link if there's no PDF. [74] appears to be such a case. Unpaywall for doi:10.1080/10915810152630729 has no such error. Nemo 07:13, 7 February 2020 (UTC)
The increase came from unpaywall. We then had code implemented to greatly reduce that number being added. I will stop it for now. AManWithNoPlan (talk) 21:22, 7 February 2020 (UTC)
One problem is that the paywall lies https://api.unpaywall.org/v2/10.1080/10915810152630729?email=k@x.com AManWithNoPlan (talk) 21:24, 7 February 2020 (UTC)
Thanks AManWithNoPlan for disabling the API call for now until we figure out the right way to link to ensure there is consensus. Based on the follow-up discussion here it sounds like the right thing to do is to propose to add links to Semantic Scholar IDs as a new identifier type in the Citation Template which can then be used by the Citation Bot. This avoids instances where the URL doesn't give users direct access to the PDF, but will still give users the ability to access licensed content and leverage Semantic Scholar's discovery experience to find and discover research paper content (e.g. ability to browse citations/references, view figures and tables, view extracted snippets of information such as classified citation contexts, find supplemental content such as code libraries, videos, slides, clinical trials and more, etc. [see example]). If yes, it would be great if someone could point me to the right place where I should submit this request (I'm assuming the Citation Template Talk page?). We can then work with the Citation Bot owners to update the Semantic Scholar API call logic. Sebaskohl (talk) 21:27, 7 February 2020 (UTC)
That's a good idea. They seem to have a good set of images, links, etc. AManWithNoPlan (talk) 21:46, 7 February 2020 (UTC)
The paywall as in the big publishers? Sure, their metadata lies all the time. Unpaywall not quite: it sometimes has false negatives or false positives but that's a very small minority thanks to painstaking work over a number of years, open source code and thousands of libraries which use the software and report errors. In the example you link, [75] has no OA links to offer, just a generic link to the publisher and to the pubmed abstract. The publisher happens to have made this PDF available for now (Unpaywall would call it "bronze OA") but such PDFs vanish all the time on the publishers' websites, which are not as reliable as university-provided open archives. Nemo 21:49, 7 February 2020 (UTC)
(edit conflict) Sebaskohl, sure, you can ask a new identifier at Help_talk:Citation_Style_1; I expect there will be some questions but it can continue there. Because the identifier sometimes comes with a full text and sometimes not, it will also need an -access field, similar to the "hdl" field. Speaking of which, maybe it would be easier if you joined the Handle System, then you'd nicely fit in the existing identifiers. Nemo 21:49, 7 February 2020 (UTC)

If we have Semantic Scholar people here looking at creating a new identifier, I'd be thrilled for that. A few things though. Make the identifier short and snappy, like SemID (because SSID would be very confusing), and have a clear structure to the identifier, whether it's pure numbers (|semid=0123456789), or something more elaborate (|semid=1998.02.01.012345). Having those allows us to have validation and makes it much easier to maintain and code bots for, instead of something like |semid=1fa190b60988a4ad272e39e132bcc12b00429464 which is way too long and human-unreadable. Headbomb {t · c · p · b} 22:37, 7 February 2020 (UTC)

Thank you for the great suggestions Nemo and Headbomb! I will submit a request early next week after collecting some more feedback. The Semantic Scholar API supports redirects using a doi (e.g. http://api.semanticscholar.org/10.1038/nrn3241) which we can use as the identifier instead of our long IDs: (|semid=10.1038/nrn3241). Sebaskohl (talk) 23:16, 7 February 2020 (UTC)
@Sebaskohl: while it's a nifty feature to implement a DOI resolver (it makes it easy to find papers on SS, at least those with DOIs), several papers hosted on SS won't have DOIs, and it would generally make for a poor identifier and cause increased confusion between what is a semantic scholar link, and what's a non-semantic scholar link. Headbomb {t · c · p · b} 23:22, 7 February 2020 (UTC)
I also find it concerning that someone appearing to represent Semantic Scholar is here, apparently working with the goal of incorporating more links to their commercial site into the encyclopedia rather than with the goal of improving the encyclopedia, and with no user-page disclosure of the WP:COI. That is not what the encyclopedia is for and it appears to be a violation of the Wikimedia policies on undisclosed paid editing. —David Eppstein (talk) 00:26, 8 February 2020 (UTC)
Disclosure would be nice, but let's not throw unnecessary epithets. Semantic Scholar is proprietary, but it's not commercial as far as I can tell; moreover, the Allen Institute is a 501(c)(3). Nemo 00:40, 8 February 2020 (UTC)
Honestly I wish ResearchGate and IEEE would do the same and help us expand their references. AManWithNoPlan (talk) 02:22, 8 February 2020 (UTC)
Thank you for the additional feedback Headbomb! Much appreciated. We'll hold off on submitting the request for the identifier until we have a good solution in place that makes sense in terms of best practices. Also, apologies for not making it clearer earlier that I'm part of the Semantic Scholar team (I was hoping the initial overview that I provided in the conversation was sufficient). Semantic Scholar is a free and non-profit academic search and discovery engine developed by the Allen Institute for AI that does not generate any revenue (our site is free of advertising and always will be). Our mission is to contribute to humanity through high-impact AI research and engineering. Here's an example of a sub-project called Supp.ai that we launched last year to identify supplement-drug interactions in scientific literature (a highly unregulated industry) that showcases the type of research that we work on. We also open source our data and code whenever possible (subject to our content licensing agreements). I'm happy to provide additional context as needed! Sebaskohl (talk) 15:54, 10 February 2020 (UTC)
Perhaps |semanticscholar=Y would be the way to go, and it would use the DOI to make the link. That way the parameter could not be vandalized. Also, it would require a DOI first. Lastly, it would make for a pretty link, when all it said was something like See on SS, but better phrased than SS AManWithNoPlan (talk) 19:10, 10 February 2020 (UTC)


Flagging as {{fixed}} to archive it: most of the issues were resolved. The remaining ones are more of a template format/design issue. AManWithNoPlan (talk) 12:33, 15 February 2020 (UTC)

Citation bot

When the Citation bot is changing ODNB citations to the {{cite ODNB}} template, is there a way it could also remove the {{ODNBsub}} template associated with the citation? If not, we end up with references that look like this: "Amery, John (1912–1945)". Oxford Dictionary of National Biography (online ed.). Oxford University Press. 2006. doi:10.1093/ref:odnb/37112. (Subscription or UK public library membership required.) (subscription or UK public library membership required). Thanks - SchroCat (talk) 17:16, 12 February 2020 (UTC)

I think this will do it once I deploy it. https://github.com/ms609/citation-bot/pull/2632 AManWithNoPlan (talk) 21:29, 13 February 2020 (UTC)
{{fixed}} AManWithNoPlan (talk) 12:31, 15 February 2020 (UTC)
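The request in this section amounts to dropping an {{ODNBsub}} that directly follows a {{cite ODNB}}, since the cite template already prints the subscription note. A rough sketch of that idea follows; the regular expression and function name are illustrative assumptions, not what the linked pull request does, and the pattern deliberately ignores cite templates that contain nested templates.

```python
import re

# A {{cite ODNB ...}} with no nested templates, followed by {{ODNBsub}}.
ODNB_SUB_AFTER_CITE = re.compile(
    r"(\{\{\s*cite ODNB\b[^{}]*\}\})\s*\{\{\s*ODNBsub\s*\}\}",
    re.IGNORECASE,
)

def drop_redundant_odnbsub(wikitext):
    """Remove an {{ODNBsub}} that immediately follows a {{cite ODNB}}."""
    return ODNB_SUB_AFTER_CITE.sub(r"\1", wikitext)
```

An {{ODNBsub}} that follows some other template, such as {{cite web}}, is left alone, because in that case the note is not duplicated.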

Deleting used parameters?

Status
{{notabug}} user misunderstood edits and how templates work
Reported by
A876 (talk) 18:54, 14 February 2020 (UTC)
We can't proceed until
Feedback from maintainers


I noticed one change that puzzled me, and I found some more. (This bot does smart and useful work, but it might be hiccuping.)

  • It deleted "|format= PDF" for a working link to a PDF file. (A direct link, not an abstract or download page.) This is bad optics, if it's not actually wrong.
  • It deleted a blank "|first=" (okay), but it left (in the same cite) the subsequent "|date=|website=" (not sure of guidelines).
  • It deleted "|date=2017-09-08" from inside "|url-status=live|date=2017-09-08}}". (It was after "|url-status=live". (But "|url-status=live" shouldn't even be last; it should precede "archive-url=" and "archive-date=".)) - A876 (talk) 18:54, 14 February 2020 (UTC)
The removal of format=PDF has been widely discussed. As for the date, I see the opposite: it added the date. Nemo 20:11, 14 February 2020 (UTC)
Re. "|format= PDF": Having been "widely discussed" is not a passive act on the part of deleting "|format= PDF". (Less cryptically,) I looked at the documentation for Template:Cite web and I saw nothing about "|format=" being deprecated, disused, or delete-able. Instead, it is still included in two examples. That means the bot is deleting this field even as humans are adding it. If deleting "|format=" has been so "widely discussed" that a consensus was found and a decision was made (citation needed), then that fact must first be added to the template's documentation, and then the template must be altered to ignore the parameter. After that, bots can incidentally or systematically delete the disused field. Anything else LOOKS LIKE a bot running amok, carrying out an unsourced, un-agreed, counterproductive directive.
Re. "|date= ...": Oops. I might have seen another problem in a random look, but I lost the place because new edits keep pushing old entries down on the user-contributions page, and in too much of a hurry I mis-grabbed what I mis-perceived as another example. - A876 (talk) 06:06, 15 February 2020 (UTC)
That’s all good. I would much rather you show up and say I see a bug and it turn out to not be one than have you not mention it. We have people show up and say 'there is this bug I have been seeing for years and ....' and that is annoying. AManWithNoPlan (talk) 12:37, 15 February 2020 (UTC)

author link and inventive editors (2)

Status
{{fixed}}
Reported by
Trappist the monk (talk) 14:23, 4 December 2019 (UTC)
What happens
|author1=[[Robert Jay Charlson|Charlson]] |first1=R. J.|last1=[[Robert Jay Charlson|Charlson]] |first1=R. J. ... |author1-link=Robert Jay Charlson |author1=Charlson
What should happen
first:
|author1=[[Robert Jay Charlson|Charlson]]|last1=[[Robert Jay Charlson|Charlson]]
then:
|last1=[[Robert Jay Charlson|Charlson]] |first1=R. J.|last1=Charlson |first1=R. J. |author-link1=Robert Jay Charlson
or, do nothing because |last1= and |author1= are equal aliases
We can't proceed until
Feedback from maintainers
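The second transformation in the report above (moving a piped wikilink out of |last1= and into |author-link1=) can be sketched as follows. This is an illustration only: the function name and the dictionary representation of parameters are assumptions, and the sketch handles only the simple piped form `[[Target|Label]]`.

```python
import re

# Matches a value that consists solely of a piped wikilink: [[Target|Label]]
PIPED_LINK = re.compile(r"^\[\[([^\[\]|]+)\|([^\[\]|]+)\]\]$")

def unlink_last_name(params, n=1):
    """Return a copy of params with lastN unlinked and author-linkN set."""
    result = dict(params)
    last_key = f"last{n}"
    match = PIPED_LINK.match(result.get(last_key, "").strip())
    if not match:
        return result  # nothing to do: value is not a single piped link
    result[last_key] = match.group(2).strip()
    # Do not overwrite an author link that an editor already supplied.
    result.setdefault(f"author-link{n}", match.group(1).strip())
    return result
```

Applied to the example from the report, `|last1=[[Robert Jay Charlson|Charlson]] |first1=R. J.` becomes `|last1=Charlson |first1=R. J. |author-link1=Robert Jay Charlson`.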


Update year when adding pagination

Status
{{fixed}}
Reported by
Martin (Smith609 – Talk) 09:40, 6 January 2020 (UTC)
What happens
When updating "in press" pagination of pp. 1--6 with "published" pagination, "year" is left as is.
What should happen
Year could be updated at the same time (paper was in press in 2019, but published in print with final "year" of 2020)

Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=Nectocotis&diff=prev&oldid=934401077
We can't proceed until
Feedback from maintainers


I will look into this. This year I am not giving up Wikipedia for Lent. AManWithNoPlan (talk) 17:06, 16 February 2020 (UTC)

Grove

  • Grove Music Online was converted to cite journal here, removing the access date, the subscription info, and I think causing "CS1 errors: missing periodical".
  • This also removed the access date and the subscription info, and probably generated the same error message. Grove isn't a journal. EddieHugh (talk) 22:53, 4 February 2020 (UTC)
Probably should be cite document instead. AManWithNoPlan (talk) 23:01, 4 February 2020 (UTC)
Perhaps {{GroveOnline}} is a better choice. {{cite document}} is merely a redirect to {{cite journal}} which requires |journal= or some other periodical parameter.
Trappist the monk (talk) 16:39, 5 February 2020 (UTC)
{{GroveOnline}} has problems of its own. EddieHugh (talk) 18:00, 5 February 2020 (UTC)
@Trappist the monk: {{cite document}} shouldn't require |journal=. Headbomb {t · c · p · b} 18:18, 5 February 2020 (UTC)
There is no template called {{cite document}}. The thing that is {{cite document}} is just a redirect to {{cite journal}}. It is {{cite journal}} that {{#invoke}}s Module:Citation/CS1. The module has no knowledge of {{cite document}}; never has.
Trappist the monk (talk) 23:46, 5 February 2020 (UTC)
@Trappist the monk: I'm aware of what the current status is. I'm saying what it should be. Headbomb {t · c · p · b} 00:32, 6 February 2020 (UTC)

10.1093 DOI non-journal special case code added for when converting url to doi. {{fixed}} AManWithNoPlan (talk) 12:29, 16 February 2020 (UTC)

Should not automatically convert work= to publisher=

Status
{{not a bug}}
Reported by
David Eppstein (talk) 01:38, 16 February 2020 (UTC)
What happens
{{cite web}} with publisher= parameter gets it renamed into work=
Relevant diffs/links
Special:Diff/941004586
We can't proceed until
Feedback from maintainers


In the diff that I gave above, the conversion from publisher to work is correct: Forbes is a magazine, not a magazine publisher. However, in many cases cite web is used for stand-alone titles that are not part of any larger work, or with a listed publisher (such as a news organization) instead of a work (the name of the publication in which the news organization published the reference). In those cases, leaving work empty and having a non-empty publisher field can be correct. Help:Citation Style 1 says that the publisher field should not be used to italicize metadata that really is the name of the work or website, but it does not say (and should not say) to avoid empty work and nonempty publisher. So unless Citation bot is a lot smarter than I expect it to be in understanding which publisher names really are work names and which are not, it should leave this field alone. —David Eppstein (talk) 01:38, 16 February 2020 (UTC)

Hmm. In all the examples I found the converted publisher/work is the name of a major newspaper or magazine. If this conversion is only done for a short whitelist of titles, rather than for all empty work nonempty publisher combinations, I think it could be ok (not a bug). I did find one other example, that puzzled me, though: in Special:Diff/940842479 the bot failed to convert "Los Angeles Times" from publisher to work. Is the LA Times not on the whitelist, or was it confused by the explicit empty work parameter that it removed? —David Eppstein (talk) 01:54, 16 February 2020 (UTC)
you are correct, it is a whitelist. LA Times added. For almost nothing except actual books most people mean work when they say publisher, but we use a whitelist since it’s not 100%. {{notabug}} AManWithNoPlan (talk) 12:25, 16 February 2020 (UTC)
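The whitelist approach described in the reply above can be sketched like this. The two titles in the set are simply the ones named in this thread; the real list, its matching rules, and all names below are assumptions made for illustration.

```python
# Hypothetical whitelist: only the two titles mentioned in this thread.
WORK_WHITELIST = {"forbes", "los angeles times"}

# Parameters that already name the containing work (assumed alias list).
WORK_ALIASES = ("work", "website", "newspaper", "magazine", "journal")

def publisher_to_work(params):
    """Move publisher= to work= only for whitelisted periodical names."""
    result = dict(params)
    publisher = result.get("publisher", "").strip()
    has_work = any(result.get(alias, "").strip() for alias in WORK_ALIASES)
    if publisher.lower() in WORK_WHITELIST and not has_work:
        result["work"] = publisher  # also fills an explicit empty work=
        del result["publisher"]
    return result
```

A publisher that is not on the list, such as a university press or a news organization, is left in publisher= with work= empty, which is the case David Eppstein argues must remain valid.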

It deletes "|format=_", even though {cite _} documentation shows it in examples

{{not a bug}}

This bot deleted "|format= PDF" for working links to PDF files. (Direct links, not links to abstracts or download pages.)
At Special:Diff/940164519 2020-02-10T17:34:18 and Special:Diff/940799133 2020-02-14T13:28:21.

I looked in the one location that makes sense to me, the documentation for Template:Cite web. It includes "|format=PDF" in three examples! I saw nothing about "|format=_" being deprecated, disused, or delete-able.

Citation bot deletes a parameter that humans are still advised to add.

IMHO, WTH? This is not a technical error; the bot isn't running amok. This is a policy error; it looks like a coder has run amok, giving the bot a directive that has no basis. (Does Wikipedia need another layer of watchers?)

(A reply to my prior report was "The removal of format=PDF has been widely discussed. ...." Nemo 20:11, 14 February 2020 (UTC)".) (I updated my prior report, but it got archived.)

I don't know whether deleting "|format=_" "has been widely discussed", or where to look. (I looked in one location that must agree.) Presumably a consensus was found and a decision was made. (Link please?)

Either way, action is required:

  • If it was decided to delete "|format=_", then it must be carried out sensibly (if retroactively). 1) The template's documentation must be adjusted. 2) The template must be altered to ignore the parameter. 3) After that, it is legitimate for editors and bots to incidentally (or systematically) delete the disused parameter.
  • If it was not decided, then this bot must stop undoing what the documentation suggests.

This bot does smart and useful work. Why does it also do something that is contradicted by template documentation? - A876 (talk) 20:26, 16 February 2020 (UTC)

|format=PDF is automatically added by templates, the documentation is out of date and having it in the edit window serves no purpose whatsoever. Headbomb {t · c · p · b} 20:35, 16 February 2020 (UTC)
See User talk:Citation bot/Archive 13#Remove format=pdf and variants when URLs end in .pdf for more details. Headbomb {t · c · p · b} 20:48, 16 February 2020 (UTC)

Explicit |format=pdf is not required as indeed the module automatically sets the parameter where it detects the file to be a PDF (to wit, I believe that is only URLs ending with .pdf--that's just from memory and it would be trivial to find the function in the code). There may be some cases where |format=PDF is preferred, as in the case of something like https://example.com/pdf/N1234; I do not believe Citation bot makes changes on such citations, but I could be wrong. --Izno (talk) 22:02, 16 February 2020 (UTC)

It doesn't. Headbomb {t · c · p · b} 22:06, 16 February 2020 (UTC)
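The rule discussed in this section (an explicit |format=PDF is redundant only when the URL itself ends in .pdf) can be sketched as a small check. This follows Izno's from-memory description rather than the module's actual code, and the function name and parameter dictionary are illustrative assumptions.

```python
def is_redundant_pdf_format(params):
    """True when format=PDF adds nothing because the URL already ends in .pdf."""
    fmt = params.get("format", "").strip().lower()
    url = params.get("url", "").strip().lower()
    return fmt == "pdf" and url.endswith(".pdf")
```

For a URL such as https://example.com/pdf/N1234 the check returns False, so the explicit |format=PDF is kept, which is the case Izno raises and Headbomb confirms the bot leaves alone.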

Fails to add bibcodes

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 22:13, 18 February 2020 (UTC)
What should happen
[76]
We can't proceed until
Feedback from maintainers


Gotta love changing APIs. AManWithNoPlan (talk) 23:07, 18 February 2020 (UTC)

More DOIs for IEEE citations

According to a query, IEEE URLs remain among the most intractable for Citation bot: there are some 2-3000 which resist metadata fixes, largely because they don't have a DOI and the usual technical limitations make it hard to find one. Matching over the document/AR number in the CrossRef dump, I believe I can make a list of URLs linked in our articles and their corresponding DOI. Then, it would need to be added by a bot, probably with a regex replacement: is there some bot or AWB operator here interested in doing it? Nemo 10:12, 16 December 2019 (UTC)

Sounds like an AWB bot. It seems that https://ieeexplore.ieee.org/document/##### often seems to have a DOI of 10.1109/JOURNAL_CODE.YEAR.##### which means that there is probably a unique number there. AManWithNoPlan (talk) 17:39, 17 February 2020 (UTC)
Looks like one can get an account (probably a good job for AWB to do the initial run, but this bot could use a key for long-term purposes). https://developer.ieee.org/docs/read/Metadata_API_details https://developer.ieee.org/member/register AManWithNoPlan (talk) 17:45, 17 February 2020 (UTC)
Maybe IEEE would give you a spreadsheet that has ALL DOIs and numerical IDs in them? AManWithNoPlan (talk) 17:52, 17 February 2020 (UTC)
No need, I can make such a spreadsheet myself from a CrossRef dump. As soon as someone has a use for it, I can produce the list of substitutions needed. Nemo 20:39, 17 February 2020 (UTC)
How big a dump? AManWithNoPlan (talk) 22:29, 17 February 2020 (UTC)
Some tens of GB IIRC, why? Nemo 00:06, 18 February 2020 (UTC)
Just enhanced bot to get a lot more IEEE doi's. AManWithNoPlan (talk) 01:15, 18 February 2020 (UTC)
Tens of GBs seems like a lot more than I would think. AManWithNoPlan (talk) 01:30, 18 February 2020 (UTC)

Flagging as {{fixed}} for this bot. I have made some significant improvements to the bot, but a massive table is not our style nor would it hit the non-template URLs. AManWithNoPlan (talk) 16:32, 20 February 2020 (UTC)
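The lookup Nemo proposes (match the document number in an IEEE Xplore URL against a table built from a CrossRef dump) could look roughly like the sketch below. The URL pattern is the one quoted in the thread; the mapping, its contents, and both function names are made up for illustration and are not real data.

```python
import re

# Pattern for the URL form quoted in the thread.
IEEE_DOCUMENT_URL = re.compile(r"^https?://ieeexplore\.ieee\.org/document/(\d+)/?$")

def ieee_document_number(url):
    """Return the numeric document ID from an IEEE Xplore URL, or None."""
    match = IEEE_DOCUMENT_URL.match(url.strip())
    return match.group(1) if match else None

def doi_for_ieee_url(url, doi_by_number):
    """Look the document number up in a prebuilt number-to-DOI table."""
    number = ieee_document_number(url)
    return doi_by_number.get(number) if number else None

# Hypothetical table entry; a real table would come from the CrossRef dump.
example_table = {"1234567": "10.1109/EXAMPLE.2020.1234567"}
print(doi_for_ieee_url("https://ieeexplore.ieee.org/document/1234567", example_table))
```

Returning None for anything that does not match keeps the substitution conservative: a URL with no table entry is simply left untouched.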

Override volume=in press

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 19:05, 19 February 2020 (UTC)
What should happen
[77]
We can't proceed until
Feedback from maintainers


Removed a freely accessible url link that wasn't actually a repeated unique identifier

Status
{{notabug}}
Reported by
Biosthmors (talk) 16:25, 20 February 2020 (UTC)
What happens
A useful URL was removed because it was misclassified as a duplicate of a unique identifier
What should happen
This url should not be recognized as a match for a duplicate identifier
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=Deep_vein_thrombosis&type=revision&diff=941761296&oldid=941687672
We can't proceed until
Feedback from maintainers


The DOI resolves to the URL in question. --Izno (talk) 16:29, 20 February 2020 (UTC)

Keep authors together

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 00:29, 21 February 2020 (UTC)
What happens
[78]
What should happen
[79]
We can't proceed until
Feedback from maintainers


That's a cosmetic bug that predates me. https://github.com/ms609/citation-bot/pull/2681 AManWithNoPlan (talk) 14:02, 21 February 2020 (UTC)

incorrectly converted bare reference on Macbeth (1948 film)

hello @Smith609: incorrectly converted bare reference on Macbeth (1948 film). see diff Special:Diff/941880753. Leela52452 (talk) 06:48, 21 February 2020 (UTC)

This will fix that. https://github.com/ms609/citation-bot/pull/2682 AManWithNoPlan (talk) 14:14, 21 February 2020 (UTC)

{{fixed}} AManWithNoPlan (talk) 16:58, 21 February 2020 (UTC)

cauthors are not vauthors

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 19:15, 21 February 2020 (UTC)
What happens
[80]
What should happen
[81]
We can't proceed until
Feedback from maintainers


Don't use arxiv to supersede existing dates

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 22:35, 23 February 2020 (UTC)
What happens
[82]
What should happen
[83]
We can't proceed until
Feedback from maintainers


had an = where an === should have been AManWithNoPlan (talk) 11:55, 24 February 2020 (UTC)

Handles list expansion

Headbomb will provide a list of Handle providers that we will add to our constants files AManWithNoPlan (talk) 19:03, 16 October 2019 (UTC)🤔

Time to call in Leeroy Jenkins to extract the handles. AManWithNoPlan (talk) 22:18, 31 October 2019 (UTC)
Feel free to work on User:Headbomb/Sandbox and see which prefix resolves or not. Headbomb {t · c · p · b} 23:45, 31 October 2019 (UTC)

{{wontfix}} will just add as needed. AManWithNoPlan (talk) 17:28, 24 February 2020 (UTC)

Disruptive line break replaced with space

Status
{{wontfix}} since 99% of the time this is an improvement, and in this case it just changes one bad value into a different bad value.
Reported by
kennethaw88talk 03:27, 20 February 2020 (UTC)
What happens
incorrectly replaces newline with space in the middle of a year value in the accessdate
Relevant diffs/links
Special:diff/940129006
We can't proceed until
Feedback from maintainers


Pretty hard to know what causes the error, given if you have a line break in the middle of year, it's very likely that the field is further garbage, like '20 08' for 20 August. Headbomb {t · c · p · b} 14:21, 24 February 2020 (UTC)
I do not see how a bot could intelligently fix this any better. AManWithNoPlan (talk) 14:30, 24 February 2020 (UTC)

Expand arxiv into cite arxiv similar to doi→cite journal/book

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 00:26, 21 February 2020 (UTC)
What should happen
[84]
We can't proceed until
Feedback from maintainers


I assume you mean to do this only when the arxiv template is the only content of a footnote. Otherwise, we'll run into big trouble expanding it in citations where only the arXiv identifier is intended (for instance, as part of larger manually-formatted citations). This happens very very rarely: I did a search for insource:"<ref>{{arxiv" and found two (among some 37 hits, the rest of which did not have the template as the sole content of the footnote). It seems unlikely to be problematic, but is it worth the effort? —David Eppstein (talk) 00:44, 21 February 2020 (UTC)
Yes, <ref>{{arxiv|1006.0499}}</ref> → <ref>{{cite arxiv |arxiv=1006.0499}}</ref>, same as <ref>https://arxiv.org/abs/1006.0499</ref> → <ref>{{cite arxiv |arxiv=1006.0499}}</ref> Headbomb {t · c · p · b} 04:31, 21 February 2020 (UTC)
Same for the other identifier templates it can handle, {{bibcode}}, {{jstor}}, etc... Headbomb {t · c · p · b} 15:43, 21 February 2020 (UTC)
https://github.com/ms609/citation-bot/pull/2697 AManWithNoPlan (talk) 17:43, 24 February 2020 (UTC)
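The rewrite agreed on in this thread, restricted to footnotes whose only content is the {{arxiv}} template, can be sketched with a single regular expression. This is an illustration of the rule as discussed, not the code in the linked pull request; the pattern and function name are assumptions, and named references such as `<ref name="x">` are deliberately not handled.

```python
import re

# A footnote whose sole content is {{arxiv|ID}}, allowing stray whitespace.
SOLE_ARXIV_REF = re.compile(
    r"<ref>\s*\{\{\s*arxiv\s*\|\s*([^|{}]+?)\s*\}\}\s*</ref>",
    re.IGNORECASE,
)

def expand_sole_arxiv_refs(wikitext):
    """Turn <ref>{{arxiv|ID}}</ref> into <ref>{{cite arxiv |arxiv=ID}}</ref>."""
    return SOLE_ARXIV_REF.sub(r"<ref>{{cite arxiv |arxiv=\1}}</ref>", wikitext)
```

Because the pattern requires the template to sit directly between `<ref>` and `</ref>`, a manually formatted citation that merely ends with {{arxiv|...}} is left unchanged, which addresses the concern David Eppstein raises.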

FDA web site is not a journal

Status
{{fixed}}
Reported by
Whywhenwhohow (talk) 03:42, 22 February 2020 (UTC)
What happens
converts cite web to cite journal for FDA pages. Shortens full FDA name to possibly unknown abbreviation.
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=Live_attenuated_influenza_vaccine&diff=941984391&oldid=938233344
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/2694 AManWithNoPlan (talk) 14:09, 24 February 2020 (UTC)

arxiv links should expand to cite arxiv, not cite documents

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 22:08, 23 February 2020 (UTC)
What happens
[85]
What should happen
[86]
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/2695 AManWithNoPlan (talk) 14:10, 24 February 2020 (UTC)

https://github.com/ms609/citation-bot/pull/2696 AManWithNoPlan (talk) 14:29, 24 February 2020 (UTC)

Bot removed URL, then complained of missing URL

Status
{{not a bug}}
Reported by
DragonHawk (talk/hist) 18:12, 26 February 2020 (UTC)
What happens
Bot edit summary included "Removed URL that duplicated unique identifier. Removed accessdate with no specified URL." It seems to me it shouldn't make a change and then complain about that change. In more practical terms, the URL and accessdate are relevant for things like editorial monitoring, retrieval from web archives, and robustness in the face of other systems changing. Maybe this was operator error, but if so, maybe the bot should warn the operator.
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=Scorotron&type=revision&diff=929724129&oldid=919029104
We can't proceed until
Feedback from maintainers


The bot's actions are correct; they are just a little odd in the phrasing. AManWithNoPlan (talk) 18:15, 26 February 2020 (UTC)

Adding broken bioRxiv DOIs

Status
{{fixed}}
Reported by
Logan Talk Contributions 03:02, 29 February 2020 (UTC)
What happens
The bot is adding broken bioRxiv DOIs (which it then marks as broken). It looks like it's taking whatever is after /content/ in the URL and assuming it's the DOI, but that's not correct. You need to remove the PDF suffix and version, if applicable.
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=Severe_acute_respiratory_syndrome-related_coronavirus&diff=prev&oldid=943141569
We can't proceed until
Feedback from maintainers


Thank you. AManWithNoPlan (talk) 18:39, 29 February 2020 (UTC)
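
A minimal sketch of the clean-up described in the report, assuming the DOI sits directly after /content/ and is followed by an optional version marker (v1, v2, ...) and an optional suffix such as .full.pdf. The function name and the list of suffixes are assumptions for illustration, not the bot's code.

```python
import re
from urllib.parse import urlparse

# Hypothetical list of trailing suffixes seen on bioRxiv content URLs.
_SUFFIX = re.compile(r"\.(?:full\.pdf\+html|full\.pdf|full|abstract|pdf)$")
_VERSION = re.compile(r"v\d+$")
_DOI_PATH = re.compile(r"^/content/(10\.1101/[^/]+)$")


def biorxiv_doi_from_url(url):
    """Return the DOI embedded in a bioRxiv URL, or None if none is recognised."""
    parsed = urlparse(url)
    if parsed.hostname not in ("biorxiv.org", "www.biorxiv.org"):
        return None
    match = _DOI_PATH.match(parsed.path)
    if match is None:
        return None
    candidate = _SUFFIX.sub("", match.group(1))  # drop ".full.pdf" and similar
    return _VERSION.sub("", candidate)           # drop the trailing version, e.g. "v2"


print(biorxiv_doi_from_url(
    "https://www.biorxiv.org/content/10.1101/2020.01.22.914952v2.full.pdf"
))
```

URLs that do not carry a 10.1101/ DOI directly after /content/ return None here, so nothing is guessed for them.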

mixer formatting

Status
{{wontfix}}
Reported by
Redalert2fan (talk) 22:26, 24 February 2020 (UTC)
What happens
date and year added on a new line
What should happen
if possible this: [87]
Relevant diffs/links
[88]
We can't proceed until
Feedback from maintainers


The cite book references were split in half already by a separate line without any reason, but the bot added the date/year on a new line itself. Redalert2fan (talk) 22:26, 24 February 2020 (UTC)

We cannot easily fix that. The bot tries to figure out the best thing based upon existing line breaks. We make a guess. I will at some point look at the guess code again. AManWithNoPlan (talk) 22:46, 24 February 2020 (UTC)

.com.au URLs

Status
{{fixed}} There was confusion with the "libs" in the path, which made the code think it was a proxy
Reported by
Timrollpickering (Talk) 12:02, 5 March 2020 (UTC)
What happens
URLs from domains ending in .com.au get converted to .com, breaking the URL
What should happen
Keep them as .com.au
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=Young_Liberals_(Australia)&diff=prev&oldid=943974587
We can't proceed until
Feedback from maintainers


Bot removes archive links

Status
{{notabug}}
Reported by
awkwafaba (📥) 15:53, 5 March 2020 (UTC)
What happens
Bot removes archive links
What should happen
bot should keep archives, even when removing url that duplicates parameter
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=Draft%3APancreas_Disease_in_Farmed_Salmon&diff=prev&oldid=944076280
We can't proceed until
Feedback from maintainers


Archive links must be deleted when the URL is removed because there is nothing to archive if there is no url. Also, there are full copies at the DOI, PMC, etc. No need for a junky archive copy. AManWithNoPlan (talk) 17:01, 5 March 2020 (UTC)

Incorrect cite book

Status
{{wontfix}}
Reported by
Redalert2fan (talk) 22:34, 24 February 2020 (UTC)
What happens
cite web changed into cite book.
Relevant diffs/links
[89]
We can't proceed until
Feedback from maintainers


The reference in question is a link to a page to buy a book with some information on it, no content from the actual book is being cited or used as a citation, this is not a link to a readable copy of the book. This should probably stay as cite web. Redalert2fan (talk) 22:34, 24 February 2020 (UTC)

See User:Citation_bot/use#..._the_bot_made_a_mistake? Headbomb {t · c · p · b} 23:25, 24 February 2020 (UTC)

Stray dots in volumes

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 14:17, 25 February 2020 (UTC)
What should happen
[90]
We can't proceed until
Feedback from maintainers


GIGO: Spurious text parameter when processing invalid URL

Status
{{notabug}} bot made page better
Reported by
Logan Talk Contributions 18:53, 2 March 2020 (UTC)
What happens
The bot added a spurious text parameter when converting a raw citation to {{cite book}}, which leads to an error message on the page.
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=James_Robb_%28pathologist%29&diff=prev&oldid=943585213
We can't proceed until
Feedback from maintainers


Conflicting removals and insertions of redundant external links

Status
{{not a bug}}
Reported by
Francis Schonken (talk) 13:17, 5 March 2020 (UTC)
What happens
Bot operators removing and adding redundant external links
What should happen
Citation bot operators should be instructed to not mess with redundancy in external links:
  1. if there is redundancy the operator should assume that such redundancy is there for a reason, and not mess with it: so take to talk first.
  2. if there is no redundancy, the operator should assume there is no reason to insert it, and take to talk first.
Relevant diffs/links
examples 1 and 2 show the conflicting approach:
  1. Redundancy removed by AManWithNoPlan
    • before the bot: |url=https://journals.qucosa.de/ejournals/bjb/issue/view/173 (...) |doi=10.13141/bjb.v2012
    • bot-changed to: (url removed) (...) |doi=10.13141/bjb.v2012
  2. Redundancy inserted by GreenC
    • before the bot: |chapter-url=https://archive.org/stream/Bach-jahrbuch03.jg1906/BachJahrbuch1906#page/n89 (...) |pages=84–113
    • bot-changed to: |chapter-url=https://archive.org/stream/Bach-jahrbuch03.jg1906/BachJahrbuch1906#page/n89 (...) |pages=[https://archive.org/details/Bach-jahrbuch03.jg1906/page/n89 84]–113 (identical EL inserted)
We can't proceed until
Feedback from maintainers


@AManWithNoPlan and GreenC: not sure whether the pings above reached you (as the template placed my signature above the pings), so re-pinging. --Francis Schonken (talk) 15:42, 5 March 2020 (UTC)

I'm not sure what's your point. InternetArchiveBot doesn't add redundant links in the "url" parameter. Nemo 15:46, 5 March 2020 (UTC)
In the first example above InternetArchiveBot (instructed by AManWithNoPlan) removed a redundant link, which doubled with the doi, complete with url parameter; in the second example above InternetArchiveBot (instructed by GreenC) inserted a redundant link, which doubles with the link from the chapter-url parameter. --Francis Schonken (talk) 16:01, 5 March 2020 (UTC)

The issue with GreenC bot was already reported at its talk so I have no idea why it's being reported on this page, it is clearly a bug to have 3 copies of a URL, it will be fixed but it has nothing to do with Citation bot. Also I don't see any problem with Citation bot's edit. -- GreenC 15:49, 5 March 2020 (UTC)

@GreenC: please discuss removals and insertions of "doubles" of external links in a citation template on the talk pages of the respective articles: the bot has no business there. --Francis Schonken (talk) 16:01, 5 March 2020 (UTC)
Citation bot's removals are well supported by wiki styles and template documentation. Not sure what GreenC is up to. AManWithNoPlan (talk) 17:04, 5 March 2020 (UTC)
"it is clearly a bug to have 3 copies of a URL, it will be fixed" -- GreenC 17:57, 5 March 2020 (UTC)

incorrectly added dates ? on Natalie Batalha

see https://en.m.wikipedia.org/wiki/Special:MobileDiff/943812210

i am sure about ref name="NASA-bio", however i am not sure about ref name="Kepler-bio".

if this is wrong, please excuse Leela52452 (talk) 01:59, 4 March 2020 (UTC)

Those are the dates on the pages. I am not sure what you mean by wrong. AManWithNoPlan (talk) 12:04, 4 March 2020 (UTC)

hello again,

https://web.archive.org/web/20150915203353/http://www.nasa.gov/web/20150915000257/http://www.nasa.gov/mission_pages/kepler/team/batalha.html contains march 12, 2012 and new version of site is slightly different from archived version and cite button is adding 2015 year. excuse for noise Leela52452 (talk) 13:34, 5 March 2020 (UTC)

Since getting dates from archive pages is impossible, perhaps we should not add dates when it is later than archive date. AManWithNoPlan (talk) 01:59, 6 March 2020 (UTC)
https://github.com/ms609/citation-bot/pull/2722 AManWithNoPlan (talk) 18:51, 8 March 2020 (UTC)

{{fixed}}
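
The rule suggested above (do not add a date that is later than the archive date) can be sketched as follows. This assumes the Wayback Machine's 14-digit timestamp in the archive URL; the function names are hypothetical, and this is an illustration rather than the code in the linked pull request.

```python
import re
from datetime import date, datetime

# Wayback Machine URLs carry a YYYYMMDDhhmmss timestamp right after /web/.
_WAYBACK_STAMP = re.compile(r"web\.archive\.org/web/(\d{14})")


def archive_snapshot_date(archive_url):
    """Return the snapshot date encoded in a Wayback URL, or None."""
    match = _WAYBACK_STAMP.search(archive_url)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y%m%d%H%M%S").date()


def should_add_date(found, archive_url):
    """Reject a discovered date that is later than the archive snapshot."""
    snapshot = archive_snapshot_date(archive_url)
    return snapshot is None or found <= snapshot


url = ("https://web.archive.org/web/20150915203353/http://www.nasa.gov/"
       "mission_pages/kepler/team/batalha.html")
print(should_add_date(date(2012, 3, 12), url))
print(should_add_date(date(2016, 1, 1), url))
```

A date found on the live page that postdates the snapshot cannot describe the archived copy, so the sketch declines to add it.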

Publisher that isn't one

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 18:15, 4 March 2020 (UTC)
What should happen
[91]
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/2720 AManWithNoPlan (talk) 18:39, 8 March 2020 (UTC)

Cover the PLOS caps to all PLOS-related journals

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 18:51, 4 March 2020 (UTC)
What should happen
[92]
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/2721 AManWithNoPlan (talk) 18:44, 8 March 2020 (UTC)

adds |editor= params when template already has |veditors= param

Status
{{fixed}}
Reported by
Trappist the monk (talk) 16:32, 8 March 2020 (UTC)
What happens
as the section heading says
What should happen
in this particular case, nothing
Relevant diffs/links
diff
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/2719 AManWithNoPlan (talk) 18:35, 8 March 2020 (UTC)

go looking for bugs

https://en.wikipedia.org/wiki/User:AnomieBOT/Nobots_Hall_of_Shame/0 AManWithNoPlan (talk) 12:29, 24 December 2019 (UTC)

{{notabug}} left, just opinions mostly.

Mass DOI finder by CrossRef

Converting unstructured references is much more fun using https://doi.crossref.org/SimpleTextQuery ! I don't know about you, but I get tired of copy-and-pasting from articles to a search engine and back. For days I failed to get anything out of it, until I realised that I must paste my list of references into LibreOffice, click the "numbered list" button, and paste the numbered list into the tool. If you have no numbers, or if you add them manually like a human would do, it's not going to do anything.

Although there is no shortage of citation farms and messy citation sections, I wondered if there's a faster way to find the low hanging fruit. So I made a file with 25k lines from the latest English Wikipedia dump, which look like they might be titles of some work by some very simplistic grepping. If you copy up to 1000 lines into https://doi.crossref.org/SimpleTextQuery , you get a decent amount of DOIs and then you can go look for those titles in articles. I did the biggest chunks in the first 2k lines so far. Nemo 21:32, 2 December 2019 (UTC)

I pasted some examples at User:Nemo bis/Missing cite journal. Nemo 13:00, 3 December 2019 (UTC)

{{notabug}} wrong tool. Good luck. AManWithNoPlan (talk) 21:53, 8 March 2020 (UTC)

pp. and p. in page= or pages=

Status
{{fixed}}
Reported by
Grimes2 (talk) 14:47, 25 February 2020 (UTC)
What should happen
[93]
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/2723 AManWithNoPlan (talk) 21:56, 8 March 2020 (UTC)

Explosion of Multiple ISBNs

Status
{{fixed}} with a length limit on ISBNs
Reported by
Headbomb {t · c · p · b} 06:02, 11 March 2020 (UTC)
What happens
[94]
What should happen
Not that
We can't proceed until
Feedback from maintainers


Special CS2 code

Does not take into account comments in parameters AManWithNoPlan (talk) 21:03, 12 March 2020 (UTC)

https://github.com/ms609/citation-bot/pull/2727 AManWithNoPlan (talk) 21:41, 12 March 2020 (UTC)
{{fixed}}

Redundant pubmed proxy url

Status
{{wontfix}} - too rare. Only one
Reported by
Headbomb {t · c · p · b} 21:39, 12 March 2020 (UTC)
What should happen
[95]
We can't proceed until
Feedback from maintainers


Removes no-break-space from the middle of a multi-digit number in citation title

Status
{{notabug}}
Reported by
David Eppstein (talk) 20:21, 14 March 2020 (UTC)
What happens
Special:Diff/945566938
What should happen
Not that. The no-break-space is important to keep the number in one piece rather than breaking it over a line. More generally, a no-break-space is usually there for a reason; why are they being removed automatically? Where is the discussion and BAG approval for making this sort of change?
We can't proceed until
Feedback from maintainers


The character in question is U+2008 punctuation space. This character is a 'breakable' space; see the unicode properties. From General Punctuation, 'space equal to narrow punctuation of a font'. MOS:DIGITS notes that use of spaces for digit grouping may be problematic for screen readers. Perhaps this is a case where spaces used for digit grouping should be replaced not with other spaces but with commas which do not break.

Trappist the monk (talk) 22:05, 14 March 2020 (UTC)

User-unreadable whitespace is deprecated in general across Wikipedia, see MOS:NBSP. If it's intentional, hardcode it via &nbsp; or {{nbsp}}, like everywhere else. Headbomb {t · c · p · b} 06:47, 15 March 2020 (UTC)
Do not use {{nbsp}} in cs1|2 parameters that are included in the citation's metadata.
Trappist the monk (talk) 10:24, 15 March 2020 (UTC)
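
For anyone wanting to check which space character is actually in a title, Python's standard unicodedata module gives a quick answer. The helper below is a rough heuristic, not the bot's logic: it treats a space as non-breaking when its Unicode decomposition carries the <noBreak> tag. The full line-breaking property is not exposed by the standard library, so this is only an approximation.

```python
import unicodedata


def is_no_break_space(ch: str) -> bool:
    """True for space separators whose decomposition is tagged <noBreak>."""
    return (
        unicodedata.category(ch) == "Zs"
        and unicodedata.decomposition(ch).startswith("<noBreak>")
    )


# U+00A0 and U+202F are no-break spaces; U+2008 (the character in the diff) is not.
for ch in ("\u00a0", "\u2008", "\u202f"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: no-break={is_no_break_space(ch)}")
```

This agrees with the point made above: U+2008 PUNCTUATION SPACE is an ordinary, breakable space.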

"Theses and Dissertations Available from Proquest" is not a journal, and zbMATH should be capitalized zbMATH not ZbMATH

Status
{{fixed}}
Reported by
David Eppstein (talk) 20:38, 14 March 2020 (UTC)
What happens
Special:Diff/945569576
What should happen
not that
We can't proceed until
Feedback from maintainers


Career Communications Group is not a person and Group is not its surname

Status
{{fixed}}
Reported by
David Eppstein (talk) 21:00, 14 March 2020 (UTC)
What happens
Special:Diff/945573899
What should happen
Not that. This is the second time Citation bot has made this same bogus edit.
We can't proceed until
Feedback from maintainers


Citation with wrong doi gets more garbage piled on top of it

Status
{{wontfix}} the unfixable
Reported by
David Eppstein (talk) 21:07, 14 March 2020 (UTC)
What happens
Special:Diff/945572616
What should happen
Citation bot should recognize that the (incorrect) doi and the rest of the citation have nothing in common and not try to add more
We can't proceed until
Feedback from maintainers


date=616

Status
{{fixed}}
Reported by
David Eppstein (talk) 01:02, 15 March 2020 (UTC)
What happens
Special:Diff/945592710
What should happen
Not that. I am getting really frustrated with spending all my time today cleaning up Citation bot's little messes, dropped like presents all over my watchlist. How could the bot imagine that "616" is a valid date?
We can't proceed until
Feedback from maintainers


"2 v." is not a publisher

Status
{{fixed}} with a minimum letter count
Reported by
David Eppstein (talk) 01:09, 15 March 2020 (UTC)
What happens
Special:Diff/945605294
What should happen
Not that.
We can't proceed until
Feedback from maintainers


Garbage volumes

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 17:05, 4 March 2020 (UTC)
What should happen
[96]
We can't proceed until
Feedback from maintainers


Same for issues/pages if that's found in there. Headbomb {t · c · p · b} 17:05, 4 March 2020 (UTC)

Replaces publication-place= with location=

Status
{{not a bug}}
Reported by
Jc3s5h (talk) 17:28, 8 March 2020 (UTC)
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=Chronology&type=revision&diff=944567525&oldid=942736253
We can't proceed until
Feedback from maintainers


The parameter "location" is ambiguous. In some Citation Style 1 templates, it only refers to publication place, but in others, such as "cite journal" (which is an alias of "cite news"), when both "location" and "publication-place" are present, "location" refers to the byline dateline, that is, the place the story was written. The bot should not replace a correct unambiguous parameter with a potentially incorrect parameter. Jc3s5h (talk) 17:28, 8 March 2020 (UTC) Fixed 8 March 2020 18:52 UTC

We already discussed this several times: User_talk:Citation_bot/Archive_15#Publication_place, User_talk:Citation_bot/Archive_19#Erroneous_move_of_publication-place_to_location. This talk page is unlikely to be the correct forum to achieve such a change. Nemo 18:20, 8 March 2020 (UTC)

Remove via from {{cite arxiv}}, {{cite biorxiv}}, {{cite citeseerx}}, {{cite ssrn}}

Status
{{fixed}} at least for any template we edit
Reported by
Headbomb {t · c · p · b} 21:48, 9 March 2020 (UTC)
What should happen
[97]
We can't proceed until
Feedback from maintainers


The |via= serves no purpose in those, empty or filled, and should be removed as pointless clutter. Headbomb {t · c · p · b} 21:48, 9 March 2020 (UTC)

Obviously this is WP:COSMETICBOT stuff, so should be treated like an optional edit, to be suggested to users, but only made automatically when there's other things to do. Headbomb {t · c · p · b} 22:02, 9 March 2020 (UTC)

Other more general issues

I would like to suggest, for sake of manual follow-up in editing, that the actions of this and various other citation-fixing bots result in the presentation of the fields in the {{cite... markup so that they roughly follow the presentation of the citation's formatted content. That is, rather than appearing, after your work, as {{cite web | url = ..., etc., that the citation appears in markup as {{cite web | author = | date = | title = | work = | location = | publisher = | url = | url-access = | url-status = | access-date = | archive-url = | archive-date = | quote = ... with other fields inserted in similarly logical order. I would also strongly suggest introducing spaces, as shown in this example (see following). The odd and sometimes semi-random order in which the fields are presented, alongside the run-on nature of the content, make it very difficult to catch mistakes in fields, and to catch all empty fields, and so—for the significantly amplified work involved in trying to improve citation completeness—the work simply does not get done. Making the automated output easier to work with should at least be worth a beta test. Cheers, a prof and former logging editor. 2601:246:C700:19D:F47B:FAEC:3C25:6306 (talk) 05:24, 15 March 2020 (UTC)

After writing it zillions of times by hand, I prefer {{Cite web |parm1=value1 |parm2=value2 |parm3=value3 ...}}, so the pipe, the parm, and the value try to stay together when it (inevitably) has to line-wrap, and having to do with my programming sense, error likelihood, logical equivalence to the "vertical" format (thinking of the pipe as a prefix to the parm), aesthetics, etc.. As far as parm order, author first is pretty uncommon in existing usage, too, since people that create cites manually usually start with a URL and then read and enter the title, author, date, etc. I wouldn't object to that order coming out of automated tools, though. —[AlanM1 (talk)]— 08:52, 15 March 2020 (UTC)
There is an effort to keep things in a reasonable order, but when adding to existing parameters there is usually nothing reasonable. Also, reordering of existing parameters is something that has been talked about, but we will never do it because it ticks way too many people off. AManWithNoPlan (talk) 11:01, 15 March 2020 (UTC)
Ensuring a space to the left of each "|" would be helpful in creating reasonable line breaks.
I can understand people getting upset with automated reordering of parameters; if there was any logical ordering in the article, there won't be when citation bot gets done with it. But if it comes up in this or another automated process, I wouldn't agree with "as far as parm order, author first is pretty uncommon in existing usage...." (AlanM1) The citations may be in an alphabetical list, or may be so rearranged later. In such cases, the authors should come first, in the same order as in the publication, then the date. If there are no authors, the title should come first. This facilitates manual alphabetical ordering when working with wikitext. If the process doesn't have access to the publication, the authors should be kept in the same order before the alteration. Jc3s5h (talk) 12:15, 15 March 2020 (UTC)
What you ask for will require a major discussion for bot approval and huge buy-in from the template crowd, etc. And you will never get it. The bot makes these things better, but we cannot achieve perfection since no one agrees on what that is. AManWithNoPlan (talk) 13:33, 15 March 2020 (UTC)
It would probably be best handled as a script. (Although I still support a TNT checkbox for use on individual articles through the Citations button, since that's functionally a script.) Headbomb {t · c · p · b} 17:08, 15 March 2020 (UTC)
I'm not entirely sure why, but WP:CITEVAR has generally been interpreted as asking for the formatting of citation templates themselves to be preserved, not just the visible results of the template. —David Eppstein (talk) 17:19, 15 March 2020 (UTC)

Mostly to prevent pointless edit wars and arguments about multiline vs single line presentations and between sane variants of parameter order like last/first or first/last in the edit window. It would be very hard for a bot to know that

{{cite journal |pages=214–215 |title=A Schematic Model of Baryons and Mesons |journal=[[Physics Letters]] |last=Gell-Mann |volume=8 |first=M. |year=1964 |doi=10.1016/S0031-9163(64)92001-3 |bibcode=1964PhL.....8..214G |issue=3}}

is ridiculous formatting, but that

{{cite journal |last=Gell-Mann |first=M. |year=1964 |title=A Schematic Model of Baryons and Mesons |journal=[[Physics Letters]] |volume=8 |issue=3 |pages=214–215 |bibcode=1964PhL.....8..214G |doi=10.1016/S0031-9163(64)92001-3}}

is entirely fine, just as

{{cite journal |first=M. |last=Gell-Mann |year=1964 |title=A Schematic Model of Baryons and Mesons |journal=[[Physics Letters]] |volume=8 |issue=3 |pages=214–215 |bibcode=1964PhL.....8..214G |doi=10.1016/S0031-9163(64)92001-3}}

would be. Headbomb {t · c · p · b} 17:29, 15 March 2020 (UTC)

By the way, my default parameter orderings (plural!) are: (1) authors first, everything else alphabetical, so that I can find them quickly without having to remember how the "logical" ordering of parameters works, or (2) whatever order I get them from the site I'm getting the citation from, so that I don't have to put effort into hand-ordering the parameters. —David Eppstein (talk) 17:33, 15 March 2020 (UTC)

Authors first + everything else alphabetical like

{{cite journal |last=Gell-Mann |first=M. |bibcode=1964PhL.....8..214G |doi=10.1016/S0031-9163(64)92001-3 |issue=3 |journal=[[Physics Letters]] |pages=214–215 |title=A Schematic Model of Baryons and Mesons |volume=8 |year=1964}}

is a pretty ridiculous ordering. Best practice is something that somewhat resembles presentation order and groups similar things together. Authors/Editors, dates, chapter/title/journal/series/publisher, volume/issue/pages, identifiers, urls. Headbomb {t · c · p · b} 17:54, 15 March 2020 (UTC)

It is a useful ordering, because that way I can use alphabetization to quickly spot the parameter I'm looking for. I would be annoyed if a bot started making cosmetic changes to reorder it. —David Eppstein (talk) 18:47, 15 March 2020 (UTC)

I'm going to mark this as a {{wontfix}} since there's just too many problems with this. Headbomb {t · c · p · b} 14:48, 16 March 2020 (UTC)
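
For anyone who wants to try the script route mentioned above, the presentation-like grouping (authors/editors, dates, titles, volume/issue/pages, identifiers, urls) can be sketched as a stable sort. The parameter list is a hypothetical subset chosen for illustration, numbered parameters such as last2 are not handled, and nothing here is part of Citation bot.

```python
# Hypothetical ranking that follows the grouping suggested in the thread.
PRESENTATION_ORDER = [
    "last", "first", "author", "editor",
    "date", "year",
    "chapter", "title", "journal", "series", "publisher",
    "volume", "issue", "page", "pages",
    "bibcode", "doi", "isbn", "jstor", "pmid",
    "url", "access-date", "archive-url", "archive-date",
]
RANK = {name: index for index, name in enumerate(PRESENTATION_ORDER)}


def reorder(params):
    """Sort (name, value) pairs by rank; unknown names keep their relative order at the end."""
    return sorted(params, key=lambda pair: RANK.get(pair[0], len(RANK)))


messy = [
    ("pages", "214–215"), ("title", "A Schematic Model of Baryons and Mesons"),
    ("journal", "[[Physics Letters]]"), ("last", "Gell-Mann"), ("volume", "8"),
    ("first", "M."), ("year", "1964"), ("doi", "10.1016/S0031-9163(64)92001-3"),
    ("bibcode", "1964PhL.....8..214G"), ("issue", "3"),
]
print(" ".join(f"|{name}={value}" for name, value in reorder(messy)))
```

Because Python's sort is stable, parameters the table does not know about are left in their original relative order rather than shuffled.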

Wrong year

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 06:16, 16 March 2020 (UTC)
What happens
Bang&diff=945800823&oldid=945742854
What should happen
Leave correct dates alone
We can't proceed until
Feedback from maintainers


I seem to remember you arguing for updating dates with newer Crossref based dates a while ago. I will investigate this. AManWithNoPlan (talk) 11:07, 16 March 2020 (UTC)

https://github.com/ms609/citation-bot/pull/2736. Crossref is not God. AManWithNoPlan (talk) 11:26, 16 March 2020 (UTC)
I argued for that when upgrading from cite arxiv --> cite journal. Headbomb {t · c · p · b} 14:46, 16 March 2020 (UTC)

p. or page in |page= or |pages=

Status
{{fixed}}
Reported by
Grimes2 (talk) 07:48, 16 March 2020 (UTC)
What should happen
https://en.wikipedia.org/w/index.php?title=Cheryl_Heller&diff=945807047&oldid=945805877
We can't proceed until
Feedback from maintainers


There are several variations to treat: page, Page, pages, Pages, p. Grimes2 (talk) 07:48, 16 March 2020 (UTC)

https://github.com/ms609/citation-bot/pull/2737 AManWithNoPlan (talk) 11:36, 16 March 2020 (UTC)
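
The variations listed above can be covered with one case-insensitive pattern. This is a sketch under assumptions, not the code in the pull request: the function name is hypothetical, and a bare "p" or "pp" without a dot is deliberately left alone because a value such as P12 may be a genuine page label.

```python
import re

# Leading "p.", "pp.", "page" or "pages" (any case), followed by a digit.
_PAGE_PREFIX = re.compile(r"^\s*(?:pp?\.|pages?\b\.?)\s*(?=\d)", re.IGNORECASE)


def strip_page_prefix(value: str) -> str:
    """Remove a redundant page prefix from a |page= or |pages= value."""
    return _PAGE_PREFIX.sub("", value)


for raw in ("pp. 214–215", "Page 5", "p.12", "Pages 3-4", "P12"):
    print(repr(raw), "->", repr(strip_page_prefix(raw)))
```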

Italic or bold in |publisher=

Status
{{wontfix}} but wish we could
Reported by
Grimes2 (talk) 13:18, 16 March 2020 (UTC)
What should happen
https://en.wikipedia.org/w/index.php?title=Tara_Stevens&diff=945836494&oldid=931528460
We can't proceed until
Feedback from maintainers


This would fix markup errors: Category:CS1 errors: markup

Italic ('') or bold (''') markup not allowed in: |<param>n=

  • |publisher=
  • |journal=
  • |magazine=
  • |newspaper=
  • |periodical=
  • |website=
  • |work=

Grimes2 (talk) 13:18, 16 March 2020 (UTC)

To do it right isn't as simple as just stripping the markup as you did. Don't do that. In your example, Barcelona Metropolitan is a magazine. So, what should happen is:
|publisher=''Barcelona Metropolitan'' (Barcelona's magazine in English)
should be changed to:
|magazine=Barcelona Metropolitan
and the template changed from {{cite web}} to {{cite magazine}}
If this bot does anything with this category of errors, for those templates with improper italic markup, it should (and I believe that it does to some extent) maintain a dictionary of periodicals from which it can determine the correct template name and periodical parameter. In your example, |publisher= also contains editorial commentary which it should not. We should not expect this, or any other, bot to know what to do with that kind of improper parameter content. The bot can remove bold markup outright – though that markup is, when compared to italic markup, somewhat rare.
Trappist the monk (talk) 13:42, 16 March 2020 (UTC)
Too complicated for a bot. It's better to do it manually. {{wontfix}} Grimes2 (talk) 15:14, 16 March 2020 (UTC)
we have a very short whitelist we use for this type of thing. AManWithNoPlan (talk) 19:06, 16 March 2020 (UTC)
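
A whitelist-based approach of the kind discussed above might look like the sketch below. Everything in it is an assumption for illustration: the whitelist contents, the function name, and the choice to act only when the whole |publisher= value is one italicised, whitelisted name. Values with extra commentary are left untouched for a human to sort out.

```python
import re

# Hypothetical whitelist of periodicals known to be magazines.
KNOWN_MAGAZINES = {"Barcelona Metropolitan"}

# The whole value must be exactly ''Name'' (italic, not bold).
_WHOLLY_ITALIC = re.compile(r"''([^']+)''")


def move_italic_publisher(template_name, params):
    """Move an italicised, whitelisted |publisher= into |magazine=."""
    match = _WHOLLY_ITALIC.fullmatch(params.get("publisher", "").strip())
    if match is None or match.group(1) not in KNOWN_MAGAZINES or "magazine" in params:
        return template_name, params  # too ambiguous, leave it alone
    fixed = {}
    for name, value in params.items():
        if name == "publisher":
            fixed["magazine"] = match.group(1)  # keep the parameter's position
        else:
            fixed[name] = value
    return "cite magazine", fixed


print(move_italic_publisher("cite web", {"title": "X", "publisher": "''Barcelona Metropolitan''"}))
```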

Caps: I, U, Y

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 13:45, 25 February 2020 (UTC)
What should happen
[98] [99]
We can't proceed until
Feedback from maintainers


Too many of those to assume any default behaviour when not on a whitelist. 'I' and 'i' should be left alone. Headbomb {t · c · p · b} 14:17, 25 February 2020 (UTC)

Same for U and Y. Headbomb {t · c · p · b} 05:05, 6 March 2020 (UTC)

https://github.com/ms609/citation-bot/pull/2741 AManWithNoPlan (talk) 12:17, 17 March 2020 (UTC)

Expand journals if title=none

Status
{{fixed}}, but will not replace title=none part because it is a magic word.
Reported by
Headbomb {t · c · p · b} 04:56, 6 March 2020 (UTC)
What happens
If |title=none, the bot fails to expand empty |journal= etc because it thinks there's no title match
What should happen
Ignore |title=none for purpose of matching
We can't proceed until
Feedback from maintainers


Example of a failure please. AManWithNoPlan (talk) 18:32, 8 March 2020 (UTC)

@AManWithNoPlan: Try on this one
vs
Headbomb {t · c · p · b} 17:20, 15 March 2020 (UTC)

Citation bot userbox

Hello, I've created a userbox for those who use Citation bot. Jerm (talk) 01:23, 9 March 2020 (UTC)

Wikitext userbox where used
{{User wikipedia/Citation bot}}
 This user fixes citations with the help of Citation bot.
linked pages

{{notabug}} flag to archive and copying to non talk page AManWithNoPlan (talk) 00:44, 17 March 2020 (UTC)

Another incorrect capitalization of stop words in a non-English journal title

Status
{{fixed}}
Reported by
David Eppstein (talk) 05:58, 17 March 2020 (UTC)
What happens
Special:Diff/945961694
What should happen
According to our article on its translated mirror, the "i" and "ee" should be lowercase.
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/2741 AManWithNoPlan (talk) 12:17, 17 March 2020 (UTC)

Edit-warring

The bot seems currently engaged in a slow edit-war at the Christmas Oratorio page ([100]). Please stop that behaviour. The proper way is to take the issue up at the article's talk page. Tx. --Francis Schonken (talk) 10:41, 17 March 2020 (UTC)

{{notabug}} Cannot stop three different people AManWithNoPlan (talk) 11:45, 17 March 2020 (UTC)

How would a bot go and discuss its edits on a talk page btw? I think it is made pretty clear that this bot is user activated... --Redalert2fan (talk) 12:11, 17 March 2020 (UTC)

Adsabs issue

https://tools.wmflabs.org/citations/process_page.php?edit=toolbar&slow=1&page=Siphonostomites

is throwing an AdsAbs issue that looks like it might be fixable:

> Checking AdsAbs database
  ! Error 400 in query_adsabs: org.apache.solr.search.SyntaxError: Query exceed maxAllowedDepth of 100 tokens for query redistribution: Message with key:Query exceed maxAllowedDepth of 100 tokens for query redistribution and locale: en_US not found.
- URL was:  https://api.adsabs.harvard.edu/v1/search/query?q=title:%22Excursion+guidebook+CBEP+2014-EPPC+2014-EAVP+2014-Taphos+2014+Conferences%3A+The+Bolca+Fossil-Lagerst%C3%A4tten%3A+A+window+into+the+Eocene+World%22&fl=arxiv_class,author,bibcode,doi,doctype,identifier,issue,page,pub,pubdate,title,volume,year

Martin (Smith609 – Talk) 10:47, 17 March 2020 (UTC)

We cannot get around it since it is an internal error. {{fixed}} This will make the message no longer red text and will make the text more accurate. https://github.com/ms609/citation-bot/pull/2740 AManWithNoPlan (talk) 12:07, 17 March 2020 (UTC)

Replacement of `publication` with `publicationdate`

Status
{{fixed}}
Reported by
Martin (Smith609 – Talk) 10:48, 17 March 2020 (UTC)
What happens
Closest lexical match to 'publication' is 'publicationdate'... is it worth hard-coding a more suitable alternative ('journal'?)?
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=Siphonostomites&diff=prev&oldid=945986980
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/2739 AManWithNoPlan (talk) 11:54, 17 March 2020 (UTC)
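
The idea of hard-coding a better alternative ahead of the closest lexical match can be sketched with Python's standard difflib module. The parameter list and the override table are hypothetical, 'journal' is only the alternative floated in the report, and the bot itself may measure similarity differently.

```python
import difflib

# Hypothetical subset of valid parameter names.
VALID_PARAMETERS = ["journal", "publisher", "publicationdate", "publication-place", "title"]

# Hard-coded overrides consulted before any fuzzy matching.
PREFERRED = {"publication": "journal"}


def suggest_parameter(name):
    """Return a valid parameter name for `name`, or None when nothing is close."""
    if name in VALID_PARAMETERS:
        return name
    if name in PREFERRED:
        return PREFERRED[name]
    matches = difflib.get_close_matches(name, VALID_PARAMETERS, n=1)
    return matches[0] if matches else None


print(suggest_parameter("publication"))
print(suggest_parameter("jurnal"))
```

Checking the override table first means a known-bad pairing never reaches the fuzzy matcher, while ordinary misspellings still get the nearest valid name.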

Still multiple errors

For instance in Viola (plant) it changed a book url to a chapter-url, when no such url exists, the reference was to the book. Michael Goodyear   23:55, 18 March 2020 (UTC)

Thank you. Added some more code {{fixed}}. AManWithNoPlan (talk) 13:06, 19 March 2020 (UTC)

chapter / title error and editor-list error

Status
new bug
Reported by
Trappist the monk (talk) 11:55, 19 March 2020 (UTC)
What happens
1. bot deleted |chapter= and then renamed |title= to |chapter=
2. added |editorn-first= and |editorn-last= when template already has |veditors=
Relevant diffs/links
diff
We can't proceed until
Feedback from maintainers


{{fixed}} editor problem. Added comments to the article itself to deal with usage of complete book DOI with chapter, etc. AManWithNoPlan (talk) 13:50, 19 March 2020 (UTC)

books.google.com/books?id= not clean enough

Status
{{fixed}} once GitHub comes back from the dead
Reported by
T3g5JZ50GLq (talk) 04:37, 12 March 2020 (UTC)
T3g5JZ50GLq (talk) 04:43, 12 March 2020 (UTC)
What happens
not clean enough
What should happen
less URL
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=I._F._Stone&diff=next&oldid=942824455
Replication instructions
on: a books.google.com/books?id=....... citation, remove : #v=onepage.........., as this is part of a google redirect or something and does not affect the resulting content returned.
equals
equals
We can't proceed until
Feedback from maintainers


I would like these urls stripped down to id= and pg=. All else is unnecessary. —David Eppstein (talk) 04:53, 12 March 2020 (UTC)
It's tricky and easy to cause damage. They don't always have page numbers and when they do there can be multiple page number arguments. I believe the last one takes priority? It is unclear whether the quote argument dq= should be removed, as it allows highlighting of passages; there are also multiple types of quote arguments, e.g. ldq=, and it is unclear which takes priority, whether by position in the URL or by name of argument. There is no documentation for Google URLs so everything is based on supposition and in my experience when you think you understand it you then find exceptions where it works differently. Would be great if someone took this on to find all the permutations and rules and document it for the world. -- GreenC 15:41, 12 March 2020 (UTC)
One thing for sure that should not be done is to convert the quote parts of a google url into |quote= when the google url is converted to archive.org url; highlighted search string is not a quotation as |quote= is a quotation. (T246762 – yeah, I know, not this bot ...)
Trappist the monk (talk) 15:54, 12 March 2020 (UTC)
Unless someone can document this better, we will simply continue to not touch the post-hash stuff. Although maybe everything AFTER the hash should be deleted? AManWithNoPlan (talk) 22:24, 14 March 2020 (UTC)
https://github.com/ms609/citation-bot/pull/2749 AManWithNoPlan (talk) 14:19, 21 March 2020 (UTC)
Compare:
The first case is how it exists on Wikipedia. The second case is how it would be if the fragment were removed. Another:
Different results. @AManWithNoPlan: (User:AManWithNoPlan) -- GreenC 20:41, 21 March 2020 (UTC)

https://github.com/ms609/citation-bot/pull/2750 AManWithNoPlan (talk) 20:52, 21 March 2020 (UTC)
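
The request above (keep only id= and pg=, drop the rest) can be sketched as follows. This is a minimal illustration in Python, not the bot's PHP code and not what the linked pull requests do; the function name and the `keep` parameter are hypothetical. As noted in the thread, Google's URL format is undocumented and dropping the fragment can change what is displayed, so this should be read as a statement of the proposal rather than a safe transformation. Repeated pg= arguments are kept in their original order because their priority is unknown.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def strip_google_books_url(url, keep=("id", "pg")):
    """Keep only the query arguments named in `keep` and drop the #fragment.

    Hypothetical helper: the set of arguments worth keeping is an assumption
    taken from the request in the thread, not from any Google documentation.
    """
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name in keep
    ]
    # The empty last element discards everything after the hash.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```

For example, a URL ending in `?id=abc123&pg=PA5&dq=foo#v=onepage&q=foo&f=false` would be reduced to `?id=abc123&pg=PA5`; whether the two URLs show the same page is exactly the open question raised above.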

Edits at Sociology of language

Status
new bug
Reported by
Cnilep (talk) 03:18, 21 March 2020 (UTC)
What happens
|title= changed to |chapter=
What should happen
nothing
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=Sociology_of_language&diff=next&oldid=930451731
We can't proceed until
Feedback from maintainers


I'm not certain whether this is a bug or some interaction with human error. This December 2019 edit to Sociology of language repeated the book title as chapter. A human editor removed that parameter the next day. Citation bot then changed title= to chapter= in March 2020. Cnilep (talk) 03:18, 21 March 2020 (UTC)

Seems related to a bad url/doi, which eventually got fixed? Headbomb {t · c · p · b} 04:32, 21 March 2020 (UTC)

{{notabug}} since bad DOI AManWithNoPlan (talk) 01:47, 22 March 2020 (UTC)

Adds new dates in non-ideal format

Status
mostly {{fixed}}, use of DMY and MDY templates would help more
Reported by
David Eppstein (talk) 20:51, 14 March 2020 (UTC)
What happens
Special:Diff/945572370
What should happen
The date format used here, YYYY-MM-DD, is acceptable according to the MOS only for accessdates. Publication dates require either Month DD, YYYY or DD Month YYYY. I have been spending far too much time today fixing badly formatted dates for articles on my watchlist, and I think I have missed many more. Stop it.
We can't proceed until
Feedback from maintainers


You are incorrect or at least have a significantly different interpretation of the interesting MOS rule. MOS:DATEUNIFY permits these also in citation publication dates per

Publication dates in an article's citations should all use the same format, which may be:

  • ...
  • an abbreviated format from the "Acceptable date formats" table, provided the day and month elements are in the same order as in dates in the article body, or

Of which ISO 8601 is one of the included date formats. --Izno (talk) 21:03, 14 March 2020 (UTC)

You are the one that is incorrect. The added dates are not all in the same format as the rest of the article's citations. This is not allowed by the part of the MOS that you directly quoted. —David Eppstein (talk) 21:11, 14 March 2020 (UTC)
The issue you reported was not inconsistency, it was that the date format was simply wrong for use as a publication date. It is not. Which is the issue you have? --Izno (talk) 21:11, 14 March 2020 (UTC)
Since the standard templates for date style are not present, how does someone suggest we proceed? AManWithNoPlan (talk) 02:02, 15 March 2020 (UTC)
Better dates with a different style than no dates at all. AManWithNoPlan (talk) 02:04, 15 March 2020 (UTC)
You just keep convincing yourself that your bot is doing good instead of making work for others. —David Eppstein (talk) 04:55, 15 March 2020 (UTC)
The work was there to be done before since the date was missing. The bot facilitates the work by putting said missing date. Could the bot's logic be improved? Possibly. Maybe by seeing what other citations use for date format. But a date is better than no date. Headbomb {t · c · p · b} 06:50, 15 March 2020 (UTC)
Adding a date in YYYY-MM-DD format where there is none is clearly an improvement, while venturing a guess as to what other formats to use sounds dangerous. Nemo 17:51, 15 March 2020 (UTC)
Most of the dates added are from web pages rather than dated publications. It is not at all obvious to me that they are helpful or improvements. (For most web pages, accessdates are more important than the date the web page claims to have been created or updated.) —David Eppstein (talk) 00:49, 17 March 2020 (UTC)
I completely agree. That’s why we don’t add dates that are after archive or access date. People really need to add access dates. AManWithNoPlan (talk) 01:53, 17 March 2020 (UTC)

https://github.com/ms609/citation-bot/pull/2754 AManWithNoPlan (talk) 20:46, 22 March 2020 (UTC)
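
The idea floated above, looking at what format the article's other citation dates use before adding a new one, might look roughly like this. It is a sketch in Python under stated assumptions, not the bot's PHP code and not a description of the linked pull request: the three recognised styles, the function names, and the fallback to ISO when nothing is recognised are all illustrative choices. A real implementation would presumably also honour the DMY and MDY templates mentioned in the status line.

```python
import re
from collections import Counter
from datetime import date

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
_MONTH = "(?:" + "|".join(MONTHS) + ")"

# Only three styles are recognised here; anything else is ignored.
PATTERNS = {
    "dmy": re.compile(rf"^\d{{1,2}} {_MONTH} \d{{4}}$"),
    "mdy": re.compile(rf"^{_MONTH} \d{{1,2}}, \d{{4}}$"),
    "iso": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
}


def dominant_format(existing_dates, default="iso"):
    """Return the most common recognised style among existing date strings."""
    counts = Counter()
    for text in existing_dates:
        for name, pattern in PATTERNS.items():
            if pattern.match(text.strip()):
                counts[name] += 1
                break
    return counts.most_common(1)[0][0] if counts else default


def format_date(d, style):
    """Render a datetime.date in the chosen style."""
    if style == "dmy":
        return f"{d.day} {MONTHS[d.month - 1]} {d.year}"
    if style == "mdy":
        return f"{MONTHS[d.month - 1]} {d.day}, {d.year}"
    return d.isoformat()
```

Usage would be along the lines of `format_date(date(2020, 3, 14), dominant_format(dates_already_in_article))`. Whether guessing like this is safe is the point of disagreement in the thread.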

OAuth requests

Status
{{fixed}}
Reported by
Ⓩⓟⓟⓘⓧ Talk 22:34, 21 March 2020 (UTC)
What happens
It seems to request new OAuth very frequently; it gets annoying when trying to run the bot on multiple cats within the same day.
We can't proceed until
Feedback from maintainers


No idea why. It's only a minor inconvenience. It might be related to the bot requesting both identity and edit permissions. We actually only need identity. AManWithNoPlan (talk) 23:08, 21 March 2020 (UTC)

It's really annoying for sure. Never seems to remember its permission for much more than 5-10 minutes. Super annoying when you're asking the bot to process 20-25 distinct pages and then they each fail, and then you have to reload each page, ask for the first one to be processed, wait for OAuth to ask for permission, and then request the other 19-24 pages to be processed. Headbomb {t · c · p · b} 23:12, 21 March 2020 (UTC)
I think I figured it out. Stay tuned. AManWithNoPlan (talk) 23:58, 21 March 2020 (UTC)
give it a shot. AManWithNoPlan (talk) 00:26, 22 March 2020 (UTC)
Will take some time to know for sure, but after a day or two, I'll likely know if things have improved. Headbomb {t · c · p · b} 00:51, 22 March 2020 (UTC)

Under heavy load

Status
{{fixed}} for now
Reported by
Joseywales1961 (talk) 23:09, 22 March 2020 (UTC)
What happens
bot hangs while trying to fix refs (2 bare refs that I then fixed manually) on page Pickaninny and four or five other pages I attempted to use it on today
We can't proceed until
Feedback from maintainers


WP:CITEVAR violation using citation bot

{{fixed}} - copy from my talk page

When using citation bot: please be more careful about not changing instances of {citation} to {cite book} (especially where the source is not a book) where the former is the established usage, as done here at Puget Sound faults, and other places. (Haven't I mentioned this before?) Nor should the first author's first/last be concatenated with preceding line, as it makes it harder to scan the citation for accuracy. Your attention to this would be appreciated. ♦ J. Johnson (JJ) (talk) 20:46, 12 March 2020 (UTC)

I see the problem. It has a journal set which is invalid for citation, so it has to be changed to cite book. BUT, the journal is set to a comment which is a strange edge case. AManWithNoPlan (talk) 21:00, 12 March 2020 (UTC)
https://github.com/ms609/citation-bot/pull/2727 AManWithNoPlan (talk) 21:41, 12 March 2020 (UTC)

Question...

Why is citationbot stripping notable authors of being wikilinked, instead adding new authorlink fields?

Did someone decide this was a good idea? Doesn't it lapse from the long honoured engineering principle of "Don't fix it if it ain't broke"? Geo Swan (talk) 01:47, 25 March 2020 (UTC)

It's not? Diff? Headbomb {t · c · p · b} 02:53, 25 March 2020 (UTC)
I did see one diff recently where it took two consecutive authors (author2 and author3) with linked names, moved the link in author3 from that parameter to an author3-link parameter just before where it was (good), and moved the link in author2 from that parameter to an author2-link parameter placed all the way at the end of the citation (bad). Unfortunately I don't remember which article it was and didn't save a bookmark. But that's cosmetic, not at all the same as stripping links. —David Eppstein (talk) 05:06, 25 March 2020 (UTC)
It is not just cosmetic, it fixes the COINS data. AManWithNoPlan (talk) 12:40, 25 March 2020 (UTC)
By "cosmetic", I meant the bad placement of the link parameter, not the choice to put the link in a different parameter than the name. —David Eppstein (talk) 06:43, 26 March 2020 (UTC)
Umm, not true. It is ok to wikilink |authorn= using either style of wikilink; both of these produce acceptable metadata:
{{cite book |title=Title |author=[[Abraham Lincoln]]}}
Abraham Lincoln. Title.
<cite id="CITEREFAbraham_Lincoln" class="citation book cs1">[[Abraham Lincoln]]. ''Title''.</cite><span title="ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=Title&rft.au=Abraham+Lincoln&rfr_id=info%3Asid%2Fen.wikipedia.org%3AUser+talk%3ACitation+bot%2FArchive+19" class="Z3988"></span>
{{cite book |title=Title |author=[[Abraham Lincoln|Lincoln, A.]]}}
Lincoln, A. Title.
<cite id="CITEREFLincoln,_A." class="citation book cs1">[[Abraham Lincoln|Lincoln, A.]] ''Title''.</cite><span title="ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=Title&rft.au=Lincoln%2C+A.&rfr_id=info%3Asid%2Fen.wikipedia.org%3AUser+talk%3ACitation+bot%2FArchive+19" class="Z3988"></span>
|author-linkn= is intended for cs1|2 templates that use |lastn= and |firstn= so that the whole name may be rendered as a single wikilink. I have seen cases like this:
|last=[[Abraham Lincoln|Lincoln]] |first=Abraham
That kind of construct should probably be changed to:
|last=Lincoln |first=Abraham |author-link=Abraham Lincoln
Trappist the monk (talk) 13:17, 25 March 2020 (UTC)

Wikilinks in authors used to generate corrupt COINS data. Interesting that this has been fixed. AManWithNoPlan (talk) 21:26, 25 March 2020 (UTC)

Since this is no longer a COINS problem and is {{fixed}}. Will no longer do author links, but still do last links. https://github.com/ms609/citation-bot/pull/2755 AManWithNoPlan (talk) 00:28, 26 March 2020 (UTC)
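
The conversion suggested above for a wikilinked |last= can be sketched as follows. This is an illustrative Python fragment, not the bot's PHP code; the function name and the dictionary representation of template parameters are assumptions. It only touches |last= and leaves the template alone when |author-link= is already present.

```python
import re

# Matches a value that is exactly one wikilink: [[Target]] or [[Target|Label]].
_WHOLE_LINK = re.compile(r"^\[\[([^\]|]+)(?:\|([^\]]+))?\]\]$")


def split_linked_last(params):
    """Move a wikilink out of |last= into |author-link= (hypothetical helper)."""
    out = dict(params)
    match = _WHOLE_LINK.match(out.get("last", "").strip())
    if match and "author-link" not in out:
        target, label = match.group(1), match.group(2)
        out["last"] = label or target   # the displayed text stays as the surname
        out["author-link"] = target     # the link target moves to its own parameter
    return out
```

Applied to `{"last": "[[Abraham Lincoln|Lincoln]]", "first": "Abraham"}` this gives `last=Lincoln`, `first=Abraham`, `author-link=Abraham Lincoln`, matching the example in the thread.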

Minor edits

Should the minor flag be removed? AManWithNoPlan (talk) 21:47, 25 March 2020 (UTC)

As soon as this is accepted, the minor flag will be removed from edits. Too many things being done now to qualify as minor. https://github.com/ms609/citation-bot/pull/2756 AManWithNoPlan (talk) 22:06, 25 March 2020 (UTC)
The context is that I asked AManWithNoPlan not to mark edits as minor when they may require review. The policy on WP:Minor edits is "A minor edit is one that the editor believes requires no review and could never be the subject of a dispute."
The edit in question changed a reference from {{cite web}} to {{cite document}}, which was incorrect. The source used originally was a web page, but latterly we have been able to access an online archive of the original magazine article, so it has ended up as {{cite magazine}}. The problem is that marking these edits as "minor" should be a guarantee that they require no review, yet this edit evidently required review, and could have been missed.
I appreciate that the scope of the bot has expanded over time, to the extent that some of its edits may benefit from review, but it is already flagged as a "bot" edit, which will allow editors who don't want to review bot edits to ignore them. Flagging these edits as "minor" as well is surely disadvantageous, as it is no longer possible to be sure that the changes produced need no scrutiny. --RexxS (talk) 22:13, 25 March 2020 (UTC)
There's really nothing that changes in the appearance from changing a cite web to a cite document, save for correctly displaying the volume/issue information, so I don't really see why that's something that should particularly require review. Compare
Bren, Linda (November–December 2002). "Oxygen Bars: Is a Breath of Fresh Air Worth It?". FDA Consumer. pp. 9–11. PMID 12523293. Retrieved 25 March 2020.
Bren, Linda (November–December 2002). "Oxygen Bars: Is a Breath of Fresh Air Worth It?" (Document). pp. 9–11. {{cite document}}: Cite document requires |publisher= (help); Unknown parameter |accessdate= ignored (help); Unknown parameter |issue= ignored (help); Unknown parameter |magazine= ignored (help); Unknown parameter |pmid= ignored (help); Unknown parameter |url= ignored (help); Unknown parameter |volume= ignored (help)
Headbomb {t · c · p · b} 22:48, 25 March 2020 (UTC)
The point is not whether a particular edit, such as the one that triggered the request was significant. It wasn't. This is what we should see:
Bren, Linda (November–December 2002). "Oxygen Bars: Is a Breath of Fresh Air Worth It?". FDA Consumer. Vol. 36, no. 6. pp. 9–11. PMID 12523293. Retrieved 25 March 2020.
The point is that the bot demonstrably makes mistakes and edits that may require review, so the minor flag is inappropriate (as well as unnecessary). --RexxS (talk) 23:08, 25 March 2020 (UTC)
Yeah, remove the minor flag. I guess I'm surprised this bot's actions were ever considered minor since they always made a non-negligible change to the pages it visited (even when all it was doing was expanding cite dois and others). --Izno (talk) 22:48, 25 March 2020 (UTC)

{{fixed}} this decade old oddity AManWithNoPlan (talk) 23:11, 25 March 2020 (UTC)

Now wait for the crowd soon coming to demand that the bot edits be marked minor. :) Nemo 06:15, 26 March 2020 (UTC)

Suggest modifying Zotero timeout

Status
Annoying from time to time, but {{fixed}} is not obvious right now
Reported by
Martin (Smith609 – Talk) 08:56, 3 March 2020 (UTC)
What happens
Zotero allows 15s before reporting a timeout.

This is maybe fine when running the bot from a URL, but when using the "citations" button it led me to give up and abort the run. It seems to me that 15000ms is a very long time to wait, particularly if there are multiple Zotero calls on a page: would 150ms still be sufficient?

> Using Zotero translation server to retrieve details from URLs.
  ! Operation timed out after 15001 milliseconds with 0 bytes received   For URL: http://sp.sepmonline.org/content/sepsp088/1/SEC6.abstract
  ! Operation timed out after 15000 milliseconds with 0 bytes received   For URL: http://www.paleoportal.org/kiosk/sample_site/fossil_gallery_109_images.html
  ! Operation timed out after 15001 milliseconds with 0 bytes received   For URL: http://ichnology.ku.edu/invertebrate_traces/tfimages/zoophycos.html
We can't proceed until
Feedback from maintainers


In normal circumstances I'd say that anything above 1000 ms is crazy slow. However I have no idea what's the median response time from our Zotero server. Do you know? Nemo 20:07, 3 March 2020 (UTC)
This is the total time from initiating the connection until data is received and the connection is closed. There is a separate timeout for just connecting. The more urls on the page, the shorter the timeout. AManWithNoPlan (talk) 20:25, 3 March 2020 (UTC)
   if ($url_count < 5) {
     curl_setopt($ch_zotero, CURLOPT_TIMEOUT, 15);
   } elseif ($url_count < 25) {
     curl_setopt($ch_zotero, CURLOPT_TIMEOUT, 10);
   } else {
     curl_setopt($ch_zotero, CURLOPT_TIMEOUT, 5);
   }
If we reduced that to, say, 3, 2 and 1 respectively, would we be able to tell from the logs or something whether the success rate (however defined) increases? Nemo 20:56, 3 March 2020 (UTC)
I had something similar for User:Bibcode Bot, but I had increasing timeouts (5/10/15 seconds) for the ADSABS database before failure. But this was a bot doing its own thing, without anyone waiting after it. For what's essentially a communal tool, I'd say 10 seconds total wait time for a single url should be more than enough. And if multiple distinct Zotero calls fail in succession, maybe skip Zotero for the next 5 minutes so we're not constantly querying a dead connection during a server hiccup or something. Headbomb {t · c · p · b} 22:23, 3 March 2020 (UTC)
We do skip after enough fails, but that is per run and not global. AManWithNoPlan (talk) 22:53, 3 March 2020 (UTC)
I don't see this warning right now. AManWithNoPlan (talk) 22:54, 3 March 2020 (UTC)
 if (!$is_a_man_with_no_plan) $this->expand_templates_from_identifier('url', $our_templates);
long-term it would be good to take advantage of the bulk API and submit all urls at once AManWithNoPlan (talk) 00:57, 4 March 2020 (UTC)
True for all APIs. Headbomb {t · c · p · b} 19:17, 16 March 2020 (UTC)
Already true for the slow ones that allow it (other than zotero). AManWithNoPlan (talk) 12:20, 17 March 2020 (UTC)
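
The two ideas discussed in this thread, a timeout that shrinks as the number of URLs on the page grows and skipping Zotero for a while after repeated failures, can be sketched together. This is Python for illustration only, not the bot's PHP code. The timeout tiers copy the snippet quoted above; the class name, the failure threshold of 3, and the injectable clock are assumptions, and the 300-second cooldown is the "5 minutes" from the suggestion. The thread notes that the bot's actual skip is per run, not global.

```python
import time


def zotero_timeout_seconds(url_count):
    """Same tiers as the quoted snippet: more URLs on the page, shorter timeout."""
    if url_count < 5:
        return 15
    if url_count < 25:
        return 10
    return 5


class ZoteroBreaker:
    """Skip Zotero for a cooldown period after several failures in a row."""

    def __init__(self, max_failures=3, cooldown_seconds=300, clock=time.monotonic):
        self.max_failures = max_failures          # assumed threshold
        self.cooldown_seconds = cooldown_seconds  # the suggested 5 minutes
        self.clock = clock                        # injectable so it can be tested
        self.failures = 0
        self.skip_until = None

    def allow_request(self):
        """Return False while the cooldown is running, True otherwise."""
        if self.skip_until is not None:
            if self.clock() < self.skip_until:
                return False
            # Cooldown has elapsed: start counting failures afresh.
            self.skip_until = None
            self.failures = 0
        return True

    def record(self, success):
        """Record the outcome of one Zotero call."""
        if success:
            self.failures = 0
            return
        self.failures += 1
        if self.failures >= self.max_failures:
            self.skip_until = self.clock() + self.cooldown_seconds
```

A caller would check `allow_request()` before each Zotero lookup and pass the result of the lookup to `record()`. Making the breaker state shared between runs would need some form of persistent storage, which is outside this sketch.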

remove website and synonyms from cite arxiv

Status
{{fixed}} once deployed
Reported by
Headbomb {t · c · p · b} 03:33, 23 March 2020 (UTC)
What happens
[101] (after changing cite web to cite arxiv manually)
What should happen
[102]
We can't proceed until
Feedback from maintainers


https://github.com/ms609/citation-bot/pull/2760 AManWithNoPlan (talk) 11:41, 26 March 2020 (UTC)

Flagging edits as "bot"

It appears that the wikipedia API is ignoring the bot=1 flag we are passing it. Someone with wikipedia superpowers needs to flag this account as a bot, so that this flag is accepted. I know this flag is not required for bots, but it would be nice. AManWithNoPlan (talk) 11:36, 26 March 2020 (UTC)

@Xaosflux: any insights here? Headbomb {t · c · p · b} 11:58, 26 March 2020 (UTC)
The edits are being correctly flagged as bot. Remember that only the recentchanges table stores this information, so you can see it from the recentchanges API or your own watchlist.
Example which show the flag is correctly registered:
{"type":"edit","pageid":42939132,"revid":947449519,"old_revid":947449495,"rcid":1244098555,"user":"Citation bot","bot":""},
Nemo 12:52, 26 March 2020 (UTC)
My watchlist was not showing the "b" flag. I don't know why, but now it is. That was really weird. {{notabug}} AManWithNoPlan (talk) 14:26, 26 March 2020 (UTC)
In hindsight, I should have noticed that NO bots were flagged as "b". AManWithNoPlan (talk) 14:41, 26 March 2020 (UTC)
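
For anyone checking this themselves, the recentchanges entry quoted above can be read as follows. In the API's default JSON format a boolean flag such as bot is reported by the presence of the key with an empty string as its value, which is why `"bot":""` means the edit is flagged. The helper name below is hypothetical, and the handling of a literal `false` (as returned by the newer format version) is included as a precaution.

```python
import json

# The entry quoted in the thread, without the trailing comma.
ENTRY = json.loads(
    '{"type":"edit","pageid":42939132,"revid":947449519,'
    '"old_revid":947449495,"rcid":1244098555,"user":"Citation bot","bot":""}'
)


def is_bot_edit(entry):
    """True if the recentchanges entry carries the bot flag."""
    # Key present with "" (or true) means flagged; key absent or false means not.
    return entry.get("bot", False) is not False
```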

Removes valid partial title link

Status
{{fixed}} - only do this now if less than 60% of the title length
Reported by
David Eppstein (talk) 16:52, 18 March 2020 (UTC)
What happens
Special:Diff/946139951
What should happen
Moving the "centennial edition" part to an edition field is probably too much intelligence to expect of the bot, but the title link for Alan Turing: The Enigma should be either moved to a title-link field or left in place, not just dropped on the floor.
We can't proceed until
Feedback from maintainers


Partial wikilinks should not be used (according to the styles), and are 99% of the time invalid (ie. they link to IBM in the title instead of the actual thing, for example) AManWithNoPlan (talk) 12:41, 19 March 2020 (UTC)
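
The 60% rule in the status line could be expressed roughly as below. This is a guess at the shape of the check, written in Python for illustration; how the bot actually measures the linked portion is not stated in the thread, so the function names, the regular expression, and the use of displayed-text length are all assumptions.

```python
import re

# [[Target]] or [[Target|Label]]; group 1 is the text the reader sees.
_WIKILINK = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]")


def linked_fraction(title):
    """Fraction of the displayed title that sits inside wikilinks."""
    plain = _WIKILINK.sub(r"\1", title)
    linked = sum(len(m.group(1)) for m in _WIKILINK.finditer(title))
    return linked / len(plain) if plain else 0.0


def should_drop_partial_link(title, threshold=0.6):
    """Drop the link only when it covers a small part of the title."""
    fraction = linked_fraction(title)
    return 0 < fraction < threshold
```

Under this reading a title such as `The history of [[IBM]]` would lose its link, while a title that is mostly or wholly linked would be left alone.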

Incorrect change from "url=" to "chapter-url="

Status
{{fixed}} please report more. No bug is ever truly dead
Reported by
Graham87 15:23, 21 December 2019 (UTC)
What happens
In the Busselton article, the bot changes "url=" to "chapter-url=", which is incorrect in this case because the PDF link goes to the entire book, not just one chapter of it.
Relevant diffs/links
this edit
We can't proceed until
Feedback from maintainers


Manual bypass seems the solution here. Headbomb {t · c · p · b} 15:27, 22 December 2019 (UTC)
Not making bot edits beyond the capacity of the bot to understand the actual meaning of the content at the link seems to be the answer to me. If we're going to have two different url parameters with different meanings and one of them is chosen as the correct one by a human editor, why should the bot be second-guessing that? —David Eppstein (talk) 18:52, 22 December 2019 (UTC)
Because in 99%+ of cases, humans are wrong and use url instead of chapter url. Headbomb {t · c · p · b} 20:45, 22 December 2019 (UTC)
This is directly counter to the philosophy according to which, several years ago, the |url= parameter was changed from being a catch-all parameter that would by default bind to the tightest title in the template, and instead became split into several parameters that each had a specific meaning. If I want to use a parameter with its correct meaning, and the bot refuses to let me, that seems like the very definition of a bug to me. —David Eppstein (talk) 01:11, 31 December 2019 (UTC)
Just encountered this again at Modern Jazz Quartet. The bot should have some code that helps it figure out that this, added by InternetArchiveBot, is most definitely not a chapter URL. Graham87 04:24, 12 March 2020 (UTC)
Here's another one. If Citation bot is too stupid to recognize that an archive.org url like this, without any extra page-number complications, is going to be a link to the whole book, it is too stupid to be making these changes at all. url= without chapter-url= is a perfectly valid combination of parameters and should not need special bot-exclusion code to prevent it from being broken by marauding bots. —David Eppstein (talk) 06:36, 20 March 2020 (UTC)
I agree. Book URLs are not rare, and blindly changing |url= to |chapter-url= is introducing a significant number of errors. It needs to be stopped. Kanguole 10:17, 26 March 2020 (UTC)
this should help a lot https://github.com/ms609/citation-bot/pull/2765 AManWithNoPlan (talk) 17:07, 27 March 2020 (UTC)
Checking for google.com and archive.org will reduce the number of errors, but the bot will still be making many erroneous edits. This is not an edit that can be safely automated. Kanguole 22:29, 27 March 2020 (UTC)
please point me to examples where there is a problem now. AManWithNoPlan (talk) 22:55, 27 March 2020 (UTC)
Sorry, I misread it. The new approach, only moving for the few websites where you know the format of URLs for parts of books, is what I wanted. Kanguole 23:36, 27 March 2020 (UTC)

Discussion at Village Pump

Of possible interest: Wikipedia:Village_pump_(technical)#=url_and_=archiveurl_do_not_match -- GreenC 14:18, 18 March 2020 (UTC)

{{fixed}} - flag for archive. AManWithNoPlan (talk) 21:12, 28 March 2020 (UTC)

Clean up todo's and fix code coverage

Aggressive fixing of bugs has left the code with some technical debt. Need to fix. AManWithNoPlan (talk) 11:41, 26 March 2020 (UTC)

{{fixed}} for now. Will look again in the future. AManWithNoPlan (talk) 21:11, 28 March 2020 (UTC)

ANI notice

  There is currently a discussion at Wikipedia:Administrators' noticeboard/Incidents regarding an issue with which you may have been involved. The thread is AManWithNoPlan and Citation bot. HJ Mitchell | Penny for your thoughts? 10:08, 28 March 2020 (UTC)

{{fixed}}

URL added for journal title

Status
{{fixed}}
Reported by
Jonatan Svensson Glad (talk) 16:36, 28 March 2020 (UTC)
What happens
|journal=HTTPS://Sociologydictionary.org/
What should happen
Do not add URL in |journal=
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=Worldview&diff=prev&oldid=947809324
We can't proceed until
Feedback from maintainers


I know this is some kind of GIGO, but a sanity check to not add URLs to the journal field could be applied. Jonatan Svensson Glad (talk) 16:36, 28 March 2020 (UTC)

I am once again disappointed in zotero's error checking. We do a lot of data sanitization. https://github.com/ms609/citation-bot/pull/2767 AManWithNoPlan (talk) 19:49, 28 March 2020 (UTC)
This one isn't even a journal! {{cite encyclopedia}} would have been a better choice. —David Eppstein (talk) 20:10, 28 March 2020 (UTC)
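
A sanity check of the kind requested above might be as simple as the following. It is a Python sketch, not the code in the linked pull request; the function name and the exact pattern are assumptions, and a real check would probably need to be broader (bare domain names, for instance, are not caught here).

```python
import re

# Starts with http://, https://, a protocol-relative //, or www.
_URLISH = re.compile(r"^\s*(?:(?:https?:)?//|www\.)", re.IGNORECASE)


def looks_like_url(value):
    """True if a would-be |journal= value looks like a URL rather than a name."""
    return bool(_URLISH.match(value))
```

With this check, the value reported above (`HTTPS://Sociologydictionary.org/`) would be rejected, while an ordinary journal name would pass.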