Template talk:Webarchive/Archive 2


Period-comma

One of the most common typos in Wikipedia is a period and comma together in a reference (example: [1]). This occurs because this template produces a period after "Wayback Machine." Editors often add commas after each bibliography element, probably because they don't notice the period, resulting in a period and comma together. If I notice, I can remove the comma but not the period, which leaves a reference with one period while every other element ends in a comma (is that OK?). Of course I haven't checked 6,821,589 articles; only a bot could do that. We could make the period optional, or remove it and fix existing references with a bot. Art LaPella (talk) 15:57, 15 January 2019 (UTC)

When this template was first written, it merged an older template, {{wayback}}, which rendered a period, with {{webcite}}, which didn't. To stay backward compatible, this template checks whether the URL is a Wayback URL and adds a period if so; otherwise it adds none. This is confusing for anyone who doesn't know the history, because they get different behavior for no apparent reason; it would look like a bug to me. An option to exclude/include a period would probably be warranted. -- GreenC 17:34, 15 January 2019 (UTC)
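A minimal sketch of the legacy behavior described above, together with the proposed opt-out. This is not the module's code (which is Lua and is not quoted in this thread); the function name `render`, the `SERVICES` table, and the exact output wording are assumptions for illustration only.

```python
from urllib.parse import urlparse

# Hypothetical host-to-label table; the real module's table is not shown here.
SERVICES = {
    "web.archive.org": "the Wayback Machine",
    "www.webcitation.org": "WebCite",
}


def render(url, date, postscript=None):
    """Render an archive note, adding a period only for Wayback URLs."""
    host = urlparse(url).hostname or ""
    label = SERVICES.get(host, host)
    text = f"Archived {date} at {label}"
    # Special case under discussion: only Wayback links get a terminal
    # period, unless the editor passes postscript="none".
    if host == "web.archive.org" and postscript != "none":
        text += "."
    return text


wayback = "https://web.archive.org/web/20191023150550/https://example.com"
print(render(wayback, "2019-10-23"))
print(render(wayback, "2019-10-23", postscript="none"))
```

The special case lives in a single `if`, which is why the same template gives different punctuation depending only on the archive host.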
OK, but someone else besides me should do it, although I once did simple template fixes (typos) myself. It seems to have moved to "Module" space and changed languages. It's 1000 lines long, and searching for periods doesn't find it. Art LaPella (talk) 19:21, 15 January 2019 (UTC)
Is it really necessary to continue that period / no-period support? Really? Special cases are bad. My vote would be to do away with the period that occurs in cases like the one described. The terminal period should be required when the template produces more than one sentence (as happens with multiple urls, etc). Else, it should have no terminal punctuation (as at Example 2 under Template:Webarchive#Usage).
Trappist the monk (talk) 20:37, 15 January 2019 (UTC)
It would break a lot of things to remove the period; it is one of the more common cases. As noted by Art LaPella, it would require a bot to try to untangle things, and I'm not sure that is feasible. I tried adding |postscript=none support [1], but for some reason the program is not seeing the argument; it is always treated as a null value. Test case: Testcase A.12.3. Any ideas? I'm perplexed. -- GreenC 01:21, 16 January 2019 (UTC)
Ahh sheesh, thanks. Alright we now have a working |postscript=none in the sandbox that can be seen in Testcase A.12.3. -- GreenC 14:30, 16 January 2019 (UTC)
Thank you Art LaPella (talk) 14:51, 16 January 2019 (UTC)
I guess if there is no objection, we can implement the sandbox version. The question of continued support for the '.' at the end of Wayback links can be addressed at any time, the |postscript= won't preclude that and would be easy to undo with AWB should support ever be dropped at which point |postscript= would become redundant. -- GreenC 16:04, 17 January 2019 (UTC)
The question of continued support for the '.' at the end of Wayback links can be addressed at any time, ... Then why not now? The issue that brings this to the fore is the extraneous dot comma combination. Claims are made that removing the dot would break a lot of things. What would break?
This search (when it doesn't time out) finds just shy of 60k article pages that have {{webarchive}} followed by a comma. Because insource searches are not perfect, no doubt there are {{webarchive}} templates in that search result that are not Wayback Machine. A Ctrl+F search of the results page for the text string web.archive.org gave 311 of 500, so perhaps there are 40k-ish articles that suffer the dot-comma outrage and that, to fix, would require the addition of |postscript=none at least once per page. That seems like a bot task to me. Then, should we later decide to deprecate and remove |postscript=, that's yet another bot task.
I am not opposed to keeping |postscript=, but if we do, it should be applied to all renderings of {{webarchive}}, not just Wayback Machine, and should be made to take a value that is actually appended to the template rendering. Of course, one might argue that such a parameter is pointless because editors can simply add whatever would go in |postscript= in the wikitext after the template ...
Trappist the monk (talk) 16:56, 17 January 2019 (UTC)
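The extrapolation in the comment above can be checked with a few lines of arithmetic. This is only a sketch: the helper name `extrapolate` is made up, 60,000 stands in for "just shy of 60k", and the 500 results inspected were one page of search results rather than a random sample.

```python
def extrapolate(sample_hits, sample_size, population):
    """Scale a sample proportion up to the whole population."""
    return round(population * sample_hits / sample_size)


# 311 of the 500 listed results contained web.archive.org;
# the search reported roughly 60,000 matching pages.
estimate = extrapolate(311, 500, 60_000)
print(estimate)  # 37320
```

That is in line with the "40k-ish" figure quoted, given how rough the inputs are.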
In addition to "period-comma", I've also noticed "period-period" (a period after a webarchive template), which should be considered if we're designing a bot. Art LaPella (talk) 18:58, 17 January 2019 (UTC)
I added a period-period to the search and it's up to 95k. If we are looking at 80k-ish articles requiring |postscript=none, the next question is how many would be broken by the removal of '.' if it were deprecated. Which choice is worse? There are 338,994 instances of the template in use for Wayback URLs. 80k is about 23% of 340k, so the question is: are more than a quarter of them relying on that period? -- GreenC 19:23, 17 January 2019 (UTC)
Then that is further inducement to stop supporting the special-case wayback machine terminal dot. Searching for dot comma and for dot dot and for various other simple punctuation characters gave me these results using a slightly different search string:
dot comma (.,) (>54k×)
dot dot (..) (>43k×)
dot semicolon (.;) (~3800×)
dot colon (.:) (~1100×)
dot exclamation (.!) (5×)
dot question (.?) (18×)
dot bracket (.() (~9400×)
dot bracket (.[) (~3500×)
dot bracket (.]) (~40×)
dot bracket (.)) (~1700×)
dot bracket (.<) (>145k×)
A simple search, like the ones above, to answer the [how many are] relying on that period question is, I think, not possible, or if it is, I don't know the magic words. I do think that this template should not have special cases; either all archives should be terminated with a dot or no archives should be terminated with a dot (excepting those template renderings that are multiple sentences which, of course, require the terminal dot).
Trappist the monk (talk) 21:02, 17 January 2019 (UTC)
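For anyone working from a dump or a local copy of wikitext rather than insource searches, the counts above could be approximated along these lines. This is a sketch under assumptions: the regular expression is not the search string used above, it ignores templates nested inside {{webarchive}}, and it does not check whether the URL is a Wayback URL.

```python
import re
from collections import Counter

# A {{webarchive ...}} call with no nested braces, immediately followed
# by one of the punctuation characters tallied in the list above.
PATTERN = re.compile(
    r"\{\{\s*webarchive\b[^{}]*\}\}([.,;:!?()\[\]<])",
    re.IGNORECASE,
)


def following_punctuation(wikitext):
    """Count the character that directly follows each webarchive template."""
    return Counter(m.group(1) for m in PATTERN.finditer(wikitext))


sample = (
    "* {{webarchive|url=https://web.archive.org/web/2019/http://a.example}}, Title\n"
    "* {{Webarchive|url=https://web.archive.org/web/2019/http://b.example}}.\n"
    "* {{webarchive|url=https://web.archive.org/web/2019/http://c.example}}</ref>\n"
)
for char, count in following_punctuation(sample).items():
    print(repr(char), count)
```

Like the searches, this counts what follows the template in the source; it cannot tell whether a given reference actually relies on the rendered period.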
Of course a bot would be nice, and an archive-independent solution would be better. But I'd be helpless against the response of "Glad you volunteered!" Art LaPella (talk) 21:28, 17 January 2019 (UTC)
Art LaPella, would you mind randomly looking through 50 or 100 cases of this template and counting how many would look worse off if the period were removed? It would provide a very rough guess at how bad the situation is. -- GreenC 21:39, 17 January 2019 (UTC)
working on it Art LaPella (talk) 22:15, 17 January 2019 (UTC)

Any which follow a CS1 template, e.g. those using |format=addlarchives, should probably have a period. - Evad37 [talk] 22:55, 17 January 2019 (UTC)

I didn't see the addlarchives comment until I was done, and looking for that parameter would require starting over, searching in edit mode.
I started from ABB ALP-44 trying to get a more random sample by avoiding things like years (but I still got mostly Abbas (Muslims) and abbeys (Christians)).
There were 26 articles with at least one example of dot-comma or dot-dot.
Of the others, there were 8 examples resembling [2] where removing the period without benefit of a bot could confuse "Wayback Machine" with the next parameter. There are more numerous examples I didn't count because the Wayback Machine period was the only punctuation in the entry that separates bibliography elements.
The most common solution, 32 times, is using commas with the Wayback Machine and period at the end, like this: [3] Or no punctuation, with the Wayback Machine and period at the end, 18 times: [4]. This example shows using Wayback Machine only for the first of many references, perhaps avoiding Wayback Machine because of the period problem. Also, the Wayback Machine almost always appears at the end of a reference, perhaps because that is where the extra period is least objectionable. So editors probably have some trouble trying to work around the period problem.
Art LaPella (talk) 00:24, 18 January 2019 (UTC)
Now that I've seen the "additional archives" format, I don't remember finding that at all on my random pages. It must be rare. Art LaPella (talk) 01:11, 18 January 2019 (UTC)
Thanks for your work here, but these numbers don't make sense to me. They compare 26 articles that have the comma-period problem with 8 templates that require the default period. Normally you would look at 50 templates and report how many have a default-period problem (i.e. removal of the period would be a problem), how many have a comma-period problem, and how many it doesn't matter for (i.e. the period is at the end of the sentence). Then we can extrapolate percentages. -- GreenC 01:24, 18 January 2019 (UTC)

Here's the data I found in about 50 articles.

Article                                        default-period  comma-period  no problem
1st Durham Volunteer Artillery                 0               0             4
1st Squadron, 108th Cavalry Regiment           0               0             1
3rd Battalion, 320th Field Artillery Regiment  0               1             0
4th Princess Louise Dragoon Guards             0               0             1
8coupons                                       0               7             1
A Northern Light                               0               1             0
A Pack of Lies                                 1               0             0
A Life Without Pain                            1               0             0
A Long Long Way                                0               0             1
A Mango-Shaped Space                           0               0             1
A History of the Crusades                      0               1             0
A Hand of Bridge                               0               1             0
A Nightingale Sang in Berkeley Square          0               0             4
A Kind of Magic                                3               5             0
Aalstermolen                                   0               1             0
Aamir Bashir                                   1               0             2
Aaniiih Nakoda College                         0               0             8
Aanval                                         0               0             2
Aa Dinagalu                                    0               0             1
Aab-e-Gum                                      0               0             1
Aaron Dankwah                                  0               0             1
Aaron Faulls                                   0               0             5
Aaron Gwyn                                     0               1             0
Aaron Kelton                                   0               1             0
Ace Hood                                       0               0             1
Acer Aspire                                    0               0             1
Accra                                          1               6             1
Access Copyright                               0               1             0
Accept (band)                                  0               0             1
Accrington F.C.                                0               0             1
Accident management                            0               0             2
Acciaroli                                      0               0             1
Acer sieboldianum                              2               0             0
Accurate Miniatures                            0               0             1
Adam Sarota                                    0               0             2
Adam Selwood                                   0               1             0
Adam Snow                                      0               1             0
Adam Leipzig                                   0               0             3
Adam Kokesh                                    0               2             2
Adamit                                         0               0             1
Adams Key                                      0               0             1
Adamsdown                                      0               0             1
Adamussium                                     2               0             0
Adán Godoy                                     0               0             1
Addicted to Love (film)                        0               0             3
Addicted (2002 film)                           0               1             0
Addison Bain                                   0               0             1
Adel, Iowa                                     0               0             2
Adel Lami                                      0               0             1

Key

  • default-period: removing the default period causes a problem
  • comma-period: keeping the default period causes a problem
  • no-problem: either way, it doesn't matter

Totals

  • default-period: 11 (11%)
  • comma-period: 31 (31%)
  • no-problem: 60 (60%)
  • total: 102

-- GreenC 02:17, 18 January 2019 (UTC)
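A quick check of the shares, as a sketch: the three counts are taken from the totals above, and dividing each by the 102 templates sampled gives values that differ slightly from the rounded percentages listed. The helper name `shares` is made up for illustration.

```python
# Counts from the totals above (102 templates across about 50 articles).
totals = {"default-period": 11, "comma-period": 31, "no-problem": 60}


def shares(counts):
    """Return each category's share of the total, in percent."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


for name, pct in shares(totals).items():
    print(f"{name}: {pct}%")
# default-period: 10.8%
# comma-period: 30.4%
# no-problem: 58.8%
```

The difference is small and does not change the comparison being drawn: in this sample the comma-period cases outnumber the default-period cases by roughly three to one.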

If that 11% figure is even close to reality (it is a very sparse sampling), it would make sense right away to drop support, because the default '.' is causing more problems than it resolves. I suspect the 11% is in the ballpark because we know the comma-period problem is about 25% and that is close to the 31% found here. -- GreenC 02:22, 18 January 2019 (UTC)
Use of |format=addlarchives or |format=addlpages is relatively rare:
|format=addlarchives (79×)
|format=addlpages (25×)
If the first five pages in the |format=addlarchives search results are to be believed, it is common for editors to jam the leading {{ of a {{webarchive}} template right up against the trailing }} of a cs1|2 template. By that metric, this template should also enforce a required leading space character, which could probably be done harmlessly because, as I understand it, most browsers collapse multiple whitespace characters to a single space when rendering html. Still, I think that leading whitespace and trailing punctuation are the responsibility of the editor, and templates shouldn't attempt to second-guess editors. "But wait," someone will say, "cs1 templates all have terminal punctuation." Yep, because every element of a cs1 rendering that isn't in parentheses, isn't a compound, or isn't a quotation gets terminated with a dot. As I've said before, {{webarchive}} renderings that produce multiple sentences require a terminal dot.
Trappist the monk (talk) 14:17, 18 January 2019 (UTC)

Given the above, I believe we should move forward with deprecating support for the default period. It's causing more problems than not. -- GreenC 16:25, 19 January 2019 (UTC)

The Sandbox is updated, and the Testcases shows where it would be different (yellow bars). -- GreenC 22:52, 19 January 2019 (UTC)
It lives. -- GreenC 19:00, 21 January 2019 (UTC)
Thank you Art LaPella (talk) 19:42, 21 January 2019 (UTC)

Further localization support

To make it more localizable, parameter names should be in a table so there's no need to replace them manually in multiple places in the code. Further date-formatting support is also needed so the output can use a correct localized format: the form of the month name changes (with a suffix) depending on whether a day is given in the date, and so on. Edit: I was looking in the wrong place for parameter names; that one is localizable already. Edit2: Actually, can we just replace the output formatting with what the user gives? That would fix the whole thing. Currently it isn't used in anything at all in the output? Edit3: And since Lua doesn't support non-ASCII characters correctly, it is a pain to fix things in other ways as well. Ipr1 (talk) 05:36, 24 May 2019 (UTC)

You have the advantage of me in that you are seeing in your mind's eye exactly what you mean. But I cannot read your mind. Can you provide examples of just what you mean for each of the improvements you've named?
Trappist the monk (talk) 10:22, 24 May 2019 (UTC)
Actually, what is the point of the date conversions in the module AT ALL? You'll need to provide the date in the URL in the format the service accepts anyway. That whole part with date conversion is entirely pointless as I see it. To locate the actual archived page, the user needs to check the service and copy the URL after finding the relevant article, so the user will get the date according to what is archived there, and there's no need to enter it into the date field. So you could just make things simple and rip out the date conversions entirely from the module. After that, there's not much point to having the module in the first place, since it would be just a reference template again. Ipr1 (talk) 00:07, 25 May 2019 (UTC)
I mean, either the user provides a correct working URL to the archived page or they don't - that's it. Why is this implemented in a complicated way? We could skip a bunch of conversions then. And if the cases are automatically generated by a bot, then doubly so - a bot can't start guessing which version might have the referenced information; it should use the one in the original reference. If that isn't archived, then it is up to a human again to find an archived version which has the referenced information. So there is no reason to generate/convert dates in that case either. Ipr1 (talk) 00:20, 25 May 2019 (UTC)
You'll need to provide date in the URL - this is not always true for all archive services. -- GreenC 02:09, 25 May 2019 (UTC)
That whole part with date conversion - which whole part? There are date conversions in multiple places for different purposes. -- GreenC 02:18, 25 May 2019 (UTC)

Fill in date automatically?

archive-date has to be entered manually, even if the archive-url already contains it. Wayback & many other archives always include the date (also the source URL) in the snapshot URL. Couldn't this be automated, where possible? That would be more convenient, and less error-prone.

(I just brought this up at WP:CS1. Crossposting because the same issue applies here, and I'm not sure whether these templates rely on separate code.)

Toxide (talk) 11:21, 17 July 2019 (UTC)
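A minimal sketch of the extraction being asked for, assuming a Wayback snapshot URL that carries the full 14-digit YYYYMMDDhhmmss timestamp. The function name `snapshot_date` is hypothetical, and other archive services use different URL layouts that this pattern does not cover.

```python
import re
from datetime import datetime

# Wayback snapshot URLs embed a timestamp right after /web/.
WAYBACK = re.compile(r"^https?://web\.archive\.org/web/(\d{14})")


def snapshot_date(url):
    """Return the snapshot date encoded in a Wayback URL, or None."""
    match = WAYBACK.match(url)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y%m%d%H%M%S").date()


url = ("https://web.archive.org/web/20191023150550/"
       "https://www.flightradar24.com/data/airports/yyc")
print(snapshot_date(url))  # 2019-10-23
```

Returning `None` for anything that does not match leaves room for a manually entered date when the URL carries no timestamp.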

Please, one conversation in one place. Conversation continues at Help talk:Citation Style 1 § Fill in archive-date automatically
Trappist the monk (talk) 14:36, 17 July 2019 (UTC)
Thread was archived: Help talk:Citation Style 1/Archive 58 § Fill in archive-date automatic
Toxide (talk) 11:50, 29 September 2019 (UTC)

Citation parameter names

Is there any objection to creating parameter name aliases compatible with {{citation}} and friends? Specifically, "archive-url" and "archiveurl" as aliases for "url", and "archive-date" and "archivedate" as aliases for "date". These are less ambiguous parameter names, and if editors and bots start using them, it will be easier to convert manually formatted references incorporating {{webarchive}} to fully templated ones using {{citation}}. ~Kvng (talk) 14:27, 24 May 2020 (UTC)
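To make the request concrete, alias handling of this kind usually amounts to a small lookup step before the arguments are used. The sketch below is hypothetical: it is not the module's code, the names `ALIASES` and `resolve` are invented, and rejecting conflicting spellings is one design choice among several.

```python
# Canonical parameter name -> accepted spellings (the aliases requested above).
ALIASES = {
    "url": ("url", "archive-url", "archiveurl"),
    "date": ("date", "archive-date", "archivedate"),
}


def resolve(args):
    """Map whichever alias was supplied onto its canonical name."""
    resolved = {}
    for canonical, names in ALIASES.items():
        given = [name for name in names if args.get(name)]
        if len(given) > 1:
            raise ValueError("conflicting parameters: " + ", ".join(given))
        if given:
            resolved[canonical] = args[given[0]]
    return resolved


print(resolve({"archive-url": "https://web.archive.org/web/2020/https://example.com",
               "archivedate": "2020-05-24"}))
```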

Yes, I object. It is not a CS1|2 template and should not be confused as such. Using CS1|2 argument names will result in accidental inclusions of other CS1|2 args the template does not support (access-date, url+archive-url, etc.), creating more confusion and feature creep. It is redundant to have "webarchive" and "archiveurl"; the inclusion of "archiveurl" implies it would pair with a "url" just as in CS1|2 templates. The template exists for a single purpose, which the name of the template makes clear: simply a web archive URL, i.e. {{webarchive}} |url= .. it is self-documenting. -- GreenC 15:52, 24 May 2020 (UTC)

webarchive use of {{use dmy dates}}, etc.

The usage of "webarchive" should use the stated date format:

Archived 2019-10-23 at the Wayback Machine
Top routes from YYC at the Wayback Machine (archived 2019-10-23)

like "cite web":

"Top routes from YYC". Archived from the original on 23 October 2019.

--User-duck (talk) 15:34, 23 October 2019 (UTC)

It was originally designed WYSIWYG to avoid adding a new argument |df=iso for those who prefer it display ISO (or whatever). There is also a contingent of people who prefer that archive-dates be in ISO, and other dates in full, so it's not totally clear making longer date strings would be preferred by default. -- GreenC 19:14, 23 October 2019 (UTC)
Use of the template potentially leaves date formats out of line within articles. Autoformatting is already in place for citation templates, meaning that all dates contained within the templates are rendered to the reader according to the desired format designated by {{use dmy dates}} or {{use mdy dates}} irrespective of the format of each underlying date within the template. I suggest that it would be useful for this template to use this Autoformatting feature to ensure that dates are cromulent in accordance with MOSNUM.

At present, {{webarchive|title=Top routes from YYC|url=https://web.archive.org/web/20191023150550/https://www.flightradar24.com/data/airports/yyc|date=October 23, 2019}} displays as: Top routes from YYC at the Wayback Machine (archived October 23, 2019), despite there being a {{use dmy dates}} on the page. Note that even {{webarchive|url=https://web.archive.org/web/20191023150550/https://www.flightradar24.com/data/airports/yyc}} (which has no date parameter) displays as Archived 2019-10-23 at the Wayback Machine, and it is impossible to modify the date format in this case short of adding the |date= parameter, which is rather a waste of time.

Whilst indeed there are readers who prefer different date formats that are permitted for archive-dates – be it yyyy-mm-dd, dmy or mdy – these formats are already overridden within citation templates. However, editors have the means of modifying the display characteristics by adding parameters to the {{use dmy dates}} template as indicated at Help:Citation_Style_1#Dates. The same could be done with this template. -- Ohc ¡digame! 15:48, 11 July 2020 (UTC)
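As an illustration of what article-level autoformatting would involve, here is a minimal sketch. It assumes the page's preferred style has already been determined (for example from a {{use dmy dates}} tag) and passed in as a string; the function name `format_date` and the style keys are hypothetical, and this is not how the citation templates are implemented.

```python
from datetime import date

MONTHS = [
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
]


def format_date(d, style):
    """Render a date in the requested style: 'dmy', 'mdy' or 'iso'."""
    month = MONTHS[d.month - 1]
    if style == "dmy":
        return f"{d.day} {month} {d.year}"
    if style == "mdy":
        return f"{month} {d.day}, {d.year}"
    if style == "iso":
        return d.isoformat()
    raise ValueError(f"unknown date style: {style}")


archived = date(2019, 10, 23)
print(format_date(archived, "dmy"))  # 23 October 2019
print(format_date(archived, "mdy"))  # October 23, 2019
print(format_date(archived, "iso"))  # 2019-10-23
```

Keeping "iso" as an explicit style would let editors who prefer yyyy-mm-dd archive dates opt out of the article-level format.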

Date localization

I want this template/module to accept the following date formats on bs.wiki (I used "January 4, 2014" as an example):

  • 4. 1. 2014
  • 4. januar 2014

and by default display the first one. Is this possible? – Srđan (talk) 17:24, 16 April 2021 (UTC)
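A sketch of what accepting these two formats could look like, under stated assumptions: the function names are hypothetical, the month table contains only the one name given in the request and would need the remaining eleven, and the module's actual date handling is not shown in this thread.

```python
import re
from datetime import date

# Only the month name from the request; a real table needs all twelve.
MONTHS_BS = {"januar": 1}

NUMERIC = re.compile(r"^(\d{1,2})\.\s*(\d{1,2})\.\s*(\d{4})\.?$")
NAMED = re.compile(r"^(\d{1,2})\.\s*([^\W\d]+)\s+(\d{4})\.?$")


def parse_bs(text):
    """Parse 'D. M. YYYY' or 'D. monthname YYYY'; return None otherwise."""
    text = text.strip()
    m = NUMERIC.match(text)
    if m:
        day, month, year = (int(part) for part in m.groups())
        return date(year, month, day)
    m = NAMED.match(text)
    if m and m.group(2).lower() in MONTHS_BS:
        return date(int(m.group(3)), MONTHS_BS[m.group(2).lower()], int(m.group(1)))
    return None


def format_numeric(d):
    """Render the requested default form, e.g. '4. 1. 2014'."""
    return f"{d.day}. {d.month}. {d.year}"


print(format_numeric(parse_bs("4. januar 2014")))  # 4. 1. 2014
```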

Template-protected edit request on 29 April 2021

"Archive.today" is uppercase, but the original and Wikipedia page names are lowercase (archive.today). Is it possible to correct it? Thank you. 126.140.222.220 (talk) 10:58, 29 April 2021 (UTC)

  Done Elli (talk | contribs) 00:31, 30 April 2021 (UTC)
As a logo it is lowercase, but when used in grammatical sentences I'm not so sure. It is unclear whether the wiki page is accurate or consistent (there are cases of upper in the article). Since it is a proper noun I lean towards upper, but the template is also not producing full grammatical sentences; space, brevity and clarity take precedence. -- GreenC 00:42, 30 April 2021 (UTC)
I think it makes sense to keep it lowercase - it looks like the article only uppercases it at the start of sentences (Since July 2013, archive.today supports the Memento Project application programming interface (API)). Elli (talk | contribs) 00:48, 30 April 2021 (UTC)
I'm alright with that. We do cap Wayback Machine and Internet Archive but usually not archive.org -- GreenC 00:59, 30 April 2021 (UTC)

Incorrect documentation

The table at the bottom states for nolink that "Any value including blank means no wikilink". In fact, nolink has no effect. A non-blank value is required. Chimango Caracara (talk) 10:42, 24 October 2021 (UTC)
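The discrepancy reported above is the difference between testing whether a parameter is present and testing whether it has a non-blank value. The sketch below only illustrates those two checks; both function names are hypothetical and the module's actual test is not quoted here.

```python
def nolink_as_documented(args):
    """Documented rule: any value, including blank, suppresses the wikilink."""
    return "nolink" in args


def nolink_as_reported(args):
    """Reported behavior: only a non-blank value suppresses the wikilink."""
    return bool(args.get("nolink", "").strip())


blank = {"nolink": ""}
print(nolink_as_documented(blank))  # True
print(nolink_as_reported(blank))    # False
```

Either the documentation table or the check would need to change for the two to agree.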

Template-protected edit request on 27 November 2021

Please replace line 102:

['pandora.nla.gov.au'] = {false, 'Pandora Archive'},

with:

['webarchive.nla.gov.au'] = {true, 'Australian Web Archive'},

(The National Library of Australia relocated the Pandora Archive to the broader Australian Web Archive in March 2019.) ClaudineChionh (talkcontribs) 09:24, 27 November 2021 (UTC)

  Done firefly ( t · c ) 14:32, 28 November 2021 (UTC)

Bad

Apart from (a) why can't I just fix this myself, I have to point out that this template and documentation are both bad.

The template is bad, it provides text such as <original page title> at webarchive <some date>. So if the original web page title was "There was a fire", the template inserts the text There was a fire at webarchive, 15 Jan 2020, or similar.

And the documentation which is on the main page where I am commenting because I find only this read-only documentation and not any way for a user to amend the broken wikipedia you maintain, is incorrectly telling people that what the template inserts is something like, Archived at webarchive <some date>. For you to fix.

<legal threat removed>— Preceding unsigned comment added by 150.101.157.18 (talk) 00:55, 8 December 2021 (UTC)

Read article date format

This template should read the article's date format (specified via the {{Use dmy dates}} set of templates) just like all of the {{cite}} templates do when rendering the date format. Getsnoopy (talk) 19:12, 1 January 2022 (UTC)

It's not a CS1|2 template and doesn't try to be. It was designed to allow any date format by way of a literal string. Otherwise it would require a new argument to override the article-level date format when so desired. It would also complicate usage at non-enwiki sites that use their own formats and "use dmy dates" templates. Also many people seem to prefer iso vs dmy/mdy for certain things like archive and access dates, if this was done it would upset the cart of existing uses. -- GreenC 20:12, 1 January 2022 (UTC)

archive.ec is deprecated

The domain "archive.ec" no longer belongs to archive.today, it should probably be taken out. Luckily there are only about 6 articles using "archive.ec" instead of one of the domains he owns. The template should probably purge the "archive.ec" entry. Rlink2 (talk) 17:40, 11 January 2022 (UTC)

Para alias requests

Can we set up |archive-url= and |archive-date= as aliases for |url= and |date= respectively? This would be consistent with how citation templates handle URLs. ~Kvng (talk) 00:18, 23 February 2022 (UTC)

This has been previously discussed. The template is not a CS1|2 and should not be confused or used as such. Furthermore archiveurl is redundant, the only kind of URL this template accepts is an archive URL - the purpose of the template is to hold web archive urls ie. webarchive + url -- GreenC 02:01, 23 February 2022 (UTC)