Wikipedia talk:Wikipedia Signpost/2020-04-26/In focus

Discuss this story

  • I was surprised to learn that almost half of the articles in German have no English-language counterparts. This is not very healthy, as so many notable Germans have no mention in the English Wikipedia, and we are being deprived of vital information. Now, I am not suggesting that we create English-language articles for all of those; it would be too time-consuming. But at least for a certain number of such articles, it is well worth our while to do what's necessary. We could set a target that within the next 2 years, the necessary steps would be taken to cover, let's say, 75% of the German articles in English instead of the present 50%. That would add a huge number of articles on notable German individuals and subjects, wouldn't it? If this cannot presently be achieved for some reason, we could at least create redirect pages in the English Wikipedia with the appropriate short description, defaultsort, language links and categories, with the resulting redirect page pointing to the German page for now. Or, instead of a redirect page, a template could be created to inform the reader that there is actually a German-language article on the said individual or subject. This could also apply similarly to, say, Italian, Spanish and French articles as a start, and possibly to Arabic, Portuguese and Chinese at a later stage. The "translate" feature would do the rest once we are led to the language page of the non-existent English article. werldwayd (talk) 20:08, 26 April 2020 (UTC)
  • I think a first step would be to assess why we don't have an English article matching one on another wiki. Is it solely because of a language barrier? Is the concept covered by a different article? Does the explanation lie in differing notability guidelines between projects? A translate-a-thon is not a bad idea in theory but further analysis would be useful in better understanding the underlying issues. Nikkimaria (talk) 20:29, 26 April 2020 (UTC)
  • Here's a quick query that shows a few hundred German articles without an English counterpart (it should be easy to change the languages in the query): https://w.wiki/P7d
Probably even more interesting is this, the list of articles that exist both in the German and Spanish Wikipedia, but not in English: https://w.wiki/P7f
This is just to start the investigation; obviously we should have a much deeper analysis, and I agree with that. --denny vrandečić (talk) 20:55, 26 April 2020 (UTC)
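A minimal sketch of what such a query looks like under the hood, run against the Wikidata Query Service from Python. The endpoint and the schema:about/schema:isPartOf predicates are the standard ones the query service uses for sitelinks; the limit and user agent are placeholders, and the full result set would be far too large to fetch this way.

    import requests

    # Items that have a German Wikipedia article but no English one.
    QUERY = """
    SELECT ?item ?deTitle WHERE {
      ?deArticle schema:about ?item ;
                 schema:isPartOf <https://de.wikipedia.org/> ;
                 schema:name ?deTitle .
      FILTER NOT EXISTS {
        ?enArticle schema:about ?item ;
                   schema:isPartOf <https://en.wikipedia.org/> .
      }
    }
    LIMIT 100
    """

    response = requests.get(
        "https://query.wikidata.org/sparql",
        params={"query": QUERY, "format": "json"},
        headers={"User-Agent": "signpost-example/0.1"},
    )
    for row in response.json()["results"]["bindings"]:
        print(row["deTitle"]["value"], row["item"]["value"])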
  • Just because the article is in German and there's no English equivalent doesn't mean that the article is about a German subject. I've encountered many articles in German about Australian athletes. Hawkeye7 (discuss) 01:23, 27 April 2020 (UTC)
  • I have translated a few articles from German and other languages to English. In some other cases, the article I wanted to "port" had insufficient references per enwp requirements. This is not necessarily a problem that more translators alone can surmount. Notability and BLP documentation standards differ between Wikipedia language editions, and enwp seems to have a relatively high bar. For example, I just created a stub for Paragraf Lex. It has zero references on the Serbian Wikipedia. It appears to be somewhat notable; at least, it is cited quite a few times by the English Wikipedia. ☆ Bri (talk) 21:00, 26 April 2020 (UTC)
  • Indeed, I'd expect us to have the highest bar as the largest Wikipedia (by some metrics). Lower bars encourage expansion of content and accumulating an editor base, which is good for small Wikipedias, whilst higher bars encourage improving the quality of already-existing content. Nonetheless, we do have systemic biases and no doubt there is a lot of useful translation that can be done, even if the aim is not to make an article for every subject on the German Wikipedia. — Bilorv (talk) 21:18, 26 April 2020 (UTC)
  • When we talk about articles existing in one language Wikipedia not being present in the English language Wikipedia, it doesn't always prove that there is a deficit needing to be addressed. For example, a few years back I was surprised to find the Bulgarian language Wikipedia has an article on every known consul of the Roman Empire -- who number about 1,400 between 30 BC & AD 235. (en.wikipedia has somewhat more than 1,000.) I was impressed by that, & took a close look at a few ... only to find they were the most basic of stubs, consisting of little more information than "X was a politician of ancient Rome. X was a consul in year A with Y as his colleague", & some fancy templates. (Google Translate works wonders in cases like this.) No sense translating stubs like these to the English Wikipedia; we create enough stubs on our own. -- llywrch (talk) 03:40, 27 April 2020 (UTC)
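    Those Bulgarian consul stubs are, in effect, hand-maintained copies of what the proposal would generate from data. A toy sketch of that idea, with the record, function names and phrasings all invented for illustration (the proposal's actual renderers would be far richer):

      # One language-independent record, several per-language renderers.
      record = {"name": "X", "year": "A", "colleague": "Y"}

      def render_en(r):
          # English boilerplate, mirroring the stub pattern quoted above.
          return (f"{r['name']} was a politician of ancient Rome. "
                  f"{r['name']} was a consul in year {r['year']} "
                  f"with {r['colleague']} as his colleague.")

      def render_bg(r):
          # The same facts rendered in Bulgarian.
          return (f"{r['name']} е политик от Древен Рим. "
                  f"{r['name']} е консул през година {r['year']} "
                  f"заедно с {r['colleague']}.")

      print(render_en(record))
      print(render_bg(record))

    The point is that the facts would live once, in a database, and each language would maintain only its sentence patterns; humans could then extend any article beyond the boilerplate.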
  • "Almost half" means there are more than one million articles in the German language-version of Wikipedia that are not in the English version. So, regarding "We could set a target that within the next 2 years, necessary steps would be taken to cover let's say 75% of the German articles in English instead of the present 50%", there is no feasible path - today - to translate 500,000 articles in two years. There probably isn't a feasible path to translate even 5,000 articles, if by "feasible" we mean "finding volunteers who speak both languages fluently, and aren't busy doing other things". If we're going to get massive amounts of content from one language version into other language versions, the only way to do that is with computer-based processes, lightly reviewed by humans. Or by a donation of several billion dollars from a very well-endowed foundation or philanthropist. -- John Broughton (♫♫) 23:18, 26 April 2020 (UTC)Reply
    What we really need is an automated translation tool that translates everything but the text. By which I mean the templates, links, tables, categories etc. This would greatly reduce the effort required. Hawkeye7 (discuss) 01:23, 27 April 2020 (UTC)
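    A rough sketch of what such a tool could do for links, using the mwparserfromhell library to walk the wikitext and the standard Wikidata wbgetentities API to map each German title to its English sitelink. The library and the API call are real; the tool itself is hypothetical, and templates, tables and categories would need analogous handling:

      import requests
      import mwparserfromhell

      API = "https://www.wikidata.org/w/api.php"

      def english_title(de_title):
          # Look up the English sitelink for a German article via Wikidata.
          data = requests.get(API, params={
              "action": "wbgetentities", "sites": "dewiki", "titles": de_title,
              "props": "sitelinks", "format": "json",
          }).json()
          for entity in data.get("entities", {}).values():
              return entity.get("sitelinks", {}).get("enwiki", {}).get("title")
          return None

      def port_links(de_wikitext):
          # Rewrite [[...]] targets to English titles where a sitelink
          # exists, leaving the prose itself untouched.
          code = mwparserfromhell.parse(de_wikitext)
          for link in code.filter_wikilinks():
              en = english_title(str(link.title))
              if en:
                  link.title = en
          return str(code)

      print(port_links("[[Berlin]] liegt an der [[Spree]]."))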
    But what we normally don't need is machine-translated content ... have a look at pages for translation; many articles which have been machine-translated are listed there, waiting to be evaluated/translated/copyedited, some of them for years. What we regulars over there have learned: it's much more efficient and less time-consuming to write articles from scratch, using the foreign-language sources used in the other-language article. Of course, only if no English sources can be found. Lectonar (talk) 10:22, 27 April 2020 (UTC)
    The Content Translation tool can offer that really well! @Amire80: for cc --denny vrandečić (talk) 16:07, 27 April 2020 (UTC)
    Thanks, denny vrandečić :)
    Hawkeye7, yes! Computers should translate what they can translate reliably: code, data, etc., and humans should translate prose. I'd go even further and say that ideally humans really should not translate things that can be reliably translated by computers. A good multilingual system is supposed to strive for this: automate everything that can be reliably automated. I'm also fine with Denny's proposal in principle, because to the best of my understanding, what it suggests is auto-generating boilerplate prose from data reliably, while allowing people to write their own prose.
    As Denny says, Content Translation kind of does it, although not perfectly. It's pretty good at transferring links, images, and tables. Links are mostly a breeze, if the article exists in the language into which you are translating and the two are connected using a Wikidata sitelink. Images are a breeze if they are on Commons. It's not perfect with complex tables because they are, well, complex, especially those that have a lot of columns and are too wide to fit in a narrow column, but yeah, it kind of works. (The real solution for complex tables is to store what they show in a proper database, and then get the data to display in articles using queries that auto-adapt to different media. It would be a difficult project that would require a lot of infrastructure, but it's worth thinking about. But I digress.)
    Categories are a breeze, as long as directly corresponding categories have been created in the language into which you are translating. What often happens in practice is that the English Wikipedia's category tree is more complex, because it has more articles and more need for deeper categories, so categories have to be added manually after the article is created.
    Another thing you didn't mention is language-independent content, most notably math formulas.
    And this brings us to templates, which are the biggest pain. Translating templates works nicely in Content Translation if the corresponding template exists in the language into which you are translating, and all of its parameters have been correctly mapped using TemplateData. Templates and modules are currently stored on each wiki separately, so this has to be done for every template in every wiki, and in practice this doesn't scale. It must be possible to make templates shareable, or global. I wrote a proposal for this: mw:Global templates/Proposed specification, short version. Your opinion about this is very welcome.
    Lectonar, you are generally right, but here's a more nuanced take on machine translation. If the machine translation of a text from another language is posted as-is as a Wikipedia article, it is worse than useless. This absolutely must not be done, ever. If, however, machine translation is used by a responsible human who corrects all the mistakes it made and makes sure that the text reads naturally, conveys accurate information, and is well-adapted to the reader in the target language, and this text is then posted as a Wikipedia article, then it's indistinguishable from translation. If machine translation helped this human translator work more quickly, then it was beneficial.
    Some people who translate texts find machine translation totally useless and prefer to translate everything from scratch. This is totally fine, too, but it's not absolute: There are also people who find that machine translation makes them more efficient. As long as they use it responsibly and post text only after verifying every word, this is okay. --Amir E. Aharoni (talk) 07:38, 28 April 2020 (UTC)
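    To make the template pain point concrete: the parameter mapping Content Translation relies on comes from TemplateData, which any script can read through the standard action=templatedata API. Below is a small sketch that fetches the declared parameter names of a template on two wikis; the two template names are only illustrative, and either may simply have no TemplateData at all, which is exactly the gap described above.

      import requests

      def template_params(host, template):
          # Parameter names declared in a template's TemplateData, if any.
          data = requests.get(f"https://{host}/w/api.php", params={
              "action": "templatedata", "titles": template, "format": "json",
          }).json()
          for page in data.get("pages", {}).values():
              return set(page.get("params", {}))
          return set()

      en = template_params("en.wikipedia.org", "Template:Infobox person")
      de = template_params("de.wikipedia.org", "Vorlage:Infobox Person")
      print("English-only parameter names:", sorted(en - de))

    Since German parameters carry German names, a raw comparison like this mostly demonstrates why an explicit cross-wiki mapping, and ultimately a global template repository, is needed.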
  • Responding more directly to the article, thanks again to denny vrandečić for mentioning global templates. The two ideas are indeed related, although I'd say that we should make it possible to have globally shareable modules and templates first, and then proceed to complete something like Wikilambda. Here's my rationale: The syntax for writing modules and templates is familiar to many editors. The global templates proposal specifically says that it doesn't suggest changing anything about the programming languages in which they are written: wiki syntax, parameters, parser functions, Lua, etc. It only suggests a change in where they are stored: transitioning from the current state, in which modules and templates can only be stored and used on a single wiki, to having the option of storing some of them, those that are useful to multiple wikis, on a shared repository (while preserving the option of having local modules and templates). This is similar to having the option to store images on Commons. I've read the whole Wikilambda paper, and my impression is that while it's probably not finalized, it's already clear enough that the proposed Wikilambda "renderer" language is a new thing that will be significantly different from wiki syntax and Lua. This is legitimate, but it makes a lot more sense to start a global repository with something familiar. In addition, developing a global repository will probably require updating some things deep in the core MediaWiki platform so that the performance of change propagation across sites is better, and this will also improve the performance of some existing things, most importantly Commons and Wikidata. Once this core improvement is done for familiar things like images (Commons), structured data (Wikidata), and wiki syntax (modules and templates), it will be easy to build Wikilambda upon it. --Amir E. Aharoni (talk) 08:12, 28 April 2020 (UTC)
    This makes a lot of sense to me, but does not strike me as a bottleneck. It will take some time to set up the basic site + processes for Wλ, and we can pursue a "global sharing" framework at the same time, which will be useful for Wλ once it gets underway. – SJ + 23:22, 5 May 2020 (UTC)
  • If I undertake to write an article for English Wikipedia, I generally have to find English-language sources. German-language sources aren't much help to my English-speaking readers.
In effect, notability is defined per-language. For any particular article in German Wikipedia, the topic may not be suitable for English Wikipedia, if there are not enough appropriate English-language sources. Bruce leverett (talk) 18:57, 1 May 2020 (UTC)
This is not an issue. According to WP:NOENG, English sources are preferred in the English Wikipedia, but non-English sources are allowed, and this is sensible. As for notability, my understanding is that this Multilingual Wikipedia / Abstract Wikipedia / Wikilambda proposal doesn't intend to force any article to appear in any language, but only to give an easier way to auto-generate basic articles in languages that are interested in them. --Amir E. Aharoni (talk) 12:16, 2 May 2020 (UTC)
Non-English sources are "allowed", but, to repeat what I said above, "German-language sources aren't much help to my English-speaking readers". Yes, I expect people to read the footnotes, and click on them. I understand that for some articles, including some that I have worked on, this isn't an issue. But the implication is that creating a non-stub English-language version of a foreign-language article is more than just running Google Translate and fixing up the results -- much more. I'm not scoffing; in many cases (chess biographies, for example) I have yearned to be able to transplant the knowledge from a foreign-language article to English. But it's only a little less work than writing a new article from scratch. Bruce leverett (talk) 18:42, 2 May 2020 (UTC)
@Bruce leverett: Agreed. There's also nothing that would stop us from using a cite mechanism in the Abstract Wikipedia that prefers sources in the language of the Wiki when available, and only falls back to sources in other languages if none is given. I guess it is still better to have a source in a foreign language than have no source at all, but I totally understand and agree with the idea that sources in the local language should be preferred on display. --denny vrandečić (talk) 20:48, 11 May 2020 (UTC)
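A sketch of that fallback rule, with invented data structures; nothing like this exists in the proposal yet, it just pins down the selection order being described:

    # Hypothetical: each statement carries a pool of sources tagged by language.
    sources = [
        {"lang": "de", "title": "Beispiel-Quelle"},
        {"lang": "en", "title": "Example source"},
    ]

    def pick_source(sources, wiki_lang):
        # Prefer a source in the wiki's own language; otherwise fall back
        # to whatever is available rather than showing no source at all.
        for s in sources:
            if s["lang"] == wiki_lang:
                return s
        return sources[0] if sources else None

    print(pick_source(sources, "en"))  # picks the English source
    print(pick_source(sources, "fr"))  # falls back to the German one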
"I expect people to read the footnotes, and click on them": you're going to be very disappointed. Not only do they not read the footnotes, sometimes they post questions on the talk page admitting that they didn't read the article. Hawkeye7 (discuss) 23:26, 5 July 2020 (UTC)
  • I have read both the Signpost article and the separate article. In theory, "Wikilambda" sounds like a good idea. In practice, however, I think it would be too complex to implement. I have been interested in, and have had some ability in, computer programming since I was first exposed to computers in the late 1970s. I have been a Wikipedia editor for more than a decade, and have translated hundreds of Wikipedia articles from another language (mostly either German or Italian) into English (usually with the assistance of Google Translate). But I really doubt whether I would have the computing skills to contribute anything to "Wikilambda"; I found it difficult enough to draft the code necessary to transclude Wikidata information into Wikipedia infoboxes, to such an extent that I had to procure another Wikipedia editor to help me. "Wikilambda" seems so hard to grasp that I don't think I would even try to get involved in it. Maybe if the proposer could convince enough computer geeks who speak more than one language fluently to become contributors to "Wikilambda", then it might be able to get off the ground. But I have my doubts. Bahnfrend (talk) 13:07, 2 May 2020 (UTC)
@Bahnfrend: But isn't that true for Wikipedia in general? We have people with different skill sets working together: bots written in Python, templates written with many curly braces, modules in Lua, tables, images, categories, and beautiful natural-language text.
The important part is that the actual content can be contributed by many people, because that is where we must make sure that the barrier is low. This is what the project really needs to get right, and it devotes quite a few resources to this challenge.
For Wikilambda itself, yes, that's a very different kind of beast, and it will have a different kind of community with a different set of contributors. But they don't have to be the same contributors who contribute the content of the Abstract Wikipedias. Again, as in Wikipedia, we will have volunteers with different skill sets working together and achieving more than they could alone. --denny vrandečić (talk) 20:53, 11 May 2020 (UTC)
  • Always a promising idea, and seems very doable now that related substructure is available. Let us! I suggest EO as an early target wiki, for all of the reasons. – SJ + 23:22, 5 May 2020 (UTC)
Thank you! --denny vrandečić (talk) 20:49, 11 May 2020 (UTC)
Answered there, thanks! --denny vrandečić (talk) 20:49, 11 May 2020 (UTC)