Wikipedia:Wikipedia Signpost/Single/2023-12-24

Single-page Edition

WP:POST/1

24 December 2023

Special report
Did the Chinese Communist Party send astroturfers to sabotage a hacktivist's Wikipedia article?

News and notes
The Italian Public Domain wars continue, Wikimedia RU set to dissolve, and a recap of WLM 2023

In the media
Consider the humble fork

Discussion report
Arabic Wikipedia blackout; Wikimedians discuss SpongeBob, copyrights, and AI

In focus
Liquidation of Wikimedia RU

Technology report
Dark mode is coming

Recent research
"LLMs Know More, Hallucinate Less" with Wikidata

Gallery
A feast of holidays and carols

Comix
Lollus lmaois 200C tincture

Crossword
when the crossword is sus

Traffic report
What's the big deal? I'm an animal!

From the editor
A piccy iz worth OVAR 9000!!!11oneone! wordz ^_^

Apocrypha
Local editor discovered 1,380 lost subheadings in ancient Signpost scrolls. And what he found was shocking.

Humour
Guess the joke contest

BJAODN
Bad jokes and other deleted nonsense

2023-12-24

Did the Chinese Communist Party send astroturfers to sabotage a hacktivist's Wikipedia article?

By JPxG

The first thing to note

No.

There are a lot of claims made here, and a lot of facts to address. The accusation goes beyond "big if true" — it would be gargantuan if true — and, as such, it warrants being examined in full detail, rather than rejected out of hand. It should not be dismissed as pure fancy. After all, astroturfing is a real and ubiquitous phenomenon, and astroturfing on the English Wikipedia is attempted on a daily basis by any number of organizations and entities.

I think, however, that when the merits of this situation are examined in full detail, you will agree with me that the insinuation of the Chinese Communist Party being somehow involved with this Wikipedia article is bullshit.

To be more specific:

The article revision linked to with the text "Cyber Anakin's Wikipedia article" is from late November 2023. As best I can tell, the version they are actually referring to (the much-longer one, referred to by the Taiwan News article) is this, from September 2022. It is indeed true that the current revision is much shorter. It's also true that many articles on Wikipedia are made shorter (or longer) on a regular basis. But unlike most websites, the page history on Wikipedia articles is an open record for anyone to inspect. So we can inspect this and try to figure out the real story. First, though, allow me to take a minute to tell you a different story.

As a Wikipedia administrator with over a hundred thousand edits, and over a hundred article creations, I have gotten into all sorts of disagreements with people. But perhaps the greatest frustration in my editing career was this deletion discussion. In it, an extremely useful and detailed 122,174-character-long tabular list of technical specifications for Xilinx field-programmable gate arrays was unceremoniously redirected to a couple paragraphs in the larger article about Xilinx. Look how they massacred my boy. I didn't write the article, but I'd used it, and I gave it my all at the debate. It was me, a consummate Wikipedia nerd (and a handful of outraged hardware engineers) against an opposing contingent of equally consummate Wikipedia nerds. We lost. It made me angry, but the decision was compliant with policy, and part of working on a collaborative project is that sometimes stuff happens that makes you angry. Eventually you have to get over it, which I did.

Note that this single deletion discussion was about 3,200 words long, and it wasn't anywhere near the longest. I've written some software that keeps indices of the largest deletion debates of all time; to give you an idea of the Wikipedian capacity for argumentation, Wikipedia:Articles for deletion/List of bow tie wearers (4th nomination) is 22,271 words long. Talk:Cyber Anakin and its archive page (which contain the entirety of the talk page arguments mentioned in the Quillette piece) come out to about 17,708 words.

Sure, this is a lot. It's a whopping 0.79 bow-tie-wearers-fourth-nominations' worth.

Anyway, one of my adversaries during the Xilinx FPGA deletion debate was Drmies. He is, incidentally, one of the editors who removed content from the Cyber Anakin article in November 2022, and therefore, I suppose, one of the "employees or sympathizers of Xi Jinping's regime". Is this plausible?

Well, let's see: Drmies — who really is a Dr. — is an administrator, for which he had to submit to a grueling public seven-day job interview that was being voted on the entire time (he passed with 205 in favor, 2 opposed and 3 neutral). In the twelve years since then he's also been given the checkuser and oversight usergroups. Being an oversighter involves access to information so sensitive even administrators can't see it (e.g. they are the people who remove shock videos of executions and naked children), and for which you are required to sign a non-disclosure agreement with the Wikimedia Foundation. Since 2007 he's made 378,748 edits.

Just as a thought exercise, try to imagine you are a college professor, and you're approached by a foreign spymaster, who offers you a mission: to edit an encyclopedia in your free time, carrying out research, writing articles, fighting vandalism, making hundreds of thousands of individual edits, debating the finer points of policy, writing dozens of paragraphs arguing with engineers about field-programmable gate arrays, and signing legally binding documents to achieve a position of authority and prestige on said encyclopedia — to do all of this for sixteen years — as a ploy so that one day you can remove a couple paragraphs from an article about a hacktivist. How much money do you think you'd ask for? How many millions of dollars do you think Xi Jinping's budget is for each individual English Wikipedia article? My guess is not enough for this to be a viable strategy.

Wikipedia article histories are public records

Anyway, you can look at the Xtools statistics for the article and see for yourself what the deal is on everyone else. Sideswipe9th, a heavy editor of the article, has often removed material from it, and has also made 9,996 edits over the last few years. They edit a lot of political stuff but they've never been blocked, something which seems fairly difficult to do if you're a saboteur. Jayen466, one of the editors who's argued for inclusion of material in the article, is not only a highly experienced user, but a former editor-in-chief and regular contributor to the Signpost; one would hope he'd be capable of noticing and saying something if he found himself surrounded by psyops agents.

Of course, it's impossible to know who people are in real life without instituting rather intrusive measures that destroy anonymity — something which would bode quite poorly for our editors and readers who live in, say, tin-pot dictatorships where all remotely political Internet activity is monitored and official arbiters of truth given central registries by which to control speech.

Ultimately, it's impossible to say for sure that none of the well-established editors arguing over that article were on the PLA payroll. Or that I'm not, for that matter. The same is true, of course, of the milkman, the firefighter, or the thinkpiece writer — can any empirical knowledge truly be known? — well, no. But some things are just not very likely to be the case, and it's just not very likely that thousands of volunteers who can't even agree on the notability of field-programmable gate array datasheets would be able to carry out a coordinated decades-long operation (after all, unmasking your interlocutor as an international psyops agent is a great way to win the argument).

You may wonder why I am getting so bent out of shape about the accusation of paid editing. Surely this stuff happens all the time. Yes and no — we're volunteers, and it takes a lot of time to track down people who are up to no good, and there sure are a lot of them. But we have a small army of volunteers who sniff out sockpuppet farms and astroturf operations. They are pretty good at it, and something like this would be a gigantic ordeal. The Signpost has reported on dozens of cases of influence operations on Wikipedia getting busted.

Oh, speaking of astroturfing operations getting busted

While we're on the topic of unmasking strange behavior, a couple other things may be worth mentioning about the history of the article. Chief among them, one editor — Bugmenot123123123 — created both Cyber Anakin and the since-deleted page 2016 KM.RU and Nival Networks data breaches. Bugmenot123123123 doggedly (and unsuccessfully) defended both pages at their respective deletion nominations in late 2016 (here and here) — and was eventually blocked for disruptive editing. Like I said before, due to our principles of anonymity, it's difficult to know exactly who someone is in real life. The jury is still out on who this is, but they've been remarkably consistent in their agenda over the better part of a decade.

Throughout their tenure, they were persistent in advocating for Cyber Anakin to have an article, for the article to be retained, and for the article to be expanded. In fact, they were so dedicated to championing the cause of Cyber Anakin that, even after their block, they operated several sockpuppet accounts — including Mdikici4001, Mamasanju, and Wizzakk — all of whom were fixated on recreating the article. All of whom, I should note, were rather easily detected and their efforts stymied: it is pretty obvious when five brand-new accounts suddenly try to create articles about the same random hacktivist over and over. This is not the first time someone has tried to do this, and we're not idiots.

It is true that, despite all the sockpuppetry and abuse, Cyber Anakin has a page now. We're not idiots, but we're not Inspector Javert either, and we don't punish people simply because they have aggressive fanboys (or, for that matter, if they are the aggressive fanboys). The article went through a second deletion nomination last November, where it was concluded that there were enough independent, third-party sources to be able to write a neutral, accurate article. I mean, who knows — someone could nominate it again and maybe it would get deleted. Maybe it should. Maybe it will. Or maybe not. Part of working on a collaborative project is that sometimes stuff happens, and then other stuff happens.

But back to Bugmenot123123123.

There are many sockfarms — and I mean hundreds of farms and thousands of socks — with investigation casepages. But there are comparatively fewer long-term abuse pages; these are a distinction reserved for people whose abuse of the project is so persistent and relentless that it's necessary to keep tabs on their modus operandi (like this guy, whose LTA page's "see also" section includes a link to the "California" section of our cyberstalking legislation article — draw your own conclusions from that).

Bugmenot123123123 has been such a giant pain in the ass, for so many years, and in so many diverse ways, that they have a long-term abuse page of their very own:

BMN123 canvasses extensively onwiki, offwiki, and crosswiki, changing proxies frequently and using external rather than normal links (to avoid backlinking) in an attempt to conceal its extent, often seeding in random editors among those targeted as likely supporters.

Edits to Cyber Anakin focus on maintaining the sockmaster's preferred narrative. They will edit-war (Special:Diff/1113301828, Special:Diff/1112087106) and leave long rambling talk page posts (Special:Diff/1113134519, Special:Diff/1112288532) in an effort to restore their preferred version when it is disturbed, occasionally leaving warning templates on the talk pages of people who revert them (Special:Diff/1112090413).

Edits on other Anonymous area pages attempt to spam mentions of Cyber Anakin or incidents associated with Cyber Anakin everywhere (Special:Diff/1082332024, Special:Diff/757938463), subsequently edit-warring to retain them (Special:Diff/1117016182, Special:Diff/757940419). In an unblock request they outright stated their intent to spam Cyber Anakin across many pages (Special:Permalink/1117032979#Request_to_downgrade_block_to_partial_block_of_some_pages).

[...]

It is possible that more than one editor is responsible. They've directly claimed to have hired others to edit [1] and have posted on and off-wiki attempting to form a dirty tricks cabal (Special:Diff/1113171407). Regardless, policy is clear that when there is uncertainty whether a party is one user with sockpuppets or several users with similar editing habits, they may be treated as one user with sockpuppets, so the detail is not particularly important.

They've also shown an ability to trick journalists from some marginally reliable sources into defaming Wikipedia editors, and have subsequently posted the ultimately self-sourced statements in mainspace as part of their harassment campaign.

Man, that sure would be embarrassing.

The Italian Public Domain wars continue, Wikimedia RU set to dissolve, and a recap of WLM 2023

By Oltrepier, Andreas Kolbe, HaeB and Bri

The Birth of Venus (c. 1484–1486), by Sandro Botticelli, one of the several works of art involved in recent disputes over cultural heritage in the public domain in Italy.

Court of Audit criticizes Italy’s plan to put public domain behind “pay-wall”

The Italian Court of Audit has publicly opposed a recent decision by the Ministry of Culture, led by Gennaro Sangiuliano, to establish minimum fees for the production and publication of digital reproductions of cultural heritage, as recently reported by Wikimedia Italia and by several national media outlets (in Italian; the latter two links are paywalled).

As Italian lawyers Deborah De Angelis and Giuditta Giardini wrote for Communia last July, Italy's so-called Cultural Heritage and Landscape Code (CCHL) has been in force since 2004. It was intended to "support the role of cultural heritage institutions in sustainable economic and social development", granting them, among other privileges, discretion to choose whether to make artworks such as paintings, frescoes and statues available in the public domain, through the attribution of a Creative Commons licence or, at least, through the digital reproduction of images.

However, in recent years some state-owned institutions have taken advantage of this interpretation of the CCHL to file lawsuits against commercial uses of works by Italian artists which, in theory, should already be in the public domain – for example, Michelangelo’s David, Leonardo da Vinci’s Vitruvian Man and Sandro Botticelli’s The Birth of Venus. As explained by De Angelis and Giardini, these initiatives likely conflict with Article 14 of the Directive on Copyright in the Digital Single Market, adopted by the European Union in 2019 and transposed into Italian law two years later.

In April of this year, the Italian Ministry of Culture caused even more headaches by publishing "guidelines" for the introduction of minimum fees for the commercial use of digital reproductions of state-owned cultural heritage, including works in the public domain (see previous coverage on Diff and the Signpost). The decree, which was harshly criticized by numerous experts and researchers, contradicts the principles expressed both in the CCHL itself (specifically, Articles 1 and 6) and in the Faro Convention, which Italy signed and ratified; both stress the importance of full freedom of access to, and sharing of, reproductions of cultural heritage in the public domain. If officially implemented, the measures included in the MoC’s decree might not only impoverish Wikimedia’s projects, but also harm research on and promotion of Italian culture.

Now the Italian Court of Audit has also expressed concern about the ministry's decree in a report titled The results of monitoring activities done in the year 2022 and the consequential measures adopted by administrations. In its “Review of consequential measures adopted by administrations” (starting on page 157 of the report), the court credits the MoC’s offices for their “important effort in digitization”, in line with the goals set by the Digital Library and the National Recovery and Resilience Plan [it], while noting that the introduction of the aforementioned minimum fees appears to run “against [this] trend”, especially with regard to the benefits of open access:

For some time now, Open Access has proven to be a powerful multiplier of wealth not only for the cultural institutions themselves [...], but also in terms of increasing the GDP, and is therefore considered a strategic asset for the social, cultural and economic development of the [European] Union’s member countries. [...] The introduction of such a "fee schedule" seems, moreover, to take into account neither the operative peculiarities of the web, nor the potential damage to the community, which should be measured in terms of [...] lost opportunities, as well; therefore, [the decree] also stands in obvious contrast to the clear indications coming from the National Digitization Plan (PND) of cultural heritage.

What’s more, Avvenire and Il Sole 24 Ore (see the links cited at the top of this story) reported that the Court of Audit had already endorsed the free circulation of digital reproductions of cultural heritage in the public domain in an October 2022 document, which included the following quote:

The radical transformations digital [devices and services] have produced in our society encourage [...] the abandonment of traditional "proprietary" paradigms, in favor of a more democratic, inclusive and horizontal vision of cultural heritage. Forms of economic return based on the "sale" of the single image appear anachronistic and largely outdated since, moreover, they are patently uneconomic. There is evidence that, in some cases, the ratio of costs incurred in managing the collection service to the actual revenue generated produces a negative balance.

The question is: will the MoC get the memo this time around? – O

Wikimedia Russia to be dissolved

Stas Kozlovsky

On December 19, 2023, Stas Kozlovsky, the Executive Director of Wikimedia Russia, posted a message on a community page of the Russian Wikipedia. He said that after almost 25 years as an associate professor at Moscow State University, he had recently been summoned by the vice-rector and told there was "reliable information" that Russian authorities were about to declare him a "foreign agent". He said he was given a choice between being dismissed "for absenteeism" and resigning on his own, and that he eventually "chose the latter".

Kozlovsky then called an emergency meeting of Wikimedia Russia, where he shared the news and the members decided to close the organization; the liquidation process is expected to take several months.

Stas had taken over as head of Wikimedia Russia earlier this year, after the previous director of the organization, Vladimir V. Medeyko, was indefinitely banned for establishing a government-approved fork of the Russian Wikipedia (see previous Signpost coverage). See the In focus column of this issue for more details on Wikimedia Russia's shutdown, as well as reports from The Moscow Times and Radio Free Europe. – AK, O

Wiki Loves Monuments 2023: a recap

The photo of Florence Cathedral that won one of the two Italian contests for Wiki Loves Monuments 2023 (credits: FrancescoSchiraldi85).

Following the end of the national contests in September and October of this year, the 2023 edition of Wiki Loves Monuments has entered its final phase, with the international winners yet to be publicly announced. Historically, the annual photographic competition organized worldwide by the Wikipedia community has involved dozens of countries across the globe and gifted Wikimedia projects with hundreds of thousands of photos each year. 2023 was no exception, as users from at least 46 different nations uploaded more than 217,000 images.^[1] Of these, 2,343 photos became quality images, 46 were assessed as featured pictures, and two received the valued image treatment.

Five countries made their debut in this year’s competition: Egypt, Togo, Uzbekistan, Zambia and the Dutch special municipality of Sint Eustatius. Meanwhile, four nations (Belgium, Georgia, Greece and the United States) returned after hiatuses of varying length.

Looking at the statistics,^[2] Italy recorded by far the highest number of uploaded images, with 52,004 contributions. To put that in context, it is almost twice as many as second-placed Russia (28,761) and more than two and a half times as many as third-placed Ukraine (19,641). It is also a huge jump from the Bel Paese's own previous performances. Brazil was the highest-ranked non-European country, in fifth place with 13,202 contributions, just ahead of the United Kingdom (12,851). Elsewhere, India led Asia in 10th place (5,754 images), while Nigeria was the top African country in 17th place (2,800), slightly outperforming the US (2,513).

Perhaps unsurprisingly, Italy also topped the chart for the total number of uploaders (946, 565 of whom signed up to Commons during the competition), followed at some distance by Russia (557, 406 of whom registered) and Iran (459, 391 of whom registered). More surprisingly, Uganda boasts the highest percentage of uploaded images that went on to be used in the wikis (about 85%). The other two countries on that podium, Egypt (61%) and Malta (44%), are not even close.

Most of the countries involved in Wiki Loves Monuments 2023 have already elected their national winners and/or selected their ten best submissions for the international stage: you can see a comprehensive gallery here.^[3] Just like last year, Italy's committee has once again stood out for their decision to “kill two birds with one stone”, by hosting a traditional contest alongside one that was centered around a specific category of monuments, which in this case turned out to be religious buildings; you can see the winners and finalists of the Italian contest in detail here or here. Now, all we have to do is wait for the announcement of the international winners: let’s see which pictures will make our jaws drop this year! – O

^ Including 97 from Sint Eustatius and 10 with no specific country tag; a considerable number of images were likely submitted after the deadlines.
^ The data for each participating country was updated throughout the whole length of their respective national contests, and halted once they hit their pre-set deadlines.
^ The organizers of the Russian contest have decided not to submit any photos for the final round, as a sign of respect for the people affected by the ongoing Russian invasion of Ukraine.

Brief notes

The 12th and final admin T-shirt of 2023 was awarded to Clovermoss.

Annual reports: Whose Knowledge?, Wikimedia Community User Group Malta, Wiki Movement Brazil User Group.
Your Wikipedia year in review: A new online tool by User:Jdlrobson allows editors to "look back at all the good work you have been doing this year in helping build the best place on the Internet!", based on their edit history and thanks/thanked log. (A somewhat related feature, focusing on contributions from the last 60 days instead of the last year, was recently integrated into Wikipedia's user interface itself at Special:Impact, see last month's coverage.)
New administrator: The Signpost welcomes the English Wikipedia's newest administrator, Clovermoss. Her RfA passed on 20 December with 218 in support, 5 opposed, 4 neutral.
Articles for Improvement: This week's Article for Improvement is Online encyclopedia. Please be bold in helping improve this article! Up next, starting from December 25, will be Carbon source (biology).

Consider the humble fork

By Bri, Oltrepier, Red-tailed hawk and Smallbones

Forks are everywhere. If you've got a barn or a stable, there should be a fork inside it to clean out the muck. There are forks in the road, on the internet, on the chess board, on antelopes, in rivers, in beards and tongues, in cryptocurrencies, and almost everybody has forks in their drawers. Maybe we should use chopsticks instead. – S

Have you gotten $2.75 worth of info from Wikipedia? Consider donating

Have you ever unexpectedly run into a (pay)wall?

The Ledger's headline (paywalled) gives the main news: the Florida newspaper is asking its readers to donate to support Wikipedia. The bad news is that The Ledger needs to charge its readers to pay its bills. Otherwise, their readers will get cut off by the paywall. The good news is that they will give you "unlimited digital access (costing) $1 for the first 6 months". Everybody, it seems, needs a little green to support their publishing. The better news is that Wikipedia is still free for all readers and has no plans to change that. This reporter has no objections to you donating $2.75 or $25.00 or whatever amount you would prefer. It is not that the Wikimedia Foundation needs your cash now to avoid shutting down this website next week, next month, or even next year. Rather, it is simply good planning for a non-profit organization to build a solid base of small donors who can ensure that this site will be around for a long time to come. The best news is that The Signpost will always be free – just as it has been for almost 19 years – so long as Wikipedia keeps publishing. And to return the plug, Signpost readers should feel free to consider paying a dollar for six months of The Ledger. – S

When you come to a fork in the road, take it.

When you come to a fork in the highway ...

Just another fork

In his ever-informative column in Slate, Stephen Harrison explains in detail why editors from WikiProject Highways created a new website forking Wikipedia's road articles. (We note that The Signpost scooped him on this story.)

In his usual style, Harrison breaks the story into an intriguing introduction and several tines accompanied by quotes from participants and analysis of Wikipedia's policies and guidelines. In this particular case, he grabs you in the intro with "Wikipedia, road infrastructure, and drama—one of these things doesn’t sound like the other" and a mention of a video that "spills the tea." He then focuses on an editor, identified only as Ben (or bmacs001), and the tines include the difference between editors who are roadgeeks and railfans, with a brief note on possible cultural differences between American and European railfans.

The Wiki-rules discussed include notability, reliable sources, pseudoscience, and no original research.

Of course, no newspaper story is ever perfect: Harrison might have emphasized the fact that the fork has enjoyed a fairly successful start, or that there are no rules against forking Wikipedia (as long as you give proper attribution). Or that there are no prohibitions on users editing both Wikipedia and the fork, and few on importing text from the fork into Wikipedia itself. And he certainly should have mentioned that the word "fork" is likely an inherently funny word. – S

Forked again?

For more detail regarding the claims in this article, see this issue's special report.

In an article for the Australian online magazine Quillette, Shuichi Tezuka raises some pointed objections to the way the Wikipedia community handles disputes over coverage of contentious material. For example, he expresses concern about "cognitive distortions" that are perpetuated "by reducing the population of people who raise [objections]... as these users have either quit Wikipedia or been permanently blocked from editing". Tezuka mentions the famous "somewhat-viral tweet" of last October and related concerns about WMF spending (see previous Signpost coverage). He concludes that the newly formed fork Justapedia (which recently sparked a discussion on the administrators' noticeboard) is necessary to solve these problems, stating: "the need for such a competitor [to Wikipedia] is stronger now than it has been in past years, due to several recent controversies revolving around the manipulation and/or politicization of Wikipedia, along with a widespread perception that Wikipedia has not done enough to prevent this type of problem." The founder of Justapedia, user Atsme, wrote an op-ed expressing some of the same concerns for the Signpost back in 2020. – B

In brief

A U.S. Congressional hearing on the 2014 Ebola outbreak. Staff have briefed congresspeople from Wikipedia articles on this and other topics.

Congressional staffers rely on Wikipedia...: The good news, according to a former staffer writing in Scientific American blogs, with amplification and analysis on The Hill, is that the U.S. Congress leans on Wikipedia to get smart fast on science crises, "from the 2011 Fukushima nuclear power plant disaster, to the 2014 Ebola outbreak, or the 2016 flooding of social media platforms by disinformation". The bad news is, they can't go deeper because they have gutted their own research arms and don't have staff with the requisite knowledge.
...and Wikipedia relies on staff whose pay scales may lag the sector: Business Insider reports that "the salaries of Wikimedia executives are sparking an online debate about tech sector wages". The fire was started by a screenshot of the WMF's 2021 IRS filing shared on X/Twitter on December 12, which showed the executive salaries of various high-profile figures within the foundation, including former CEO Katherine Maher and former COO Janeen Uzzell.
Why is this night ... er, year, different from all other years?: Cricket, obviously: as Deseret News wrote, "no other cricket-centric topics have appeared on the site's [that's us!] year-end most popular articles lists since they began doing the calculations in 2015." The year-end roundup of pageviews was also covered by the Associated Press, Fox Business and CNN, among others.
Hey, look ma, I made it... on Wikipedia!: Back in October, Valeria Costa reported in Domani (in Italian) that French physicist Pierre Agostini only gained a Wikipedia article (user Uhooep created it first on en.wiki) after being jointly awarded the 2023 Nobel Prize in Physics, along with Anne L'Huillier and Ferenc Krausz. As Costa noted, Agostini was not the first Nobel laureate to receive the "WP:GNG treatment" only upon winning the prize: Donna Strickland had been granted an article back in 2018 (mistakenly reported as 2020).
The disrespect is real: In her review of the Hulu-exclusive TV series The Great, as part of the "Best Shows That Ended in 2023", The New York Times journalist and television critic Margaret Lyons humorously said that, "So many period dramas just feel like inert, expensive Wikipedia entries, but The Great, through its irreverence and artistry, was alive at every turn."
No, Internet doesn't count...: In a November interview for El País (in Spanish), Spanish poet, musician and film director Antón Reixa [es] revealed that, back in 2011, he had shown his own Wikipedia article to the civil registry officer in an attempt to prove his identity and support his request to change his full name; however, the clerk shrugged it off, stating that "Internet [didn't] count".
100 Books (and one Wiki): Back in November, Italian journalist and literary critic Marino Sinibaldi shed light on the Wikipedia article for the Bokklubben World Library, published by the Norwegian Book Club [no] in 2002, during the 100th episode of Il Post-exclusive podcast Timbuctu, which focused on famous lists of the most memorable books. Sinibaldi also cited Le Monde's 100 Books of the Century list and the BBC's The Big Read, among other examples.

Wikipedia accessibility guidelines expressed as a checklist of "dos" and "don'ts"

Wishing for a more accessible Internet: On 3 December, the International Day of Persons with Disabilities, Nicolas Six reported in Le Monde (in French, partially paywalled) that more than one million French citizens with varying degrees of visual impairment still struggle to browse the Internet. One of the interviewees, a 75-year-old woman, said that "despite taking weekly IT lessons, [she] can't even go on Wikipedia". If you want to know how to make our platform more accessible to all readers, the Dos and don'ts list and the project pages for WikiProject Accessibility (which is also active on French Wikipedia) and WikiProject Usability are good places to start.
Jimbo says no to crypto: As reported by Benzinga and CoinTelegraph, Wikipedia co-founder Jimmy Wales mocked Bitcoin in a post on X, drawing pushback from some notable figures in the cryptocurrency community. Benzinga also noted that the Wikimedia Foundation had stopped accepting cryptocurrency donations back in May 2022, following a comprehensive three-month discussion within the Wikimedia community.
Fake work on Wikipedia, or work about a fake Wikipedia?: Author and autofiction specialist Ben Lerner, whom we covered in last month's disinformation report, was interviewed by the radio magazine Here and Now, aired on WBUR-FM. Bizarrely, the host's introduction describes Lerner not as a fiction writer but as an "enterprising journalist" "test[ing] how much he could manipulate [Wikipedia] entries".
Wikimedia RU dissolving: Several Russian media outlets covered the shutdown of Wikimedia Russia, including The Moscow Times, RTVI, Vzglyad, RBK Daily, and TASS [2], as well as Radio Free Europe. See this month's In focus for more about the situation.

Do you want to contribute to "In the media" by writing a story or even just an "in brief" item? Edit our next issue's edition in the Newsroom or leave a tip on the suggestions page.

Reader comments

2023-12-24

Arabic Wikipedia blackout; Wikimedians discuss SpongeBob, copyrights, and AI


By Red-tailed hawk, JPxG and Robertsky

Is SpongeBob SquarePants now freely licensed?

At the policy village pump: what's the deal with media companies uploading drawings and videos of highly recognizable characters under free licenses? Is it a proper release? Do they have the authority to make a proper release? Would it hold up in court? Who knows. One thing's for sure: in one week, the mouse will be freed from his prison. — J

Wikimedia Commons discusses how to handle AI images

On December 6, a discussion was opened at Commons' village pump regarding the proper tagging and use of pictures created using image models (e.g. Midjourney, DALL-E, Stable Diffusion and friends). Should they be permitted? Labeled? Forbidden? There's a litany of opinions. At the center of them is the newly congealing Commons:AI-generated media. — J

Arabic Wikipedia blacks out main page, logs out all users, publishes statement, and adopts new logo in response to war in Gaza

Following a discussion on the Arabic Wikipedia (archive, alt), the wiki took several actions in response to the war in Gaza. The actions, as summarized by Arabic Wikipedia checkuser and administrator Dr-Taher, are as follows:

The logo of the Arabic Wikipedia was changed to bear the colors of the Palestinian flag indefinitely;
On December 23, the home page of the Arabic Wikipedia was to be blacked out and editing of the wiki would be prohibited;
The Arabic Wikipedia published a statement of solidarity with the Palestinian people; and
A competition is to be arranged by administrators of the Arabic Wikipedia to create content relating to the cause of the Palestinian people (Arabic: القضية الفلسطينية) and to the ongoing war in Gaza.

The header of the Arabic Wikipedia on 23 December 2023

The discussion began when Mervat posted a message to her fellow Arabic Wikimedians, encouraging the wiki to put out a banner message in solidarity with Palestinian Arabs and calling for an end to the war. As the conversation developed, محمد أحمد عبد الفتاح lamented the state of the war's coverage on the Arabic Wikipedia, stating that it relied too heavily on English sources and that several key articles were too short, and arguing that changes should be made to encourage the creation of content related to the Palestinian cause and advocacy on its behalf.

Following initial discussion, consensus was reached on the wiki to issue a statement of solidarity. Dr-Taher then made a formal proposal to take several actions in support of Palestinians, including a blackout of the Arabic Wikipedia's main page.

The proposal is above, and I ask colleagues experienced with images to choose a suitable image that highlights the Palestinian flag. The announcement will begin with a day of closure (a "blackout") of the Wikipedia main page, on which only the announcement will appear; the same applies to Wikipedia's pages on social media. After that, the page will return to what it was, with the announcement remaining at the top of the page, and the logo will be changed to the colors of the Palestinian flag until further notice. The administrators will begin arranging a competition to develop content related to the war and to the Palestinian cause as a whole.
— Dr-Taher

Later on, حبيشان suggested that the logo of the Arabic Wikipedia be changed to bear the colors of the Palestinian flag, comparing the proposal to the Georgian Wikipedia community's decision to change its logo to bear the colors of the Ukrainian flag following Russia's 2022 invasion of Ukraine.

Throughout the discussion, there was near-unanimous support for a statement of solidarity with the Palestinian people, the creation of new content related to the conflict, the blacking out of the main page, and the changing of the logo. A broader ban on editing on 23 December was also implemented: an interface administrator installed a fork of Wikibreak enforcer into the site's Common.js file (diff, archive, alt). It is not clear to The Signpost which interface administrator made the edit, as the editor's username was revision-deleted.

Update: Immediately after the Wikibreak enforcer came online, NickK filed a steward request (archive) on Meta asking for it to be removed, on the basis that unsuspecting logged-in editors who visited the Arabic Wikipedia would be logged out without adequate warning. After a quick discussion, in which a unanimous view formed that the protest should not affect logged-in editors from other projects, the enforcer was removed.

— R, rs

Reader comments

2023-12-24

Liquidation of Wikimedia RU


By Russian Wikinews editors

This article was originally published in the Russian Wikinews on December 20, 2023. We thank Ymblanter for help with the translation.

On December 19, 2023, the director of Wikimedia Russia, Stanislav Kozlovsky, made several important statements about his forced resignation from Moscow State University and the dissolution of the Russian Wikimedia chapter.

Background

Kozlovsky had worked at Moscow State University for 25 years, most recently as an associate professor at the Faculty of Psychology, holding the degree of Candidate of Psychological Sciences. In December 2023, he was at the university's branch in Baku, Azerbaijan, lecturing local students on psychophysiology, when he was "unexpectedly" forced to interrupt the course after being called back to Moscow by order of the MSU vice-rector, citing "operational necessity".

Dismissal

On December 18, 2023, during a meeting at the dean's office of the MSU Faculty of Psychology, Kozlovsky was told that there was "reliable information" that Russian authorities would add him to the list of foreign agents (since the list is updated on Fridays, his inclusion was expected to be made public on December 22, 2023). According to Kozlovsky, the MSU's board offered him two options: to be fired for "absenteeism", or to resign "at his own request". Denied further time to think about his future at the university, he ultimately "chose the latter [option]". Later that day, Kozlovsky removed information about his place of work from his Wikipedia user page.

However, TASS later reported on a press statement by the MSU Faculty of Psychology about the circumstances of Kozlovsky's dismissal, in which the faculty disputed his account of events, claiming that neither the university nor the faculty knew anything "on the inclusion of S. A. Kozlovsky in any lists".

Meeting of Wikimedia RU and the decision to dissolve the organization

On the evening of December 18, Kozlovsky called an emergency meeting of the members of the non-profit partnership Wikimedia RU, where he reported his forced resignation from MSU. As he later said in an interview with RBK, the Wikimedia RU administrators agreed that "it [was] impossible to work in such conditions" and "decided to dissolve the organization", although the "closing formalities [would] take several months". After this was reported on the Russian Wikipedia's Village Pump, users expressed understanding, and many offered words of support.

Further comments

At 07:38 UTC on December 19, 2023, Kozlovsky posted an account of these events on the Russian Wikipedia's news forum. He also gave several interviews to Russian media, including TASS, RBK, RTVI, and Vzglyad. He said that Wikimedia RU had never been responsible for Wikipedia. Instead, it supported Wikimedia projects in Russian (which, besides Wikipedia, include Wiktionary, Wikinews, Wikisource, and others): it prepared textbooks and offered online courses on editing these projects, organized conferences, seminars, and lectures, and worked with copyright holders to facilitate the transfer of works to Wikipedia under free licenses. According to him, in recent years "everyone has become afraid" of dealing with Wikimedia RU, although there had been no obvious pressure on the organization until now.

In an interview with another publication, the business newspaper Vzglyad, Kozlovsky said that he did not intend to leave Russia: "I do not have anywhere to go."

Kozlovsky was also unsure of the exact reason for his possible inclusion on the list of foreign agents. Since he had been lecturing in Baku, he joked: "Maybe they want to recognize me as a foreign agent of Azerbaijan? I don't know".

In addition, Kozlovsky recalled that blocking Wikipedia had been discussed for more than ten years, but it had never happened: "It [the blocking] could happen any day, but it doesn’t happen. I hope this never happens."

Sources

Evgenia Lepekhina; Elena Lepekhina (December 19, 2023). "Проект поддержки русскоязычной «Википедии» объявил о закрытии" [The project supporting Russian-language Wikipedia announced its closure]. RTVI.
Alexey Degtyarev (December 19, 2023). "Российское сообщество поддержки «Википедии» решили закрыть" [[They] decided to close the Russian support community of Wikipedia]. Vzglyad.
Kirill Sokolov; Margarita Ovsyannikova (December 19, 2023). "Сообщество российской «Википедии» решило прекратить работу" [The Russian Wikipedia community has decided to shut itself down]. RBK.ru. RBK Group.
"Викимедиа РУ закрывается" [Wikimedia RU is closing]. Russian Wikipedia (discussion at the community "News" forum). December 20, 2023.
"В России закроют сообщество поддержки русскоязычной «Википедии»" [The support community for Russian-language Wikipedia will be closed in Russia]. TASS. December 19, 2023.
"В МГУ сообщили об увольнении директора «Викимедиа.ру» из вуза по собственному желанию" [MSU announced the dismissal of the director of Wikimedia.ru from the university at his own request]. TASS. December 19, 2023.

Reader comments

2023-12-24

Dark mode is coming


By Olga Vasileva, Szymon Grabarczuk, and Jon Robson

This post was originally published on Diff.

Olga Vasileva, Szymon Grabarczuk, and Jon Robson are the Wikimedia Foundation's Web Team Project Manager, Lead Community Relations Specialist, and Web Software Developer respectively.

Image for the dark mode beta feature, design by Justin Scherer (WMF).

“

When we’re at night, the white skin of Wikipedia dazzles us and it’s very uncomfortable. I suggest a night mode switch for users or at least a darker color. Also available for the mobile version.

”

— VictorPines, 2017

“

Some kind of toggleable dark or night-mode. It would be most accessible as a feature for everyone and not just a new skin for logged-in users.

”

— Premeditated Chaos, 2018

“

Colors of Wikimedia projects are white or near white, which on long view time causes damage to eyes, and consumes more energy on the laptop.

”

— David L, 2021

“

Please add dark mode!!

”

— Crenshire, 2023

The Wikimedia Foundation has seen many requests like these. Dark mode is available in the Wikipedia mobile apps, but still not on the web. It’s been a common request from editors in the Community Wishlist Surveys and during the rollout of the Vector 2022 skin: hundreds of comments! We would like to thank everyone who made them.

Some time ago, a few Foundation staff members, Volker, Alex, Carolyn, and MusikAnimal, built a dark mode script as an experiment. It has become a popular gadget across wikis. But until this year, making dark mode a regular part of the interface was not possible. Now, with help from communities, we are finally ready to work on this feature! Continue reading to learn about the benefits of dark mode, what made it possible, and how to get involved.

Why dark mode?

Dark mode improves accessibility. The primary benefit is that it reduces eye strain. When we’re in a long reading or editing session, particularly when it’s dark around us, the contrast between a bright screen and the surrounding darkness can cause discomfort. Dark mode mitigates this by giving us a darker background with light text, reducing glare and minimizing eye fatigue. This feature is especially helpful for night-time readers or readers who spend lots of time on their devices.

Many readers and editors favor dark mode. The softer, darker hues can be less harsh on the eyes and create a more relaxed reading environment, enhancing the reading experience.

What made building dark mode possible?

In the past, it was not possible to change our web interface based on the preferences of logged-out users. These users couldn’t set a preferred page density, change the font size, or set a dark mode. Also, the MediaWiki skin and design architecture made it difficult to maintain two color schemes (light and dark). It was necessary to improve these three facets first.

We began with improving the skin architecture – we were doing this while building Vector 2022. This laid the foundation for further interface changes.
Next, the Design Systems team introduced Codex and with it “design tokens”. These are variables that, like templates on wikis, let us define colors in one central place and reuse them everywhere.
Finally, we added the ability to provide preferences for logged-out users. When working on Vector 2022, we built a toggle that changes the content area width. After listening to editors’ opinions, and with some creative thinking, we made it available for logged-out users, too. Next, our engineers and architects created a wider system, allowing us to make more settings customizable.
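The idea behind design tokens can be illustrated with a toy example. Each token is a named color defined once, with one value per color scheme, so switching to dark mode means changing a single lookup rather than editing every style rule. This is a minimal sketch; the token names and hex values below are hypothetical and are not Codex's actual token set.

```python
from typing import Dict

# Hypothetical design tokens: each name maps to one value per color scheme.
# Names and values are illustrative only, not Codex's real tokens.
TOKENS: Dict[str, Dict[str, str]] = {
    "background-color-base": {"light": "#ffffff", "dark": "#101418"},
    "color-base": {"light": "#202122", "dark": "#eaecf0"},
    "color-link": {"light": "#3366cc", "dark": "#88a3e8"},
}


def resolve(token: str, scheme: str) -> str:
    """Return the value of one token under the given scheme ("light" or "dark")."""
    try:
        return TOKENS[token][scheme]
    except KeyError:
        raise KeyError(f"unknown token or scheme: {token!r}, {scheme!r}") from None


def stylesheet(scheme: str) -> str:
    """Emit every token as a CSS custom property for one color scheme."""
    lines = [f"  --{name}: {resolve(name, scheme)};" for name in TOKENS]
    return ":root {\n" + "\n".join(lines) + "\n}"


print(stylesheet("dark"))
```

Style rules that refer only to the token names (for example `var(--color-base)` in CSS) would then pick up the dark values automatically, which is why centralizing the definitions had to come before dark mode.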

With this system in place, we could begin planning the Accessibility for reading project. This is our response to users’ need to read the wikis comfortably and to adjust the settings. In the first step, logged-in and logged-out users will be able to select different font sizes and text density. Dark mode will be next.

How? Together. But how exactly, and how to get involved?

Editors control content, which includes templates (amboxes, infoboxes, navboxes) as well as bitmaps, timelines, tables, and more. Some of these, like weather and sports tables, use colors in a meaningful, or semantic, way. Simply inverting these colors would strip them of that meaning, so we need to find other options.
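A short illustration of why naive inversion fails for semantic colors: flipping each RGB channel turns a "hot" red into cyan and a "cold" blue into yellow, which reverses the usual reading of a weather table. A semantic palette instead keeps the meaning fixed and lets each theme choose its own shade. The palette entries and hex values are hypothetical examples, and the sketch assumes six-digit hex colors.

```python
# Naive inversion: flip every RGB channel (equivalent to 255 - value per channel).
# Assumes a six-digit hex color such as "#ff0000".
def invert_hex(color: str) -> str:
    value = int(color.lstrip("#"), 16)
    return f"#{0xFFFFFF ^ value:06x}"


# Hypothetical semantic palette: the meaning ("hot", "cold") stays fixed,
# and each theme picks a shade that still reads as that meaning.
SEMANTIC = {
    "hot": {"light": "#ff0000", "dark": "#ff6b6b"},
    "cold": {"light": "#0000ff", "dark": "#6b8cff"},
}


def themed(meaning: str, scheme: str) -> str:
    return SEMANTIC[meaning][scheme]


if __name__ == "__main__":
    print(invert_hex("#ff0000"))  # red becomes cyan: #00ffff
    print(invert_hex("#0000ff"))  # blue becomes yellow: #ffff00
    print(themed("hot", "dark"))  # still a red: #ff6b6b
```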

Whatever technical approach we choose, we will coordinate with editors. We may build different solutions for big and small communities. In the coming weeks, we will reach out with specific questions and ideas.

We would like to start gradually, with a limited number of communities and wikis. First, the dark mode would be a beta feature. As such, it would only be available for logged-in users who decide to enable it. Any logged-in user will have an opportunity to test alongside us as we build out the final version.

We will talk to interface admins, template and module maintainers, and editors interested in making the wikis easier to read for everyone. Together with them, we would like to work on recommendations for making pages more friendly to dark mode. We will also help them adjust the current code on the wikis. When enough pages become dark-mode friendly, we will roll dark mode out for logged-out users. (On a side note, we aren’t sure how many pages are enough. We will ask about that, too!)

How do you feel about all this? Write on our project talk page. Be sure to subscribe to the Web team newsletter to never miss an update from us. Thank you! —OV, SG, JR (WMF)

Reader comments

2023-12-24

"LLMs Know More, Hallucinate Less" with Wikidata


By Tilman Bayer

A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.

"Fine-tuned LLMs Know More, Hallucinate Less with Few-Shot Sequence-to-Sequence Semantic Parsing over Wikidata"

Overview of how the authors' "WikiSP" semantic parser is used to answer a user's question:
"An entity linker is used to link entities in the user query to their unique ID in Wikidata; e.g. “A Bronx Tale” is linked to entity ID “Q1130705”. The query and entity linker outputs are fed to the WikiSP semantic parser to produce a modified version of SPARQL, where property IDs (e.g. “P915”) are replaced by their unique string identifiers (e.g. “filming_location”). If applying the [SPARQL] query to Wikidata fails to return a result, we default to [OpenAI's large language model] GPT-3, labeling the result as a GPT-3 guess. Returned answers are presented in the context of the query, so the user can tell if the answer is acceptable; if not, we also show the guess from GPT-3. Here WikiSP mistakenly uses “filming_location” instead of “narrative_location”; the user detects the mistake, thumbs down the answer, and the GPT-3 answer is provided."

This paper^[1] (by five graduate students at Stanford University's computer science department and Monica S. Lam as last author) sets out to show that

While large language models (LLMs) can answer many questions correctly, they can also hallucinate and give wrong answers. Wikidata, with its over 12 billion facts, can be used to ground LLMs to improve their factuality.

To do this, the paper "presents WikiSP, a few-shot sequence-to-sequence semantic parser for Wikidata that translates a user query, along with results from an entity linker, directly into SPARQL queries [to retrieve information from Wikidata]." It is obtained by fine-tuning one of Meta's LLaMA 1 large language models.

For example, the user question "What year did giants win the world series?" is supposed to be converted into the query SELECT DISTINCT ?x WHERE {?y wdt:sports_season_of_league_or_competition wd:Q265538; wdt:winner wd:Q308966; wdt:point_in_time ?x. }. The paper uses a modified SPARQL syntax that replaces numerical property IDs (here, P3450) with their English-language label (here, "sports season of league or competition"). The authors motivate this choice by observing that "While zero-shot LLMs [e.g. ChatGPT] can generate SPARQL queries for the easiest and most common questions, they do not know all the PIDs and QIDs [property and item IDs in Wikidata], and nor is it possible to include them in a prompt."
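Because WikiSP emits property labels instead of numeric PIDs, something has to translate them back before a query can run against Wikidata. The sketch below shows one simple way to do this with a lookup table and a regular expression. It is our illustration, not the paper's code, and the table is a tiny hypothetical excerpt. P3450 and P915 come from the examples above. P1346 ("winner") and P585 ("point in time") are the standard Wikidata properties with those labels. A real system would need a table covering every label the parser can produce.

```python
import re

# Hypothetical label-to-PID table for illustration only (see lead-in).
LABEL_TO_PID = {
    "sports_season_of_league_or_competition": "P3450",
    "winner": "P1346",
    "point_in_time": "P585",
    "filming_location": "P915",
}

# Matches "wdt:" followed by a lowercase, underscore-separated label.
# Ordinary IDs such as "wdt:P3450" start with an uppercase P and are left alone.
WDT_LABEL = re.compile(r"wdt:([a-z_]+)")


def to_standard_sparql(query: str) -> str:
    """Replace every wdt:<label> token with wdt:<PID> so the query can run on Wikidata."""

    def swap(match: re.Match) -> str:
        label = match.group(1)
        if label not in LABEL_TO_PID:
            raise KeyError(f"no PID known for property label {label!r}")
        return "wdt:" + LABEL_TO_PID[label]

    return WDT_LABEL.sub(swap, query)


modified = (
    "SELECT DISTINCT ?x WHERE {?y wdt:sports_season_of_league_or_competition wd:Q265538; "
    "wdt:winner wd:Q308966; wdt:point_in_time ?x. }"
)
# Prints the same query with wdt:P3450, wdt:P1346 and wdt:P585 in place of the labels.
print(to_standard_sparql(modified))
```

Raising an error for an unknown label, instead of passing it through, makes a parser mistake visible rather than sending a broken query to the endpoint.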

To evaluate the performance of "WikiSP", and as a second contribution of the paper, the authors present

[...] WikiWebQuestions, a high-quality question answering benchmark for Wikidata. Ported over from WebQuestions for Freebase, it consists of real-world data with SPARQL annotation. [...]

Despite being the most popular large knowledge base for a long time, existing benchmarks on Wikidata with labeled SPARQL queries are unfortunately either small or of low quality. On the other hand, benchmarks over the deprecated Freebase still dominate the KBQA research with better-quality data.

Using this new benchmark, "Our experimental results demonstrate the effectiveness of [WikiSP], establishing a strong baseline of 76% and 65% answer accuracy in the dev and test sets of WikiWebQuestions, respectively." However, the paper's "Limitations" section hints that despite the impressive "12 billion facts" factoid that the paper opens with, Wikidata's coverage may be too limited to answer most user questions in a satisfying manner:

Even though knowledge bases are an important source of facts, a large portion of the knowledge available in digital form (e.g. Wikipedia, news articles, etc.), is not organized into knowledge bases. As such, the results of this paper can be considered complementary to the larger body of fact-checking research based on free text.

To address this weakness, the authors combine this Wikidata-based setup with a standard LLM that provides the answer if the Wikidata query fails to return a result. They state that

By pairing our semantic parser with GPT-3, we combine verifiable results with qualified GPT-3 guesses to provide useful answers to 96% of the questions in dev.
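The control flow of this combination can be sketched in a few lines: try the Wikidata route first, and only if it yields nothing fall back to the language model, labeling that answer as a guess. This is a hypothetical outline, not the authors' code. `parse`, `run_sparql` and `llm_guess` stand in for the semantic parser, the query endpoint and GPT-3. Treating a failed query like an empty result is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Answer:
    text: str
    source: str  # "Wikidata" for verifiable results, "LLM guess" for the fallback


def answer_question(
    question: str,
    parse: Callable[[str], str],             # hypothetical: question -> SPARQL query
    run_sparql: Callable[[str], List[str]],  # hypothetical: query -> list of results
    llm_guess: Callable[[str], str],         # hypothetical: question -> free-text guess
) -> Answer:
    """Answer from Wikidata when the query returns something; otherwise ask the LLM."""
    try:
        results = run_sparql(parse(question))
    except Exception:
        # Assumption of this sketch: a malformed or failing query counts as "no result".
        results = []
    if results:
        return Answer(text=", ".join(results), source="Wikidata")
    # No verifiable result: return the LLM's answer, clearly labeled as a guess.
    return Answer(text=llm_guess(question), source="LLM guess")


grounded = answer_question(
    "What year did giants win the world series?",
    parse=lambda q: "SELECT ...",
    run_sparql=lambda query: ["2014"],
    llm_guess=lambda q: "unused",
)
print(grounded)  # Answer(text='2014', source='Wikidata')
```

Keeping the source label on every answer mirrors the paper's point that the user should be able to tell a verifiable result from a GPT-3 guess.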

Data and evaluation code from the paper have been released in a GitHub repo, where the authors state that "We are now working on releasing fine-tuned models."

The paper's endeavour bears some similarity to a paper authored by a different team of Stanford graduate students with professor Lam that sought to use Wikipedia (rather than Wikidata) to reduce LLM hallucinations; see the review in our July issue: "Wikipedia-based LLM chatbot 'outperforms all baselines' regarding factual accuracy".

Briefly

See the page of the monthly Wikimedia Research Showcase for videos and slides of past presentations.
US-based editors wanted for workshop on research ethics: For a research project titled "Beyond the Individual: Community-Engaged Design and Implementation of a Framework for Ethical Online Communities Research", a team from the University of Minnesota's GroupLens lab is seeking US-based Wikipedia editors to participate in a 2-hour remote workshop, to discuss "ways that research can help or harm the community" (following up on a previous workshop with non-US-based English Wikipedia editors). Interested users can sign up here.

Other recent publications

Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.

"Using Large Language Models for Knowledge Engineering (LLMKE): A Case Study on Wikidata"

From the abstract:^[2]

"In this work, we explore the use of Large Language Models (LLMs) for knowledge engineering tasks in the context of the ISWC 2023 LM-KBC Challenge. For this task, given subject and relation pairs sourced from Wikidata, we utilize pre-trained LLMs to produce the relevant objects in string format and link them to their respective Wikidata QIDs. [...] The method achieved a macro-averaged F1-score of 0.701 across the properties, with the scores varying from 1.00 to 0.328. These results demonstrate that the knowledge of LLMs varies significantly depending on the domain and that further experimentation is required to determine the circumstances under which LLMs can be used for automatic Knowledge Base (e.g., Wikidata) completion and correction. The investigation of the results also suggests the promising contribution of LLMs in collaborative knowledge engineering. LLMKE won Track 2 of the challenge."
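"Macro-averaged" here means that F1 is computed for each property separately and the per-property scores are then averaged without weighting. A rarely used property therefore counts as much as a common one, which is how a single 0.701 figure can hide scores ranging from 1.00 to 0.328. The sketch below is simplified. It assumes each property's predicted and gold objects are pooled into one set of QIDs and that an empty prediction for an empty gold set scores 1.0. The challenge's official scorer may aggregate per subject first.

```python
from typing import Dict, Set, Tuple


def f1(predicted: Set[str], gold: Set[str]) -> float:
    """Harmonic mean of precision and recall for one property's predicted objects."""
    if not predicted and not gold:
        return 1.0  # assumption: correctly predicting "no objects" counts as perfect
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def macro_f1(per_property: Dict[str, Tuple[Set[str], Set[str]]]) -> float:
    """Unweighted mean of per-property F1 scores: every property counts equally."""
    scores = [f1(predicted, gold) for predicted, gold in per_property.values()]
    return sum(scores) / len(scores)


# Hypothetical toy data: property A is perfect (F1 = 1.0); property B has one
# correct and one spurious object (precision 0.5, recall 1.0, F1 = 2/3).
toy = {"A": ({"Q1"}, {"Q1"}), "B": ({"Q1", "Q2"}, {"Q1"})}
print(macro_f1(toy))  # (1.0 + 2/3) / 2, about 0.833
```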

"Large language models learn to organize concepts in ways that are strikingly similar to how concepts are organized in [Wikidata]"

From the abstract:^[3]

"Knowledge bases such as WikiData provide large-scale, high-quality representations of inferential semantics and world knowledge. We show that large language models learn to organize concepts in ways that are strikingly similar to how concepts are organized in such knowledge bases. Knowledge bases model collective, institutional knowledge, and large language models seem to induce such knowledge from raw text. We show that bigger and better models exhibit more human-like concept organization, across four families of language models and three knowledge graph embeddings."

"Enhancing Multilingual Language Model with Massive Multilingual Knowledge Triples" from Wikidata

From the abstract:^[4]

[...] we explore methods to make better use of the multilingual annotation and language agnostic property of KG [knowledge graph] triples, and present novel knowledge based multilingual language models (KMLMs) trained directly on the knowledge triples. We first generate a large amount of multilingual synthetic sentences using the Wikidata KG triples. Then based on the intra- and inter-sentence structures of the generated data, we design pretraining tasks to enable the LMs to not only memorize the factual knowledge but also learn useful logical patterns. Our pretrained KMLMs demonstrate significant performance improvements on a wide range of knowledge-intensive cross-lingual tasks, including named entity recognition (NER), factual knowledge retrieval, relation classification, and a newly designed logical reasoning task.
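The basic idea of turning Wikidata triples into sentences in several languages can be shown with simple templates. Because Wikidata items carry labels in many languages, the same triple (Germany, capital, Berlin) can be written out in English and German. This only sketches the general idea; the paper's generation procedure is more elaborate. The labels below are a tiny excerpt: Q183 is Germany, Q64 is Berlin, and P36 is "capital". The sentence templates are hypothetical.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject QID, property PID, object QID)

# Multilingual labels, as Wikidata provides for most items (tiny excerpt).
LABELS: Dict[str, Dict[str, str]] = {
    "Q183": {"en": "Germany", "de": "Deutschland"},
    "Q64": {"en": "Berlin", "de": "Berlin"},
}

# Hypothetical sentence templates per language, keyed by property ID (P36 = capital).
TEMPLATES: Dict[str, Dict[str, str]] = {
    "en": {"P36": "The capital of {s} is {o}."},
    "de": {"P36": "Die Hauptstadt von {s} ist {o}."},
}


def verbalize(triples: List[Triple], lang: str) -> List[str]:
    """Render each triple as a sentence in `lang`, skipping properties without a template."""
    sentences = []
    for subj, prop, obj in triples:
        template = TEMPLATES[lang].get(prop)
        if template is None:
            continue  # no template for this property in this language
        sentences.append(template.format(s=LABELS[subj][lang], o=LABELS[obj][lang]))
    return sentences


triples = [("Q183", "P36", "Q64")]
print(verbalize(triples, "en"))  # ['The capital of Germany is Berlin.']
print(verbalize(triples, "de"))  # ['Die Hauptstadt von Deutschland ist Berlin.']
```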

"KGConv, a Conversational Corpus grounded in Wikidata"

From the abstract:^[5]

"We present KGConv, a large, conversational corpus of 71k conversations where each question-answer pair is grounded in a Wikidata fact. Conversations contain on average 8.6 questions and for each Wikidata fact, we provide multiple variants (12 on average) of the corresponding question using templates, human annotations, hand-crafted rules and a question rewriting neural model. We provide baselines for the task of Knowledge-Based, Conversational Question Generation. [...]"

"WikiDialog" dataset: "Dialog inpainting" using Wikipedia

From the abstract^[6] of a paper presented by a team of Google researchers at last year's ICML conference:

"[...] conversational question answering (ConvQA) systems have long been stymied by scarce training data that is expensive to collect. To address this problem, we propose a new technique for synthetically generating diverse and high-quality dialog data: dialog inpainting. Our approach takes the text of any document and transforms it into a two-person dialog between the writer and an imagined reader: we treat sentences from the article as utterances spoken by the writer, and then use a dialog inpainter to predict what the imagined reader asked or said in between each of the writer's utterances. By applying this approach to passages from Wikipedia and the web, we produce WikiDialog and WebDialog, two datasets totalling 19 million diverse information-seeking dialogs -- 1,000x larger than the largest existing ConvQA dataset. Furthermore, human raters judge the answer adequacy and conversationality of WikiDialog to be as good or better than existing manually-collected datasets."

As "a real example of a dialog inferred from a Wikipedia passage using dialog inpainting" the paper presents the following (abridged) exchange between an "imagined reader" of the Freshman 15 article and a Wikipedia "Writer" who (after the initial greeting) always answers with excerpts from the article, with all other sentences filled in by the inpainter:

Writer: "I'm an automated assistant. I can tell you about Freshman 15."
Imagined Reader: "How does the freshman 15 relate to eating habits?"
Writer: "In parental-supervised diets, students also usually ingest the proper proportion [...]"
Imagined Reader: "What is the cause of this?"
Writer: "This is because when students go off to college, they face an independence [...]"
Imagined Reader: "Do people tend to eat healthier or less healthy when they are away from home?"
Writer: "Research has shown that over 60 percent of college students commonly ingest [...]"
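The structure of such a dialog can be sketched as a loop. Every sentence of the source passage is kept verbatim as a Writer turn, and a model is asked to fill in the Reader turn that comes before it. This is a simplified illustration, not the paper's implementation. `predict_reader_turn` is a hypothetical stand-in for the trained inpainter. Here it sees only the dialog so far plus the next Writer sentence, whereas the paper's inpainter can condition on more of the surrounding dialog. The opening greeting is copied from the example above.

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (speaker, utterance)


def inpaint_dialog(
    sentences: List[str],
    predict_reader_turn: Callable[[List[Turn], str], str],  # hypothetical inpainter
    topic: str,
) -> List[Turn]:
    """Interleave document sentences (Writer) with model-predicted Reader turns."""
    dialog: List[Turn] = [
        ("Writer", f"I'm an automated assistant. I can tell you about {topic}.")
    ]
    for sentence in sentences:
        # The model guesses what the reader asked that this sentence answers.
        question = predict_reader_turn(dialog, sentence)
        dialog.append(("Imagined Reader", question))
        dialog.append(("Writer", sentence))
    return dialog


# Toy stand-in for the inpainter, just to show the resulting shape.
def stub(dialog: List[Turn], next_sentence: str) -> str:
    return f"Question {len(dialog)}?"


for speaker, utterance in inpaint_dialog(["First fact.", "Second fact."], stub, "Freshman 15"):
    print(f"{speaker}: {utterance}")
```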

Wikipedia-based "Retrieval Augmentation Reduces Hallucination in Conversation" with large language models

From the abstract of a 2021 paper by a team from Facebook AI Research:^[7]

"Despite showing increasingly human-like conversational abilities, state-of-the-art dialogue models often suffer from factual incorrectness and hallucination of knowledge (Roller et al., 2020). In this work we explore the use of neural-retrieval-in-the-loop architectures [retrieving articles from Wikipedia] [...] for knowledge-grounded dialogue [...] We demonstrate that our best models obtain state-of-the-art performance on two knowledge-grounded conversational tasks. The models exhibit open-domain conversational capabilities, generalize effectively to scenarios not within the training data, and, as verified by human evaluations, substantially reduce the well-known problem of knowledge hallucination in state-of-the-art chatbots."

Large language models as an alternative to Wikidata?

From the abstract:^[8]

"Pre-trained language models (LMs) have recently [as of 2021] gained attention for their potential as an alternative to (or proxy for) explicit knowledge bases (KBs). In this position paper, we examine this hypothesis, identify strengths and limitations of both LMs and KBs, and discuss the complementary nature of the two paradigms."

The authors acknowledge that "Starting from [a 2019 paper], many works have explored whether this LM-as-KB paradigm [i.e. the ability of LLMs to answer factual questions, by now familiar to users of ChatGPT] could provide an alternative to structured knowledge bases such as Wikidata." However, the paper concludes, as of 2021,

[...] that LMs cannot broadly replace KBs as explicit repositories of structured knowledge. While the probabilistic nature of LM-based predictions is suitable for task-specific end-to-end learning, the inherent uncertainty of outputs does not meet the quality standards of KBs. LMs cannot separate facts from correlations, and this entails major impediments for KB maintenance. We advocate, on the other hand, that LMs can be valuable assets for KB curation, by providing a “second opinion” on new fact candidates or, in the absence of corroborated evidence, signal that the candidate should be refuted.

References

^ Xu, Silei; Liu, Shicheng; Culhane, Theo; Pertseva, Elizaveta; Wu, Meng-Hsi; Semnani, Sina; Lam, Monica (December 2023). "Fine-tuned LLMs Know More, Hallucinate Less with Few-Shot Sequence-to-Sequence Semantic Parsing over Wikidata". In Bouamor, Houda; Pino, Juan; Bali, Kalika (eds.). Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. EMNLP 2023. Singapore: Association for Computational Linguistics. pp. 5778–5791. doi:10.18653/v1/2023.emnlp-main.353. Data and evaluation code
^ Zhang, Bohui; Reklos, Ioannis; Jain, Nitisha; Peñuela, Albert Meroño; Simperl, Elena (2023-09-15). "Using Large Language Models for Knowledge Engineering (LLMKE): A Case Study on Wikidata". arXiv:2309.08491 [cs.CL]. code
^ Gammelgaard, Mathias Lykke; Christiansen, Jonathan Gabel; Søgaard, Anders (2023-08-29). "Large language models converge toward human-like concept organization". arXiv:2308.15047 [cs.LG].
^ Liu, Linlin; Li, Xin; He, Ruidan; Bing, Lidong; Joty, Shafiq; Si, Luo (December 2022). "Enhancing Multilingual Language Model with Massive Multilingual Knowledge Triples". In Goldberg, Yoav; Kozareva, Zornitsa; Zhang, Yue (eds.). Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. EMNLP 2022. Abu Dhabi, United Arab Emirates: Association for Computational Linguistics. pp. 6878–6890. doi:10.18653/v1/2022.emnlp-main.462.
^ Brabant, Quentin; Lecorve, Gwenole; Rojas-Barahona, Lina M.; Gardent, Claire (2023-08-29). "KGConv, a Conversational Corpus grounded in Wikidata". arXiv:2308.15298 [cs.CL].
^ Dai, Zhuyun; Chaganty, Arun Tejasvi; Zhao, Vincent Y.; Amini, Aida; Rashid, Qazi Mamunur; Green, Mike; Guu, Kelvin (2022-06-28). "Dialog Inpainting: Turning Documents into Dialogs". Proceedings of the 39th International Conference on Machine Learning. International Conference on Machine Learning. PMLR. pp. 4558–4586. Dataset, poster presentation
^ Shuster, Kurt; Poff, Spencer; Chen, Moya; Kiela, Douwe; Weston, Jason (2021). "Retrieval Augmentation Reduces Hallucination in Conversation". arXiv:2104.07567 [cs.CL]. Also in: Findings of the Association for Computational Linguistics: EMNLP 2021, pp. 3784–3803, November 7–11, 2021.
^ Razniewski, Simon; Yates, Andrew; Kassner, Nora; Weikum, Gerhard (2021-10-10). "Language Models As or For Knowledge Bases". arXiv:2110.04888 [cs.CL].


2023-12-24

A feast of holidays and carols


By Smallbones

I love Christmas carols, especially the old ones. Charles Dickens's story A Christmas Carol is not that old — first published in 1843 — but is written in the form of a "Christmas carol in prose", according to the title page. Its chapters are even called staves. In the first stave, a passing caroler sings a small snippet of an old carol to Scrooge. Do you know the Christmas carol sung in A Christmas Carol?

"God Rest Ye Merry, Gentlemen" goes back to the 1650s, but songs have been associated with mid-Winter holidays for over 2,000 years. For example, the Roman holiday Saturnalia was associated with song, as well as wine and political incorrectness — though it should not be confused with Bacchanalia. There's even a modern Saturnalia song, sung in Latin, titled "Io, Saturnalia" (In English: "Yo, Saturnalia") which might be better to skip.

Carols are not necessarily religious, but they are almost always happy music you can dance to. "O Tannenbaum" means "Oh, fir tree" in German but is usually translated into English as "Oh, Christmas Tree". Other than the word "Christmas", the song has little to do with religion. It just praises the fir tree's "faithfulness" — its ability to stay green all Winter. In German, in French, and in English.

Religious carols

My favorite religious carols include:

"Good King Wenceslas" — celebrates the day after Christmas, the Feast of Stephen, and emphasizes the importance of charity (and gift-giving in general).

"It Came Upon the Midnight Clear" — a song that has lyrics from a poem of the same name, and is a very intellectual expression of the author's personal interpretation of the meaning of Christmas. It may mark his joy at the announcement of peace ending the Mexican-American War.

"O Holy Night" — sends a similar message.

Ramsey Lewis gives a jazz version of "We Three Kings".

To fully appreciate "O Come, All Ye Faithful", you need to hear it in a large packed church with a powerful organ belting it out on Christmas Eve. The original Latin version, Adeste Fideles, can be even more powerful. Strangely, though I only know a few words of Latin, I always think of it as Venite Adoremus, from the words in the chorus that translate to "Oh come let us adore (him)".

The explanation is the quirky, sprightly carol "The Snow Lay on the Ground", which also uses the words venite adoremus. The lyrics are attributed to a 19th-century Italian folk song, but three-quarters of the time you just sing venite adoremus.

Another folk song, an African-American spiritual, "Go Tell It on the Mountain", is an expression of pure joy. It was first mentioned in 1901, and published in 1909.

Diverse points of view

Modern Christmas carols and songs express many of the same themes as the earlier carols, adapted to the current state of the world. But I'm not going to link to "All I Want for Christmas Is You": you know where to find it, and you know that you have heard it enough already this year.

There are also many people who live in different circumstances in other countries, who celebrate different Winter holidays, and worship in different faiths. Nobody should be left out at this time of year. We are sorry that there is not enough time to cover everybody's circumstances.

"Silver Bells" brings great memories of "Christmastime in the city". But I also have mixed feelings on its message. Is it meant to honor the Salvation Army? Or is it just an advertisement for the modern commercialized holiday that seems to start in October? Or maybe it is just a great song, in a bad movie, starring an even worse comedian?

There is no doubt that Elvis Presley's "Blue Christmas" is a great song. But sometimes I wonder if it has anything to do with Christmas.

José Feliciano's song [3] "Feliz Navidad" causes no such mixed feelings. A little bit of repetition never hurt a Christmas song.

Russia and Ukraine both have long traditions of celebrating Christmas and New Year's. And they share some of them.

В лесу родилась ёлочка ("In the woods is born a fir tree") is a Russian children's New Year's song. It mirrors "Oh, Christmas Tree" but includes a cute little bunny and an angry wolf, and most kiddy videos also feature Father Frost (a Slavic Santa Claus).

The music to this little Christmas dance was written by a gay Russian composer whose grandfather was born in Ukraine.

Do not be fooled by a bit of chaos at the start of this video of Ukrainian carolers.

These shared traditions only make the current war more tragic.

There are other tragedies happening right now that involve different religions that share, in part, a common heritage.

You might think it would be difficult to find a Jewish Christmas carol, but a song often called "the best-selling Christmas song of all time" was written by Irving Berlin, a Jew.

Hanukkah songs include "The Dreidel Song" and "Hanukkah Rocks" [4] by The LeeVees (from NPR's Tiny Desk Concert).

You might think there are no Muslim Christmas songs, and you might be right. But Muslims are allowed to borrow the Christmas carols they like, just like anybody else. This is the view put forward in these two thought-provoking videos.

We all share part of our common human heritage. We all share in our common human tragedy.


2023-12-24

Lollus lmaois 200C tincture


By JPxG

"Uh, anybody gonna noticeboard these guys...?"


2023-12-24

when the crossword is sus


By JPxG

As part of my grueling mountain training regimen to become an elite webshit, I have learned enough CSS to make the Signpost crossword template usable. Essentially, it is a grotesque hack of the InputBox extension — full documentation can be found here — so there are some issues. Namely, if you press "enter" in any of the cells, it will take you to another page. I can't do anything about this. I can change where it goes (enabling the fun surprise you'll see if you do it) but I can't make it go away.

Anyway, here's the deal for this issue: everything is an abbreviation except 1-Across.

Grid (# = black square, . = empty cell, numbers mark numbered cells):
#	1	2	3
#	4	.	.
5	.	.	#
.	#	#	6
7	8	9	#
#	10	.	#
#	11	.	12
13	.	#	.
.	#	#	.

Note: This experimental wikitext-based crossword utility should allow you to click on the boxes and type in letters. Don't press "enter" unless you want to end up on another page.

Across
1	One in real life often causes one on its Wikipedia article	WAR
4	Where naughty Signpost articles are taken to meet their doom	MFD
5	Overwhelmed by Neelix	RFD
6	Policy requiring that sources can be checked	V
7	Attention, calling all cars, we have a BLP vio in progress on 5th street and Centerline... we have additionally reports of a "poopoo peepee" past uw-4im in the 1600 block of Stamson... exercise extreme caution, suspect may be armed with a proxy...	AIV
10	Department where employees love adding "mission statement" to the company's infobox	PR
11	A parenthetical note made in passing	BTW
13	Where you could take Senkaku Islands and tree shaping edit-warriors to be dealt with, until quite recently	AE
Down
1	They roll the nickels	WMF
2	Where the work of 10-across departments tends to wind up	AFD
3	5-across deals with, and 2-down can close as	RD
5	Challenge-pissing extravaganza for demonstrating a need for the tools	RFA
8	Permission granted to public wifi enjoyers, college editors, and others frequently hit by rangeblocks	IPBE
9	Celeb email receivers	VRT
12	Retired annelid arbitrator	WTT
13	Zoomer/moomer version of "hella"; alternately, camera setting for non-supervised crispness	AF

Note: the chronologically previous crossword appeared in the 26 June 2022 issue, in the humour column.


2023-12-24

What's the big deal? I'm an animal!


By Igordebraga, TheJoebro64, CAWylie, Ltbdl, Shuipzv3, Krimuk2.0 and Death Editor 2

This traffic report is adapted from the Top 25 Report, prepared with commentary by Igordebraga, TheJoebro64, CAWylie, Ltbdl, Shuipzv3, Krimuk2.0 and Death Editor 2 (December 3–9, 2023) and by Igordebraga and CAWylie (December 10–16, 2023).

Hey, what's the big deal? Tell me how to feel (December 3–9, 2023)

Rank	Article	Views	Notes/about
1	Animal (2023 film)	5,560,820	I'd rather be with an animal... Ranbir Kapoor stars in this Bollywood movie about a man's quest for revenge on the guys who tried to kill his father. It became the highest-grossing film in the world for its opening weekend (even entering the American top 10), overcoming mixed reviews that described its protagonist as an embodiment of toxic masculinity.
2	Ryan O'Neal	1,313,128	An American actor who died at the age of 82, whose best moments came in the 1970s with films like Love Story, What's Up, Doc?, Barry Lyndon, and Paper Moon, the last of which he made alongside his daughter Tatum O'Neal (who won an Oscar for her role at just 10). Regarding the rest of his career, let's just leave this.
3	Tripti Dimri	1,101,282	This Indian actress's small role (slightly larger than a cameo appearance) in #1 was apparently a "treat to look out for", leading to loads of Wikipedia views and Instagram followers.
4	Leave the World Behind (film)	986,506	Released in theaters in November and to streaming this week, this apocalyptic thriller, based on the novel of the same name, depicts what happens to humanity when technology fails us, or rather is controlled to do so. The film is produced by and stars Julia Roberts (pictured).
5	Deaths in 2023	980,794	Last day of the rest of my life I wish I would've known 'Cause I didn't kiss my mama goodbye...
6	Shane MacGowan	926,977	People continued to mourn the frontman of The Pogues, a punk of many words and few teeth. His funeral was attended by the president of Ireland and celebrities such as Nick Cave, Johnny Depp, Bob Geldof, Aidan Gillen, and last but certainly not least former Sinn Féin leader Gerry Adams, and featured a performance of the holiday classic "Fairytale of New York".
7	Macaulay Culkin	926,247	Success might have waned for this actor as an adult, but Culkin is now immortalized with a star on the Hollywood Walk of Fame. Among those in attendance was Catherine O'Hara, who played his mom in Home Alone and its sequel, Culkin's best-known roles.
8	Godzilla Minus One	885,761	The giant radioactive dinosaur that is one of Japan's cultural icons will have its 70th anniversary in 2024, so Toho celebrated one year early with a period piece where Godzilla emerges shortly after the end of World War II. Godzilla Minus One earned positive reviews and, having already recouped its costs at the Japanese box office, is performing well in North America, with two straight weekends at #3 behind The Hunger Games: The Ballad of Songbirds & Snakes, which was runner-up first to Renaissance: A Film by Beyoncé and then to The Boy and the Heron. (In the meantime, the American Godzilla of the MonsterVerse got a trailer for his return in Godzilla x Kong: The New Empire.)
9	Norman Lear	876,342	The television king, who wrote, created, or developed over 100 shows in the 1970s and 1980s, died at age 101 on December 5. His sitcoms cleverly broached political and social themes of the time.
10	Premier League	822,401	The top tier of English football made headlines over a stalemate on a deal to help fund struggling lower-league clubs.

Broadcast me a joyful noise unto the times, lord (December 10–16, 2023)

Rank	Article	Views	Notes/about
1	Animal (2023 film)	3,332,995	It's animal, livin' in the human zoo...
2	Leave the World Behind (film)	2,807,615	The film (co-star Mahershala Ali pictured), with a simple message and a lackluster ending, was the top film on Netflix with 41.7 million views.
3	Andre Braugher	2,557,735	This double-Emmy Award-winning actor of television, film, and stage died at age 61 from lung cancer on December 11.
4	Deaths in 2023	1,001,973	If I wane, this could die If I wait, this could die...
5	Tommy DeVito (American football)	820,601	This "zero-to-hero" quarterback is keeping the New York Giants in NFL playoff contention.
6	UEFA Champions League	718,583	The group phase of Europe's top club tournament ended. Most of the qualified teams aren't surprising (including last year's finalists Manchester City and Inter Milan and perennial favorites Real Madrid, Bayern Munich and FC Barcelona), but there was still room for F.C. Copenhagen to advance over the once-mighty Manchester United.
7	Shohei Ohtani	744,102	"Shotime", now a Los Angeles Dodgers player, just signed the largest contract in professional sports history: ten years, US$700 million.
8	Wonka (film)	701,240	Charlie and the Chocolate Factory has already inspired two hit movies, so now there's an attempt at a prequel about Willy Wonka's beginnings as a chocolatier, starring Timothée Chalamet. Praised by reviewers as a great family picture with impressive production values and another catchy soundtrack, Wonka arrives in North America one week after being released in 37 countries, and is expected to debut atop the box office.
9	Premier League	643,562	The latest season of English football keeps on rolling. Arsenal F.C. are currently leading, and hoping they won't choke in the final rounds like last season, especially since another title would coincide with the 20th anniversary of their unbeaten championship.
10	List of highest-grossing Indian films	626,796	Ten of the 50 highest-grossing Indian films were released in 2023, with #1 quickly cracking the Top 10 upon release last week.

Exclusions

These lists exclude the Wikipedia main page, non-article pages (such as redlinks), and anomalous entries (such as DDoS attacks or likely automated views). Since mobile view data became available to the Report in October 2014, we exclude articles that have almost no mobile views (5–6% or less) or almost all mobile views (94–95% or more) because they are very likely to be automated views based on our experience and research of the issue. Please feel free to discuss any removal on the Top 25 Report talk page if you wish.
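The mobile-share rule above can be sketched as a simple filter. This is a minimal sketch, not the Report's actual tooling: the input shape (article title, total views, mobile views) and the function name are hypothetical, and because the Report states ranges ("5–6% or less", "94–95% or more") rather than exact cut-offs, the defaults of 6% and 94% are assumptions.

```python
from typing import Iterable, List, Tuple

# Hypothetical record shape: (article title, total views, mobile views)
Entry = Tuple[str, int, int]


def exclude_suspected_automated(
    entries: Iterable[Entry],
    low: float = 0.06,   # assumed cut-off for "almost no mobile views"
    high: float = 0.94,  # assumed cut-off for "almost all mobile views"
) -> List[Entry]:
    """Drop entries whose mobile share of views is implausibly low or high."""
    kept: List[Entry] = []
    for title, total, mobile in entries:
        if total <= 0:
            continue  # no views at all: nothing to rank
        share = mobile / total
        if share <= low or share >= high:
            continue  # likely automated traffic, so leave it out of the list
        kept.append((title, total, mobile))
    return kept
```

Entries that survive the filter keep their original order, so the result can be sorted by views afterwards to produce the ranked list.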


2023-12-24

A piccy iz worth OVAR 9000!!!11oneone! wordz ^_^


By JPxG

There are some unique challenges in working with a content management system consisting of nineteen years of HTML, CSS, MediaWiki markup, Lua, two separate JavaScript user scripts, Python, and the specific template ecosystem of the English Wikipedia running a backend and a frontend on top of another backend and frontend. It is less of a "tech stack" and more of a "tech pile". Nevertheless, some progress has been made on the suite (SPS.js, SignpostTagger, Module:Signpost, Wegweiser, and the various internal Signpost templates). Most recently, two new features have been integrated into the pile: subheadings and images in the database.

The subheadings were particularly difficult. The primary issue is that they weren't saved anywhere: they were used in templates on the main page, at Wikipedia:Wikipedia Signpost, and then erased when the next issue was published. Which means that, well, they were saved somewhere. But the only way to get them out was to go through the entire history of the main Signpost page, manually identify each revision that was associated with a specific issue, and then manually copy the text of the subheading from each item into the templates on that issue's archive page. Well, not entirely manually: I wrote a script to do the last part. Then I put together a second script to extract those subheadings out of the issue pages and add them to the RSS description templates in the actual article pages.
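Matching main-page revisions to issues, the step described above as manual, boils down to an interval lookup. The following is a minimal sketch, not the script that was actually used. It assumes the page history has already been fetched as (timestamp, revision ID) pairs sorted oldest first, and that each issue is "current" from its publication date until the next issue's. For each issue it picks the last revision saved in that window, i.e. the final state of the main page while that issue was on it. The function name is hypothetical.

```python
from bisect import bisect_left
from datetime import datetime
from typing import Dict, List, Optional, Tuple

Revision = Tuple[datetime, int]  # (timestamp, revision ID), sorted oldest first


def revision_for_each_issue(
    revisions: List[Revision], issue_dates: List[datetime]
) -> Dict[datetime, Optional[int]]:
    """Map each issue date (sorted ascending) to the last revision before the next issue."""
    timestamps = [ts for ts, _ in revisions]
    result: Dict[datetime, Optional[int]] = {}
    for i, issue in enumerate(issue_dates):
        # This issue's window runs from its own date up to (not including) the next one.
        start = bisect_left(timestamps, issue)
        if i + 1 < len(issue_dates):
            end = bisect_left(timestamps, issue_dates[i + 1])
        else:
            end = len(timestamps)
        # None means the page was not edited at all during this issue's window.
        result[issue] = revisions[end - 1][1] if end > start else None
    return result
```

Binary search keeps this fast even over the full history of a heavily edited page; the chosen revision IDs can then be fetched one by one to copy out their subheadings.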

But then, once they were in the RSS description templates, it was simple to add some code to the existing metadata fetcher script to incorporate the subheadings into the data it passed to the Lua serializer, and similarly simple to incorporate subhead parsing/formatting into the publishing script, SignpostTagger, the module's own output, and the snippet display templates.
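Handing metadata to a Lua serializer amounts to turning nested records into a Lua table constructor that a module can return. Here is a minimal sketch under stated assumptions: the field names in the example record (date, subhead, tags, views) are hypothetical rather than Module:Signpost's actual schema, the function names are illustrative, and only strings, finite numbers, booleans, None, lists and dicts are handled.

```python
from typing import Any


def lua_string(s: str) -> str:
    """Quote a Python string as a Lua double-quoted string literal."""
    escaped = (
        s.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def to_lua(value: Any, indent: int = 0) -> str:
    """Serialize nested Python data as a Lua table constructor (finite numbers only)."""
    pad, inner = "\t" * indent, "\t" * (indent + 1)
    if value is None:
        return "nil"
    if isinstance(value, bool):  # must precede the int check: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        return lua_string(value)
    if isinstance(value, (list, tuple)):
        items = [f"{inner}{to_lua(v, indent + 1)}," for v in value]
    elif isinstance(value, dict):
        items = [f"{inner}[{lua_string(str(k))}] = {to_lua(v, indent + 1)}," for k, v in value.items()]
    else:
        raise TypeError(f"cannot serialize {type(value).__name__}")
    if not items:
        return "{}"
    return "{\n" + "\n".join(items) + "\n" + pad + "}"
```

A data module page could then be written as `"return " + to_lua(records)`. Explicit `["key"] =` syntax is used for every dict key so that keys containing spaces or slashes (such as "2018-10-28/Humour") stay valid Lua.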

Well, for arbitrary definitions of "simple". The long and short of it is that I was able to recover all of the previously-lost subheadings for every Signpost article going back to July 2012 when subheadings were first used. It's also now possible to use the module to make dynamic article lists, now that the database contains the full set of information associated with an article, rather than the previous system of hardcoding everything into individual issue pages (yeesh!).

Some more work will be necessary to fully modernize various things — archive pages, for example, still use the weird redundant {{Signpost/item}} instead of {{Signpost/snippet}}, the module doesn't yet have the ability to fetch images, and CSS cropping for cover items has some bizarre mobile bugs that need to be worked out. By the time the next issue is out, I expect to have these resolved, as well as some miscellaneous other things.

There was also some pretty interesting stuff squirreled away in those old revisions, which I will explore in more depth in this issue's Apocrypha; for now, suffice it to say that, in the language of the old country, "we nao haz t3h piccys ^____^" [sic].


2023-12-24

Local editor discovered 1,380 lost subheadings in ancient Signpost scrolls. And what he found was shocking.


By JPxG

What it feels like to chew Signpost gum.

Yeah, you got clickbaited. Anyway, here's the deal:

Recently, I wrote and deployed an argosy of scripts (covered in more detail here) to extract 1,380 lost subheadings from the revision history of the Signpost's main page. These are now in their respective articles' header templates (and from there, in the module indices that serve as an article database). While this allows for much broader flexibility in our display methods, that isn't very exciting (or at least not until these display methods are actually put into practice). What is exciting — or at least mildly amusing — is a whirlwind tour of the never-before-seen Signpost greatest-hits compilation.

Basically, the subheadings were introduced in July 2012, as part of the perpetual effort to keep the Signpost modern and bumpin'; they started out as simple excerpts from articles that were shown on the main page. In 2017, they started being incorporated by default into the RSS-description templates (these are invisible: they don't display anywhere on the article page, but they provide metadata in the HTML) and began to assume their current form (brief, couple-sentence-long hooks). Well, I went through and put all of them into RSS-description templates, so now there are 2,464 articles with machine-readable subheadings, out of 5,462 Signpost articles in total; i.e. there are precisely 2,998 articles from before July 2012 that just never had subheadings in the first place. Well, whatevs.
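Putting a recovered subheading into an article's RSS-description template is, at heart, a small wikitext edit. The sketch below is hypothetical rather than the script that was actually run: the template title, the use of parameter 2 for the subheading, and the function name are all assumptions. The regex also only copes with a template call that contains no nested templates and no piped links, since it splits parameters on every `|`, and a subheading that itself contains `|` would need escaping first.

```python
import re
from typing import Dict

# Assumed template title and parameter; check a live article header before relying on either.
RSS_TEMPLATE = "Wikipedia:Wikipedia Signpost/Templates/RSS description"
SUBHEAD_PARAM = "2"


def set_rss_subheading(wikitext: str, subheading: str,
                       template: str = RSS_TEMPLATE, param: str = SUBHEAD_PARAM) -> str:
    """Fill in the subheading parameter of the first RSS-description call, if it is empty."""
    pattern = re.compile(r"\{\{\s*" + re.escape(template) + r"\s*((?:\|[^{}]*)?)\}\}")
    match = pattern.search(wikitext)
    if match is None:
        return wikitext  # no template call found: leave the page alone
    params: Dict[str, str] = {}
    positional = 0
    for part in match.group(1).split("|")[1:]:
        if "=" in part:
            key, value = part.split("=", 1)
            params[key.strip()] = value
        else:
            positional += 1
            params[str(positional)] = part
    if params.get(param, "").strip():
        return wikitext  # a subheading is already present: don't overwrite it
    params[param] = subheading
    rebuilt = "{{" + template + "".join(f"|{k}={v}" for k, v in params.items()) + "}}"
    return wikitext[: match.start()] + rebuilt + wikitext[match.end():]
```

Refusing to overwrite a non-empty parameter makes the edit idempotent, so the same script can be re-run over every article without clobbering subheadings that were already there.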

Some were missing, some were messed-up, some were typos — honorable mentions to the 2018-02-20 humor column ("headline?") and the 2017-12-18 blog feature ("."). Among the rest, a few extremes stood out, which I had nothing better to do than put in tables for my own amusement — and maybe, dear reader, yours as well.
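Ranking the extremes is a short job once every headline sits in one place. A minimal sketch, assuming the headlines have been collected into a dict keyed by article path (for example "2014-04-09/WikiProject report"); the function name is hypothetical. The two placeholder values mentioned above, "headline?" and ".", are skipped, and length is counted in Unicode code points, which may differ slightly from other ways of counting characters.

```python
import heapq
from typing import Dict, List, Tuple

# The two placeholder values called out in the text above.
PLACEHOLDERS = frozenset({"headline?", "."})

Pair = Tuple[str, str]  # (article path, headline text)


def _length(pair: Pair) -> int:
    return len(pair[1])


def extremes(headlines: Dict[str, str], n: int = 10) -> Tuple[List[Pair], List[Pair]]:
    """Return the n shortest and the n longest headlines as (article, text) pairs."""
    items = [
        (article, text)
        for article, text in headlines.items()
        if text.strip() and text.strip() not in PLACEHOLDERS
    ]
    shortest = heapq.nsmallest(n, items, key=_length)
    longest = heapq.nlargest(n, items, key=_length)
    return shortest, longest
```

`heapq.nsmallest` and `heapq.nlargest` avoid sorting the whole collection when only the top ten are wanted; ties keep their input order.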

All-time shortest headlines

10: 2015-09-02/Featured content:

Brawny

9: 2013-10-02/WikiProject report:

U2 Too

8: 2012-10-01/Featured content:

Mooned

7: 2018-10-28/In focus:

Alexa

6: 2015-11-11/Gallery:

Paris

5: 2014-08-27/Traffic report:

Viral

4: 2017-09-06/Humour:

Bots

3: 2013-12-04/Featured content:

F*&!

2: 2019-01-31/Essay:

How

1: 2014-04-09/WikiProject report:

Law

All-time longest headlines

10: 2014-10-22/In the media:

The story of Wikipedia; Wikipedia reanimated and republished; UK government social media rules; death of Italian Wikipedia administrator

9: 2016-08-04/In the media:

Paid editing service announced; Commercial exploitation of free images; Wikipedia as a crystal ball; Librarians to counter systemic bias

8: 2013-12-04/Recent research:

Reciprocity and reputation motivate contributions to Wikipedia; indigenous knowledge and "cultural imperialism"; how PR people see Wikipedia

7: 2015-10-28/Recent research:

Student attitudes towards Wikipedia; Jesus, Napoleon and Obama top "Wikipedia social network"; featured article editing patterns in 12 languages

6: 2017-02-06/Forum:

Productive collaboration around coordinated protest marches; Media and political personalities comment on Wikipedia at its 16th birthday celebration

5: 2019-10-31/Recent research:

Research at Wikimania 2019: More communication doesn't make editors more productive; Tor users doing good work; harmful content rare on English Wikipedia

4: 2015-02-11/Featured content:

A grizzly bear, Operation Mascot, Freedom Planet& Liberty Island, cosmic dust clouds, a cricket five-wicket list, more fine art, & a terrible, terrible opera...

3: 2012-07-30/Recent research:

Conflict dynamics, collaboration and emotions; digitization vs. copyright; WikiProject field notes; quality of medical articles; role of readers; best wiki paper award

2: 2013-09-11/In the media:

Lawyer goes to court to discover Wikipedian's identity; Storming Wikipedia; Wikimedia UK Secretary in conflict-of-interest controversy; Does Wikipedia need a "right to reply" box?

1: 2018-10-28/Humour:

After the apocalypse, when zombies and aliens take over the Earth in a thousand years and dig up Wikipedia's servers but can only find talk pages without their accompanying articles, what will they think??

The longest among these is 206 characters long. I wonder if that fits into the modern display template? Oh, if only a highly stable genius had made it easy to retrieve and format Signpost article metadata... if only you could type something short and memorable like {{Signpost/snippet/autofill|article|2018-10-28|Humour}} and have it automatically render the full snippet template... but alas: there's no sufficiently handsome and wise programmer among us, capable of such heroic deeds.

Jeffrey O. Gustafson – GFDL + CC 2.0 BY-SA

Humour

After the apocalypse, when zombies and aliens take over the Earth in a thousand years and dig up Wikipedia's servers but can only find talk pages without their accompanying articles, what will they think??

Talk page humour: Wikipedia has a long history of talk page tomfoolery.

Haha sike.^[1]

All-time shortest subheadings

10: 2016-04-24/Special report:

Update on EranBot, our new copyright violation detection bot

Help wanted!

9: 2023-10-23/News from Diff:

Sawtpedia: Giving a Voice to Wikipedia Using QR Codes

Sounds good!

8: 2023-04-26/From the archives:

April Fools' through the ages, part two

2011 and on.

7: 2018-07-31/Essay:

Wikipedia does not need you

Get over it!

6: 2022-09-30/From the archives:

5, 10, and 15 Years ago: September 2022

Yes, again.

5: 2021-08-29/News and notes:

Enough time left to vote! IP ban

Just do it!

4: 2021-02-28/News and notes:

Maher stepping down

UCC launch.

3: 2021-02-28/News from the WMF:

Who tells your story on Wikipedia

You can!

2: 2021-01-31/Obituary:

Flyer22 Frozen

RIP.

1: 2022-09-30/Gallery:

A Festival Descends on the City: The Edinburgh Fringe, Pt. 2

Lo!

All-time longest subheadings

10: 2013-01-28/News and notes

Khan Academy's Smarthistory and Wikipedia collaborate

To many Wikimedians, the Khan Academy would seem like a close cousin: the academy is a non-profit educational website and a development of the massive open online course concept that has delivered over 227 million lessons in 22 different languages. Its mission is to give "a free, world-class education to anyone, anywhere." This complements Wikipedia's stated goal to "imagine a world in which every single person on the planet is given free access to the sum of all human knowledge", then go and create that world. It should come as no surprise, then, that the highly successful GLAM-Wiki (galleries, libraries, archives, museums) initiative has partnered with the Khan Academy's Smarthistory project to further both its and Wikipedia's goals.

9: 2012-11-26/WikiProject report

Directing Discussion: WikiProject Deletion Sorting

This week, we uncovered WikiProject Deletion Sorting, Wikipedia's most active project by number of edits to all the project's pages. This special project seeks to increase participation in Articles for Deletion nominations by categorizing the AfD discussions by various topic areas that may draw the attention of editors. The project was started in August 2005 with manual processes that are continued today by a bevy of bots, categories, and transclusions. The project took inspiration from WikiProject Stub Sorting and some historical discussions on deletion reform. As the sheer number of AfDs continues to grow, the project is seeking better tools to manage the deletion sorting process and attract editors to comment on these deletion discussions.

8: 2012-10-29/Technology report

Improved video support imminent and Wikidata.org live

The TimedMediaHandler extension (TMH), which brings dramatic improvements to MediaWiki's video handling capabilities, will go live to the English Wikipedia this week following a long and turbulent development, WMF Director of Platform Engineering Rob Lanphier announced on Monday ... Wikidata.org, a new repository designed to host interwiki links, launched this week and will begin accepting links shortly. The site, which is one half of the forthcoming Wikidata trial (the other half being the Wikidata client, which will be deployed to the Hungarian Wikipedia shortly) will also act as a testing area for phase 2 of Wikidata (centralised data storage). The longer term plan is for Wikidata.org to become a "Wikimedia Commons for data" as phases 2 and 3 (dynamic lists) are developed, project managers say.

7: 2012-10-29/News and notes

First chickens come home to roost for FDC funding applicants; WMF board discusses governance issues and scope of programs

The first round of the Wikimedia Foundation's new financial arrangements has proceeded as planned, with the publication of scores and feedback by Funds Dissemination Committee (FDC) staff on applications for funding by 11 entities—10 chapters, independent membership organisations supporting the WMF's mission in different countries, and the foundation itself. The results are preliminary assessments that will soon be put to the FDC's seven voting members and two non-voting board representatives. The FDC in turn will send its recommendations to the board of trustees on 15 November, which will announce its decision by 15 December. Funding applications have been on-wiki since 1 October, and the talk pages of applications were open for community comment and discussion from 2 to 22 October, though apart from queries by FDC staff, there was little activity.

6: 2012-08-13/Op-ed

Small Wikipedias' burden

In a certain way, writing Wikipedia is the same everywhere, in every language or culture. You have to stick to the facts, aiming for the most objective way of describing them, including everything relevant and leaving out all the everyday trivia that is not really necessary to understand the context. You have to use critical thinking, trying to be independent of your own preferences and biases. To some effect, that's all there is to it. Naturally, Wikipedians have their biases, some of which can never be cured. Most Wikipedians tend to like encyclopedias; but millions of people in the world don't share that bias, and we represent them rather poorly. I'm also quite sure that an overwhelming majority of Wikipedia co-authors are literate. Again, that's not true for everyone in this world. Yet we have other, less noticeable but barely less fundamental biases.

5: 2012-10-01/Technology report

WMF and the German chapter face up to Toolserver uncertainty

The Toolserver is an external service hosting the hundreds of webpages and scripts (collectively known as "tools") that assist Wikimedia communities in dozens of mostly menial tasks. Few people think that it has been operating well recently; the problems, which include high database replication lag and periods of total downtime, have caused considerable disruption to the Toolserver's usual functions. Those functions are highly valued by many Wikimedia communities ... In 2011, the Foundation announced the creation of Wikimedia Labs, a much better funded project that among other things aimed to mimic the Toolserver's functionality by mid-2013. At the same time, Erik Möller, the WMF's director of engineering, announced that the Foundation would no longer be supporting the Toolserver financially, but would continue to provide the same in-kind support as it had done previously.

4: 2013-01-28/In the media

Hoaxes draw media attention; Sue Gardner's op-ed; Women of Wikipedia

Hoaxes draw media attention: On New Year's Day, the Daily Dot reported that a "massive Wikipedia hoax" had been exposed after more than five years. The article on the Bicholim conflict had been listed as a "Good Article" for the past half-decade, yet turned out to be an ingenious hoax. Created in July 2007 by User:A-b-a-a-a-a-a-a-b-a, the meticulously detailed piece was approved as a GA in October 2007. A subsequent submission for FA was unsuccessful, but failed to discover that the article's key sources were made up. While the User:A-b-a-a-a-a-a-a-b-a account then stopped editing, the hoax remained listed as a Good Article for five years, receiving in the region of 150 to 250 page views a month in 2012. It was finally nominated for deletion on 29 December 2012 by ShelfSkewed—who had discovered the hoax while doing work on Category:Articles with invalid ISBNs—and deleted the same day.

3: 2012-11-19/News and notes

FDC's financial muscle kicks in

The WMF's Funds Dissemination Committee has published its recommendations for the inaugural round 1 of funding. Requests totalled US$10.4M, nearly all of the FDC's budget for both first and second rounds. The seven-member committee of community volunteers appointed in September advises the WMF board on the distribution of grant funds among applying Wikimedia organizations. The committee, which has a separate operating budget of $276k for salaries and expenses, considered 12 applications for funds, from 11 chapters and from the WMF itself for its non-core activities. The decision-making process included community and FDC staff input after October 1, the closing date for submissions. Taken together, the volunteers decided to endorse an average of 81% of the funding sought—a total of $8.43M, which went to 11 of the 12 applicants. This leaves $2.71M to be distributed in round 2, for which applications are due in little more than three months' time.

2: 2012-10-08/Technology report

The ups and downs of September and October, plus extension code review analysis

The Wikimedia Foundation's engineering report for September 2012 was published this week on the Wikimedia Techblog and on the MediaWiki wiki, giving an overview of all Foundation-sponsored technical operations in that month (as well as brief coverage of progress on Wikimedia Deutschland's Wikidata project, phase 1 of which is edging its way towards its first deployment). Three of the seven headline items in the report have already been covered in the Signpost: problems with the corruption of several Gerrit (code) repositories, the introduction of widespread translation memory across Wikimedia wikis, and the launch of the "Page Curation" tool on the English Wikipedia, with development work on that project now winding down. The report also drew attention to the end of Google Summer of Code 2012, the deployment to the English Wikipedia of a new ePUB (electronic book) export feature, and improvements to the WLM app aimed at more serious photographers.

1: 2012-12-10/News and notes

Wobbly start to ArbCom election, but turnout beats last year's

At the time of writing, this year's election has just closed after a two-week voting period. The eight seats were contested by 21 candidates. Of these, 15 have not been arbitrators (Beeblebrox, Count Iblis, Guerillero, Jc37, Keilana, Ks0stm, Kww, NuclearWarfare, Pgallert, RegentsPark, Richwales, Salvio giuliano, Timotheus Canens, Worm That Turned, and YOLO Swag); four candidates are sitting arbitrators (David Fuchs, Elen of the Roads, Jclemens, and Newyorkbrad); and two have previously served on the committee (Carcharoth and Coren). Four Wikimedia stewards from outside the English Wikipedia stepped forward as election scrutineers: Pundit, from the Polish Wikipedia; Teles, from the Portuguese Wikipedia; Quentinv57, from the French Wikipedia; and Mardetanha, from the Persian Wikipedia. The scrutineers' task is to ensure that the election is free of multiple votes from the same person, to tally the results, and to announce them. The full results are expected to be released within the next few days and will be reported in next week's edition of the Signpost.

The older subheadings tended to be longer (although I trimmed some of the most egregious ones when parsing them in). That last one is a whopping 1,070 characters. Let's see that monster in a snippet:

News and notes

Wobbly start to ArbCom election, but turnout beats last year's

Wobbly start to ArbCom election, but turnout beats last year's: At the time of writing, this year's election has just closed after a two-week voting period. The eight seats were contested by 21 candidates. Of these, 15 have not been arbitrators (Beeblebrox, Count Iblis, Guerillero, Jc37, Keilana, Ks0stm, Kww, NuclearWarfare, Pgallert, RegentsPark, Richwales, Salvio giuliano, Timotheus Canens, Worm That Turned, and YOLO Swag); four candidates are sitting arbitrators (David Fuchs, Elen of the Roads, Jclemens, and Newyorkbrad); and two have previously served on the committee (Carcharoth and Coren). Four Wikimedia stewards from outside the English Wikipedia stepped forward as election scrutineers: Pundit, from the Polish Wikipedia; Teles, from the Portuguese Wikipedia; Quentinv57, from the French Wikipedia; and Mardetanha, from the Persian Wikipedia. The scrutineers' task is to ensure that the election is free of multiple votes from the same person, to tally the results, and to announce them. The full results are expected to be released within the next few days and will be reported in next week's edition of the Signpost.

Boy oh boy!!!!!!! By the way, while we're on the subject, anyone wanna help fix all of this crap?

^ Please note that this autofill template is a grotesque hack which relies in turn on other grotesque hacks, and nobody should use it for anything serious or load-bearing. I use it here only to flex.

Reader comments

2023-12-24

Guess the joke contest

Contribute —

By JPxG

Q: How do you a ?

A: First you ... then you .

Q: But what do you do with the ?

A: Why the hell did you ?

Reader comments

2023-12-24

Bad jokes and other deleted nonsense

Contribute —

By Kjoles

From Karl Thruster Drag Racing Enterprise, by Kjoles, who created it with the summary: "Only need this page for about 30 minutes to demonstrate to a friend how easy it is to create a Wikipedia page. Then it will be deleted."

Team Karl Thruster Drag Racing Enterprise was a short-lived American Top Fuel drag racing team that participated in the 2023 racing season. The team was notable for its brief existence, spanning approximately four minutes.

Team Overview

Founded: 2023
Dissolved: 2023 (approximately 4 minutes later)
Base: United States
Owner: Karl Joles
Team Principal: Karl Joles
Notable Achievements: Shortest-lived team in drag racing history

History

Team Karl Thruster Drag Racing Enterprise was established just before the start of the 2023 Top Fuel drag racing season. The team, owned and operated by Karl Joles, fielded a fleet comprising two funny cars and two dragsters, each adorned with vibrant liveries and the team's iconic logo.

The team's debut was highly anticipated in the drag racing community, with fans and media outlets eager to see the new entrant's performance. However, just four minutes after its official launch, the team announced its dissolution due to financial constraints.

Legacy

Although its existence was fleeting, Team Karl Thruster Drag Racing Enterprise left a lasting impression on the drag racing world. Its quick rise and even quicker fall became a topic of amusement and a cautionary tale about the financial demands of the sport. The team is fondly remembered for its ambitious start and its candid approach to the realities of racing economics.

In Popular Culture

Team Karl Thruster's story quickly became a viral sensation, with memes and jokes circulating on social media. It is often referenced in discussions about the financial challenges in motorsports and is remembered as a humorous footnote in the history of drag racing.

Reader comments

