Wikipedia talk:Wikipedia Signpost/2021-09-26/Recent research

Discuss this story

"Wikipedia is ranked highly because people are looking for it" - in other news, water is wet 😜. Seriously though, it is interesting (but not surprising) that many "high-level" (if that term makes sense) researchers use Wikipedia in their research. I do understand that the information boxes by different search engines do help make things easier for people (and I myself do skip reading the actual article if I get what I wanted from the box), but somehow, I can't get rid of the feeling that "search engines not having to pay the WMF for information to use in information boxes is ethically wrong" though I know that all information here is under CC. Tube·of·Light 04:12, 27 September 2021 (UTC)
- Sample bias seems an obvious possible flaw, as DuckDuckGo is one of the less used Web browsers. Its users, being few, are surely unusual in some ways, and their attitude towards the usefulness of Wikipedia might be one of those ways. Jim.henderson (talk) 15:48, 27 September 2021 (UTC)
  - True that. I wonder if Google would be willing to modify their search engine to hide the information box for a couple of days and let us know how big the impact was (but then again, there is no way I am going to ask them to do so). And just to let you know, DuckDuckGo is a search engine (a website that gives search results like Google does), not a web browser (a program like Chrome, that lets you access web pages). Tube·of·Light 02:58, 28 September 2021 (UTC)
    - @Tube of Light: Google conducts studies like that all the time, using their search page. They'd be bonkers not to, considering their search-request logs are one of the greatest troves of population behavioral data ever amassed, and it's right there at their fingertips. You rightly frame the $100,000 question, though: would they be inclined to share the results with us (or anyone else)?

They're certainly under no obligation to, of course. Though I know they do either directly conduct, or authorize others to perform, research into the (presumably-anonymized, possibly aggregated) trending for certain search queries. Which is how we (the global "we") know, for instance, that Google can reliably predict (or at least detect) regional flu outbreaks by watching for an uptick in the frequency of certain search terms employed by multiple users in close geographic proximity.

I suspect any A/B testing they do on things like infoboxes is purely marketing-driven, though, and geared only towards determining which search features maximize their ad revenue. (In fact we'd better hope that the same studies that find infoboxes driving clicks through to Wikipedia also determine that they increase search engagement or return visits, because we know that driving traffic here isn't really a profit motive for Google.) -- FeRDNYC (talk) 14:30, 20 October 2021 (UTC)

"Individual-driven versus interaction-driven burstiness in human dynamics: The case of Wikipedia edit history": I have tried so hard to understand this article but it feels like it is missing the part where it actually states how their math addresses their core question. The key sentences seem to be: "The large value of AUC for an article-ego pair implies the dominance of individual-driven burstiness over interaction-driven burstiness and vice versa. By correlating the AUC value with several measures for temporal and editorial correlations, we find the tendency of the AUC values to be larger for weaker (stronger) temporal correlations of the ego (the alters) and/or stronger editorial correlations in the edit sequences." If anyone is able to figure out what this means, I would be grateful to know. ~ L 🌸 (talk) 04:33, 27 September 2021 (UTC)
- This might be helpful: receiver operating characteristic. Basically AUC is a measure of how well a mathematical model classifies a group into some yes/no scheme based on some presumed characteristics. MER-C 17:03, 27 September 2021 (UTC)
  - AUC = area under the curve. More (higher values) is better for a receiver operating characteristic, if the classifier is working right. ☆ Bri (talk) 19:20, 27 September 2021 (UTC)
    - Thank you! That helps a little with the first half of the sentence. And I can tell that the bit in parentheses is offering an alternative. So it is something like, "We find a better classifier fit for individual-driven burstiness... for weaker temporal correlations of the ego and/or stronger editorial correlations in the edit sequences." Still not entirely sure what the implications of that are, but willing to let it go... ~ L 🌸 (talk) 04:42, 1 October 2021 (UTC)
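
For anyone else puzzling over the AUC replies above, a minimal Python sketch of the general idea may help. It is not the paper's method or data: the function name `roc_auc` and the toy labels and scores are made up for illustration. It uses the standard rank interpretation of the area under the ROC curve, namely the chance that a randomly chosen "yes" item gets a higher score than a randomly chosen "no" item, with ties counted as half.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (hypothetical helper).

    labels: iterable of 0/1 class labels (1 = "yes", 0 = "no").
    scores: iterable of classifier scores, higher meaning "more likely yes".
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only.
perfect = roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1])   # every "yes" outranks every "no"
mixed = roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1])     # one "yes" is outranked by a "no"
inverted = roc_auc([0, 0, 1, 1], [0.9, 0.8, 0.3, 0.1])  # every "no" outranks every "yes"
```

A value near 1 means the scores separate the two groups well, a value near 0.5 is no better than guessing, and a value near 0 means the scores are systematically backwards.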
"Wikipedia is ranked highly because people are looking for it" - very interesting writeup. I remember the concern that the knowledge boxes would decrease click-through to Wikipedia - good to see a solid A/B test confirming otherwise. Ganesha811 (talk) 15:55, 27 September 2021 (UTC)
