User talk:The Earwig/Archive 15

Hitting the quota on Google searches

Hey Earwig, I noticed we've hit the 10,000 query daily quota on 4 days in the past month and most other days are coming close to the limit now. Usage last year was about half as much. I'm worried people are going to start getting errors regularly if the usage continues to increase. Any thoughts on how to keep the usage from increasing any more? Kaldari (talk) 00:51, 6 March 2019 (UTC)

@Kaldari: No chance we can get Google to raise the limit, right? Scanning through recent logs, of the past 12,000 invocations of the tool that used Google, 5000 were automated and 7000 were manual; of both types, 7500 were on enwiki, 3000 on dewiki, low hundreds on a few other wikis; of just automated requests, 2100 enwiki and 2900 dewiki. Basically, what I'm seeing is that there isn't one clear offender that we can point to regarding a 2x spike. Luke081515's bot, assuming that's the entirety of dewiki automated traffic, seems to account for about 25% of search engine usage, which is high but I'm not necessarily sure I would consider it excessive (<15 reqs/hr)? Luke081515, could we reasonably tone down your bot slightly? I would hate to cause dewiki to miss things because of this, but I'm not sure what else to suggest or who else to ask. — Earwig talk 03:26, 6 March 2019 (UTC)
The Earwig, is there any way you could set up a different instance of this for non-en projects? That would clear up a little room for en and also give the other projects a lot more opportunity for use before hitting the cap. Best, Barkeep49 (talk) 03:37, 6 March 2019 (UTC)
Unfortunately 10,000 is a hard-coded limit set by Google. We could set up an additional API key to get around that limit, as suggested by User:Barkeep49, but it would require some extra work as we would also need to modify the proxy service to handle both keys and distribute the requests. Additionally, our credit arrangement with Google, which allows us to provide the service without paying the usual fees, is based on the usage rate from a few months ago with some padding added. So using more than 10,000 queries per day would probably eat through those credits faster than they are replenished. In the long-term we'll probably have to come up with a better solution, but 10,000 per day is all we can manage at the moment. Kaldari (talk) 03:46, 6 March 2019 (UTC)
This seems like one of those "appeal to Jimbo" things....I've got a good feeling Google queries us more than 10,000x/day.... — xaosflux Talk 04:47, 6 March 2019 (UTC)
I can modify my requests in the evening, so that the requests don't use the search engine every time.... @Kaldari: Another possibility: I asked WMDE some time ago whether they could support an extension of the quota with money, if that is what extending the limits requires. Basically they did not decline, but we would first need to know how much the additional costs would be. Is there already a known value or a guess? Best regards, Luke081515 06:58, 6 March 2019 (UTC)
Last month we used about $1200 worth (and that was with the Tool Labs outages). Hopefully we can just stay under the quota for a while. Let me know when the change is made and I'll try to monitor it to see how much it makes a difference. Kaldari (talk) 07:14, 6 March 2019 (UTC)
@Kaldari: I've set &use_engine=false in my program and the change is now active, so the number of requests will not decrease, but they will no longer use the search engine. The bot currently scans every new article in dewiki once. My plan for the future was to see if we can extend the limit, so that I can also scan and check big additions to existing articles. If you can provide an estimate of how much that would normally cost per month (the $1200?), I can ask WMDE if they would be willing to support this, or maybe a combination of WMF and WMDE? Best regards, Luke081515 09:16, 6 March 2019 (UTC)
@Luke081515: Strangely there was a significant increase in queries (rather than decrease) at around 9am UTC (when you made the change). The increase remained until about 2pm when we hit the 10,000 quota and then all queries were denied after that. Maybe Earwig could look at the logs to see what happened. Kaldari (talk) 23:18, 6 March 2019 (UTC)
Hm, I guess it is not my script then. I took a look just now, and the change I made was correct. Maybe the logs can give us some useful information? My bot uses a unique user agent, so it should be easy to find. Note: I did not throttle my requests. I'm making the same number of requests as before, but now asking the tool not to use the engine. Best regards, Luke081515 23:26, 6 March 2019 (UTC)

() I don't log user agents, but I only see about 300 requests that could have come from you, and they are indeed not using the engine, so that's good. Since 09:00 UTC, I see about 1650 requests using the engine, 1550 of those not using the API, so this is not an issue of automated queries. Of those, 1300 are enwiki. However, a large number of these around 10am appear to be vulnerability testing spam (things like page = etc/password) that all fail early and never hit Google. Of the remaining traffic, which seems to number about 1150 real requests, the majority looks like perfectly normal manual traffic. I don't really have much to go off of here. — Earwig talk 03:58, 7 March 2019 (UTC)

@Kaldari: Hm, according to Earwig it looks like an extension of the limit would be needed. Do we have a Phabricator task for this already? And, if additional money is required, is it possible to get a rough estimate? Then, for example, I could ask WMDE if they would be willing to help. Best regards, Luke081515 22:07, 7 March 2019 (UTC)
@Luke081515: Well, whatever happened yesterday, it's gone back to normal today. Here's the pricing info for the API: "Custom Search JSON API provides 100 search queries per day for free. If you need more, you may sign up for billing in the API Console. Additional requests cost $5 per 1000 queries, up to 10k queries per day." Normally, I would suggest that we just re-negotiate with Google, but the current agreement we have took over a year to negotiate and finalize. Maybe getting WMDE to cover a new API key would be easier. Kaldari (talk) 02:02, 8 March 2019 (UTC)
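For illustration, the pricing quoted above can be turned into a rough monthly estimate. This is only a sketch: it assumes the 100 free queries are deducted every day before billing starts and that a month has 30 days, and it ignores the credit arrangement with Google mentioned earlier in the thread. The function names are made up for this example.

```python
# Figures taken from the pricing text quoted in the thread.
FREE_QUERIES_PER_DAY = 100   # free daily allowance
PRICE_PER_1000_USD = 5.0     # price of additional queries
DAILY_CAP = 10_000           # hard daily limit per API key


def daily_cost_usd(queries: int) -> float:
    """Estimated list-price cost of one day's queries (assumption: free tier deducted daily)."""
    billable = max(0, min(queries, DAILY_CAP) - FREE_QUERIES_PER_DAY)
    return billable * PRICE_PER_1000_USD / 1000


def monthly_cost_usd(queries_per_day: int, days: int = 30) -> float:
    """Estimated cost of a month with a constant daily query count."""
    return daily_cost_usd(queries_per_day) * days


if __name__ == "__main__":
    for per_day in (5_000, 8_000, 10_000):
        print(per_day, "queries/day ->", monthly_cost_usd(per_day), "USD/month")
```

Under these assumptions a key running at the full 10,000 queries every day would cost about $1,485 per 30-day month at list price, and an average of roughly 8,000 queries per day comes to about $1,185, close to the figure of about $1200 mentioned earlier for last month's usage.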
I wrote them a mail, and asked them if that would be possible. Best regards, Luke081515 23:10, 10 March 2019 (UTC)
OK, keep us posted. — Earwig [alt] talk 23:49, 10 March 2019 (UTC)
Note that our agreement with Google expires on Jan 10, 2022, so that would be a good time to renegotiate it. Kaldari (talk) 16:36, 12 March 2019 (UTC)
Since I did not receive an answer yet, I will mail them again and ask them for the status when I'm at home. Best regards, Luke081515 11:34, 18 March 2019 (UTC)

They currently have an internal discussion about this. Best regards, Luke081515 23:05, 19 March 2019 (UTC)

Update: It's still in progress there :/. Best regards, Luke081515 16:16, 7 May 2019 (UTC)
Btw: Can you take a look at how high the usage currently is? If there is enough free capacity at the moment, I can for example add a random switch, so that the bot processes 50% of its requests with the search API. Best regards, Luke081515 16:18, 7 May 2019 (UTC)
Kaldari has easier access to this data than I, but I can take a look tomorrow if necessary and see how we've been doing over the past couple weeks on average. — Earwig talk 03:32, 8 May 2019 (UTC)
@Luke081515 and The Earwig: We're currently using about half the quota per day. It normally fluctuates between about 4000 and 6500 queries per day. However, we hit the 10,000 maximum twice last month. Kaldari (talk) 17:06, 8 May 2019 (UTC)
This is good to hear. It sounds like we've returned to mostly-reasonable levels. — Earwig talk 05:32, 9 May 2019 (UTC)
I will try to make ~40% of my requests with search engine. This should be within the limit. If it's too high anyway, please ping me. Best regards, Luke081515 18:53, 10 May 2019 (UTC)
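The "random switch" described above could look roughly like the following. This is a hypothetical sketch, not the bot's actual code: the base URL and parameter names are copied from links and comments on this page, while the function name, the choice of the web URL rather than the tool's API, and the "true"/"false" spelling of the flag are assumptions.

```python
import random
from urllib.parse import urlencode

# Base URL as it appears in links elsewhere on this page.
BASE_URL = "https://tools.wmflabs.org/copyvios/"
# Target share of requests that may use the search engine (about 40%).
ENGINE_FRACTION = 0.4


def build_check_url(oldid: int, rng: random.Random,
                    fraction: float = ENGINE_FRACTION) -> str:
    """Build a check URL, enabling the search engine for roughly `fraction` of calls."""
    use_engine = rng.random() < fraction
    params = {
        "lang": "de",
        "project": "wikipedia",
        "oldid": oldid,
        "action": "search",
        # Only this flag decides whether the request counts against the Google quota.
        "use_engine": "true" if use_engine else "false",
        "use_links": "true",
    }
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    rng = random.Random()
    print(build_check_url(191835348, rng))
```

Because the decision is made independently per request, the share of engine-backed requests only approaches 40% over many requests; a short burst can be noticeably above or below it.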

Luke081515, has dewiki considered toolforge:copypatrol? — JJMC89(T·C) 01:51, 8 May 2019 (UTC)

Hm, it's not active there. What are the criteria for getting it activated? Best regards, Luke081515 18:53, 10 May 2019 (UTC)
I'm not sure, but Kaldari should know. — JJMC89(T·C) 03:11, 11 May 2019 (UTC)
@Luke081515: You have to get community consensus and then create a Phabricator task (for example, https://phabricator.wikimedia.org/T151609). Kaldari (talk) 05:03, 11 May 2019 (UTC)
May 22, 7pm: WikiWednesday Salon and Skill-Share NYC
 

You are invited to join the Wikimedia NYC community for our monthly "WikiWednesday" evening salon (7-9pm) and knowledge-sharing workshop at Metropolitan New York Library Council in Midtown Manhattan. Is there a project you'd like to share? A question you'd like answered? A Wiki* skill you'd like to learn? Let us know by adding it to the agenda.

Featuring this month a presentation by Interference Archive guests, and a group discussion on the role of activist archives and building wiki content based on ephemeral publications and oral histories.

To close off the night, we'll also have Wikidojo - a group collaborative writing activity / vaudeville!

We will also follow up on plans for recent and upcoming edit-a-thons, museum and library projects, education initiatives, and other outreach activities.

7:00pm - 9:00 pm at Metropolitan New York Library Council (8th floor) at 599 11th Avenue, Manhattan
(note this month we will be meeting in Midtown Manhattan, not at Babycastles)

We especially encourage folks to add your 5-minute lightning talks to our roster, and otherwise join in the "open space" experience! Newcomers are very welcome! Bring your friends and colleagues! --Wikimedia New York City Team 17:11, 16 May 2019 (UTC)

(You can subscribe/unsubscribe from future notifications for NYC-area events by adding or removing your name from this list.)

The Signpost: 31 May 2019

#invoke:AfC on Template:AFC statistics

For the past few weeks, Template:AFC statistics, which EarwigBot task 2 updates hourly, has rendered with many repetitions of "#invoke:AfC". I assume this is for the same reason that the page is in Category:Pages where template include size is exceeded, and that the root problem is the volume of drafts.

Is there a way of simplifying or restructuring the page so that it can handle the increasing number of drafts, perhaps by splitting pending, declined, and accepted drafts into three separate pages? --Worldbruce (talk) 15:51, 31 May 2019 (UTC)

Hi Worldbruce. You're correct that this is caused by too many drafts on the page. I don't necessarily have an easy solution for it—the usual answer is just to review more drafts to reduce the backlog, but obviously this isn't a quick fix in general. The problem with splitting up is that the pending chart is still the largest by far, so it wouldn't necessarily help very much. Some other ideas I've had are only showing a random subset of drafts at a time and having the full list visible elsewhere (like through a Toolforge page), but I haven't implemented this. — Earwig talk 04:24, 4 June 2019 (UTC)

June 19: WikiWednesday Salon and Skill-Share NYC (stay tuned for Pride on weekend!)

June 19, 7pm: WikiWednesday Salon and Skill-Share NYC
 

You are invited to join the Wikimedia NYC community for our monthly "WikiWednesday" evening salon (7-9pm) and knowledge-sharing workshop at Metropolitan New York Library Council in Midtown Manhattan. Is there a project you'd like to share? A question you'd like answered? A Wiki* skill you'd like to learn? Let us know by adding it to the agenda.

We will also follow up on plans for recent and upcoming edit-a-thons, museum and library projects, education initiatives, and other outreach activities.

7:00pm - 9:00 pm at Metropolitan New York Library Council (8th floor) at 599 11th Avenue, Manhattan
(note this month we will be meeting in Midtown Manhattan, not at Babycastles)

We especially encourage folks to add your 5-minute lightning talks to our roster, and otherwise join in the "open space" experience! Newcomers are very welcome! Bring your friends and colleagues! --Wikimedia New York City Team 05:38, 18 June 2019 (UTC)

Stay tuned for details on next event!
Sunday Jun 23: Wiki Loves Pride @ Metropolitan Museum of Art


Copyvio Detector Not Ignoring Fork

Hi! Please add Infogalactic.com to the exclusion list. It's already listed as a fork on WP:Mirrors and forks/GHI, but the detector isn't excluding it: [[1]]. Thank you! Orville1974 (talk) 22:46, 19 June 2019 (UTC)

Thanks for pointing that out, Orville1974. It turns out the tool was completely ignoring mirrors on that page. It should be fixed now. — Earwig talk 03:59, 21 June 2019 (UTC)
June 23, 12:30pm: Wiki Loves Pride @ Metropolitan Museum of Art
 
 

You are invited to join the Wikimedia NYC community for Wiki Loves Pride @ Metropolitan Museum of Art on the Upper East Side. Together, we'll create new and expand existing Wikipedia articles on LGBT artists and artworks with LGBT themes in the Met collection!

With refreshments, and a special museum tour in the afternoon!

And there will be a wiki-cake!

Open to everyone at all levels of experience, wiki instructional workshop and one-on-one support will be provided.

See also the global Wiki Loves Pride photo contest, as well as the Met's online LGBT Art Writing Contest, and also the LGBT Health Writing Contest.

12:30pm - 4:30 pm at Uris Center for Education, Metropolitan Museum of Art (81st Street entrance) at 1000 Fifth Avenue, Manhattan
(note this is just south of the main entrance)

This is the fifth annual Wiki Loves Pride edit-a-thon supported by Wikimedia NYC! Newcomers are very welcome! Bring your friends and colleagues! --Wikimedia New York City Team 16:33, 22 June 2019 (UTC)

Stay tuned for details on next event!
Sunday July 14: Great American Wiknic @ Roosevelt Island


The June 2019 Signpost is out!

ReportsBot issue

Sorry to bother you again with a Reports bot issue, The Earwig; iirc you were kind enough to fix its last glitch a few months back. I'm not seeing it add any articles to Wikipedia:WikiProject Women in Red/Metrics/June 2019 for the last three days - history. In that period I've added 100+ properly coded women biog items to wikidata which have thus been roundly ignored in the stats. Any help you can give much appreciated. thx --Tagishsimon (talk) 14:46, 29 June 2019 (UTC)

Hi Tagishsimon. This is a bit strange. The bot runs on Wikimedia Toolforge, and it looks like Toolforge has been temporarily banned from querying Wikidata due to excessive usage. My bot only accesses the service once a day, regardless of any errors, so this is presumably the fault of another user. The ban is set to expire tomorrow, so we'll see if it works then. — Earwig talk 17:28, 29 June 2019 (UTC)
Oh dear :). Thanks for looking into it; much appreciated. I'll ping you one way or the other tomorrow (though the ban may be lifted after the ~14.00 hours point at which Reports bot runs, so we might have to wait until Monday to find out). --Tagishsimon (talk) 17:44, 29 June 2019 (UTC)
Nothing doing today (diff). We'll see what tomorrow brings. --Tagishsimon (talk) 17:28, 30 June 2019 (UTC)
Monday: still kaput. (diff) --Tagishsimon (talk) 13:22, 1 July 2019 (UTC)
Tagishsimon, yep, just checked and the ban has apparently been extended further. Clearly whatever caused the ban in the first place hasn't stopped. I'll need to ask around. — Earwig talk 01:48, 2 July 2019 (UTC)
Tagishsimon, good news, I think I've managed to fix it. When accessing Wikidata, we were sending a default user agent—identifying ourselves generically as Python. I was able to avoid the ban by setting a custom one for Reports bot. I'm guessing that Wikidata bans by user agent, and we happened to get caught up in someone else's bad behavior... — Earwig talk 02:30, 2 July 2019 (UTC)
That is good news, The Earwig; excellent sleuthing & thanks for taking the time - we're hugely in your debt. Fingers crossed for tomorrow's run. --Tagishsimon (talk) 02:52, 2 July 2019 (UTC)
This appears to be resolved, but some pointers on the UA issue: T224891 was mentioned in Tech News (2019, week 24). A python-request user agent ban was mentioned on the Wikidata list. — JJMC89(T·C) 05:55, 2 July 2019 (UTC)
Thanks, that absolutely explains it... — Earwig talk 11:45, 2 July 2019 (UTC)
And with that (diff) we can all go back to forgetting about Report Bot's operations until next time; success. Thank you once again, The Earwig. I'll hope not to darken your doors again. --Tagishsimon (talk) 16:07, 2 July 2019 (UTC)
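As a minimal sketch of the fix described above, this is how a script can replace Python's default user agent with a descriptive one. The endpoint, the bot name, and the contact details below are placeholders chosen for this example; the actual values used by Reports bot are not given in this thread.

```python
import urllib.request

# Assumption: the public Wikidata SPARQL endpoint; the thread does not say which service was queried.
ENDPOINT = "https://query.wikidata.org/sparql"
# Hypothetical user agent naming the tool and a way to reach its operator.
USER_AGENT = "ExampleReportsBot/1.0 (https://example.org/bot; operator@example.org)"


def build_request(url: str, user_agent: str = USER_AGENT) -> urllib.request.Request:
    """Return a request carrying a custom User-Agent instead of the generic Python default."""
    return urllib.request.Request(
        url,
        headers={"User-Agent": user_agent, "Accept": "application/json"},
    )


if __name__ == "__main__":
    request = build_request(ENDPOINT)
    # Nothing is sent here; passing `request` to urllib.request.urlopen would perform the call.
    print(request.get_header("User-agent"))
```

A distinctive user agent lets the service operators tell one tool from another, so a block aimed at one misbehaving client is less likely to catch every script that uses the library default.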

Completely new issue, The Earwig. Feel free to decline. Emijrpbot's owner has retired after getting blocked for a 3RR, and their bot no longer operates. Amongst the ridiculously many things the bot did, there is Wikidata:Requests for permissions/Bot/Emijrpbot 6 which points to code which adds new wikidata items for new en.wiki biographies and/or codes items with human and/or gender. Historically it's done a lot of the heavy lifting in this area, and Women in Red, especially, miss it. Don't suppose there's any possibility of you adopting it, or suggesting some other adoptive user? --Tagishsimon (talk) 10:24, 2 July 2019 (UTC)

That is unfortunate to hear, but I'm afraid I don't have much time to adopt a new bot these days, especially on a wiki I'm not very familiar with. The usual people I would suggest probably do not have much free time either. — Earwig talk 11:45, 2 July 2019 (UTC)
Not surprised. But shy bairns get nowt, as they say :) --Tagishsimon (talk) 11:58, 2 July 2019 (UTC)
July 14, 2-7pm: Annual NYC Wiki-Picnic @ Roosevelt Island
 
 
 

You are invited to join us at the "picnic anyone can edit" in the lovely Southpoint Park on Roosevelt Island, as part of the Great American Wiknic celebrations being held across the USA. Remember it's a wiki-picnic, which means potluck.

This year the Wiknic will double as a "Strategy Salon" (more information at Wiknic page), using open space technology to address major questions facing our social movement.

2–7pm - come by any time!
Our picnicking area is at Southpoint Park, south of the tram and subway, and also just south of the Cornell Tech campus.
Look for us by the Wikipedia / Wikimedia NYC banner!

Celebrate our 13th year of wiki-picnics! We hope to see you there! Newcomers are very welcome! Bring your friends and colleagues! --Wikimedia New York City Team 21:37, 6 July 2019 (UTC)


AfC statistics

Hi! For the time being (while we're way past the template expand limit) on {{AFC statistics}}, could each of the tables be put on a page by themselves, so we could at least see some of them? Thanks! Enterprisey (talk!) 08:24, 27 July 2019 (UTC)

Oh man, it's pretty bad, isn't it... okay, Enterprisey, I split it up. We now have five subpages. The main Template:AFC statistics page will be mostly useless until the backlog goes down. — Earwig talk 06:44, 28 July 2019 (UTC)
Nice, thank you! Enterprisey (talk!) 07:18, 28 July 2019 (UTC)
Any way to make those notices use {{noping}}? Headbomb {t · c · p · b} 07:45, 28 July 2019 (UTC)
My apologies, Headbomb, I suppose this is an issue now because the pages are smaller. I've changed the template to hopefully avoid this now. — Earwig talk 03:23, 29 July 2019 (UTC)

The Signpost: 31 July 2019

Bots Newsletter, August 2019


Greetings!

Here is the 7th issue of the Bots Newsletter. A lot has happened since last year's newsletter! You can subscribe/unsubscribe from future newsletters by adding/removing your name from this list.

Highlights for this newsletter include:

ARBCOM
  • Nothing of note happened. Just like we like it.
BAG

BAG members are expected to be active on Wikipedia to have their finger on the pulse of the community. After two years without any bot-related activity (such as posting on bot-related pages, posting on a bot's talk page, or operating a bot), BAG members will be retired from BAG following a one-week notice. Retired members can re-apply for BAG membership as normal if they wish to rejoin the BAG.

We thank former members for their service and wish Madman a happy retirement. We note that Madman and BU Rob13 were not inactive and could resume their BAG positions if they so wished, should their retirements happen to be temporary.

BOTDICT

Two new entries feature in the bots dictionary

BOTPOL
  • Activity requirements: BAG members now have an activity requirement. The requirements are very light, one only needs to be involved in a bot-related area at some point within the last two years. For purpose of meeting these requirements, discussing a bot-related matter anywhere on Wikipedia counts, as does operating a bot (RFC).
  • Copyvio flag: Bot accounts may be additionally marked by a bureaucrat upon BAG request as being in the "copyviobot" user group on Wikipedia. This flag allows using the API to add metadata to edits for use in the New pages feed (discussion). There is currently 1 bot using this functionality.
  • Mass creation: The restriction on mass-creation (semi-automated or automated) was extended from articles to all content pages. There are subtleties, but content here broadly means whatever a reader could land on when browsing the mainspace in normal circumstances (e.g. Mainspace, Books, most Categories, Portals, ...). There is also a warning that WP:MEATBOT still applies in other areas (e.g. Redirects, Wikipedia namespace, Help, maintenance categories, ...) not explicitly covered by WP:MASSCREATION.
BOTREQs and BRFAs

As of writing, we have...

  • 20 active BOTREQs, please help if you can!
  • 14 open BRFAs and 1 BRFA in need of BAG attention (see live status).
  • In 2018, 96 bot tasks were approved. An AWB search shows approximately 29 were withdrawn/expired, and 6 were denied.
  • Since the start of 2019, 97 bot tasks were approved. Logs show 15 were withdrawn/expired, and 15 were denied.
  • 10 inactive bots have been deflagged (see discussion). 5 other bots have been deflagged per operator requests or similar (see discussion).
New things
Other discussions

These are some of the discussions that happened / are still happening since the last Bots Newsletter. Many are stale, but some are still active.

See also the latest discussions at the bot noticeboard.

Thank you! edited by: Headbomb 17:24, 7 August 2019 (UTC)



Barnstar of Awesomeness

  Barnstar of Awesomeness
Thanks Ben for creating and maintaining the Earwig's Copyvio Detector tool. You are awesome! — Diannaa 🍁 (talk) 18:22, 17 August 2019 (UTC)
Thank you! — Earwig talk 18:39, 17 August 2019 (UTC)

August 28: WikiWednesday Salon and Skill-Share NYC (+editathons before and after)

August 28, 7pm: WikiWednesday Salon and Skill-Share NYC
 

You are invited to join the Wikimedia NYC community for our monthly "WikiWednesday" evening salon (7-9pm) and knowledge-sharing workshop at Metropolitan New York Library Council in Midtown Manhattan. Is there a project you'd like to share? A question you'd like answered? A Wiki* skill you'd like to learn? Let us know by adding it to the agenda.

Featuring this month a review of the recent Wikimania 2019 conference in Sweden!

We will also follow up on plans for recent and upcoming edit-a-thons, museum and library projects, education initiatives, and other outreach activities.

7:00pm - 9:00 pm at Metropolitan New York Library Council (8th floor) at 599 11th Avenue, Manhattan
(note this month we will be meeting in Midtown Manhattan, not at Babycastles)

We especially encourage folks to add your 5-minute lightning talks to our roster, and otherwise join in the "open space" experience! Newcomers are very welcome! Bring your friends and colleagues! --Wikimedia New York City Team 17:58, 27 August 2019 (UTC)

Edit-a-thons at Interference Archive and The Met

Also check out these editing events, before and after our WikiWednesday Salon:


The Signpost: 30 August 2019

Copyvio down?

Traceback (most recent call last):
  File "/data/project/copyvios/www/python/src/app.py", line 38, in inner
    return func(*args, **kwargs)
  File "/data/project/copyvios/www/python/src/app.py", line 103, in index
    query = do_check()
  File "./copyvios/checker.py", line 41, in do_check
    _get_results(query, follow=not _coerce_bool(query.noredirect))
  File "./copyvios/checker.py", line 52, in _get_results
    page.get()  # Make sure that the page exists before we check it!
  File "/data/project/copyvios/git/earwigbot/earwigbot/wiki/page.py", line 587, in get
    rvprop="content|timestamp", rvslots="main")
  File "/data/project/copyvios/git/earwigbot/earwigbot/wiki/site.py", line 716, in api_query
    return self._api_query(kwargs)
  File "/data/project/copyvios/git/earwigbot/earwigbot/wiki/site.py", line 254, in _api_query
    return self._handle_api_result(response, params, tries, wait, ae_retry)
  File "/data/project/copyvios/git/earwigbot/earwigbot/wiki/site.py", line 295, in _handle_api_result
    raise exceptions.APIError(e)
APIError: API query failed: JSON could not be decoded.

-- WBGconverse 09:31, 14 August 2019 (UTC)

I'm still investigating this. It seems like something genuinely wrong with the API, but I'm not sure what's causing it yet. — Earwig talk 05:15, 15 August 2019 (UTC)
Hi, I have also been getting this error very often in recent months. I hope you can investigate this. Regards Doc Taxon (talk) 15:17, 16 August 2019 (UTC)
It should be fixed now. — Earwig talk 01:45, 17 August 2019 (UTC)
No problems any more since August 18th. Thank you. Doc Taxon (talk) 10:01, 3 September 2019 (UTC)

Error in Copyvio Detector?

Hi! The URL https://tools.wmflabs.org/copyvios/?lang=de&project=wikipedia&title=&oldid=191835348&use_engine=0&use_links=0&turnitin=0&action=compare&url=fr.facebook.com%2FEsWirdBesserFilm%2Fabout%2F reports a copyvio match of 76.7%, but this URL does not: https://tools.wmflabs.org/copyvios/?lang=de&project=wikipedia&title=&oldid=191835348&action=search&use_engine=1&use_links=1&turnitin=1

What's wrong? Doc Taxon (talk) 09:59, 3 September 2019 (UTC)

Hi Doc Taxon. Sometimes a website won't respond to the tool quickly enough or will generate some error instead of a valid webpage. This might cause us to report it as a 0% match. I tried again, bypassing the cache, and this time it found the match. Unfortunately I don't have a great solution for this problem in general. However, I can work on having the tool indicate that a source failed to load, so you can know to try checking it again. — Earwig talk 03:36, 4 September 2019 (UTC)
Oh, I forgot to bypass the cache. Sometimes the tool runs into the timeout. Can you raise the timeout a little, please? Doc Taxon (talk) 07:31, 4 September 2019 (UTC)
The overall timeout is not really something I can control. — Earwig talk 11:21, 4 September 2019 (UTC)
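The idea mentioned above, having the tool indicate that a source failed to load rather than showing it as a 0% match, can be sketched as follows. This is not the tool's actual code: the class, the function names, and the `fetch`/`compare` callables are hypothetical and stand in for whatever the detector really uses.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SourceResult:
    url: str
    confidence: Optional[float]   # None means the source could not be loaded
    error: Optional[str] = None   # short description of the failure, if any


def check_source(url: str,
                 fetch: Callable[[str], str],
                 compare: Callable[[str], float]) -> SourceResult:
    """Compare one source, keeping a load failure distinct from a genuine 0% match."""
    try:
        text = fetch(url)
    except Exception as exc:  # e.g. a timeout or an HTTP error page
        return SourceResult(url, None, f"{type(exc).__name__}: {exc}")
    return SourceResult(url, compare(text))


def describe(result: SourceResult) -> str:
    """Render a result line; failed sources are flagged so the user knows to retry."""
    if result.confidence is None:
        return f"{result.url}: failed to load ({result.error}); try checking it again"
    return f"{result.url}: {result.confidence:.1%} match"
```

With this shape, a source that timed out is reported as "failed to load" and can be retried (for example by bypassing the cache), while a source that loaded and genuinely shares no text is still reported as a 0.0% match.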
Sept 7, 12:30pm: Met Fashion Edit-a-thon @ Metropolitan Museum of Art
 
 

You are invited to join the Wikimedia NYC community for Met Fashion Edit-a-thon @ Metropolitan Museum of Art on the Upper East Side. Together, we'll expand Wikipedia:WikiProject Fashion topics for basic clothing types that can be illustrated by the Met collection, and also past Costume Institute exhibitions!

It's the last weekend for Camp: Notes on Fashion, and we will have an intro talk to the exhibit by a guest from the Costume Institute, and participants will then be able to visit it on their own. Galleries will be open this evening until 9 pm.

With refreshments, and there will be a wiki-cake!

Open to everyone at all levels of experience, wiki instructional workshop and one-on-one support will be provided.

12:30pm - 4:30 pm at Uris Center for Education, Metropolitan Museum of Art (81st Street entrance) at 1000 Fifth Avenue, Manhattan
(note this is just south of the main entrance)
Galleries will be open this evening until 9 pm, and some wiki-visitors may wish to take this opportunity to see Camp: Notes on Fashion together after the formal event.

Newcomers are very welcome! Bring your friends, colleagues and students! --Wikimedia New York City Team 19:38, 4 September 2019 (UTC)


Copyvios Detector TimeOut

Hi! Every query runs into a 504 Gateway Time-out. What's going on? Doc Taxon (talk) 22:41, 10 September 2019 (UTC)

I don't see any errors in the logs and a test query I just tried worked fine. Do you have an example that isn't working? — Earwig talk 03:43, 11 September 2019 (UTC)
Yes, it's working fine again. Possibly it was a server or network error that lasted a few hours. Doc Taxon (talk)

Archive.org sites timing out on earwig

Archive.org results take forever to load on Earwig's tool and never finish. Here's an example; it seems to do this everywhere. Any way this could be fixed? Thanks a ton for making the tool in the first place, 💵Money💵emoji💵💸 20:54, 15 September 2019 (UTC)

Hi Money emoji. I'm not seeing an issue with your link—the page generates in under a second. Is the problem still happening for you? How long ago did you notice it and how consistent did it seem to be? If it wasn't very long, maybe archive.org was doing some maintenance or had temporarily blocked the tool's IP (which is shared with many other tools). — Earwig talk 01:42, 18 September 2019 (UTC)
Yeah, the page now generates instantly for me. I noticed the problem about 2 days ago or so, and it affected every archive.org result. Since it wasn't very long, I'm guessing your hypothesis is correct. Thank you for taking your time to look into this, 💵Money💵emoji💵💸 02:01, 18 September 2019 (UTC)

The Signpost: 30 September 2019

October 23rd, 7pm: WikiWednesday Salon NYC
 

You are invited to join the Wikimedia NYC community for our monthly "WikiWednesday" evening salon (7-9pm) and knowledge-sharing workshop at Metropolitan New York Library Council in Midtown Manhattan. Is there a project you'd like to share? A question you'd like answered? A Wiki* skill you'd like to learn? Let us know by adding it to the agenda.

7:00pm - 9:00 pm at Metropolitan New York Library Council (8th floor) at 599 11th Avenue, Manhattan
(note this month we will be meeting in Midtown Manhattan, not at Babycastles)

We especially encourage folks to add your 5-minute lightning talks to our roster, and otherwise join in the "open space" experience! Newcomers are very welcome! Bring your friends and colleagues! --Wikimedia New York City Team 05:33, 22 October 2019 (UTC)


The Signpost: 31 October 2019

User:The_Earwig/copyvios.js - user script

Hi Earwig, I have used Earwig's Copyvio Detector via the web. I have just installed your user script from here, and this is my common.js; however, I could not find where the tool is placed in the top menu (other user scripts are in the "More" drop-down list in the menu). Kindly advise, and thanks in advance. 05:52, 1 November 2019 (UTC)

Hi CASSIOPEIA. The script should appear in the sidebar under "tools"—see here for an example. — Earwig talk 15:59, 2 November 2019 (UTC)
Thank you Earwig. Found it. I want to take this opportunity to thank you for creating the script; it is so useful for checking copyvios when I review new articles. Thank you. CASSIOPEIA(talk) 00:41, 3 November 2019 (UTC)
Thanks, I appreciate that! — Earwig talk 04:11, 3 November 2019 (UTC)
Saturday November 16, 12:30 pm - 4:30pm: Metropolitan Museum of Art Edit-a-thon
 
 

The Wikipedia Asian Month Edit-a-thon @ The Met will be hosted at the Metropolitan Museum of Art on Saturday November 16, 2019 in the Bonnie Sacerdote Classroom, Ruth and Harold D. Uris Center for Education (81st Street entrance) at The Met Fifth Avenue in New York City.

The museum is excited to work with Wikipedia Asian Month for the potential to seed new articles about Asian artworks, artwork types, and art traditions, from any part of Asia. These can be illustrated with thousands of its recently-released images of public domain artworks available for Wikipedia and Wikimedia Commons from the museum’s collection spanning 5,000 years of art. The event is an opportunity for Wikimedia communities to engage The Met's diverse Asian collections onsite and remotely. Asia Art Archive will host a sister event in Hong Kong next week.

12:30 pm - 4:30 pm in Bonnie Sacerdote Classroom, Uris Center for Education
81st Street entrance, Metropolitan Museum of Art, 1000 Fifth Avenue


And there will be sandwiches and Wiki-Cake!

Thanks, and hope to see you there! --Wikimedia New York City Team 16:46, 14 November 2019 (UTC)

ArbCom 2019 election voter message

 Hello! Voting in the 2019 Arbitration Committee elections is now open until 23:59 on Monday, 2 December 2019. All eligible users are allowed to vote. Users with alternate accounts may only vote once.

The Arbitration Committee is the panel of editors responsible for conducting the Wikipedia arbitration process. It has the authority to impose binding solutions to disputes between editors, primarily for serious conduct disputes the community has been unable to resolve. This includes the authority to impose site bans, topic bans, editing restrictions, and other measures needed to maintain our editing environment. The arbitration policy describes the Committee's roles and responsibilities in greater detail.

If you wish to participate in the 2019 election, please review the candidates and submit your choices on the voting page. If you no longer wish to receive these messages, you may add {{NoACEMM}} to your user talk page. MediaWiki message delivery (talk) 00:11, 19 November 2019 (UTC)

November 20, 7pm: WikiWednesday Salon NYC
 

You are invited to join the Wikimedia NYC community for our monthly "WikiWednesday" evening salon (7-9pm) and knowledge-sharing workshop at Metropolitan New York Library Council in Midtown Manhattan. Is there a project you'd like to share? A question you'd like answered? A Wiki* skill you'd like to learn? Let us know by adding it to the agenda.

7:00pm - 9:00 pm at Metropolitan New York Library Council (8th floor) at 599 11th Avenue, Manhattan

We especially encourage folks to add your 5-minute lightning talks to our roster, and otherwise join in the "open space" experience! Newcomers are very welcome! Bring your friends and colleagues! --Wikimedia New York City Team 16:17, 19 November 2019 (UTC)

Template:AfC suspected copyvio

Hey, is {{AfC suspected copyvio}} still a thing or can this be TfD? --Gonnym (talk) 11:13, 19 November 2019 (UTC)

Gonnym, I think it should be fine to delete this template. — Earwig talk 01:39, 20 November 2019 (UTC)

Nomination for deletion of Template:AfC suspected copyvio

 Template:AfC suspected copyvio has been nominated for deletion. You are invited to comment on the discussion at the template's entry on the Templates for discussion page. Gonnym (talk) 12:13, 20 November 2019 (UTC)

The Signpost: 29 November 2019

December 18, 7pm: WikiWednesday Salon NYC
 

You are invited to join the Wikimedia NYC community for our monthly "WikiWednesday" evening salon (7-9pm) and knowledge-sharing workshop at Metropolitan New York Library Council in Midtown Manhattan. Is there a project you'd like to share? A question you'd like answered? A Wiki* skill you'd like to learn? Let us know by adding it to the agenda.

7:00pm - 9:00 pm at Metropolitan New York Library Council (8th floor) at 599 11th Avenue, Manhattan

We especially encourage folks to add your 5-minute lightning talks to our roster, and otherwise join in the "open space" experience! Newcomers are very welcome! Bring your friends and colleagues! --Wikimedia New York City Team 02:49, 17 December 2019 (UTC)

"Fadd disambiguation" listed at Redirects for discussion

 

An editor has asked for a discussion to address the redirect Fadd disambiguation. Since you had some involvement with the Fadd disambiguation redirect, you might want to participate in the redirect discussion if you wish to do so. DannyS712 (talk) 01:06, 25 December 2019 (UTC)

The Signpost: 27 December 2019

January 22, 7pm: WikiWednesday Salon NYC
 

You are invited to join the Wikimedia NYC community for our monthly "WikiWednesday" evening salon (7-9pm) and knowledge-sharing workshop at Metropolitan New York Library Council in Midtown Manhattan. Is there a project you'd like to share? A question you'd like answered? A Wiki* skill you'd like to learn? Let us know by adding it to the agenda.

7:00pm - 9:00 pm at Metropolitan New York Library Council (8th floor) at 599 11th Avenue, Manhattan

We especially encourage folks to add your 5-minute lightning talks to our roster, and otherwise join in the "open space" experience! Newcomers are very welcome! Bring your friends and colleagues! --Wikimedia New York City Team 20:08, 17 January 2020 (UTC)

Copyvios on tools.wmflabs

Hi, the tool isn't working, I'm getting a 502. What's the problem? --Hispano76 (talk) 19:24, 14 January 2020 (UTC)

Hey Hispano76. I'm not sure what the issue is, but I've restarted it and it seems to be working now. Thanks! — Earwig talk 04:13, 15 January 2020 (UTC)
It stopped working again about an hour ago. I can't get the page to load; it just sits there and spins. Thanks, — Diannaa (talk) 14:47, 19 January 2020 (UTC)
I'm seeing a lot of network errors in the logs, so maybe it's something on Wikimedia Cloud's side? I'm going to restart it, but I have to head outside in 10 minutes, so I can't do any further debugging until tonight. — Earwig talk 14:51, 19 January 2020 (UTC)
Thanks very much. — Diannaa (talk) 14:54, 19 January 2020 (UTC)
Hi Ben; the tool has stalled again. Could you give it a re-start? Thanks, — Diannaa (talk) 13:49, 21 January 2020 (UTC)

Jan 25, 12:30pm: Met 'Understanding America' Edit-a-thon @ Metropolitan Museum of Art

You are invited to join the Wikimedia NYC community for the Met 'Understanding America' Edit-a-thon @ Metropolitan Museum of Art on the Upper East Side.

Together, we'll expand Wikipedia articles on American history and art, and the understanding that all communities bring to American culture, as reflected in the Met collection up until ca. 1900.

With refreshments, and there will be a wiki-cake!

Open to everyone at all levels of experience, wiki instructional workshop and one-on-one support will be provided.

12:30pm - 4:30 pm at Uris Center for Education, Metropolitan Museum of Art (81st Street entrance) at 1000 Fifth Avenue, Manhattan
(note this is just south of the main entrance)
Galleries will be open this evening until 9 pm, and some wiki-visitors may wish to take this opportunity to see exhibits together after the formal event.

Newcomers are very welcome! Bring your friends, colleagues and students! --Wikimedia New York City Team 21:02, 21 January 2020 (UTC)

Copyvio detector

Hey, I was wondering if it would be possible to configure your copyvio detector tool to also run using other search engines? I've been running up against the daily query limit with Google quite a bit lately, and was hoping that we could work around this by using say DuckDuckGo or another service as an alternative, at least when Google has had enough of us. signed, Rosguill talk 23:11, 23 January 2020 (UTC)

Hi Rosguill. That's not a bad idea; in the past, the tool has used other search engines, but it's never supported multiple at once. DuckDuckGo is appealing but I don't think we can use them due to how their API works. Bing would probably be the second-best option after Google, but the free tier is fairly limited—we could only check 100-200 articles a month, so I'd have to see if the WMF would want to help pay for that or make a deal with Microsoft. In the past I've used other engines like Yandex but they're usually not as good. Anyway, I'll add this suggestion to my backlog and see what I can do. — Earwig talk 01:50, 24 January 2020 (UTC)
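
The suggestion of falling back to a second engine once Google's quota is exhausted could be sketched roughly as follows. This is a minimal illustration only, not how the tool is implemented: `QuotaExceeded`, `search_with_fallback`, and the two stand-in backends are hypothetical names, and no real search API is called.

```python
from typing import Callable, Iterable, List, Tuple


class QuotaExceeded(Exception):
    """Raised by a backend when its daily allowance is used up (hypothetical)."""


# A backend is any callable that maps a query string to a list of result URLs.
Backend = Callable[[str], List[str]]


def search_with_fallback(
    query: str, backends: Iterable[Tuple[str, Backend]]
) -> Tuple[str, List[str]]:
    """Try each backend in order; move on only when one reports an exhausted quota."""
    for name, backend in backends:
        try:
            return name, backend(query)
        except QuotaExceeded:
            continue  # this engine is out of quota for today; try the next one
    raise QuotaExceeded("all configured search backends are out of quota")


# Stand-in backends for illustration only; they do not call any real API.
def fake_primary(query: str) -> List[str]:
    raise QuotaExceeded("primary quota used up")


def fake_secondary(query: str) -> List[str]:
    return ["https://example.org/?q=" + query]


if __name__ == "__main__":
    engine, urls = search_with_fallback(
        "some sentence from an article",
        [("primary", fake_primary), ("secondary", fake_secondary)],
    )
    print(engine, urls)
```

Keeping the backends as an ordered list means the preferred engine is always tried first, and a weaker engine is only consulted when the preferred one cannot answer.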

The Signpost: 27 January 2020

Copyvio detector idea -- rapid grant

Hi. As you know, we've been hitting our cap for Google on the copyvio detector. One thing that you might wish to consider is applying for a rapid grant from the WMF (and then perhaps a longer-term one later). If you apply between the 1st and the 15th of February, you can get a grant of $500-$2000 by the 15th of March. To make that money go further, using Bing would cost $3/1k searches vs Google's $5/1k searches, although you might wish to set a cap if you get that grant. Cheers, and thanks for making this tool, Mdaniels5757 (talk) 15:37, 26 January 2020 (UTC)

  • I was coming here to suggest that (though Google is enough better that it is perhaps worth sticking with), but Earwig suggests in his last comment above that finances are not where the problem lies. I imagine the WMF considers Earwig (the tool) pretty good value and doesn't mind covering the cost; it's a fix, not finances, that is the current issue. Nosebagbear (talk) 22:50, 26 January 2020 (UTC)
Might Google make extra allowances for this specific use – helping to enforce copyrights? It would seem to be something that they would want to help us with. —[AlanM1(talk)]— 01:22, 27 January 2020 (UTC)
I have often thought about that, considering the immense value Wikipedia provides to Google, but past attempts to get Google to support the copyvio tool have not been successful as far as I know. At the most, they might be giving us a discount, and even that I'm not sure of. — Earwig talk 01:48, 27 January 2020 (UTC)
The Earwig, I think the first step is to determine whether we truly are hitting the limit, or if there is a software glitch. If it turns out not to be a software glitch, I think someone ought to reach out to Google. I would hope they would be in complete agreement that using the tool to help remove copyright violations is something they would be happy to provide for free, if only for the positive press. S Philbrick(Talk) 16:54, 27 January 2020 (UTC)
@S Philbrick: this is the type of win that could actually be achieved by Wikipedia:Appeals to Jimbo - as unlike most complaints it is something that could possibly be actionable coming from him (and potentially wouldn't even come at a financial cost to WMF, so a win-win). Would need to have a very good proposal ready to go to get the most traction. — xaosflux Talk 17:37, 27 January 2020 (UTC)
Xaosflux, maybe, but can we hold that thought? I see a small update at Phabricator. On the one hand, it could hardly be smaller, since all it does is retitle the issue, but the revised title suggests it is a software glitch, not truly an issue of running into the limit. Let's just see what happens there first (although I do concur that, if necessary, this sounds like a good example of where Appeal to Jimbo might be fruitful). S Philbrick(Talk) 20:45, 27 January 2020 (UTC)
Sphilbrick, apologies if it was not clear from the thread above, but yes, I am reasonably confident this is a software glitch. After things settle down, I'm excited by the idea of reaching out to Google to see if we can work out a better arrangement than what we currently have, with the WMF's help. — Earwig talk 04:52, 28 January 2020 (UTC)
Since this is such a useful tool, it would be good to have alternatives, even another instance running on another server. Or perhaps there could be some kind of javascript version that runs off the invoker's PC. Then it could use the free Google allowance. Graeme Bartlett (talk) 22:36, 27 January 2020 (UTC)
I like the idea of a local Javascript version, but it would be a bit of work to set up. I've had ideas for stability/usability improvements to the normal tool for a while, but I haven't found the free time to work on it. With another instance running on a different server, we need to be careful to avoid the current single-point-of-failure in the Google API proxy, which ideally means a separate API key with separate billing. — Earwig talk 04:52, 28 January 2020 (UTC)
The problem with clientside is that your private information (IP + user agent) is sent directly to Google. This isn't great for privacy, and indeed it might violate the Cloud Service's Terms of Use. I think the CSP headers will eventually block all requests outside WMF anyway (for on-wiki user scripts, possibly with a way to manually whitelist domains). So you'd have to host the tool elsewhere, I suppose. The issue here with the IP changing (now fixed) is a rare event, but we were/are regularly hitting the API quota. I think we should focus on figuring that out, since from my recollection this was also a rare event up until recently? MusikAnimal talk 07:33, 29 January 2020 (UTC)

February 19, 7pm: WikiWednesday Salon NYC

You are invited to join the Wikimedia NYC community for our monthly "WikiWednesday" evening salon (7-9pm) and knowledge-sharing workshop at Metropolitan New York Library Council in Midtown Manhattan. Is there a project you'd like to share? A question you'd like answered? A Wiki* skill you'd like to learn? Let us know by adding it to the agenda.

7:00pm - 9:00 pm at Metropolitan New York Library Council (8th floor) at 599 11th Avenue, Manhattan

We especially encourage folks to add your 5-minute lightning talks to our roster, and otherwise join in the "open space" experience! Newcomers are very welcome! Bring your friends and colleagues! --Wikimedia New York City Team 21:01, 14 February 2020 (UTC)

copyvios

Today I was getting 504 Gateway timeout when accessing https://tools.wmflabs.org/copyvios

I reported it on IRC at #wikimedia-cloud where @arturo investigated. He restarted the webservice because he saw in the logs `uWSGI listen queue of socket ":8000" (fd: 7) full !!! (101/100)`

He also saw the following error:

open("/usr/lib/uwsgi/plugins/python3_plugin.so"): No such file or directory [core/utils.c line 3664]
!!! UNABLE to load uWSGI plugin: /usr/lib/uwsgi/plugins/python3_plugin.so: cannot open shared object file: No such file or directory !!!
[uWSGI] getting INI configuration from /data/project/copyvios/www/python/uwsgi.ini

arturo pointed out that metrics may indicate this tool needs higher CPU limits. Curb Safe Charmer (talk) 10:32, 17 February 2020 (UTC)

I came here to say that this very useful tool is down, and noticed that Curb Safe Charmer has taken several steps more than I could. I want to emphasize how important this tool is and express that I hope it'll be brought back soon. hujiTALK 16:39, 17 February 2020 (UTC)

It looks like phabricator T245426 is now tracking this aspect of the copyvios failures. David Brooks (talk) 16:43, 17 February 2020 (UTC)

Copyvio Detector not working

Hi Ben. The tool https://tools.wmflabs.org/copyvios seems to have once again stalled. Could you have a look if you have a minute? Thanks, — Diannaa (talk) 19:14, 24 January 2020 (UTC)

Seems to be back up for the moment, though Google's quota has been exceeded. I'll take a look in a bit to see if there's any explanation for the recent downtimes. — Earwig talk 02:41, 25 January 2020 (UTC)
Google quota was used up quite early the last two days. The tool was working only intermittently this afternoon, and was frequently timing out on the types of pages where it normally reports back promptly. Hope that helps. Thanks, — Diannaa (talk) 02:52, 25 January 2020 (UTC)
Our Google quota is already used up for the day. That's not normal. I think this problem has occurred before, though I couldn't find a Phabricator ticket. Perhaps there's a way to put a throttle on the tool so there's a limit set on any given IP as to how many searches they're allowed in a 24 hour period? — Diannaa (talk) 14:32, 25 January 2020 (UTC)
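
A throttle of the kind suggested above, limiting each client to a fixed number of searches in any 24-hour period, might look something like this. It is only a sketch under stated assumptions: `DailyThrottle` is a hypothetical name, the client is identified by an opaque string (which identifier the tool could actually use is not settled here), and the state is held in memory rather than in whatever storage the tool uses.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class DailyThrottle:
    """Sliding-window limiter: at most `max_searches` per client per window."""

    def __init__(self, max_searches: int, window_seconds: float = 24 * 60 * 60) -> None:
        self.max_searches = max_searches
        self.window_seconds = window_seconds
        # Per-client timestamps of the searches that were allowed, oldest first.
        self._history: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        """Record and allow the search if the client is under its limit."""
        now = time.time() if now is None else now
        history = self._history[client_id]
        # Drop timestamps that have aged out of the window.
        while history and now - history[0] >= self.window_seconds:
            history.popleft()
        if len(history) >= self.max_searches:
            return False
        history.append(now)
        return True


if __name__ == "__main__":
    throttle = DailyThrottle(max_searches=2)
    print([throttle.allow("client-a") for _ in range(3)])
```

A sliding window avoids the burst that a fixed midnight reset would allow, at the cost of storing one timestamp per allowed search.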
Sometimes I hit it near the end of the day (EST) if I'm going heavy on copy checks, but I haven't been able to use it once this week. I highly doubt the Blofeld CCI is taking up all that, since there's not much to check, so I'm not sure why it's been capped. Hopefully it's an easily fixable issue on the back end (luckily for me, I have CCIs with cited sources to keep me occupied, since that half of Earwig still works just fine). Wizardman 00:09, 26 January 2020 (UTC)
Again this morning we have no credits, and it's only 5 AM on the West Coast, where the Google offices are located. Either they are being used up at a frantic pace, or we are not actually able to access our credits, or we are not receiving any credits. — Diannaa (talk) 13:14, 26 January 2020 (UTC)

Toolforge does not give me access to user IPs for privacy reasons. I can throttle individual users by adding a login requirement using OAuth, but that will break API usage without a more complex setup. One thing I could try is leaving the API public and having a separate quota for it and normal tool usage. Recent logs suggest not many people are using the API with searches enabled, so giving the API the same quota as a single user might be safe.

It's not clear to me exactly when the quota rolls over (some of Google's docs say 12 AM Pacific Time, which would be 8:00 UTC, but the logs suggest it's more like 7:00). Looking at just the logs between 7:00 UTC today and your message at 13:14 when the quota had been exceeded, there were only about 375 queries made. Each query is limited to 8 Google searches, so in the absolute worst case this means 3,000 searches, whereas I expect the quota to be closer to 10,000, so something is definitely wrong. Kaldari, are you able to check this on your end? Can we tell if something is wrong with Google?

Separately it seems we are having an issue with many requests taking a ridiculous amount of time, on the order of five minutes or more, even very simple ones. It is not clear to me why this would happen, because the request rate is not very high and should not exhaust the capacity of the tool's workers. I need to do further investigation to figure this out.

— Earwig talk 17:29, 26 January 2020 (UTC)
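
The worst-case estimate in the comment above can be checked with a few lines of arithmetic. The figures are the ones quoted there; the variable names are illustrative, and the fixed UTC-8 offset assumes Pacific Standard Time, which applies in January.

```python
from datetime import datetime, timedelta, timezone

QUERIES_OBSERVED = 375         # tool queries logged between 7:00 and 13:14 UTC
MAX_SEARCHES_PER_QUERY = 8     # cap on Google searches per tool query
EXPECTED_DAILY_QUOTA = 10_000  # quota figure expected in the comment above

# Upper bound on Google searches if every query used its full allowance.
worst_case_searches = QUERIES_OBSERVED * MAX_SEARCHES_PER_QUERY
share_of_quota = worst_case_searches / EXPECTED_DAILY_QUOTA

# Midnight Pacific Standard Time (UTC-8) expressed in UTC.
PST = timezone(timedelta(hours=-8))
rollover_utc = datetime(2020, 1, 26, 0, 0, tzinfo=PST).astimezone(timezone.utc)

print(worst_case_searches)           # 3000
print(f"{share_of_quota:.0%}")       # 30%
print(rollover_utc.strftime("%H:%M"))  # 08:00
```

Even in the worst case, the logged traffic accounts for under a third of a 10,000-search quota, which is why an exhausted quota by 13:14 UTC pointed to something other than ordinary usage.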

Are you sure the quota is 10,000? Looking at your source, it looks like you're using this API, which says it has a free quota of 100 queries per day (after that, it's $5/1000 searches up to 10k searches a day, which may be what you're thinking of). Mdaniels5757 (talk) 17:44, 26 January 2020 (UTC)
Yes, the WMF pays for this; we are not supposed to be using the free tier. It's not clear to me exactly what the budget is, but as you point out, the limit is 10k. — Earwig talk 18:02, 26 January 2020 (UTC)
Ahh, I see what's going on:
{
  "error": {
    "code": 403,
    "message": "The supplied API key is not configured for use from this IP address.",
    "errors": [
      {
        "message": "The supplied API key is not configured for use from this IP address.",
        "domain": "global",
        "reason": "forbidden"
      }
    ],
    "status": "PERMISSION_DENIED"
  }
}

(I should've been reading these errors from Google... my bad.) I'm not sure if this is because Wikimedia Cloud's IP range has changed or what, but I think this needs to be fixed on Kaldari's end. — Earwig talk 18:41, 26 January 2020 (UTC)
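
An error body like the one quoted above can be inspected programmatically, so that a key or IP restriction is not mistaken for an exhausted quota. The following is a minimal sketch: `summarize_error` and `looks_like_ip_restriction` are hypothetical helpers, and the only response shape assumed is the one shown in the quoted JSON.

```python
import json
from typing import Any, Dict

# The error body quoted in the thread, reproduced as a test fixture.
SAMPLE_BODY = """
{
  "error": {
    "code": 403,
    "message": "The supplied API key is not configured for use from this IP address.",
    "errors": [
      {
        "message": "The supplied API key is not configured for use from this IP address.",
        "domain": "global",
        "reason": "forbidden"
      }
    ],
    "status": "PERMISSION_DENIED"
  }
}
"""


def summarize_error(body: str) -> Dict[str, Any]:
    """Pull the fields worth logging out of an error response body."""
    error = json.loads(body).get("error", {})
    details = error.get("errors") or [{}]
    return {
        "code": error.get("code"),
        "status": error.get("status"),
        "reason": details[0].get("reason"),
        "message": error.get("message"),
    }


def looks_like_ip_restriction(summary: Dict[str, Any]) -> bool:
    """Heuristic match for the specific message seen in the thread."""
    return summary["code"] == 403 and "IP address" in (summary["message"] or "")


if __name__ == "__main__":
    summary = summarize_error(SAMPLE_BODY)
    print(summary["status"], summary["reason"], looks_like_ip_restriction(summary))
```

Logging the `status` and `reason` fields alongside the HTTP code makes this kind of misdiagnosis easier to catch early.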

I'm currently traveling on vacation. Has anyone filed a Phabricator task about the problem? That would probably be the quickest way to get it resolved. Kaldari (talk) 09:29, 27 January 2020 (UTC)
Looks like phabricator:T243736 was filed. — Earwig talk 12:33, 27 January 2020 (UTC)
@Diannaa and The Earwig: I put in a temporary fix for this yesterday, but we're still trying to figure out what's wrong. Kaldari (talk) 17:44, 29 January 2020 (UTC)
Thanks, I tried it out today and it did work. I am following the ticket as well — Diannaa (talk) 00:16, 30 January 2020 (UTC)
  • I'm not able to access it this morning, though different issue (a 504 Gateway Time-out) - I know there was an issue on Sat, no idea if the same as before or this one. Could someone else confirm/deny if they're getting the same issue? Nosebagbear (talk) 09:20, 12 February 2020 (UTC)
    • It worked for me just now. Graeme Bartlett (talk) 11:39, 12 February 2020 (UTC)
      • But now I get the 504 Gateway Time-out error "openresty/1.15.8.1"
      • It was working great, but again we are getting the 504 Gateway Time-out error. Thanks, — Diannaa (talk) 00:11, 14 February 2020 (UTC)
        • Sorry for the recent trouble–it's been a busy couple weeks so I haven't had a chance to follow up. We're on a new API key and we managed to identify some bots/crawlers that were making undesired requests. They've been blocked, so we should have fewer issues with the quota going forward. The current issue with the timeouts might be due to a change in the memory limit because we just moved to slightly newer infrastructure. I'm going to raise the memory limit and we'll see if this helps. — Earwig talk 02:56, 14 February 2020 (UTC)
          • Today I'm again repeatedly getting the tool failing to load due to 504 Gateway Time-out error. — Diannaa (talk) 13:13, 14 February 2020 (UTC)
          • Me too. It's been non-responsive (except for the occasional breakthrough) for several days. I don't use it for Google searches, only ever the specific URL match (it's for text matching when attributing to a specific PD page). Even the query builder page is refusing to load - is the tool server itself sick? David Brooks (talk) 22:48, 15 February 2020 (UTC)
            • I'll look tonight. — Earwig talk 22:49, 15 February 2020 (UTC)
              • The statistics shown here are deceptive; even when the tool will load, I am getting a 504 gateway time out error almost every time I try to perform a comparison. — Diannaa (talk) 14:00, 16 February 2020 (UTC)

I have created a new Phabricator task for the gateway time-out issue: phabricator:T243736. Diannaa (talk) 12:40, 17 February 2020 (UTC)

It's baaaack... direct URL comparison, at any rate. It does seem a little slower than usual, though. Thanks, whoever fixed it. David Brooks (talk) 20:19, 17 February 2020 (UTC)
And it's gone again. No response at all. David Brooks (talk) 23:36, 17 February 2020 (UTC)

Earwig's Copyvio Detector in Czech

Hello, can I translate Earwig's Copyvio Detector to Czech language? I am translator at Translatewiki.net. --Patriccck (talk) 12:59, 30 January 2020 (UTC)

Is it possible? Patriccck (talk) 14:56, 21 February 2020 (UTC)
Hi Patriccck, sorry for not responding earlier. Thank you for offering to help. I don't have the tool set up to be translated right now, but it's on my to-do list after I fix the current reliability issues. As soon as it's ready, I'll let you know. — Earwig talk 01:25, 22 February 2020 (UTC)
Thank you. Patriccck (talk) 12:09, 22 February 2020 (UTC)

Earwig question but not about timeouts

I am working on the Dr Blofeld CCI, which is challenging. Many items can be cleared easily, but to say there are many, many items to check understates the magnitude. Checking into potential copyright issues for edits over 10 years old is quite challenging, and Earwig is indispensable. Let me quickly state that I'm subscribed to Phabricator T245426 and this is not about the outages. I have some questions that arise when it appears to be working.

I interacted with Diannaa (here, but I will summarize everything).

This article, Economy of Bács-Kiskun, had an initial edit resulting in oldid=88202525. When I ran Earwig (Earwig run), the results suggested the current Wikipedia article Economy of Bács-Kiskun, but stated that a violation is unlikely, with 0.0% confidence.

That's very puzzling as there is a lot of overlap.

However when Diannaa ran the revision ID and selected the URL comparison using the current URL, she got a 98.4% match.

That makes a lot of sense.

Diannaa's run

My concern is I have cleared a few items because I ran the tool and got a 0.0% confidence, but now I'm concerned that something went wrong. Do you have any insight on why my use of the tool generated a 0.0%? Am I using it wrong?--S Philbrick(Talk) 20:28, 28 February 2020 (UTC)

Hi Sphilbrick. In your initial search, note that under "Checked Sources", all URLs have a confidence of "Excluded". This means they are present in the excluded URL list, so the tool did not actually download those pages nor perform any comparisons against them. Wikipedia itself is in the exclusion list because it's freely licensed, results in many false positives, and users are typically not interested in comparing an article to itself. I recognize your use case here is different from normal. To bypass the exclusion list, you need to do a direct URL comparison, as you noted—the easiest way is to click on the "Compare" link next to "Excluded". — Earwig talk 08:45, 29 February 2020 (UTC)
The Earwig, thanks, that explains why I didn't get a match when I was sure it should match. S Philbrick(Talk) 13:37, 29 February 2020 (UTC)
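
The exclusion behaviour explained above can be illustrated with a short sketch. This is not the tool's actual code: the entries in `EXCLUDED_DOMAINS`, the function names, and the `bypass_exclusions` flag are all assumptions made for illustration, and the real exclusion list's format is not described in this thread.

```python
from typing import Iterable, List, Set, Tuple
from urllib.parse import urlparse

# Hypothetical exclusion entries; the real list is maintained elsewhere.
EXCLUDED_DOMAINS: Set[str] = {"wikipedia.org"}


def is_excluded(url: str, excluded: Set[str] = EXCLUDED_DOMAINS) -> bool:
    """True if the URL's host is an excluded domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == domain or host.endswith("." + domain) for domain in excluded)


def plan_checks(urls: Iterable[str], bypass_exclusions: bool = False) -> List[Tuple[str, str]]:
    """Label each candidate URL as skipped ("Excluded") or to be compared."""
    plan = []
    for url in urls:
        if is_excluded(url) and not bypass_exclusions:
            # Excluded sources are never downloaded, so no confidence is computed.
            plan.append((url, "Excluded"))
        else:
            plan.append((url, "compare"))
    return plan


if __name__ == "__main__":
    candidates = ["https://en.wikipedia.org/wiki/Example", "https://example.com/page"]
    print(plan_checks(candidates))
    print(plan_checks(candidates, bypass_exclusions=True))
```

In this sketch a search that finds only excluded sources performs no comparisons at all, which matches the 0.0% result described above, while a direct comparison corresponds to setting `bypass_exclusions`.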