Wikipedia talk:CHECKWIKI/WPC 092 dump

Dump in Wikimedia Commons

@NicoV: Hello!

Do you think you can make this dump in Commons as well (or maybe teach me how to do it)? Thank you! Jonteemil (talk) 21:37, 27 May 2021 (UTC)

I have now downloaded WPCleaner, but unfortunately the instructions at Wikipedia:CHECKWIKI/WPC 092 dump don't really make me any wiser. The your.org link isn't working. The instructions say "Download the file enwiki-YYYYMMDD-pages-articles.xml.bz2 from the most recent dump", but where is the most recent dump? I didn't understand the other two steps either. Maybe I know too little to try this? Jonteemil (talk) 00:17, 28 May 2021 (UTC)
@Jonteemil: I prefer teaching you how to do it (I already forgot to do the dump analysis on Meta, so adding another wiki to my list isn't a good idea). Feel free to update the instructions to make them easier to understand.
First step: downloading the dump! I forgot to mention that the link points to an FTP site, so you need an FTP client to use it; personally, I use FileZilla. In FileZilla, open the Site Manager (File menu) and create a new site (protocol: FTP, host: ftp.mirror.your.org, encryption: plain FTP, logon type: Anonymous, default remote directory: /pub/wikimedia/dumps). Once you are connected to the site, you will see a directory for each wiki: go to commonswiki. In it, you will see directories named with dates (YYYYMMDD): go to the most recent one (I prefer not to use the "latest" directory). In it, you will see many files: download commonswiki-YYYYMMDD-pages-articles.xml.bz2. Once you've done that, we can go on with the explanations. --NicoV (Talk on frwiki) 07:00, 28 May 2021 (UTC)
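The download step above can also be scripted with Python's standard `ftplib` module instead of FileZilla. This is a minimal sketch, not part of WPCleaner: it assumes the mirror accepts anonymous logins, that date directories are named YYYYMMDD under /pub/wikimedia/dumps/commonswiki, and that the server's name listing returns bare names. The host is left as a parameter (use the one configured in FileZilla), and the helper names are hypothetical.

```python
import ftplib
import re
from pathlib import Path

# Date directories on the mirror are assumed to be named YYYYMMDD.
DATE_DIR = re.compile(r"^\d{8}$")


def latest_dump_date(entries):
    """Return the most recent YYYYMMDD name in a directory listing.

    Non-date entries such as "latest" are skipped on purpose.
    """
    dates = [name for name in entries if DATE_DIR.match(name)]
    if not dates:
        raise ValueError("no YYYYMMDD directory found")
    # YYYYMMDD strings sort chronologically as plain strings.
    return max(dates)


def dump_filename(wiki, date):
    """Build a name like commonswiki-YYYYMMDD-pages-articles.xml.bz2."""
    return f"{wiki}-{date}-pages-articles.xml.bz2"


def download_dump(host, wiki="commonswiki", target_dir="."):
    """Anonymously download the newest pages-articles dump (sketch)."""
    with ftplib.FTP(host) as ftp:
        ftp.login()  # no arguments: anonymous login
        ftp.cwd(f"/pub/wikimedia/dumps/{wiki}")
        # Assumes nlst() returns bare directory names, not full paths.
        date = latest_dump_date(ftp.nlst())
        ftp.cwd(date)
        name = dump_filename(wiki, date)
        target = Path(target_dir) / name
        with open(target, "wb") as out:
            # Binary transfer in chunks; the dump is very large.
            ftp.retrbinary(f"RETR {name}", out.write)
    return target
```

Skipping "latest" in `latest_dump_date` mirrors the advice to pick the newest dated directory explicitly.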
@NicoV: Thanks for your assistance. I have now successfully downloaded the ginormous dump. Now I have a few further questions. Is this correct?
ListCheckWiki enwiki-$-pages-articles.xml.bz2 wiki:Wikipedia:CHECKWIKI/WPC_{0}_dump 92
becomes
ListCheckWiki commonswiki-$-pages-articles.xml.bz2 wiki:Commons:CHECKWIKI/WPC_{0}_dump 92
I'm especially unsure how the wiki: part should be adapted: whether it remains unchanged or should be replaced. Also, "Run WPCleaner in the command line": what does that mean? Thirdly, would this be a correct adaptation to Commons?
java -Xmx1024m -cp WikipediaCleaner.jar org.wikipediacleaner.Bot en user password DoTasks ListCheckWiki92.txt
becomes
java -Xmx1024m -cp WikipediaCleaner.jar org.wikipediacleaner.Bot c user password DoTasks ListCheckWiki92.txt
Thanks! Jonteemil (talk) 02:30, 1 June 2021 (UTC)
@Jonteemil: An explanation of the script ListCheckWiki commonswiki-$-pages-articles.xml.bz2 wiki:Commons:CHECKWIKI/WPC_{0}_dump 92:
  • ListCheckWiki: it's the command for the dump analysis, OK
  • commonswiki-$-pages-articles.xml.bz2: it's the filename ($ is a placeholder for the date), OK
  • wiki:Commons:CHECKWIKI/WPC_{0}_dump: it's where the analysis should be saved. wiki: means it is saved on the wiki (not on local disk). Commons:CHECKWIKI/WPC_{0}_dump means it is saved on the page Commons:CHECKWIKI/WPC_092_dump. Is this target OK for you? You need to create the page by hand first, and it must contain the two comments (<!-- BOT BEGIN --> and <!-- BOT END -->) which tell WPCleaner where to save the dump analysis. You can put whatever text you want outside the comments.
  • 92: it's the list of error numbers, OK
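To make the placeholders in this script line concrete, the following Python sketch expands `$` into a dump date and `{0}` into the error number, and shows what "saving between the two comments" means for the target page. It is only an illustration: the helper names are hypothetical, the three-digit zero padding of `{0}` is inferred from the example above (WPC_{0}_dump becomes WPC_092_dump), and the marker replacement is an assumed model of the behaviour, not WPCleaner's actual code.

```python
BOT_BEGIN = "<!-- BOT BEGIN -->"
BOT_END = "<!-- BOT END -->"


def expand_script_line(line, date):
    """Expand a ListCheckWiki line for each error number it lists.

    Assumption: "$" stands for the dump date and "{0}" for the error
    number padded to three digits.
    """
    command, dump, target, *errors = line.split()
    if command != "ListCheckWiki":
        raise ValueError("not a ListCheckWiki line")
    dump_file = dump.replace("$", date)
    pages = [target.replace("{0}", f"{int(e):03d}") for e in errors]
    return dump_file, pages


def has_bot_markers(page_text):
    """True if both marker comments exist, BEGIN before END."""
    begin = page_text.find(BOT_BEGIN)
    end = page_text.find(BOT_END)
    return begin != -1 and end != -1 and begin < end


def replace_between_markers(page_text, new_content):
    """Keep text outside the markers, replace what is between them."""
    if not has_bot_markers(page_text):
        raise ValueError("page needs BOT BEGIN and BOT END comments")
    head, rest = page_text.split(BOT_BEGIN, 1)
    _, tail = rest.split(BOT_END, 1)
    return f"{head}{BOT_BEGIN}\n{new_content}\n{BOT_END}{tail}"
```

This is also why the page has to be created by hand first: without both comments there is no place to put the analysis.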
An explanation about the command line java -Xmx1024m -cp WikipediaCleaner.jar org.wikipediacleaner.Bot c user password DoTasks ListCheckWiki92.txt:
  • java -Xmx1024m -cp WikipediaCleaner.jar org.wikipediacleaner.Bot: run WPCleaner in bot mode, OK
  • c: use commons instead
  • user password: you should use your own user name and password
  • DoTasks ListCheckWiki92.txt: tells WPCleaner to run the first script, OK
"Run WPCleaner in the command line" means that you need to run the previous command line in a shell (Shell, PowerShell, Command Prompt or equivalent) from the directory where WPCleaner has been installed (see, for example, the part after "To finish the installation" in Wikipedia:WPCleaner/Installation#Installation with getdown).
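Running the command "in the command line" simply means starting a process from the WPCleaner directory. As a rough Python equivalent, the sketch below builds the same argument list as the command explained above and runs it with the install directory as the working directory. The function names and the install path are hypothetical placeholders, and the classpath is a parameter because its exact value depends on the WPCleaner version.

```python
import subprocess


def bot_command(wiki, user, password, task_file, classpath,
                max_heap="1024m"):
    """Argument list for running WPCleaner in bot mode.

    Mirrors: java -Xmx1024m -cp <classpath> org.wikipediacleaner.Bot
             <wiki> <user> <password> DoTasks <task_file>
    """
    return [
        "java", f"-Xmx{max_heap}", "-cp", classpath,
        "org.wikipediacleaner.Bot",
        wiki, user, password,
        "DoTasks", task_file,
    ]


def run_bot(install_dir, argv):
    """Run the command from the WPCleaner install directory."""
    # cwd matters: relative classpath entries and the task file are
    # resolved against the current working directory.
    return subprocess.run(argv, cwd=install_dir, check=True)
```

Passing a list instead of one shell string means no shell is involved, so arguments reach Java exactly as written.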
We'll see if it works, or if we also need to configure CheckWiki on Commons (as I don't see it in the project list).
--NicoV (Talk on frwiki) 06:27, 1 June 2021 (UTC)
@NicoV: Thanks for your reply. When I entered java -Xmx1024m -cp WikipediaCleaner.jar org.wikipediacleaner.Bot commons Jonteemil /password/ DoTasks ListCheckWiki92.txt into the command prompt, I got this message (translated from Swedish): Error: Could not find or load main class org.wikipediacleaner.Bot. Jonteemil (talk) 11:52, 1 June 2021 (UTC)
@Jonteemil: Sorry, there was an error in the documentation. You can use either
  • java -Xmx1024m -cp WPCleaner.jar:libs/* org.wikipediacleaner.Bot commons Jonteemil /password/ DoTasks ListCheckWiki92.txt
  • Bot.bat commons Jonteemil /password/ DoTasks ListCheckWiki92.txt on Windows (use Bot.sh on Linux). Each script contains a description of how to use it (including how to store user credentials or add extra arguments for Java)
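The "Could not find or load main class" error means Java did not find org.wikipediacleaner.Bot on the classpath; the corrected command fixes this by using WPCleaner.jar plus the libs/* wildcard, which Java itself expands to every jar in the libs directory. Note that classpath entries are joined with ":" on Linux and macOS but with ";" on Windows. A small Python sketch of building that classpath for the current platform (the function name is hypothetical):

```python
import os


def wpcleaner_classpath(sep=os.pathsep):
    """Classpath with WPCleaner.jar plus every jar in libs/.

    "libs/*" is a Java classpath wildcard expanded by Java, not by the
    shell. os.pathsep is ":" on Linux/macOS and ";" on Windows.
    """
    return sep.join(["WPCleaner.jar", "libs/*"])
```

On Windows this yields WPCleaner.jar;libs/*, which is the form to use in the Command Prompt.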
--NicoV (Talk on frwiki) 12:44, 1 June 2021 (UTC)
@NicoV: Okay, where should I enter that text? Into the command prompt or into the WPCleaner program? If the latter, where in WPCleaner should that text go? I can't find any window for it. Sorry for not understanding... Jonteemil (talk) 13:01, 1 June 2021 (UTC)
@Jonteemil: In the command prompt, as you did before (when you got the error message about the main class). You can use either the first command (which should work on any operating system) or the second one (which depends on your operating system: Bot.bat for Microsoft Windows, Bot.sh for any Linux, such as Ubuntu). --NicoV (Talk on frwiki) 13:22, 1 June 2021 (UTC)
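The choice between the two launcher scripts can be sketched as a small Python helper. This is an illustration only: the function names are hypothetical, the check relies on Python's `platform.system()`, and both treating every non-Windows system as able to run Bot.sh and invoking it as ./Bot.sh from its own directory are assumptions, since the discussion only mentions Linux.

```python
import platform


def launcher_script(system=None):
    """Pick the WPCleaner launcher script for the operating system."""
    system = system or platform.system()  # e.g. "Windows", "Linux"
    return "Bot.bat" if system == "Windows" else "Bot.sh"


def launcher_command(wiki, user, password, task_file, system=None):
    """Argument list equivalent to the second command above."""
    script = launcher_script(system)
    # Assumption: on Linux the script is run from its own directory.
    program = script if script == "Bot.bat" else f"./{script}"
    return [program, wiki, user, password, "DoTasks", task_file]
```

The wiki code, user name and password are passed through unchanged, exactly as in the java command.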