User talk:Emijrp/Wikipedia Archive

Ideas

The Library of Congress is going to save every public tweet. Why don't they save a copy of Wikipedia? emijrp (talk) 16:47, 10 September 2010 (UTC)

iBiblio

I have contacted iBiblio about hosting a copy of the latest dumps, to work as a mirror of download.wikimedia.org. No response yet. emijrp (talk) 13:12, 15 November 2010 (UTC)

Their response: "Unfortunately, we do not have the resources to provide a mirror of Wikipedia. Best of luck!"

Who can we contact for hosting a mirror of the XML dumps?

We are working on meta:Mirroring Wikimedia project XML dumps. emijrp (talk) 23:36, 10 December 2010 (UTC)

How can I get all the revisions of a language for a given period?

I want all the revisions that happened from 18-10-2010 to 26-10-2010 for a particular language. How do I get them? —Preceding unsigned comment added by 207.46.55.31 (talk) 11:53, 29 November 2010 (UTC)

You need to extract that date range from a full dump: either stub-meta-history (metadata only) or pages-meta-history (metadata + text). You can use xmlreader.py from meta:pywikipediabot. emijrp (talk) 15:58, 29 November 2010 (UTC)
Also, you can request a meta:Toolserver account and run an SQL query against the server. emijrp (talk) 15:58, 29 November 2010 (UTC)
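
For reference, here is a minimal sketch of the dump-filtering approach using only Python's standard library rather than xmlreader.py. The dump file name and the exact timestamp bounds are assumptions and would need to match the wiki and date range you want:

```python
# Minimal sketch: stream a pages-meta-history dump and keep only revisions
# whose timestamp falls in the wanted range. The dump file name is an
# assumption; the XML namespace is handled generically so the dump version
# does not matter.
import bz2
import xml.etree.ElementTree as ET

DUMP = "xxwiki-latest-pages-meta-history.xml.bz2"       # hypothetical file name
START, END = "2010-10-18T00:00:00Z", "2010-10-26T23:59:59Z"

def local(tag):
    """Strip the '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]

with bz2.open(DUMP, "rb") as dump:
    for _, elem in ET.iterparse(dump, events=("end",)):
        tag = local(elem.tag)
        if tag == "revision":
            ts = next((c.text for c in elem if local(c.tag) == "timestamp"), None)
            if ts and START <= ts <= END:   # ISO timestamps compare lexicographically
                rev_id = next((c.text for c in elem if local(c.tag) == "id"), None)
                print(rev_id, ts)
        if tag in ("revision", "page"):
            elem.clear()                    # keep memory bounded while streaming
```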

Update with latest dump info?

If you are around, would you mind updating with more recent dump info? I'd do it but am reluctant to edit another person's user page area. Thanks... --- 85.72.150.131 (talk) 17:04, 19 August 2011 (UTC)

Please, go ahead. Thanks. emijrp (talk) 09:39, 20 August 2011 (UTC)

Offline reader

If you are interested, there is another offline reader (with image databases at archive.org): http://xowa.sourceforge.net/ and https://sourceforge.net/projects/xowa/ — Preceding unsigned comment added by 188.100.234.211 (talk) 10:56, 13 January 2014 (UTC)

Tarball archive from 2005

@Emijrp and Nemo bis:

User:Emijrp/Wikipedia_Archive#Image_tarballs says: "Another one from 2005 only covers English Wikipedia images." The file description says: "all current images in use on Wikipedia and its related projects". Is it possible to find out whether these pictures come from all projects or only from the English Wikipedia? Samat (talk) 23:16, 31 October 2014 (UTC)

@Samat: The 23 MB text file in that item shows only "/en/x/xy"-style paths, so we can conclude that they are English Wikipedia images only. emijrp (talk) 16:51, 22 October 2015 (UTC)
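
For anyone who wants to repeat that check, a hedged sketch of scanning the item's path listing and counting the top-level project prefixes; the listing's file name here is hypothetical and should be replaced with the actual 23 MB text file from the archive.org item:

```python
# Hedged sketch: count top-level prefixes in the tarball's path listing
# to check whether anything besides "en" appears.
from collections import Counter

prefixes = Counter()
with open("image_filelist.txt", encoding="utf-8", errors="replace") as f:  # hypothetical name
    for line in f:
        parts = line.strip().strip("/").split("/")
        if parts and parts[0]:
            prefixes[parts[0]] += 1

# If only 'en' shows up, the tarball covers English Wikipedia images only.
print(prefixes.most_common(10))
```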

Offline Wikipedia as epub(s) for e-readers

I'm endeavoring to create an offline Wikipedia in the form of epub(s) (more than one file if at least one e-reader turns out to be unable to handle a single 2 GB epub) that inexpensive, high-autonomy e-ink readers could read. I intend to download a dump, make the necessary transforms using mediawiki-utilities, sort articles by PageRank using, for instance, https://spark.apache.org/graphx/ , then take the top n articles until they (and the media they link to) reach 2 GB. Does that look sound to you? ZPedro (talk) 21:16, 9 December 2016 (UTC)
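
As a hedged illustration of the ranking-and-cut step only: a pure-Python power-iteration PageRank over a toy link graph, then greedily taking the top articles until a size budget is reached. The `links` and `sizes` data and the byte budget are illustrative assumptions; on a real dump the same idea would run at scale (e.g. with GraphX) over the extracted article link graph.

```python
# Hedged sketch of the ranking-and-cut step. `links`, `sizes` and BUDGET are
# toy/illustrative values, not real dump data.
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a {page: [linked pages]} dict."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:  # dangling page: spread its rank evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / len(pages)
        rank = new_rank
    return rank

links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}   # toy link graph
sizes = {"A": 900, "B": 700, "C": 500, "D": 300}                # article + media bytes (toy)
BUDGET = 2 * 1024 ** 3                                          # roughly one 2 GB epub

ranks = pagerank(links)
selected, used = [], 0
for page in sorted(sizes, key=ranks.get, reverse=True):
    if used + sizes[page] <= BUDGET:
        selected.append(page)
        used += sizes[page]
print(selected, used)
```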