List of Web archiving initiatives
This page contains a list of Web archiving initiatives worldwide. For easier reading, the information is divided into three tables: web archiving initiatives, archived data, and access methods.
Web archiving initiatives
|Name||Country||Creation Year||Technologies||Number of Employees (Full-time)||Number of Employees (Part-time)||Comments|
|Australia's Web Archive||Australia||1996||PANDORA Digital Archiving System (PANDAS), NLA Trove, HTTrack.||10||>5||The PANDORA Archive, which takes a selective approach, is a collaborative program of 11 agencies that provide an estimated average monthly staffing equivalent to around 10 FTE. IT support provided by the National Library of Australia: 0.25 person-month. Whole .au domain harvests have been conducted annually since 2005 in collaboration with the Internet Archive using Heritrix and Wayback.|
|PROMISE project||Belgium||2017||The PROMISE project is a two-year project that will explore the policy-related, legal, technical and scientific issues related to archiving the Belgian web. The aim of the project is to a) identify best practices in the field of web archiving, b) set up a pilot for archiving the Belgian web, c) identify use cases for the scientific study of the Belgian web, and d) make recommendations for the implementation of a sustainable web archiving service. The project was launched by the Royal Library of Belgium and the State Archives of Belgium in collaboration with Ghent University (Research Group for Media, Innovation and Communication and Ghent Centre for Digital Humanities), Université de Namur (Research Centre in Information, Law and Society) and University College Bruxelles-Brabant (Unité de Recherche et de Formation en Sciences de l’Information et de la Documentation).|
|PageFreezer.com||Worldwide||2009||PageFreezer's Deep Web Crawler, Hadoop, Cassandra, Elasticsearch||40||Enterprise-class SaaS solution for website and social media archiving. Provides automatic collection, replay, full-text search and data export of websites, blogs, social media and enterprise collaboration platforms for eDiscovery and regulatory compliance with FDA, FINRA, FSA, SEC, Federal Rules of Evidence, FOIA and records management laws.|
|WebPreserver.com||Worldwide||2015||WebPreserver||Chrome web browser plugin and web-based service to collect authenticated, legally admissible webpages and social media pages for eDiscovery. Web snapshots can be exported in EDRM-XML, WARC, PDF and native HTML. The WebPreserver.com services allow legal teams to organize, tag and collaborate on the digital evidence captured with the WebPreserver tool.|
|OoCities — GeoCities Archive / GeoCities Mirror||Germany|
|Web@rchive Austria||Austria||2008||NetarchiveSuite, Heritrix, OpenWayback||1|
|Deutsche Nationalbibliothek||Germany||2012||Tools of oia GmbH||3||The crawling for the selective web archive is done by the German company oia GmbH. The access is restricted to the reading rooms of the German National Library.|
|DILIMAG (Digital Literature Magazines)||Austria||2007||WebCurator||2||One technician, one person for collecting and metadata.|
|Bibliothèque et Archives nationales du Québec (BAnQ)||Canada||2012||Heritrix, Wayback.||4||2 librarians, 2 developers|
|Web Archiving Program at Library and Archives Canada||Canada||2005||Heritrix and Wayback.||8||Web archiving in Canada is a legislated activity that is conducted for digital preservation purposes under section 8 (2) of the Library and Archives of Canada Act. Four librarians, two archivists, one technician and one developer work on the program part-time. Web archiving at Library and Archives Canada is also utilized to effect Legal Deposit.|
|Web Information Collection and Preservation - WICP (Chinese Web Archive)||China||2003||Heritrix, Wayback and NutchWAX.|
|Croatian Web Archive (Hrvatski arhiv weba - HAW)||Croatia||2004||Crawl: DAMP software, Heritrix / Access: Wayback, Lucene||2||2||The Croatian Web Archive (HAW) is a collection of content harvested from the Internet. In 2004 the Archive started as a concept of selective capturing of web resources. Whole .hr domain harvests have been conducted annually since 2011, as well as thematic/event harvesting for events of national interest. The content of the Archive is publicly available via the HAW website. Staff: 2 librarians full time and 1 librarian part time (NUL); 2 IT professionals part time (SRCE - University of Zagreb, University Computing Centre).|
|Webarchiv (National Library of the Czech Republic)||Czech Republic||2000||Heritrix, Wayback and Seeder.||5||2||The Czech web archive (Webarchiv), maintained by the National Library of the Czech Republic, focuses on archiving the Czech national web. Its acquisition policy consists of three lines: selective harvests (collection of resources based on selection criteria), topic collections (focused on significant topics in the area of the Czech web) and comprehensive harvests (automatic harvests of content on the national domain). Staff consists of 1 manager, 3.5 curators and 1.5 technical staff.|
|Netarkivet||Denmark||2005||NetarchiveSuite, Heritrix, Free text search using Solr, and Wayback.||~ 23 = 7.5 FTE||~ 23 people involved (developers, web curators, operations staff, project managers, all part time).|
|Estonian Web Archive||Estonia||2010||Heritrix, Squidwarc, Wayback.||4||Since 2006 the Legal Deposit Law has allowed the National Library of Estonia to collect Estonian websites as legal deposit copies. Harvested content was permitted to be made publicly available. The new Legal Deposit Copy Act came into force in 2017, and since then public access is allowed only with the permission of the sites' copyright owners.|
|Finnish Web Archive||Finland||2008||Heritrix, Solr, Wayback.||2||>2||Maintained by the National Library of Finland. Annually, all *.fi domains are harvested, as well as web servers located in Finland. Outside these harvests, the library manually selects relevant websites.|
|BnF - BnF Web Legal Deposit||France||2006||Heritrix, Wayback, NutchWAX, NetarchiveSuite, BCWeb.||10|
|Ina (Institut National de l'Audiovisuel)||France||2009||Crawl: PhagoSite, Crocket based on Firefox, Fantomas based on PhantomJS / Access: Vortex / Search: Dowser based on Elasticsearch||7||Staff of 80 documentalists taking part in nominating sites and QA|
|E-diaspora (Télécom ParisTech, FMSH)||France||2010||Crawl: PhagoSite||1||30 researchers taking part in nominating sites|
|Internet Memory Foundation||France, Netherlands||2004||IM large scale crawler, Heritrix, IM Access software. Storage of Web Content: HBase||Crawls monitoring, developers & infrastructure, manager & administration.|
|Internet Memory Research (ATN service)||France||2011||IM large scale crawler, Heritrix, IM Access software. Storage of Web Content: HBase||Crawls monitoring (QA, crawl engineering, project management), developers & infrastructure, manager & administration|
|Bibliotheksservice-Zentrum Baden-Württemberg||Germany||2003||Archive-It||1||Migration to Archive-It starting from 2017. Data are permanently stored in San Francisco (Archive-It) as well as in storage infrastructure in Baden-Württemberg.|
|Web archive of the German Bundestag||Germany||2005|
|National Library of Ireland Web Archive||Ireland||2011||Heritrix||1||The National Library of Ireland selectively archives Irish websites of scholarly, cultural and political importance through its NLI Selective Web Archive.|
|Israel Web Archive||Israel||2011||Heritrix, Web Curator Tool, Wayback, Rosetta||1||>3||The National Library of Israel collects '.IL' domains. Staff: 1 Project Manager part time, 1 Technical Leader full time, 1 librarian part time, 1 IT Infrastructure part time.|
|Web Archiving Project (WARP, The National Diet Library)||Japan||2002||Heritrix, Wayback, Solr. Previously: Wget, Accela BizSearch||9||1||Launched in 2002 as a pilot project, WARP (Web Archiving Project) has been in full-scale operation since 2006. Started the web archiving of official institution sites based on the legislation from April 2010.|
|National Library of Korea - OASIS (Online Archiving & Searching Internet Sources)||Korea||2001||Own system based on Oracle DBMS and specialized search engine (IRS) that performs data management and search function.||3||11|
|Bibliothèque nationale de Luxembourg||Luxembourg||2015||Heritrix, Wayback||2||The National Library of Luxembourg conducts biannual broad crawls for the .lu domain as well as selective and event-based crawls. The websites that are harvested in the Web-Archive enrich the patrimonial collections of the National Library, which allows for the preservation of digital publications for future generations. The overarching goal is to preserve the Luxembourgish web and keep its information permanently available.|
|Koninklijke Bibliotheek||Netherlands||2006||Heritrix, Web Curator Tool, Wayback, KB e-Depot system||~10||1 crawl engineer, and 9 collection specialists, all part-time (equivalent to around 1.3 full-time). The KB selectively collects Dutch sites of research and cultural value.|
|National Library of Latvia||Latvia||2005||Web Curator Tool and Wayback||1||Currently only storing for preservation, access to public in development (ETA June 2012). The Latvian term for web harvesting is "rasmošana".|
|New Zealand Web Archive||New Zealand||1999||Web Curator Tool, Heritrix, Rosetta||3||>10||Selective harvesting is undertaken by the National Library of New Zealand using Web Curator Tool. Three full time staff harvest websites and a number of rostered staff harvest HTML serials or HTML monographs. National domain harvests have been run biennially in collaboration with the Internet Archive since 2008. Technical services staff respond to service desk requests as they arise. Web archiving issues are handled by staff who work with Rosetta. |
|The National Library of Norway||Norway|
|Arquivo.pt - the Portuguese web-archive||Portugal||2007||In-house development, Heritrix, Wayback, NutchWAX||4||1||Arquivo.pt is a research infrastructure that preserves information gathered from the web since 1996 and provides a public search service over this collection. Arquivo.pt preserves websites in several languages and provides user interfaces in English. In 2017, Arquivo.pt had over 100 000 users and 53% of them were hosted outside of Portugal. The archived data can be automatically processed to perform Big Data research through a distributed processing platform or through Application Programming Interfaces that facilitate the development of added-value applications. The Arquivo.pt team has also contributed over 40 scientific and technical articles related to web archiving, published in open access.|
|Web archive of Cacak||Serbia||2009||HTTrack||1|
|Web Archive Singapore||Singapore||Wayback, Heritrix, NutchWAX, WERA|
|Digital Resources (University Library in Bratislava)||Slovak Republic||2015||Heritrix 3.2.0, Wayback 2.2.0, Solr 5.2.1, Invenio, Custom Curator Tool||4||1||The University Library in Bratislava (ULIB) performed its first web harvesting experiments in 2008-2009. In 2015 ULIB put into operation a platform for web and e-Born archiving (during the implementation of the national project "Digital resources", which was supported by the European Regional Development Fund): https://www.webdepozit.sk/en/.|
|Slovenian Web Archive||Slovenia||2007||Heritrix, Wayback||1|
|Archivo de la Web Española||Spain||2009||NetarchiveSuite, OpenWayback, Solr||3+supervisor||2||Maintained by the National Library of Spain with the collaboration of regional libraries. Takes a mixed approach of selective and broad harvests. Whole .es domain harvests were conducted annually from 2009 to 2013 in collaboration with the Internet Archive using Heritrix and Wayback. Since 2014 selective harvests have been made by the National Library of Spain, using NetarchiveSuite. National Library = 3 librarians full time, 1 crawl engineer full time and 2 crawl engineers part time. Regional libraries = several librarians part time. Since 26 October 2015 the Legal Deposit Law has allowed the National Library of Spain and the regional libraries to collect Spanish websites as part of the legal deposit and make them available to the public, observing the rules of copyright law. Testing Solr index.|
|PADICAT: The Web Archive of Catalonia||Spain||2005||Heritrix, Wayback, WERA, NutchWAX, Web Curator and CAT.||4||PADICAT is the open access Web Archive of Catalonia, created by the Biblioteca de Catalunya: the public institution responsible for collecting, preserving and distributing the bibliographic heritage of Catalonia, in Spain.|
|Basque Digital Heritage Archive||Spain||2008||Heritrix, Wayback, NutchWAX and Web Curator.||1|
|Sweden (Kulturarw3)||Sweden||1996||Heritrix. Own system for storage, maintenance and access||1.25||Pause in operation November 2009 - May 2011.|
|Aleph Archives||Switzerland, United States||2010||Distributed crawler, ArchiView access plugin, High performance search engine, Near real time indexing, Web Monitoring tools||7||Enterprise-grade Web archiving platform for online heritage (content, brands) preservation and eDiscovery, aimed at corporates, institutions, and legal and government industries seeking to preserve their Web content regardless of its type (websites, wikis, social media, forums...).|
|Web Archiving Bucket||Switzerland, United States, Canada||2012||WARC Software Development Kit, Cobalt, Holon web server||The “Web Archiving Bucket” is an initiative launched by Aleph Archives to preserve data and provide libraries and organizations with free-to-use web archiving tools and components. The Web Archiving Bucket provides a set of tools to help archivists and professionals in their daily work.|
|Web Archive Switzerland||Switzerland||2008||Heritrix, Wayback, Webrecorder||5||1 crawl engineer, 3 persons for quality assurance (sharing less than 1 full time), 1 coordinator. The curators, who do the selection, are partner libraries all over Switzerland.|
|NTU Web Archiving System, NTUWAS||Taiwan||2007||Lucene||3|
|Web Archive Taiwan||Taiwan||2007|
|The UK Web Archive||United Kingdom||2004||Heritrix, Web Curator Tool, Wayback, Solr for searching.|
|UK Government Web Archive (UKGWA)||United Kingdom||2003||MirrorWeb||4||0||The UK National Archives' UK Government Web Archive (UKGWA) is a fully open web archive. It includes approx. 3,500 central government websites and social media taken at regular intervals (1996 to present). The scope of UKGWA is outlined in the OSP27 document. The technical side of the web archiving operation is supplied by MirrorWeb.|
|UK Parliament Web Archive||United Kingdom||2009||MirrorWeb||1||2||The UK Parliament Web Archive captures, preserves, and makes accessible UK Parliament information published on the web. The web archive includes websites and social media dating from 2009 to the present. The technical side of the web archiving operation is supplied by MirrorWeb.|
|MirrorWeb||Worldwide||2012||Heritrix, PYWB, custom social media archiving tools.||8||MirrorWeb provides website and social media archiving services for regulated industries and the public sector. It runs two public archives: the UK Government Web Archive and the UK Parliament Web Archive.|
|Internet Archive (provides Archive-It service)||United States||1996||Heritrix, Wayback, NutchWAX and other tools developed by the Internet Archive||150||Internet Archive's Wayback Machine is the largest and oldest web archive in the world, dating back to 1996. Internet Archive also provides various web archiving services, including Archive-It, Save Page Now, and domain-level contract crawls. The Wayback Machine is the publicly available access service to Internet Archive and partners' collections.|
|Reed Tech Archives||United States||2010||TrueArchive™ Technology||Reed Tech Archives provides support for Information Governance, Litigation Protection, Compliance, e-Discovery and Social Media Management. The solution offers either an automated approach or manual capture. For automated website and social media capture, the application captures sites on a recurring frequency and interval. The entire site is completely rebuilt inside the archive to provide the exact user experience afforded on the live web. A user will have the ability to navigate the site from a set of URLs or from within the visible archived site. Generally this approach supports compliance and risk mitigation as well as the legal function. On-demand manual capture provides clients with the ability to capture a fully functioning page or series of pages from a website or social media property as needed through the Reed Tech Web Preserver plug-in. This approach tends to be used to support the legal, marketing and competitive intelligence functions.|
|Stanford University Libraries||United States||2007||Heritrix, HTTrack, Wayback, CDL Web Archiving Service, Internet Archive Archive-It||2||5||Stanford University Libraries has been engaged in web archiving projects since 2007 and started establishing a web archiving program in 2013.|
|Columbia University Libraries||United States||2009||Archive-it service||2||>1||The Columbia University Libraries (CUL) web resources collection program archives selected websites in thematic areas corresponding to existing CUL collection strengths, websites produced by affiliates of Columbia University, and websites from organizations or individuals whose papers or records are held in CUL's physical archives. Began web archiving in 2008.|
|Cornell University Library||United States||2011||Archive-it service||.5||>1|
|North Carolina State Government Web Site Archives||United States||2005||Archive-it service||3|
|Latin American Web Archiving Project||United States||2005||Archive-it service|
|Web Archiving Project for the Pacific Islands||United States||Archive-it service||4|
|Library of Congress Web Archives||United States||2000||Heritrix, Wayback, and the DigiBoard, an in-house curatorial/permissions tool||6||80||The part time workers spend a few hours per month (on average) selecting content for the collections.|
|Harvard Library: the Web Archive Collection Service (WAX)||United States||2006||Heritrix, Wayback, NutchWAX and WAXi, an in-house curatorial interface.||>6||3 part time on IT support. External curators within 3 units, whose size is not known.|
|Web Archiving Service from California Digital Library (WAS service)||United States||2005||Heritrix, Wayback, NutchWAX||4||>1||The number of hours that curators devote to the service is very variable.|
|Bentley Historical Library (University of Michigan) Web Archives||United States||2000||HTTrack, Teleport Pro, WAS service (2010-)||2|
|University of Texas at San Antonio Web Archives||United States||2009||Archive-It||3||The number of hours varies depending on how the crawls are scheduled.|
|qumram||Switzerland||2010||qumram Web Archiving / Web Information Governance Software Suite||Commercial web archiving / web information governance software suite. Provides both remote harvesting as well as transactional web archiving. Allows integrations with any possible web application (WCMS, Portal, Sharepoint, eShop, custom applications) as well as repository (database, file system, electronic archive or records management system, cloud-based solution). Allows capturing and reproduction of public information as well as specific user interactions.|
|SAPERION||Germany||2011||SAPERION ECM Web Content Archive||Commercial enterprise content management suite specializing in regulatory compliance. The product provides both harvesting and transactional web archiving based on the integration of qumram's Chronos Web Archiving Software Suite. Web content is just another channel through which content reaches SAPERION. Others may be scanners, fax, e-mail, mobile devices, office suites or any other system creating content, such as ERP systems.|
|Bibliotheca Alexandrina's Internet Archive||Egypt||2002||Heritrix, OpenWayback, WARCrefs||3||Current crawling interests: Egypt beyond January 25, Arab League ccTLDs. Deduplication: using WARCrefs tool to deduplicate web archive contents in BA cluster.|
|AUEB Web Archive||Greece||2010||Heritrix, Wayback and NutchWAX.||1||1||This project is part of the function of the University Library.|
|World Bank Web Archives||United States||2007||HTTrack crawler, Oracle RDBMS, Google Search Appliance||0||3|
|Russian National Digital Archive||Russia||2010||wpull, grab-site, HTTrack crawler, ad-hoc scripts developed for social media archiving. Experimenting: Heritrix, Wayback||About 5000 government websites collected (May 2018) using wpull and provided as archives for downloading.|
|Archive Team||Worldwide||2009||wpull, ad hoc scripts||1||~100||Volunteer group. They partially archived GeoCities, Yahoo! Videos, Google Video and others.|
|WikiTeam||Worldwide||2011||ad hoc scripts||0||0||Volunteer group. Over 20,000 wikis preserved.|
|University of North Texas CyberCemetery||United States||1995||Heritrix, Wayback; formerly HTTrack||2||The CyberCemetery is an archive of government websites that have ceased operation (usually websites of defunct government agencies and commissions that have issued a final report). This collection features a variety of topics indicative of the broad nature of government information. In particular, this collection features websites that cover topics supporting the university’s curriculum and particular program strengths.|
|Archive.is||Worldwide||2012||Apache Accumulo, HDFS, ad hoc scripts||1||1||Saves external links from community web-sites (wikis, forums, blogs, ...). Can save snapshots of Web 2.0 pages.|
|Tamiment Library and Robert F. Wagner Labor Archives at New York University||United States||2007||WAS Service||1||1||Archives websites related to New York City and National Labor and Left Movements. Projects include: Alternative Mass Media / News; Anarchism; Animal Rights; Arts and Cultural Left; Civil Rights and Civil Liberties; Communism, Socialism, Trotskyism; Economic and Social Justice (Including Occupy Wall Street); Education and Student Movements; Electoral Politics and Parties / Political Action (U.S. Left); Environmentalism / Green Movement; Feminism and Women's Movements; Guantanamo Bay Detention Camp & War Crimes (U.S.); Housing; Internet/Cyberspace Democracy; Jewish American Progressive & Left Activity; Labor Unions and Organizations (U.S.); Left Academia and Theory, Intellectuals and Other Notables; LGBT Rights; Other Left Activism; Peace Movements; Prisoners Rights and Political Prisoners; Progressive Policy/ Educational Organizations.|
|Preservica||Worldwide||2012||Heritrix, Preservica core product, Wayback||Cloud-based heterogeneous archiving service that allows ingest from multiple sources (including web archiving ingest via Heritrix). Ability to migrate content within WARC files and render in Wayback. Ingest runs as workflow so very little effort needed to run it. Developed, supported and run by Preservica.|
|Central State Electronic Archives of Ukraine||Ukraine||2007||HTTrack, Wget||2||The Archives is interested in preserving websites and creating thematic collections of them. Its holdings currently include website collections on the presidential elections in Ukraine from 2010 until today, the Chornobyl disaster, the local elections, Euro 2012 in Ukraine, UNESCO World Heritage sites in Ukraine, and the 200th anniversary of the birth of Taras Shevchenko.|
|York University Libraries, York University Digital Library||Canada||2012||Heritrix, Wget, Islandora, OpenWayback||1||0|
|New York Art Resources Consortium (NYARC)||United States||2012||Archive-It service||2||~6||Collaboration among Frick Art Reference Library, Brooklyn Museum Library & Archives, and Museum of Modern Art (MoMA) Library to archive specialist art historical web resources.|
|Netherlands Institute for Sound and Vision (Sound and Vision) web archive||Netherlands||2011||Heritrix, Elasticsearch for full-text index, Drupal for front-end||~7||Sound and Vision has been involved in web archiving projects since 2008, starting with the EU research project LiWA. After a couple of pilots, web archiving projects were scaled up in 2014.|
|Rhizome (organization)||United States||1999||ArtBase, Webrecorder, Oldweb.Today||3||1||Rhizome operates a digital preservation program, led by Dragan Espenschied, which is focused on the creation of free, open source software tools to decentralize web archiving and software preservation practices and ensure access to its collections of born-digital art. Oldweb.Today and Webrecorder are its tools focused on web archiving specifically.|
|University of Texas at Austin Libraries, Human Rights Documentation Initiative||United States||2009||Archive-It service||1||1||The University of Texas Libraries’ Human Rights Documentation Initiative (HRDI) captures the websites of human rights organizations in order to provide secure access to human rights documentation in the event that these often-fragile sites are taken down.|
|Kentucky Department for Libraries and Archives||United States||2009||Archive-it, Wayback||>1||0||This collection includes captures of websites for Kentucky state agencies in the Executive, Legislative, and Judicial Branches. Stand-alone websites for boards, councils, committees, quasi-governmental agencies, and agency programs are also archived. Captures for websites dating 2000-2008 are included in this collection via a transfer to our account from the Wayback Machine.|
|University of California, San Francisco Library||United States||2007||Archive-it, Wayback, CDL WAS Service||>1||0||This collection documents the web presences of UCSF, as well as the larger health science focuses of AIDS history; anesthesiology; biotechnology and biomedical research; tobacco control and regulation; neuroscience; and computational medicine. Staff is one full-time digital archivist with various responsibilities in addition to web-archives.|
|Ivy Plus Libraries||United States||2013||Archive-It||1||1||The Ivy Plus Libraries Web Collecting Program is a collaborative collection development effort to build curated, thematic collections of freely available, but at-risk, web content in order to support research at participating Libraries and beyond. Participating Libraries are: Brown, Chicago, Columbia, Cornell, Dartmouth, Duke, Harvard, Johns Hopkins, Penn, Princeton, and Yale.|
|Archive.St||United States||2017||Archive.st custom programming provided by US Support LLC||>1||0||Archive.st provides free online web archiving in the form of a .JPG and HTML archive.|
Archived data
|Name||Archived Contents (millions)||Disk Space Occupied (TB)||Archive Format||TLD/Broad Crawls||Selective Crawls (Yes/No)||Comments|
|Australia's Web Archive||6700||260||ARC/WARC||.AU||Y||.AU crawls (2005-2014): 6.3 billion files (237 TB). Selective crawls (1996-2014): 286 million files (13.67 TB). AGWA (2011-2014): 70 million files (6 TB).|
|Our digital island, a Tasmanian Web Archive||0.336||HTTrack||Y||Preserves online contents related to Tasmania. ODI has operated since its inception under the assumption that web sites fall within the definition of ‘Book’ in the Tasmanian Library Act 1984. Thus, no permission to capture from publishers is required.|
|Web@rchive Austria||2748||42||ARC||.AT, .wien, .tirol||Y||A copy of the data is stored in a high security data storage unit.|
|Deutsche Nationalbibliothek||WARC||.DE||Y||Only one experimental TLD crawl.|
|DILIMAG (Digital Literature Magazines)||0.03||0.996||ARC||Project from 2007-03-01 until 2010-12-23. The project DILIMAG for collecting, describing and archiving of digital German literary magazines.|
|Bibliothèque et Archives nationales du Québec (BAnQ)||167||26||ARC/WARC||Y||Harvesting began in 2009. Selective crawls of Quebec websites.|
|Government of Canada Web Archive (GCWA)||775||37||ARC/WARC||.GC.CA||Y||Harvesting at Library and Archives Canada (LAC) began in 2005 and concentrated on collecting the federal government web presence and capturing the federal elections, the Olympics, and Canadian commemorative events. Thematic web collections of Canadiana research interest have been curated since 2009.|
|Web Information Collection and Preservation - WICP (Chinese Web Archive)||.GOV.CN||Y||Harvest of the web pages about the events that have great influence on the society, economy and so on, and the sites in 'gov.cn' domain.|
|Croatian Web Archive (Hrvatski arhiv weba - HAW)||231||13||Mirror, WARC||.HR||Y||Selective harvesting of over 5000 web resources since 2004. Annual harvesting of the national .hr domain, as well as thematic harvesting, since 2011. All archived content is publicly available via the HAW website.|
|Webarchiv (National Library of the Czech Republic)||9412||350||ARC/WARC||.CZ||Y||Harvesting began in 2001.|
|Netarkivet.dk||24000||700||ARC/WARC||.DK||Y||It uses NetarchiveSuite, which was developed by two Danish libraries, and Heritrix.|
|Estonian Web Archive||393||24.9||ARC/WARC||.EE||Y||The archive consists of selective crawls since 2010. The first broad crawl was conducted in 2015/2016. Besides the .ee TLD, Estonia-related web content is harvested from other TLDs such as .eu, .org and .com.|
|Finnish Web Archive||494||23||.FI, .AX||Y||Also crawls contents hosted on machines physically located in Finland, independently from their domain.|
|BnF - BnF Web Legal Deposit||18800||370||ARC/WARC||.FR + all sites hosted in France||Y||BnF is making full copies of all sites in the .FR TLD, as well as all sites hosted in France, ignoring both the Robots exclusion standard and the licenses of the documents.|
|BnL Web-Archive||543||41||WARC||.LU||Y||The BnL conducts 2 domain crawls per year, as well as event-based and selective crawls.|
|Ina (Institut National de l'Audiovisuel)||66000||870 (see comments)||DAFF||Y||DAFF handles full content deduplication, so the size on disk takes into account compression and deduplication; the equivalent disk storage in compressed ARC format would be approximately 4.2 PB|
|E-diaspora (Télécom ParisTech, FMSH)||1030||13 (see comments)||DAFF||Y||DAFF handles full content deduplication, so the size on disk takes into account compression and deduplication; the equivalent disk storage in compressed ARC format would be approximately 51 TB|
|Internet Memory Foundation||180||WARC||Can be done by partners||Y||Formerly European Archive. Collaborates with Internet Memory Research, which provides the ArchiveTheNet Service (ATN Service). Selective crawls (140 TB), Domain crawls (40 TB), expected to grow to 1 PB in 2012. New datacenter and a new crawler in 2012.|
|Bibliotheksservice-Zentrum Baden-Württemberg||9||[HTTrack], WARC||Y||Bibliotheksservice-Zentrum Baden-Württemberg is operating the following Web-Archives: 1- Baden-Württembergisches Online-Archiv (BOA); 3- Literatur im Netz des Deutschen Literaturarchivs Marbach; 4- SWBregio. Web Archives will be migrated to Archive-It in 2018.|
|Web archive of the German Bundestag||Y||German Federal Parliament. Selective. Snapshots of www.bundestag.de and other web presences of the German Bundestag are made at regular intervals or for certain events. These remain available in the web archive.|
|Israel Web Archive||ARC/WARC||.IL||Y||.IL crawls (2006-2011): Pilot crawls (500 GB). Selective crawls (1996, 2011)|
|Japan Web Archiving Project (WARP)||3998||705||WARC||-||Y||15 TB of selective crawls based on permission (2002–2010). Started the web archiving of official institution sites based on the legislation from April 2010. 3,998 million files, 705 TB as of March 2016.|
|National Library of Korea - OASIS (Online Archiving & Searching Internet Sources)||24||Y||Requires consent before archiving. Targets 56,401 Websites. Web archiving is managed under Digital resource management systems. In 2011 web archiving system will be rebuilt.|
|Koninklijke Bibliotheek||25||ARC||Y||Selective crawls of 12.000 sites (January 2017)|
|New Zealand Web Archive||585||40||ARC/WARC||.NZ||Y||.NZ crawls (2008-2015): 585 million URLs (40 TB). Selective crawls: 24,500 websites (ca. 9 TB). Legal deposit covers born digital material (including websites).|
|The National Library of Norway|
|Arquivo.pt- the Portuguese web-archive||4 867||238||ARC/WARC||.PT, .CV, .AO, .MZ||Y||.PT domain crawls and integration of external collections since 2007, and daily crawls of a selection of online publications since 2010. Selective crawls related to national events such as elections, or to international science-related content such as websites about Research & Development projects funded by the European Union.|
|Web archive of Cacak||0.255||0.013||HTTrack||Y||Selective crawls of 130 sites related to the city of Cacak. Collaboration with the Webarchiv team from the National Library of the Czech Republic.|
|Web Archive Singapore||.SG||Y||Selective crawls of 1000 Singapore-related sites, with the written consent of the owners. Whole .SG domain archiving.|
|Digital Resources (University Library in Bratislava)||70||WARC||.SK||Y||Harvesting of the Slovak web started in 2015. Since then ULB has performed three full-domain harvests (harvesting of the national .SK domain), multiple selective crawls and thematic crawls (topic centered and event devoted campaigns).|
|Slovenian Web Archive||30||WARC||Selective crawls since 2007, national domain crawls since 2014.|
|Archivo de la Web Española||2,539||117||WARC||.ES||Y||Domain .ES crawls (2009-2013): 2,421 million files (111 TB) in collaboration with Internet Archive. Selective crawls (2014-2015): 119 million files (6 TB). About 30 news media sites crawled every day. Not launched publicly yet.|
|PADICAT : The Web Archive of Catalonia||349||13||ARC/WARC||.CAT||Y||In accordance with the general trend, the archive model is a hybrid system consisting of: mass compilation of open-access digital resources published on the Internet (.cat); systematic archiving of the website output of Catalan organizations; and fostering of lines of research through themed integration of the digital resources pertaining to specific events in Catalan public life (elections, museums, etc.).|
|Basque Digital Heritage Archive||21||0.8||ARC||Y|
|Sweden (Kulturarw3)||5700||360||Multipart MIME||.se, Swedish .nu and geolocation for other TLDs||Y||Bulk crawls approximately twice a year.|
Selective crawls of about 140 newspapers every day.
|Aleph Archives||23||WARC, WARC2, ARC and HTTrack to WARC migration tools||Y||Enterprise-grade web archiving platform for online heritage (content, brands) preservation and eDiscovery, aimed at corporations, institutions, and the legal and government sectors seeking to preserve their web content regardless of its type (websites, wikis, social media, forums...).|
|Web Archive Switzerland||17||ARC||Y||Mainly selected .ch crawls|
|NTU Web Archiving System, NTUWAS||200||14||Y|
|Web Archive Taiwan|
|The UK Web Archive||20.6||WARC||Y||Selective crawls with previous permission. Now also conducting wholesale UK domain-scale crawls under Non-Print Legal Deposit legislation, enacted April 2013. This content will only be available on premises controlled by one of the six legal deposit libraries. The UKWA is a spin-off from the UK Web Archiving Consortium that ended in 2007.|
|Hanzo Archives||7||WARC||Y||Commercial web archiving services and appliances, for government and corporations whose compliance or legal obligations / needs extend to their websites, intranet, and social media. Many 'dark' archives across Europe and USA.|
|UK Government Web Archive||1,000+||150||ARC; WARC post July 2017||Between 2003 and 2005 the Internet Archive undertook the technical side of web archiving on behalf of The UK Government Web Archive. From 2005 to July 2017 the technical side of the web archiving service was contracted out to the Internet Memory Foundation. From July 2017 MirrorWeb took over the contract and moved the entire archive to the cloud. The UK Government Web Archive was part of the UK Web Archiving Consortium from 2004 to 2009.|
|Internet Archive (provides Archive-it service)||150000||5500||Worldwide||Y||Provides the Archive-it service and leads the Archive-access project (Internet Archive ARC access tools). The collection is mirrored at the Bibliotheca Alexandrina in Egypt.|
|Columbia University Libraries Web Resources Collection Program||322||21.1||ARC/WARC||Y||Selective crawls with permission or notification. Thematic collections in: Human rights; Historic preservation and urban planning; New York City religions. Also capture Columbia University web domain.|
|North Carolina State Government Web Site Archives||51.5||3.8||WARC||Y|
|Latin American Web Archiving Project||Y|
|Web Archiving Project for the Pacific Islands||5.5||ARC/WARC||Y||Includes sites of 18 countries.|
|Library of Congress Web Archives||7741||420||ARC/WARC||Y||Formerly MINERVA. Selective crawls with notification and permission; primarily event and thematic collections.|
|Harvard University Library: the Web Archive Collection Service (WAX)||19||0.661||ARC||Y||Selective crawls with no previous authorization.|
|Web Archiving Service from California Digital Library (WAS service)||216||25.2||ARC/WARC||Can be done by partners||Y||Provides Web Archiving Service (WAS) to partners worldwide. Was developed at the California Digital Library.|
|Bentley Historical Library (University of Michigan) Web Archives||34.5||2.6||ARC/WARC||Y||WAS service since 2010.|
|University of Texas at San Antonio Web Archives||26||1.135||ARC/WARC||Y||University administration, faculty and student sites; as well as selective captures on San Antonio and South Texas subject areas, including San Antonio organizations; San Antonio Online Journals and Blogs; Tejano and Conjunto music; Gay, Lesbian, Bisexual, Transgender and Queer Related Web sites in Texas, San Antonio and the Rio Grande Valley; Immigration/Borderlands; Mexican Cooking Blogs; San Antonio Restaurants; Renewable Energy in Texas; Rio Grande Valley Organizations; and Rio Grande Watershed and Texas Water Issues .|
|AUEB Web Archive||3||WARC||aueb.gr||N||The amount of data crawled from the domain aueb.gr ranges between 10 GB and 14.9 GB. The data is stored on disk compressed and requires between 8.8 GB and 9.7 GB, a space saving of between 12% and 35%. For a subsequent crawl, only the web pages that have changed since the previous crawl need to be stored: of 13.1 GB crawled from aueb.gr, only 1.6 GB was stored on disk, a space saving of 88%.|
|World Bank Web Archives||143 GB||HTTrack||no, so far||Y||450 sites with historical or research value have been harvested since 2007, each archived before being taken offline or before a major upgrade.|
|University of North Texas CyberCemetery||0.887||WARC||.gov||Y|
|Bibliotheca Alexandrina's Internet Archive||80000||1000||ARC/WARC||Egyptian news and politics||Y|
|York University Digital Library||0.435||WARC||yorku.ca + faculty requests||Y|
|Netherlands Institute for Sound and Vision (Sound and Vision) web archive||ARC/WARC||Y||Among other audiovisual heritage, Sound and Vision is tasked with archiving programmes broadcast by Dutch public broadcasters. An important part of the web archive therefore consists of public broadcasters' websites related to these programmes. Websites that have no direct link to the collection but are of broader media-historical interest are also archived, for example the websites of commercial broadcasters.|
|Kentucky Department for Libraries and Archives||3||.3007||WARC||Y|
|University of California, San Francisco Library||12.5||.587||ARC/WARC||Y||Websites requested by staff and faculty, plus a growing list that attempts to capture all UCSF websites as comprehensively as possible.|
|Ivy Plus Libraries||1.5||ARC/WARC||Y||Selective crawls with notification. Thematic collections in architecture and contemporary composers.|
|Name||URL history (Yes/No)||Meta-data (catalog/advanced) search (Yes/No)||Full-text search (Yes/No)||Memento Compliance (No/Native/Proxy)||Comments|
|Australia's Web Archive||Y||Y||Y||No||Selected sites are publicly available through a directory structure. Domain harvests are not. The PANDORA Archive is indexed and searchable through the NLA's single search service Trove.|
The Australian Domain Harvests are full-text indexed but are not currently publicly available. The Australian Government Web Archive is searchable by URL and full-text indexes through its portal.
|Our digital island, a Tasmanian Web Archive||Y||Y||N||No||Presents thumbnails generated through Html To Image supplemented in HTTrack. Information is organized in a directory: A-Z subject listing, A-Z title listing.|
|Web@rchive Austria||Y||N||Y||No||Versions can be searched online either by URL or in (partial) full text. The websites are only accessible on special terminals at the Austrian National Library. A bookmarking feature allows users to save versions online and recall them at the library's web archive terminals.|
|Deutsche Nationalbibliothek||Y||Y||Y||No||Only accessible in the reading rooms of the German National Library. The metadata is included in the publicly accessible library catalogue.|
|DILIMAG (Digital Literature Magazines)||Y||Y||N||No||Metadata are publicly available; access to the archived versions is free or restricted depending on the agreement with the rights holders. Full-text search is implemented in the new version (online since February 2015).|
|Bibliothèque et Archives nationales du Québec (BAnQ)||Y||N||N||No||Provides access according to partner policy.|
|Government of Canada Web Archive (GCWA)||Y||Y||Y||Proxy||Library and Archives Canada makes its federal government web archives (materials under Crown Copyright) publicly accessible. Indices are available for discovering Canadian federal web resources alphabetically by authoring organization and by URL. Full text indexing is based on Lucene.|
|Web Information Collection and Preservation - WICP (Chinese Web Archive)||Y||No||Archive content is only available on the intranet of the National Library of China. Some collections are publicly available, with metadata search and browsable by collection.|
|Croatian Web Archive (Hrvatski arhiv weba - HAW)||Y||Y||Y||Proxy||Full open access.|
|Webarchiv (National Library of the Czech Republic)||Y||N||N||N||Due to copyright restrictions, only a limited number of archived websites for which agreements were signed with the publishers is available online. For other resources you can find out whether a given website was archived and the number of harvested versions. Unlimited access to all resources in Webarchiv is available from public terminals in the National Library.|
|Netarkivet.dk||Y||N||Y||No||Online access granted only to researchers through a Citrix login to free text search based on Solr and a proxy solution that accesses an archive through the Wayback. It has established a framework for running batch jobs with the possibility of data mining.|
|Estonian Web Archive||Y||Y||N||No||Since 2017 only archived websites of the public sector are openly available. The full archive is accessible in-house.|
|Finnish Web Archive||Y||N||30% of material.||No||URL search is available, but contents can only be accessed onsite. Full-text search is available for 30% of the material.|
|BnF - BnF Web Legal Deposit||Y||N||15% of the collection||No||Accessible to authorized users of the BnF, through the reading rooms of the Research Library located in Paris and Avignon. Wayback interface was translated to French. Full Text search only for a relatively small portion of the collection (15% of 200 TB) indexed by Internet Archive. No current full text search implemented in workflow. Builds special collection galleries based on a selection from the archive on a given topic.|
|Ina (Institut National de l'Audiovisuel)||Y||Y||Y||No||Full-text indexing is based on Lucene. To accommodate results from frequent crawls (several crawls per hour for some pages), clustering is used to group similar versions of pages.|
|E-diaspora (Télécom ParisTech, FMSH)||Y||N||N||No||1381 sites are currently crawled to build an archive of migrants' usage of the web. Social studies researchers have launched a long-running project based on this archive. Ina handles crawls and storage.|
|Internet memory Foundation||Y||Y||Y||No||Provides access and search services according to partners' policies.|
|Web archive of the German Bundestag||Y||N||N||No||The web archive itself consists of snapshots of www.bundestag.de and other websites. Navigation is possible by clicking on the years.|
|Israel Web Archive||N||Y||N||No||Still in development and pilots|
|Japan Web Archiving Project (WARP)||Y||Y||Y||No||All the archived websites are available on the premises. 80% of them are also accessible on the Internet with the permission of webmasters.|
|National Library of Korea - OASIS (Online Archiving & Searching Internet Resource)||Y||Y||Y||No||100% of the archive is indexed. Enables search by topic classification (e.g. Religion, Science, Arts). Search available.|
|Koninklijke Bibliotheek||Y||N||N||No||The web archive is accessible on terminals in the KB reading rooms to full members ('onsite').|
|New Zealand Web Archive||Y||Y||N||No||Domain harvests: available to selected staff using Wayback and limited to URL searches. Selective harvests: each website is described in the catalogue (providing subject, author, title and URL searches) and can be viewed by the public via the Internet by clicking on the link to the archived copy. The websites themselves however are not indexed.|
|The National Library of Norway||N||Y||No||Sites are integrated in the Catalog. Left bar enables facet navigation with drill-down.|
|Arquivo.pt- the Portuguese web-archive||Y||Y||Y||Native||A full-text and URL search service is freely available. Archived data can be mined through an Hadoop platform or publicly available Application Programming Interfaces to develop web applications.|
|Web archive of Cacak||N||N||N||No||Plans to develop a search engine in the future. A drawback of HTTrack is that it renames files during archiving, so the original structure of the website and the original file names are lost.|
|Web Archive Singapore||No|
|Digital Resources (University Library in Bratislava)||Y||Y||N||No||It is possible to find out whether a website was archived and how many harvested versions exist. Due to the copyright restrictions only a limited number of archived websites is publicly available (based on agreements with publishers). The access to other archived resources is available locally in the University Library in Bratislava.|
|Slovenian Web Archive||Y||N||Y||No||The archive of selective crawls is publicly accessible. Use is possible by browsing and full-text search. National domain crawls are not accessible yet but will be in the future.|
|Archivo de la Web Española||Y (Future)||Y (Future)||Y (Future)||No||Plan to provide access on-site in the short-medium term.|
|PADICAT: The Web Archive of Catalonia||Y||Y||Y||No||Full open access.|
|Basque Digital Heritage Archive||Y||Y||Y||No|
|Sweden (Kulturarw3)||Y||N||N||No||Public access through dedicated machines in the library building.|
|Aleph Archives||Y||Y||Y||No||The full-text search engine supports automatic metadata extraction and native deduplication of results. Also included: antivirus checker (~250 million pages/day), archive statistics, text summarizer, archive exports (PDF, PNG, TIFF), etc.|
|Web Archive Switzerland||Y||Y||Y||No||Web Archive Switzerland is the collection of the Swiss National Library containing websites with a bearing on Switzerland. It has been integrated into e-Helvetica, the access system of the Swiss National Library, which gives access to the entire digital collection. Full-text search is available for part of the Web Archive. The archived versions of websites can only be viewed in the reading rooms of the Swiss National Library and of the partner libraries that help build the collection of Swiss websites, but the metadata of the archived versions can be viewed from anywhere.|
|NTU Web Archiving System, NTUWAS||Y||Y||Y||No||Presents page thumbnails, archived pages mapped to geographical locations.|
|Web Archive Taiwan||Y||Y||Y||No|
|PageFreezer||Y||Y||Y||No||Enterprise Class On Demand service to archive and replay websites, blogs, Ajax, Flash, video, audio & social media for litigation protection, eDiscovery and regulatory compliance with FDA, FINRA, FSA, SEC, SOX, Federal Rules of Evidence and records management laws. Used by government agencies and public listed corporations in Pharmaceutical, Food, Finance, Healthcare and Retail industry.|
|The UK Web Archive||Y||Y||N||Native|
|Hanzo Archives||Y||Y||Y||No||Commercial web archiving services and appliances. Access includes full-text search, annotations, redaction, URL/History, archive policy and temporal browsing, and configurable metadata schema for advanced e-discovery applications. Used in government and corporations whose compliance or legal obligations / needs extend to their websites, intranet, and social media. Many 'dark' archives across Europe and USA.|
|UK Government Web Archive (UKGWA)||Y||Y||Y||Native||Full-text search is operational on the UK Government Web Archive (UKGWA). Users can browse the collection using a full A-Z list of all sites.|
|Internet Archive (provides Archive-it service)||Y||Y||Y||Native||URL history is available for all archived data. Metadata and full-text search are available only for selected crawls. Until 2002 it offered a mining platform for research consisting of the Alexa Shell Perl Tools.|
|Columbia University Libraries Web Resources Collection Program||Y||Y||Y||No||Accessible through Archive-it service.|
|North Carolina State Government Web Site Archives||Y||Y||Y||No||Accessible through Archive-it service.|
|Latin American Web Archiving Project||Y||Y||Y||No||Content can be accessed via full-text search, or by browsing by country or by specialized sample collection.|
|Web Archiving Project for the Pacific Islands||Y||Y||Y||No||Supported by Archive-it service.|
|Library of Congress Web Archives||Y||Y||N||Proxy||Access provided via LCWA. Records in MODS (Metadata Object Descriptive Schema) format.|
|Harvard University Library: the Web Archive Collection Service (WAX)||Y||Y||Y||No|
|Web Archiving Service from California Digital Library (WAS service)||Y||Y||Y||No||Access for private study, scholarship and research. Most archives built with WAS have not yet been published because it is up to the partners to decide if they want to provide access. There are 16 partners using the service and they have created over 80 web archives, only 30 are publicly accessible. NutchWAX performance did not permit full archive search. Upcoming transition to SOLR will permit both full archive and collection-specific full text search.|
|Bentley Historical Library (University of Michigan) Web Archives||Y||Y||Y||No||Powered by the WAS from the California Digital Library. Access is public but usage is restricted for private study, scholarship and research.|
|University of Texas at San Antonio Web Archives||Y||Y||Y||Native||Accessible through Archive-it service and the Texas Archival Repositories Online database|
|AUEB Web Archive||Y||Y||Y||No|
|World Bank Web Archives||Y||Y||Y||No||URL history provided via open access to collection via standard web browser. Full text search is only available within each individual site. Search on metadata is available via advanced search within Web Archives collection.|
|University of North Texas CyberCemetery||N||Y||Y||No|
|Alabama State Government and Politics Web Site and Social Media Archives||United States||2005||Archive-it service||No|
|Tamiment Library and Robert F. Wagner Labor Archives at New York University||Y||Y||Y||No||Access is provided through the WAS service as well as through finding aids that are searchable through NYU's finding aids portal.|
|York University Digital Library||Y||Y||Y|
|Netherlands Institute for Sound and Vision (Sound and Vision) web archive||Y||Y||N||Selected sites for which agreements have been made are publicly available. Full text indexing is done with Elasticsearch, the front-end is built in Drupal.|
|Kentucky Department for Libraries and Archives||Y||Y||Y||No||Full open access|
|University of California, San Francisco Library||Y||Y||Y||Native (through IA)||Both capture and access for archived content are provided by the Archive-It service, so all capabilities are the same as for Archive-It.|
|Ivy Plus Libraries||Y||Y||Y||No||Accessible through Archive-It service.|