User:Daniel Mietchen/Talks/Archiving 2014/Web Harvesting and Archiving with and for the Crowd, Including Bots

A 3D model of a fossil jaw, automatically imported from PubMed Central into Wikimedia Commons.

Reuse of multimedia files from PubMed Central on Wikimedia Commons

 
Uploads by the Open Access Media Importer to Wikimedia Commons between July 2012 and March 2014

The Open Access Media Importer Bot is a script that crawls the Open Access subset of PubMed Central, a database of biomedical literature, to find video and audio files whose licenses are compatible with reuse on Wikimedia platforms. When it finds such materials, it uploads them to Wikimedia Commons, the media repository shared by all Wikipedias and their sister projects. The digital media collection created by the bot now comprises about 15,000 files and is curated by the volunteer community at Wikimedia Commons. In this talk, I will use the bot as an example to highlight the reusability of digital archives and collections, the importance of open licensing and metadata standards, and the opportunities for community involvement.
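The selection step described above (keep only audio and video files whose license permits reuse on Wikimedia platforms) could be sketched roughly as follows. This is a minimal illustration and not the bot's actual code: the `SupplementaryFile` record, its fields, the helper names and the list of accepted license URL prefixes are all assumptions made for the example.

```python
from dataclasses import dataclass

# Assumption: license URL prefixes (without scheme) treated as compatible
# with Wikimedia Commons. Illustrative only; the real bot's list may differ.
COMPATIBLE_LICENSE_PREFIXES = (
    "creativecommons.org/licenses/by/",
    "creativecommons.org/licenses/by-sa/",
    "creativecommons.org/publicdomain/zero/",
)

# Only audio and video files are of interest here.
MEDIA_TYPE_PREFIXES = ("audio/", "video/")


@dataclass
class SupplementaryFile:
    """Hypothetical record for one supplementary file of an article."""
    url: str
    mime_type: str
    license_url: str


def strip_scheme(url):
    """Drop 'http://' or 'https://' so both forms compare equal."""
    return url.split("://", 1)[-1]


def is_importable(f):
    """True if the file is audio/video and its license is on the list."""
    is_media = f.mime_type.startswith(MEDIA_TYPE_PREFIXES)
    is_free = strip_scheme(f.license_url).startswith(COMPATIBLE_LICENSE_PREFIXES)
    return is_media and is_free


def select_importable(files):
    """Filter a list of records down to the upload candidates."""
    return [f for f in files if is_importable(f)]
```

In this sketch, a video under a non-commercial license or a PDF under an open license would both be skipped; only files passing both checks are kept as upload candidates.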

Links

Outlook

 
This image of Xanthichthys ringens is sourced from an open-access scholarly article licensed for re-use.
How can we make that reusability explicit when citing this source in Wikipedia articles?[1]
For further details, see this Signpost op-ed.

Reference

Note the icons and links complementing the bibliographic information.

  1. Williams, J. T.; Carpenter, K. E.; Van Tassell, J. L.; Hoetjes, P.; Toller, W.; Etnoyer, P.; Smith, M. (2010). Gratwicke, Brian (ed.). "Biodiversity Assessment of the Fishes of Saba Bank Atoll, Netherlands Antilles". PLoS ONE. 5 (5): e10676. doi:10.1371/journal.pone.0010676. PMC 2873961. PMID 20505760.   CC0   full text   media   metadata

A million first steps: crowdsourcing the creation of metadata

 
The town hall of Jena before 1755
 
Salaga in 1892
 
Traditional clothing
 
Volcanic eruption giving birth to the island of Ferdinandea

In December 2013, the British Library released a set of more than one million images on Flickr. The images had been extracted automatically from scans of Public Domain works in the library's collection. The only metadata available for these images were those pertaining to the scanned work, plus the page number. With the release, the library hoped to crowdsource the generation of more specific metadata that describe the content of the images rather than their bibliographic location. In this talk, I will review the progress of the initiative over the course of the six months since then, paying special attention to metadata generated through the integration of these images into Wikimedia Commons, and placing the project in the context of a range of other large-scale releases of media files onto Wikimedia Commons and Wikisource.

Media coverage

Similar releases

Wikidata: the database anyone can edit

 
Wikidata: A free knowledge base that can be read and edited by humans and machines alike
 
Wikidata sample statement: place of birth of Douglas Adams.
 
Wikidata Subclass of (p279) tree for mineral (q7946).
 
Linked Open Data cloud

Wikipedia exists in over 280 languages, and the content in each of these language editions has traditionally been curated largely independently of the others. In late 2012, Wikidata was added to the ecosystem of Wikimedia platforms. Much like Wikimedia Commons acts as a common repository for media used across Wikimedia projects, Wikidata acts as a common repository for data. Starting out with data about which Wikipedia articles exist in which languages, the platform is steadily expanding its scope to include other kinds of data and a wider range of properties. While many ontologies have been created with strong community involvement, the Wikidata approach differs in that it is not limited to specific domains and in that it allows anyone to join in, expert or not.
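To make the idea of a shared, language-independent data repository more concrete, a Wikidata statement can be thought of as an item, a property and a value, with the labels in each language stored separately. The sketch below models the sample statement about Douglas Adams's place of birth with plain Python data. The identifiers Q42 (Douglas Adams), P19 (place of birth) and Q350 (Cambridge) are Wikidata identifiers; the `Statement` class, the `LABELS` table and the `render` helper are hypothetical names chosen for illustration and are not part of any Wikidata API.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One claim: (item, property, value), all given as identifiers."""
    item: str
    prop: str
    value: str


# Small hand-written label table standing in for the labels that
# Wikidata stores per language (illustrative excerpt only).
LABELS = {
    "Q42": {"en": "Douglas Adams", "de": "Douglas Adams"},
    "P19": {"en": "place of birth", "de": "Geburtsort"},
    "Q350": {"en": "Cambridge", "de": "Cambridge"},
}


def render(statement, lang):
    """Return the statement with labels in `lang`, falling back to raw IDs."""
    return tuple(LABELS.get(part, {}).get(lang, part) for part in statement)


# The statement is stored once, independently of any language.
birthplace = Statement("Q42", "P19", "Q350")

print(render(birthplace, "en"))
print(render(birthplace, "de"))
```

The point of the design is that the statement itself is stored once, while each language edition only needs its own labels in order to display it.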

Links

About

This page belongs to a talk given on May 15, 2014, as part of Archiving 2014, which took place from May 13 to May 16, 2014, in Berlin.

Abstracts of the three originally proposed talks that have been merged into this one:


Licensing

Text displayed on this page is available under a Creative Commons CC0 waiver / Public Domain dedication. The licensing of embedded media or code, or of templates used to display text here, may differ, but all are compatible with the Open Definition as well as Wikipedia's default license, the Creative Commons Attribution-ShareAlike License 3.0.

Contact