Software Heritage

Software Heritage is an initiative whose goal is to collect, preserve, and share software code—both freely licensed and not—in a universal software storage archive.[1][2]

Software Heritage
Formation: June 30, 2016
Founders: Roberto Di Cosmo, Stefano Zacchiroli
Scientific advisors: Gérard Berry, Jean-François Abramatic, Serge Abiteboul
2 interns



The initiative formally started in 2015, after about two years of work as a research project. Software Heritage began public operations on June 30, 2016.[3] It was formed under the auspices of the French Institute for Research in Computer Science and Automation (Inria), a French research institute that hosts the initiative on its servers. Inria is providing a budget of €500,000 over three years for the project.[4]

Software Heritage was founded by computer scientists Roberto Di Cosmo and Stefano Zacchiroli.[5] Its repository holds over 20 million software projects, with an archive of over 2.7 billion unique source files as of July 2016.[3]

Additional sponsors of the Software Heritage initiative include Microsoft and Data Archiving and Networked Services (DANS), an institute of the Royal Netherlands Academy of Arts and Sciences and the Netherlands Organisation for Scientific Research.[6][7] Creative Commons, the Free Software Foundation, GitHub, Jason Scott, the Linux Foundation, and Microsoft, among others, have endorsed the project.[8]


Software Heritage's goal is to preserve free/open source software (FOSS) in its original source code form.[9] The initiative focuses on collecting, preserving, and sharing software from across the cultural heritage, industry, education, science, and research communities, out of concern that the technical and scientific knowledge embodied in software will be lost without preservation.[1] The project came about because software code is seen as even more vulnerable to corruption and obsolescence than typical archival holdings such as books, video, and film.[6]

The interface is built using open source code, with an initial focus on search: end users look up content by its SHA-1 hash.[6] The Software Heritage initiative is open to scientific researchers, with the idea that it would serve as a Library of Alexandria for software.[8][10] Software Heritage is also intended as an infrastructure resource on which developers can build applications.[2] Another goal is to get guidance from researchers on which features would be valuable for structuring output and curating collections.[6]
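Looking up content by SHA-1 hash means computing that hash first. The following is a minimal Python sketch using only the standard library's hashlib module. The function names are illustrative, and the sketch hashes raw file bytes only; it makes no assumption about how the archive itself derives or indexes its identifiers.

```python
import hashlib


def sha1_of_bytes(data: bytes) -> str:
    """Return the hexadecimal SHA-1 digest of raw content."""
    return hashlib.sha1(data).hexdigest()


def sha1_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in fixed-size chunks so large files need not fit in memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Because the digest depends only on the bytes, two byte-identical files produce the same 40-character hexadecimal string regardless of name or location, which is what makes a hash usable as a search key.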

Other grass-roots initiatives exist, such as archivist Jason Scott's project, the Code Archive (which attempts to archive GitHub),[11][12][13] and the Internet Archive's Wayback Machine.[8] Software Heritage gathers freely licensed software from sources that include GitHub, the Debian package archive, and the GNU Project FTP archive, as well as from services that no longer exist, such as Gitorious and Google Code.[1]

The archive is structured to preserve knowledge and enable continuous access to digital information, and to serve as a building block for thematic portals and collections of software. It can also help industry create better software, since original software has often been lost. Software Heritage aims to ensure long-term preservation of software and to make software provenance more traceable, integrated, and reusable, making it possible to determine licensing (which is not always stated) and use constraints, track security vulnerabilities, and discover prior code assets.[1][2]

See also


  1. Brown, Paul (30 June 2016). "Software Heritage: Creating a safe haven for software". Boing Boing. Retrieved 26 July 2016.
  2. Dandrimont, Nicolas (16 July 2016). "Software Heritage: Building the Universal Software Archive by Nicolas Dandrimont at DebConf16" (Video presentation). Software Heritage. Retrieved 26 July 2016.
  3. "Collect, organise, preserve and share the Software Heritage of mankind" (PDF). Software Heritage. 30 June 2016. Retrieved 26 July 2016.
  4. Jost, Clémence (1 July 2016). "Open source: lancement de Software Heritage, la plus grande bibliothèque de codes source de la planète". Archimag (in French). Retrieved 27 July 2016.
  5. Moody, Glyn (30 June 2016). "Software Heritage, the 'Library of Alexandria of software,' launches today". Ars Technica. Retrieved 26 July 2016.
  6. Willis, Nathan (7 July 2016). "Preserving the global software heritage". Retrieved 26 July 2016.
  7. "DANS and Inria sign the MoU for Software Heritage". Data Archiving and Networked Services (DANS). 5 July 2016. Retrieved 26 July 2016.
  8. Brogan, Jacob (30 June 2016). "Introducing Software Heritage, the Library of Alexandria for Code". Slate. Retrieved 26 July 2016.
  9. Schießle, Björn (4 July 2016). "Software Heritage – Erhalt eines Kulturerbes" (in German). Retrieved 26 July 2016.
  10. Moreira, Enrique (4 July 2016). "Bienvenue dans la bibliothèque d'Alexandrie du logiciel". Les Échos (in French). Retrieved 27 July 2016.
  11. "The Code Archive". The Code Archive. Retrieved 26 July 2016.
  12. Valsorda, Filippo; Aljammaz, Salman (23 July 2016). "The Code Archive - from HOPE XI - NOETHER" (Video presentation). HOPE XI. Retrieved 26 July 2016.
  13. Valsorda, Filippo; Aljammaz, Salman (23 July 2016). "The Code Archive - HOPE XI". The Code Archive. Retrieved 26 July 2016.

External links