Software Heritage is a non-profit organization that provides a service for archiving and referencing historical and contemporary software, with a focus on human-readable source code. The site was unveiled in 2016 by Inria[1] and is supported by UNESCO.[2][3][4] The project is structured as a non‑profit, multi‑stakeholder initiative.

Software Heritage
Formation: June 30, 2016
Founders: Roberto Di Cosmo, Stefano Zacchiroli
Type: Non‑profit
Headquarters: Inria
Scientific advisors: Gérard Berry, Jean-François Abramatic, Julia Lawall, Serge Abiteboul
Affiliations: Inria
Staff: 13
Website: softwareheritage.org

Overview

The stated mission of Software Heritage is to collect, preserve and share all software that is publicly available in source code form, with the goal of building a common, shared infrastructure at the service of industry, research, culture and society as a whole.[5]

Source code is collected by crawling code hosting platforms, such as GitHub, GitLab.com and Bitbucket, and package archives, such as npm and PyPI. It is then ingested into a Merkle DAG, the data structure at the core of the archive.[6] Each artifact in the archive is associated with an identifier called a SWHID.[7] In 2023, the expansion of the acronym SWHID was changed from "Software Heritage identifier" to "software hash identifier".
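As an illustration of how a hash-based identifier can be derived from an artifact, the following minimal Python sketch computes an identifier for a single file's content. It assumes that content objects are hashed in the same way Git hashes blobs and that the result is prefixed with `swh:1:cnt:`. The function name `content_swhid` is hypothetical and is not part of any Software Heritage tool.

```python
import hashlib


def content_swhid(data: bytes) -> str:
    """Return a SWHID-style identifier for the raw bytes of one file.

    Assumption: the hash is a Git-compatible blob hash, that is, SHA-1
    over the header "blob <length>" plus a NUL byte, followed by the data.
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    digest = hashlib.sha1(header + data).hexdigest()
    return "swh:1:cnt:" + digest


if __name__ == "__main__":
    first = content_swhid(b"print('hello')\n")
    copy = content_swhid(b"print('hello')\n")
    other = content_swhid(b"print('bye')\n")
    # Identical bytes always map to the same identifier.
    print(first == copy)
    # Different bytes map to different identifiers.
    print(first == other)
```

Because the identifier depends only on the bytes of the file, two copies of the same file found on different platforms map to the same node, which is the property that lets a Merkle DAG store each unique file only once.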

To increase the chances of preserving the Software Heritage archive over the long term, a mirror program was established in 2018. As of October 2020, ENEA[8] and FossID[9] had joined it.

History

Development of Software Heritage began at Inria under the direction of computer scientists Roberto Di Cosmo and Stefano Zacchiroli in early 2015,[10] and the project was officially announced to the public on June 30, 2016.[1][11]

In 2017, Inria signed an agreement with UNESCO for the long-term preservation of software source code and for making it widely available, in particular through the Software Heritage initiative.[12]

In June 2018, the Software Heritage Archive[6] was opened at UNESCO headquarters.[2]

On July 4, 2018, Software Heritage was included in the French National Plan for Open Science.[13]

In October 2018, the strategy and vision underlying the mission of Software Heritage were published in Communications of the ACM.[5]

In November 2018, a group of forty international experts met at the invitation of Inria and UNESCO,[14] which led to the publication in February 2019 of Paris Call: Software Source Code as Heritage for Sustainable Development.[15]

In November 2019, Inria signed an agreement with GitHub to improve the archival process for GitHub-hosted projects in the Software Heritage archive.[16]

As of October 2020, Software Heritage’s repository held over 143 million software projects in an archive of over 9.1 billion unique source files.[6]

Funding

Software Heritage is a non-profit organization funded largely by donations from sponsors, which include private companies, public bodies and academic institutions.[17]

Software Heritage also seeks funding for third parties interested in contributing to its mission. A grant from NLnet[18] funded the work of Octobus[19] and Tweag[20], which led to the rescue of 250,000 Mercurial repositories that were being phased out from Bitbucket.[21]

A grant from the Alfred P. Sloan Foundation funds experts to develop new connectors that expand the coverage of the Software Heritage Archive.[22]

Development and community

The Software Heritage infrastructure is built transparently and collaboratively. All the software developed in the process is released as free and open-source software.[23] An ambassador program was announced in December 2020 with the stated goal of growing the community of users and contributors.[24]

Awards

In 2016, Software Heritage received the best community project award at the Paris Open Source Summit.[25][26]

In 2019, Software Heritage received the Academic Initiative award from Pôle Systematic.[27]

References

  1. "Collect, organise, preserve and share the Software Heritage of mankind" (PDF). Software Heritage. 30 June 2016. Retrieved 26 July 2016.
  2. UNESCO (14 November 2019). "Software Heritage". Retrieved 2 November 2020.
  3. Brown, Paul (30 June 2016). "Software Heritage: Creating a safe haven for software". Boing Boing. Retrieved 26 July 2016.
  4. Jost, Clémence (1 July 2016). "Open source: lancement de Software Heritage, la plus grande bibliothèque de codes source de la planète" (in French). Archimag. Retrieved 27 July 2016.
  5. Abramatic, Jean-François; Di Cosmo, Roberto; Zacchiroli, Stefano (1 October 2018). "Building the Universal Archive of Source Code". Communications of the ACM. Retrieved 2 November 2020.
  6. "Software Heritage Archive". Retrieved 2 November 2020.
  7. "Software Heritage Persistent Identifiers". Software Heritage. Retrieved 2 November 2020.
  8. "At ENEA the first institutional mirror of Software Heritage". ENEA. Archived from the original on 16 November 2020. Retrieved 2 November 2020.
  9. "FossID establishes first independent mirror of world's largest source code archive". FossID. 6 December 2018. Archived from the original on 23 September 2020. Retrieved 2 November 2020.
  10. Moody, Glyn (30 June 2016). "Software Heritage, the 'Library of Alexandria of software,' launches today". Ars Technica. Retrieved 26 July 2016.
  11. Brogan, Jacob (30 June 2016). "Introducing Software Heritage, the Library of Alexandria for Code". Slate. Retrieved 26 July 2016.
  12. UNESCO (3 April 2020). "Discours de la Directrice générale de l'UNESCO, Irina Bokova, à l'occasion de la signature de l'accord entre l'UNESCO et INRIA portant sur la préservation et le partage du patrimoine logiciel" (Press release) (in French). Paris, France: UNESCO. Retrieved 3 November 2020. Bokova, IG, Director-General, 2009–2017.
  13. "National Plan for Open Science" (PDF). Ouvrir La Science. Archived from the original (PDF) on 1 July 2021. Retrieved 2 November 2020.
  14. "Experts call for greater recognition of software source code as heritage for sustainable development" (Press release). Paris, France: UNESCO. 16 November 2020. Retrieved 2 November 2020.
  15. "Paris Call on software source code as heritage for sustainable development". Paris: UNESCO. February 2019. Retrieved 2 November 2020.
  16. "GitHub Archive Program". November 2019. Retrieved 2 November 2020.
  17. "Software Heritage Sponsors". Retrieved 2 November 2020.
  18. "NLNet Software Heritage grant". Retrieved 2 November 2020.
  19. "Augmenting Software Heritage archiving capabilities". Retrieved 2 November 2020.
  20. "Long-term reproducibility with Nix and Software HERITAGE". Retrieved 2 November 2020.
  21. "Announcing the Mercurial public Bitbucket archive". Retrieved 2 November 2020.
  22. Sloan Foundation. "Excited to support Software Heritage". Retrieved 2 November 2020.
  23. "Software Heritage licensing". Retrieved 25 February 2021.
  24. "Software Heritage Ambassadors". Retrieved 25 February 2021.
  25. "Les Acteurs du Libre - Précédents Lauréats" (in French). Archived from the original on 18 January 2019. Retrieved 8 May 2020.
  26. "Paris Open Source Summit 2016 : Prix Acteurs du Libre : et les gagnants sont..." Programmez! (in French). 17 November 2016. Retrieved 28 June 2019.
  27. @Pole_Systematic (27 June 2019). "Convention @Pole_Systematic le Trophée Prix Initiative académique est remis @SWHeritage" (Tweet) (in French) – via Twitter.
