Data citation

Data citation is the provision of accurate, consistent and standardised referencing for datasets, just as bibliographic citations are provided for other published sources such as research articles or monographs. Typically, the well-established Digital Object Identifier (DOI) system is used: a DOI takes users to a website that contains the metadata on the dataset and the dataset itself.[1][2]
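As an illustration of the DOI-based approach, the following minimal Python sketch formats a citation string from dataset metadata and builds the resolver link. The `DatasetRecord` class, its field names, the element order of the citation, and the sample DOI `10.1234/example.5678` are all hypothetical and chosen only for illustration; the only external fact assumed is that `https://doi.org/` followed by a DOI resolves to the identified object's landing page.

```python
from dataclasses import dataclass
from typing import Optional, Tuple
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"  # public DOI resolver prefix


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical minimal metadata needed to cite a dataset."""
    creators: Tuple[str, ...]
    year: int
    title: str
    publisher: str
    doi: str
    version: Optional[str] = None


def doi_url(doi: str) -> str:
    """Turn a bare DOI into a resolvable link, percent-encoding unsafe characters."""
    return DOI_RESOLVER + quote(doi, safe="/")


def format_citation(rec: DatasetRecord) -> str:
    """Assemble creators, year, title, optional version, publisher and DOI link."""
    authors = "; ".join(rec.creators)
    parts = [f"{authors} ({rec.year}). {rec.title}."]
    if rec.version:
        parts.append(f"Version {rec.version}.")
    parts.append(f"{rec.publisher}.")
    parts.append(doi_url(rec.doi))
    return " ".join(parts)


# Illustrative record; every value here is made up.
record = DatasetRecord(
    creators=("Doe, J.", "Roe, R."),
    year=2020,
    title="Example Survey Data",
    publisher="Example Data Archive",
    doi="10.1234/example.5678",
    version="2.1",
)
print(format_citation(record))
```

Keeping the version as a separate, optional element means that a revised release of the same dataset can be cited distinctly from the original.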

[Figure: a data citation example]

A 2011 paper reported that it was not possible to determine how often data citation happened in the social sciences.[3]

Papers published in 2012 and 2013 reported that data citation was becoming more common but that the practice was not yet standardised.[4][5][6]

In 2014, FORCE11 published the Joint Declaration of Data Citation Principles, which covers the purpose, function and attributes of data citation.[7]

In October 2018, Crossref expressed its support for cataloguing datasets and recommending their citation.[8]

In April 2019, the data-oriented journal Scientific Data reported that it would now use data citations.[9]

A June 2019 paper suggested that increased data citation would make the practice more valuable for everyone, both by encouraging data sharing and by increasing the prestige of people who share.[10]

Data citation is an emerging topic in computer science, where it has been defined as a computational problem.[11] Citing data poses significant challenges to computer scientists; the main problems to address are related to:[12]

  • the use of heterogeneous data models and formats – e.g., relational databases, Comma-Separated Values (CSV), Extensible Markup Language (XML),[13][14] and Resource Description Framework (RDF);[15]
  • the transience of data;
  • the necessity to cite data at different levels of coarseness – i.e., deep citations;[16]
  • the necessity to automatically generate citations to data with variable granularity.
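The last two points can be made concrete with a small Python sketch that generates a citation either for a whole dataset or for a subset selected by a predicate. This is only an illustrative toy and is not the method of any of the cited papers: the function names, the citation wording, and the use of a SHA-256 fingerprint together with a version label to pin down transient data are all assumptions made for this example.

```python
import hashlib
import json
from typing import Callable, Dict, List, Optional

Row = Dict[str, str]


def subset_fingerprint(rows: List[Row]) -> str:
    """Order-independent SHA-256 digest of the given rows."""
    canonical = sorted(json.dumps(row, sort_keys=True) for row in rows)
    return hashlib.sha256("\n".join(canonical).encode("utf-8")).hexdigest()


def cite(
    title: str,
    version: str,
    rows: List[Row],
    select: Optional[Callable[[Row], bool]] = None,
    description: str = "",
) -> str:
    """Cite the whole dataset (select is None) or the subset matched by select."""
    base = f"{title}, version {version}."
    if select is None:
        return base  # coarse-grained citation
    chosen = [row for row in rows if select(row)]
    digest = subset_fingerprint(chosen)[:12]  # shortened for readability
    return (
        f"{base} Subset: {description} "
        f"({len(chosen)} of {len(rows)} records, sha256:{digest})."
    )


# Made-up data standing in for a CSV file or a relational table.
data = [
    {"country": "IT", "year": "2015", "value": "1"},
    {"country": "IT", "year": "2016", "value": "2"},
    {"country": "FR", "year": "2015", "value": "3"},
]
print(cite("Example Indicators", "1.0", data))
print(cite("Example Indicators", "1.0", data,
           select=lambda row: row["country"] == "IT",
           description="country = IT"))
```

Because the fingerprint depends only on the content of the selected rows, a reader who re-runs the selection against a later state of the data can detect whether the cited subset has changed.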

References


  1. Australian National Data Service: Data Citation Awareness. Archived 2012-03-07 at the Wayback Machine (accessed 20 March 2012).
  2. Ball, A., Duke, M. (2011). ‘Data Citation and Linking’. DCC Briefing Papers. Edinburgh: Digital Curation Centre. Available online:
  3. Mooney, Hailey (April 2011). "Citing data sources in the social sciences: do authors do it?". Learned Publishing. 24 (2): 99–108. doi:10.1087/20110204. S2CID 34513423.
  4. Edmunds, Scott C.; Pollard, Tom J.; Hole, Brian; Basford, Alexandra T. (2012-07-02). "Adventures in data citation: sorghum genome data exemplifies the new gold standard". BMC Research Notes. 5 (1): 223. doi:10.1186/1756-0500-5-223. ISSN 1756-0500. PMC 3392744. PMID 22571506.
  5. "Out of Cite, Out of Mind: The Current State of Practice, Policy, and Technology for the Citation of Data". Data Science Journal. 12: CIDCR1–CIDCR75. 2013. doi:10.2481/dsj.OSOM13-043.
  6. Mooney, Hailey; Newton, Mark P. (2012). "The Anatomy of a Data Citation: Discovery, Reuse, and Credit". Academic Commons. Columbia University. doi:10.7916/D8MW2STM.
  7. Data Citation Synthesis Group (2014). Martone, M. (ed.). "Joint Declaration of Data Citation Principles". San Diego: Force11 Scholarly Communication Institute. doi:10.25490/a97f-egyk.
  8. Lin, Jennifer (4 October 2018). "Data citation: let's do this". Crossref.
  9. "Data citation needed". Scientific Data. 6 (1): 27. 10 April 2019. doi:10.1038/s41597-019-0026-5. PMC 6472333. PMID 30971699.
  10. Pierce, Heather H.; Dev, Anurupa; Statham, Emily; Bierer, Barbara E. (4 June 2019). "Credit data generators for data reuse". Nature. 570 (7759): 30–32. doi:10.1038/d41586-019-01715-4. PMID 31164773. S2CID 174809246.
  11. Buneman, P., Davidson, S. and Frew, J. (2016). ‘Why data citation is a computational problem’. Communications of the ACM, September 2016, Vol. 59 No. 9, pp. 50-57.
  12. Silvello, G. (2018). ‘Theory and Practice of Data Citation’. Journal of the Association for Information Science and Technology (JASIST) (AIS Review), vol. 69 issue 1, pp. 6-20, 2018. Available online (open access):
  13. Buneman, P. and Silvello, G. (2010). ‘A Rule-Based Citation System for Structured and Evolving Datasets’. IEEE Bulletin of the Technical Committee on Data Engineering, Vol. 3, No. 3. IEEE Computer Society, pp. 33-41, September 2010. Available online:
  14. Silvello, G. (2017). ‘Learning to Cite Framework: How to Automatically Construct Citations for Hierarchical Data’. Journal of the Association for Information Science and Technology (JASIST), Volume 68 issue 6, pp. 1505-1524, June 2017. Available online:
  15. Silvello, G. (2015). ‘A Methodology for Citing Linked Open Data Subsets’. D-Lib Magazine 21 (1/2), 2015. Available online:
  16. Buneman, P. (2006). ‘How to Cite Curated Databases and how to Make Them Citable’. In Proc. of the 18th International Conference on Scientific and Statistical Database Management, SSDBM 2006, pages 195–203, 2006.