The Reference Sequence (RefSeq) database[1] is an open-access, annotated and curated collection of publicly available nucleotide sequences (DNA, RNA) and their protein products. The database is built by the National Center for Biotechnology Information (NCBI) and, unlike GenBank, provides only a single record for each natural biological molecule (i.e., DNA, RNA or protein) for major organisms ranging from viruses to bacteria to eukaryotes.

Description: curated non-redundant sequence database of genomes.
Research center: National Center for Biotechnology Information
Primary citation: Pruitt KD et al. (2005)[1]

For each model organism, RefSeq aims to provide separate and linked records for the genomic DNA, the gene transcripts, and the proteins arising from those transcripts. RefSeq is limited to major organisms for which sufficient data are available (more than 66,000 distinct “named” organisms as of September 2011),[2] while GenBank includes sequences for any organism submitted (approximately 250,000 different named organisms).

RefSeq categories

Category  Description
NC        Complete genomic molecules
NG        Incomplete genomic region
NP        Protein
XM        Predicted mRNA model
XR        Predicted ncRNA model
XP        Predicted protein model (eukaryotic sequences)
WP        Predicted protein model (prokaryotic sequences)

For more details and more categories, see Table 1 in Chapter 18 of the book The Reference Sequence (RefSeq) Database.
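
The category prefixes listed above can be used to classify an accession programmatically. The following is a minimal sketch in Python, not an official NCBI tool. It assumes that a RefSeq accession has the shape of a two-letter prefix, an underscore, a run of digits and an optional version suffix (for example NC_000001.11). The function name `classify_accession` and the lookup table are hypothetical, and the table covers only the categories listed in this article.

```python
import re

# Prefix-to-description map, limited to the categories listed in this article.
REFSEQ_CATEGORIES = {
    "NC": "Complete genomic molecules",
    "NG": "Incomplete genomic region",
    "NP": "Protein",
    "XM": "Predicted mRNA model",
    "XR": "Predicted ncRNA model",
    "XP": "Predicted protein model (eukaryotic sequences)",
    "WP": "Predicted protein model (prokaryotic sequences)",
}

# Assumed accession shape: two uppercase letters, an underscore,
# one or more digits, and an optional ".version" suffix.
_ACCESSION_RE = re.compile(r"^([A-Z]{2})_(\d+)(?:\.(\d+))?$")


def classify_accession(accession):
    """Return the category description for an accession, or None.

    None is returned both for strings that do not match the assumed
    shape and for prefixes that are not in the table above.
    """
    match = _ACCESSION_RE.match(accession.strip())
    if match is None:
        return None
    return REFSEQ_CATEGORIES.get(match.group(1))


if __name__ == "__main__":
    for acc in ("NC_000001.11", "XP_123456", "not-an-accession"):
        print(acc, "->", classify_accession(acc))
```

Prefixes that exist in RefSeq but are not listed in the table above return None in this sketch, so the table would need extending for fuller coverage.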



  1. Pruitt KD, Tatusova T, Maglott DR (2005). "NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins". Nucleic Acids Res. 33 (Database issue): D501-4. doi:10.1093/nar/gki025. PMC 539979. PMID 15608248.
  2. RefSeq Release 80 Statistics (Report). National Library of Medicine. 2017. Retrieved 13 January 2017.

