Earth Microbiome Project

The Earth Microbiome Project (EMP) is an initiative founded by Janet Jansson, Jack Gilbert and Rob Knight in 2010 to collect natural samples and analyze microbial communities around the globe.[1]

Formation: 2010
Website: https://earthmicrobiome.org/

Microbes are highly abundant, diverse, and important in ecology. Yet as of 2010, it was estimated that the total global environmental DNA sequencing effort had sequenced less than 1 percent of the total DNA found in a liter of seawater or a gram of soil,[2] and the specific interactions between microbes were largely unknown.

The EMP aims to process as many as 200,000 samples from different biomes, creating a complete database of microbes on Earth to characterize environments and ecosystems by microbial composition and interaction.[3]

Actors

The non-governmental international project was launched in 2010. As of January 2018, it listed 161 institutions, all of them universities or university-affiliated institutions except for IBM Research and the Atlanta Zoo. Funding has come from the John Templeton Foundation, the W. M. Keck Foundation, Argonne National Laboratory (U.S. Department of Energy), the Australian Research Council, the Tula Foundation, and the Samuel Lawrence Foundation. Companies have provided in-kind support, including MO BIO Laboratories, Luca Technologies, Eppendorf, Boreal Genomics, Illumina, Roche and Integrated DNA Technologies.[4]

Goals

The primary goal of the Earth Microbiome Project (EMP) has been to survey microbial composition in many environments across the planet, across time as well as space, using a standard set of protocols.[1] Standardized protocols reduce the variation and bias in analytical pipelines that complicate comparison of microbial community structures.[5]

Another important goal is to determine how the reconstruction of microbial communities is affected by analytic biases. The rate of technological advancement is rapid, and it is necessary to understand how data using updated protocols will compare with data collected using earlier techniques. Information from this project will be archived in a database to facilitate analysis. Other outputs will include a global atlas of protein function and a catalog of reassembled genomes classified by their taxonomic distributions.[5]

Methods

Standard protocols for sampling, DNA extraction, 16S rRNA amplification, 18S rRNA amplification, and "shotgun" metagenomics have been developed or are under development.[6]

Sample collection

Samples will be collected using appropriate methods from various environments including the deep ocean, fresh water lakes, desert sand, and soil. Standardized collection protocols will be used when possible, so that the results are comparable. Microbes from natural samples cannot always be cultured. Because of this, metagenomic methods will be employed to sequence all the DNA or RNA in a sample in a culture-independent fashion.

Wet lab

The wet lab usually needs to perform a series of procedures to select and purify the microbial portion of the samples. The purification process may differ considerably depending on the type of sample. DNA will be extracted from soil particles, or microbes will be concentrated using filtration techniques. In addition, various amplification techniques may be used to increase DNA yield. For example, non-PCR-based multiple displacement amplification is preferred by some researchers. To avoid bias, DNA extraction, the use of primers, and PCR must all follow carefully standardized protocols.[5]

Sequencing

Researchers can sequence a metagenomic sample using two main approaches, depending on the biological question. To identify the types and abundances of organisms present, the preferred approach is to target and amplify a specific gene that is highly conserved among the species of interest, often the 16S ribosomal RNA gene for bacteria and the 18S ribosomal RNA gene for protists. This approach is called "deep sequencing" and allows rare species to be identified in a sample. However, it does not enable assembly of whole genomes, nor does it provide information on how organisms may interact with each other. The second approach is shotgun metagenomics, in which all the DNA in the sample is sheared and the fragments are sequenced. In principle, this approach allows for the assembly of whole microbial genomes and the inference of metabolic relationships. However, if most microbes in a given environment are uncharacterized, de novo assembly will be computationally expensive.[7]
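As a rough illustration of how the "types and abundances" question is answered once each marker-gene read has been given a taxonomic label, the sketch below tallies labels into relative abundances. The function name and the placeholder labels ("taxon_A" and so on) are assumptions made up for this example and are not part of any EMP protocol.

```python
from collections import Counter


def relative_abundance(taxon_labels):
    """Turn per-read taxon labels into fractions of the total read count."""
    counts = Counter(taxon_labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {taxon: n / total for taxon, n in counts.items()}


# Hypothetical labels for ten marker-gene reads (not real data).
labels = ["taxon_A"] * 6 + ["taxon_B"] * 3 + ["taxon_C"]
profile = relative_abundance(labels)

for taxon, fraction in sorted(profile.items()):
    print(f"{taxon}: {fraction:.2f}")
```

A rare taxon such as the single "taxon_C" read only appears in the profile if enough reads are sequenced, which is why sequencing depth matters for this approach.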

Data analysis

EMP proposes to standardize the bioinformatics aspects of sample processing.[5]

Data analysis usually includes the following steps: 1) data cleanup, a preprocessing step that discards reads with low quality scores and sequences containing "N" or other ambiguous nucleotides; and 2) taxonomy assignment, which is usually done using tools such as BLAST[8] or RDP.[9] Very often, novel sequences are discovered that cannot be mapped to an existing taxonomy. In this case, taxonomy is derived from a phylogenetic tree built from the novel sequences and a pool of closely related known sequences.[10]
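The data cleanup step can be sketched as a simple filter. This is a minimal illustration and not the EMP pipeline: it assumes reads are given as (sequence, quality string) pairs with Phred+33 quality encoding, and the mean-quality threshold of 20 is an arbitrary value chosen for the example.

```python
from statistics import mean


def phred_scores(quality, offset=33):
    """Decode a quality string into integer scores (Phred+33 is assumed)."""
    return [ord(ch) - offset for ch in quality]


def keep_read(sequence, quality, min_mean_quality=20.0):
    """Keep a read only if it has no ambiguous bases and adequate mean quality."""
    if not sequence or len(sequence) != len(quality):
        return False
    # Anything other than A, C, G, T (for example "N") counts as ambiguous.
    if any(base not in "ACGT" for base in sequence.upper()):
        return False
    return mean(phred_scores(quality)) >= min_mean_quality


def clean_reads(reads):
    """Filter an iterable of (sequence, quality) pairs."""
    return [(seq, qual) for seq, qual in reads if keep_read(seq, qual)]


# Hypothetical reads for illustration only.
reads = [
    ("ACGTACGT", "IIIIIIII"),  # score 40 at every base: kept
    ("ACGNACGT", "IIIIIIII"),  # contains an ambiguous "N": dropped
    ("ACGTACGT", "########"),  # score 2 at every base: dropped
]
cleaned = clean_reads(reads)
print(cleaned)
```

Real pipelines often trim low-quality ends instead of discarding whole reads; the all-or-nothing filter here is only the simplest variant.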

Additional methods may be employed depending on the sequencing technology and the underlying biological question. For example, an assembly will be required if the sequenced reads are too short to infer any useful information. An assembly can also be used to construct whole genomes, providing useful information on the species. Furthermore, if the metabolic relationships within a microbial metagenome are to be understood, DNA sequences need to be translated into amino acid sequences, for example using gene prediction tools such as GeneMark[11] or FragGeneScan.[12]
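The translation step itself can be illustrated with a six-frame translation using the standard genetic code. This sketch is only the underlying idea: gene prediction tools such as those named above additionally decide where genes start and stop and which frame is real. All function names here are made up for the example.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an unambiguous DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate codon by codon; '*' marks a stop, 'X' an unknown codon."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Translate all three offsets on both strands."""
    frames = []
    for strand in (dna.upper(), reverse_complement(dna)):
        for offset in range(3):
            frames.append(translate(strand[offset:]))
    return frames


print(translate("ATGGCCTAA"))  # MA*
print(six_frame_translations("ATGGCCTAA"))
```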

Project output

The EMP has planned four key outputs:[13]

  • All primary data generated by the Earth Microbiome Project, regardless of how conclusive they are, will be stored in a centralized database called the "Gene Atlas" (GA). The GA will hold sequence data, annotations and environmental metadata. Both known and unknown sequences, the latter referred to as "Dark Matter", will be included in the hope that the unknown sequences may eventually be characterized.
  • Assembled genomes, annotated using an automated pipeline, will be stored in "Earth Microbiome Assembled Genomes" (EM-AG) in public repositories. These will enable comparative genomic analysis.
  • Interactive visualizations of the data will be provided through the "Earth Microbiome Visualization Portal" (EM-VIP), which will allow the relationship between microbial makeup, environmental parameters, and genomic function to be viewed.
  • Reconstructed metabolic profiles will be offered through "Earth Microbiome Metabolic Reconstruction" (EMMR).

Challenges

The large amounts of sequence data generated from analyzing diverse microbial communities are a challenge to store, organize and analyze. The problem is exacerbated by the short reads produced by the high-throughput sequencing platform that will be the standard instrument used in the EMP. Improved algorithms, analysis tools, huge amounts of computer storage, and access to thousands of hours of supercomputer time will be necessary.[7]

Another challenge is distinguishing the large number of expected sequencing errors from actual diversity in the collected microbial samples.[7] Next-generation sequencing technologies provide enormous throughput but lower accuracy than older sequencing methods. When sequencing a single genome, the intrinsically lower accuracy of these methods is more than compensated for by the ability to cover the entire genome multiple times in opposite directions from multiple start points, but this capability provides no improvement in accuracy when sequencing a diverse mixture of genomes.
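The single-genome case can be illustrated with a majority-vote consensus over reads that cover the same positions. This is a simplified sketch that assumes the reads are already aligned and of equal length; the function name and the example reads are made up for illustration.

```python
from collections import Counter


def consensus(aligned_reads):
    """Majority-vote consensus over equal-length reads covering the same region."""
    if not aligned_reads:
        return ""
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be aligned to the same length")
    # zip(*reads) walks the alignment column by column.
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned_reads)
    )


# Five hypothetical reads of one region; two of them carry a single error each.
reads = ["ACGTAC", "ACGTAC", "ACCTAC", "ACGTAC", "ACGTTC"]
print(consensus(reads))  # ACGTAC
```

In a mixed community the same vote is not safe: a base that disagrees with the majority may belong to a genuinely different organism, so outvoting it would erase real diversity along with the errors.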

Despite the issuance of standard protocols, systematic biases from lab to lab are expected. The need to amplify DNA from samples with low biomass will introduce additional distortions of the data. Assembly of genomes of even the dominant organisms in a diverse sample of organisms requires gigabytes of sequence data.[7]

With the advancement in high-throughput sequencing technologies, many sequences are entering public databases with no experimentally determined function, but which have been annotated on the basis of observed homologies with a known sequence. The first known sequence is used to annotate the first unknown sequence, but a problem that has become prevalent in the public sequence databases, which the EMP must avoid, is that the first unknown sequence is being used to annotate the second unknown sequence and so on. Sequence homology is only a modestly reliable predictor of function.[14]
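The risk of chained annotation can be made concrete with a toy model. It assumes, purely for illustration, that each homology-based transfer independently preserves the correct function with the same probability; neither that assumption nor the value 0.9 used below comes from the EMP or the cited sources.

```python
def chained_annotation_reliability(per_step_accuracy, steps):
    """Probability that an annotation is still correct after `steps` transfers,
    assuming independent transfers with equal accuracy (a simplification)."""
    if not 0.0 <= per_step_accuracy <= 1.0:
        raise ValueError("per_step_accuracy must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_accuracy ** steps


# 0.9 is an arbitrary illustrative accuracy, not a measured value.
for steps in (1, 2, 5):
    print(steps, round(chained_annotation_reliability(0.9, steps), 3))
```

Under this model the reliability decays geometrically with the length of the chain, which is why annotating directly against an experimentally characterized sequence is preferable to annotating against another inferred annotation.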

Notes

  1. Gilbert, J. A.; Jansson, J. K.; Knight, R. (2014). "The Earth Microbiome project: successes and aspirations". BMC Biology. 12 (1): 69. doi:10.1186/s12915-014-0069-1. PMC 4141107. PMID 25184604.
  2. Gilbert, J. A.; Meyer, F.; Antonopoulos, D.; Balaji, P.; Brown, C. T.; Brown, C. T.; Desai, N.; Eisen, J. A.; Evers, D.; Field, D.; Feng, W.; Huson, D.; Jansson, J.; Knight, R.; Knight, J.; Kolker, E.; Konstantindis, K.; Kostka, J.; Kyrpides, N.; MacKelprang, R.; McHardy, A.; Quince, C.; Raes, J.; Sczyrba, A.; Shade, A.; Stevens, R. (2010). "Meeting Report: The Terabase Metagenomics Workshop and the Vision of an Earth Microbiome Project". Standards in Genomic Sciences. 3 (3): 243–248. doi:10.4056/sigs.1433550. PMC 3035311. PMID 21304727.
  3. Gilbert, J. A.; O'Dor, R.; King, N.; Vogel, T. M. (2011). "The importance of metagenomic surveys to microbial ecology: Or why Darwin would have been a metagenomic scientist". Microbial Informatics and Experimentation. 1 (1): 5. doi:10.1186/2042-5783-1-5. PMC 3348666. PMID 22587826.
  4. "The Earth Microbiome Project is a systematic attempt to characterize global microbial taxonomic and functional diversity for the benefit of the planet and humankind". Earth Microbiome Project. 2018. Archived from the original on 2020-05-11. Retrieved 2018-01-03.
  5. Gilbert, J. A.; Meyer, F. (2012). "Modeling the Earth Microbiome". Microbe Magazine. 7 (2): 64–69. doi:10.1128/microbe.7.64.1.
  6. "Earth Microbiome Project / Standard Protocols". Archived from the original on 2012-03-16. Retrieved 2012-03-07.
  7. Jansson, Janet (2011). "Towards "Tera-Terra": Terabase Sequencing of Terrestrial Metagenomes". Microbe Magazine. 6 (7): 309–15. doi:10.1128/microbe.6.309.1.
  8. "BLAST: Basic Local Alignment Search Tool". Archived from the original on 2011-08-09. Retrieved 2012-03-05.
  9. "Ribosomal Database Project". Archived from the original on 2020-08-19. Retrieved 2012-03-06.
  10. Meyer, F.; Paarmann, D.; d'Souza, M.; Olson, R.; Glass, E. M.; Kubal, M.; Paczian, T.; Rodriguez, A.; Stevens, R.; Wilke, A.; Wilkening, J.; Edwards, R. A. (2008). "The metagenomics RAST server – a public resource for the automatic phylogenetic and functional analysis of metagenomes". BMC Bioinformatics. 9: 386. doi:10.1186/1471-2105-9-386. PMC 2563014. PMID 18803844.
  11. "GeneMark - Free gene prediction software". Archived from the original on 2013-10-30. Retrieved 2012-03-06.
  12. "FragGeneScan". Archived from the original on 2019-09-17. Retrieved 2012-03-06.
  13. "Earth Microbiome Project / Defining the Tasks". Archived from the original on 2012-03-16. Retrieved 2012-03-07.
  14. Gilbert, J. A.; Dupont, C. L. (2011). "Microbial Metagenomics: Beyond the Genome". Annual Review of Marine Science. 3: 347–371. Bibcode:2011ARMS....3..347G. doi:10.1146/annurev-marine-120709-142811. PMID 21329209.