TY  - JOUR
AU  - Sommeria-Klein, G.
AU  - Zinger, L.
AU  - Coissac, E.
AU  - Iribar, A.
AU  - Schimann, H.
AU  - Taberlet, P.
AU  - Chave, J.
PY  - 2020//
TI  - Latent Dirichlet Allocation reveals spatial and taxonomic structure in a DNA-based census of soil biodiversity from a tropical forest
T2  - Mol. Ecol. Resour.
JO  - Molecular Ecology Resources
SP  - 371
EP  - 386
VL  - 20
IS  - 2
PB  - Blackwell Publishing Ltd
KW  - community ecology
KW  - environmental DNA
KW  - metabarcoding
KW  - OTU presence–absence
KW  - soil microbiome
KW  - topic modelling
KW  - bacterium
KW  - biodiversity
KW  - biology
KW  - classification
KW  - eukaryote
KW  - fungus
KW  - genetics
KW  - high throughput sequencing
KW  - isolation and purification
KW  - microbiology
KW  - parasitology
KW  - procedures
KW  - soil
KW  - Bacteria
KW  - Computational Biology
KW  - Eukaryota
KW  - Fungi
KW  - High-Throughput Nucleotide Sequencing
KW  - Soil Microbiology
N2  - High-throughput sequencing of amplicons from environmental DNA samples permits rapid, standardized and comprehensive biodiversity assessments. However, retrieving and interpreting the structure of such data sets requires efficient methods for dimensionality reduction. Latent Dirichlet Allocation (LDA) can be used to decompose environmental DNA samples into overlapping assemblages of co-occurring taxa. It is a flexible model-based method adapted to uneven sample sizes and to large and sparse data sets. Here, we compare LDA performance on abundance and occurrence data, and we quantify the robustness of the LDA decomposition by measuring its stability with respect to the algorithm's initialization. We then apply LDA to a survey of 1,131 soil DNA samples that were collected in a 12-ha plot of primary tropical forest and amplified using standard primers for bacteria, protists, fungi and metazoans. The analysis reveals that bacteria, protists and fungi exhibit a strong spatial structure, which matches the topographical features of the plot, while metazoans do not, confirming that microbial diversity is primarily controlled by environmental variation at the studied scale. We conclude that LDA is a sensitive, robust and computationally efficient method to detect and interpret the structure of large DNA-based biodiversity data sets. We finally discuss the possible future applications of this approach for the study of biodiversity. © 2019 John Wiley & Sons Ltd
SN  - 1755098x (Issn)
UR  - http://dx.doi.org/10.1111/1755-0998.13109
N1  - exported from refbase (http://php.ecofog.gf/refbase/show.php?record=981), last updated on Mon, 08 Feb 2021 14:18:24 -0300
ID  - Sommeria-Klein_etal2020
ER  - 