%0 Journal Article
%T Latent Dirichlet Allocation reveals spatial and taxonomic structure in a DNA-based census of soil biodiversity from a tropical forest
%A Sommeria-Klein, G.
%A Zinger, L.
%A Coissac, E.
%A Iribar, A.
%A Schimann, H.
%A Taberlet, P.
%A Chave, J.
%J Molecular Ecology Resources
%D 2020
%V 20
%N 2
%I Blackwell Publishing Ltd
%@ 1755098x (Issn)
%F Sommeria-Klein_etal2020
%O exported from refbase (http://php.ecofog.gf/refbase/show.php?record=981), last updated on Mon, 08 Feb 2021 14:18:24 -0300
%X High-throughput sequencing of amplicons from environmental DNA samples permits rapid, standardized and comprehensive biodiversity assessments. However, retrieving and interpreting the structure of such data sets requires efficient methods for dimensionality reduction. Latent Dirichlet Allocation (LDA) can be used to decompose environmental DNA samples into overlapping assemblages of co-occurring taxa. It is a flexible model-based method adapted to uneven sample sizes and to large and sparse data sets. Here, we compare LDA performance on abundance and occurrence data, and we quantify the robustness of the LDA decomposition by measuring its stability with respect to the algorithm's initialization. We then apply LDA to a survey of 1,131 soil DNA samples that were collected in a 12-ha plot of primary tropical forest and amplified using standard primers for bacteria, protists, fungi and metazoans. The analysis reveals that bacteria, protists and fungi exhibit a strong spatial structure, which matches the topographical features of the plot, while metazoans do not, confirming that microbial diversity is primarily controlled by environmental variation at the studied scale. We conclude that LDA is a sensitive, robust and computationally efficient method to detect and interpret the structure of large DNA-based biodiversity data sets. We finally discuss the possible future applications of this approach for the study of biodiversity.
© 2019 John Wiley & Sons Ltd
%K community ecology
%K environmental DNA
%K metabarcoding
%K OTU presence–absence
%K soil microbiome
%K topic modelling
%K bacterium
%K biodiversity
%K biology
%K classification
%K eukaryote
%K fungus
%K genetics
%K high throughput sequencing
%K isolation and purification
%K microbiology
%K parasitology
%K procedures
%K soil
%K Bacteria
%K Computational Biology
%K Eukaryota
%K Fungi
%K High-Throughput Nucleotide Sequencing
%K Soil Microbiology
%U http://dx.doi.org/10.1111/1755-0998.13109
%P 371-386