Looking for datasets to use in teaching and research data management (RDM) training? Check out the list below!

Questions regarding other training material? Contact our Helpdesk!

This is a list of well-documented training datasets covering different data types and different aspects of research data management.

What are training datasets?

Training datasets are essential for the effective teaching and training of young researchers. They form the basis for teaching data skills and analysis methods. Here, we are referring to datasets used in tutorials on research data management, as demo datasets for tools or methods, or as examples of challenges in data handling. This definition does not cover datasets used to train AI applications.

To be labelled as training datasets, datasets have to:

  • be FAIR (Findable, Accessible, Interoperable, Reusable).
  • be freely available, with an appropriate license and open data format.
  • be of reasonable size.
  • be citable.
  • enable easy-to-understand but interesting questions to be addressed.
  • be sufficiently documented.
  • be either “perfect” or contain deliberate didactic errors.

For an overview, see the poster on what training datasets are in the context of NFDI4Biodiversity (in German): Signer, J., Schlägel, U., Tschink, D., & Röder, J. (2024). Trainingsdatensätze. Zenodo. https://doi.org/10.5281/zenodo.13805722

Training datasets help to illustrate all stages of the data life cycle (DLC), for example:

  • Metadata standards to describe and structure (newly collected) data
  • (Reproducible) processing of data
  • (Reproducible) data analysis
  • Workflows to archive, share and publish data for personal and/or public re-use

Figure 1: Data life cycle, CC BY 4.0. Source: RDMkit, the ELIXIR Research Data Management toolkit for Life Sciences. URL: https://rdmkit.elixir-europe.org

Biological Datasets

Nature conservation

  • DiverReef: A global database of the behavior of recreational divers and their interactions with reefs over 20 years: https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecy.4519
    • Giglio, V. J., Adelir‐Alves, J., Balzaretti Merino, N., Bravo‐Olivas, M. L., Camp, E. F., Casoli, E., Chávez‐Dagostino, R. M., Ferretti, E., Fraser, D., Grillo, A. C., Jiménez‐Guiérrez, S., Leite, K. L., Lucrezi, S., Luiz, O. J., Luna‐Pérez, B., McBride, J., Milanese, M., Moity, N., Pinheiro, J. V., … Ferreira, C. E. L. (2025). DiverReef: A global database of the behavior of recreational divers and their interactions with reefs over 20 years. Ecology, 106(2), e4519. https://doi.org/10.1002/ecy.4519
  • Global Roadkill Data: a dataset on terrestrial vertebrate mortality caused by collision with vehicles: https://www.nature.com/articles/s41597-024-04207-x
    • Grilo, C., Neves, T., Bates, J., Le Roux, A., Medrano-Vizcaíno, P., Quaranta, M., Silva, I., Soanes, K., Wang, Y., Data Collection Consortium, Abate, S. D., D’ Abra, F., Cedeño, S. A., De Alencar, P. R., De Almeida, M. F. P., Alves, M. H., Alves, P., De Assis, A. A., Ament, R., … Guinard, E. (2025). Global Roadkill Data: A dataset on terrestrial vertebrate mortality caused by collision with vehicles. Scientific Data, 12(1), 505. https://doi.org/10.1038/s41597-024-04207-x
  • SNAPSHOT USA 2019–2023: The First Five Years of Data From a Coordinated Camera Trap Survey of the United States: https://onlinelibrary.wiley.com/doi/10.1111/geb.13941
    • Rooney, B., Kays, R., Cove, M. V., Jensen, A., Goldstein, B. R., Pate, C., Castiblanco, P., Abell, M. E., Adley, J., Agenbroad, B., Ahlers, A. A., Alexander, P. D., Allen, D., Allen, M. L., Alston, J. M., Alyetama, M., Anderson, T. L., Andrade, R., Anhalt‐Depies, C., … McShea, W. J. (2025). SNAPSHOT USA 2019–2023: The First Five Years of Data From a Coordinated Camera Trap Survey of the United States. Global Ecology and Biogeography, 34(1), e13941. https://doi.org/10.1111/geb.13941
  • CamTrapAsia: A dataset of tropical forest vertebrate communities from 239 camera trapping studies: https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecy.4299
    • Mendes, C. P., Albert, W. R., Amir, Z., Ancrenaz, M., Ash, E., Azhar, B., Bernard, H., Brodie, J., Bruce, T., Carr, E., Clements, G. R., Davies, G., Deere, N. J., Dinata, Y., Donnelly, C. A., Duangchantrasiri, S., Fredriksson, G., Goossens, B., Granados, A., … Luskin, M. S. (2024). CamTrapAsia: A dataset of tropical forest vertebrate communities from 239 camera trapping studies. Ecology, 105(6), e4299. https://doi.org/10.1002/ecy.4299
  • The Breeding Bird Survey of the United Kingdom: https://onlinelibrary.wiley.com/doi/10.1111/geb.13943
    • Massimino, D., Baillie, S. R., Balmer, D. E., Bashford, R. I., Gregory, R. D., Harris, S. J., Heywood, J. J. N., Kelly, L. A., Noble, D. G., Pearce‐Higgins, J. W., Raven, M. J., Risely, K., Woodcock, P., Wotton, S. R., & Gillings, S. (2025). The Breeding Bird Survey of the United Kingdom. Global Ecology and Biogeography, 34(1), e13943. https://doi.org/10.1111/geb.13943
  • Integrated evidence-based extent of occurrence for North American bison (Bison bison) since 1500 CE and before: https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecy.3864
    • Martin, J. M., Short, R. A., Plumb, G. E., Markewicz, L., Van Vuren, D. H., Wehus‐Tow, B., Otárola‐Castillo, E., & Hill, M. E. (2023). Integrated evidence‐based extent of occurrence for North American bison (Bison bison) since 1500 CE and before. Ecology, 104(1), e3864. https://doi.org/10.1002/ecy.3864

Biology

Forestry

  • DFG research training group RTG2300: Enrichment of European beech forests with conifers
    • Relative locations and diameters of trees from a full tree inventory on 8 study plots: https://doi.pangaea.de/10.1594/PANGAEA.932023
    • The study plots where the trees were recorded:
      • Ammer, C., Annighöfer, P., Balkenhol, N., Hertel, D., Leuschner, C., Polle, A., Lamersdorf, N., Scheu, S., & Glatthorn, J. (2020). RTG 2300—Study design, location, topography and climatic conditions of research plots in 2020 (p. 470 data points) [Text/tab-separated-values]. PANGAEA. https://doi.org/10.1594/PANGAEA.923125
    • Abundance and taxonomic data of forest arthropods collected on the plots listed above: https://doi.org/10.1594/PANGAEA.949484
      • Matevski, D., & Kriegel, P. (2022). Abundance and taxonomic data of arthropods collected with pitfall traps from temperate forest stands from Lower Saxony, Germany in 2019 (p. 2 datasets) [Application/zip]. PANGAEA. https://doi.org/10.1594/PANGAEA.949484
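
For use in a tutorial, tab-separated downloads like the ones above first need to be read into a table. The following is a minimal sketch in Python, assuming that the file starts with a metadata block enclosed in /* ... */ followed by a tab-separated header row, which is the usual layout of PANGAEA text exports. The function name parse_pangaea_tab, the sample text, and the column names are invented for illustration and do not come from the datasets listed here.

```python
import csv
import io


def parse_pangaea_tab(text):
    """Split a tab-separated export into (metadata, rows).

    Assumption: an optional leading /* ... */ block holds the metadata,
    and the first line after it is the column header row.
    """
    metadata = ""
    body = text
    stripped = text.lstrip()
    if stripped.startswith("/*"):
        end = stripped.find("*/")
        if end == -1:
            raise ValueError("unterminated metadata block")
        metadata = stripped[2:end].strip()
        body = stripped[end + 2:].lstrip("\r\n")
    reader = csv.DictReader(io.StringIO(body), delimiter="\t")
    return metadata, list(reader)


# Invented sample text (hypothetical columns, not a real record).
SAMPLE = (
    "/* DATA DESCRIPTION:\n"
    "Citation:\tHypothetical example, not a real record\n"
    "*/\n"
    "Plot\tTree ID\tDBH [cm]\n"
    "P1\t1\t32.5\n"
    "P1\t2\t18.0\n"
)

metadata, rows = parse_pangaea_tab(SAMPLE)
print(len(rows), rows[0]["DBH [cm]"])  # prints: 2 32.5
```

All values are returned as strings, so converting measurements to numbers and checking units is left as an explicit step. That step can itself serve as a short exercise in data handling.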

  • Tree inventory data from permanent plots in French forest reserves: https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecy.4324
    • Cateau, E., Debaive, N., Drapier, N., Chantreau, F., Gilg, O., Laroche, F., Morin, X., Demets, V., Pimenta, R., Thompson, L., & Paillet, Y. (2024). Tree inventory data from permanent plots in French forest reserves. Ecology, 105(7), e4324. https://doi.org/10.1002/ecy.4324

Genetics

Taxonomy, Traits

Animal Tracking

Time Series

Spatial Data

Environmental Datasets

Land cover

Other Collections

Do you have questions or feedback, or do you need help?

Contact our Helpdesk for direct support.