Looking for data sets to be used in teaching and RDM training? Check out the list below!

Questions regarding other training material? Contact our Helpdesk!

This is a list of well-documented training datasets covering different data types and different aspects of research data management, for use in research data management training.

What are training datasets

Training datasets are essential for the effective teaching and training of young researchers. They form the basis for teaching data skills and analysis methods. Here, we are referring to datasets used in tutorials on research data management, as demo datasets in tools or methods, or as examples of challenges in data handling. This definition does not cover datasets used to train AI applications.

To be labelled as a training dataset, a dataset has to:

  • be FAIR (Findable, Accessible, Interoperable, Reusable).
  • be freely available, with an appropriate license and open data format.
  • be of reasonable size.
  • be citable.
  • enable easy-to-understand but interesting questions to be addressed.
  • be sufficiently documented.
  • be either “perfect” or contain didactic errors.

For an overview, see the poster on what training datasets are in the context of NFDI4Biodiversity (in German): Signer, J., Schlägel, U., Tschink, D., & Röder, J. (2024). Trainingsdatensätze. Zenodo. https://doi.org/10.5281/zenodo.13805722

Training datasets help to illustrate all stages of the data life cycle (DLC), e.g.

  • Metadata standards to describe and structure (newly collected) data
  • (Reproducible) processing of data
  • (Reproducible) data analysis
  • Workflows to archive, share and publish data for personal and/or public re-use


Figure 1: Data life cycle, CC BY 4.0. Source: RDMkit: The ELIXIR Research Data Management toolkit for Life Sciences URL: https://rdmkit.elixir-europe.org

Biological Data

Nature conservation

  • DiverReef: A global database of the behavior of recreational divers and their interactions with reefs over 20 years: https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecy.4519
    • Giglio, V. J., Adelir‐Alves, J., Balzaretti Merino, N., Bravo‐Olivas, M. L., Camp, E. F., Casoli, E., Chávez‐Dagostino, R. M., Ferretti, E., Fraser, D., Grillo, A. C., Jiménez‐Guiérrez, S., Leite, K. L., Lucrezi, S., Luiz, O. J., Luna‐Pérez, B., McBride, J., Milanese, M., Moity, N., Pinheiro, J. V., … Ferreira, C. E. L. (2025). DiverReef: A global database of the behavior of recreational divers and their interactions with reefs over 20 years. Ecology, 106(2), e4519. https://doi.org/10.1002/ecy.4519
  • Global Roadkill Data: a dataset on terrestrial vertebrate mortality caused by collision with vehicles: https://www.nature.com/articles/s41597-024-04207-x
    • Grilo, C., Neves, T., Bates, J., Le Roux, A., Medrano-Vizcaíno, P., Quaranta, M., Silva, I., Soanes, K., Wang, Y., Data Collection Consortium, Abate, S. D., D’ Abra, F., Cedeño, S. A., De Alencar, P. R., De Almeida, M. F. P., Alves, M. H., Alves, P., De Assis, A. A., Ament, R., … Guinard, E. (2025). Global Roadkill Data: A dataset on terrestrial vertebrate mortality caused by collision with vehicles. Scientific Data, 12(1), 505. https://doi.org/10.1038/s41597-024-04207-x
  • SNAPSHOT USA 2019–2023: The First Five Years of Data From a Coordinated Camera Trap Survey of the United States: https://onlinelibrary.wiley.com/doi/10.1111/geb.13941
    • Rooney, B., Kays, R., Cove, M. V., Jensen, A., Goldstein, B. R., Pate, C., Castiblanco, P., Abell, M. E., Adley, J., Agenbroad, B., Ahlers, A. A., Alexander, P. D., Allen, D., Allen, M. L., Alston, J. M., Alyetama, M., Anderson, T. L., Andrade, R., Anhalt‐Depies, C., … McShea, W. J. (2025). SNAPSHOT USA 2019–2023: The First Five Years of Data From a Coordinated Camera Trap Survey of the United States. Global Ecology and Biogeography, 34(1), e13941. https://doi.org/10.1111/geb.13941
  • CamTrapAsia: A dataset of tropical forest vertebrate communities from 239 camera trapping studies: https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecy.4299
    • Mendes, C. P., Albert, W. R., Amir, Z., Ancrenaz, M., Ash, E., Azhar, B., Bernard, H., Brodie, J., Bruce, T., Carr, E., Clements, G. R., Davies, G., Deere, N. J., Dinata, Y., Donnelly, C. A., Duangchantrasiri, S., Fredriksson, G., Goossens, B., Granados, A., … Luskin, M. S. (2024). CamTrapAsia: A dataset of tropical forest vertebrate communities from 239 camera trapping studies. Ecology, 105(6), e4299. https://doi.org/10.1002/ecy.4299
  • The Breeding Bird Survey of the United Kingdom: https://onlinelibrary.wiley.com/doi/10.1111/geb.13943
    • Massimino, D., Baillie, S. R., Balmer, D. E., Bashford, R. I., Gregory, R. D., Harris, S. J., Heywood, J. J. N., Kelly, L. A., Noble, D. G., Pearce‐Higgins, J. W., Raven, M. J., Risely, K., Woodcock, P., Wotton, S. R., & Gillings, S. (2025). The Breeding Bird Survey of the United Kingdom. Global Ecology and Biogeography, 34(1), e13943. https://doi.org/10.1111/geb.13943
  • Integrated evidence-based extent of occurrence for North American bison (Bison bison) since 1500 CE and before: https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecy.3864
    • Martin, J. M., Short, R. A., Plumb, G. E., Markewicz, L., Van Vuren, D. H., Wehus‐Tow, B., Otárola‐Castillo, E., & Hill, M. E. (2023). Integrated evidence‐based extent of occurrence for North American bison (Bison bison) since 1500 CE and before. Ecology, 104(1), e3864. https://doi.org/10.1002/ecy.3864

Biology

Forestry

  • DFG research training group RTG2300: Enrichment of European beech forests with conifers
    • Relative location and diameter of a full tree inventory on 8 study plots: https://doi.pangaea.de/10.1594/PANGAEA.932023
    • The study plots where the trees were recorded:
      • Ammer, C., Annighöfer, P., Balkenhol, N., Hertel, D., Leuschner, C., Polle, A., Lamersdorf, N., Scheu, S., & Glatthorn, J. (2020). RTG 2300—Study design, location, topography and climatic conditions of research plots in 2020 (p. 470 data points) [Text/tab-separated-values]. PANGAEA. https://doi.pangaea.de
    • Arthropods collected on the plots above: https://doi.pangaea.de […] 923125
  • Tree inventory data from permanent plots in French forest reserves: https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecy.4324
    • Cateau, E., Debaive, N., Drapier, N., Chantreau, F., Gilg, O., Laroche, F., Morin, X., Demets, V., Pimenta, R., Thompson, L., & Paillet, Y. (2024). Tree inventory data from permanent plots in French forest reserves. Ecology, 105(7), e4324. https://doi.org/10.1002/ecy.4324

Genetics

Taxonomy, Traits

Animal Tracking

Time Series

Spatial Data



Do you have questions, feedback or need help?

Contact our Helpdesk for direct support.

