
Looking for data sets to be used in teaching and RDM training? Check out the list below!

Questions regarding other training material? Contact our Helpdesk!

This is a list of well-documented training datasets covering different data types and different aspects of research data management, for use in research data management training.

What are training datasets

Training datasets are essential for the effective teaching and training of young researchers. They form the basis for teaching data skills and analysis methods.

Training data

What are training datasets: https://zenodo.org/records/13805722

Here, we are referring to datasets used in tutorials on research data management, as demo datasets in tools or methods, or as examples for challenges in data handling. This definition does not cover datasets used to train AI applications.

To be labelled as a training dataset, a dataset has to:

  • be FAIR (Findable, Accessible, Interoperable, Reusable).
  • be freely available, with an appropriate license and open data format.
  • be of reasonable size.
  • be citable.
  • enable easy-to-understand but interesting questions to be addressed.
  • be sufficiently documented.
  • be either “perfect” or contain didactic errors.
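
Some of the criteria above can be screened mechanically before a dataset is reviewed by hand. The following is a minimal sketch in Python, not an official checker: the `DatasetRecord` fields, the `OPEN_FORMATS` set, and the `MAX_SIZE_MB` threshold are hypothetical and chosen only for illustration. Criteria such as "interesting questions" or didactic quality still need human judgement.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical, minimal description of a candidate training dataset."""
    title: str
    doi: Optional[str]        # persistent identifier that makes the dataset citable
    license: Optional[str]    # e.g. "CC-BY-4.0"
    file_format: str          # e.g. "csv"
    size_mb: float
    has_documentation: bool


# Illustrative assumptions only; the list above does not define these values.
OPEN_FORMATS = {"csv", "tsv", "txt", "json", "geojson"}
MAX_SIZE_MB = 500.0


def unmet_criteria(record: DatasetRecord) -> List[str]:
    """Return the machine-checkable criteria that the record does not meet."""
    problems = []
    if not record.doi:
        problems.append("not citable (no DOI)")
    if not record.license:
        problems.append("no license stated")
    if record.file_format.lower() not in OPEN_FORMATS:
        problems.append("file format not in the open-format list")
    if record.size_mb > MAX_SIZE_MB:
        problems.append("larger than the size threshold")
    if not record.has_documentation:
        problems.append("not sufficiently documented")
    return problems


# Usage with a made-up record (the DOI is a placeholder, not a real dataset).
candidate = DatasetRecord(
    title="Example plot survey",
    doi="10.1234/example",
    license="CC-BY-4.0",
    file_format="csv",
    size_mb=12.0,
    has_documentation=True,
)
print(unmet_criteria(candidate))
```

An empty list only means that none of the automated checks failed; it does not by itself make a dataset FAIR or suitable for teaching.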

For an overview, see the poster on what training datasets are in the context of NFDI4Biodiversity (in German): Signer, J., Schlägel, U., Tschink, D., & Röder, J. (2024). Trainingsdatensätze. Zenodo. https://doi.org/10.5281/zenodo.13805722

Training datasets help to illustrate all stages of the data life cycle (DLC), e.g.

  • Metadata standards to describe and structure (newly collected) data
  • (Reproducible) processing of data
  • (Reproducible) data analysis
  • Workflows to archive, share and publish data for personal and/or public re-use

Figure 1: Data life cycle, CC BY 4.0. Source: RDMkit: The ELIXIR Research Data Management toolkit for Life Sciences. URL: https://rdmkit.elixir-europe.org

Biological Datasets

Nature conservation

  • DiverReef: A global database of the behavior of recreational divers and their interactions with reefs over 20 years: https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecy.4519
    • Giglio, V. J., Adelir‐Alves, J., Balzaretti Merino, N., Bravo‐Olivas, M. L., Camp, E. F., Casoli, E., Chávez‐Dagostino, R. M., Ferretti, E., Fraser, D., Grillo, A. C., Jiménez‐Guiérrez, S., Leite, K. L., Lucrezi, S., Luiz, O. J., Luna‐Pérez, B., McBride, J., Milanese, M., Moity, N., Pinheiro, J. V., … Ferreira, C. E. L. (2025). DiverReef: A global database of the behavior of recreational divers and their interactions with reefs over 20 years. Ecology, 106(2), e4519. https://doi.org/10.1002/ecy.4519
  • Global Roadkill Data: a dataset on terrestrial vertebrate mortality caused by collision with vehicles: https://www.nature.com/articles/s41597-024-04207-x
    • Grilo, C., Neves, T., Bates, J., Le Roux, A., Medrano-Vizcaíno, P., Quaranta, M., Silva, I., Soanes, K., Wang, Y., Data Collection Consortium, Abate, S. D., D’ Abra, F., Cedeño, S. A., De Alencar, P. R., De Almeida, M. F. P., Alves, M. H., Alves, P., De Assis, A. A., Ament, R., … Guinard, E. (2025). Global Roadkill Data: A dataset on terrestrial vertebrate mortality caused by collision with vehicles. Scientific Data, 12(1), 505. https://doi.org/10.1038/s41597-024-04207-x
  • SNAPSHOT USA 2019–2023: The First Five Years of Data From a Coordinated Camera Trap Survey of the United States: https://onlinelibrary.wiley.com/doi/10.1111/geb.13941
    • Rooney, B., Kays, R., Cove, M. V., Jensen, A., Goldstein, B. R., Pate, C., Castiblanco, P., Abell, M. E., Adley, J., Agenbroad, B., Ahlers, A. A., Alexander, P. D., Allen, D., Allen, M. L., Alston, J. M., Alyetama, M., Anderson, T. L., Andrade, R., Anhalt‐Depies, C., … McShea, W. J. (2025). SNAPSHOT USA 2019–2023: The First Five Years of Data From a Coordinated Camera Trap Survey of the United States. Global Ecology and Biogeography, 34(1), e13941. https://doi.org/10.1111/geb.13941
  • CamTrapAsia: A dataset of tropical forest vertebrate communities from 239 camera trapping studies: https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecy.4299
    • Mendes, C. P., Albert, W. R., Amir, Z., Ancrenaz, M., Ash, E., Azhar, B., Bernard, H., Brodie, J., Bruce, T., Carr, E., Clements, G. R., Davies, G., Deere, N. J., Dinata, Y., Donnelly, C. A., Duangchantrasiri, S., Fredriksson, G., Goossens, B., Granados, A., … Luskin, M. S. (2024). CamTrapAsia: A dataset of tropical forest vertebrate communities from 239 camera trapping studies. Ecology, 105(6), e4299. https://doi.org/10.1002/ecy.4299
  • The Breeding Bird Survey of the United Kingdom: https://onlinelibrary.wiley.com/doi/10.1111/geb.13943
    • Massimino, D., Baillie, S. R., Balmer, D. E., Bashford, R. I., Gregory, R. D., Harris, S. J., Heywood, J. J. N., Kelly, L. A., Noble, D. G., Pearce‐Higgins, J. W., Raven, M. J., Risely, K., Woodcock, P., Wotton, S. R., & Gillings, S. (2025). The Breeding Bird Survey of the United Kingdom. Global Ecology and Biogeography, 34(1), e13943. https://doi.org/10.1111/geb.13943
  • Integrated evidence-based extent of occurrence for North American bison (Bison bison) since 1500 CE and before: https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecy.3864
      • Martin, J. M., Short, R. A., Plumb, G. E., Markewicz, L., Van Vuren, D. H., Wehus‐Tow, B., Otárola‐Castillo, E., & Hill, M. E. (2023). Integrated evidence‐based extent of occurrence for North American bison (Bison bison) since 1500 CE and before. Ecology, 104(1), e3864. https://doi.org/10.1002/ecy.3864

    Biology

    Forestry

    DFG research training group RTG2300: Enrichment of European beech forests with conifers
      • The study plots, where the trees were recorded:
        • Ammer, C., Annighöfer, P., Balkenhol, N., Hertel, D., Leuschner, C., Polle, A., Lamersdorf, N., Scheu, S., & Glatthorn, J. (2020). RTG 2300—Study design, location, topography and climatic conditions of research plots in 2020 (p. 470 data points) [Text/tab-separated-values]. PANGAEA. https://doi.pangaea.de (dataset ID 923125)

    Genetics

    Taxonomy, Traits

    Animal Tracking

    Time Series

    Spatial Data

    Environmental Data
      • Dworczyk, C., Dunkel, A., Rafiei, F., & Syrbe, R.-U. (2025). Replication Package for: Exploring Spatial and Biodiversity Data with Python and JupyterLab: Release version v1.8.0 (Version 1.0, pp. 123835203, 17495) [Application/zip,text/markdown]. ioerDATA. https://doi.org/10.71830/6ILS40

    Environmental Datasets

    Land cover

    • EGLC: Ensemble Global Land Cover Reference Dataset (2000-2022): https://zenodo.org/records/

    Other Collections




    Do you have questions, feedback or need help?

    Contact our Helpdesk for direct support.

