Looking for datasets to use in teaching and RDM training? Check out the list below!

Questions regarding other training material? Contact our Helpdesk!

This is a list of well-documented datasets for use in research data management (RDM) training. The datasets cover different data types and different aspects of RDM.

What are training datasets?

Training datasets are essential for the effective teaching and training of young researchers. They form the basis for teaching data skills and analysis methods. Here, we are referring to datasets used in tutorials on research data management, as demo datasets for tools or methods, or as examples of challenges in data handling. This definition does not cover datasets used to train AI applications.

To qualify as a training dataset, a dataset has to:

  • be FAIR (Findable, Accessible, Interoperable, Reusable).
  • be freely available, with an appropriate license and in an open data format.
  • be of reasonable size.
  • be citable.
  • enable easy-to-understand but interesting questions to be addressed.
  • be sufficiently documented.
  • be either “perfect” or contain deliberate didactic errors.
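
As a rough illustration, some of these criteria can be expressed as a simple checklist that is evaluated against a dataset's metadata. The Python sketch below is only one possible reading: the record fields (`doi`, `license`, `file_format`, `size_mb`, `has_readme`), the sets of open formats and licenses, and the 100 MB size limit are assumptions made for this example and are not defined by the criteria themselves.

```python
from dataclasses import dataclass

# Assumed examples of open formats and licenses; adapt to your own policy.
OPEN_FORMATS = {"csv", "tsv", "json", "txt"}
OPEN_LICENSES = {"CC0-1.0", "CC-BY-4.0"}
MAX_SIZE_MB = 100  # hypothetical threshold for "reasonable size"


@dataclass
class DatasetRecord:
    """Hypothetical minimal metadata record for a candidate dataset."""
    title: str
    doi: str = ""
    license: str = ""
    file_format: str = ""
    size_mb: float = 0.0
    has_readme: bool = False


def check_training_dataset(record: DatasetRecord) -> dict:
    """Return a pass/fail result for each machine-checkable criterion."""
    return {
        "citable (has DOI)": bool(record.doi),
        "open license": record.license in OPEN_LICENSES,
        "open data format": record.file_format.lower() in OPEN_FORMATS,
        "reasonable size": 0 < record.size_mb <= MAX_SIZE_MB,
        "documented": record.has_readme,
    }


def unmet_criteria(record: DatasetRecord) -> list:
    """List the names of the criteria that the record does not meet."""
    return [name for name, ok in check_training_dataset(record).items() if not ok]
```

Criteria such as "enables interesting questions" or "contains didactic errors" cannot be checked automatically in this way and still need a human reviewer.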

For an overview, see the poster on what training datasets are in the context of NFDI4Biodiversity (in German): Signer, J., Schlägel, U., Tschink, D., & Röder, J. (2024). Trainingsdatensätze. Zenodo. https://doi.org/10.5281/zenodo.13805722

Training datasets help to illustrate all stages of the data life cycle (DLC), for example:

  • Metadata standards to describe and structure (newly collected) data
  • (Reproducible) processing of data
  • (Reproducible) data analysis
  • Workflows to archive, share and publish data for personal and/or public re-use
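
To make the processing stage more concrete, the following Python sketch cleans a tiny table that contains deliberate ("didactic") errors and records every change in a log, so that the result can be reproduced and explained. The data are invented for this example, and the column names and the plausibility threshold of 10000 g are assumptions, not part of any dataset listed on this page.

```python
import csv
import io

# Invented example data with deliberate ("didactic") errors:
# inconsistent spelling, a missing value, and an implausible measurement.
RAW_CSV = """species,body_mass_g
Adelie,3750
adelie ,3800
Gentoo,
Gentoo,50000
"""


def clean_rows(raw_text, max_mass_g=10000):
    """Clean the table and return (cleaned_rows, log) so every change is traceable."""
    cleaned, log = [], []
    reader = csv.DictReader(io.StringIO(raw_text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        # Normalise whitespace and capitalisation of the species name.
        species = row["species"].strip().capitalize()
        if species != row["species"]:
            log.append(f"line {line_no}: normalised species to {species!r}")
        # Drop rows without a measurement.
        mass_text = row["body_mass_g"].strip()
        if not mass_text:
            log.append(f"line {line_no}: dropped row, body mass missing")
            continue
        # Drop rows whose measurement exceeds the assumed plausibility limit.
        mass = float(mass_text)
        if mass > max_mass_g:
            log.append(f"line {line_no}: dropped row, implausible mass {mass}")
            continue
        cleaned.append({"species": species, "body_mass_g": mass})
    return cleaned, log
```

Calling `clean_rows(RAW_CSV)` returns the cleaned rows together with the log. Keeping the raw data unchanged and storing the log next to the cleaned output is one simple way to keep the processing step reproducible.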

Figure 1: Data life cycle, CC BY 4.0. Source: RDMkit: The ELIXIR Research Data Management toolkit for Life Sciences. URL: https://rdmkit.elixir-europe.org

Biological Datasets

Nature conservation

  • DiverReef: A global database of the behavior of recreational divers and their interactions with reefs over 20 years: https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecy.4519
    • Giglio, V. J., Adelir‐Alves, J., Balzaretti Merino, N., Bravo‐Olivas, M. L., Camp, E. F., Casoli, E., Chávez‐Dagostino, R. M., Ferretti, E., Fraser, D., Grillo, A. C., Jiménez‐Guiérrez, S., Leite, K. L., Lucrezi, S., Luiz, O. J., Luna‐Pérez, B., McBride, J., Milanese, M., Moity, N., Pinheiro, J. V., … Ferreira, C. E. L. (2025). DiverReef: A global database of the behavior of recreational divers and their interactions with reefs over 20 years. Ecology, 106(2), e4519. https://doi.org/10.1002/ecy.4519
  • Global Roadkill Data: a dataset on terrestrial vertebrate mortality caused by collision with vehicles: https://www.nature.com/articles/s41597-024-04207-x
    • Grilo, C., Neves, T., Bates, J., Le Roux, A., Medrano-Vizcaíno, P., Quaranta, M., Silva, I., Soanes, K., Wang, Y., Data Collection Consortium, Abate, S. D., D’ Abra, F., Cedeño, S. A., De Alencar, P. R., De Almeida, M. F. P., Alves, M. H., Alves, P., De Assis, A. A., Ament, R., … Guinard, E. (2025). Global Roadkill Data: A dataset on terrestrial vertebrate mortality caused by collision with vehicles. Scientific Data, 12(1), 505. https://doi.org/10.1038/s41597-024-04207-x
  • SNAPSHOT USA 2019–2023: The First Five Years of Data From a Coordinated Camera Trap Survey of the United States: https://onlinelibrary.wiley.com/doi/10.1111/geb.13941
    • Rooney, B., Kays, R., Cove, M. V., Jensen, A., Goldstein, B. R., Pate, C., Castiblanco, P., Abell, M. E., Adley, J., Agenbroad, B., Ahlers, A. A., Alexander, P. D., Allen, D., Allen, M. L., Alston, J. M., Alyetama, M., Anderson, T. L., Andrade, R., Anhalt‐Depies, C., … McShea, W. J. (2025). SNAPSHOT USA 2019–2023: The First Five Years of Data From a Coordinated Camera Trap Survey of the United States. Global Ecology and Biogeography, 34(1), e13941. https://doi.org/10.1111/geb.13941
  • CamTrapAsia: A dataset of tropical forest vertebrate communities from 239 camera trapping studies: https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecy.4299
    • Mendes, C. P., Albert, W. R., Amir, Z., Ancrenaz, M., Ash, E., Azhar, B., Bernard, H., Brodie, J., Bruce, T., Carr, E., Clements, G. R., Davies, G., Deere, N. J., Dinata, Y., Donnelly, C. A., Duangchantrasiri, S., Fredriksson, G., Goossens, B., Granados, A., … Luskin, M. S. (2024). CamTrapAsia: A dataset of tropical forest vertebrate communities from 239 camera trapping studies. Ecology, 105(6), e4299. https://doi.org/10.1002/ecy.4299
  • The Breeding Bird Survey of the United Kingdom: https://onlinelibrary.wiley.com/doi/10.1111/geb.13943
    • Massimino, D., Baillie, S. R., Balmer, D. E., Bashford, R. I., Gregory, R. D., Harris, S. J., Heywood, J. J. N., Kelly, L. A., Noble, D. G., Pearce‐Higgins, J. W., Raven, M. J., Risely, K., Woodcock, P., Wotton, S. R., & Gillings, S. (2025). The Breeding Bird Survey of the United Kingdom. Global Ecology and Biogeography, 34(1), e13943. https://doi.org/10.1111/geb.13943
  • Integrated evidence-based extent of occurrence for North American bison (Bison bison) since 1500 CE and before: https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecy.3864
    • Martin, J. M., Short, R. A., Plumb, G. E., Markewicz, L., Van Vuren, D. H., Wehus‐Tow, B., Otárola‐Castillo, E., & Hill, M. E. (2023). Integrated evidence‐based extent of occurrence for North American bison (Bison bison) since 1500 CE and before. Ecology, 104(1), e3864. https://doi.org/10.1002/ecy.3864

Biology

  • Palmer Penguins (trait data): https://allisonhorst.github.io/palmerpenguins/articles/intro.html
    • Horst, A. M., Hill, A. P., & Gorman, K. B. (2020). allisonhorst/palmerpenguins: V0.1.0 (Version v0.1.0) [Computer software]. Zenodo. https://doi.org/10.5281/ZENODO.3960218
    • Data originally published in:

      • Gorman KB, Williams TD, Fraser WR (2014). Ecological sexual dimorphism and environmental variability within a community of Antarctic penguins (genus Pygoscelis). PLoS ONE 9(3):e90081. https://doi.org/10.1371/journal.pone.0090081

      Individual datasets: The individual datasets can be accessed directly via the Environmental Data Initiative:

      • Palmer Station Antarctica LTER and K. Gorman, 2020. Structural size measurements and isotopic signatures of foraging among adult male and female Adélie penguins (Pygoscelis adeliae) nesting along the Palmer Archipelago near Palmer Station, 2007-2009 ver 5. Environmental Data Initiative. https://doi.org/10.6073/pasta/98b16d7d563f265cb52372c8ca99e60f 

      • Palmer Station Antarctica LTER and K. Gorman, 2020. Structural size measurements and isotopic signatures of foraging among adult male and female Gentoo penguin (Pygoscelis papua) nesting along the Palmer Archipelago near Palmer Station, 2007-2009 ver 5. Environmental Data Initiative. https://doi.org/10.6073/pasta/7fca67fb28d56ee2ffa3d9370ebda689 

      • Palmer Station Antarctica LTER and K. Gorman, 2020. Structural size measurements and isotopic signatures of foraging among adult male and female Chinstrap penguin (Pygoscelis antarcticus) nesting along the Palmer Archipelago near Palmer Station, 2007-2009 ver 6. Environmental Data Initiative. https://doi.org/10.6073/pasta/c14dfcfada8ea13a17536e73eb6fbe9e

  • Hawks: contained in the Stat2Data R package (https://cran.r-project.org/web/packages/Stat2Data/index.html)
  • Harbour seal counts in the Elbe Estuary: https://doi.pangaea.de/10.1594/PANGAEA.907670
    • Bundesanstalt Für Gewässerkunde. (2019). Harbour seal counts in the Elbe Estuary, Germany, between Wedel and Cuxhaven in 2018/2019 (p. 451 data points) [Text/tab-separated-values]. PANGAEA. https://doi.org/10.1594/PANGAEA.907670
  • Small mammals from the Caatinga: A dataset for the Brazilian semiarid biome: https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecy.3879
    • Da Costa‐Pinto, A. L., Bovendorp, R. S., Bocchiglieri, A., Caccavo, A., Delciellos, A. C., Malhado, A. C., De Almeida, A. K. R., Braga, C., Loretto, D., Câmara, E. M. V. C., Menezes, F. H., Guilhon, G., Paise, G., Sobral, G., Varjão, I. C. G., Ferreira, J. V. A., Da Silva Oliveira, L., Geise, L., Pereira, L. C. M., … Ladle, R. J. (2023). Small mammals from the Caatinga: A dataset for the Brazilian semiarid biome. Ecology, 104(1), e3879. https://doi.org/10.1002/ecy.3879
  • A bird species occurrence dataset from passive audio recordings across dense urban areas in Gothenburg, Sweden: https://www.nature.com/articles/s41597-025-05481-z
    • Eldesoky, A. H., Gil, J., Kindvall, O., Stavroulaki, I., Jonasson, L., Bennett, D., Yang, W., Martínez Diaz, A. F., Lichter, R., Petrou, F., & Pont, M. B. (2025). A bird species occurrence dataset from passive audio recordings across dense urban areas in Gothenburg, Sweden. Scientific Data, 12(1), 1180. https://doi.org/10.1038/s41597-025-05481-z
  • Long-term observation of the egg and chick size in the nests of Larus ichthyaetus in Lake Chany, Russia: https://www.nature.com/articles/s41597-022-01454-8
    • Yurlov, A. K., Yurlova, N. I., Garyushkina, M. Yu., Selivanova, M. A., & Doi, H. (2022). Long-term observation of the egg and chick size in the nests of Larus ichthyaetus in Lake Chany, Russia. Scientific Data, 9(1), 372. https://doi.org/10.1038/s41597-022-01454-8

Forestry

  • DFG research training group RTG2300: Enrichment of European beech forests with conifers
    • Relative location and diameter of a full tree inventory on 8 study plots: https://doi.pangaea.de/10.1594/PANGAEA.932023
      • Glatthorn, J., & Parth, A. (2021). RTG 2300—Tree census data—Winter 2017/2018 (p. 112643 data points) [Text/tab-separated-values]. PANGAEA. https://doi.org/10.1594/PANGAEA.932023
    • The study plots where the trees were recorded:
      • Ammer, C., Annighöfer, P., Balkenhol, N., Hertel, D., Leuschner, C., Polle, A., Lamersdorf, N., Scheu, S., & Glatthorn, J. (2020). RTG 2300—Study design, location, topography and climatic conditions of research plots in 2020 (p. 470 data points) [Text/tab-separated-values]. PANGAEA. https://doi.org/10.1594/PANGAEA.923125
    • Abundance and taxonomic data of forest arthropods collected on the plots listed above: https://doi.org/10.1594/PANGAEA.949484
      • Matevski, D., & Kriegel, P. (2022). Abundance and taxonomic data of arthropods collected with pitfall traps from temperate forest stands from Lower Saxony, Germany in 2019 (p. 2 datasets) [Application/zip]. PANGAEA.
  • Tree inventory data from permanent plots in French forest reserves: https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecy.4324
    • Cateau, E., Debaive, N., Drapier, N., Chantreau, F., Gilg, O., Laroche, F., Morin, X., Demets, V., Pimenta, R., Thompson, L., & Paillet, Y. (2024). Tree inventory data from permanent plots in French forest reserves. Ecology, 105(7), e4324. https://doi.org/10.1002/ecy.4324

Genetics

Taxonomy, Traits

  • Fruhner, M., Tapken, H., & Stroetmann, E. (2024). Images of 175 individual animals of five distinct taxonomic groups: Camels, penguins, goats, tortoises and toads (ZooMix ID) (p. 3356 data points) [Text/tab-separated-values]. PANGAEA. https://doi.org/10.1594/PANGAEA.967637

Animal Tracking

  • HomeRange: A global database of mammalian home ranges: https://onlinelibrary.wiley.com/doi/full/10.1111/geb.13625
    • Broekman, M. J. E., Hoeks, S., Freriks, R., Langendoen, M. M., Runge, K. M., Savenco, E., Ter Harmsel, R., Huijbregts, M. A. J., & Tucker, M. A. (2023). HomeRange: A global database of mammalian home ranges. Global Ecology and Biogeography, 32(2), 198–205. https://doi.org/10.1111/geb.13625
  • The body size and temperature dependence of organismal locomotion: https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecy.3114
    • Cloyed, C. S., & Dell, A. I. (2020). The body size and temperature dependence of organismal locomotion. Ecology, 101(10), e03114. https://doi.org/10.1002/ecy.3114

Time Series

  • Nutrient and stoichiometric time series measurements of decomposing coarse detritus in freshwaters: https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecy.4114
    • Robbins, C. J., Norman, B. C., Halvorson, H. M., Manning, D. W. P., Bastias, E., Biasi, C., Dodd, A. K., Eckert, R. A., Gossiaux, A., Jabiol, J., Mehring, A. S., & Pastor, A. (2023). Nutrient and stoichiometric time series measurements of decomposing coarse detritus in freshwaters. Ecology, 104(8), e4114. https://doi.org/10.1002/ecy.4114
  • Annual biomass spatial data for southern California (2001–2021): Above- and belowground, standing dead, and litter: https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecy.4031
    • Schrader‐Patton, C. C., Underwood, E. C., & Sorenson, Q. M. (2023). Annual biomass spatial data for southern California (2001–2021): Above‐ and belowground, standing dead, and litter. Ecology, 104(5), e4031. https://doi.org/10.1002/ecy.4031
  • Long-term monitoring of Mount St. Helens micrometeorology: https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecy.3950
    • Che‐Castaldo, C., & Crisafulli, C. M. (2023). Long‐term monitoring of Mount St. Helens micrometeorology. Ecology, 104(3), e3950. https://doi.org/10.1002/ecy.3950

Spatial Data

Environmental Datasets

Land cover

Other Collections



Do you have questions, feedback or need help?

Contact our Helpdesk for direct support.

