Looking for data sets to be used in teaching and RDM training? Check out the list below!

Questions regarding other training material? Contact our Helpdesk!

This is a list of well-documented training datasets covering different data types and different aspects of research data management (RDM), for use in RDM training.

What are training datasets

Training datasets are essential for the effective teaching and training of young researchers. They form the basis for teaching data skills and analysis methods. Here, we are referring to datasets used in tutorials on research data management, as demo datasets in tools or methods, or as examples of challenges in data handling. This definition does not cover datasets used to train AI applications.

To be labelled as a training dataset, a dataset has to:

  • be FAIR (Findable, Accessible, Interoperable, Reusable).
  • be freely available, with an appropriate license and open data format.
  • be of reasonable size.
  • be citable.
  • enable easy-to-understand but interesting questions to be addressed.
  • be sufficiently documented.
  • be either “perfect” or contain didactic errors.

For an overview, see the poster on what training datasets are in the context of NFDI4Biodiversity (in German): Signer, J., Schlägel, U., Tschink, D., & Röder, J. (2024). Trainingsdatensätze. Zenodo. https://doi.org/10.5281/zenodo.13805722

Training datasets help to illustrate all stages of the data life cycle (DLC), e.g.

  • Metadata standards to describe and structure (newly collected) data
  • (Reproducible) processing of data
  • (Reproducible) data analysis
  • Workflows to archive, share and publish data for personal and/or public re-use

Figure 1: Data life cycle, CC BY 4.0. Source: RDMkit: The ELIXIR Research Data Management toolkit for Life Sciences. URL: https://rdmkit.elixir-europe.org

Biological Datasets

Nature conservation

  • DiverReef: A global database of the behavior of recreational divers and their interactions with reefs over 20 years: https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecy.4519
    • Giglio, V. J., Adelir‐Alves, J., Balzaretti Merino, N., Bravo‐Olivas, M. L., Camp, E. F., Casoli, E., Chávez‐Dagostino, R. M., Ferretti, E., Fraser, D., Grillo, A. C., Jiménez‐Guiérrez, S., Leite, K. L., Lucrezi, S., Luiz, O. J., Luna‐Pérez, B., McBride, J., Milanese, M., Moity, N., Pinheiro, J. V., … Ferreira, C. E. L. (2025). DiverReef: A global database of the behavior of recreational divers and their interactions with reefs over 20 years. Ecology, 106(2), e4519. https://doi.org/10.1002/ecy.4519
  • Global Roadkill Data: a dataset on terrestrial vertebrate mortality caused by collision with vehicles: https://www.nature.com/articles/s41597-024-04207-x
    • Grilo, C., Neves, T., Bates, J., Le Roux, A., Medrano-Vizcaíno, P., Quaranta, M., Silva, I., Soanes, K., Wang, Y., Data Collection Consortium, Abate, S. D., D’ Abra, F., Cedeño, S. A., De Alencar, P. R., De Almeida, M. F. P., Alves, M. H., Alves, P., De Assis, A. A., Ament, R., … Guinard, E. (2025). Global Roadkill Data: A dataset on terrestrial vertebrate mortality caused by collision with vehicles. Scientific Data, 12(1), 505. https://doi.org/10.1038/s41597-024-04207-x
  • SNAPSHOT USA 2019–2023: The First Five Years of Data From a Coordinated Camera Trap Survey of the United States: https://onlinelibrary.wiley.com/doi/10.1111/geb.13941
    • Rooney, B., Kays, R., Cove, M. V., Jensen, A., Goldstein, B. R., Pate, C., Castiblanco, P., Abell, M. E., Adley, J., Agenbroad, B., Ahlers, A. A., Alexander, P. D., Allen, D., Allen, M. L., Alston, J. M., Alyetama, M., Anderson, T. L., Andrade, R., Anhalt‐Depies, C., … McShea, W. J. (2025). SNAPSHOT USA 2019–2023: The First Five Years of Data From a Coordinated Camera Trap Survey of the United States. Global Ecology and Biogeography, 34(1), e13941. https://doi.org/10.1111/geb.13941
  • CamTrapAsia: A dataset of tropical forest vertebrate communities from 239 camera trapping studies: https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecy.4299
    • Mendes, C. P., Albert, W. R., Amir, Z., Ancrenaz, M., Ash, E., Azhar, B., Bernard, H., Brodie, J., Bruce, T., Carr, E., Clements, G. R., Davies, G., Deere, N. J., Dinata, Y., Donnelly, C. A., Duangchantrasiri, S., Fredriksson, G., Goossens, B., Granados, A., … Luskin, M. S. (2024). CamTrapAsia: A dataset of tropical forest vertebrate communities from 239 camera trapping studies. Ecology, 105(6), e4299. https://doi.org/10.1002/ecy.4299
  • The Breeding Bird Survey of the United Kingdom: https://onlinelibrary.wiley.com/doi/10.1111/geb.13943
    • Massimino, D., Baillie, S. R., Balmer, D. E., Bashford, R. I., Gregory, R. D., Harris, S. J., Heywood, J. J. N., Kelly, L. A., Noble, D. G., Pearce‐Higgins, J. W., Raven, M. J., Risely, K., Woodcock, P., Wotton, S. R., & Gillings, S. (2025). The Breeding Bird Survey of the United Kingdom. Global Ecology and Biogeography, 34(1), e13943. https://doi.org/10.1111/geb.13943
  • Integrated evidence-based extent of occurrence for North American bison (Bison bison) since 1500 CE and before: https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecy.3864
    • Martin, J. M., Short, R. A., Plumb, G. E., Markewicz, L., Van Vuren, D. H., Wehus‐Tow, B., Otárola‐Castillo, E., & Hill, M. E. (2023). Integrated evidence‐based extent of occurrence for North American bison (Bison bison) since 1500 CE and before. Ecology, 104(1), e3864. https://doi.org/10.1002/ecy.3864

Biology

  • Palmer Penguins (trait data): https://allisonhorst.github.io/palmerpenguins/articles/intro.html
    • Data originally published in:
      • Gorman KB, Williams TD, Fraser WR (2014). Ecological sexual dimorphism and environmental variability within a community of Antarctic penguins (genus Pygoscelis). PLoS ONE 9(3):e90081.
    • Individual datasets can be accessed directly via the Environmental Data Initiative:
      • Palmer Station Antarctica LTER and K. Gorman, 2020. Structural size measurements and isotopic signatures of foraging among adult male and female Adélie penguins (Pygoscelis adeliae) nesting along the Palmer Archipelago near Palmer Station, 2007-2009 ver 5. Environmental Data Initiative. https://doi.org/10.6073/pasta/98b16d7d563f265cb52372c8ca99e60f
      • Palmer Station Antarctica LTER and K. Gorman, 2020. Structural size measurements and isotopic signatures of foraging among adult male and female Gentoo penguin (Pygoscelis papua) nesting along the Palmer Archipelago near Palmer Station, 2007-2009 ver 5. Environmental Data Initiative. https://doi.org/10.6073/pasta/7fca67fb28d56ee2ffa3d9370ebda689
      • Palmer Station Antarctica LTER and K. Gorman, 2020. Structural size measurements and isotopic signatures of foraging among adult male and female Chinstrap penguin (Pygoscelis antarcticus) nesting along the Palmer Archipelago near Palmer Station, 2007-2009 ver 6. Environmental Data Initiative.
  • Hawks: contained in the Stat2Data R package (https://cran.r-project.org/web/packages/Stat2Data/index.html)
    • Ann Cannon, George Cobb, Bradley Hartlaub, Julie Legler, Robin Lock, Thomas Moore, Allan Rossman, Jeffrey Witmer. (2013). Stat2Data: Datasets for Stat2 (p. 2.0.0) [Dataset]. https://doi.org/10.32614/CRAN.package.Stat2Data
  • Count data of harbour seals in the Elbe Estuary: https://doi.pangaea.de/10.1594/PANGAEA.907670
    • Bundesanstalt Für Gewässerkunde. (2019). Harbour seal counts in the Elbe Estuary, Germany, between Wedel and Cuxhaven in 2018/2019 (p. 451 data points) [Text/tab-separated-values]. PANGAEA. https://doi.org/10.1594/PANGAEA.907670
  • Small mammals from the Caatinga: A dataset for the Brazilian semiarid biome: https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecy.3879
    • Da Costa‐Pinto, A. L., Bovendorp, R. S., Bocchiglieri, A., Caccavo, A., Delciellos, A. C., Malhado, A. C., De Almeida, A. K. R., Braga, C., Loretto, D., Câmara, E. M. V. C., Menezes, F. H., Guilhon, G., Paise, G., Sobral, G., Varjão, I. C. G., Ferreira, J. V. A., Da Silva Oliveira, L., Geise, L., Pereira, L. C. M., … Ladle, R. J. (2023). Small mammals from the Caatinga: A dataset for the Brazilian semiarid biome. Ecology, 104(1), e3879. https://doi.org/10.1002/ecy.3879
  • A bird species occurrence dataset from passive audio recordings across dense urban areas in Gothenburg, Sweden: https://www.nature.com/articles/s41597-025-05481-z
  • Long-term observation of the egg and chick size in the nests of Larus ichthyaetus in Lake Chany, Russia: https://www.nature.com/articles/s41597-022-01454-8
    • Yurlov, A. K., Yurlova, N. I., Garyushkina, M. Yu., Selivanova, M. A., & Doi, H. (2022). Long-term observation of the egg and chick size in the nests of Larus ichthyaetus in Lake Chany, Russia. Scientific Data, 9(1), 372. https://doi.org/10.1038/s41597-022-01454-8

Forestry

  • DFG research training group RTG2300: Enrichment of European beech forests with conifers
    • Relative location and diameter of a full tree inventory on 8 study plots: https://doi.pangaea.de/10.1594/PANGAEA.932023
    • The study plots where the trees were recorded:
      • Ammer, C., Annighöfer, P., Balkenhol, N., Hertel, D., Leuschner, C., Polle, A., Lamersdorf, N., Scheu, S., & Glatthorn, J. (2020). RTG 2300—Study design, location, topography and climatic conditions of research plots in 2020 (p. 470 data points) [Text/tab-separated-values]. PANGAEA. https://doi.org/10.1594/PANGAEA.923125
    • Abundance and taxonomic data of forest arthropods collected on the plots above: https://doi.org/10.1594/PANGAEA.949484
      • Matevski, D., & Kriegel, P. (2022). Abundance and taxonomic data of arthropods collected with pitfall traps from temperate forest stands from Lower Saxony, Germany in 2019 (p. 2 datasets) [Application/zip]. PANGAEA. https://doi.org/10.1594/PANGAEA.949484

  • Tree inventory data from permanent plots in French forest reserves: https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecy.4324
    • Cateau, E., Debaive, N., Drapier, N., Chantreau, F., Gilg, O., Laroche, F., Morin, X., Demets, V., Pimenta, R., Thompson, L., & Paillet, Y. (2024). Tree inventory data from permanent plots in French forest reserves. Ecology, 105(7), e4324. https://doi.org/10.1002/ecy.4324

Genetics

  • Environmental DNA analysis of cephalopods off the Azores: https://doi.pangaea.de/10.1594/PANGAEA.926840
    • Visser, F., Merten, V., Bayer, T., Oudejans, M. G., de Jonge, D., Puebla, O., Reusch, T. B. H., Fuss, J., & Hoving, H.-J. T. (2021). Environmental DNA analysis of cephalopods off the Azores (p. 10 data points) [Text/tab-separated-values]. PANGAEA. https://doi.org/10.1594/PANGAEA.926840

Taxonomy, Traits

  • OpenRefine Training Dataset based on a subset of the BGBM Herbarium: https://zenodo.org/records/14918375. Some ideas on how to work with the data: https://zenodo.org/records/14732682
    • Botanic Garden and Botanical Museum Berlin. (2025). OpenRefine Training Dataset based on a subset of the BGBM Herbarium (Version 1.0) [Dataset]. Botanic Garden and Botanical Museum, Berlin. https://doi.org/10.5281/ZENODO.14918375
    • Fichtmueller, D. (2025). OpenRefine—Hands On. https://doi.org/10.5281/ZENODO.14732682
  • Ecological traits for 1374 arthropod species collected in a German grassland: https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecy.70077
    • Jena Experiment
    • Bröcher, M., Meyer, S. T., Leher, A. G., & Ebeling, A. (2025). Ecological traits for 1374 arthropod species collected in a German grassland. Ecology, 106(4), e70077. https://doi.org/10.1002/ecy.70077
  • Images of 175 individual animals of five distinct taxonomic groups: camels, penguins, goats, tortoises and toads: https://doi.pangaea.de/10.1594/PANGAEA.967637
    • Fruhner, M., Tapken, H., & Stroetmann, E. (2024). Images of 175 individual animals of five distinct taxonomic groups: Camels, penguins, goats, tortoises and toads (ZooMix ID) (p. 3356 data points) [Text/tab-separated-values]. PANGAEA. https://doi.org/10.1594/PANGAEA.967637

Animal Tracking

  • HomeRange: A global database of mammalian home ranges: https://onlinelibrary.wiley.com/doi/full/10.1111/geb.13625
    • Broekman, M. J. E., Hoeks, S., Freriks, R., Langendoen, M. M., Runge, K. M., Savenco, E., Ter Harmsel, R., Huijbregts, M. A. J., & Tucker, M. A. (2023). HomeRange: A global database of mammalian home ranges. Global Ecology and Biogeography, 32(2), 198–205. https://doi.org/10.1111/geb.13625
  • The body size and temperature dependence of organismal locomotion: https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecy.3114
    • Cloyed, C. S., & Dell, A. I. (2020). The body size and temperature dependence of organismal locomotion. Ecology, 101(10), e03114. https://doi.org/10.1002/ecy.3114

Time Series

  • Nutrient and stoichiometric time series measurements of decomposing coarse detritus in freshwaters: https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecy.4114
    • Robbins, C. J., Norman, B. C., Halvorson, H. M., Manning, D. W. P., Bastias, E., Biasi, C., Dodd, A. K., Eckert, R. A., Gossiaux, A., Jabiol, J., Mehring, A. S., & Pastor, A. (2023). Nutrient and stoichiometric time series measurements of decomposing coarse detritus in freshwaters. Ecology, 104(8), e4114. https://doi.org/10.1002/ecy.4114
  • Annual biomass spatial data for southern California (2001–2021): Above- and belowground, standing dead, and litter: https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecy.4031
    • Schrader‐Patton, C. C., Underwood, E. C., & Sorenson, Q. M. (2023). Annual biomass spatial data for southern California (2001–2021): Above‐ and belowground, standing dead, and litter. Ecology, 104(5), e4031. https://doi.org/10.1002/ecy.4031
  • Long-term monitoring of Mount St. Helens micrometeorology: https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecy.3950
    • Che‐Castaldo, C., & Crisafulli, C. M. (2023). Long‐term monitoring of Mount St. Helens micrometeorology. Ecology, 104(3), e3950. https://doi.org/10.1002/ecy.3950

Spatial Data

Environmental Datasets

Land cover

Other Collections



Do you have questions, feedback or need help?

Contact our Helpdesk for direct support.

