This is a list of well-documented data sets, covering different data types and different aspects of research data management, for use in research data management training.
Biological Data sets
Nature conservation
- DiverReef: A global database of the behavior of recreational divers and their interactions with reefs over 20 years: https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecy.4519
- Global Roadkill Data: a dataset on terrestrial vertebrate mortality caused by collision with vehicles: https://www.nature.com/articles/s41597-024-04207-x
- SNAPSHOT USA 2019–2023: The First Five Years of Data From a Coordinated Camera Trap Survey of the United States: https://onlinelibrary.wiley.com/doi/10.1111/geb.13941
- CamTrapAsia: A dataset of tropical forest vertebrate communities from 239 camera trapping studies: https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecy.4299
- The Breeding Bird Survey of the United Kingdom: https://onlinelibrary.wiley.com/doi/10.1111/geb.13943
- Integrated evidence-based extent of occurrence for North American bison (Bison bison) since 1500 CE and before: https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecy.3864
Biology
- Palmer Penguins (trait data): https://allisonhorst.github.io/palmerpenguins/articles/intro.html
- Hawks: contained in the Stat2Data R package (https://cran.r-project.org/web/packages/Stat2Data/index.html)
- Count data of harbour seals in the Elbe Estuary: https://doi.pangaea.de/10.1594/PANGAEA.907670
- Small mammals from the Caatinga: A dataset for the Brazilian semiarid biome: https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecy.3879
- A bird species occurrence dataset from passive audio recordings across dense urban areas in Gothenburg, Sweden: https://www.nature.com/articles/s41597-025-05481-z
- Long-term observation of the egg and chick size in the nests of Larus ichthyaetus in Lake Chany, Russia: https://www.nature.com/articles/s41597-022-01454-8
Forestry
- Relative location and diameter of a full tree inventory on 8 study plots: https://doi.pangaea.de/10.1594/PANGAEA.932023
- DFG research training group RTG2300: Enrichment of European beech forests with conifers
- The study plots where the trees were recorded: https://doi.pangaea.de/10.1594/PANGAEA.923125
- Abundance and taxonomic data of forest arthropods collected on the plots above: https://doi.pangaea.de/10.1594/PANGAEA.923125
- Tree inventory data from permanent plots in French forest reserves: https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecy.4324
Genetics
- Environmental DNA analysis of cephalopods off the Azores: https://doi.pangaea.de/10.1594/PANGAEA.926840
Taxonomy, Traits
- OpenRefine Training Dataset based on a subset of the BGBM Herbarium: https://zenodo.org/records/14918375. Some ideas on how to work with the data: https://zenodo.org/records/14732682.
- Ecological traits for 1374 arthropod species collected in a German grassland: https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecy.70077
- Jena Experiment
- Images of 175 individual animals of five distinct taxonomic groups: camels, penguins, goats, tortoises and toads: https://doi.pangaea.de/10.1594/PANGAEA.967637
Animal Tracking
- HomeRange: A global database of mammalian home ranges: https://onlinelibrary.wiley.com/doi/full/10.1111/geb.13625
- The body size and temperature dependence of organismal locomotion: https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecy.3114
Time Series
- Nutrient and stoichiometric time series measurements of decomposing coarse detritus in freshwaters: https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecy.4114
- Annual biomass spatial data for southern California (2001–2021): Above- and belowground, standing dead, and litter: https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecy.4031
- Long-term monitoring of Mount St. Helens micrometeorology: https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecy.3950
Spatial Data
- https://data.fdz.ioer.de/dataset.xhtml?persistentId=doi:10.71830/6ILS40 with a tutorial https://training.fdz.ioer.info/intro.html#.
Environmental Data sets
Land cover
- EGLC: Ensemble Global Land Cover Reference Dataset (2000-2022): https://zenodo.org/records/15594682
Other Collections
- A collection of vegetation and zoological data sets: https://www.davidzeleny.net/anadat-r/doku.php/en:data
- The Humanitarian Data Exchange: https://data.humdata.org/