- Created by Juliane Röder, last modified on Sep 16, 2025
Training data
What are training data sets? See https://zenodo.org/records/13805722
Here, we are referring to data sets used in tutorials on research data management, as demo data sets in tools or methods, or as examples of challenges in data handling. This definition does not cover data sets used to train AI applications.
Training data sets help to illustrate all stages of the data life cycle (DLC), e.g.
- Metadata standards to describe and structure (newly collected) data
- (Reproducible) processing of data
- (Reproducible) data analysis
- Workflows to archive, share and publish data for personal and/or public re-use
Figure 1: Data life cycle. Source: RDMkit: The ELIXIR Research Data Management toolkit for Life Sciences URL: https://rdmkit.elixir-europe.org
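To make the "(reproducible) processing" stage concrete: before working with a downloaded training data set, you can record a checksum of the file and verify it again before each analysis, so that a silently changed or corrupted copy is noticed. The following is a minimal Python sketch under that assumption; the function names and the example file are hypothetical and are not part of any of the resources listed on this page.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path: Path, expected_sha256: str) -> bool:
    """Compare a file against a previously recorded SHA-256 digest (case-insensitive)."""
    return sha256_of_file(path) == expected_sha256.strip().lower()


if __name__ == "__main__":
    # Hypothetical stand-in for a downloaded training data set.
    with tempfile.TemporaryDirectory() as tmp:
        data_file = Path(tmp) / "example.csv"
        data_file.write_text("plot_id,count\nP1,3\n")
        recorded = sha256_of_file(data_file)  # store this next to your analysis scripts
        print(verify_file(data_file, recorded))  # True: file unchanged
        data_file.write_text("plot_id,count\nP1,4\n")
        print(verify_file(data_file, recorded))  # False: content changed
```

SHA-256 is used here only as a common choice; any digest algorithm works the same way, as long as the same algorithm is used when recording and when verifying.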
Biological Data
Nature conservation
- DiverReef: A global database of the behavior of recreational divers and their interactions with reefs over 20 years: https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecy.4519
- Global Roadkill Data: a dataset on terrestrial vertebrate mortality caused by collision with vehicles: https://www.nature.com/articles/s41597-024-04207-x
- SNAPSHOT USA 2019–2023: The First Five Years of Data From a Coordinated Camera Trap Survey of the United States: https://onlinelibrary.wiley.com/doi/10.1111/geb.13941
- CamTrapAsia: A dataset of tropical forest vertebrate communities from 239 camera trapping studies: https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecy.4299
- The Breeding Bird Survey of the United Kingdom: https://onlinelibrary.wiley.com/doi/10.1111/geb.13943
- Integrated evidence-based extent of occurrence for North American bison (Bison bison) since 1500 CE and before: https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecy.3864
Biological data sets
- Palmer Penguins (trait data): https://allisonhorst.github.io/palmerpenguins/articles/intro.html
- Hawks: contained in the Stat2Data R package: https://cran.r-project.org/web/packages/Stat2Data/index.html
- Count data of harbour seals in the Elbe Estuary: https://doi.pangaea.de/10.1594/PANGAEA.907670
- Small mammals from the Caatinga: A dataset for the Brazilian semiarid biome: https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecy.3879
- A bird species occurrence dataset from passive audio recordings across dense urban areas in Gothenburg, Sweden: https://www.nature.com/articles/s41597-025-05481-z
- Long-term observation of the egg and chick size in the nests of Larus ichthyaetus in Lake Chany, Russia: https://www.nature.com/articles/s41597-022-01454-8
Forestry
- Relative location and diameter of a full tree inventory on 8 studyplots: https://doi.pangaea.de/10.1594/PANGAEA.932023
- DFG research training group RTG2300: Enrichment of European beech forests with conifers
- The study plots, where the trees were recorded: https://doi.pangaea.de/10.1594/PANGAEA.923125
- Abundance and taxonomic data of forest arthropods, collected on the plots listed above: https://doi.pangaea.de/10.1594/PANGAEA.923125
- Tree inventory data from permanent plots in French forest reserves: https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecy.4324
Genetics
- Environmental DNA analysis of cephalopods off the Azores: https://doi.pangaea.de/10.1594/PANGAEA.926840
Taxonomy, Traits
- OpenRefine Training Dataset based on a subset of the BGBM Herbarium: https://zenodo.org/records/14918375. Some ideas on how to work with the data: https://zenodo.org/records/14732682.
- Ecological traits for 1374 arthropod species collected in a German grassland: https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecy.70077
- Jena Experiment
- Images of 175 individual animals of five distinct taxonomic groups: camels, penguins, goats, tortoises and toads: https://doi.pangaea.de/10.1594/PANGAEA.967637
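The OpenRefine training data set above is meant for practising data cleaning, for example merging inconsistently written variants of the same value. As a hedged illustration, the sketch below reimplements the idea behind OpenRefine's "fingerprint" key-collision clustering in plain Python: values that reduce to the same normalised key are grouped as likely variants. It loosely follows the steps described in the OpenRefine clustering documentation but is not OpenRefine's own code; the function names are hypothetical and the example names are invented, not taken from the BGBM data.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a normalised key for a string, loosely following OpenRefine's fingerprint steps."""
    text = value.strip().lower()
    # Fold accented characters to their ASCII base letter (e.g. "ö" -> "o").
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Drop punctuation: anything that is neither a word character nor whitespace.
    text = re.sub(r"[^\w\s]", "", text)
    # Sort the unique whitespace-separated tokens and join them back together.
    tokens = sorted(set(text.split()))
    return " ".join(tokens)


def cluster(values: list[str]) -> list[list[str]]:
    """Group values whose fingerprints collide; values without a partner are omitted."""
    groups: dict[str, list[str]] = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [members for members in groups.values() if len(members) > 1]


if __name__ == "__main__":
    # Invented name variants for illustration only.
    names = ["Müller, H.", "H. Muller", "muller h", "Schmidt, A."]
    print(cluster(names))  # [['Müller, H.', 'H. Muller', 'muller h']]
```

Key collision is deliberately conservative: it only groups values that become identical after normalisation, so every suggested cluster should still be reviewed before merging.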
Animal Tracking
- HomeRange: A global database of mammalian home ranges: https://onlinelibrary.wiley.com/doi/full/10.1111/geb.13625
- The body size and temperature dependence of organismal locomotion: https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecy.3114
Time Series
- Nutrient and stoichiometric time series measurements of decomposing coarse detritus in freshwaters: https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecy.4114
- Annual biomass spatial data for southern California (2001–2021): Above- and belowground, standing dead, and litter: https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecy.4031
- Long-term monitoring of Mount St. Helens micrometeorology: https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecy.3950
Spatial Data
- Data set: https://data.fdz.ioer.de/dataset.xhtml?persistentId=doi:10.71830/6ILS40, with an accompanying tutorial: https://training.fdz.ioer.info/intro.html#
Environmental Data
Land cover
- EGLC: Ensemble Global Land Cover Reference Dataset (2000-2022): https://zenodo.org/records/15594682
Other Collections
- A collection of vegetation and zoological data sets: https://www.davidzeleny.net/anadat-r/doku.php/en:data
- The Humanitarian Data Exchange: https://data.humdata.org/