Overview of available tools and services for early data mobilization
This article introduces tools and resources that help scientists work with biodiversity data more quickly and effectively, even before data acquisition is fully complete. The presented tools make it easier to organize, check, and share data so that it can be reused by others in research and beyond. NFDI4Biodiversity supports this process by offering training, technical help, and infrastructure that ensure data remains reliable, accessible, and useful in the long term.
In recent years, the availability of biodiversity data has improved considerably, which has led to an increasing demand for methods and tools for Early Data Mobilization. This term refers to the rapid and efficient processing and analysis of data to gain valuable knowledge and insights before the data is fully available. In addition, open educational resources (OER) are available to strengthen early-career scientists’ ability to work with data and to enable PIs and postdocs to teach good data management practices. NFDI4Biodiversity, a consortium in the German National Research Data Infrastructure, aims to further improve the availability of such tools for early data mobilization and analysis to facilitate the creation of FAIR data in biodiversity research and related fields.
The tools and services collected in this article focus on Early Data Mobilization. Ultimately, they should prepare the data that are collected, created, and/or processed for seamless integration into NFDI4Biodiversity’s Research Data Commons (RDC) infrastructure.
The RDC is conceptualised as an expandable, cloud-based research infrastructure that provides scientists, data providers, and data consumers with powerful tools for creating FAIR data products and facilitates the collaborative exchange of data and services, both within the German National Research Data Infrastructure (NFDI) and beyond. It is currently at a mid-stage of development and available for pilot implementations of software tools and data resources.
Therefore, NFDI4Biodiversity supports data producers, i.e. researchers and participating facilities of the consortium that provide data or metadata to be distributed to data users, by:
- Creating awareness of existing tools that are better suited to the above goals than spreadsheet software, as well as tools that enhance spreadsheet software, e.g. by making metadata annotation easier
- Providing custom training and facilitating IT support in tool development, deployment and use via the NFDI4Biodiversity Helpdesk
- Enabling direct, problem-specific tool development
The following community demands/needs regarding data mobilization will be addressed:
- Gathering, structuring and managing raw data
- Workflow realization
- Quality control and plausibility assurance
- Analysis and modelling of data
- Virtual research environments for networking and sharing
- Data publication
- Long-term archiving
This article provides an overview of current tools that fulfil the above criteria and are as generic as possible yet as specific as necessary, so that they help a wide variety of users from different disciplines while remaining maintainable and interoperable. In addition to the tools, the article provides an overview of appropriate educational resources. All tools and resources are mapped to the NFDI4Biodiversity personas that represent the target groups for the service.
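The need for "quality control and plausibility assurance" listed above can be illustrated with a small sketch. The record layout, field names and checks below are assumptions chosen purely for illustration; they are not taken from any NFDI4Biodiversity tool, and a real project would typically derive such rules from its own metadata standard.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Observation:
    """Hypothetical field record; the field names are illustrative only."""
    species: str
    latitude: float
    longitude: float
    observed_on: date
    count: int


def plausibility_issues(obs: Observation, today: date) -> list[str]:
    """Return human-readable problems found in a single record."""
    issues = []
    if not obs.species.strip():
        issues.append("species name is empty")
    if not -90.0 <= obs.latitude <= 90.0:
        issues.append("latitude outside [-90, 90]")
    if not -180.0 <= obs.longitude <= 180.0:
        issues.append("longitude outside [-180, 180]")
    if obs.observed_on > today:
        issues.append("observation date lies in the future")
    if obs.count < 0:
        issues.append("negative individual count")
    return issues


# Two made-up records: one plausible, one with several problems.
records = [
    Observation("Parus major", 50.58, 8.67, date(2023, 5, 14), 3),
    Observation("", 95.0, 8.67, date(2023, 5, 14), -1),
]
reference_day = date(2024, 1, 1)  # fixed so the example is reproducible

# Map each record index to the list of issues found for it.
report = {i: plausibility_issues(r, reference_day) for i, r in enumerate(records)}
for index, problems in report.items():
    print(index, problems or "ok")
```

Running such checks while data are still being collected, rather than at submission time, means that implausible values can be corrected while the original context is still known.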
During its initial five-year funding period, NFDI4Biodiversity successfully engaged 50 partner institutions, including scientific organizations, museums, natural history societies, and government offices. This collaboration aimed to facilitate the creation and adaptation of research tools for implementation in their respective projects. The following is an overview of tools considered to be especially useful in the early phase of data mobilization within a project’s life cycle. For services supported by NFDI4Biodiversity, i.e. those provided by the consortium partners, the NFDI4Biodiversity Helpdesk is always a first point of contact.
| Service | Short Description | Personas/Target Groups | Requirements | Point of contact | License | Link |
|---|---|---|---|---|---|---|
| Aruna | Aruna is a FAIR, open-source data storage and management platform for scientific data and metadata. The decentralized data storage system provides a global catalog and authorization functionality to be used in conjunction with data proxies. Data proxies are either hosted locally or provided by data storage providers partnering with the service. Via the proxies, Aruna is able to incorporate the data into its catalog, enforce domain- and location-specific rules and policies, and encrypt, compress, anonymize or control end-user access. The main focus of Aruna is to enable seamless, geo-redundant data orchestration, to integrate existing data, metadata and workflows into its structures, to manage access sovereignly, and then to allow users to list, search or view all available data. Aruna promises to improve collaboration, reusability, FAIR principle compliance, scalability, and security compared to localized, domain-specific data silos. This tool is provided in cooperation with the Justus-Liebig-University Giessen, NFDI4Microbiota and the GAIA-X connector project FAIR Data Spaces. | Data manager Doro | Aruna requirements | | | https://dev.aruna-engine.org/ |
| BEXIS2 | BEXIS2 is an open-source web application used for research data management within medium to large projects. It supports projects with multiple subprojects and up to hundreds of users. Originally, BEXIS2 was developed for ecological data and has since been updated to support data from a variety of scientific disciplines, e.g. biodiversity, environmental science and the humanities. Datasets within BEXIS2 consist of metadata and primary data, thereby facilitating FAIR research data and providing a means to manage more than just observational data. BEXIS2 does not provide a metadata standard directly, but supports the use of common standards and allows users to integrate individual or project-specific metadata schemas. To get to know the service, a training environment is available without data storage capacities. Interested institutions will either be provided with an instance of BEXIS2 hosted and maintained by the partners of NFDI4Biodiversity or can host their own instances with help from NFDI4Biodiversity’s developer team regarding installation and support. As a tool for early data mobilization, BEXIS2 offers a system to access, reuse and store datasets. This tool is provided in cooperation with the Friedrich Schiller University Jena. | Post-doc Paul, Data manager Doro | BEXIS2 requirements | bexis2-support (at) uni-jena (dot) de | GNU-LGPL-3.0 | https://bexis2.uni-jena.de/ |
| Diversity Workbench | The Diversity Workbench (DWB) is a set of independent but interconnected software applications for building advanced virtual working environments for bio- and geodiversity information. It is developed by the State Natural Science Collections of Bavaria (SNSB) for researchers and scientific database curators. The tools serve early data mobilization, offering a comprehensive platform for research data management from the project's initiation until long-term data management. It includes tools for administrating data and metadata, describing digital objects and variables, outlining responsibilities, managing taxonomies, terminologies and scientific references, initiating data publication and more. The DWB Virtual Training Environment is provided in cooperation with the Gesellschaft für wissenschaftliche Datenverarbeitung mbH (GWDG), the SNSB and the German Federation for Biological Data (GFBio). The cooperation provides free remote access to training environment instances and SQL database software including test data from real research projects, terminologies and taxonomies, as well as the possibility to exchange data with external resources. | Post-doc Paul, Data manager Doro | DWB requirements | | GNU General Public License 3.0 (GPLv3) | https://www.diversityworkbench.de/manual/dwb/ https://doi.org/10.5281/zenodo.12802893 |
| BioMe | BioMe is an open-source project that provides a universal, modular application framework and infrastructure for biodiversity research and monitoring, with a focus on data integration and synthesis. The toolkit is accessible via a web browser or mobile app, which supports both data collection in the field and data curation in the office. As a tool for early data mobilization, BioMe offers an open and FAIR approach to data collection by incorporating community metadata standards from the outset, along with the ability to modularly utilise tools relevant to each phase of the data life cycle. Its usability, not unlike the DWB’s, goes beyond the early project stages. This tool is provided in cooperation with the Helmholtz Centre for Environmental Research. | Post-doc Paul | | alexander (dot) harpke (at) ufz (dot) de, kristina (dot) haase (at) ufz (dot) de | | https://www.ufz.de/biome/index.php?en=49810 |
| RightField | RightField is a utility for working directly with tabular data in Microsoft Excel. Use cases are annotating data with ontology terms, adding structured metadata and improving data interoperability. As a tool for early data mobilization, it provides a means to enrich data with metadata after the initial data collection by embedding ontology terms within cells. By incorporating consistent terminologies and controlled vocabularies as templates to be used in Excel, data quality can be supported even before the actual data collection. RightField helps to bridge the gap between the researcher’s tried and tested methods and workflows of data collection and the need for standardization through ontologies to create FAIR data and keep data reusable throughout their life cycle. This tool is provided in cooperation with the Heidelberg Institute for Theoretical Studies gGmbH. | Post-doc Paul | RightField requirements | Mailing List | BSD | https://rightfield.org.uk/ |
NFDI4Biodiversity already provides a wide variety of useful tools for early project data mobilization, many of which go beyond the initial stages of the data life cycle. Data provision can be handled by BEXIS2, and RightField opens the door for researchers who have not yet abandoned the spreadsheet but want to implement the FAIR principles in their work. The Diversity Workbench and BioMe, on the other hand, provide powerful frameworks to manage data from collection until the end of the project, providing functionality at the early stages that will be relevant when data are eventually published or shared. However, these tools are not yet incorporated into the aforementioned Research Data Commons infrastructure.
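The idea of constraining spreadsheet cells to a controlled vocabulary, as described for RightField above, can be sketched independently of any particular tool. The following is not RightField's own mechanism or API; the vocabulary, the column names and the sample sheet are hypothetical and only show what such a check amounts to for a CSV export of a spreadsheet.

```python
import csv
import io

# Hypothetical controlled vocabulary for one column of a field sheet.
ALLOWED_HABITATS = {"forest", "grassland", "wetland"}

# Hypothetical CSV export of a spreadsheet; "Grasland" is a deliberate typo.
SHEET = """plot_id,habitat
P01,forest
P02,Grasland
P03,wetland
"""


def invalid_cells(text: str, column: str, allowed: set[str]) -> list[tuple[int, str]]:
    """Return (line number, value) for every cell outside the vocabulary."""
    reader = csv.DictReader(io.StringIO(text))
    # The header is line 1, so the first data row is line 2.
    return [
        (line_no, row[column])
        for line_no, row in enumerate(reader, start=2)
        if row[column] not in allowed
    ]


problems = invalid_cells(SHEET, "habitat", ALLOWED_HABITATS)
print(problems)
```

A template that restricts the allowed values at entry time prevents such deviations from occurring in the first place, whereas a check like this one only detects them afterwards.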
Services for Early Data Mobilization from related NFDI consortia
| Service | Short Description | Personas/Target Groups | Requirements | Point of contact | License | Link |
|---|---|---|---|---|---|---|
| ISA Wizard | The ISA Wizard is a tool developed within use cases of NFDI4Biodiversity and FAIRagro, and partly in collaboration with DataPlant. Its primary purpose is to facilitate the creation of datasets structured according to the ISA (Investigation, Study, Assay) data model, which is widely recognized for organizing life science experiments. The ISA Wizard achieves this through an intuitive questionnaire interface, complemented by file upload functionality, allowing users to systematically collect and annotate metadata. One of the key strengths of the ISA Wizard is its configurable questionnaire system, which ensures that the tool remains domain agnostic. This flexibility allows it to be used across a variety of scientific disciplines without modification to the core software. Domain-specific requirements, such as the Minimum Information About a Plant Phenotyping Experiment (MIAPPE) standard for plant phenotyping, can be seamlessly integrated by adjusting the questionnaire configuration. This approach enables tailored data mobilization workflows while maintaining interoperability and standardization. Upon completion, users can export their curated datasets either in the ARC format, also including direct integration into the PLANTdataHUB platform, or as ISA JSON, supporting further data sharing and reuse. | Post-doc Paul | internet access | GitHub | MIT License | Repository: https://github.com/IPK-BIT/isa-wizard Live deployment: https://ipk-bit.github.io/isa-wizard/ |
| PLANTdataHUB and ARCitect | The NFDI consortium DataPlant focuses on establishing sustainable research data management for the German plant research community by providing digital and in-person services. Based on the Annotated Research Context (ARC) framework, DataPlant provides a central reference implementation called PLANTdataHUB. PLANTdataHUB can be used collaboratively for data storage, management and sharing. It can handle large datasets and is developed with research data management in mind, i.e. it increases the FAIRness of the data by providing structured and transparent data and metadata storage, incorporating data analysis within the platform and conforming to international Research Data Management (RDM) standards concerning the digital objects (e.g. RO-Crate FAIR digital objects). Additionally, ARC is an open-source and community-driven initiative, which keeps the framework adaptable and relevant within the community. In combination with the ARCitect, which is used to create and manage ARCs and then synchronize them with the database, DataPlant provides a powerful tool for early data mobilization and beyond. PLANTdataHUB can be used as a data provider and as FAIR data storage. | Post-doc Paul | | GitHub | | https://www.nfdi4plants.org/arc-data-hub/ |
| NOMAD Lab | The NFDI consortium FAIRmat specializes in RDM for materials science. One of its services is the NOMAD Lab, which combines the functionalities of a data repository and archive, an encyclopedia and an analytics toolkit, offered through the service subcategories NOMAD Oasis and the public NOMAD services. As a tool for early data mobilization, the repository can be used as a data provider, the encyclopedia as a tool for data availability and exploration, and the analytics toolkit, which has evolved into an artificial intelligence toolkit, can be useful in accessing the available data in NOMAD. | Post-doc Paul | internet access | Discord | Not applicable | https://nomad-lab.eu/nomad-lab/ |
| CloWM | The Cloud-based Workflow Manager (CloWM) is an open-source tool hosted by NFDI4Microbiota that integrates scientific workflows (curated, written in Nextflow DSL) with data storage, HPC components and a user-friendly interface. The provision of standardized workflows is beneficial along the whole data life cycle and can help with early data mobilization by creating awareness of the available tools and standard operating procedures. | Post-doc Paul | internet access | support@clowm.bi.denbi.de | Not applicable | https://clowm.bi.denbi.de/ |
| OMERO | OMERO is a cross-platform client-server software platform for visualising, managing and annotating scientific image data. This includes the archival of images and the export to a number of formats. OMERO provides rights and role management. As a tool for early data mobilization, OMERO takes on the role of an electronic lab notebook (ELN) specifically for image data and accompanies scientists from data collection to data publication, supporting FAIR data requirements and reproducibility. | Post-doc Paul | https://omero.readthedocs.io/en/stable/sysadmins/index.html | GitHub | GNU GPL | https://www.openmicroscopy.org/omero/scientists/ |
The overall goal of the NFDI is to increase awareness of the importance of research data and research data management and to establish infrastructure, services and tools that ultimately create workflows and automated processes for researchers within and beyond different scientific disciplines. NOMAD and the PLANTdataHUB work adjacent to the idea of NFDI4Biodiversity’s Research Data Commons by providing infrastructure that combines several aspects from different phases of the data life cycle. Both are already usable for scientists from the respective disciplines. Experiences from the development, initialisation and subsequent community acceptance of these services can be useful in establishing and further developing the Research Data Commons platform within the discipline of biodiversity science.
ELNs are a tool to overhaul the way data is traditionally collected and processed. This entails the development of suitable software and the allocation of storage space on (locally) hosted servers for the collected data that is available long term, secure and maintainable, but also the availability of input devices which, especially in biodiversity sciences, need to be able to withstand fieldwork and laboratory conditions. Currently, ELNs remain individual solutions for local institutions.
These are just a few highlights of services other consortia provide, mainly for their own community. Not mentioned here are the knowledge bases many consortia provide to explain research data management and guide users. The project base4NFDI tries to consolidate many services from the different consortia into single access points but does not yet focus on tools concerning early data mobilization.
State Initiatives, forschungsdaten.info, and the DINI/nestor AG Forschungsdaten
Of the 16 German states, only two do not have a dedicated state initiative working on research data management, support and infrastructure. Only six of these initiatives offer an ELN, often in direct cooperation with a local university where the instance is hosted. Additionally, they provide a variety of services, including certification courses, Research Data Management (RDM) consultations, events, repositories and many more. The visibility of these state initiatives also varies.
forschungsdaten.info is a portal for RDM-related knowledge. It bundles information about the state initiatives, the NFDI consortia, international infrastructures, their tools and services, as well as some basic introductory information regarding RDM. The website is maintained by a national team of RDM specialists.
The UAG Schulung und Fortbildung of the DINI/nestor AG Forschungsdaten provides a comprehensive workshop concept (meant to train RDM trainers), which is modular, expandable and can easily be adapted to teach researchers, whose main focus will not be on teaching RDM, but on using RDM within their research.
Educational resources and miscellaneous services
NFDI4Biodiversity
To support students and early-career scientists who want to learn the basics of RDM, we provide an open educational resource on research data management (a self-study unit, "Selbstlerneinheit") that contains valuable information on all stages of the data life cycle. This is the easiest entry point into RDM, as it provides a basic understanding and serves as a guide for further reading. Additionally, NFDI4Biodiversity supports teachers and students of RDM with video series published on YouTube, with topics ranging from the basics of RDM to the services provided, tutorials on how to use them, and the handling of data in general. The consortium provides GitHub repositories usable as working environments for Jupyter, R and data validation in PANGAEA, and a Zenodo community where slides and other material are published for reuse.
NFDI4Biodiversity provides custom training for biodiversity-related institutions, working groups and projects on the following topics:
- Creating a Data Management Plan (DMP)
- Data annotation
- Data publication
- Data archiving
- Data submission via Data Submission Service
- Metadata standards
- Legal aspects in research data management (see also the Podcast "Rechtsdschungel")
- Naming conventions and taxonomic harmonization
- Legal regulations for handling biodiversity and environmental data
- Introductions to working with our tools and services
- Fundamentals of research data management and data literacy
- General introduction to NFDI4Biodiversity and the National Research Data Infrastructure (NFDI)
We also provide annual Seasonal Schools for PhD students, researchers, data collectors or data center staff members, offering a broad range of basic and advanced knowledge in the management of biodiversity, ecology and environmental data. The intensive courses include input presentations from experts and hands-on exercises and are tailored to the attendees' prior expertise. The Seasonal Schools are designed to create an open, collaborative environment that invites networking and knowledge sharing among participants and can therefore facilitate early data mobilization by furthering the participants' knowledge of available tools and services, workflows and best practices.
- A template has been published on Zenodo and can help to set up similar Seasonal Schools:
  - Röder, J., Fischer, M., Tschink, D., & Brand, O. (2023). Template for the NFDI4Biodiversity & GfÖ Winter School. Zenodo. https://doi.org/10.5281/zenodo.8272156
- For basic RDM skills, a self-learning course is available as LiaScript, PDF and in an ILIAS version at the University of Marburg. It can also be implemented in other learning environments:
  - Fischer, M., Röder, J., Signer, J., Tschink, D., Weibulat, T., & Brand, O. (2023, December 14). NFDI4Biodiversity Self-Study Unit - Research Data Management for Biodiversity Data. Zenodo. https://doi.org/10.5281/zenodo.10377868
  - ILIAS: Research Data Management for Biodiversity Data: https://ilias.uni-marburg.de/ilias.php?baseClass=ilrepositorygui&ref_id=3219917
  - LiaScript: https://liascript.github.io/course/?https://raw.githubusercontent.com/NFDI4Biodiversity/nfdi4biodiversity-sle/main/README.md#1
More educational resources as well as information about events can be found on the NFDI4Biodiversity website and its Knowledge Base. If you want to stay up to date regarding training developments in the life sciences, consider subscribing to the trainings4lifescience mailing list.
Other relevant NFDI Consortia and adjacent Infrastructures
| Service | Description | Persona | Link |
|---|---|---|---|
| NFDI4Microbiota Website | Website | Post-doc Paul | https://nfdi4microbiota.de/ |
| NFDI4Bioimage Website | Website | Post-doc Paul | https://nfdi4bioimage.de/home/ |
| NFDI4Earth OneStop4All | central web-based access point to all NFDI4Earth resources and services | Post-doc Paul | https://www.nfdi4earth.de/2facilitate/onestop4all |
| Materialsammlung | Database providing teaching material for RDM | Data Manager Doro | https://rs.cms.hu-berlin.de/uag_fdm/pages/home.php?login=true |
| DALIA | Database providing links to teaching material for RDM | Data Manager Doro | https://dalia.education/en |
| FAIRagro training content | Publication of teaching materials for agrosystems research | Data Manager Doro | https://zenodo.org/records/11148701 |
| GFBio DMP Service | Online tool to guide through the process of creating a Data Management Plan | Post-doc Paul, Data Manager Doro | https://dmp.gfbio.org/ |