
Overview of the availability of tools and services for early data mobilization


Status quo


In recent years, the availability of biodiversity data has improved considerably, leading to an increasing demand for methods and tools for early data mobilization. The term refers to the rapid and efficient processing and analysis of data to gain knowledge and insights before the data are fully available. Additionally, we include open educational resources that strengthen early-career scientists’ ability to work with data and enable PIs and postdocs to teach data management. NFDI4Biodiversity aims to further improve the availability of tools for early data mobilization and analysis in order to facilitate the creation of FAIR data.

The focus is on early data mobilization, which should ultimately yield data that are collected, created or processed with tools that can be seamlessly integrated into NFDI4Biodiversity’s Research Data Commons (RDC) platform.

The RDC is conceptualised as an expandable, cloud-based research infrastructure that provides scientists, data providers and data consumers with powerful tools for creating FAIR data products and facilitates the collaborative exchange of data and services, both within the German National Research Data Infrastructure (NFDI) and beyond. It is currently at a mid-stage of development and will be made available incrementally.

Therefore, NFDI4Biodiversity supports data providers (i.e., researchers and participating facilities of the consortium that provide data or metadata to be distributed to data users) by:

  1. Creating awareness of existing tools that are better suited to the above goals than spreadsheet software, as well as of tools that enhance spreadsheet software, e.g., by simplifying metadata annotation
  2. Providing custom training and facilitating IT support for tool development, deployment and use via the helpdesk
  3. Enabling direct, problem-specific tool development

The following community demands/needs regarding data mobilization will be addressed:

  1. Gathering, structuring and managing raw data
  2. Workflow realization
  3. Quality control and plausibility assurance
  4. Analysis and modelling of data
  5. Virtual research environments for networking and sharing
  6. Data publication
  7. Long-term archiving

Here, we aim to clarify the status quo by providing an overview of existing tools that fulfil the above criteria and are as generic as possible yet as specific as necessary, so that they are applicable to a wide variety of users from different disciplines while remaining maintainable and interoperable. In addition to the tools, we provide an overview of appropriate training modules.


 Services for Early Data Mobilization from NFDI4Biodiversity partners


During its initial five-year funding period, NFDI4Biodiversity successfully engaged 50 partner institutions, including scientific organizations, museums, natural history societies, and government offices. This collaboration aimed to facilitate the creation and adaptation of research tools for implementation in their respective projects. The following is an overview of tools considered to be especially useful in the early phase of data mobilization within a project’s life cycle.




| Service | Short description | Personas/target groups | Requirements | Communication channels | License | Link |
| BioME | | | | | | |
| RightField | | | | | | |

NFDI4Biodiversity already provides a wide variety of useful tools for early project data mobilization, many of which go beyond the initial stages of the data life cycle. Data provision can be handled by BEXIS2, and RightField opens the door for researchers who have not yet abandoned spreadsheets but want to implement the FAIR principles in their work. The Biodiversity Workbench and BioMe, on the other hand, provide powerful frameworks for managing data from collection until the end of the project, offering functionality at the early stages that becomes relevant when the data are eventually published or shared. However, these tools are not yet incorporated into the RDC that NFDI4Biodiversity aims to establish.

 Services for Early Data Mobilization from related NFDI consortia

| Service | Short description | Personas/target groups | Requirements | Communication channels | License | Link |
| ISA Wizard | The ISA Wizard is a tool developed within use cases of NFDI4Biodiversity and FAIRagro, partly in collaboration with DataPlant. Its primary purpose is to facilitate the creation of datasets structured according to the ISA (Investigation, Study, Assay) data model, which is widely recognized for organizing life science experiments. The ISA Wizard achieves this through an intuitive questionnaire interface, complemented by file upload functionality, allowing users to systematically collect and annotate metadata. One of its key strengths is its configurable questionnaire system, which keeps the tool domain-agnostic. This flexibility allows it to be used across a variety of scientific disciplines without modifying the core software. Domain-specific requirements, such as the Minimum Information About a Plant Phenotyping Experiment (MIAPPE) standard for plant phenotyping, can be seamlessly integrated by adjusting the questionnaire configuration. This approach enables tailored data mobilization workflows while maintaining interoperability and standardization. Upon completion, users can export their curated datasets either in the ARC format, including direct integration into the PLANTdataHUB platform, or as ISA JSON, supporting further data sharing and reuse. | | | | | |
| PLANTDataHub | The NFDI consortium DataPlant focuses on establishing sustainable research data management for the German plant research community by providing digital and in-person services. Based on the Annotated Research Context (ARC) framework, DataPlant provides a central reference implementation called PLANTDataHub, which can be used collaboratively for data storage, management and sharing. It can handle large datasets and is developed with research data management in mind, i.e., it increases the FAIRness of the data by providing structured and transparent data and metadata storage, incorporating data analysis within the platform and conforming to international Research Data Management (RDM) standards for digital objects (e.g., RO-Crate FAIR digital objects). Additionally, ARC is an open-source, community-driven initiative, which keeps the framework adaptable and relevant to the community. As a tool for early data mobilization, PLANTDataHub can be used as a data provider and as FAIR data storage. | | | | | |
| NOMAD Lab | The NFDI consortium FAIRmat specializes in RDM for materials science. One of its services is NOMAD Lab, which combines the functionalities of a data repository and archive, an encyclopedia and an analytics toolkit, divided into the service subcategories NOMAD Oasis and the public NOMAD services. As a tool for early data mobilization, the repository can be used as a data provider, the encyclopedia as a tool for data availability and exploration, and the analytics toolkit, which has evolved into an artificial intelligence toolkit, can be useful for accessing the data available in NOMAD. | | | | | |
| ClOWM | The cloud-based workflow manager is an open-source tool hosted by NFDI4Microbiota that integrates curated scientific workflows (written in Nextflow DSL) with data storage, HPC components and a user-friendly interface. The provision of standardized workflows is beneficial along the whole data life cycle and can help with early data mobilization by creating awareness of the available tools and standard operating procedures. | | | | | |
| OMERO | | | | | | |
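To make the ISA (Investigation, Study, Assay) hierarchy behind the ISA Wizard’s ISA JSON export more concrete, here is a minimal Python sketch of how the three levels nest. It is a simplified illustration, not the ISA Wizard’s export code: the helper functions (`make_investigation`, `make_study`, `make_assay`) and all identifiers, titles and file names are hypothetical. The key names are modelled on the ISA-JSON format, but most required fields are omitted and the output has not been validated against the ISA-JSON schema.

```python
import json

# Hypothetical helpers illustrating the ISA nesting:
# one Investigation contains Studies, and each Study contains Assays.
# Only a few core keys are shown; this is NOT a schema-complete ISA-JSON document.


def make_assay(filename: str, measurement: str) -> dict:
    """Return a minimal assay record with an ontology-style measurement type."""
    return {
        "filename": filename,
        "measurementType": {"annotationValue": measurement},
    }


def make_study(identifier: str, title: str, filename: str, assays: list[dict]) -> dict:
    """Return a minimal study record that groups its assays."""
    return {"identifier": identifier, "title": title, "filename": filename, "assays": assays}


def make_investigation(identifier: str, title: str, studies: list[dict]) -> dict:
    """Return the top-level investigation record that groups its studies."""
    return {"identifier": identifier, "title": title, "studies": studies}


# Illustrative (made-up) example: one investigation, one study, one assay.
investigation = make_investigation(
    "INV-001",
    "Example phenotyping investigation",
    [
        make_study(
            "STU-001",
            "Drought stress trial",
            "s_drought.txt",
            [make_assay("a_growth.txt", "plant height")],
        )
    ],
)

# Serialize the nested structure as JSON, as an export step might.
print(json.dumps(investigation, indent=2))
```

Running the script prints the nested structure as indented JSON. Building the record bottom-up (assays, then studies, then the investigation) mirrors how a questionnaire can collect metadata level by level; checking a real export against the ISA-JSON schema would be a separate step.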

The overall goal of the NFDI is to increase awareness of the importance of research data and research data management and to establish infrastructure, services and tools that ultimately create workflows and automated processes for researchers within and across scientific disciplines. NOMAD and the PLANTDataHub are closely related to the concept of NFDI4Biodiversity’s Research Data Commons, as they provide infrastructure that combines several aspects of different phases of the data life cycle. Both are already usable by scientists from the respective disciplines. Experience from the development, initialisation and subsequent community acceptance of these services can be useful in establishing and further developing the Research Data Commons platform for biodiversity science.

Electronic lab notebooks (ELNs) are a tool to overhaul the way data are traditionally collected and processed. This requires not only suitable software and storage space on (locally) hosted servers that keep the collected data available long term, secure and maintainable, but also input devices that, especially in biodiversity science, can withstand fieldwork and laboratory conditions. Currently, ELNs remain individual solutions for local institutions.

These are just a few highlights of the services other consortia provide, mainly for their own communities. Not covered here are the knowledge bases that many consortia provide to explain research data management and guide users. The Base4NFDI project aims to consolidate many services from the different consortia into single access points but does not yet focus on tools for early data mobilization.

State Initiatives, forschungsdaten.info, and the DINI/nestor AG Forschungsdaten

Of the 16 German states, only two do not have a dedicated state initiative working on research data management, support and infrastructure. Of these state initiatives, only six offer an ELN, often in direct cooperation with a local university where the instance is hosted. In addition, they provide a variety of services, ranging from certification courses and research data management (RDM) consultations to events, repositories and more. The visibility of these state initiatives also varies.

forschungsdaten.info is a portal for RDM-related knowledge. It bundles information about the state initiatives, the NFDI consortia, international infrastructures and their tools and services, as well as basic introductory information on RDM. The website is maintained by a national team of RDM specialists.

The UAG Schulung und Fortbildung (training and continuing education subgroup) of the DINI/nestor AG Forschungsdaten provides a comprehensive, modular and expandable workshop concept for training RDM trainers. It can easily be adapted to teach researchers whose main focus is not teaching RDM but applying it in their own research.

Training Modules and miscellaneous services

NFDI4Biodiversity

To support students and early-career scientists who want to learn the basics of RDM, we provide an open educational resource on research data management (Selbstlerneinheit, a self-study unit) that contains valuable information on all stages of the data life cycle. This is the easiest entry point into RDM, as it provides a basic understanding and serves as a guide for further reading. Additionally, NFDI4Biodiversity supports teachers and students of RDM with video series published on YouTube, with topics ranging from the basics of RDM and the services provided, including tutorials on how to use them, to the handling of data in general. The consortium also provides GitHub repositories usable as working environments for Jupyter, R and data validation in PANGAEA, and a Zenodo community where slides and other materials are published for reuse.

NFDI4Biodiversity provides custom training for biodiversity-related institutions, working groups and projects on the following topics:

  • Creating a Data Management Plan (DMP)
  • Data annotation
  • Data publication
  • Data archiving
  • Data submission via Data Submission Service
  • Metadata standards
  • Legal aspects in research data management
  • Naming conventions and taxonomic harmonization
  • Legal regulations for handling biodiversity and environmental data
  • Introductions to working with our tools and services
  • Fundamentals of research data management and data literacy
  • General introduction to NFDI4Biodiversity and the National Research Data Infrastructure (NFDI)

We also offer yearly Seasonal Schools for PhD students, researchers, data collectors and data center staff, covering a broad range of basic and advanced knowledge in the management of biodiversity, ecology and environmental data. The intensive courses include input presentations from experts and hands-on exercises and are tailored to the attendees’ prior expertise.

The Seasonal Schools are designed to create an open, collaborative environment that invites networking and knowledge sharing among participants, and can therefore facilitate early data mobilization by furthering the participants’ knowledge of available tools and services, workflows and best practices.

All of these teaching and training materials and the information about the events can be found on the NFDI4Biodiversity website and its Knowledgebase.

Other relevant NFDI Consortia and the RDMTraining4NFDI from Base4NFDI


Do you have questions, feedback or need help?

Contact our Helpdesk for direct support.
