The Data Collection Service is a data hub that stores and retrieves JSON files so that different services can communicate and synchronize data. It provides a REST API for storing and retrieving JSON data. Each service that uses the Data Collection Service receives its own secret key for authentication. For example, the GFBio Search (Search and Harvesting-Infrastructure) and the Visualization, Analysis and Transformation (VAT) system use the Data Collection Service to exchange data: it mediates the display in the VAT of data that was previously selected in the GFBio Data Portal.

The primary target audience is RDC developers and service providers who want to connect their services to the RDC.

Status: PRODUCTIVE

Weblink: collections.gfbio.org

Target group: service provider

Keywords: data exchange, JSON, API

RDC Integration: integrated or connected

Product owner: GFBio e.V.


RDC Integration

The GFBio Data Collection Service is currently used to receive data selected in the GFBio Data Search and to provide it to the VAT system (Visualization, Analysis and Transformation). The user selects the data entries in the GFBio Data Search that they want to visualize and analyze. The list of these entries is then transferred to the Data Collection Service and saved there. The user can then open this list in the VAT system and visualize the data.
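The handoff described above can be pictured with a toy in-memory model. This is only a sketch of the flow, not the real service: the class InMemoryCollectionHub, its method names, the origin name and the sample entries are all hypothetical and exist purely for illustration.

```python
import uuid
from datetime import datetime, timezone


class InMemoryCollectionHub:
    """Toy stand-in for the Data Collection Service (illustration only)."""

    def __init__(self):
        self._collections = []

    def store(self, origin, external_user_id, entries):
        """Save a list of entries for a user and return the new collection id."""
        collection = {
            "id": str(uuid.uuid4()),
            "created": datetime.now(timezone.utc).isoformat(),
            "origin": origin,
            "external_user_id": external_user_id,
            "set": entries,
        }
        self._collections.append(collection)
        return collection["id"]

    def for_user(self, external_user_id):
        """Return every collection that is mapped to the given user id."""
        return [
            c for c in self._collections
            if c["external_user_id"] == external_user_id
        ]


hub = InMemoryCollectionHub()

# Step 1: the search side saves the entries the user selected.
selected = [{"title": "example dataset A"}, {"title": "example dataset B"}]
collection_id = hub.store("example-search", "user-123", selected)

# Step 2: the visualization side looks up the same user's collections.
for collection in hub.for_user("user-123"):
    print(collection["id"], len(collection["set"]), "entries")
```

In the real setup both steps are REST calls made by the respective backend, and the two services only need to agree on the user id and on the shape of the entries.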

Getting started

You can use the Data Collection Service as an independent micro-service to exchange data between multiple services. The Data Collection Service is agnostic to the contents of the data; the only requirement is that the data is a valid JSON document. By default, the service currently imposes no structural limits on the content of a collection and requires no administrative metadata for it. For each collection, however, it saves the creation date and the service that created it, and it assigns an id (GUID). Each collection also needs to be mapped to a user id. This metadata is included with every collection in a response as "created" (date), "origin" and "service" (service name and id), "id" and "external_user_id"; the collection itself is transferred in the field "set". The service is agnostic to the type and source of the user id. Currently, service providers need to agree on the format of the data and of the user id independently of the Data Collection Service, and they have to rely on the origin of a collection to deduce the format of its external user id.
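As a minimal sketch of how a consuming service might read such a response, the following example parses a collection and looks up the user id format from its origin. The field names come from the description above; the value types, the sample values and the origin-to-format table are assumptions made for illustration and are not part of the service.

```python
import json
from dataclasses import dataclass
from typing import Any

# Field names as listed in the description above.
REQUIRED_FIELDS = ("id", "created", "origin", "service", "external_user_id", "set")

# Hypothetical agreement between service providers: which user id format
# each origin uses. The Data Collection Service does not store this.
USER_ID_FORMAT_BY_ORIGIN = {
    "example-search": "numeric portal id",
}


@dataclass(frozen=True)
class Collection:
    id: str
    created: str           # kept as a string; the exact date format is not assumed
    origin: str            # name of the service that created the collection
    service: Any           # id of that service; its type is an assumption
    external_user_id: str
    payload: Any           # content of the "set" field, any valid JSON value


def parse_collection(raw: str) -> Collection:
    """Parse one collection response and check that the metadata is present."""
    doc = json.loads(raw)
    if not isinstance(doc, dict):
        raise ValueError("expected a JSON object")
    missing = [name for name in REQUIRED_FIELDS if name not in doc]
    if missing:
        raise ValueError(f"collection is missing fields: {missing}")
    return Collection(
        id=str(doc["id"]),
        created=str(doc["created"]),
        origin=str(doc["origin"]),
        service=doc["service"],
        external_user_id=str(doc["external_user_id"]),
        payload=doc["set"],
    )


def user_id_format(collection: Collection) -> str:
    """Deduce the user id format from the origin, as agreed between providers."""
    return USER_ID_FORMAT_BY_ORIGIN.get(collection.origin, "unknown")


# Invented sample response, for illustration only.
SAMPLE_RESPONSE = json.dumps({
    "id": "3f2b8c1e-5d4a-4e6f-9a7b-0c1d2e3f4a5b",
    "created": "2024-01-15T10:30:00+00:00",
    "origin": "example-search",
    "service": 1,
    "external_user_id": "42",
    "set": [{"title": "example dataset A"}],
})

collection = parse_collection(SAMPLE_RESPONSE)
print(collection.origin, user_id_format(collection))
```

Keeping the origin-to-format table in the consuming service reflects the point made above: the hub does not interpret the user id, so that knowledge has to live with the service providers.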

User Guide

The Data Collection Service is a technical service intended for exchanging data in JSON format between user-facing services in the RDC. As such, it currently offers only a REST API and no graphical or command-line user interface. The primary target audience is RDC developers and service providers. Please refer to the API Documentation for details on usage.

Token-based Authentication

For another tool to access or create data via the REST API, it must first be registered as a service in the backend. These registered services are also used for permission handling. A service can then obtain one or more auth tokens, each consisting of a UUID. The tokens need to be handed over via a secure channel. To authenticate, the requesting service includes its token as a header in the format "Authentication: Token {{auth_token}}".

To ensure that a token stays secret, requests must be sent only from the backend server and never directly from the client.
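A backend could attach the token roughly as in the following sketch, which only builds the request object and does not send it. The header name and value format are copied from the description above and the base URL from the weblink; the endpoint path, the request body shape and the environment variable name are hypothetical, so please check the API Documentation for the actual values.

```python
import json
import os
import urllib.request

BASE_URL = "https://collections.gfbio.org"   # from the weblink above
COLLECTIONS_PATH = "/api/collections/"       # hypothetical path, see API Documentation


def build_request(token, path=COLLECTIONS_PATH, payload=None):
    """Build (but do not send) an authenticated request to the service."""
    headers = {
        # Header name and format as stated in the text above.
        "Authentication": f"Token {token}",
        "Accept": "application/json",
    }
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    method = "POST" if data is not None else "GET"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


# The token is read from server-side configuration and never shipped to a
# browser or other client. The variable name is a hypothetical example.
token = os.environ.get("COLLECTIONS_AUTH_TOKEN", "00000000-0000-0000-0000-000000000000")

# Hypothetical body for creating a collection.
request = build_request(token, payload={"external_user_id": "42", "set": []})
print(request.get_method(), request.full_url)
```

Sending the request, for example with urllib.request.urlopen(request), would happen in the same backend process so that the token never reaches the client.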

To add your service as a provider, you need to request access and have it granted by an admin (send a request to the support email address referenced below).

Deployment Guide

Please refer to the "developer guide" section on GitHub if you plan to deploy your own instance of this service.

References

Publications