Cat4KIT:#

A cross-institutional data catalog framework for the FAIRification of environmental research data

DOI: 10.11588/heibooks.1288.c18072

Warning

This page has been automatically generated and has not yet been reviewed by the authors of cat4kit-documentation! Stay tuned for updates and discuss with us at
https://codebase.helmholtz.cloud/cat4kit

Overview:#

Open and reproducible earth system science needs a modern, adaptable Research Data Management (RDM) architecture that makes environmental research data findable, accessible, interoperable, and reusable (FAIR). Datasets that accompany journal publications are commonly published in established repositories such as Pangaea, Zenodo, or RADAR4KIT. Intermediate, day-to-day, or actively used data, however, is usually exchanged through basic cloud storage services and email. As a result, although the FAIR principles call for data to be openly findable and accessible, such data often remains confined to closed, restricted infrastructures and local file systems.

Cat4KIT therefore aims to provide a cross-institutional catalog and RDM framework that makes day-to-day research data FAIR. The framework comprises four modules, each with its own function:

1. Data provider/services#

Facilitating data retrieval from storage systems via well-defined, standardized interfaces.

2. Harvester and Ingester#

Harvesting (meta)data and transforming it into consistent, standardized formats.

3. Catalog services/Exposer#

Making (meta)data publicly accessible through well-defined, standardized catalog services and interfaces.

4. Data portal#

Empowering users to search, filter, and investigate data from decentralized research data infrastructures.
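To make the division of labour between the four modules concrete, the following toy pipeline mimics each stage with a plain Python function. It is only an illustrative sketch: the `Record` type, the function names, and the sample datasets are hypothetical and are not part of the Cat4KIT code base, which relies on real data services and catalog software instead of in-memory lists.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical normalized metadata record (not a Cat4KIT type)."""
    id: str
    title: str
    keywords: list[str]


def provide() -> list[dict]:
    """1. Data provider: stand-in for a standardized data service."""
    return [
        {"name": "precip_2020.nc", "long_name": "Daily precipitation 2020",
         "tags": "climate,model"},
        {"name": "station_42.csv", "long_name": "In-situ temperature, station 42",
         "tags": "observation"},
    ]


def harvest(raw: list[dict]) -> list[Record]:
    """2. Harvester/ingester: map provider-specific fields onto one schema."""
    return [
        Record(id=r["name"], title=r["long_name"], keywords=r["tags"].split(","))
        for r in raw
    ]


def expose(records: list[Record]) -> dict[str, Record]:
    """3. Catalog service: index the normalized records by id."""
    return {r.id: r for r in records}


def search(catalog: dict[str, Record], keyword: str) -> list[str]:
    """4. Data portal: return the ids of records carrying a keyword."""
    return sorted(rid for rid, r in catalog.items() if keyword in r.keywords)


catalog = expose(harvest(provide()))
print(search(catalog, "climate"))
```

Each stage only depends on the output format of the previous one, which is the property that lets individual modules be swapped or extended independently.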

_images/components.png

This modular design keeps the framework versatile: it can be used with many types of research data, from multi-dimensional climate model outputs to high-frequency in-situ observations. The project relies on established open-source solutions and community standards for data interfaces, (meta)data schemes, and catalog services, such as the SpatioTemporal Asset Catalog (STAC). This makes it straightforward to integrate research data into the Cat4KIT framework and to extend the framework to other research data infrastructures.
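As a rough illustration of what a STAC record looks like, the sketch below builds a minimal STAC Item as a plain Python dictionary using only the standard library. The helper name `make_stac_item`, the item id, the bounding box, the asset URL, and the `application/x-netcdf` media type are assumptions chosen for the example; they do not describe how Cat4KIT itself generates its catalog entries.

```python
import json
from datetime import datetime, timezone


def make_stac_item(item_id, bbox, when, href):
    """Build a minimal STAC Item (a GeoJSON Feature) for one data file.

    bbox is (west, south, east, north) in degrees; when must be a
    timezone-aware UTC datetime; href points at the data asset.
    """
    west, south, east, north = bbox
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "bbox": [west, south, east, north],
        "geometry": {
            "type": "Polygon",
            # Closed ring: the first and last positions are identical.
            "coordinates": [[
                [west, south], [east, south], [east, north],
                [west, north], [west, south],
            ]],
        },
        # STAC expects an RFC 3339 timestamp in UTC.
        "properties": {"datetime": when.isoformat().replace("+00:00", "Z")},
        "links": [],
        "assets": {"data": {"href": href, "type": "application/x-netcdf"}},
    }


item = make_stac_item(
    "example-precip-2020",                      # hypothetical id
    (8.0, 48.5, 9.0, 49.5),                     # hypothetical bounding box
    datetime(2020, 1, 1, tzinfo=timezone.utc),
    "https://example.org/data/precip_2020.nc",  # placeholder URL
)
print(json.dumps(item, indent=2))
```

Because the spatial and temporal extent are stored in fixed, well-known fields, any STAC-aware catalog or portal can filter such items without knowing anything about the underlying file format.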

The following figure shows the Cat4KIT components and how they are connected at KIT.

_images/schematic.png

How to use#

Contribution#