The DMA Data Pipelines

The recently published reports “Data Technology Specification and Development Roadmap” and “DMA Blockchain Design” lay the conceptual groundwork for the DMA. Building on them, the first development steps were taken, from the creation of user stories to the deployment of components in cloud environments. Technologies for data sharing, blockchain, cloud deployment and persistent identifier generation had to be evaluated, and their integration with data ingest and management had to be planned.

Data Market Austria Vocabulary

In a horizontal action of the project, a metadata working group was established. Its first task was to define the “Data Market Austria Vocabulary” (DMAV), which provides classes and properties for describing datasets and services accessible in the Data Market Austria (DMA). The DMAV was updated, extended with a controlled vocabulary, and is now available in a stable version 1.0, i.e. it can be used for the first data ingestion tasks. The DMAV is designed to be compatible with other standards such as DCAT-AP and INSPIRE, which allows cooperation with a wide range of companies and data providers.
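As an illustration of what such a DCAT-AP-compatible description might look like, the following sketch expresses a dataset record using DCAT-AP core property names. The record contents and the choice of mandatory fields are hypothetical; the actual DMAV terms may differ.

```python
# Hypothetical sketch: a dataset description using DCAT-AP-style
# property names (dct: = Dublin Core terms, dcat: = W3C DCAT).
# Field values and the mandatory set are illustrative only.
dataset_description = {
    "dct:title": "Air quality measurements Vienna",
    "dct:description": "Hourly NO2 and PM10 readings from city stations.",
    "dct:publisher": "City of Vienna",
    "dct:license": "https://creativecommons.org/licenses/by/4.0/",
    "dcat:keyword": ["air quality", "environment"],
}

# A minimal validity check against an assumed mandatory core:
MANDATORY = ("dct:title", "dct:description", "dct:publisher")
is_valid = all(dataset_description.get(p) for p in MANDATORY)
```

Reusing DCAT-AP property names in this way is what makes records exchangeable with other catalogues that speak the same standard.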

Metadata and data quality play a pivotal role in the Data Market Austria ecosystem. To address these aspects, the DMA platform incorporates services that assess datasets entering the DMA against a set of quality metrics. Metrics in this category include, for example, the completeness of mandatory metadata fields, the understandability of metadata descriptions, and the currentness of a given dataset, i.e. how long the dataset has gone without an update relative to its declared update frequency.
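Two of the named metrics, completeness and currentness, can be sketched in a few lines. The mandatory field list and the decay formula below are illustrative assumptions, not the actual DMA metric definitions.

```python
from datetime import date

# Hypothetical mandatory fields; the real DMA configuration may differ.
MANDATORY_FIELDS = ["title", "description", "publisher", "license"]

def completeness(metadata: dict) -> float:
    """Fraction of mandatory metadata fields that are present and non-empty."""
    filled = sum(1 for f in MANDATORY_FIELDS if metadata.get(f))
    return filled / len(MANDATORY_FIELDS)

def currentness(last_update: date, update_frequency_days: int,
                today: date) -> float:
    """1.0 while the dataset is within its declared update interval,
    decaying linearly towards 0 the further it is overdue (assumed decay)."""
    overdue = (today - last_update).days - update_frequency_days
    if overdue <= 0:
        return 1.0
    return max(0.0, 1.0 - overdue / update_frequency_days)

# Example: a record missing its license scores 3 out of 4 fields.
meta = {"title": "Air quality Vienna", "description": "Hourly NO2",
        "publisher": "City of Vienna", "license": None}
score = completeness(meta)  # -> 0.75
```

Scores like these can be surfaced to data consumers as a simple quality indicator before they commit to a dataset.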

Quality improvement services

Yet detecting potential shortcomings in the metadata or the dataset itself is only half of the story. To support potential data providers in the process of releasing their data, the DMA quality improvement services tackle common dataset issues in an automated way, such as encoding errors or the use of proprietary data formats. In addition, semantic enrichment and the use of linked data offer great potential for creating added value for data consumers and providers alike. For example, semantics can be used to gain a deeper understanding of the context of a given part of a dataset (e.g., a CSV file) and thus provide the information required to fill in missing pieces of information within the dataset.
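The encoding-repair case can be illustrated with a small sketch: try a list of common encodings and report which one succeeded. A production service would likely use statistical encoding detection rather than this fixed order; the function below is an assumption for illustration only.

```python
def decode_tolerant(raw: bytes) -> tuple:
    """Try common encodings in order and return (text, encoding_used).

    A simplified, hypothetical stand-in for an automated
    encoding-repair step in a quality improvement pipeline.
    """
    for enc in ("utf-8", "cp1252", "latin-1"):
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte value, so the loop always returns;
    # this line exists only for defensive completeness.
    raise ValueError("no encoding matched")
```

For example, a CSV exported from a Windows tool as cp1252 would fail the UTF-8 attempt but be recovered on the second try, and the detected encoding could be recorded in the dataset's metadata.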

Prototype

WP5 developed a first prototype of the data ingest and management component, which will serve as the core platform for additional services to be hooked into extensible data processing workflows. The goal is to integrate components for persistent identifiers, blockchain data provenance and membership management, metadata creation and management, metadata enhancement, and metadata quality improvement into a single WP5 data management tool.
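The extensibility idea behind such a workflow can be sketched as a chain of pluggable steps, each taking a metadata record and returning an enriched one. The step names and the PID scheme below are hypothetical, not the actual WP5 component interfaces.

```python
import hashlib
from typing import Callable, Dict, List

# Hypothetical sketch of an extensible ingest workflow: each step is a
# function from record to record, so new services (PID assignment,
# provenance logging, enrichment, ...) can be hooked in without
# changing the pipeline itself.
Step = Callable[[Dict], Dict]

def assign_pid(record: Dict) -> Dict:
    """Assign an illustrative persistent identifier derived from the title."""
    digest = hashlib.sha1(record["title"].encode("utf-8")).hexdigest()[:8]
    record["pid"] = "dma:" + digest
    return record

def record_provenance(record: Dict) -> Dict:
    """Append an ingest event to the record's provenance trail."""
    record.setdefault("provenance", []).append("ingested")
    return record

def run_pipeline(record: Dict, steps: List[Step]) -> Dict:
    for step in steps:
        record = step(record)
    return record

result = run_pipeline({"title": "Air quality Vienna"},
                      [assign_pid, record_provenance])
```

Because each step shares the same signature, the quality assessment and improvement services described above could be added to the list without touching the core platform.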