Data interoperability in ER-flow

In many scientific areas, improved digital acquisition capabilities combined with the online availability of data have led to a drastic increase in the volume and complexity of digital data. This data is commonly distributed worldwide over the Internet. In addition, there are strong incentives towards the wide dissemination and globalisation of scientific data, such as the growing trend to archive data in open databases freely accessible to researchers and, more generally, to society as a whole, and the need to reproduce scientific results or re-analyse past data. Scientific disciplines thus encounter data interoperability issues related to data publication, data interpretation, and cross-platform data transfers. These challenges may arise from the different data representations in use, different file formats for the same type of data, different data storage and indexing mechanisms, and different data exchange mechanisms. In the context of ER-flow, the focus is specifically on the representation of workflow parameter files and their transfer across Distributed Computing Infrastructures (DCIs).
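As an illustration, consider a DCI-neutral description of a workflow parameter file. The sketch below is hypothetical: the field names, DCI names and URLs are illustrative only and do not reflect the actual ER-flow formats. It records the file format, an integrity checksum, and the replicas of the same data held on different infrastructures.

    import json

    # Hypothetical, DCI-neutral description of a workflow parameter file.
    # All field names, DCI names and URLs below are illustrative only.
    parameter_file = {
        "name": "input_image",
        "media_type": "application/x-nifti",  # declares the file format
        "checksum": {"algo": "sha256", "value": "<hex digest>"},  # checked after transfer
        "replicas": [  # the same data, available on several DCIs
            {"dci": "DCI-A", "url": "gsiftp://se.example.org/data/input_image.nii"},
            {"dci": "DCI-B", "url": "https://repo.example.edu/data/input_image.nii"},
        ],
    }

    # Serialise to a self-describing text format that any DCI can parse.
    with open("input_image.param.json", "w") as f:
        json.dump(parameter_file, f, indent=2)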

What is data interoperability?

Data interoperability covers all aspects of data sharing among different scientists exploiting the same data sources. Ensuring data interoperability becomes a complex problem as soon as data is distributed over different locations and administered by different organisations.

Why is data interoperability important?

Scientific experiments nowadays collect very large volumes of data, typically stored at many different sites. Scientific consortia are often worldwide collaborations with many actors interested in exploiting this data. Furthermore, scientific data is increasingly reused and repurposed in contexts where its value was not originally anticipated.

What is the connection between workflows and data interoperability?

Scientific workflow environments are used to describe and enact scientific experiments. They consume raw data, process it, and generate transformed data at a large scale. They are thus directly impacted by data interoperability problems. Furthermore, the workflow data transformation process changes the representation and the nature of the data, while preserving a link between inputs and outputs. This traceability information is particularly important to ensure the interoperability of the produced data.
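As a minimal sketch of such traceability, a workflow engine could attach a provenance record to every produced file, linking it back to its inputs and to the transformation applied. The record structure below is hypothetical and far simpler than real provenance models; it only illustrates the preserved input/output link.

    from dataclasses import dataclass, field
    from datetime import datetime, timezone

    @dataclass
    class ProvenanceRecord:
        """Links a produced file back to its inputs and the step that created it."""
        output: str        # logical name of the produced file
        inputs: list[str]  # logical names of the consumed files
        step: str          # name of the transformation applied
        produced_at: str = field(
            default_factory=lambda: datetime.now(timezone.utc).isoformat()
        )

    # Example: a format conversion changes the representation of the data,
    # but the record preserves the link between input and output.
    record = ProvenanceRecord(
        output="brain_scan.nii",
        inputs=["brain_scan.dcm"],
        step="dicom_to_nifti",
    )
    print(record)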

What is the connection between DCIs and data interoperability?

Different DCIs typically use different storage formats and different data exchange protocols. Complex scientific workflows executing jointly over several DCIs thus face heterogeneity problems among the infrastructures that deliver the data to be processed or store intermediate results.
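One common way to cope with such heterogeneity is to hide the protocol differences behind a single transfer interface, with one backend per protocol. The sketch below is an assumption about how this could be structured, not a description of the actual ER-flow machinery; globus-url-copy is a real GridFTP client, but its availability depends on the local installation.

    import subprocess
    import urllib.request
    from abc import ABC, abstractmethod
    from urllib.parse import urlparse

    class TransferBackend(ABC):
        """Common interface hiding protocol differences from the workflow engine."""
        @abstractmethod
        def fetch(self, url: str, local_path: str) -> None: ...

    class HttpBackend(TransferBackend):
        def fetch(self, url: str, local_path: str) -> None:
            urllib.request.urlretrieve(url, local_path)

    class GridFtpBackend(TransferBackend):
        def fetch(self, url: str, local_path: str) -> None:
            # Delegates to an external GridFTP client; local_path must be absolute.
            subprocess.run(["globus-url-copy", url, f"file://{local_path}"], check=True)

    # One backend per protocol in use on the DCIs involved in the workflow.
    BACKENDS = {"http": HttpBackend(), "https": HttpBackend(), "gsiftp": GridFtpBackend()}

    def fetch(url: str, local_path: str) -> None:
        BACKENDS[urlparse(url).scheme].fetch(url, local_path)

A workflow engine can then call fetch() uniformly, regardless of which infrastructure holds the data or which protocol it speaks.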


This project has received funding from the European Union's Seventh Framework Programme for research, technological development and demonstration under grant agreement no 312579.