Considerations for starting a FAIR data library

The FAIR principles were defined to enable data producers and publishers to make their data findable, accessible, interoperable and re-usable. They act as guidelines that can be followed in any combination as situations evolve.

Before embarking on a journey towards a FAIR data library, there are a number of questions that need to be considered: Who is needed to implement FAIR within an organisation? What resources are needed to generate a FAIR library?

Who is needed to implement FAIR?

Large teams are not always necessary to FAIRify the data landscape. The minimum number of people required is typically between 6 and 10 – depending on their skills and how well they work together.

  • Stakeholders: The most successful companies tend to be those where stakeholders have business acumen as well as scientific knowledge, allowing them to understand the most effective logistics for that particular organisation.
  • Modellers: Modellers are needed to oversee the data side, including running semantic models. Semantic data models are designed to capture the meaning of the application environment, so that the same information in a database can be viewed in several ways.
  • Data engineers: Data engineers are required to build the infrastructure that underpins a FAIR data library.

Although small teams enable immediate feedback and clarity on how successful the transition is, scaling up is not easy. Larger data-use cases usually become less agile and run into several issues, such as consent and data protection challenges. This then requires more specialist expertise.

FAIR principles should be woven into data processes as seamlessly as possible, without becoming overly time-consuming. Collaboration with data producers should therefore be encouraged from the very beginning of the transition. This allows people to be educated throughout the process, rather than having procedures enforced on them later.

Incentives can be used to support the shift to FAIR data: peer recognition, reward schemes and financial bonuses are all effective options. Data governance policies should be used to bridge the gap between data producers and executive-level individuals, creating a data-centric model that is adopted by all departments.

What resources are needed to implement FAIR?

  • Persistent identifiers
  • Domain-specific descriptors
  • Metadata identifiers
  • Standardized access protocol
  • Authentication and authorization protocols
  • Broadly applicable language
  • Broadly applicable vocabulary
  • Data licence
  • Provenance information

Persistent identifiers

Globally unique persistent identifiers need to be assigned to both data and metadata. A third party is often required to generate these, as this guarantees longevity and ensures that the identifiers are organization-independent: they are maintained even if the project or community is terminated. For example, Digital Object Identifier (DOI) registration agencies collect metadata and assign persistent identifiers to meet an organization’s needs.
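
As a rough illustration of how a persistent identifier is used in practice, the sketch below checks that a string looks like a DOI and turns it into a resolvable link through the public doi.org resolver. The regular expression is a deliberately loose shape check rather than the full DOI syntax, and the two identifiers are made-up examples.

```python
import re

# Loose shape check: "10.", a 4-9 digit registrant prefix, "/", then a suffix.
# Illustrative only; this is not the complete DOI syntax specification.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str) -> str:
    """Return the doi.org resolver URL for a DOI, or raise ValueError if malformed."""
    doi = doi.strip()
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a plausible DOI: {doi!r}")
    return f"https://doi.org/{doi}"


# Hypothetical identifiers: one for the dataset, one for its metadata record.
dataset_doi = "10.1234/example-dataset.v1"
metadata_doi = "10.1234/example-dataset.v1.meta"

print(doi_to_url(dataset_doi))
print(doi_to_url(metadata_doi))
```

Because the link goes through the resolver rather than pointing at a particular server, the identifier can stay the same even if the data moves.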

Domain-specific descriptors

These enable search engines to locate a resource. Defining domain-specific metadata descriptors that optimise findability is a challenge for individual communities. Metadata schemata have therefore been developed, such as the Data Documentation Initiative (DDI), which can manage different stages in the research data lifecycle.

Metadata identifiers

If the resource and its metadata are stored independently, it is crucial that the descriptor contains the identifier of the resource being described. This means that a machine-actionable model is needed to link a resource to its metadata. Technologies such as FAIR Data Point can be used to provide unique identifiers for multiple layers of metadata and a searchable path through descriptors.
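
The linking idea can be sketched in a few lines. In this minimal example, each metadata record carries both its own identifier and the identifier of the resource it describes, so a machine can go from a resource to all of its metadata. The class, field names and identifiers are hypothetical and are not taken from FAIR Data Point or any other tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataRecord:
    identifier: str  # identifier of the metadata record itself
    describes: str   # identifier of the resource being described
    title: str


def build_index(records):
    """Map each resource identifier to the metadata records that describe it."""
    index = {}
    for record in records:
        index.setdefault(record.describes, []).append(record)
    return index


# Made-up records: two describe the same resource, one describes another.
records = [
    MetadataRecord("meta:001", "data:assay-42", "Assay 42 summary"),
    MetadataRecord("meta:002", "data:assay-42", "Assay 42 instrument settings"),
    MetadataRecord("meta:003", "data:cohort-7", "Cohort 7 description"),
]

index = build_index(records)
print([record.identifier for record in index["data:assay-42"]])
```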

Standardized access protocol

To permit broad access, identifiers need to follow a globally accepted schema that is part of a standardized protocol. An example is the Hypertext Transfer Protocol (HTTP).
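
To make this concrete, the sketch below prepares (but does not send) an HTTP GET request for an identifier expressed as a URL, asking for a machine-readable representation through the standard Accept header. The URL is a placeholder, and the choice of "text/turtle" as the requested media type is an assumption for illustration.

```python
from urllib.request import Request


def build_metadata_request(identifier_url: str, media_type: str = "text/turtle") -> Request:
    """Prepare an HTTP GET request that asks for a specific representation."""
    return Request(identifier_url, headers={"Accept": media_type}, method="GET")


# Placeholder identifier; nothing is sent over the network here.
request = build_metadata_request("https://example.org/dataset/42")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Passing the prepared request to `urllib.request.urlopen` would perform the actual retrieval.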

Authentication and authorization protocols

Some digital resources have restrictions or need additional measures to be accessed. Therefore, an authentication and authorization procedure must be specified. The Internet of FAIR Data and Services functions by implementing an Authentication and Authorization Infrastructure (AAI) protocol.
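
The difference between authentication (who is asking) and authorization (what they may access) can be shown with a small sketch. The resource model, group names and return values below are hypothetical and do not represent any particular AAI implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    identifier: str
    restricted: bool = False
    allowed_groups: set = field(default_factory=set)


def check_access(resource, user_groups=None):
    """Return 'ok', 'authentication required' or 'forbidden'."""
    if not resource.restricted:
        return "ok"  # open resources need no credentials
    if user_groups is None:
        return "authentication required"  # caller has not identified themselves
    if resource.allowed_groups & set(user_groups):
        return "ok"  # authenticated and authorized
    return "forbidden"  # authenticated but not authorized


open_data = Resource("data:public-summary")
clinical = Resource("data:clinical-7", restricted=True, allowed_groups={"clinical-team"})

print(check_access(open_data))
print(check_access(clinical))
print(check_access(clinical, ["marketing"]))
print(check_access(clinical, ["clinical-team"]))
```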

Broadly applicable language

Organisations need to consider that the FAIR principles emphasise the ability of data to be re-used by a generic agent, which calls for a broadly applicable language for representing knowledge. The Resource Description Framework (RDF) meets this guideline. Alternatively, other formats that are already widespread within a specific community may be an option.
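
As a small illustration of what such a language looks like, the sketch below writes two statements about a dataset as RDF triples in the line-based N-Triples syntax, using only the standard library. The dataset URL and title are placeholders; the Dublin Core terms namespace and the CC0 URL are real, but choosing them here is an assumption made for the example.

```python
def iri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Quote a plain string literal, escaping backslashes, quotes and newlines."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(triples) -> str:
    """Serialise (subject, predicate, object) terms, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


DCT = "http://purl.org/dc/terms/"  # Dublin Core terms namespace
dataset = iri("https://example.org/dataset/42")  # placeholder identifier

triples = [
    (dataset, iri(DCT + "title"), literal("Assay 42 results")),
    (dataset, iri(DCT + "license"), iri("https://creativecommons.org/publicdomain/zero/1.0/")),
]

print(to_ntriples(triples))
```

A real project would more likely use an RDF library, but the output shows the basic shape: every statement is a subject, a predicate and an object that any generic agent can parse.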

Broadly applicable vocabulary

Terminologies need to be consistent in terms of units of measure, classifications and relationship definitions. BioPortal, a repository of biomedical ontologies, is one source of shared, broadly applicable vocabularies for the life sciences.

Data licence

Digital resources must always include a licence that describes the conditions under which they can be used. One example is CC0, which dedicates a work to the public domain.

Provenance information

This helps to assess whether a resource meets the criteria for its intended use. An example of a provenance descriptor template is the PROV-template, which is used to pre-define the structure of the information to be collected, reducing the burden on data producers.
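
The template idea can be sketched as follows: the structure of the provenance record is fixed up front, and the data producer supplies only the values. This is a simplified dictionary-based illustration, not the actual PROV-template format. The property names are borrowed from the W3C PROV vocabulary, while the "var:" placeholder convention as used here and all the values are assumptions.

```python
# The fixed structure: which provenance properties every record must carry.
PROVENANCE_TEMPLATE = {
    "prov:wasGeneratedBy": "var:activity",
    "prov:wasAttributedTo": "var:agent",
    "prov:generatedAtTime": "var:timestamp",
}


def expand(template: dict, bindings: dict) -> dict:
    """Fill each 'var:' placeholder in the template with the bound value."""
    record = {}
    for prop, value in template.items():
        if value.startswith("var:"):
            name = value[len("var:"):]
            if name not in bindings:
                raise KeyError(f"missing binding for variable {name!r}")
            record[prop] = bindings[name]
        else:
            record[prop] = value  # constants are copied through unchanged
    return record


# Made-up values supplied by a data producer.
record = expand(
    PROVENANCE_TEMPLATE,
    {
        "activity": "ex:sequencing-run-17",
        "agent": "ex:lab-a",
        "timestamp": "2021-03-01T09:30:00Z",
    },
)
print(record)
```

A missing value raises an error straight away, so incomplete provenance is caught when the data is deposited rather than discovered later.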

Tips for starting a FAIR data library

The Association of European Research Libraries (LIBER) is the ‘voice of Europe’s research library community’. LIBER is made up of about 450 libraries and various partnerships with international organisations. The aim of their work for nearly 50 years has been to enable world class research. A factsheet about the FAIR data principles has been produced by LIBER. It includes a section about how to start constructing a FAIR data library. Some of the points are summarised below:

  • Promote FAIR principles to local research and IT staff.
  • Incorporate FAIR principles in data preservation, practices and policies.
  • Curate, enrich, capture and preserve as much research as possible that will help to make data findable, accessible, interoperable and re-usable.
  • Train data librarians on relevant metadata, vocabulary and tools to make data FAIR.
  • Encourage researchers to deposit data with FAIRified archives.
  • Evaluate the organisation’s data management practices against the FAIR principles.

A fundamental and collective understanding of why FAIR principles matter is crucial in the shift towards a data-centric information system. The relevance of FAIR must be demonstrated quantifiably with clear benchmarks, and organisational capabilities need to be highly coordinated to ensure that the same code lists and terminologies can be adopted across organisations. For more information about moving from an application-centric perspective to a data-centric one, see the Driving FAIR in Biopharma report. It includes contributions from FAIR pioneers and highlights their insights about FAIRification efforts within the industry.
