This blog post is the third in a series of cases on semantics (for the first one, see “Delivering semantics”; the second one is “Collibra applied at SCA Packaging”). For a more detailed description of this case, see the recently accepted workshop paper: “Business Semantics Management supports government innovation information portal” by Geert Van Grootel, Peter Spyns, Stijn Christiaens and Brigitte Jörg, published at the 2009 ORM workshop, Vilamoura, Portugal.
Introduction
As I mentioned in an earlier blog post on CERIF, the aim of the Common European Research Information Format (CERIF) is to facilitate information sharing among research communities. CERIF is designed for the exchange of skills and knowledge, and brings both together. It was set up in the early nineties and is under the official management (i.e., as authorized by the European Commission) of the euroCRIS organization.
In Belgium, the Flemish department of Economy, Science and Innovation (EWI) launched the Flanders Research Information Space Programme (FRIS)*, a name that refers both to
- the virtual environment of research information;
- the programme that is being set up in order to create this research information space.
A first realisation is the FRIS research portal that exhibits current research information on projects, researchers and organisations of the Flemish universities. Another example of such a portal is that of the US government.

Figure: Overview of the FRIS
A key feature is that data can be collected immediately at the point of creation in the operational processes of the data providers (e.g., universities, funding bodies, …). As such, the data are up-to-date and expected to be more accurate than second-hand information entered elsewhere by people with little or no connection to the research. Parallel data gathering processes can also be eliminated, which removes a lot of administrative work.
* A play on words referring to the innovative and fresh (‘fris’ = ‘fresh’ in Dutch) approach.
Technicalities
CERIF is the standard of choice for the information interchange between the systems that will be part of FRIS (see the euroCRIS website for tutorials). It is centred on a limited set of basic concepts:
- the business objects of the research and innovation domain: project, person, organisational unit, publication, event, etc.;
- the relations through time that exist between these objects;
- support for multilingual text attributes;
- a separated semantics storage layer.
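The four concepts above can be illustrated with a minimal Python sketch. All class and field names here (`BaseObject`, `Link`, `Classification`, and so on) are hypothetical and chosen for readability; they are not the names used in the CERIF schema, and the sample data are invented.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, Optional


@dataclass(frozen=True)
class Classification:
    """An entry in the separate semantics layer, e.g. a role (hypothetical)."""
    scheme: str
    term: str


@dataclass
class BaseObject:
    """A business object such as a project, person or organisational unit."""
    identifier: str
    kind: str
    # Multilingual text attribute: language code -> text.
    title: Dict[str, str] = field(default_factory=dict)


@dataclass
class Link:
    """A relation through time between two objects, typed by a classification."""
    source: BaseObject
    target: BaseObject
    role: Classification
    start: date
    end: Optional[date] = None  # None means the relation is still open

    def active_on(self, day: date) -> bool:
        return self.start <= day and (self.end is None or day <= self.end)


# Invented sample data, for illustration only.
project = BaseObject("proj-1", "project",
                     {"en": "Semantic portal", "nl": "Semantisch portaal"})
person = BaseObject("pers-1", "person", {"en": "A. Researcher"})
coordinator = Classification("example-roles", "coordinator")
link = Link(person, project, coordinator, date(2008, 1, 1), date(2009, 12, 31))
```

Keeping the role in a separate `Classification` object, rather than hard-coding it in the link, mirrors the idea of a semantics layer that can evolve independently of the objects and relations.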
The presence of formalised semantic links makes it possible to present information about researchers in their research context: projects, organisations, and publications can be linked according to the CERIF semantics. Next to CERIF as the exchange format, there is a variety of access protocols at the Current Research Information Systems (CRIS) of the data providers. Many of these systems follow the standards of the Open Archives Initiative, which develops and promotes interoperability standards that aim to facilitate the efficient dissemination of content. For example, its OAI-PMH protocol specifies how metadata can be harvested from a repository. The systems vary in their implementation: some are built in-house, others are product-based (e.g., Pure from Atira).
The core of the metadata in the repositories is frequently based on Dublin Core (DC), which is not always a trivial fit given DC’s original narrow scope of library cards. Another format we encountered is the Metadata Object Description Schema (MODS). The use of these niche formats increases integration complexity, as the fine-grained information (elegantly captured in CERIF) has to be modelled in such a narrow metadata straitjacket.
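To make the harvesting step concrete, here is a minimal Python sketch of an OAI-PMH `ListRecords` request and of reading titles from a response. The base URL and the sample response are invented for illustration, and the helper names `list_records_url` and `extract_titles` are hypothetical; only the verb, the `metadataPrefix` and `resumptionToken` arguments and the XML namespaces come from the OAI-PMH and Dublin Core specifications.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace of Dublin Core elements inside an oai_dc record.
NS = {"dc": "http://purl.org/dc/elements/1.1/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for a repository (hypothetical helper)."""
    if resumption_token:
        # When continuing a partial list, the token replaces the other arguments.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def extract_titles(xml_text):
    """Return the text of every dc:title element found in a response."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.findall(".//dc:title", NS)]


# Invented sample response, trimmed to the parts used above.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example project</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("https://cris.example.org/oai")
titles = extract_titles(SAMPLE)
```

A real harvester would fetch `url`, read the `resumptionToken` from each response and repeat the request until no token is returned; that loop is left out here to keep the sketch free of network access.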
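A small Python sketch shows what is lost when fine-grained relations are squeezed into a flat record. The function `to_dublin_core` and the sample participations are hypothetical; `title` and `contributor` are genuine Dublin Core element names, but the mapping itself is only one possible choice and not the one used at EWI.

```python
def to_dublin_core(title, participations):
    """Flatten (person, role, start_year, end_year) tuples into simple DC elements."""
    record = {"title": [title], "contributor": []}
    for person, role, start_year, end_year in participations:
        # Simple Dublin Core has no slot for the role or the period,
        # so only the name survives the mapping.
        record["contributor"].append(person)
    return record


# Invented sample data, for illustration only.
participations = [
    ("A. Researcher", "coordinator", 2008, 2009),
    ("B. Colleague", "member", 2009, None),
]
flat = to_dublin_core("Semantic portal", participations)
```

After the mapping, both people appear as plain contributors; who coordinated the project, and when, can no longer be recovered from the record.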
Business Semantics
Traditionally, CERIF has been modelled using Entity-Relationship (E-R) modelling techniques. This caused some difficulties in communication with domain experts, as the learning curve to understand the E-R model and translate it back to their conceptual knowledge is quite steep. At the same time, CERIF experts at EWI have struggled for quite some time with how to express and adequately explain the conceptual models underlying the FRIS application to the other stakeholders involved (mainly non-technical persons). Domain experts and stakeholders should not be bothered with how to think in a (new) formal language: adequately capturing and organising domain knowledge is demanding enough by itself and mainly happens via natural language.
To overcome this problem, EWI has decided to express the business facts of the domain using Business Semantics Management (BSM). The current scope of the work at EWI is Semantic Reconciliation, with the following activities (details on Semantic Application will follow in a later post):
- Scope: defining the borders of the current iteration of BSM, which (amongst other things) helps in grounding discussions. One can always refer back to the source documents (or other boundaries) to bring a difficult (and sometimes philosophizing) discussion back on track. Given CERIF’s level of detail, we used the core entities as starting point for the different iterations.
- Create: generate fact types from the collected sources in the scoping activity. The focus of the activity should be on getting all the facts, rather than discussing them. Convergence will be tackled later on.
- Refine: clean the collected facts by following some simple rules. For instance, decide where typing is relevant by determining which concepts share the same kinds of relationships, or split a semantic pattern into smaller, more reusable patterns (e.g., a Person pattern, an Address pattern).
- Articulate: create informal meaning descriptions as extra documentation, to serve as anchoring points when stakeholders use different terms for the same concepts.
- Unify: collect the input from the various stakeholders and combine it into agreed-upon, shared semantic patterns.
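The activities above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not how the Collibra Studio works: the `FactType` class, the `unify` function, the sample facts and the glossary text are all invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FactType:
    """A binary fact type verbalised in natural language (hypothetical model)."""
    head: str
    role: str
    tail: str

    def verbalise(self):
        return f"{self.head} {self.role} {self.tail}"


def unify(facts, synonyms):
    """Rewrite terms to the agreed-upon ones and drop duplicate fact types."""
    unified = []
    for fact in facts:
        agreed = FactType(synonyms.get(fact.head, fact.head),
                          fact.role,
                          synonyms.get(fact.tail, fact.tail))
        if agreed not in unified:
            unified.append(agreed)
    return unified


# Create: collect facts from the scoped sources without discussing them yet.
collected = [
    FactType("Researcher", "works on", "Project"),
    FactType("Person", "works on", "Project"),
    FactType("Project", "is funded by", "Organisational Unit"),
]

# Articulate: informal meaning descriptions act as anchoring points.
glossary = {"Person": "A human being involved in research activities."}

# Unify: stakeholders agree that 'Researcher' and 'Person' denote the same concept.
shared = unify(collected, {"Researcher": "Person"})
```

Here `shared` holds two fact types, because the two "works on" facts collapse into one once the stakeholders agree on a single term.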
The Collibra Studio (table view depicted below) supports many of the activities described above, thereby further assisting the domain experts in capturing their business semantics. I give a brief sampler, and refer to the Collibra Studio documentation (or its download) for more details:
- a fact editor that allows the domain expert to simply key in the facts in natural language;
- a visual editor providing a graphical way of presenting and browsing through the collection of facts;
- a concept editor with a built-in browser for searching the web for already existing meaning descriptions.

Figure: CERIF semantics
What’s next?
The standard conceptual schema (i.e., the CERIF model) is not enough to guarantee flawless interoperability and information exchange. Local agents should be able to function autonomously while still being able to exchange “meaningful” messages. To achieve this interoperability with local autonomy, any confusion about which data belong to which category, or are associated with which label, should be removed. Agreement is also necessary on what is meant by the names of entities and relationships. Most importantly, interoperability requires a precise and formal definition of the intended semantics of an ontology. This presupposes not only a definition in natural language, but also a formalisation that imposes additional restrictions on the relationships between entities.
The CERIF standard is, by its nature and status as a standard, an ideal candidate to be “ontologised”. For reasons of data quality and integrity, as few ambiguities as possible should occur concerning the meaning and use of domain terminology. This requirement holds in particular in a networked configuration consisting of a limited number of data providers, a central portal and an unlimited number of potential data users (human and/or artificial) from various types of organisations (see Fig. 1). Business semantics are meant for exactly this purpose.
In the EWI case, our approach has already resulted in:
- a set of elegant semantic patterns that clearly capture information in the research domain;
- detailed analysis of how entities are related in the research domain;
- a sustainable integration between CERIF and a wide variety of differently interpreted niche formats;
- a bridge between the world-as-is (i.e., with all its legacy in place) and the vision of the FRIS programme (which reminds us of what the Semantic Web has in mind);
- a way of overcoming the gap between domain experts and technical people.
Stay tuned for more posts in the case series…
