Object model and RDF ontology

This document describes in detail Survol's internal data model, how it can be mapped to RDF, and how this data can be imported into and used by other tools.

It starts with a short reminder of the architecture, especially CIM.

COMMON INFORMATION MODEL

Common Information Model (CIM) is an open standard that defines how managed elements in an IT environment are represented as a common set of objects and relationships between them.

It is implemented by WMI on the Windows operating system, and by WBEM implementations on all platforms.

Survol provides its own very lightweight implementation:


  • Written in pure Python, very light and portable.
  • Providers are much easier to write: they are just Python scripts. A subset of our ontology is 100% compatible with CIM, and it adds many classes describing software objects, both proprietary and open-source.
  • It is deliberately restricted in scope, because it does not have to be exposed as a general-purpose CIM server.

RDF: Resource Description Framework

Resource Description Framework (RDF) is a World Wide Web Consortium (W3C) specification designed as a metadata data model. It is based on the idea of making statements about resources in expressions of the form subject-predicate-object, known as triples. RDF is primarily used to provide information or metadata about data of any type, possibly present on multiple machines. An RDF ontology describes the concepts and relationships representing an area of concern.

RDF is the reference format for exporting Semantic Web data to software such as Protégé or Apache Jena, or to graph databases such as Neo4j, GraphDB, TigerGraph, etc.

Mapping the CIM model into an ontology

Because RDF can describe any type of information in a minimally structured way, it is perfectly suited to represent Survol's internal data, which exposes information from multiple sources, about objects of multiple types, hosted by different machines. Therefore, an RDF ontology is needed to describe CIM classes and properties, and to export this data to external tools.

As CIM and RDF are well-established standards, there have been several projects to create this mapping, some more ambitious than others:

They all succeed in unambiguously defining a one-to-one mapping between CIM and RDF properties and objects, which is all that Survol needs. Although several ontology languages exist, RDF Schema (RDFS) is the simplest and most common one, and it is more appropriate than OWL because Survol does not constrain the data created by its very diverse data providers. Survol data providers are simple Python scripts which can create a very rich graph of data on all sorts of object classes and relationships. These Python scripts, which are typically customized for a specific purpose, are trusted a priori.

CIM presents a view of logical and physical objects, using an object-oriented construct called a class. A CIM class can include properties to describe data and methods to describe behavior. Each instance of a class is uniquely identified by a subset of these properties (its keys), but it can optionally have more properties. CIM also defines associations between objects.


Examples:

  • A CIM_DataFile instance is in practice defined by its name (or its inode on Unix-like operating systems), and it has other attributes such as the size, the protection mask, etc.
  • A CIM_Process instance is uniquely defined by its process identifier, but it also has a user, a priority, etc.
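The distinction between identifying (key) properties and other attributes can be sketched in a few lines of Python. This is a minimal illustration under assumptions: the class `CimInstance`, the `ClassName.key=value` moniker format, and the reduction of `CIM_Process` and `CIM_DataFile` to a single key each (`Handle`, `Name`) are hypothetical simplifications, not Survol's actual internal representation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CimInstance:
    """Hypothetical, simplified CIM-like instance: only key properties define identity."""

    class_name: str
    keys: tuple  # sorted (name, value) pairs which uniquely identify the instance
    # Other attributes describe the object but take no part in equality or hashing.
    attributes: dict = field(default_factory=dict, compare=False, hash=False)

    @classmethod
    def create(cls, class_name, keys, **attributes):
        return cls(class_name, tuple(sorted(keys.items())), attributes)

    def moniker(self):
        # Hypothetical moniker format: "ClassName.key1=value1,key2=value2"
        pairs = ",".join(f"{name}={value}" for name, value in self.keys)
        return f"{self.class_name}.{pairs}"


p1 = CimInstance.create("CIM_Process", {"Handle": 1234}, User="alice", Priority=8)
p2 = CimInstance.create("CIM_Process", {"Handle": 1234}, Priority=10)
f1 = CimInstance.create("CIM_DataFile", {"Name": "/etc/hosts"}, Owner="root")
print(p1.moniker())  # CIM_Process.Handle=1234
print(p1 == p2)      # True: same key, different non-key attributes
```

Because equality and hashing only look at the class name and the keys, two providers reporting different attributes of the same process still refer to one and the same object.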


The RDF ontology and data model are created on the fly, by examining the attributes of CIM objects and associations, and creating RDFS triples in which property and association definitions are mapped to predicates. This ensures that no information is lost, without enforcing any structural constraint on the data providers.
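This on-the-fly mapping can be sketched in pure Python, producing N-Triples text. It is a minimal sketch under stated assumptions: the namespace `http://example.org/survol#`, the input format (class name, subject URI, property dictionary), the `Ref` marker for associations and the function names are all hypothetical; only the RDF and RDFS vocabulary URIs are standard. Classes and properties are declared only when some instance actually uses them.

```python
from dataclasses import dataclass

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
SURVOL = "http://example.org/survol#"  # hypothetical ontology namespace


@dataclass(frozen=True)
class Ref:
    """Marks a property value as an association, i.e. a link to another object."""
    target: str


def iri(value):
    return f"<{value}>"


def literal(value):
    # Plain N-Triples literal with the mandatory escapes.
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return '"' + text.replace("\n", "\\n").replace("\r", "\\r") + '"'


def build_rdfs_triples(instances):
    """instances: iterable of (class_name, subject_iri, {property_name: value or Ref})."""
    triples = set()
    for class_name, subject, properties in instances:
        class_iri = iri(SURVOL + class_name)
        # A class is declared only because at least one instance uses it.
        triples.add((class_iri, iri(RDF + "type"), iri(RDFS + "Class")))
        triples.add((iri(subject), iri(RDF + "type"), class_iri))
        for name, value in properties.items():
            prop_iri = iri(SURVOL + name)
            # Every attribute or association name becomes a predicate.
            triples.add((prop_iri, iri(RDF + "type"), iri(RDF + "Property")))
            obj = iri(value.target) if isinstance(value, Ref) else literal(value)
            triples.add((iri(subject), prop_iri, obj))
    return triples


def to_ntriples(triples):
    return "".join(f"{s} {p} {o} .\n" for s, p, o in sorted(triples))


data = [
    ("CIM_Process", "http://example.org/objects/CIM_Process.Handle=1234",
     {"Handle": 1234, "User": "alice",
      "ParentProcess": Ref("http://example.org/objects/CIM_Process.Handle=1")}),
]
print(to_ntriples(build_rdfs_triples(data)))
```

No schema is checked: any property name a provider emits simply becomes a new predicate, which is how the mapping stays lossless without constraining providers.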

IMPORTING RDF DATA INTO PROTEGE

Protégé is a free, open-source ontology editor and knowledge management system. It provides a graphical user interface to define ontologies, and it can import RDF ontologies from a text file, or from the Web by specifying the http:// URL of the ontology document. This section demonstrates how this can be done with any Survol RDF endpoint.


Protégé: Enter URL

The first step is to open the Protégé "File" menu, which contains the options for opening and saving ontologies. One of these options, "Open from URL", expects an HTTP link. Survol uses the RDFS ontology language, as it is far more flexible than OWL and does not enforce any constraint on the Python provider scripts.

Each provider script is specialized for a specific set of data describing the state, structure or architecture of a running machine: for example, the files in a specific directory, the TCP/IP sockets of a process, or the DLLs (shared libraries) linked to an executable. Survol comes with many different providers, but there is no limit on the number and types of provider scripts, and they are very easy to create.
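To give an idea of how small a provider can be, here is a hypothetical provider describing the regular files of a directory, written with the Python standard library only. Survol's real provider scripts rely on Survol's own helper modules, which are not reproduced here; the function `provide_directory`, the `example.org` URI scheme and the predicate names `contains` and `FileSize` are assumptions made for illustration.

```python
import os
from urllib.parse import quote

OBJECTS = "http://example.org/objects/"  # hypothetical object URI prefix
PRED = "http://example.org/survol#"      # hypothetical predicate namespace


def object_uri(class_name, name):
    # Hypothetical URI scheme: class name plus the key property "Name".
    return f"{OBJECTS}{class_name}.Name={quote(name, safe='/')}"


def provide_directory(directory):
    """Hypothetical provider: triples describing the regular files of one directory."""
    directory = os.path.abspath(directory)
    dir_uri = object_uri("CIM_Directory", directory)
    triples = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.is_file():
                continue  # sub-directories, sockets etc. are left to other providers
            file_uri = object_uri("CIM_DataFile", entry.path)
            triples.append((dir_uri, PRED + "contains", file_uri))
            triples.append((file_uri, PRED + "FileSize", entry.stat().st_size))
    return triples


if __name__ == "__main__":
    for triple in provide_directory("."):
        print(triple)
```

The provider only returns triples; converting them to RDF, HTML or another output format is a separate concern.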

When Protégé loads an RDF URL from Survol, Survol executes a provider script which dynamically creates a triplestore. The loaded content is therefore a snapshot of the data, and it may differ each time the URL is loaded.


Entering an RDF URL

The HTTP link to enter is an ontology URL defined by Survol. Any Survol URL can provide RDF triples: RDF is one of the many output formats Survol can automatically convert its data to. A simple hint is to append the CGI key-value pair "&mode=rdf" to any URL.
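Appending "&mode=rdf" literally assumes that the URL already has a query string. A slightly more robust sketch, using only `urllib.parse`, handles URLs with or without a query string and replaces any existing `mode` value; the function name `to_rdf_url` and the example Survol URL are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def to_rdf_url(url):
    """Return the same Survol URL, asking for RDF output with the CGI pair mode=rdf."""
    parts = urlsplit(url)
    # Keep the other parameters verbatim (no re-encoding), drop any previous "mode".
    kept = [pair for pair in parts.query.split("&") if pair and not pair.startswith("mode=")]
    kept.append("mode=rdf")
    return urlunsplit(parts._replace(query="&".join(kept)))


print(to_rdf_url("http://localhost:8000/survol/entity.py?xid=CIM_Process.Handle=1234&mode=html"))
# http://localhost:8000/survol/entity.py?xid=CIM_Process.Handle=1234&mode=rdf
```

The other parameters are kept as raw strings rather than parsed and re-encoded, so the rest of the URL is passed to Survol unchanged.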

Survol comes with its own model describing software (and also hardware) objects. This model is based on CIM, a de facto standard that defines how managed elements in an IT environment are represented as a common set of objects and relationships between them. There is no standard mapping of CIM into a Semantic Web ontology. However, translating it into RDFS is straightforward: both models manipulate classes and properties, with no added constraints. Survol does this translation on the fly, and represents only the CIM classes which are instantiated in the RDF triplestore. The reason for this restriction is the huge number of CIM classes (several hundred), which are not needed if they have no instances. On top of CIM classes, Survol models other objects which do not have an official CIM definition yet. It is very simple to create a new class in the Survol model, which is handy when adding objects from niche software or not-yet-widely-known applications. Survol is therefore well suited to describing an application or an IT system based on very heterogeneous software objects.



Ontology metrics

Once the data are loaded, they are visible in the "Ontology metrics" view, which displays various counts for the axioms in the active ontology and its imports closure. For example, these counts can be the combined logical and non-logical axiom count, the number of logical axioms, and the number of classes, object properties, data properties and individuals in the signature of the imports closure of the active ontology.
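Similar figures can be approximated directly on the loaded triples, which is a quick way to sanity-check what a Survol URL returned. This is a rough sketch, not Protégé's algorithm (Protégé counts OWL axioms over the imports closure): it assumes triples are (subject, predicate, object) tuples of full URI strings, and treats anything typed with a class other than `rdfs:Class` or `rdf:Property` as an individual. The function name `ontology_metrics` is hypothetical.

```python
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDF_PROPERTY = "http://www.w3.org/1999/02/22-rdf-syntax-ns#Property"
RDFS_CLASS = "http://www.w3.org/2000/01/rdf-schema#Class"


def ontology_metrics(triples):
    """Rough RDFS counts: triples, classes, properties, individuals, instances per class."""
    unique = set(triples)  # an RDF graph is a set: duplicates count once
    classes, properties, individuals = set(), set(), set()
    per_class = Counter()
    for s, p, o in unique:
        if p != RDF_TYPE:
            continue
        if o == RDFS_CLASS:
            classes.add(s)
        elif o == RDF_PROPERTY:
            properties.add(s)
        else:
            # Typed by a domain class such as CIM_Process: an individual.
            individuals.add(s)
            per_class[o] += 1
    return {
        "triples": len(unique),
        "classes": len(classes),
        "properties": len(properties),
        "individuals": len(individuals),
        "instances_per_class": dict(per_class),
    }
```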

The output of several scripts can be merged, possibly from different Survol agents. Their classes, instances and properties are transparently merged as if they came from a single URL.
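Merging works because an RDF graph is a set of triples: as long as two agents produce the same URI for the same object, their statements accumulate on a single node. A minimal sketch, with plain tuples standing in for the output of two hypothetical provider scripts and hypothetical URIs and predicates:

```python
def merge_triplestores(*stores):
    """Union of several triplestores: identical triples from different sources collapse."""
    merged = set()
    for store in stores:
        merged.update(store)
    return merged


def describe(triples, subject):
    """All (predicate, object) pairs of one subject, whichever script produced them."""
    return sorted((p, o) for s, p, o in triples if s == subject)


# Hypothetical outputs of two provider scripts, possibly run by two different agents.
PROC = "http://example.org/objects/CIM_Process.Handle=1234"
files_script = {
    (PROC, "http://example.org/survol#opens", "http://example.org/objects/CIM_DataFile.Name=/tmp/app.log"),
    (PROC, "http://example.org/survol#User", "alice"),
}
sockets_script = {
    (PROC, "http://example.org/survol#listens", "http://example.org/objects/addr.Id=127.0.0.1:8080"),
    (PROC, "http://example.org/survol#User", "alice"),  # same fact, reported twice
}

merged = merge_triplestores(files_script, sockets_script)
print(len(merged))                  # 3
print(len(describe(merged, PROC)))  # 3: one node gathers the facts of both scripts
```

This only works for URIs: blank nodes coming from different sources are never identified with each other, which is one reason why every Survol object is designated by a URL.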

The values of the rdfs:seeAlso and rdfs:isDefinedBy properties can be plain HTTP URLs, but most of the time they are Survol RDF URLs providing more RDF data. For example, an object present in the RDF triplestore comes with several such resources pointing to provider scripts, each of them returning a triplestore related to this object.
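A client can follow these links to expand what it knows about an object, loading each linked RDF URL in turn. The sketch below is hypothetical: the function `expand` receives a `fetch(url)` callable returning triples (for example, a function that downloads a Survol URL in RDF mode and parses it), injected so the crawling logic can be shown without network access. The number of loaded URLs is bounded, since every provider run may return new links.

```python
from collections import deque

RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"
RDFS_IS_DEFINED_BY = "http://www.w3.org/2000/01/rdf-schema#isDefinedBy"


def expand(start_url, fetch, max_urls=10):
    """Breadth-first loading of RDF URLs reachable through rdfs:seeAlso / rdfs:isDefinedBy.

    fetch(url) must return an iterable of (subject, predicate, object) triples.
    """
    triples, seen, queue = set(), {start_url}, deque([start_url])
    loaded = 0
    while queue and loaded < max_urls:
        url = queue.popleft()
        loaded += 1
        for s, p, o in fetch(url):
            triples.add((s, p, o))
            # Each link is queued once, even if several triplestores mention it.
            if p in (RDFS_SEE_ALSO, RDFS_IS_DEFINED_BY) and o not in seen:
                seen.add(o)
                queue.append(o)
    return triples
```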



After loading, all objects created by Survol are visible in Protégé with their properties: the mandatory properties which are part of their URL, and any predicate generated by the Survol script. All classes and properties are defined in the RDFS ontology.

Annotations


IMPORTING RDF DATA INTO GRAPHDB

GraphDB is an enterprise-ready semantic graph database, i.e. an RDF triplestore, compliant with W3C standards.

The "Upload RDF files" menu lets the user select, configure and import data in various formats. Data can also be imported from a URL, and any endpoint that returns RDF data may be used. This first step is similar to the Protégé RDF import: most Survol URLs can output their data in RDF format.


The "Import" view loads triplestores from RDF URLs.

GraphDB: Importing RDF

This view displays the classes of the instances loaded from the RDF URL.

GraphDB class hierarchy

Each individual resource can be displayed with its predicates and objects.

Description of Linux cgroup cpuacct