Partnership and methodology

Methodology and expected results

We use our own methodology, which proceeds in four stages:

Define the terminology. This is an ongoing process which starts with common software components and is progressively elaborated.
Gather sources of information and static data.
Generate dynamic data.
Refactor to target platforms such as Docker or Kubernetes.

This approach is cyclic: it is refined iteratively, from the very start, until it reaches a satisfactory result.

Survol is designed for this analysis and investigation work, not only as a software tool but also through the formalization it uses, which is based on semantic technologies.

BUSINESS TERMINOLOGY

Finding existing models

The aim is to build a model based on existing terminologies from the semantic web world. This ultimately makes it possible to store all knowledge about the information system in a graph database, where abstract, business and technical information can coexist. Notably, its role is to associate conceptual entities visible to the user with physical entities handled by software. By relying on standard models, several distinct domains of knowledge can be merged into one repository.

This approach is fundamentally different from defining a stack of abstraction levels, with the most concrete at the bottom and the most abstract at the top. Contrary to this vision, we assert that the notion of abstraction can be applied equally to all objects in an Information System, and that abstraction and concreteness are purely subjective and carry no information, especially in a legacy system where concepts and objects are already implemented and all have a concrete life cycle. Defining all objects at the same level makes it possible to express very precise and clear relations describing a concrete implementation.
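A minimal sketch of this flat model, assuming a plain Python set of (subject, predicate, object) triples in place of a real graph database. The `ex:` identifiers and the `ex:implements` and `ex:reads` predicates are hypothetical; a real repository would use URIs taken from the chosen ontologies. The business concept, the table and the script are stored side by side, with no layering, and a single query crosses from the concept to the code.

```python
# Minimal in-memory triple store: a set of (subject, predicate, object) tuples.
# All "ex:" identifiers are hypothetical examples.
graph = {
    # Business concept and technical objects live at the same level.
    ("ex:CustomerAccount", "rdf:type", "ex:BusinessConcept"),
    ("ex:accounts_table", "rdf:type", "ex:DatabaseTable"),
    ("ex:accounts_table", "ex:implements", "ex:CustomerAccount"),
    ("ex:billing.sh", "rdf:type", "ex:ShellScript"),
    ("ex:billing.sh", "ex:reads", "ex:accounts_table"),
}

def subjects(graph, predicate, obj):
    """Return all subjects s such that (s, predicate, obj) is in the graph."""
    return {s for s, p, o in graph if p == predicate and o == obj}

def touching_concept(graph, concept):
    """Technical entities implementing a concept, plus anything that reads them."""
    impls = subjects(graph, "ex:implements", concept)
    readers = set()
    for impl in impls:
        readers |= subjects(graph, "ex:reads", impl)
    return impls | readers

print(sorted(touching_concept(graph, "ex:CustomerAccount")))
# ['ex:accounts_table', 'ex:billing.sh']
```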

Existing terminologies must be used first. This gives different teams a common vocabulary and lets different sources of information communicate seamlessly.

There are many existing models expressed with semantic web technologies. For example, in the finance world:

The Financial Industry Business Ontology (FIBO) defines concepts in financial business applications and their relations.
Banking Industry Architecture Network (BIAN) establishes a semantic framework to define IT services in the banking industry.

Technical descriptions should also be based on existing ontologies. WikiProject Informatics is dedicated to defining terms related to software, and lists other ontologies in this domain:

DOAP (Description of a Project) is an RDF Schema and XML vocabulary to describe software projects, in particular free and open source software.
Software Evolution ONtologies (SEON) is a formal description of knowledge from the domain of software evolution analysis and mining software repositories.

Last but not least, the Common Information Model (CIM) is an open standard which represents IT elements and their relationships. Although it is far from covering all aspects of an information system, it has concrete implementations on most operating systems (WMI on Windows, for example).
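The benefit of shared vocabularies can be sketched as follows, assuming two teams publish triples independently. The DOAP namespace and its `Project`, `name` and `programming-language` terms are real; the `http://example.org/is#` namespace and its `runs` predicate are hypothetical. Because both sources use identical URIs, merging them is a plain set union, and a query can join facts coming from both.

```python
DOAP = "http://usefulinc.com/ns/doap#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/is#"  # hypothetical namespace for this example

# Source 1: the development team describes its project with DOAP terms.
dev_team = {
    (EX + "billing", RDF_TYPE, DOAP + "Project"),
    (EX + "billing", DOAP + "name", "Billing engine"),
    (EX + "billing", DOAP + "programming-language", "Python"),
}

# Source 2: the operations team records which host runs which project
# (the "runs" predicate is an assumption, not part of DOAP).
ops_team = {
    (EX + "host42", EX + "runs", EX + "billing"),
}

# Both sources use the same URIs, so merging is a simple set union.
repository = dev_team | ops_team

def hosts_running_language(repo, language):
    """Hosts running any DOAP project written in the given language."""
    projects = {s for s, p, o in repo
                if p == DOAP + "programming-language" and o == language}
    return {s for s, p, o in repo if p == EX + "runs" and o in projects}

print(hosts_running_language(repository, "Python"))
# {'http://example.org/is#host42'}
```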

Updating models with custom technologies

Existing models will certainly not be enough to fully describe an Information System. Therefore, the last step of this stage defines the concepts and relations specific to this application. Many tools exist to create and update an ontology, for example Protégé or Stardog.
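A sketch of how custom classes can be attached to an existing model, assuming `std:Process` stands for a class borrowed from an existing ontology, while `ex:BatchWorker` and `ex:NightlyBillingWorker` are hypothetical application-specific classes. The code applies the RDFS rule that an instance of a subclass is also an instance of its superclasses, so queries written against the standard class still find the custom objects.

```python
from collections import deque

SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"

ontology = {
    # Hypothetical custom classes placed under a class from an existing model.
    ("ex:BatchWorker", SUBCLASS, "std:Process"),
    ("ex:NightlyBillingWorker", SUBCLASS, "ex:BatchWorker"),
}
facts = {
    ("ex:pid_3120", TYPE, "ex:NightlyBillingWorker"),
    ("ex:pid_77", TYPE, "std:Process"),
}

def superclasses(graph, cls):
    """All classes reachable from cls via rdfs:subClassOf, including cls itself."""
    seen, queue = {cls}, deque([cls])
    while queue:
        current = queue.popleft()
        for s, p, o in graph:
            if s == current and p == SUBCLASS and o not in seen:
                seen.add(o)
                queue.append(o)
    return seen

def instances_of(graph, cls):
    """Instances of cls, applying the RDFS subclass entailment."""
    return {s for s, p, o in graph if p == TYPE and cls in superclasses(graph, o)}

graph = ontology | facts
print(sorted(instances_of(graph, "std:Process")))
# ['ex:pid_3120', 'ex:pid_77']
```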

"STATIC" INFORMATION

Once the application's formal model is clearly defined, the next step is to fill it by every possible means: that is, collect, or create tools to collect, as much and as diverse information as possible, all expressed in the model previously defined. The final intention is to merge these information sources into a single repository, giving the full picture of the Information System. Possible information sources are:

Existing legacy documentation, formalized with RDF.
Database design.
Static code analysis with specific tools, for example parsing of shell scripts, SQL queries or stored procedures, Doxygen, etc.
Software APIs and identifiers of application objects: they provide valid information about the base concepts of the system.
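As an illustration of static code analysis, here is a deliberately naive sketch that scans a shell script for embedded SQL and records which tables it uses. The regular expression, the `ex:usesTable` predicate and the sample script are assumptions for this example; real SQL with subqueries, quoted identifiers or dynamically built statements needs a proper parser.

```python
import re

# Naive pattern: an identifier (optionally schema-qualified) after FROM, JOIN, INTO or UPDATE.
# It can produce false positives, e.g. the word "from" in comments or messages.
TABLE_PATTERN = re.compile(
    r"\b(?:FROM|JOIN|INTO|UPDATE)\s+([A-Za-z_][\w.]*)", re.IGNORECASE
)

def tables_used(script_name, script_text):
    """Return (script, 'ex:usesTable', table) triples found in the script text."""
    triples = set()
    for match in TABLE_PATTERN.finditer(script_text):
        triples.add((script_name, "ex:usesTable", match.group(1).lower()))
    return triples

script = """#!/bin/sh
sqlplus -s "$CONN" <<EOF
INSERT INTO invoices SELECT * FROM pending_orders o JOIN customers c ON o.cust = c.id;
UPDATE customers SET last_billed = SYSDATE;
EOF
"""
for triple in sorted(tables_used("billing.sh", script)):
    print(triple)
```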

Information must be formalized using the terminology defined in the previous step. At this stage, the information only addresses business concepts, which is not sufficient: the semantic repository created here should also express relationships with technical entities.

Some technical objects will have unambiguous definitions: this database, that server, etc. Most of them, however, will probably be defined through their attributes, as expressed in the terminology:

"The process running this concrete application."
"The database row modelling that object."

These attributes will be added when merging "physical" data.
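A sketch of how an entity described only by its attributes can later be bound to concrete objects, assuming the hypothetical attribute names `executable` and `argument`, made-up process observations, and a hypothetical `ex:BillingRun` concept. Each observation matching every attribute of the pattern yields a concrete triple in the repository.

```python
# An entity known only by its attributes: "the process running this application".
# Attribute names and values are hypothetical.
pattern = {"executable": "/bin/sh", "argument": "billing.sh"}

# Dynamic observations collected later, e.g. from a process listing (made-up data).
observed_processes = [
    {"pid": 3120, "executable": "/bin/sh", "argument": "billing.sh"},
    {"pid": 3121, "executable": "/bin/sh", "argument": "backup.sh"},
    {"pid": 4000, "executable": "/usr/bin/python3", "argument": "billing.sh"},
]

def resolve(pattern, observations):
    """Return the pids of observations whose attributes all match the pattern."""
    return [obs["pid"] for obs in observations
            if all(obs.get(key) == value for key, value in pattern.items())]

# Each match becomes a concrete triple linking the process to the business concept.
triples = {(f"ex:pid_{pid}", "ex:implements", "ex:BillingRun")
           for pid in resolve(pattern, observed_processes)}
print(triples)  # {('ex:pid_3120', 'ex:implements', 'ex:BillingRun')}
```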

This work is incomplete by nature and is enhanced cyclically.

MERGE DYNAMIC DATA TO COMMON REPOSITORY

Dynamic data generated by the running IS are captured with development tools. These technical data are merged into the semantic repository.

This merge is valuable because it creates relations between functional and technical objects. These dynamic data might have to be completed with attributes to associate them with static information. Possible sources of dynamic data are:

Database monitor files, logging each and every SQL statement run by clients.
Database audits, for example with Oracle: V$SESSION, ALTER SESSION SET SQL_TRACE, and TKPROF.
Application log files, from which technical and business information can be extracted. This approach is similar to process mining.
Userspace or system calls, monitored with tools like strace or ltrace, for example to detect command lines.
Captures of relevant network traffic with tcpdump, including packet captures in production.
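As an example of turning dynamic data into triples, here is a sketch that parses the output of `strace -f -e trace=open,openat -o trace.txt`, assuming each line is prefixed with the process id, as strace does when following child processes into a single output file. The `ex:reads` and `ex:writes` predicates are hypothetical, the sample lines are invented, and failed calls are ignored.

```python
import re

# Successful open/openat calls, one per line, prefixed with the pid.
# Failed calls ("= -1 ENOENT ...") do not match because a digit must follow "=".
OPEN_RE = re.compile(
    r'^(\d+)\s+open(?:at)?\((?:AT_FDCWD,\s*)?"([^"]*)",\s*([A-Z_|]+).*\)\s*=\s*\d+'
)

def file_access_triples(trace_lines):
    """Turn strace open/openat lines into (process, predicate, file) triples."""
    triples = set()
    for line in trace_lines:
        match = OPEN_RE.match(line)
        if not match:
            continue
        pid, path, flags = match.groups()
        process = f"ex:pid_{pid}"
        if "O_WRONLY" in flags or "O_RDWR" in flags:
            triples.add((process, "ex:writes", path))
        if "O_WRONLY" not in flags:  # O_RDONLY and O_RDWR both allow reading
            triples.add((process, "ex:reads", path))
    return triples

sample = [
    '3120  openat(AT_FDCWD, "/data/orders.csv", O_RDONLY) = 3',
    '3120  openat(AT_FDCWD, "/data/invoices.csv", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 4',
    '3120  openat(AT_FDCWD, "/data/missing.cfg", O_RDONLY) = -1 ENOENT (No such file or directory)',
]
for triple in sorted(file_access_triples(sample)):
    print(triple)
```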

Other techniques

Detecting patterns in the RDF graph database:

AI applied to the knowledge graph.
AI-powered graph computing and semantic big data.
Rebuild data dependencies by using timestamps.
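Rebuilding dependencies from timestamps can be sketched with a simple heuristic, which is only one possible approach: a process that reads a resource depends on whichever other process last wrote it before that read. The event tuples and the `ex:dependsOn` predicate are hypothetical, and in practice a temporal coincidence is only a candidate relation to be confirmed.

```python
# Each event: (timestamp, process, action, resource). All values are made up.
events = [
    (100, "ex:extract.sh", "writes", "/data/orders.csv"),
    (160, "ex:billing.sh", "reads", "/data/orders.csv"),
    (170, "ex:billing.sh", "writes", "/data/invoices.csv"),
    (90, "ex:report.sh", "reads", "/data/invoices.csv"),  # read before any write
    (200, "ex:mailer.sh", "reads", "/data/invoices.csv"),
]

def infer_dependencies(events):
    """Link each reader to the last writer of the same resource before the read."""
    last_writer = {}
    dependencies = set()
    for _, process, action, resource in sorted(events):  # chronological order
        if action == "writes":
            last_writer[resource] = process
        elif action == "reads" and resource in last_writer:
            writer = last_writer[resource]
            if writer != process:
                dependencies.add((process, "ex:dependsOn", writer))
    return dependencies

for triple in sorted(infer_dependencies(events)):
    print(triple)
```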

Analyse technical documentation with Natural Language Processing (NLP) techniques. This technology is already used in the pharmaceutical industry. GATE is the tool of choice for parsing free-form text and extracting semantic data.
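GATE relies on much richer processing pipelines, but the core idea of annotating text with terminology entries, similar to a gazetteer lookup, can be sketched in plain Python. This is not GATE's API: the term list, the concept identifiers and the crude plural handling are assumptions for illustration.

```python
import re

# Terminology entries: surface form -> concept identifier (hypothetical).
gazetteer = {
    "customer account": "ex:CustomerAccount",
    "invoice": "ex:Invoice",
    "billing run": "ex:BillingRun",
}

def annotate(text, gazetteer):
    """Return sorted (start, end, concept) spans for terminology terms found in text."""
    annotations = []
    # Longest terms first, so a longer term wins over a shorter overlapping one.
    for term in sorted(gazetteer, key=len, reverse=True):
        # Optional trailing "s" as a crude plural form.
        pattern = re.compile(r"\b" + re.escape(term) + r"s?\b", re.IGNORECASE)
        for m in pattern.finditer(text):
            overlaps = any(start < m.end() and m.start() < end
                           for start, end, _ in annotations)
            if not overlaps:
                annotations.append((m.start(), m.end(), gazetteer[term]))
    return sorted(annotations)

doc = "The billing run reads each Customer Account and writes invoices."
for start, end, concept in annotate(doc, gazetteer):
    print(doc[start:end], "->", concept)
```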

Process mining: Automated business process discovery.
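A common first step in automated process discovery is the directly-follows relation: for each case, count how often one activity is immediately followed by another. The sketch below assumes an event log of (case id, timestamp, activity) tuples, for example extracted from application logs; the case ids and activity names are invented.

```python
from collections import Counter

# Event log: (case id, timestamp, activity). Made-up data.
log = [
    ("order-1", 1, "receive order"), ("order-1", 2, "check credit"), ("order-1", 3, "invoice"),
    ("order-2", 1, "receive order"), ("order-2", 2, "invoice"),
    ("order-3", 1, "receive order"), ("order-3", 2, "check credit"), ("order-3", 4, "invoice"),
]

def directly_follows(log):
    """Count how often activity b directly follows activity a within the same case."""
    traces = {}
    # Sorting groups events by case, then orders them by timestamp.
    for case, _, activity in sorted(log):
        traces.setdefault(case, []).append(activity)
    counts = Counter()
    for activities in traces.values():
        counts.update(zip(activities, activities[1:]))
    return counts

for (a, b), n in sorted(directly_follows(log).items()):
    print(f"{a} -> {b}: {n}")
```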

Reverse engineering, with tools such as Radare2.

REFACTORING

Based on the information extracted in the previous steps, define the functional building blocks using the terminology defined in the first step. The size and number of these building blocks are not important, as long as they are defined in the terminology of the application. This stage is common to all refactoring tasks.
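The methodology does not prescribe how blocks are delimited. As a crude starting point, and only as an assumption, entities can be grouped into connected components of the relation graph, each component being a candidate block to be reviewed and then named with the application's terminology. The triples below are hypothetical.

```python
def candidate_blocks(triples):
    """Group entities into connected components of the relation graph, using union-find."""
    parent = {}

    def find(x):
        # Return the representative of x's component, registering x if new.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    # Every relation, whatever its predicate, links its subject and object.
    for subject, _, obj in triples:
        root_s, root_o = find(subject), find(obj)
        if root_s != root_o:
            parent[root_s] = root_o

    groups = {}
    for entity in list(parent):
        groups.setdefault(find(entity), set()).add(entity)
    # Sort blocks by their sorted members for a deterministic order.
    return sorted(groups.values(), key=sorted)

triples = {
    ("ex:billing.sh", "ex:reads", "ex:accounts_table"),
    ("ex:billing.sh", "ex:writes", "ex:invoices_table"),
    ("ex:backup.sh", "ex:reads", "ex:archive_dir"),
}
for block in candidate_blocks(triples):
    print(sorted(block))
```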

PARTNERSHIP

This approach is well suited to merging knowledge brought by several teams with different skills, including teams from partner companies with distinct expertise.
