Partnership and methodology

Methodology and expected results

This methodology reaches its aim in two steps:

First, a common step analyzes the business processes. It does not attempt to supersede existing, validated methods; rather, it adds some steps for reformulating the key concepts of the application and mapping them to the technical objects detected with software tools.

Then, in a second step, this formalization is used to migrate key components to AC2/XComponent, Docker or other target platforms.

Survol is designed for this analysis and investigation step, not only as a software tool but also through the formalization it uses, which is based on semantic technologies.

FOUR STAGES


The analysis stage consists of four steps:

  • Define the terminology. This is an ongoing process, which starts with common software components and becomes progressively more elaborate.
  • Gather sources of information.
  • Define functional blocks.
  • Extract and rebuild.


BUSINESS TERMINOLOGY

Finding existing models


The aim is to build a model based mostly on existing terminologies from the semantic web world. This ultimately makes it possible to store all the existing information about the information system in a graph database, where abstract, business and technical information can coexist.


This approach is fundamentally different from defining levels of abstraction, with the most concrete at the bottom and the most abstract at the top. Contrary to this vision, we assert that the notion of abstraction applies equally to all objects in an Information System, and that it is purely subjective and brings no information, especially in a legacy system where concepts and objects are already implemented and all have a concrete life cycle. Defining all objects at the same level makes it possible to express very precise and clear relations describing a concrete implementation.
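As a minimal sketch of this flat representation, assuming for illustration an in-memory Python set of triples instead of a real graph database, and hypothetical `ex:` names, a business process, a script and a database table are stored as ordinary nodes. A traversal simply follows relations, without any notion of abstraction level:

```python
# Flat graph: business, functional and technical objects are all plain nodes.
# The "ex:" identifiers are hypothetical, not terms of an existing ontology.
Triple = tuple[str, str, str]

graph: set[Triple] = {
    ("ex:TradeBooking", "rdf:type", "ex:BusinessProcess"),
    ("ex:TradeBooking", "ex:implementedBy", "ex:book_trade.sh"),
    ("ex:book_trade.sh", "rdf:type", "ex:ShellScript"),
    ("ex:book_trade.sh", "ex:writes", "ex:TRADES"),
    ("ex:TRADES", "rdf:type", "ex:DatabaseTable"),
}


def reachable(start: str) -> set[str]:
    """Return every node reachable from start, ignoring rdf:type edges."""
    seen: set[str] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for subj, pred, obj in graph:
            if subj == node and pred != "rdf:type" and obj not in seen:
                seen.add(obj)
                stack.append(obj)
    return seen


print(sorted(reachable("ex:TradeBooking")))
```

Starting from the business process, the traversal reaches the script, then the table it writes; the "level" of each node is just another relation (here `rdf:type`).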


Existing terminologies must be used first. This gives different teams a common vocabulary, and lets different sources of information communicate seamlessly.


There are many such models in the semantic web world. For example, in finance:

  • The Financial Industry Business Ontology (FIBO) defines concepts in financial business applications and their relations.
  • The Banking Industry Architecture Network (BIAN) establishes a semantic framework to define IT services in the banking industry.

Technical descriptions should also be based on existing ontologies. WikiProject Informatics has a project dedicated to defining software-related terms, and lists other ontologies of this domain:

  • DOAP (Description of a Project) is an RDF Schema and XML vocabulary to describe software projects, in particular free and open source software.
  • Software Evolution ONtologies (SEON) is a formal description of knowledge from the domain of software evolution analysis and mining software repositories.
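For instance, a legacy component could be described with DOAP terms instead of new ad-hoc ones. The sketch below writes a few N-Triples by hand with the standard library only; the project IRI and literal values are hypothetical, while `doap:Project`, `doap:name` and `doap:programming-language` belong to the DOAP vocabulary. A real tool would more likely rely on an RDF library.

```python
# Describe a legacy component with the existing DOAP vocabulary.
# The project IRI and the literal values are hypothetical examples.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DOAP = "http://usefulinc.com/ns/doap#"


def iri(value: str) -> str:
    return f"<{value}>"


def literal(value: str) -> str:
    """Quote a plain string literal, escaping the characters N-Triples requires."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def to_ntriples(triples) -> str:
    """Serialize (subject IRI, predicate IRI, formatted object) triples."""
    return "".join(f"{iri(s)} {iri(p)} {o} .\n" for s, p, o in triples)


project = "http://example.org/legacy/pricing-engine"  # hypothetical IRI
triples = [
    (project, RDF_TYPE, iri(DOAP + "Project")),
    (project, DOAP + "name", literal("Pricing engine")),
    (project, DOAP + "programming-language", literal("C++")),
]
print(to_ntriples(triples), end="")
```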

Last but not least, the Common Information Model (CIM) is an open standard for representing IT elements and their relationships. Although it is far from covering all aspects of an information system, it has concrete implementations on most operating systems (WMI on Windows, etc.).


Updating models with custom ontologies

Existing models will certainly not be enough to fully describe an Information System. Therefore, the last step defines concepts and relations specific to this application. Many tools exist to create and update an ontology, for example Protégé or Stardog.
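A custom ontology can stay close to existing ones by declaring application-specific classes as subclasses of existing terms. In this sketch the `app:` classes and the file path are hypothetical, while `CIM_Process` and `CIM_DataFile` are CIM class names. A small loop propagates `rdf:type` along `rdfs:subClassOf`, so an application-specific file is also recognized as a generic CIM data file; in practice an ontology editor or a reasoner would do this work.

```python
# Application-specific classes ("app:", hypothetical) attached to existing
# CIM classes, so generic tools can still interpret the instances.
SUBCLASS_OF = "rdfs:subClassOf"
TYPE = "rdf:type"

ontology = {
    ("app:PricingDaemon", SUBCLASS_OF, "CIM_Process"),
    ("app:RateFile", SUBCLASS_OF, "CIM_DataFile"),
    ("app:EndOfDayRateFile", SUBCLASS_OF, "app:RateFile"),
}
facts = {("file:/data/eod_rates.csv", TYPE, "app:EndOfDayRateFile")}


def infer_types(ontology, facts):
    """Propagate rdf:type along rdfs:subClassOf edges until a fixpoint."""
    parents: dict[str, set[str]] = {}
    for sub, _, sup in ontology:
        parents.setdefault(sub, set()).add(sup)
    inferred = set(facts)
    changed = True
    while changed:
        changed = False
        for subj, pred, cls in list(inferred):
            if pred != TYPE:
                continue
            for sup in parents.get(cls, set()):
                if (subj, TYPE, sup) not in inferred:
                    inferred.add((subj, TYPE, sup))
                    changed = True
    return inferred


types = {o for s, p, o in infer_types(ontology, facts) if p == TYPE}
print(sorted(types))
```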

LEGACY DOCUMENTATION, SOURCE CODE, "OTHER" SOURCES OF INFORMATION

Once the application's formal model is clearly defined, the next step is to populate it by any possible means: that is, collect, or create tools to collect, as much and as diverse information as possible, all expressed in the previously defined model. The final goal is to merge these information sources into a single repository, giving the full picture of the Information System. Possible information sources are:
  • Existing documentation, formalized with RDF.
  • Databases and other external data sources, reformulated with the previously defined terminology.
  • Static code analysis with specific tools, for example parsing of shell scripts, SQL queries or stored procedures.
  • Software APIs and identifiers of application objects: they provide valid information about the basic concepts of the system.
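Static analysis of SQL can start very simply. This sketch assumes the SQL text has already been extracted from scripts or stored procedures, and uses regular expressions to find the tables a source reads and writes, emitted as triples. The `ex:readsTable` and `ex:writesTable` predicates are hypothetical; regular expressions only approximate SQL (comments, quoted identifiers and dynamic SQL are ignored), so a real SQL parser would be preferable.

```python
import re

# Hypothetical predicates: ex:readsTable / ex:writesTable.
# Table names: optional schema prefix, Oracle-style "$" allowed.
IDENT = r"([A-Za-z_][\w.$]*)"
WRITE_RE = re.compile(r"\b(?:INSERT\s+INTO|UPDATE|DELETE\s+FROM)\s+" + IDENT, re.IGNORECASE)
READ_RE = re.compile(r"\b(?:FROM|JOIN)\s+" + IDENT, re.IGNORECASE)


def sql_triples(source: str, sql: str) -> set[tuple[str, str, str]]:
    """Return (source, predicate, TABLE) triples for tables read or written."""
    triples = set()
    for table in WRITE_RE.findall(sql):
        triples.add((source, "ex:writesTable", table.upper()))
    # Blank out write clauses first, so "DELETE FROM t" is not also a read of t.
    for table in READ_RE.findall(WRITE_RE.sub(" ", sql)):
        triples.add((source, "ex:readsTable", table.upper()))
    return triples


sql = """
INSERT INTO trades_hist SELECT * FROM trades t JOIN books b ON t.book_id = b.id;
DELETE FROM trades WHERE status = 'DONE';
UPDATE positions SET qty = 0;
"""
for triple in sorted(sql_triples("ex:purge.sql", sql)):
    print(triple)
```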

DISCOVER INTERFACES

Based on the information extracted in the previous step, define the functional building blocks using the terminology defined in the first step. The size and number of these building blocks do not matter, as long as they are defined in the terminology of the application.


The aim of the next step is to identify these blocks technically in the execution flow, using:

  • The functional description.
  • The relations between functional and technical objects.
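Once each technical object is attached to a functional block, candidate interfaces are the dependencies that cross a block boundary. A minimal sketch, assuming the dependencies and the object-to-block mapping have already been extracted; all object and block names are hypothetical:

```python
from collections import defaultdict

# Technical dependencies (e.g. from static analysis): (from object, to object).
dependencies = [
    ("book_trade.sh", "TRADES"),
    ("price_loader", "RATES"),
    ("book_trade.sh", "RATES"),
    ("eod_report", "TRADES"),
]
# Relation between technical objects and functional blocks, expressed in the
# application's terminology.
block_of = {
    "book_trade.sh": "Booking", "TRADES": "Booking",
    "price_loader": "Pricing", "RATES": "Pricing",
    "eod_report": "Reporting",
}


def interfaces(dependencies, block_of):
    """Group block-crossing dependencies by (from_block, to_block)."""
    crossings = defaultdict(list)
    for src, dst in dependencies:
        src_block, dst_block = block_of[src], block_of[dst]
        if src_block != dst_block:
            crossings[(src_block, dst_block)].append((src, dst))
    return dict(crossings)


for (a, b), edges in sorted(interfaces(dependencies, block_of).items()):
    print(f"{a} -> {b}: {edges}")
```

Dependencies inside a block are ignored; only the two crossing edges remain as interface candidates.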

EXTRACTING DYNAMIC DATA


Last phase: extraction of dynamic data, by monitoring the running system:

  • Packet captures of relevant network traffic with tcpdump, in production.
  • Database monitor files, logging each and every SQL statement run by clients.
  • Database audits (for example with Oracle: V$SESSION, ALTER SESSION SET SQL_TRACE, and TKPROF).
  • Application log files, from which technical and business information can be extracted. This approach is similar to process mining.
  • User-space library calls or system calls, monitored with tools like ltrace or strace.
  • Detection of the command lines of running processes.
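As an example of turning monitoring output into the model, this sketch parses an strace log into triples. It assumes a trace recorded with `strace -f -o trace.txt <command>`, where each line begins with the PID, and keeps only successful `open`, `openat` and `execve` calls; calls that strace splits into "unfinished"/"resumed" lines are skipped. The sample lines are illustrative rather than captured output, and the `ex:` predicates and `pid:` naming are hypothetical.

```python
import re

# One traced call per line: "PID  syscall(args) = result [errno text]".
LINE_RE = re.compile(r"^(\d+)\s+(\w+)\((.*)\)\s+=\s+(-?\d+)")
PATH_RE = re.compile(r'"([^"]*)"')  # first quoted argument (path)
WRITE_FLAGS = ("O_WRONLY", "O_RDWR", "O_CREAT")


def strace_triples(lines):
    """Build (process, predicate, path) triples from successful calls."""
    triples = set()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # signals, exits, unfinished/resumed calls...
        pid, call, args, result = m.groups()
        path = PATH_RE.search(args)
        if int(result) < 0 or path is None:
            continue  # failed call, or no path argument
        process = f"pid:{pid}"
        if call == "execve":
            triples.add((process, "ex:executes", path.group(1)))
        elif call in ("open", "openat"):
            writing = any(flag in args for flag in WRITE_FLAGS)
            triples.add((process, "ex:writes" if writing else "ex:reads", path.group(1)))
    return triples


sample = [
    '4211  execve("/usr/bin/sqlplus", ["sqlplus", "@load.sql"], 0x7ffc5a3e1b30 /* 21 vars */) = 0',
    '4211  openat(AT_FDCWD, "/data/in/rates.csv", O_RDONLY) = 3',
    '4211  openat(AT_FDCWD, "/data/out/report.txt", O_WRONLY|O_CREAT|O_TRUNC, 0644) = 4',
    '4211  openat(AT_FDCWD, "/missing.cfg", O_RDONLY) = -1 ENOENT (No such file or directory)',
]
for triple in sorted(strace_triples(sample)):
    print(triple)
```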

MERGE DATA, REBUILDING

Static and dynamic data are merged in a single RDF repository, which is then queried with SPARQL.
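The query step can be illustrated without a triple store. This sketch assumes static triples (scripts and the tables they access) and dynamic triples (processes and the scripts they executed), all with hypothetical names. It merges both sets and evaluates a conjunction of triple patterns, which is what a SPARQL basic graph pattern expresses; the equivalent SPARQL query is shown as a comment. A real deployment would use an RDF store and its SPARQL engine.

```python
# Static facts (e.g. from SQL analysis) and dynamic facts (e.g. from strace).
# All identifiers are hypothetical.
static = {
    ("ex:load.sql", "ex:writesTable", "RATES"),
    ("ex:report.sql", "ex:readsTable", "RATES"),
}
dynamic = {
    ("pid:4211", "ex:executes", "ex:load.sql"),
    ("pid:5310", "ex:executes", "ex:report.sql"),
}
graph = static | dynamic  # merging is a plain set union of triples


def match(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):                      # variable
            if new.setdefault(term, value) != value:
                return None                           # bound to another value
        elif term != value:                           # constant mismatch
            return None
    return new


def query(graph, patterns):
    """Evaluate a conjunction of triple patterns (a basic graph pattern)."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in graph:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# SELECT ?writer ?reader ?table WHERE {
#   ?writer ex:executes ?s1 . ?s1 ex:writesTable ?table .
#   ?reader ex:executes ?s2 . ?s2 ex:readsTable ?table . }
flows = query(graph, [
    ("?writer", "ex:executes", "?s1"),
    ("?s1", "ex:writesTable", "?table"),
    ("?reader", "ex:executes", "?s2"),
    ("?s2", "ex:readsTable", "?table"),
])
for row in flows:
    print(row["?writer"], "->", row["?table"], "->", row["?reader"])
```

The result links two processes that never communicate directly but share a table, the kind of implicit interface that only appears once static and dynamic data are merged.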

OTHER TECHNIQUES

  • Detecting patterns in the RDF graph database.
  • Analyzing technical documentation with NLP (Natural Language Processing) techniques. This technology is used in the pharmaceutical industry, and GATE is the tool of choice for parsing free-form text and extracting semantic data.
  • Process mining: automated business process discovery.
  • Reverse engineering: Radare2, etc.
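Process mining usually starts from an event log with a case identifier, an activity and a timestamp for each event, which can be extracted from application logs. As a minimal sketch using hypothetical sample data and the standard library only, the directly-follows graph below counts how often one activity immediately follows another within the same case. Discovery algorithms such as the Alpha miner build on this relation; dedicated tools would be used in practice.

```python
from collections import Counter, defaultdict

# Hypothetical event log: (case id, activity, timestamp).
events = [
    ("T1", "capture", 1), ("T1", "validate", 2), ("T1", "settle", 5),
    ("T2", "capture", 3), ("T2", "validate", 4), ("T2", "reject", 6),
    ("T3", "capture", 7), ("T3", "validate", 8), ("T3", "settle", 9),
]


def directly_follows(events):
    """Count how often activity a is immediately followed by b in a case."""
    traces = defaultdict(list)
    for case, activity, _ in sorted(events, key=lambda e: (e[0], e[2])):
        traces[case].append(activity)
    counts = Counter()
    for trace in traces.values():
        counts.update(zip(trace, trace[1:]))
    return counts


for (a, b), n in sorted(directly_follows(events).items()):
    print(f"{a} -> {b}: {n}")
```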

OUR PARTNER

Invivoo.

Our fields of expertise are:

  • System renovations
  • Software modernization & reengineering
  • Global performance enhancement
  • Reverse-engineering analysis of existing applications