We use an exclusive methodology, which takes place in four stages:
This approach is cyclic: each pass refines the previous one until the result is satisfactory.
Survol is designed for this analysis and investigation step, not only as a software tool but also through the formalization it uses, which is based on semantic technologies.
Finding existing models
The aim is to build a model based on existing terminologies from the semantic web world. This ultimately makes it possible to store all knowledge about the information system in a graph database, where abstract, business and technical information can coexist. Notably, its role is to associate conceptual entities visible to the user with physical entities handled by software. By relying on standard models, several distinct domains of knowledge can be merged into one repository.
This approach is fundamentally different from defining a stack of levels of abstraction, with the most concrete at the bottom and the most abstract at the top. Contrary to this vision, we assert that the notion of abstraction can be applied equally to all objects in an Information System, and that abstraction and concreteness are purely subjective and carry no information, especially in a legacy system where concepts and objects are already implemented and all have a concrete life cycle. Defining all objects at the same level makes it possible to express very precise and clear relations describing a concrete implementation.
Existing terminologies must be used first. This gives different teams a common vocabulary and lets different sources of information communicate seamlessly.
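As an illustration of this single repository, here is a minimal sketch in plain Python, with triples stored in sets instead of a real graph database. Every prefix, identifier and predicate (`biz:`, `tech:`, `ex:storedIn`, and so on) is a hypothetical name chosen for the example, not part of Survol or of any published ontology; only the `CIM_` class names are borrowed from the Common Information Model.

```python
# Each fact is a (subject, predicate, object) triple, as in RDF.
# All names below are illustrative assumptions.

# Business knowledge: conceptual entities visible to the user.
BUSINESS = {
    ("biz:Invoicing", "rdf:type", "biz:BusinessProcess"),
    ("biz:Invoicing", "biz:usesData", "biz:CustomerRecord"),
}

# Technical knowledge: physical entities handled by software.
TECHNICAL = {
    ("tech:customers.db", "rdf:type", "cim:CIM_DataFile"),
    ("tech:billing_daemon", "rdf:type", "cim:CIM_Process"),
}

# Relations associating a conceptual entity with a physical one.
LINKS = {
    ("biz:CustomerRecord", "ex:storedIn", "tech:customers.db"),
}


def merge(*graphs):
    """Merge several sets of triples into one repository."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def objects(graph, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return {o for s, p, o in graph if s == subject and p == predicate}


repo = merge(BUSINESS, TECHNICAL, LINKS)

# Business and technical facts now sit at the same level and can be
# queried in the same way.
physical = objects(repo, "biz:CustomerRecord", "ex:storedIn")
```

Because all three sets share one representation, no layer is privileged: the same `objects` helper answers business questions and technical ones.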
There are many models in the semantic web terminology. For example, in the finance world:
Technical descriptions should also be based on existing ontologies. WikiProject Informatics has a project dedicated to defining terms related to software, and lists other ontologies in this domain:
Last but not least, the Common Information Model (CIM) is an open standard which represents IT elements and their relationships. Although it is far from covering all aspects of an information system, it has a concrete implementation on most operating systems (WMI, etc.).
Updating models with custom Technologies
Existing models will certainly not be enough to fully describe an Information System. Therefore, the last step attempts to define concepts and relations specific to this application. Many tools can be used to create and update an ontology, for example Protégé or Stardog.
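One way to picture this step, sketched here in plain Python rather than in an ontology editor, is to attach each application-specific concept to an existing term through a subclass relation. The `app:` names are hypothetical, and the dictionary is only a stand-in for what a tool such as Protégé would store as `rdfs:subClassOf` statements.

```python
# Custom concepts, each declared as a subclass of a more general term.
# "app:" names are made up for this example; "cim:CIM_Process" stands
# for a class taken from an existing model.
SUBCLASS_OF = {
    "app:PricingEngine": "cim:CIM_Process",
    "app:NightlyPricingBatch": "app:PricingEngine",
}


def ancestors(term, subclass_of):
    """Return the chain of superclasses of `term`, nearest first."""
    chain = []
    seen = {term}
    while term in subclass_of:
        term = subclass_of[term]
        if term in seen:  # guard against an accidental cycle
            break
        seen.add(term)
        chain.append(term)
    return chain


lineage = ancestors("app:NightlyPricingBatch", SUBCLASS_OF)
```

Anchoring custom terms this way keeps them comparable with data from other sources that only know the standard vocabulary.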
Once the application's formal model is clearly defined, the next step is to fill it by any possible means: that is, to collect, or to create tools that collect, as much and as diverse information as possible, all expressed in the model previously defined. The final intention is to merge these information sources into a single repository, giving the full picture of the Information System. Possible information sources are:
Information must be formalized using the terminology defined in the previous step. At this stage, this information only addresses business concepts, which is not sufficient: the semantic repository created here should also express relationships with technical entities.
Some technical objects will have an unambiguous definition (this database, that server, etc.), but most will probably be identified by their attributes, as defined in the terminology:
These attributes will be added when merging "physical" data.
This work is by nature incomplete and is cyclically enhanced.
Creation of dynamic data generated by the IS, with development tools. These technical data are merged into the semantic repository.
This merge is valuable because it exposes the relations between functional and technical objects. The dynamic data may have to be completed with attributes in order to associate them with static information.
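The role of attributes in this merge can be sketched as follows, in plain Python. The entity names, the attribute names (`host`, `port`, `remote_host`, `remote_port`) and the `ex:connectsTo` predicate are all assumptions made for the example; real dynamic data would come from whatever development or monitoring tools are available.

```python
# Static information: entities identified by their attributes.
STATIC = {
    "biz:CustomerDatabase": {"host": "srv01", "port": 5432},
}

# Dynamic data observed at run time (hypothetical observations).
DYNAMIC = [
    {"pid": 4242, "remote_host": "srv01", "remote_port": 5432},
    {"pid": 4243, "remote_host": "srv09", "remote_port": 80},
]


def link_dynamic_to_static(static, dynamic):
    """Create triples relating observed processes to known entities.

    An observation is linked only when its attributes match those of a
    static entity; unmatched observations are left out.
    """
    index = {(attrs["host"], attrs["port"]): name
             for name, attrs in static.items()}
    triples = set()
    for obs in dynamic:
        key = (obs["remote_host"], obs["remote_port"])
        if key in index:
            triples.add((f"proc:{obs['pid']}", "ex:connectsTo", index[key]))
    return triples


new_relations = link_dynamic_to_static(STATIC, DYNAMIC)
```

Here the pair of host and port plays the part of the shared attributes: without it, the observed process and the business-level database could not be associated.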
Detecting patterns in the RDF graph database:
Analyse technical documentation with NLP (Natural Language Processing) techniques. This technology is used in the pharmaceutical industry. GATE is the tool of choice for parsing free-form text and extracting semantic data.
Process mining: Automated business process discovery.
Reverse engineering: Radare2, etc.
Based on the information extracted in the previous step, define the functional building blocks using the terminology defined in the first step. The size and number of these building blocks are not important as long as they are defined in the terminology of the application. This is a common stage for all refactoring tasks.
This approach is well suited to merging knowledge contributed by several teams with different skills, including in partnership with other companies that bring distinct skills.