Survol Architecture

ARCHITECTURE

Survol needs one agent per machine: a plain HTTP server which runs Python scripts through the Common Gateway Interface (CGI). The agent and its port number define the identity of every object on this machine that Survol can provide information about. This is the same concept as in WMI and WBEM, and Survol inter-operates with both. Therefore, Survol can also get information from a machine which has no Survol agent, through OpenLMI, WMI etc...

If your browser runs ActiveX, it is also possible to use Survol web pages with no agent.

Survol is based on a tree of Python scripts, each of which displays information about the system the agent is running on. Some scripts need no parameters and return information without any context: for example, all available databases, all detected machines on the network, all installed Python modules etc... Other scripts need parameters, because they display information about a specific object: for example, files opened by a specific process, columns of a SQL Server™ table, processes connected to an Oracle™ database.

As a benefit of the CGI-based architecture, any of these Python scripts can also be run as a command-line program. This makes it possible to test a script in isolation, with full control over its environment and input, and full visibility of its output. This greatly helps when creating and testing scripts.
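
The command-line use described above can be sketched as follows. This is a minimal illustration, not Survol code: the script written to the temporary directory is a made-up stand-in for a Survol script, and the `pid` parameter is an assumption. It relies only on the CGI convention that parameters arrive in the QUERY_STRING environment variable and that the output is a header block, a blank line, then the body.

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# Hypothetical stand-in for a Survol script: it reads its parameters from
# QUERY_STRING and prints an HTTP header, a blank line, then the body.
SCRIPT_SOURCE = textwrap.dedent("""\
    import os
    from urllib.parse import parse_qs

    params = parse_qs(os.environ.get("QUERY_STRING", ""))
    print("Content-Type: text/plain")
    print()
    print("pid=" + params.get("pid", ["?"])[0])
""")


def run_cgi_script(script_path, query_string):
    """Runs a CGI script as a plain command-line program and returns its raw output."""
    env = dict(os.environ, QUERY_STRING=query_string, REQUEST_METHOD="GET")
    completed = subprocess.run(
        [sys.executable, script_path],
        env=env, capture_output=True, text=True, check=True)
    return completed.stdout


def demo():
    """Writes the stand-in script to a temporary directory and runs it without any HTTP server."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        script_path = os.path.join(tmp_dir, "demo_script.py")
        with open(script_path, "w") as script_file:
            script_file.write(SCRIPT_SOURCE)
        return run_cgi_script(script_path, "pid=1234")


if __name__ == "__main__":
    print(demo())
```

Because the environment is an ordinary dictionary, a test can set any variable the script depends on and inspect the complete output, headers included.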

DISPLAY MODES AND INTERNAL DATA REPRESENTATION

The internal data model built by Survol is based on RDF, a standard model for data interchange on the Web. Survol represents its data as an RDF graph, that is, a set of triples: subject, relation and object. RDF is also the core data type of the Semantic Web, and it is used in Artificial Intelligence applications. This abstract representation is independent of the display mode. The same set of data, extracted from an information system, can be shown in several ways:

The interactive mode, designed for investigating, browsing and drilling into application internals. It uses D3, a practical JavaScript library for manipulating graphs.
The print mode, based on SVG: the rendered view cannot be modified, but is designed to be printed for reports or presentations. Its appearance is very close to the interactive one, but the layout is always identical and users cannot change it. It uses Graphviz, an open-source graph visualization package. However, using Graphviz is optional.
The plain HTML mode, convenient for generating reports in plain text. These reports can be saved and compared between several runs.
An RDF mode, which can be used as input to tools such as Protégé or Apache Jena. Protégé is an ontology editor and framework whose purpose is to build intelligent systems and knowledge-based solutions. When importing Survol data, it can apply its deduction capabilities on top of the Survol investigation.

There is also another mode, used internally:

A JSON mode, used by JavaScript-based displays. At the moment, the client-side graphic library is D3, and there are plans to also use Cytoscape, another software platform for visualizing complex networks. Survol is extremely modular, and adding a new front-end is a task independent of the application core.
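
To illustrate how one set of triples can feed several display modes, here is a minimal sketch in plain Python. The node names, the relation names and the exact JSON layout are assumptions made up for this example; they are not the format Survol emits.

```python
import json

# A tiny graph as a set of (subject, relation, object) triples.
# All names below are invented for illustration.
triples = {
    ("process:1234", "opens_file", "file:/etc/hosts"),
    ("process:1234", "owned_by", "user:mary"),
    ("process:5678", "opens_file", "file:/etc/hosts"),
}


def to_text_lines(graph):
    """Plain-text rendering: sorted, so that two runs can be compared line by line."""
    return ["%s %s %s" % triple for triple in sorted(graph)]


def to_json_graph(graph):
    """Node/link rendering, the general shape a JavaScript graph library can consume."""
    nodes = sorted({s for s, _, _ in graph} | {o for _, _, o in graph})
    links = [{"source": s, "relation": r, "target": o} for s, r, o in sorted(graph)]
    return json.dumps({"nodes": nodes, "links": links}, indent=2)


if __name__ == "__main__":
    print("\n".join(to_text_lines(triples)))
    print(to_json_graph(triples))
```

Both functions read the same set and never modify it, which is the point of keeping the data model separate from the display mode.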

WMI, WBEM, CIM

To start with, some acronyms. WBEM™ stands for Web-Based Enterprise Management, an industry initiative to develop a standard technology for accessing management information in an enterprise environment. This standard is governed by the DMTF (Distributed Management Task Force). WMI is the Microsoft implementation of WBEM. In the Linux world, a common implementation of WBEM is OpenLMI, which is based on an open-source implementation named OpenPegasus™. WBEM standard information is delivered with CIM (Common Information Model): the CIM schema is the model for delivering this standard information.

To summarize, WBEM defines classes of computer-related objects: processes, files, network cards etc... have a valid and detailed definition in CIM. Here are some examples:

CIM_Process	The base class of a process, from which Win32_Process is derived.
CIM_DataFile	A normal file: the concept and its implementation are similar in Windows and Linux.
CIM_Directory	File directories
Win32_UserAccount	A Windows user account. Linux has a specific definition, unrelated to this one.

SURVOL DATA MODEL

One of the core aspects of Survol is its data model which attempts to take the best of the worlds it is working in. A first approach is WBEM. However, the data description offered by WBEM suffers from several drawbacks:

It does not define all the objects we need, because CIM is very neutral and not oriented towards application management. For example, database tables, sockets, Python packages or Samba™ servers have no definition. It is indeed possible to add new classes to WBEM or WMI, but not in a portable way, and it requires compiling C++ code etc... which is quite a heavy process.
Also, WQL, the CIM query language, has some limitations: it does not allow joins between several classes, and a query can return objects of a single class only. Yet some information snapshots require all sorts of objects to describe a situation completely. Examples which cannot be expressed as WQL queries: all network files accessed by a process, plus their sockets; the linkable symbols of a DLL file; the SQL queries detected in a process memory, plus their tables. These are quite common situations when investigating an application. However, this information can be represented with WBEM.
On top of that, WMI and WBEM suffer from several specific performance problems which are difficult to overcome. These problems come from the uniform processing of objects, whatever their behaviour. For example, selecting all CIM_DataFile objects returns every file on a given machine, which is obviously very difficult to handle.

Despite this, building software on an industry standard is extremely convenient. Therefore, Survol attempts to combine the best of both worlds:

Existing WBEM classes are always reused. When the classes are different in Linux and Windows, but a common base class exists, this base class is chosen. Survol uses only a subset of the WBEM ontologies: that is, it uses only the minimal number of WBEM class properties necessary to uniquely define an object. However, all detected properties are displayed, when available.
Survol comes with a lightweight ontology and data definition where a class is only defined by a directory in a Python source code tree.
When an object is defined by Survol, its equivalent link into WBEM or WMI is calculated and accessible. WBEM, WMI and Survol use the same moniker strings (Object path).
It is possible to display any WMI or WBEM object and navigate in its repository. Survol is also a complete read-only browser of WBEM objects. Notably, Survol can "talk" with WBEM or WMI running on remote machines.
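
As an illustration of object paths, the sketch below builds and parses the class-and-keys part of a WBEM-style path such as `CIM_Process.Handle="1234"`. It is a simplified assumption for this example: it ignores the host and namespace prefix, and it does not claim to reproduce how Survol itself serializes monikers.

```python
import re

# One key="value" pair; the value may contain backslash-escaped characters.
_KEY_VALUE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def build_object_path(class_name, keys):
    """Builds 'Class.Key1="v1",Key2="v2"' from a class name and a dict of key properties."""
    pairs = []
    for name, value in keys.items():
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        pairs.append('%s="%s"' % (name, escaped))
    return "%s.%s" % (class_name, ",".join(pairs))


def parse_object_path(path):
    """Splits an object path back into (class name, dict of key properties as strings)."""
    class_name, _, key_part = path.partition(".")
    keys = {name: re.sub(r"\\(.)", r"\1", value)
            for name, value in _KEY_VALUE.findall(key_part)}
    return class_name, keys


if __name__ == "__main__":
    path = build_object_path("CIM_Process", {"Handle": 1234})
    print(path)
    print(parse_object_path(path))
```

Only the key properties appear in the path, which matches the idea of using the minimal set of properties needed to identify an object.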

The ontologies describing the data model of WMI and WBEM are created automatically and completed by the Survol-specific classes. Primhill Computers freely provides these ontologies, which can be used with any ontology editor. Being created automatically, they are the most accurate semantic model of the internal architecture of the operating system.

WMI
WBEM (not available yet)
Survol (not available yet)

SURVOL CLASSES AND DOMAINS. ADDING NEW CLASSES.

The Survol library is made of packages and subpackages which represent its class model. A class is a module for which the function EntityOntology() is available, whether it is defined by the module itself or by one of its parent modules. The return value of EntityOntology() is a list of strings, each naming an attribute of the class. If neither a module nor any of its parent modules defines the function EntityOntology(), the module is a domain.
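
The class-versus-domain rule can be sketched as a walk up the package hierarchy. This is an illustration of the rule as stated here, not Survol's own lookup code; the function name `find_ontology` and the `demo_sources` package tree are hypothetical, and the tree is built in memory only so that the example is self-contained.

```python
import importlib
import sys
import types


def find_ontology(module_name):
    """Returns the attribute list of the nearest EntityOntology(), or None for a domain."""
    parts = module_name.split(".")
    # Try the module itself first, then each parent package in turn.
    while parts:
        module = importlib.import_module(".".join(parts))
        ontology_function = getattr(module, "EntityOntology", None)
        if callable(ontology_function):
            return ontology_function()
        parts.pop()
    return None


def _register(name, **attributes):
    """Registers an empty in-memory package, standing in for a source directory."""
    module = types.ModuleType(name)
    module.__path__ = []  # marks the module as a package
    module.__dict__.update(attributes)
    sys.modules[name] = module


# Hypothetical tree: one class with a sub-module, and one domain.
_register("demo_sources")
_register("demo_sources.CIM_Process", EntityOntology=lambda: ["Handle"])
_register("demo_sources.CIM_Process.memory")
_register("demo_sources.Databases")

if __name__ == "__main__":
    print(find_ontology("demo_sources.CIM_Process.memory"))  # inherited from the parent
    print(find_ontology("demo_sources.Databases"))           # no ontology: a domain
```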

A class or a domain can define other functions in its __init__.py file. All of them are optional:

Function	What it does
EntityOntology	This returns an array of strings which are the names of the attributes of an object of this class. This does not apply to domains.
Graphic_colorbg	This returns an RGB colour coded as a hexadecimal string. This colour is used by any UI mode, for this class/domain, unless it is superseded by a subclass or subdomain.
EntityName	This returns a display name for this object, given its attributes passed as an array, in the order specified by the ontology. The hostname is also passed as an argument. The resulting name does not have to be unique, but must be printable. This applies to classes only.
AddInfo	When displaying an object, adds some extra information related to it.
Usable	This tells if the module or the script, is usable in this context, given the platform, available packages etc... It can be defined for any module or any script.

SCRIPTS REDUNDANCY

Survol is designed to run in harsh environments (uncertain platforms, broken libraries, incompatible interfaces) and to return as much information as possible about unknown applications. It has to be robust and adapt to any situation. Therefore, it is built, at all stages, on the concept of redundancy.

Redundancy of data display:

Survol can display its results in:

Plain HTML text: Results of the same scripts are guaranteed to be returned in the same order, allowing automatic comparison. It can be displayed in a very simple browser.
Static SVG generated by Graphviz, on the server side. It does not need a powerful client machine.
Client-side D3 display: It is not demanding on the server side, but needs a more powerful client.
RDF output, for Semantic Web engines.
JSON output: this is used by D3; its API is stable and can also be used by other client software.

Redundancy of information sources:

The core information might come from any source: win32 Python library, any Linux package, WMI and WBEM select statements.

Redundancy of Web hosting:

Scripts can be hosted by Apache, IIS™, or the dedicated ad hoc CGI server. Any reasonable HTTP server with CGI or WSGI capabilities should suffice.

Redundancy of Python interpreter and libraries:

Survol can use Python 2 and Python 3, over many releases. It is not very demanding in terms of Python features.
Packages: Survol requires a very small set of Python modules. If a script needs an optional Python module which is not installed, the script is simply disabled, doing no harm.
Scripts might return information with a significant overlap, from different libraries or mechanisms. This is helpful if one script does not work, as a work-around might exist. And because an object in Survol always has the same URL, combining and merging the results of several scripts into one does not yield duplicates: duplicate information is automatically removed at this stage. Even objects created by ActiveX™ are smoothly paired with the ones created by Python agents. This is the reason why object URLs do not accept any CGI parameter except the display mode.
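
The duplicate removal described above follows directly from identifying each object by a stable URL. A minimal sketch, assuming each script returns a collection of triples; the URLs and relation names are invented for this example and do not follow Survol's real URL scheme.

```python
def merge_script_results(*results):
    """Merges the triples of several scripts; identical triples collapse into one."""
    merged = set()
    for triples in results:
        merged |= set(triples)
    return merged


# Hypothetical outputs of two scripts using different mechanisms.
PROCESS = "http://agent.example:8000/object/process/1234"
HOSTS_FILE = "http://agent.example:8000/object/file/etc/hosts"
LOG_FILE = "http://agent.example:8000/object/file/var/log/app.log"

from_first_script = [(PROCESS, "opens", HOSTS_FILE), (PROCESS, "opens", LOG_FILE)]
from_second_script = [(PROCESS, "opens", HOSTS_FILE), (PROCESS, "user", "mary")]

if __name__ == "__main__":
    for triple in sorted(merge_script_results(from_first_script, from_second_script)):
        print(triple)
```

If an object URL carried extra parameters, the same object could appear under two different strings and the set union would no longer collapse them.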

CREDENTIALS FILES

Survol scripts access many software resources, some of them protected with credentials, typically a username and a password. These credentials are made available to Survol in a JSON file, SurvolCredentials.json, residing in the user's home directory. This home directory depends on the user running the HTTP server (Apache, IIS etc...). The file can be edited with the "Credentials" page link.

It contains several types of credentials, depending on the type of access: database software, middleware library etc... For each of these access types, it is possible to add, delete and update credentials. On the other hand, it is not possible, without extra development, to add new types of credentials, as these need specific software to use them.

A typical content of this file might be:

Resource	Account	Password
Azure
Visual Studio Professional	8532-42f4-a66a-c5da7acd
Login
DESKTOP-NI99V8E	johnwin
fedora22	mary
MySql
sqlusertcsrvdb1.mysql.db	sqlusertcsrvdb1
ODBC
MyOracleDataSource	system
Oracle
XE	scott
RabbitMQ
localhost:12345	guest
SqlExpress
192.168.0.14\SQLEXPRESS	jamessql
Survol
http://desktop-ni99v8e/Survol/survol/entity.py
http://win7-hp:8000/survol/entity.py
WBEM
http://192.168.0.17:5988	pegasus
http://WIN7-HP:5988	william
WMI
LAPTOP-B2HEHHF6	user

It is absolutely mandatory to ensure that this file is protected, for security reasons.
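
A minimal sketch of reading this file is shown below. The file name and its location in the home directory come from the text above; the JSON layout (access type, then resource, then an account and password pair), the function names and the permission check are assumptions for illustration.

```python
import json
import os
import stat

CREDENTIALS_BASENAME = "SurvolCredentials.json"


def credentials_path():
    """Location of the file in the home directory of the current user."""
    return os.path.join(os.path.expanduser("~"), CREDENTIALS_BASENAME)


def load_credentials(path):
    """Loads the whole file; assumed layout: {access_type: {resource: [account, password]}}."""
    with open(path) as credentials_file:
        return json.load(credentials_file)


def lookup(credentials, access_type, resource):
    """Returns (account, password), or None when the resource is not listed."""
    entry = credentials.get(access_type, {}).get(resource)
    return tuple(entry) if entry else None


def is_private(path):
    """POSIX only: True when neither group nor others have any access to the file."""
    mode = os.stat(path).st_mode
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


if __name__ == "__main__":
    path = credentials_path()
    if os.path.exists(path):
        print("private" if is_private(path) else "readable by other users")
```

On Windows the permission bits are not meaningful, so the check would have to rely on file ACLs instead.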

Some subcategories of the credentials file are used for static services discovery:

The list of WBEM servers makes it possible to fetch CIM information from these machines.
The listed Survol agents are fully accessible: their information can be browsed and merged with the information of the agent you are running on.

SERVICES DISCOVERY

A Survol setup can run on several machines, each agent providing a facet of the overall multi-hosted application being investigated. To get an overall vision of this application, it is necessary to discover all the available Survol agents, in order to merge their outputs into a single result, whether this result is an SVG document, an HTML page etc...

There are several mechanisms for discovering other Survol agents available on the same network.

Static definition of remote agents in the credentials file, SurvolCredentials.json.
Service Location Protocol (SLP), used for Survol agents discovery, but also for WBEM servers.
The assumption that any Windows machine can run remote WMI requests.