Docker®
is a tool that creates lightweight, portable and self-sufficient
containers for applications, allowing them to run on another
infrastructure, such as a cloud.
Dockerizing an
application, or an entire information system, means converting
it into a docker container image, to run within a Docker
container. A
docker container image is a stand-alone
software package that includes everything needed to run the
application: code, runtime, tools, libraries and settings.
Dockerization is modeled by a
Dockerfile,
a Docker specification text document that contains all the
commands needed to assemble a docker container image. To deploy
the application on another infrastructure, only the image is
needed.
Dockerizing requires knowledge of the internal structure of the
application, or of the set of applications, to dockerize. One needs
to know which programs are running, their data files, their various
connections, etc. However, a strategic application may have a long
technical history, made of many different technologies,
undocumented details and complex features. Estimating the cost of
dockerization is therefore difficult, because it requires expert
knowledge in many technical domains.
DockIT
DockIT is an open source tool by Primhill Computer that helps
convert legacy applications into docker container images. It is a
Python command-line tool which
monitors running applications and their inputs/outputs, and
generates a Dockerfile containing all the commands needed by Docker
to build the container image of these applications.
DockIT does not need the source code. It can work with any binary
program, whatever its programming language. It detects all
resources, languages and libraries created or accessed by an
application and any of its sub-components, during the execution of a
command or when attaching to a running batch, without stopping it.
The kinds of resources detected include, for example:
- Files: It is able to classify log files and give a hint of
where they should be moved. Same possibility for stdout and
stderr.
- Concurrent access to the same files by several processes.
- Hidden configuration files and undocumented data files are
also spotted.
- Software libraries and dependencies.
- Sub-processes created, and their executables: This inspection
applies recursively to spawned processes.
- Internal, "private" environment variables.
- Network sockets and port numbers.
- Content of IO buffers. For example, DockIT can detect and
parse SQL queries sent to database servers.
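To illustrate how resources can be recovered from system calls alone, here is a minimal sketch that extracts accessed files from trace lines written in the usual strace text format. It is not DockIT's implementation: the names `parse_line` and `accessed_files` are hypothetical, and the regular expression only covers simple, single-line entries.

```python
import re

# Matches simple strace-style lines such as:
#   [pid  1234] openat(AT_FDCWD, "/etc/hosts", O_RDONLY) = 3
#   open("/missing.conf", O_RDONLY) = -1 ENOENT (No such file or directory)
SYSCALL_LINE = re.compile(
    r'^(?:\[pid\s+(?P<pid>\d+)\]\s+)?'  # optional pid prefix (strace -f)
    r'(?P<name>\w+)\('                    # system call name
    r'(?P<args>.*)\)\s+=\s+'              # raw argument text
    r'(?P<ret>-?\d+)'                     # numeric return value
)
QUOTED_STRING = re.compile(r'"([^"]*)"')


def parse_line(line):
    """Return (pid, syscall, first_quoted_argument, return_value) or None."""
    match = SYSCALL_LINE.match(line.strip())
    if match is None:
        return None  # signals, exit notices, unfinished calls, etc.
    path = QUOTED_STRING.search(match.group("args"))
    pid = int(match.group("pid")) if match.group("pid") else None
    return (
        pid,
        match.group("name"),
        path.group(1) if path else None,
        int(match.group("ret")),
    )


def accessed_files(lines):
    """Collect paths that were successfully opened or executed."""
    found = set()
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        _pid, name, path, ret = parsed
        # A negative return value means the call failed: ignore it.
        if name in ("open", "openat", "execve") and path and ret >= 0:
            found.add(path)
    return found
```

Failed calls are skipped on purpose, so that only files the program really used end up in the result. A real tracer log also contains interrupted and resumed calls, which this sketch ignores.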
Benefits of using DockIT
Neither documentation nor technical
expertise is necessary any longer to get an accurate description of
all IT resources needed by the target application. The result is a
standard, documented Dockerfile, with extra information added as
comments. This file can be updated to adjust to specific needs.
You do not have to spend days studying an application to know what
is needed to dockerize it: You just need to briefly monitor it with
DockIT:
- Many tedious tasks are done automatically: libraries, modules,
etc. are all enumerated in one go. One can then
focus on the really difficult tasks of dockerizing an
application.
- DockIT analyses what the code actually does, not only what a
static analysis suggests. It therefore lists only the
required libraries and code modules, so the Docker images
can be much smaller.
- It is no longer necessary to be skilled in all technologies
used by the application: DockIT can examine programs written in
any language, because it only relies on system calls.
Proprietary libraries without source code are not an obstacle
for reverse-engineering.
- DockIT can formalize the behavior of hidden scripts that
cannot be detected with static code inspection. Scripts started
dynamically from a sub-process are inspected with their
resources.
When a user is discovering a legacy application, DockIT gives the
significant advantage of showing the overall scenario of its
execution. This makes software design recovery much simpler.
Scenarios
DockIT is a command-line tool, which can be used in two scenarios:
- Executing a command: It then inspects each and every call to
a system function, from the started process and any of its
sub-processes. It stops when the command naturally ends.
- Attaching to a running process: It then has the same
behavior. It quits when Control-C is typed in the
console.
Although DockIT noticeably slows down the target
process, it is still usable in a production context, because
only some system calls are monitored.
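The two scenarios map naturally onto the two ways of running a tracer such as strace. The sketch below builds an strace argument list for either case. It is only an illustration under assumptions: the function name `build_strace_command`, the default log path and the chosen set of traced call classes are hypothetical, and this is not necessarily how DockIT invokes its tracer.

```python
def build_strace_command(command=None, pid=None, log_path="trace.log"):
    """Build an strace argument list for one of the two scenarios.

    Exactly one of `command` (a list of strings) or `pid` (an integer)
    must be given.
    """
    if (command is None) == (pid is None):
        raise ValueError("give exactly one of command or pid")
    args = [
        "strace",
        "-f",                                # follow sub-processes
        "-tt",                               # timestamp each call
        "-e", "trace=file,process,network",  # monitor only some call classes
        "-o", log_path,                      # write the trace to a file
    ]
    if pid is not None:
        args += ["-p", str(pid)]             # attach to a running process
    else:
        args += list(command)                # start the command under strace
    return args
```

For example, `build_strace_command(command=["ls", "-l"])` describes the first scenario and `build_strace_command(pid=1234)` the second. The returned list can be passed to `subprocess.run`.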
There are many command-line options:
-h,--help                      This message.
-v,--verbose                   Verbose mode (cumulative).
-w,--warning                   Display warnings (cumulative).
-s,--summary <CIM class>       Prints a summary at the end: start and end
                               timestamps, executable name, loaded libraries,
                               read/written/created files and timestamps,
                               subprocesses tree. Examples:
                               -s 'Win32_LogicalDisk.DeviceID="C:",Prop1="Value1",Prop2="Value2"'
                               -s 'CIM_DataFile:Category=["Others","Shared libraries"]'
-D,--dockerfile                Generates a Dockerfile.
-p,--pid <pid>                 Monitors a running process instead of starting
                               an executable.
-f,--format TXT|CSV|JSON       Output format. Default is TXT.
-F,--summary-format TXT|XML    Summary output format. Default is XML.
-i,--input <file name>         Trace command input file.
-l,--log <filename prefix>     Trace command log output file.
-t,--tracer strace|ltrace|cdb  Command for generating the trace log.
-S,--server <Url>              Survol URL for CIM objects updates. Ex:
                               http://127.0.0.1:80/survol/event_put.py
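As an illustration of how these options fit together, the following sketch declares them with Python's standard argparse module. It is not taken from DockIT's source code: the program name `dockit-sketch`, the destination names and the trailing `command` positional are assumptions made for the example.

```python
import argparse


def make_parser():
    """Declare the documented options; -h/--help is added by argparse."""
    parser = argparse.ArgumentParser(prog="dockit-sketch")
    # Cumulative flags: each repetition raises the level.
    parser.add_argument("-v", "--verbose", action="count", default=0)
    parser.add_argument("-w", "--warning", action="count", default=0)
    # -s may be given several times, once per summary specification.
    parser.add_argument("-s", "--summary", action="append", default=[],
                        metavar="CIM_CLASS")
    parser.add_argument("-D", "--dockerfile", action="store_true")
    parser.add_argument("-p", "--pid", type=int)
    parser.add_argument("-f", "--format", choices=["TXT", "CSV", "JSON"],
                        default="TXT")
    parser.add_argument("-F", "--summary-format", choices=["TXT", "XML"],
                        default="XML")
    parser.add_argument("-i", "--input", metavar="FILE")
    parser.add_argument("-l", "--log", metavar="PREFIX")
    parser.add_argument("-t", "--tracer", choices=["strace", "ltrace", "cdb"])
    parser.add_argument("-S", "--server", metavar="URL")
    # Assumption: everything after the options is the command to run.
    parser.add_argument("command", nargs=argparse.REMAINDER)
    return parser
```

With this sketch, the arguments `-v -v -D -f JSON ls -l` yield a verbosity of 2, Dockerfile generation enabled, JSON output, and `["ls", "-l"]` as the command to monitor.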
The execution outputs are:
- A Dockerfile plus all files needed to build a Docker image.
- A log file that allows the target process execution to be replayed.
- An XML file
describing all IT resources and their relations: which process
created which file, when, etc.
- A text file summarizing the execution blocks.
- Events sent to a Survol server, for real-time visualisation.
Dockerfile generation
DockIT is able to generate a Dockerfile skeleton out of any
execution of a process. Depending on the target batch or process,
this skeleton can be just a draft or a complete enumeration of the
resources to dockerize. It enumerates all used resources at the
lowest possible level, and cannot miss one. On the other hand, it
might misinterpret the usage of some resources: one reason is that
it does not have the source code. Despite this, this exhaustive list
of resources, properly catalogued in a Dockerfile, makes
dockerization much easier and more reliable. For example, DockIT
provides:
- TCP/IP port numbers.
- Files: It is able to classify log files and give a
hint of where they should be moved. It is also able to detect
which files are used by several processes.
- The libraries actually used, and only these.
- Subprocesses created.
- Environment variables used internally.
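The enumeration above can be turned into Dockerfile instructions fairly mechanically. The sketch below renders a skeleton from a dictionary of detected resources. It is a simplified illustration, not DockIT's generator: the function name `render_dockerfile`, the dictionary keys and the default base image `debian:stable-slim` are all assumptions.

```python
import json


def render_dockerfile(resources, base_image="debian:stable-slim"):
    """Render a Dockerfile skeleton from a dict of detected resources.

    Recognized keys (all optional, all hypothetical): "env", "files",
    "logs", "ports" and "command".
    """
    lines = ["FROM " + base_image]
    for name, value in sorted(resources.get("env", {}).items()):
        lines.append('ENV {}="{}"'.format(name, value))
    # Assumes each file was first copied into the build context
    # under the same path it has on the original host.
    for path in sorted(resources.get("files", [])):
        lines.append("COPY {0} {0}".format(path))
    # Log files are only reported as comments, for a manual decision.
    for path in sorted(resources.get("logs", [])):
        lines.append("# Log file, consider redirecting: " + path)
    for port in sorted(resources.get("ports", [])):
        lines.append("EXPOSE {}".format(port))
    command = resources.get("command")
    if command:
        # The exec form of CMD is a JSON array of strings.
        lines.append("CMD " + json.dumps(command))
    return "\n".join(lines) + "\n"
```

Log files are emitted as comments rather than instructions, which matches the idea of a skeleton that is then adjusted by hand.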
Once resources are properly identified, they can be manually
adjusted to another infrastructure, such as a grid. For example:
- Because log files are identified as such, the right strategy
to handle logs can be applied: inside or outside the
container, centralized logging system, etc.
- SQL queries
are identified, hinting at SQL database connections which have to
be redirected.
- File accesses can be redirected to a file server. File statistics
(accesses vs volume) help choose the right storage hardware,
such as
SSD.
- Parallelism is exposed, helping hardware deployment.
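As an example of the file statistics mentioned above, the following sketch aggregates I/O events into a number of accesses and a byte volume per file, then ranks files by accesses per byte. The event format (pairs of path and byte count) and the function names are hypothetical, and the ranking rule is just one possible heuristic.

```python
from collections import defaultdict


def file_statistics(events):
    """Aggregate (path, byte_count) I/O events into per-file totals."""
    stats = defaultdict(lambda: {"accesses": 0, "bytes": 0})
    for path, byte_count in events:
        stats[path]["accesses"] += 1
        stats[path]["bytes"] += byte_count
    return dict(stats)


def rank_by_access_density(stats):
    """Order files by accesses per byte, highest first.

    Files hit by many small accesses come first: they are the ones
    most likely to benefit from low-latency storage.
    """
    def density(item):
        _path, entry = item
        return entry["accesses"] / max(entry["bytes"], 1)

    ranked = sorted(stats.items(), key=density, reverse=True)
    return [path for path, _entry in ranked]
```

A small index file read thousands of times would rank above a large archive read once, even though the archive accounts for most of the volume.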
No competition
Dockerization of
applications is not an exact science. There are several tools which
share the same purpose. They all have their pros and cons, and their
own specific technologies:
- Static code analysis of the target software.
- Specific programming languages.
- Specific software framework or services types, such as
databases.
None of them brings a
general solution because the problem is extremely complicated and
relies on many arbitrary choices.
Because DockIT's analysis is
as close as possible to the operating system, it provides results
that cannot be obtained with other tools, and it always provides
them. This unique perspective - analyzing system calls on the fly -
complements the Dockerfile parameters obtained by other means.
This is why it has an
option to edit an existing Dockerfile, adding what is missing and
was not detected before.
Platform
At the moment, DockIT runs on Linux®, and is being ported to
Windows®. It can be ported to other platforms as long as system
calls can be intercepted, for example by some hooking feature of the
target operating system.
And Survol?
DockIT is a distinct tool from Survol. Internally, they handle
the same type of objects and resources as described by the
CIM
industrial standard. Survol and DockIT are two orthogonal
technologies, based on the same concepts, to address and understand
the behavior of running applications in situ and in vivo.
Survol displays snapshots,
whereas DockIT traces behaviour over time: during its execution,
it reports the life cycle of detected objects:
their creation, how they are used by system calls, and their
destruction.