WPS Remote ProcessFactory Orchestrator

afabiani edited this page Oct 10, 2014 · 4 revisions

A Remote Process Factory

The picture above shows the case-study schema taken as the reference for the analysis of the architecture proposed here.

The aim of this new WPS module is to provide a ProcessFactory able to recognize external services and generate, at runtime on GeoServer, the WPS Processes acting as their interfaces. In other words, the module wraps the protocol specific to each remote process in a standard WPS Process.

What we call a remote process here can be almost anything that runs some algorithm or process on a remote machine. For instance, it could be a Python script or a command-line executable. The only constraint is to have a remote component able to handle a few RPCs: run, progress, complete (which means collecting the outcome and sending it to the GeoServer machine), execution error (which means reporting the exception if any error occurs) and kill. On the GeoServer side there must be a remote client implementation able to manage the RPCs of the external processes and respecting an Interface for the integration with the WPS.
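The five RPCs listed above can be sketched as a small Python contract. This is only an illustration: the class names `RemoteProcess` and `EchoProcess`, the method signatures and the `state` attribute are assumptions made for this example, not the interface actually shipped with the module.

```python
from abc import ABC, abstractmethod


class RemoteProcess(ABC):
    """Hypothetical contract mirroring the RPCs named in the text."""

    @abstractmethod
    def run(self, inputs):
        """Start the execution with the given inputs."""

    @abstractmethod
    def progress(self):
        """Return the completion percentage of the running execution."""

    @abstractmethod
    def complete(self):
        """Collect the outcome to be sent back to the GeoServer machine."""

    @abstractmethod
    def error(self, exc):
        """Turn an exception into a report for the GeoServer side."""

    @abstractmethod
    def kill(self):
        """Stop the running execution."""


class EchoProcess(RemoteProcess):
    """Toy implementation (assumption) that echoes its inputs back."""

    def __init__(self):
        self.state = "idle"
        self.percent = 0.0
        self.outputs = {}

    def run(self, inputs):
        self.state = "running"
        self.outputs = {"echo": dict(inputs)}
        self.percent = 100.0

    def progress(self):
        return self.percent

    def complete(self):
        self.state = "completed"
        return self.outputs

    def error(self, exc):
        self.state = "failed"
        return {"exception": type(exc).__name__, "message": str(exc)}

    def kill(self):
        self.state = "killed"
```

A real implementation would exchange these calls as messages over the transport of choice instead of calling the methods directly.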

Going back to the picture, and to the implementation of the remote client provided with the module, the operations of the remote processes are managed by a Service.py Python script which communicates through the XMPP protocol following a simple contract. Service.py resides on the remote machine and sends a presentation of the remote process through an XMPP message, JSON-encoding into the body the descriptors of the process input/output parameters along with their types. On the GeoServer side the WPS Remote module automatically recognizes and loads an XMPP implementation of the RemoteClient. The client is able to ask for newly available services, unmarshal their inputs and outputs, and allow the RemoteProcessFactory to automatically create the appropriate WPS Process wrapper. At execution time, the new Process interacts with the RemoteClient plugin implementation in order to send a request to Service.py, follow the status of the remote process and get the outputs at the end.
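As a rough illustration of the JSON-encoded presentation described above, the sketch below builds and parses a message body. The exact wire format is not documented on this page, so the field names (`service`, `namespace`, `inputs`, `outputs`) and both function names are assumptions; only the idea of JSON-encoding the parameter descriptors with their types comes from the text.

```python
import json


def build_presentation(service, namespace, inputs, outputs):
    """Encode a process description as a JSON string (hypothetical layout)."""
    return json.dumps(
        {
            "service": service,
            "namespace": namespace,
            "inputs": inputs,
            "outputs": outputs,
        },
        sort_keys=True,
    )


def parse_presentation(body):
    """Decode the body and return (qualified name, inputs, outputs)."""
    message = json.loads(body)
    name = "{}:{}".format(message["namespace"], message["service"])
    return name, message["inputs"], message["outputs"]


if __name__ == "__main__":
    body = build_presentation(
        "Service",
        "default",
        {"name": {"type": "string", "max": 1}},
        {"welcome": {"type": "string"}},
    )
    print(parse_presentation(body)[0])
```

The qualified name built here follows the `namespace:service` pattern of the `default:Service` process that appears in the GeoServer process list later on this page.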

WPS Remote Package UML schema

Class Diagram

Summarizing

The proposed improvement provides a GeoServer ProcessFactory capable of discovering remote executables at runtime through XMPP and dynamically turning them into new WPS Processes. It will also act as an orchestrator for remote executables, taking advantage of the XMPP Server in order to orchestrate the interactions between them and the GeoServer WPS which has invoked them remotely.

In the proposed infrastructure, the XMPP Server will be implemented by Openfire XMPP, an open source package available on Linux/CentOS distributions, deployed on a physical machine accessible from both the GeoServer WPS and the remote Python processing nodes/hosts. XMPP Server implementations are available for other operating systems as well.

To run a new executable via GeoServer WPS it is enough to deploy it on one or more remote hosts and add a new properties file in a location known to Service.py. In this way the new WPS Process will be exposed to GeoServer directly, without restarting GeoServer itself. The end user will see as many WPS processes as there are properties configuration files on the remote machines, thanks to the functionalities implemented by the Service.py script. Orchestration will be performed by the newly created GeoServer ProcessFactory upon a WPS execution request, with the help of the XMPP Server, which provides out-of-the-box node presence discovery and load balancing capabilities for the remote nodes through the interaction with the Service.py script.

The orchestrator will also be responsible for redirecting the messages generated by executables running on the remote machines to the correct GeoServer process, and vice versa for control messages. Executables running on the remote hosts through the Service.py wrapper will generate progress information which will be sent back to the orchestrator via XMPP. The orchestrator will handle the way the remote scripting processes are called, e.g. one process per user, one process per remote host, etc.

Note that there are two levels of load balancing: one on the GeoServer side and one on the remote processing nodes side. The GeoServer load balancer will know which user has requested the processing and will be able to enforce the system and resource limits associated with that user; for instance, a user won't be able to run more than M processes in parallel, and a quota for both inputs and outputs will be assigned to that user's executions. The Orchestrator, in cooperation with the XMPP service, will be responsible for balancing the load of a given WPS process across the remote hosts and for deciding where to send a new execution request when an executable has been deployed on multiple remote machines.
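The two levels can be pictured with a minimal sketch: a per-user cap of M parallel executions on the GeoServer side, and a choice of the least-loaded host on the remote side. The names `UserLimiter` and `pick_host`, and the "fewest running executions" policy, are assumptions made for illustration; the page does not specify how either balancer is implemented.

```python
from collections import Counter


class UserLimiter:
    """Allow at most max_parallel concurrent executions per user."""

    def __init__(self, max_parallel):
        self.max_parallel = max_parallel
        self.running = Counter()

    def try_acquire(self, user):
        """Reserve a slot for the user; return False if the cap is reached."""
        if self.running[user] >= self.max_parallel:
            return False
        self.running[user] += 1
        return True

    def release(self, user):
        """Free a slot once an execution has finished."""
        if self.running[user] > 0:
            self.running[user] -= 1


def pick_host(load_by_host):
    """Return the host with the fewest running executions.

    Ties are broken alphabetically; an empty mapping raises ValueError.
    """
    return min(sorted(load_by_host), key=lambda host: load_by_host[host])


if __name__ == "__main__":
    limiter = UserLimiter(max_parallel=2)
    print(limiter.try_acquire("alice"))
    print(pick_host({"node-b": 1, "node-a": 1, "node-c": 3}))
```

Input and output quotas could be tracked in the same per-user fashion, but they are left out here to keep the sketch short.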

It is worth pointing out that the GeoServer instances in the cluster will share a specific directory where the WPS-specific resources will be stored. However, the remote hosts where the executables run cannot, in general, share the file system with the GeoServer machines; hence a data exchange mechanism must be provided (see the following sections).

Invoking remote processes and consuming the results

The executables on the remote hosts will be invoked through the command line by the Service.py Python script deployed on them (see Illustration 14), in response to execution commands received from the Orchestrator via XMPP messages as the result of a WPS Execute call on the GeoServer Processes.

The executable output reported to the GeoServer Orchestrator will be a file containing a description of the generated outputs, as well as an indication of the actions to be performed on them, following and extending the WPS syntax. For instance, a process could generate multiple raster datasets to be ingested back into GeoServer as multiple layers; the executable output reported to the orchestrator will then be a descriptor describing those rasters plus the actions to be performed, e.g. an import action.

If the remote executable produces additional files, as in the example above, they must be made available to the GeoServer instances (keep in mind that we cannot assume a shared file system between them and the machine where the results are produced).

In this proposal we intend to use FTP as the transfer mechanism between the remote machines and the shared file system of the GeoServer instances. A more advanced solution is the use of a shared Network File System among all the nodes and the GeoServers.

Remote process discovery and execution via the Service.py script

On the remote machines the Service.py Python scripts will act as XMPP clients providing the ability to:

  • launch command-line executables using the inputs provided by the end user from the GeoServer process orchestrator
  • send back to the orchestrator status messages with the progress and status of the execution
  • receive any control messages from the orchestrator and attempt to kill running executions
  • perform clean-up operations like looking for zombie executions and killing them or removing stale files on the file system from old executions
  • perform runtime discovery of new remote executables
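The first three abilities in the list above (launching, reporting status, killing) can be sketched with the standard `subprocess` module. This is not the code of Service.py: the function names are hypothetical, and the assumption that the executable prints one status line at a time on standard output is made only for this example.

```python
import subprocess
import sys


def launch(argv):
    """Start a command-line executable and capture its output as text."""
    return subprocess.Popen(
        argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )


def stream_status(proc, report):
    """Pass each output line to report(), then return the exit code."""
    for line in proc.stdout:
        report(line.rstrip("\n"))
    return proc.wait()


def kill(proc, timeout=5.0):
    """Ask the process to terminate; force-kill it if it does not exit in time."""
    if proc.poll() is None:
        proc.terminate()
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
    return proc.returncode


if __name__ == "__main__":
    # Stand-in for a real executable: a Python one-liner printing two values.
    demo = launch([sys.executable, "-c", "print(10); print(100)"])
    print(stream_status(demo, print))
```

In a real deployment the `report` callback would forward each line to the orchestrator as a status message rather than printing it.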

As mentioned above, remote executables should adhere to a certain contract in order to allow us to fully support all these functionalities. For example, the ability to kill a running execution relies heavily on the process itself accounting for this functionality; otherwise we would have to try to kill it using operating system calls. In certain cases this might require the end user to create wrappers around legacy executables.

Runtime discovery of new remote executables will be supported by adding new properties files to a location known to the Service.py module. The WPS Process for the new remote executable will be exposed to GeoServer directly, without restarting GeoServer itself, since Service.py will communicate the existence of the new executable to the GeoServer Orchestrator via XMPP messages.

The end user will see as many WPS processes as there are properties configuration files on the remote machines, thanks to the functionalities implemented by the Service.py script.

An example of such a properties file is shown below:

[Description]
service = Service
namespace = default
description = A dummy service, replace with your own description
executable = process.py
process_buffer = 0
result_size = 0
active = True

[Options]
customargs = --path=D:\user\
argformat = --key=value
debug = True

[Input]
name = {"type": "string", "description": "A person name", "enum": ["Hans", "Peter", "Alex", "Michi"], "default": "Hans", "max": 1}
surname = {"type": "string", "description": "A persons surname", "max": 1, "default": "Meier"}
child = {"type": "string", "description": "A childs name", "min": 0, "max": 10}

[Output]
welcome = {"type": "string", "description": "A welcome message"}
goodbye = {"type": "string", "description": "A goodbye message"}
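Since the file above uses INI-style sections whose `[Input]` and `[Output]` values are JSON objects, it can be read with the Python standard library alone. The sketch below is one possible way to do it, not the parser used by Service.py: the function names are hypothetical, `SAMPLE` is a shortened copy of the file above, and reading `argformat = --key=value` as a template for command-line arguments is an assumption.

```python
import configparser
import json

# Shortened copy of the properties file shown above.
SAMPLE = r"""
[Description]
service = Service
namespace = default
executable = process.py

[Options]
argformat = --key=value

[Input]
name = {"type": "string", "default": "Hans", "max": 1}
child = {"type": "string", "min": 0, "max": 10}

[Output]
welcome = {"type": "string", "description": "A welcome message"}
"""


def load_service(text):
    """Parse a properties file into a plain dictionary."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep parameter names case-sensitive
    parser.read_string(text)
    description = parser["Description"]
    return {
        "name": "{}:{}".format(description["namespace"], description["service"]),
        "executable": description["executable"],
        "inputs": {k: json.loads(v) for k, v in parser["Input"].items()},
        "outputs": {k: json.loads(v) for k, v in parser["Output"].items()},
    }


def build_args(values):
    """Render input values as --key=value arguments (assumed argformat meaning)."""
    return ["--{}={}".format(key, value) for key, value in values.items()]


if __name__ == "__main__":
    service = load_service(SAMPLE)
    print(service["name"])
    print(build_args({"name": "Hans"}))
```

Interpolation is disabled so that any `%` character in a description or path is taken literally.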

Configure the XMPP stack

In order to successfully test the XMPP plugin for the Remote Client, it is necessary to set up and configure an XMPP server and to deploy the Service.py wrapper for a remote script.

Install and configure the XMPP Server

The OpenFire XMPP Server from Ignite Realtime is a very good choice for an open source XMPP implementation. It is developed in Java, and is therefore cross-platform; it is also widely used and stable. The installation of the package is quick and easy on every platform. On Linux the packages are available directly from the official distributions. On Windows an easy installer is available which configures the XMPP Server folder automatically.

The distribution package can be found here Ignite Realtime OpenFire XMPP Server

The installation requires a connection to a DB, MySQL by default. Once OpenFire has been installed, start the service and open the OpenFire administration console in a browser. By default the management GUI listens on port 9090.

Go to the "Server Certificates" menu item and generate the security certificates for the Server. This step is important in order to allow Service.py to authenticate and be trusted by the server. Those certificates must later be exported and loaded into the trusted certificates of the JVM used to run GeoServer. This procedure is detailed in the next sections.

Now move to the "Group Chat" tab. Assign a name to the service and note it down. The same name must be provided in the configuration of both the GeoServer WPS Remote module and Service.py.

The next step is to create three rooms, the management room, the default room and a service room which may have a custom name (in the sample below geosolutions). Those names must also be specified in the services configurations.

The XMPP server configuration is now done, at least for a simple demo. The setup may require more work on certain systems: for instance, the service ports may not be available and must therefore be changed, or the system may have specific security constraints.

Export the OpenFire certificates

This procedure is necessary in order to allow the GeoServer JVM to communicate with the XMPP Server over a secured connection.

In order to export the certificates, run the following commands on the OpenFire machine

cd /opt/openfire/resources/security

/opt/openfire/jre/bin/keytool -list -keystore keystore
/opt/openfire/jre/bin/keytool -export -alias whale.nurc.nato.int_rsa -file whale.nurc.nato.int_rsa -keystore keystore
/opt/openfire/jre/bin/keytool -export -alias whale.nurc.nato.int_dsa -file whale.nurc.nato.int_dsa -keystore keystore

NOTICE that the default password of the keystore is changeit, unless someone has modified it.

On the GeoServer machine run the following commands

keytool -import -alias whale.nurc.nato.int_dsa -file whale.nurc.nato.int_dsa -keystore $JAVA_HOME/jre/lib/security/cacerts
keytool -import -alias whale.nurc.nato.int_rsa -file whale.nurc.nato.int_rsa -keystore $JAVA_HOME/jre/lib/security/cacerts

NOTICE that the certificates must be imported into the cacerts of the JDK used to run GeoServer.

Finalizing the configuration and startup the Service.py

The setup of the service is almost done. The final steps are to modify the .config files of the Service.py service according to the OpenFire XMPP Server setup. The same must be done for the applicationContext.xml file of the GeoServer wps-remote module.

Once the configuration is done, start the Service.py with

python service.py -r

Run GeoServer and execute the process

Start GeoServer. The default:Service Process should now appear in the list of WPS Processes.

Providing the inputs and running the Execute Process should start the remote processing.

The picture above shows the script running on the remote machine, and the following one shows the outcomes reported by GeoServer through the WPS protocol.