Modularization

#Modularization how-to

##Basics

All processing components share the interface modules.Module. Since most of the methods that interface requires are already implemented in the abstract class modules.ModuleImpl, it is more convenient to let your module extend that class.

If you do so, you only have to override the following methods:
public boolean process(): the part where the actual processing takes place.
protected void applyProperties(): only needed if your module has properties of its own that can be applied.
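The override pattern can be sketched roughly as follows. This is not the project's code: BaseModule is a hypothetical stand-in for modules.ModuleImpl reduced to the two methods named above, and TruncatingModule, the key value "maxlength" and the call to applyProperties() inside process() are assumptions made only to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;

final class ModuleSkeletonSketch {

    // Hypothetical stand-in for modules.ModuleImpl, reduced to the two methods to override.
    abstract static class BaseModule {
        private final Map<String, String> properties = new HashMap<>();

        Map<String, String> getProperties() {
            return properties;
        }

        // The part where the actual processing takes place.
        public abstract boolean process();

        // Only needed if the module has properties of its own.
        protected void applyProperties() {
        }
    }

    // Hypothetical example module: cuts its input down to a configurable maximum length.
    static final class TruncatingModule extends BaseModule {
        static final String PROPERTYKEY_MAXLENGTH = "maxlength"; // assumed key value
        private final String input;
        private int maxLength = 10;
        private String output = "";

        TruncatingModule(String input) {
            this.input = input;
        }

        @Override
        protected void applyProperties() {
            String value = getProperties().get(PROPERTYKEY_MAXLENGTH);
            if (value != null) {
                maxLength = Integer.parseInt(value);
            }
        }

        @Override
        public boolean process() {
            applyProperties(); // assumption: this sketch applies properties itself
            output = input.length() > maxLength ? input.substring(0, maxLength) : input;
            return true;
        }

        String getOutput() {
            return output;
        }
    }

    public static void main(String[] args) {
        TruncatingModule module = new TruncatingModule("abcdefghijklmnop");
        module.getProperties().put(TruncatingModule.PROPERTYKEY_MAXLENGTH, "4");
        System.out.println(module.process() + " " + module.getOutput());
    }
}
```

In the real class hierarchy the properties and ports are managed by ModuleImpl, so a module would only contain the two overridden methods and its constructor.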

Modules receive input and emit output through ports, which connect to other modules' ports using one of two pipe types: modules.BytePipe and CharPipe. The former holds a PipedInputStream and a PipedOutputStream connected to each other; the latter holds a PipedReader and a PipedWriter.
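To illustrate what a connected reader/writer pair does, here is a minimal sketch that uses only java.io. It does not use the project's CharPipe class; PipedPairSketch and roundTrip are hypothetical names. The writing end runs on its own thread, as a producing module would.

```java
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;
import java.io.UncheckedIOException;

// Stand-alone illustration using only java.io; not the project's CharPipe class.
final class PipedPairSketch {

    // Sends text through a connected PipedWriter/PipedReader pair and returns what arrives.
    static String roundTrip(String text) {
        try {
            PipedWriter writer = new PipedWriter();
            PipedReader reader = new PipedReader(writer); // connects both ends

            // Write from a second thread and close the writing end when done.
            Thread producer = new Thread(() -> {
                try (PipedWriter w = writer) {
                    w.write(text);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            producer.start();

            // Read on this thread until the writer has closed its end.
            StringBuilder received = new StringBuilder();
            try (PipedReader r = reader) {
                char[] buffer = new char[256];
                int count;
                while ((count = r.read(buffer)) != -1) {
                    received.append(buffer, 0, count);
                }
            }
            producer.join();
            return received.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("plain text character input"));
    }
}
```

Reading and writing on separate threads matters here: the pipe's internal buffer is small, so a single thread that writes more than the buffer holds would block before it ever gets to read.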

Each module has an arbitrary number of input and output ports, which have to be set up in the module's constructor, e.g.

InputPort input = new InputPort("input", "Plain text character input.", this);
input.addSupportedPipe(CharPipe.class);
OutputPort output = new OutputPort("output", "Plain text character output.", this);
output.addSupportedPipe(CharPipe.class);

super.addInputPort(input);
super.addOutputPort(output);

Also in your constructor, you can add meta information about the module and its properties, e.g.

// Add default values
this.getPropertyDefaultValues().put(ModuleImpl.PROPERTYKEY_NAME, "AtomicRangeSuffixTreeBuilder");
this.getPropertyDefaultValues().put(PROPERTYKEY_MAXLENGTH, "10");

// Add module description
this.setDescription("Iterates over a raw and unsegmented string input, building a suffix tree from the data of limited range with each step. Keeps track of how often each node of the suffix tree gets triggered.");

// Add description for properties
this.getPropertyDescriptions().put(PROPERTYKEY_MAXLENGTH,"Define the maximum length of any branch of the tree.");

Adding the property descriptions is mandatory, because a module's accepted properties are discovered through its list of property descriptions.

##I/O

In the process method, you will need to read input data and write your results to output; here's how you can do it smoothly:

###Read

You can either use the read(...) method that this.getInputPorts().get(String inputId) provides, or you can access the Piped… classes directly. The following example parses a file array from the input using the Google gson library:

inputFileList = gson.fromJson(this.getInputPorts().get("input").getInputReader(), File[].class);

Another handy way of using the input reader is to wrap it in a Scanner object, which can read segments separated by a delimiter:

Scanner inputScanner = new Scanner(this.getInputPorts().get("input").getInputReader()).useDelimiter("\\s+");
if (inputScanner.hasNext()){
    // Determine next segment
    String inputSegment = inputScanner.next();
}

###Write

Since many pipes can be attached to each output port (to serve several modules' input ports at once), it is best to use this.getOutputPorts().get(String outputId).outputToAllCharPipes(String data) or .outputToAllBytePipes(byte[] data). These methods write your data to all connected pipes in a performance-conscious manner.
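The fan-out idea behind these methods can be sketched as follows. This is not how the project's OutputPort is implemented: FanOutSketch, connect and outputToAll are hypothetical names, plain java.io.Writer objects stand in for the writing ends of the pipes, and no performance optimisation is attempted.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an output port; only the fan-out idea is shown.
final class FanOutSketch {
    private final List<Writer> connectedPipes = new ArrayList<>();

    // Registers the writing end of one more pipe.
    void connect(Writer pipeEnd) {
        connectedPipes.add(pipeEnd);
    }

    // Writes the same data to every connected writer, in connection order.
    void outputToAll(String data) {
        for (Writer pipe : connectedPipes) {
            try {
                pipe.write(data);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    public static void main(String[] args) {
        FanOutSketch port = new FanOutSketch();
        StringWriter first = new StringWriter();
        StringWriter second = new StringWriter();
        port.connect(first);
        port.connect(second);
        port.outputToAll("segment");
        System.out.println(first + " | " + second);
    }
}
```

With the real classes, a single call on the output port replaces this loop, so the module does not need to know how many pipes are attached.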

##Examples

For further reference, do have a look at modules.examples and the basic modules.

##Using modules

Modules are meant to be connected by their I/O ports and started in parallel, so that the processed data trickles from module to module. Details follow below.

###Parallelization

The Module interface extends the interface common.parallelization.CallbackProcess (see Parallelization), but all necessary methods are already implemented within modules.ModuleImpl so you can use modules that extend that class in conjunction with a callback receiver without any additional work.
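The general callback pattern can be sketched like this. The project's own interfaces in common.parallelization are not reproduced here: Receiver, start and runTwo are hypothetical names, and the sketch only assumes that a process runs on its own thread and reports success or failure to a receiver when it finishes.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.function.BooleanSupplier;

final class CallbackSketch {

    // Hypothetical receiver interface; the project's own callback types may look different.
    interface Receiver {
        void done(String processName, boolean success);
    }

    // Starts the work on its own thread and reports the outcome to the receiver.
    static Thread start(String name, BooleanSupplier work, Receiver receiver) {
        Thread thread = new Thread(() -> {
            boolean success;
            try {
                success = work.getAsBoolean();
            } catch (RuntimeException e) {
                success = false; // a failing process is reported, not silently dropped
            }
            receiver.done(name, success);
        }, name);
        thread.start();
        return thread;
    }

    // Runs two processes in parallel and returns the callbacks in arrival order.
    static List<String> runTwo() {
        List<String> log = new CopyOnWriteArrayList<>();
        CountDownLatch finished = new CountDownLatch(2);
        Receiver receiver = (name, success) -> {
            log.add(name + ":" + success);
            finished.countDown();
        };
        start("reader", () -> true, receiver);
        start("failing", () -> { throw new IllegalStateException("simulated failure"); }, receiver);
        try {
            finished.await(); // wait until both callbacks have arrived
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(runTwo());
    }
}
```

The order of the two log entries depends on thread scheduling, which is why the receiver uses a thread-safe list and a latch instead of assuming a fixed sequence.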

###Module networks

To simplify wiring module to port to pipe to port to another module, the class modules.ModuleNetwork exists. It keeps track of an arbitrary number of modules that can be arranged in a network-like structure, meaning each module can connect any of its output ports to as many input ports as needed. The reverse is restricted: an input port can only be connected to one output port. Also, please note that no one will stop you from connecting a module to itself or forming cycles, but you shouldn't expect this to work in any useful way.
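The connection rule (one output port to many input ports, but at most one output port per input port) can be sketched with two maps. This is not the ModuleNetwork implementation: PortWiringSketch and its methods are hypothetical, ports are identified by plain strings, and rejecting a second connection with an exception is an assumption about how such a rule could be enforced.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical bookkeeping for port connections; not the project's ModuleNetwork.
final class PortWiringSketch {
    // input port id -> the single output port feeding it
    private final Map<String, String> sourceOfInput = new HashMap<>();
    // output port id -> all input ports it feeds
    private final Map<String, List<String>> targetsOfOutput = new HashMap<>();

    // Connects an output port to an input port; an input port accepts one source only.
    void connect(String outputPort, String inputPort) {
        if (sourceOfInput.containsKey(inputPort)) {
            throw new IllegalStateException("Input port already connected: " + inputPort);
        }
        sourceOfInput.put(inputPort, outputPort);
        targetsOfOutput.computeIfAbsent(outputPort, key -> new ArrayList<>()).add(inputPort);
    }

    // Returns the input ports fed by the given output port (empty if none).
    List<String> targetsOf(String outputPort) {
        return targetsOfOutput.getOrDefault(outputPort, new ArrayList<>());
    }

    public static void main(String[] args) {
        PortWiringSketch wiring = new PortWiringSketch();
        wiring.connect("reader.output", "builder.input");
        wiring.connect("reader.output", "counter.input");
        System.out.println(wiring.targetsOf("reader.output"));
    }
}
```

Like the network described above, this sketch does not detect self-connections or cycles.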

Probably the best way to get an overview of how to handle module networks manually is to look at the class modularization.ModuleTreeTest, which shows the basic usage of the modules.ModuleNetwork class. For all other purposes it is probably best to rely on the existing programs base.workbench.ModuleWorkbenchGui or base.workbench.ModuleBatchRunner for module organisation and execution.

##Making a module available on the workbench app

Once your module is ready, you have to register it with the controller at base.workbench.ModuleWorkbenchController. There you can instantiate it and append it to the internal list that organizes the modules available to the user.

In that file you will find the comment lines INSTANTIATE MODULES BELOW and ADD MODULE INSTANCES TO LIST BELOW, which mark the correct place to do just that. An instantiation could look like this:

createAndRegisterModule(FileReaderModule.class);
