SAMBA (Self-Adaptive Model-BAsed) generates and executes regression tests online, at runtime, on orchestrated SOA applications; test cases are created from an up-to-date model of the target orchestration.
The Model Generator (MG) and the Model-based Online Test Case Generator (MOT) were developed in NetBeans 8.2, compiled with Java 1.8, and hosted in a GlassFish Payara 4.2 server. The source code of the Service Assembly Monitor (SAM) and the Test Service (TS) was written with NetBeans 6.7.1, compiled with Java 1.6, and hosted in a GlassFish 2.1.1 server. These older versions of NetBeans and GlassFish integrate the BPEL editor and BPEL engine that run our case study. We selected GlassFish 2.1.1 and the BPEL language to reuse the case study available in our lab, and because of our extensive experience with these technologies. The developed source code is freely available at \cite{b22}.
Since the case study available in our lab uses BPEL orchestrations, the MG we implemented creates the model of BPEL orchestrations. Noteworthy, orchestration languages share similar features and are typically described in XML or similar formats, so the same approach applies to non-BPEL orchestrations, such as BPMN, with only syntactic differences.
To create the model, the MG parses the BPEL orchestration to extract information about the business process. The business process is converted into a graph with edges and vertices, a simplification of a UML state machine. The edges hold the information required to perform test operations: the name of each edge refers to a service operation used in the orchestration, with the syntax operation_name@service_name. Vertices link the edges in the same order in which the operations would be executed in the business process. The model elements and their relations are graphically presented in Fig.~\ref{fig4}.
Fig.~\ref{fig6} is a graphical representation of a model, rendered with the model editor yEd \cite{b23} from the information available in a JSON file. Reserved strings (strings with the prefixes e_ and v_, like e_init, e_loop and v_INIT) are introduced in the model for two reasons: i) to support the test case generation process of the MOT, and ii) to represent operations that are not directly linked to the business process, like operations that store the input and output parameters of the orchestration.
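To make the model structure concrete, the following minimal Java sketch shows how such a graph could be represented in memory before being serialized to JSON. Class names, helper methods, and the sample edges are illustrative assumptions rather than SAMBA's actual code; the edge labels, however, follow the operation_name@service_name convention and the reserved e_/v_ prefixes described above.
\begin{verbatim}
import java.util.ArrayList;
import java.util.List;

/** Minimal sketch of the graph model extracted from a BPEL process.
 *  Class and method names are illustrative, not SAMBA's actual API. */
public class OrchestrationModel {

    static class Vertex {
        final String name;                 // e.g. "v_INIT"
        Vertex(String name) { this.name = name; }
    }

    static class Edge {
        final String name;                 // e.g. "get@key_registry" or reserved "e_init"
        final Vertex from, to;
        Edge(String name, Vertex from, Vertex to) {
            this.name = name; this.from = from; this.to = to;
        }
    }

    private final List<Vertex> vertices = new ArrayList<>();
    private final List<Edge> edges = new ArrayList<>();

    Vertex addVertex(String name) { Vertex v = new Vertex(name); vertices.add(v); return v; }
    Edge addEdge(String name, Vertex from, Vertex to) { Edge e = new Edge(name, from, to); edges.add(e); return e; }

    /** Builds a toy model: reserved e_/v_ elements plus one operation@service edge. */
    static OrchestrationModel sample() {
        OrchestrationModel m = new OrchestrationModel();
        Vertex init = m.addVertex("v_INIT");
        Vertex s1 = m.addVertex("v_1");
        Vertex s2 = m.addVertex("v_2");
        m.addEdge("e_init", init, s1);            // reserved: stores the orchestration input
        m.addEdge("get@key_registry", s1, s2);    // operation_name@service_name edge
        m.addEdge("e_loop", s2, s1);              // reserved: supports loops in the process
        return m;
    }
}
\end{verbatim}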
The model generated by the MG is used by the TCGen component of the MOT to execute online tests. TCGen was implemented as a REST service running GraphWalker \cite{b24}. Among the open-source solutions for model-based test case generation, GraphWalker was selected because it offers a collection of features that are relevant for our purposes. GraphWalker has a built-in REST API with methods to load models, fetch data from the generated test case, restart or abort the test case generation, and get and set data in a model. GraphWalker requires a finite state machine (FSM) as input model, together with a set of configurations for the test case generation. The model is described in JSON, and the configurations required for a test case generation are a path generator (which determines the strategy to use when generating a path through the model) and a stop condition (which specifies when the path generation should stop).
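As an illustration of this interaction, the following Java sketch drives a GraphWalker-like REST service to obtain test events one at a time. It is a simplified sketch, not SAMBA's TCGen code: the endpoint names follow the GraphWalker REST API documentation (load, hasNext, getNext, restart), but the exact paths, payloads, port, and the model file name are assumptions that may differ across GraphWalker versions.
\begin{verbatim}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Sketch of a TCGen-like client driving GraphWalker's REST API. */
public class TcGenClientSketch {

    private static final String GW = "http://localhost:8887/graphwalker"; // hypothetical host/port

    public static void main(String[] args) throws Exception {
        // 1. Load the JSON model produced by the MG, which also carries the
        //    path generator and stop condition.
        String model = new String(Files.readAllBytes(Paths.get("imagescraper-model.json")),
                                  StandardCharsets.UTF_8);
        call("POST", "/load", model);

        // 2. Fetch test events one by one until the stop condition is met.
        while (call("GET", "/hasNext", null).contains("true")) {
            String nextStep = call("GET", "/getNext", null);
            // Each step names an edge such as "get@key_registry": the TS would
            // invoke that operation and the TO would check the response here.
            System.out.println("next test event: " + nextStep);
        }

        // 3. Reset the generator before producing the next test case.
        call("PUT", "/restart", null);
    }

    private static String call(String method, String path, String body) throws Exception {
        HttpURLConnection c = (HttpURLConnection) new URL(GW + path).openConnection();
        c.setRequestMethod(method);
        if (body != null) {
            c.setDoOutput(true);
            c.setRequestProperty("Content-Type", "application/json");
            try (OutputStream os = c.getOutputStream()) {
                os.write(body.getBytes(StandardCharsets.UTF_8));
            }
        }
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(c.getInputStream(), StandardCharsets.UTF_8))) {
            StringBuilder sb = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) sb.append(line);
            return sb.toString();
        }
    }
}
\end{verbatim}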
The objective of our test campaign is to assess the behavior of the SAMBA implementation and to understand whether it meets its goals. To plan the test campaign, three cornerstone research questions were defined, and the campaign was set up specifically to answer them. The questions are:
Q1- Is the solution able to create models representing different orchestrations? This question evaluates whether, given a target orchestration, SAMBA is able to create the corresponding model. It also investigates the capability of SAMBA to i) detect updates of the monitored orchestrations, ii) perform the model update, and iii) generate test cases. This question specifically targets SAMBA's self-adaptive capability and the behavior of all SAMBA components, with a major focus on the SAM and the MG.
Q2- Is the solution able to detect failures in the orchestration as a whole? This question assesses the capability of SAMBA to detect failures of services (given a correct orchestration). It specifically targets the behavior of the MOT and the TS.
Q3- Is the solution able to detect wrongly implemented orchestrations? A typical scenario is that the implementation of the orchestrator does not satisfy the functional requirements, for example because of a misunderstanding or a mistake by the developers. SAMBA is in charge of assessing whether the orchestration matches a given set of functional requirements, i.e., whether the implementation of the orchestration adheres to its technical specifications. This question specifically targets the behavior of the MOT and the TS.
SAMBA has been applied to a subset of jSeduite \cite{b10}. jSeduite is a free SOA application that deals with information broadcast inside academic institutions. In total, it is composed of thirty-one JAX-WS (Java API for XML Web Services) Web Services representing information sources, and eight BPEL orchestrations expressing business processes. These services access data stored in a MySQL database. Some of these services involve external WSs and applications, such as Flickr, Twitter, Picasa, RSS feed services, news services, and weather forecast services. Details on the role and scope of each service can be found in \cite{b10}. Three jSeduite BPEL orchestrations were selected for the test campaign: ImageScraper, HyperTimeTable, and FeedReader. ImageScraper relies on six different operations and four WSs, which invoke additional WSs including external ones such as Flickr and Picasa. HyperTimeTable relies on four different operations and two WSs (which invoke additional WSs that manage an online calendar). Finally, FeedReader uses two different operations and two WSs that, among other things, invoke external WSs for feed reading. Table I summarizes the size and complexity of these three orchestrations.
[Table I]
All experiments were executed on a single laptop. The MOT, the MG, and the shared knowledge were hosted on Mac OS X El Capitan (10.11.6), the main OS of the host machine (MacBook Pro 13-inch, late 2011, 8 GB RAM). The SAM, the TS, and jSeduite were hosted in a virtual machine (VirtualBox 5.1) running Ubuntu. The communication between services was enabled by the use of a virtual network. All results are available at \cite{b22}.
All tests are performed with the same test case generation parameters, which are set in the MOT. The maximum size of a test case is 30, meaning that a test case is composed of at most 30 test events. The test case generation algorithm is a random path generator. Complete edge coverage (i.e., all the operations present in the model must be performed) is the stop condition for the tests.
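For reference, a minimal sketch of how these parameters could be expressed is given below. The generator and stop-condition syntax follows the GraphWalker documentation, while the way SAMBA's MOT actually stores and enforces these values may differ.
\begin{verbatim}
/** Sketch of the test case generation parameters used in the campaign. */
public class GenerationConfigSketch {
    /** Random path generator, stopping when every edge (operation) is covered. */
    static final String PATH_GENERATOR = "random(edge_coverage(100))";
    /** Upper bound enforced by the MOT: a test case is aborted after 30 test events. */
    static final int MAX_TEST_CASE_SIZE = 30;
}
\end{verbatim}
With these settings, a test case ends either when all edges have been traversed or, in the worst case, when the 30-event bound is reached and the test case is aborted.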
The first set of experiments focuses on assessing SAMBA's self-adaptive control loop while stressing the model generation process. We simulate updates of an orchestration on the host server. To do so, different versions of the original orchestration were generated with the aid of MuBPEL \cite{b25}, a mutation testing tool for BPEL 2.0. The generated mutants are slightly modified versions of the original orchestration, each one with a single syntactic change \cite{b25}. Mutations allow i) evaluating whether changes in the orchestration can be detected, and ii) modeling (and testing) a large set of different orchestrations. Each mutant substitutes the original orchestration on the host server, simulating an evolution of the monitored orchestration that should trigger the self-adaptive control loop of SAMBA. Noteworthy, each mutated orchestration in this experiment is intended to be a correct orchestration: the results of the functional tests are evaluated by the TO, and our expectation is that all tests terminate successfully. All three orchestrations (ImageScraper, HyperTimeTable, FeedReader) are used, and mutants are generated for each of them. Each mutant is used for test case generation, on average, one hundred times: this leads to more than forty-four thousand tests to assess Q1. Table II presents the total number of generated mutants, the number of models generated, and the percentage of test cases successfully started. SAMBA was able to extract and start test cases for all the generated models, thus matching our expectations.
[Table II]
Table III presents the average size of the test cases for all mutants of each orchestration. It is interesting to notice that some test cases aborted: these tests halted because the maximum test case size was reached (in our configuration, 30 test events). This is due to the complexity of the mutated orchestrations. In fact, FeedReader has only two operations in its business process, and the average test case size of its mutants is 1.79: all the test cases generated from its mutants finished. Instead, ImageScraper has six operations in its business process, and the average test case size of its mutants is 10.19: some operations might be performed more than once, which makes it possible for a test case to reach the maximum size. The 7.7% of aborted test cases supports this claim.
[Table III]
Summarizing, to answer Q1 we generated a total of 447 mutants that represent different yet correct orchestrations. Each orchestration was individually deployed, and its deployment was correctly detected by the SAM component. For each orchestration, a model was generated by the MG component and a set of test cases was created by the MOT component. The TS executed the test cases. This confirms our expectations on Q1.
The second set of experiments simulates misbehaviors in the invoked services, so that the intended behavior of the orchestration is altered. For example, this may be caused by a service update which modifies the behavior of one of the services composing the orchestration. Our objective is to show that SAMBA can detect such faulty services.
We proceed as follows. We create mutated versions of the services, where services are modified in different ways. To create credible mutations of web services, we apply a set of mutation types from the µJava suite \cite{b26}: we introduce intra-method, inter-method, and intra-class level faults. Respectively, these represent i) methods implemented incorrectly, ii) faults in the interaction between pairs of methods belonging to the same class, and iii) faults that only emerge when the methods of a class are exercised together in different sequences \cite{b27}. Inter-class level faults are not considered for this experiment, because this kind of mutation focuses on specific characteristics of object-oriented languages \cite{b27}, while our tests mainly focus on the interfaces of web services: aspects like encapsulation, inheritance, polymorphism, and dynamic binding would only complicate the experiment without contributing to the conclusions.
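As an illustration of the intra-method level, the sketch below shows a mutation in the style of µJava's arithmetic operator replacement (AOR) operator, applied to a made-up service operation (it is not an actual jSeduite method): the interface is unchanged, only the returned value differs.
\begin{verbatim}
/** Illustrative intra-method mutation (AOR-style) on a made-up operation. */
public class CacheService {

    // Original operation: returns the time left before a cached entry expires.
    public int remainingLifetime(int expiresAt, int now) {
        return expiresAt - now;
    }

    // Mutant: AOR replaces '-' with '+', silently changing the value returned
    // to the orchestration while leaving the interface untouched.
    public int remainingLifetime_mutant(int expiresAt, int now) {
        return expiresAt + now;
    }
}
\end{verbatim}
Mutations of this kind do not alter the service interface, which is why detecting them requires functional checks on the data actually returned to the orchestration.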
The mutations we generate focus on the interfaces and on the internal behavior of the web service operations. Table IV lists all the mutation types and mutation targets used for the different operations of the selected web services. All three orchestrations (ImageScraper, HyperTimeTable, FeedReader) are used. A total of twelve mutations were introduced in eight different services: the number of mutations is smaller than for Q1 and Q3 because compiling and deploying the mutated services on the target platform could not be fully automated.
[Table IV]
Each mutant was deployed and tested three times. All the test cases performed detected the misbehavior: the test cases generated by SAMBA were very effective in detecting failures of orchestrations. In fact, SAMBA uses online test case generation and performs the tests at runtime; the TO and TCGen work alongside the target orchestration, and the test case generation uses the responses of previous test events.
Changes in the interfaces and in the internal logic of the services are detected either thanks to the verification of the assertions, or thanks to the usage of service outputs in successive test events. For example, Fig.~\ref{fig7} is a sample report of a test case execution. The mutated service modified the output of the get operation of the key_registry web service. The report shows that the TO is not able to detect the misbehavior of the get operation itself; however, the successive test operation relies on the output of the get operation, and the incorrect behavior is detected there by the TO, as sketched below. Summarizing, our experiments provide encouraging feedback on Q2 as well.
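The following minimal Java sketch illustrates this chaining mechanism under simplifying assumptions: the service calls are stubs, the second operation (fetchImages@image_provider) and the input parameter are hypothetical, and only the get@key_registry name comes from the report discussed above.
\begin{verbatim}
/** Sketch of two chained test events: a fault that escapes the assertion
 *  on one operation still surfaces in the next one. */
public class ChainedTestEventsSketch {

    public static void main(String[] args) {
        // Test event 1: get@key_registry. The mutated service returns a wrong
        // but well-formed key, so the assertion on the output still passes.
        String key = invoke("key_registry", "get", "picasa");
        assertNotEmpty(key);    // passes: the output looks valid in isolation

        // Test event 2: the next operation consumes that key. The wrong key
        // makes the call return an unusable result, and the failure is
        // reported here.
        String images = invoke("image_provider", "fetchImages", key);
        assertNotEmpty(images); // fails: the misbehavior is detected
    }

    // Stub standing in for the TS invocation: the mutated key_registry returns
    // a wrong but non-empty key; any downstream call using that key returns
    // an empty result.
    static String invoke(String service, String operation, String input) {
        if ("key_registry".equals(service)) {
            return "wrong-key";
        }
        return "";
    }

    // Stub standing in for the TO assertion on a test event response.
    static void assertNotEmpty(String value) {
        if (value == null || value.isEmpty()) {
            throw new AssertionError("test event failed");
        }
    }
}
\end{verbatim}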
The third experimental campaign simulates an orchestration that does not satisfy the functional requirements of the application. The three orchestrations ImageScraper, HyperTimeTable, and FeedReader, and their mutations generated for Q1, are used as the basis for this experiment. However, this time, the test events of the mutated models are compared to the original oracle of the target orchestration, i.e., the oracles of ImageScraper, HyperTimeTable, and FeedReader.
In fact, such oracles contain the information on the intended behavior of the orchestration. An orchestration that does not match such information is considered wrongly implemented. As such, in this experiment, the mutations of ImageScraper, HyperTimeTable, and FeedReader are considered wrong implementations. We expect that SAMBA, thanks to the action of the MG, MOT, and TS, will be able to detect mutated orchestrations that do not realize the intended behavior of the orchestration.
The test case generation process is repeated three times for each available mutant, to guarantee the consistency of the results. Table V presents the total number of mutants used, and the number of mutants for which one or more test cases generated by the MOT failed (meaning that at least one test event failed). For such mutants, SAMBA can immediately report that a problem is occurring.
Details of the test results are presented in Table VI. It reports the number of mutants that: i) failed all test cases (column only fail), ii) had both failed and successful test cases (column fail and complete), and iii) had both failed and aborted test cases, i.e., test cases that halted (column fail and abort).
[Table V]
[Table VI]
The results for ImageScraper reveal that some mutated models generated both complete and failed test cases (fail and complete). This indicates that, depending on the test case being generated, the test results could differ. By repeating the test case generation process three times for each mutant, we confirmed that the test case generation algorithm plays a key role in the effectiveness of SAMBA at detecting wrong orchestrations. Table VII also presents the average test execution time and the average size of the test cases generated from the mutated orchestrations.
[Table VII]
The remaining mutants (205 for ImageScraper, 140 for HyperTimeTable, 41 for FeedReader) generated test cases that matched the original functional requirements of their respective orchestrations. The reason is that mutations generated by MuBPEL might not affect the business process of the orchestration: such mutated orchestrations generate models that do not lead to failed test executions. Analyzing the results, we can say that SAMBA is able to detect wrongly implemented orchestrations only for specific fault effects. More precisely, if the effect of the fault alters the functional behavior of the target orchestration, SAMBA is able to detect the deviation. Consequently, we can conclude that the results we obtained for Q3 are satisfactory under the above restriction.