A Diffbot client for RapidMiner 6.1 or above that analyzes web pages. It supports the following Diffbot automatic APIs: analyze (the general automatic API wrapper) and article (analysis of article web pages; its support in RapidMiner is experimental).
###Prerequisites
RapidMiner Studio 6.1 with Text Processing. The Starter license is sufficient.
- Install the Diffbot extension from the RapidMiner marketplace (or by copying the plugin to the `lib/plugins` folder of RapidMiner Studio or RapidMiner Server). It requires the Text Processing (and the included Cloud Connectivity) plugins.
- Use the token you got from Diffbot to analyze web pages using the operators available under `Text Processing/Diffbot`. The results are presented as JSON documents. You might prefer to use the `JSON To Data` operator to extract information in tabular form.
- Check out RapidMiner (e.g. to `~/git/rapidminer`; https://github.com/aborg0/rapidminer/tree/extension_java7 is the preferred branch).
- Install RapidMiner Studio 6.1 (e.g. to `~/rapidminer-studio`).
- Execute the `./setup.sh` script like this: `RM_SOURCES=$HOME/git/rapidminer RAPIDMINER_HOME=$HOME/rapidminer-studio ./setup.sh`. It will create a folder named `RM_61` in the parent folder (required for `ant install`).
- Build and install your extension by executing the Ant target `install`.
- Start RapidMiner and check whether your extension has been loaded.
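
The build steps above can be collected into one script. The following is only a sketch: it assumes it is run from the root of this extension's checkout (where `setup.sh` lives), that RapidMiner Studio 6.1 is already installed, and that `git` and `ant` are on the `PATH`. The `run` helper and the `DRY_RUN` variable are hypothetical names introduced for this sketch; they are not part of `setup.sh` or the Ant build. By default the script only prints the commands, so you can review them before setting `DRY_RUN=0`.

```shell
#!/bin/sh
# Sketch of the build steps; paths follow the examples in the list above.
RM_SOURCES="${RM_SOURCES:-$HOME/git/rapidminer}"
RAPIDMINER_HOME="${RAPIDMINER_HOME:-$HOME/rapidminer-studio}"
# Hypothetical switch for this sketch: 1 = only print, 0 = actually execute.
DRY_RUN="${DRY_RUN:-1}"

# Print a command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

build_extension() {
    # 1. Check out the preferred branch of the RapidMiner sources.
    run git clone --branch extension_java7 \
        https://github.com/aborg0/rapidminer "$RM_SOURCES" &&
    # 2. Let setup.sh create the RM_61 folder in the parent folder.
    run env RM_SOURCES="$RM_SOURCES" RAPIDMINER_HOME="$RAPIDMINER_HOME" ./setup.sh &&
    # 3. Build and install the extension.
    run ant install
}

build_extension
```

Chaining the steps with `&&` stops the sequence at the first failing command, so `ant install` is not attempted when the checkout or `setup.sh` fails.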
##Optional steps
If you prefer, you can replace the bundled diffbot-java library (the file `lib/diffbot-java-1.0-SNAPSHOT.jar`) with the diffbot-java version you want to use. The current version was compiled from https://github.com/aborg0/diffbot-java-client/releases/tag/vknime0.1
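
Swapping the library could look like the sketch below. It assumes you have already built a diffbot-java jar yourself; `replace_diffbot_jar` is a hypothetical helper written for this example, and the path in the usage comment is a placeholder. The helper keeps a `.bak` copy of the previous jar so the change is easy to undo.

```shell
#!/bin/sh
# Hypothetical helper: replace_diffbot_jar NEW_JAR [EXTENSION_DIR]
# Copies NEW_JAR over lib/diffbot-java-1.0-SNAPSHOT.jar in EXTENSION_DIR
# (default: current directory) and keeps a backup of the old file.
replace_diffbot_jar() {
    new_jar="$1"
    ext_dir="${2:-.}"
    target="$ext_dir/lib/diffbot-java-1.0-SNAPSHOT.jar"

    if [ ! -f "$new_jar" ]; then
        echo "error: $new_jar not found" >&2
        return 1
    fi

    # Back up the currently bundled jar, if there is one.
    if [ -f "$target" ]; then
        cp "$target" "$target.bak" || return 1
    fi

    cp "$new_jar" "$target"
}

# Usage (placeholder path), followed by a rebuild of the extension:
#   replace_diffbot_jar /path/to/your/diffbot-java.jar . && ant install
```

The jar keeps its original file name here on the assumption that the build refers to `lib/diffbot-java-1.0-SNAPSHOT.jar` by name; rerun `ant install` afterwards so the extension is rebuilt against the new library.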