This repository has been archived by the owner on May 6, 2021. It is now read-only.

GraphAware Neo4j Recommendation Engine - RETIRED

GraphAware Neo4j Recommendation Engine Has Been Retired

As of May 2021, this repository has been retired.

GraphAware Neo4j Recommendation Engine is a library for building high-performance complex recommendation engines atop Neo4j. It is in production at a number of GraphAware's clients producing real-time recommendations on graphs with hundreds of millions of nodes.

Key Features:

  • Clean and flexible design
  • High performance
  • Ability to trade off recommendation quality for speed
  • Ability to pre-compute recommendations
  • Built-in algorithms and functions
  • Ability to measure recommendation quality
  • Ability to easily run in A/B test environments

The library imposes a specific recommendation engine architecture, which has emerged from our experience building recommendation engines on top of Neo4j. In return, it offers high performance and handles most of the plumbing so that you only write the recommendation business logic specific to your use case.

Besides computing recommendations in real-time, it also allows for pre-computing recommendations that are perhaps too complex to compute in real-time. The pre-computing happens on a best-effort basis during quiet periods, so that it does not interfere with the regular transaction processing that your Neo4j database is performing.

Community vs Enterprise

This open-source (GPL) version of the module is compatible with GraphAware Framework Community (GPL), which in turn is compatible with Neo4j Community Edition (GPL) only. It will not work with Neo4j Enterprise Edition, which is a proprietary and commercial software product of Neo4j, Inc.

GraphAware offers an Enterprise version of the GraphAware Framework to licensed users of Neo4j Enterprise Edition. Please get in touch to receive access.

Getting the Software

Server Mode

When using Neo4j in the standalone server mode, you will need the GraphAware Neo4j Framework and GraphAware Neo4j Recommendation Engine .jar files (both of which you can download here) dropped into the plugins directory of your Neo4j installation.

Unlike with other GraphAware Framework Modules, you will need to write at least a few lines of your own Java code (read on).

Embedded Mode / Java Development

Java developers that use Neo4j in embedded mode and those developing Neo4j server plugins, unmanaged extensions, GraphAware Runtime Modules, or Spring MVC Controllers can include the module as a dependency for their Java project.

Releases

Releases are synced to the Maven Central repository. When using Maven for dependency management, include the following dependency in your pom.xml and edit the version number.

<dependencies>
    ...
    <dependency>
        <groupId>com.graphaware.neo4j</groupId>
        <artifactId>recommendation-engine</artifactId>
        <version>A.B.C.D.E</version>
    </dependency>
    ...
</dependencies>

Snapshots

To use the latest development version, clone this repository, run mvn clean install, and change the version in the dependency above to A.B.C.D.E-SNAPSHOT.

Note on Versioning Scheme

The version number has two parts. The first four numbers indicate compatibility with the GraphAware Neo4j Framework. The last number is the version of the Recommendation Engine library. For example, version 2.1.6.26.1 is version 1 of the Recommendation Engine compatible with GraphAware Neo4j Framework 2.1.6.26.
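To make the scheme concrete, here is a small plain-Java sketch that splits a version string into its two parts. The class name `RecoVersion` is hypothetical and not part of the library; it simply encodes the rule described above (first four numbers: framework version, last number: library version) and tolerates the `-SNAPSHOT` suffix used for development builds.

```java
import java.util.Arrays;

/**
 * Hypothetical helper (not part of the library) that splits a Recommendation
 * Engine version such as "2.1.6.26.1" into the GraphAware Framework version it
 * targets (first four numbers) and the library's own version (last number).
 */
final class RecoVersion {
    private static final String SNAPSHOT = "-SNAPSHOT";

    final String frameworkVersion;
    final int libraryVersion;

    private RecoVersion(String frameworkVersion, int libraryVersion) {
        this.frameworkVersion = frameworkVersion;
        this.libraryVersion = libraryVersion;
    }

    static RecoVersion parse(String version) {
        // Development builds carry a -SNAPSHOT suffix; it does not affect the two parts.
        String core = version.endsWith(SNAPSHOT)
                ? version.substring(0, version.length() - SNAPSHOT.length())
                : version;
        String[] parts = core.split("\\.");
        if (parts.length != 5) {
            throw new IllegalArgumentException("Expected five dot-separated numbers: " + version);
        }
        String framework = String.join(".", Arrays.copyOfRange(parts, 0, 4));
        return new RecoVersion(framework, Integer.parseInt(parts[4]));
    }
}
```

For example, `RecoVersion.parse("2.1.6.26.1")` yields framework version `2.1.6.26` and library version `1`.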

Introduction to GraphAware Recommendation Engine

The purpose of a recommendation engine is (unsurprisingly) to recommend something to users: products they should buy, users they should connect with, artists they should follow, etc. It turns out a graph is a really good data structure for representing users' interests, behaviours, and other characteristics that might be useful for finding recommendations. More importantly, graph databases, and Neo4j especially, provide a natural way of expressing queries on this data in order to find relevant recommendations, and of executing these queries very fast.

There are three main challenges when building a recommendation engine. The first is to discover the items to recommend. The second is to choose the most relevant ones to present to the user. The third is to find relevant recommendations as quickly as possible. Preferably, this should happen in real-time, i.e. using the most up-to-date information we have. The last thing we want is to recommend something the user has already purchased, or a person we know she isn't interested in.

The first two points above are business rather than technical challenges. Typically, when you start building a recommendation engine, you have some idea of how the recommended items will be discovered. For instance, you might want to recommend items that other people with similar interests have bought. You also know which items you absolutely do not want to recommend, for example, items the user has already purchased, or people we know to be married as potential dating matches.

Recommendation relevance, on the other hand, usually has to be found through experimentation. When building the first recommendation engine, or even a proof of concept, one feature that shouldn't be missing is the ability to configure how recommendation relevance is computed and, perhaps more importantly, to measure how users react to recommendations produced by different relevance-computing configurations.

Finally, let's address the issue of speed, which is of a technical nature. When serving real-time recommendations, users shouldn't have to wait for more than, say, a couple of hundred milliseconds. With Neo4j, we can build many different recommendation queries that take milliseconds to execute. However, there are situations (large graphs with some very dense nodes) where we need to take extra care not to slow the recommendation process down. And in situations where the recommendation logic and the size of the graph simply don't allow real-time computation, we need to pre-compute some recommendations, whilst avoiding the danger of serving out-of-date ones.

Recommendation Engine Architecture

The architecture of GraphAware Neo4j Recommendation Engine has been designed to address, or easily allow you to address, all of the above challenges. The library works with the following concepts:

Recommendation Engines and Recommendations

A Recommendation Engine is a component that produces Recommendations, given an Input. Whilst the architecture is generic enough to support other persistence mechanisms, we focus on Neo4j and so the input will typically be a Neo4j Node representing a user for whom we want to find recommendations, a product for which we want to find buyers, etc.

A RecommendationEngine, as in the case of DelegatingRecommendationEngine, can be composed of other RecommendationEngines that it delegates to. Usually, however, a RecommendationEngine encapsulates the querying and relevance-computing logic for discovering recommendations based on a single logical criterion. Such an engine typically extends SingleScoreRecommendationEngine. For example, we could have one engine that discovers items a user may want to buy based on what other users with similar tastes have bought. Another engine would discover items based on the user's expressed preferences. Yet another could discover items to be recommended based on what is currently trending.

For performance reasons as well as to achieve good encapsulation, RecommendationEngines are only concerned with discovering and scoring all potential recommendations, without caring about the fact that some recommendations discovered this way may not be suitable, perhaps because the user has already purchased the discovered item. Removing irrelevant recommendations will be discussed shortly.

Scores and Score Transformers

Recommendations are a collection of tuples/pairs, where each pair is composed of a recommended item (again, typically a Node) and its associated relevance Score. The Score is composed of named Partial Scores. Each Partial Score has a float value and, optionally, some extra details about how and why it has been computed that we wish to expose to users. Typically, a SingleScoreRecommendationEngine, as the name suggests, is responsible for a single Partial Score.

When an item has been discovered as a potential recommendation by multiple SingleScoreRecommendationEngines, its Partial Scores are tallied by the Score object. For example, an item that is currently trending and matches the user's preferred tastes will have a total relevance Score composed of two Partial Scores, one due to the fact that it is trending, and another one because it is a preferred item.
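The tallying can be pictured with a minimal sketch. `SketchScore` is a hypothetical stand-in, not the library's Score class: it keeps named partial scores, sums repeated contributions to the same name, and reports the total as the sum of all partials.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch of a score made of named partial scores (not the
 * library's Score/PartialScore). Adding to the same name twice tallies the
 * values; the total is the sum of all partial scores.
 */
final class SketchScore {
    private final Map<String, Float> partials = new LinkedHashMap<>();

    void add(String partialName, float value) {
        // merge() sums the new value with any existing value for this name
        partials.merge(partialName, value, Float::sum);
    }

    float partial(String partialName) {
        return partials.getOrDefault(partialName, 0f);
    }

    float total() {
        float sum = 0f;
        for (float v : partials.values()) {
            sum += v;
        }
        return sum;
    }
}
```

In the trending/preferred example, one engine would call `add("trending", ...)` and another `add("preferred", ...)` on the same item's score.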

In some cases, an item might be discovered multiple times by the same SingleScoreRecommendationEngine. For example, we may have an engine that suggests people a user should be friends with based on the fact that they have some friends in common. Assuming an easy-to-imagine graph traversal that discovers these recommendations, a potential friend will be discovered three times if he has three friends in common with the user we're computing recommendations for. However, each additional friend in common might not bear the same relevance for the recommendation. Thus, each Partial Score can have a Score Transformer applied to it. A ScoreTransformer can apply an arbitrary mathematical function to the Partial Score computed by a SingleScoreRecommendationEngine.
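A common choice for such a transformer is a diminishing-returns curve. The sketch below assumes the form `max * (1 - e^(-ln(5) * x / x80))`, which approaches `max` asymptotically and reaches 80% of it at `x80`. With `max = 100` and `x80 = 10` it gives about 14.866 for one friend in common and 27.522 for two, which matches the `friendsInCommon` values in the integration test further down; with `max = 10` and `x80 = 20` it also reproduces the age-difference penalties there (e.g. 5.527864 for a 10-year gap). `ParetoSketch` is a hypothetical name; the Javadoc of the library's ParetoScoreTransformer remains the reference for its exact definition.

```java
/**
 * Hypothetical diminishing-returns transformation (not the library's class):
 * result = max * (1 - e^(-ln(5) * x / x80)), so it reaches 80% of max at x80
 * and approaches max asymptotically as x grows.
 */
final class ParetoSketch {
    private final double maxScore;
    private final double alpha;

    ParetoSketch(double maxScore, double eightyPercentAt) {
        this.maxScore = maxScore;
        // Solve 1 - e^(-alpha * x80) = 0.8, i.e. e^(-alpha * x80) = 0.2, so alpha = ln(5) / x80
        this.alpha = Math.log(5) / eightyPercentAt;
    }

    double transform(double rawValue) {
        return maxScore * (1 - Math.exp(-alpha * rawValue));
    }
}
```

Each additional friend in common therefore adds less than the previous one, which is exactly the behaviour requirement (2) of the example later asks for.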

Context

Recommendations are always computed within a Context. Whilst each recommendation-computing process for a single input might involve multiple RecommendationEngines and other components, there is usually a single Context per computation that encapsulates information relevant to the process. For example, the Context knows whether a potential recommendation discovered by a RecommendationEngine is allowed to be served to the user. For each computation, a new Context is produced by the TopLevelRecommendationEngine.

Config

The Context also encapsulates a Config for each recommendation-computing process. This is a set of user-defined values. By default, a Config knows how many recommendations should be produced and the maximum time the recommendation-computing process should take. Optionally, arbitrary key-value pairs can be passed in, which is useful when score values, rewards, penalties, and other variables should not be hard-coded and differ per computation.
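A time budget only works if something keeps track of elapsed time. The following hypothetical `SketchContext` (not the library's Context) holds the two default Config values, a result limit and a maximum time, and answers the two questions a computation typically asks: are more results needed, and is there time left? The clock is injected as a `LongSupplier` so the behaviour is deterministic in tests; in production one might pass `System::currentTimeMillis`.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical per-computation context (not the library's Context) holding a
 * result limit and a time budget, and tracking elapsed time via an injected clock.
 */
final class SketchContext {
    private final int limit;
    private final long maxTimeMillis;
    private final LongSupplier clockMillis;
    private final long startedAt;

    SketchContext(int limit, long maxTimeMillis, LongSupplier clockMillis) {
        this.limit = limit;
        this.maxTimeMillis = maxTimeMillis;
        this.clockMillis = clockMillis;
        this.startedAt = clockMillis.getAsLong(); // the computation starts now
    }

    long timeElapsed() {
        return clockMillis.getAsLong() - startedAt;
    }

    /** True while the computation is still within its time budget. */
    boolean timeLeft() {
        return timeElapsed() < maxTimeMillis;
    }

    /** True while fewer results than the configured limit have been found. */
    boolean moreResultsNeeded(int resultsSoFar) {
        return resultsSoFar < limit;
    }
}
```

An engine that should only run "if more results are needed and there is enough time" would check both methods before doing any work.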

Blacklist Builders and Filters

Rather than requiring all RecommendationEngines to know how to detect irrelevant recommendations (thus slowing the computation down and scattering a single concern), the logic is centralised into Blacklist Builders and Filters. BlacklistBuilders, as the name suggests, are responsible for building "blacklists" of items that must not be recommended for a given input.

Assuming that the input is a Node representing a person, an example of a BlacklistBuilder could be AlreadyPurchasedItems which builds a blacklist of items that the person has already purchased. BlacklistBuilders are most efficient in situations where a small number of irrelevant recommendations (let's say up to 100) can be discovered with a single query before the recommendation process begins.

Filters, on the other hand, can tell whether a recommendation is relevant or not by looking at the recommendation itself once it has been discovered. An obvious example of a Filter could be a class called ExcludeSelf, which would make sure that (for example) a recommended friend isn't the same Node that the recommendations are being computed for. Another example of a Filter could be ExcludeItemsOutOfStock, or ExcludeMarriedPeople.

Blacklists produced by BlacklistBuilders and Filters are typically passed to an instance of Context (usually FilteringContext), which uses them to exclude irrelevant recommendations.
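One way a filtering context could combine both mechanisms behind a single check is sketched below. `SketchFilteringContext` and its methods are hypothetical and not the library's FilteringContext API: blacklisted items are rejected with a cheap set lookup, and each filter is a predicate over the input and the candidate that must return `true` for the candidate to be kept.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiPredicate;

/**
 * Hypothetical sketch (not the library's FilteringContext) combining
 * pre-computed blacklists and per-item filters into one "allowed?" check.
 */
final class SketchFilteringContext<I, R> {
    private final I input;
    private final Set<R> blacklist = new HashSet<>();
    private final List<BiPredicate<I, R>> filters = new ArrayList<>();

    SketchFilteringContext(I input) {
        this.input = input;
    }

    /** Adds items computed up front, e.g. by one query per blacklist builder. */
    void addToBlacklist(Collection<? extends R> items) {
        blacklist.addAll(items);
    }

    /** Adds a per-item check; it must return true for the item to be kept. */
    void addFilter(BiPredicate<I, R> filter) {
        filters.add(filter);
    }

    boolean allow(R recommendation) {
        if (blacklist.contains(recommendation)) {
            return false; // cheap set lookup first
        }
        for (BiPredicate<I, R> filter : filters) {
            if (!filter.test(input, recommendation)) {
                return false;
            }
        }
        return true;
    }
}
```

For instance, an ExcludeSelf-style rule becomes `ctx.addFilter((input, candidate) -> !input.equals(candidate))`.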

Post Processors

In the presence of "supernodes", i.e. nodes with disproportionately many relationships, it would be too expensive to compute recommendations using dedicated RecommendationEngines. Imagine, for example, that we would like to boost the score of people living in the same city as the person we're computing recommendations for. Rather than implementing a RecommendationEngine that discovers all people living in the same city (which could be millions!), we can implement a PostProcessor which modifies the score of already computed recommendations. In the example above, a PostProcessor called RewardSameCity could add 50 points to each recommendation if the person we're recommending to and the recommended person live in the same city. It is much quicker to perform this check for each recommendation than to discover all people living in the same city.

Other examples of a PostProcessor could include RewardSameGender, PenalizeAgeDifference, etc.
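A plain-Java sketch of the RewardSameCity idea follows. `RewardSameCitySketch` is hypothetical and deliberately avoids the library's PostProcessor types: it walks only the candidates already in the score map and adds the reward when the candidate's city (looked up per candidate) equals the input's city, so the cost grows with the number of recommendations rather than with the size of the city.

```java
import java.util.Map;

/**
 * Hypothetical post-processor sketch (not the library's PostProcessor API):
 * rewards already-discovered candidates who live in the input person's city.
 */
final class RewardSameCitySketch {
    private final float reward;

    RewardSameCitySketch(float reward) {
        this.reward = reward;
    }

    /**
     * @param inputCity city of the person we recommend to
     * @param cityOf    candidate -> city lookup (one hop per candidate in a graph)
     * @param scores    candidate -> score map, updated in place
     */
    void postProcess(String inputCity, Map<String, String> cityOf, Map<String, Float> scores) {
        for (Map.Entry<String, Float> entry : scores.entrySet()) {
            if (inputCity.equals(cityOf.get(entry.getKey()))) {
                entry.setValue(entry.getValue() + reward);
            }
        }
    }
}
```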

Pre-Computation

Once we've built a RecommendationEngine, we could use it to continuously pre-compute recommendations when the database isn't busy, using the GraphAware Timer-Driven Module. For each potential input, we could pre-compute a number of recommendations and link them to the input using a RECOMMEND relationship. When serving recommendations, we could read them directly from the database, rather than computing them in real-time. Blacklists and Filters are still consulted in case the situation has changed since the time recommendations were pre-computed.
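Serving pre-computed results then amounts to re-checking them, in order, against the current blacklists and filters until enough survive. The helper below is a hypothetical sketch, not library code; `stillAllowed` stands for whatever combined blacklist-and-filter check the current Context provides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/**
 * Hypothetical sketch: serve stored recommendations, re-validating each one,
 * because the graph may have changed since they were computed (e.g. the two
 * people have become friends in the meantime).
 */
final class PrecomputedServingSketch {
    static <R> List<R> serve(List<R> precomputed, Predicate<R> stillAllowed, int limit) {
        List<R> result = new ArrayList<>();
        for (R item : precomputed) {
            if (result.size() == limit) {
                break; // enough valid recommendations
            }
            if (stillAllowed.test(item)) {
                result.add(item);
            }
        }
        return result;
    }
}
```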

Logging

Each produced Recommendation has a String UUID, so that it can be uniquely identified. This is useful, for example, when we want to measure the quality of recommendations: we can store the Scores of different Recommendations, as well as how users reacted to them, against their UUIDs. For this purpose, we can use a Logger implementation, which records recommendations for later analysis. The library provides implementations that log using slf4j, but you can create your own to store the data in Cassandra or wherever you want.
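As an illustration, the sketch below tags each recommendation with `java.util.UUID.randomUUID()` and records it through a tiny logger interface. `LoggedRecommendation`, `SketchLogger` and `InMemorySketchLogger` are hypothetical names, not the library's Logger API; the point is that the UUID, item and score are written together so that user reactions can later be joined on the UUID.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.UUID;

/** Hypothetical recommendation tagged with a UUID for later analysis. */
final class LoggedRecommendation {
    final String uuid = UUID.randomUUID().toString();
    final String item;
    final float score;

    LoggedRecommendation(String item, float score) {
        this.item = item;
        this.score = score;
    }
}

/** Hypothetical logger contract; an implementation could write to slf4j, Cassandra, etc. */
interface SketchLogger {
    void log(String input, List<LoggedRecommendation> recommendations);
}

/** In-memory implementation, handy in tests. */
final class InMemorySketchLogger implements SketchLogger {
    private final List<String> lines = new ArrayList<>();

    @Override
    public void log(String input, List<LoggedRecommendation> recommendations) {
        for (LoggedRecommendation r : recommendations) {
            lines.add(r.uuid + " " + input + " -> " + r.item + " (" + r.score + ")");
        }
    }

    List<String> lines() {
        return Collections.unmodifiableList(lines);
    }
}
```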

Using GraphAware Neo4j Recommendation Engine

The best place to start is the ModuleIntegrationTest class and the other classes it uses. The classes in this library also have decent Javadoc, which should help you get started with your first recommendation engine. Feel free to get in touch for support ([email protected]).

We will illustrate how easy it is to build a recommendation engine with an example. Let's say we have a graph of people, i.e. Nodes with the label :Person. Moreover, each :Person also has a :Male or a :Female label, and two properties: a name (String) and an age (integer). We will also have Nodes with the label :City and a name property.

The only two relationship types in our simple graph will be FRIEND_OF and LIVES_IN. We will assume friendships are mutual and thus ignore the direction of the FRIEND_OF relationship. A sample graph, expressed in Cypher, could look like this:

    CREATE
    (m:Person:Male {name:'Michal', age:30}),
    (d:Person:Female {name:'Daniela', age:20}),
    (v:Person:Male {name:'Vince', age:40}),
    (a:Person:Male {name:'Adam', age:30}),
    (b:Person:Female {name:'Britney', age:12}),
    (l:Person:Female {name:'Luanne', age:25}),
    (c:Person:Male {name:'Christophe', age:60}),
    (j:Person:Male {name:'Jim', age:40}),

    (lon:City {name:'London'}),
    (mum:City {name:'Mumbai'}),
    (br:City {name:'Bruges'}),

    (m)-[:FRIEND_OF]->(d),
    (m)-[:FRIEND_OF]->(l),
    (m)-[:FRIEND_OF]->(a),
    (m)-[:FRIEND_OF]->(b),
    (m)-[:FRIEND_OF]->(v),
    (d)-[:FRIEND_OF]->(v),
    (b)-[:FRIEND_OF]->(v),
    (j)-[:FRIEND_OF]->(v),
    (j)-[:FRIEND_OF]->(m),
    (j)-[:FRIEND_OF]->(a),
    (a)-[:LIVES_IN]->(lon),
    (d)-[:LIVES_IN]->(lon),
    (v)-[:LIVES_IN]->(lon),
    (m)-[:LIVES_IN]->(lon),
    (j)-[:LIVES_IN]->(lon),
    (c)-[:LIVES_IN]->(br),
    (b)-[:LIVES_IN]->(br),
    (l)-[:LIVES_IN]->(mum);

Our goal is to recommend people a person should become friends with, based on the following requirements:

  1. The more friends in common two people have, the more likely it is they should become friends
  2. The difference between zero and one friends in common should be significant and each additional friend in common should increase the recommendation relevance by a smaller magnitude.
  3. If people live in the same city, the chance of them becoming friends increases
  4. If people are of the same gender, the chance of them becoming friends is greater than if they are of opposite genders
  5. The bigger the age difference between two people, the lower the chance they will become friends
  6. People should not be friends with themselves
  7. People who are already friends should not be recommended as potential friends
  8. Young users should not be recommended to anyone as potential friends. The definition of "young" should be configurable per computation
  9. If we don't have enough recommendations, we will recommend some random people, but only if there is enough time

Let's start tackling the requirements one by one.

Real-Time Recommendations

FriendsInCommon

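First, we will build a RecommendationEngine that finds recommendations based on friends in common. For each friend in common, the relevance score will increase by 1. Since this is a single-criterion RecommendationEngine, it belongs to the SingleScoreRecommendationEngine family; the library's SomethingInCommon class already implements the "shared neighbour" traversal, so we only need to specify the relationship type and direction: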

/**
 * {@link com.graphaware.reco.generic.engine.RecommendationEngine} that finds recommendation based on friends in common.
 */
public class FriendsInCommon extends SomethingInCommon {

    @Override
    protected String name() {
        return "friendsInCommon";
    }

    @Override
    protected RelationshipType getType() {
        return Relationships.FRIEND_OF;
    }

    @Override
    protected Direction getDirection() {
        return Direction.BOTH;
    }
}
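To see what such an engine discovers, here is a plain-Java sketch of the same traversal over an in-memory copy of the sample graph above. `FriendsInCommonSketch` is hypothetical and independent of Neo4j. Following the architecture described earlier, it only discovers and scores: every friend of a friend gets +1 per shared friend, including the input itself and people who are already friends, which is exactly what blacklist builders and filters are there to remove.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical plain-Java sketch of "friends in common" discovery over an
 * undirected friendship graph. It discovers and scores only; excluding the
 * input and existing friends is left to filters and blacklists.
 */
final class FriendsInCommonSketch {
    private final Map<String, Set<String>> friends = new HashMap<>();

    /** FRIEND_OF is treated as mutual, so the edge is stored in both directions. */
    void addFriendship(String a, String b) {
        friends.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        friends.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    /** Raw score: +1 for every friend the candidate has in common with the input. */
    Map<String, Integer> discover(String input) {
        Map<String, Integer> scores = new HashMap<>();
        for (String friend : friends.getOrDefault(input, Collections.emptySet())) {
            for (String candidate : friends.getOrDefault(friend, Collections.emptySet())) {
                scores.merge(candidate, 1, Integer::sum);
            }
        }
        return scores;
    }
}
```

For Vince in the sample graph, this yields Adam with 2 (via Michal and Jim) and Luanne with 1 (via Michal), but also Vince himself with 4 and his existing friend Michal with 3.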

The code above tackles requirement (1). Let's modify the code to account for requirement (2) as well by providing an exponential ScoreTransformer, called the ParetoScoreTransformer. Please read the Javadoc of the class to find out exactly how it works. For now, it is sufficient to say that it will transform the number of friends in common to a score with a theoretical upper value of 100, with 80% of the total score being achieved by having 10 friends in common.

/**
 * {@link com.graphaware.reco.generic.engine.RecommendationEngine} that finds recommendation based on friends in common.
 * <p/>
 * The score increases according to a Pareto function, reaching 80% of the maximum with 10 friends in common. The maximum score is 100.
 */
public class FriendsInCommon extends SomethingInCommon {

    private ScoreTransformer scoreTransformer = new ParetoScoreTransformer(100, 10);

    @Override
    public String name() {
        return "friendsInCommon";
    }

    @Override
    protected ScoreTransformer scoreTransformer() {
        return scoreTransformer;
    }

    @Override
    protected RelationshipType getType() {
        return Relationships.FRIEND_OF;
    }

    @Override
    protected Direction getDirection() {
        return BOTH;
    }

    @Override
    protected Map<String, Object> details(Node thingInCommon, Relationship withInput, Relationship withOutput) {
        return Collections.singletonMap("name", thingInCommon.getProperty("name"));
    }
}

RandomPeople

Whilst we're at it, we will also build the other SingleScoreRecommendationEngine that we'll need to satisfy requirement (9). Notice that we are overriding the participationPolicy method to specify that this engine should only be employed if there aren't enough results and there is time left.

/**
 * {@link com.graphaware.reco.neo4j.engine.RandomRecommendations} selecting random nodes with the "Person" label.
 */
public class RandomPeople extends RandomRecommendations {

    @Override
    public String name() {
        return "random";
    }

    @Override
    protected NodeInclusionPolicy getPolicy() {
        return new BaseNodeInclusionPolicy() {
            @Override
            public boolean include(Node node) {
                return node.hasLabel(DynamicLabel.label("Person"));
            }
        };
    }

    @Override
    public ParticipationPolicy<Node, Node> participationPolicy(Context<Node, Node> context) {
        // Only take part if the other engines found too few results and there is time left
        return ParticipationPolicy.IF_MORE_RESULTS_NEEDED_AND_ENOUGH_TIME;
    }
}

RewardSameLocation and RewardSameLabels

We will tackle requirements (3) and (4) by implementing some PostProcessors rather than separate RecommendationEngines. The reason is mainly performance; we do not want to suggest everyone who lives in the same city or who is of the same gender. Instead, we will reward already discovered recommendations for living in the same city or being of the same gender, by the following two classes:

/**
 * Rewards same location by 10 points.
 */
public class RewardSameLocation extends RewardSomethingShared {

    @Override
    protected String name() {
        return "sameLocation";
    }

    @Override
    protected RelationshipType type() {
        return LIVES_IN;
    }

    @Override
    protected Direction direction() {
        return OUTGOING;
    }

    @Override
    protected PartialScore partialScore(Node recommendation, Node input, Node sharedThing) {
        return new PartialScore(10, Collections.singletonMap("location", sharedThing.getProperty("name")));
    }
}

/**
 * Rewards same gender (exactly the same labels) by 10 points.
 */
public class RewardSameLabels extends BasePostProcessor<Node, Node> {

    @Override
    protected String name() {
        return "sameGender";
    }

    @Override
    protected void doPostProcess(Recommendations<Node> recommendations, Node input, Context<Node, Node> context) {
        Label[] inputLabels = toArray(Label.class, input.getLabels());

        for (Recommendation<Node> recommendation : recommendations.get()) {
            if (Arrays.equals(inputLabels, toArray(Label.class, recommendation.getItem().getLabels()))) {
                recommendation.add(name(), 10);
            }
        }
    }
}

Please note that we have chosen to provide the shared location's name as details to PartialScore, so that it can eventually be exposed to users.

PenalizeAgeDifference

Another PostProcessor will take care of requirement (5). It subtracts at most 10 points from the relevance score, with 80% of that maximum (8 points) being subtracted when the difference in age is 20 years.

/**
 * Subtracts points for difference in age. The maximum number of points subtracted is 10 and 80% of that is achieved
 * when the difference is 20 years.
 */
public class PenalizeAgeDifference extends BasePostProcessor<Node, Node> {

    private final TransformationFunction function = new ParetoFunction(10, 20);

    @Override
    protected String name() {
        return "ageDifference";
    }

    @Override
    protected void doPostProcess(Recommendations<Node> recommendations, Node input, Context<Node, Node> context) {
        int age = getInt(input, "age", 40);

        for (Recommendation<Node> reco : recommendations.get()) {
            int diff = Math.abs(getInt(reco.getItem(), "age", 40) - age);
            reco.add(name(), -function.transform(diff));
        }
    }
}

Blacklist Builders and Filters

We could build custom BlacklistBuilders and Filters as well to satisfy requirements (6) and (7), but we will just use classes already provided by the library, as we will see shortly.
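Requirement (8) is not covered by the library classes used for requirements (6) and (7). One way to express it is as a filter whose age threshold can be overridden per computation through the Config's key-value pairs. The sketch below uses plain Java rather than the library's Filter interface; the class name `ExcludeYoungPeopleSketch`, the `youngAge` key and the default threshold passed to the constructor are hypothetical.

```java
import java.util.Map;

/**
 * Hypothetical sketch for requirement (8): reject candidates younger than a
 * threshold that can be overridden per computation. Not the library's Filter API.
 */
final class ExcludeYoungPeopleSketch {
    /** Hypothetical per-computation config key holding the "young" age threshold. */
    static final String YOUNG_AGE_KEY = "youngAge";

    private final int defaultThreshold;

    ExcludeYoungPeopleSketch(int defaultThreshold) {
        this.defaultThreshold = defaultThreshold;
    }

    /** Returns true if the candidate may be recommended, i.e. is not "young". */
    boolean include(int candidateAge, Map<String, Object> perComputationConfig) {
        Object override = perComputationConfig.get(YOUNG_AGE_KEY);
        int threshold = override instanceof Number ? ((Number) override).intValue() : defaultThreshold;
        return candidateAge >= threshold;
    }
}
```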

Putting it all together

Now that we have all the components, we just need to combine them into a TopLevelRecommendationEngine:

/**
 * {@link com.graphaware.reco.neo4j.engine.Neo4jTopLevelDelegatingRecommendationEngine} that computes friend recommendations.
 */
public class FriendsComputingEngine extends Neo4jTopLevelDelegatingRecommendationEngine {

    @Override
    protected List<RecommendationEngine<Node, Node>> engines() {
        return Arrays.<RecommendationEngine<Node, Node>>asList(
                new FriendsInCommon(),
                new RandomPeople()
        );
    }

    @Override
    protected List<PostProcessor<Node, Node>> postProcessors() {
        return Arrays.<PostProcessor<Node, Node>>asList(
                new RewardSameLabels(),
                new RewardSameLocation(),
                new PenalizeAgeDifference()
        );
    }

    @Override
    protected List<BlacklistBuilder<Node, Node>> blacklistBuilders() {
        return Arrays.<BlacklistBuilder<Node, Node>>asList(
                new ExistingRelationshipBlacklistBuilder(FRIEND_OF, BOTH)
        );
    }

    @Override
    protected List<Filter<Node, Node>> filters() {
        return Arrays.<Filter<Node, Node>>asList(
                new ExcludeSelf()
        );
    }
}
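Conceptually, a top-level engine like this runs its parts in a fixed order. The plain-Java sketch below assumes that order: run each engine, keep only candidates the filtering check allows and tally their scores, apply post-processors, then sort by score and cut to the limit. `PipelineSketch` is hypothetical and simplifies the library's actual flow (for example, it ignores time budgets and participation policies).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.BiPredicate;
import java.util.function.Function;

/** Hypothetical sketch of a top-level recommendation pipeline (not the library's code). */
final class PipelineSketch {
    static List<Map.Entry<String, Float>> recommend(
            String input,
            List<Function<String, Map<String, Float>>> engines,
            BiPredicate<String, String> allowed,
            List<BiConsumer<String, Map<String, Float>>> postProcessors,
            int limit) {
        Map<String, Float> scores = new HashMap<>();
        // 1. Discovery: each engine contributes scores; allowed candidates are tallied
        for (Function<String, Map<String, Float>> engine : engines) {
            for (Map.Entry<String, Float> e : engine.apply(input).entrySet()) {
                if (allowed.test(input, e.getKey())) {
                    scores.merge(e.getKey(), e.getValue(), Float::sum);
                }
            }
        }
        // 2. Post-processing: adjust scores of already discovered candidates only
        for (BiConsumer<String, Map<String, Float>> postProcessor : postProcessors) {
            postProcessor.accept(input, scores);
        }
        // 3. Ranking: highest total score first, cut to the requested number
        List<Map.Entry<String, Float>> ranked = new ArrayList<>(scores.entrySet());
        ranked.sort(Map.Entry.<String, Float>comparingByValue().reversed());
        return ranked.subList(0, Math.min(limit, ranked.size()));
    }
}
```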

A quick integration test

In this example, we have neglected unit testing altogether, which, of course, you shouldn't do. We will build a simple integration test though in order to smoke-test our brand new recommendation engine.

public class ModuleIntegrationTest extends WrappingServerIntegrationTest {

    private Neo4jTopLevelDelegatingEngine recommendationEngine;
    private RecommendationsRememberingLogger rememberingLogger = new RecommendationsRememberingLogger();

    @Override
    public void setUp() throws Exception {
        super.setUp();
        recommendationEngine = new FriendsRecommendationEngine();
        rememberingLogger.clear();
    }

    @Override
    protected void populateDatabase(GraphDatabaseService database) {
        database.execute(
                "CREATE " +
                        "(m:Person:Male {name:'Michal', age:30})," +
                        "(d:Person:Female {name:'Daniela', age:20})," +
                        "(v:Person:Male {name:'Vince', age:40})," +
                        "(a:Person:Male {name:'Adam', age:30})," +
                        "(l:Person:Female {name:'Luanne', age:25})," +
                        "(b:Person:Male {name:'Christophe', age:60})," +
                        "(j:Person:Male {name:'Jim', age:38})," +

                        "(lon:City {name:'London'})," +
                        "(mum:City {name:'Mumbai'})," +
                        "(br:City {name:'Bruges'})," +

                        "(m)-[:FRIEND_OF]->(d)," +
                        "(m)-[:FRIEND_OF]->(l)," +
                        "(m)-[:FRIEND_OF]->(a)," +
                        "(m)-[:FRIEND_OF]->(v)," +
                        "(d)-[:FRIEND_OF]->(v)," +
                        "(b)-[:FRIEND_OF]->(v)," +
                        "(j)-[:FRIEND_OF]->(v)," +
                        "(j)-[:FRIEND_OF]->(m)," +
                        "(j)-[:FRIEND_OF]->(a)," +
                        "(a)-[:LIVES_IN]->(lon)," +
                        "(d)-[:LIVES_IN]->(lon)," +
                        "(v)-[:LIVES_IN]->(lon)," +
                        "(m)-[:LIVES_IN]->(lon)," +
                        "(j)-[:LIVES_IN]->(lon)," +
                        "(c)-[:LIVES_IN]->(br)," +
                        "(l)-[:LIVES_IN]->(mum)");
    }

    @Test
    public void shouldRecommendRealTime() {
        try (Transaction tx = getDatabase().beginTx()) {

            //verify Vince

            List<Recommendation<Node>> recoForVince = recommendationEngine.recommend(getPersonByName("Vince"), new SimpleConfig(2));

            String expectedForVince = "Computed recommendations for Vince: (Adam {total:41.99417, ageDifference:-5.527864, friendsInCommon: {value:27.522034, {value:1.0, name:Jim}, {value:1.0, name:Michal}}, sameGender:10.0, sameLocation: {value:10.0, {value:10.0, location:London}}}), (Luanne {total:7.856705, ageDifference:-7.0093026, friendsInCommon: {value:14.866008, {value:1.0, name:Michal}}})";

            assertEquals(expectedForVince, rememberingLogger.toString(getPersonByName("Vince"), recoForVince, null));
            assertEquals(expectedForVince, rememberingLogger.get(getPersonByName("Vince")));

            //verify Adam

            List<Recommendation<Node>> recoForAdam = recommendationEngine.recommend(getPersonByName("Adam"), new SimpleConfig(2));

            String expectedForAdam = "Computed recommendations for Adam: (Vince {total:41.99417, ageDifference:-5.527864, friendsInCommon: {value:27.522034, {value:1.0, name:Jim}, {value:1.0, name:Michal}}, sameGender:10.0, sameLocation: {value:10.0, {value:10.0, location:London}}}), (Daniela {total:19.338144, ageDifference:-5.527864, friendsInCommon: {value:14.866008, {value:1.0, name:Michal}}, sameLocation: {value:10.0, {value:10.0, location:London}}})";

            assertEquals(expectedForAdam, rememberingLogger.toString(getPersonByName("Adam"), recoForAdam, null));
            assertEquals(expectedForAdam, rememberingLogger.get(getPersonByName("Adam")));

            //verify Luanne

            List<Recommendation<Node>> recoForLuanne = recommendationEngine.recommend(getPersonByName("Luanne"), new SimpleConfig(4));

            assertEquals("Daniela", recoForLuanne.get(0).getItem().getProperty("name"));
            assertEquals(22, recoForLuanne.get(0).getScore().getTotalScore(), 0.5);

            assertEquals("Adam", recoForLuanne.get(1).getItem().getProperty("name"));
            assertEquals(12, recoForLuanne.get(1).getScore().getTotalScore(), 0.5);

            assertEquals("Jim", recoForLuanne.get(2).getItem().getProperty("name"));
            assertEquals(8, recoForLuanne.get(2).getScore().getTotalScore(), 0.5);

            assertEquals("Vince", recoForLuanne.get(3).getItem().getProperty("name"));
            assertEquals(8, recoForLuanne.get(3).getScore().getTotalScore(), 0.5);

            tx.success();
        }
    }

    private Node getPersonByName(String name) {
        return getDatabase().findNode(DynamicLabel.label("Person"), "name", name);
    }
}

Pre-Computed Recommendations

With FriendsComputingEngine, we have a full-blown recommendation engine and could have stopped right there. However, we would like to demonstrate the capability of using the very same engine to pre-compute recommendations.

It is worth mentioning that in this simple example, exactly the same recommendations will be pre-computed as would have been computed in real-time. In real-life scenarios, however, RecommendationEngines can choose to perform a quicker computation in real time, but take a slower and more accurate approach for batch computations. How long a computation may take can be passed to the recommend method of a RecommendationEngine as part of its Config. It is then available from the Context object, which also reports the total time already elapsed.

Pre-Computing

In order for our FriendsComputingEngine to be used to pre-compute recommendations when the database isn't busy, the only thing we need to do is modify neo4j.properties. We're assuming that we are running in server mode and that the following .jar files have been placed into the plugins directory of your Neo4j installation:

  • GraphAware Framework Server (Community / Enterprise)
  • GraphAware Neo4j Reco (this library)
  • Your code developed as part of this tutorial

Add the following lines to neo4j.properties:

#Enable GraphAware Runtime
com.graphaware.runtime.enabled=true

#Register the Recommendation Module
com.graphaware.module.reco.1=com.graphaware.reco.neo4j.module.RecommendationModuleBootstrapper

#Express for which nodes recommendations should be computed
com.graphaware.module.reco.node=hasLabel('Person')

#Define which Recommendation Engine to use
com.graphaware.module.reco.engine=com.graphaware.reco.integration.FriendsComputingEngine

#Optionally, specify how many recommendations to compute (default is 10)
com.graphaware.module.reco.maxRecommendations=5

#Optionally, specify the Relationship Type of the relationship linking people with their recommended friends (default is RECOMMEND)
com.graphaware.module.reco.relationshipType=RECOMMEND

That's all. You can tweak how often the pre-computation kicks in and what it means for your database to be busy. Please refer to the documentation of GraphAware Timer-Driven Modules to learn how to do that.
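The settings above are ordinary Java properties. The following sketch reads them with java.util.Properties and applies the defaults stated in the comments (10 recommendations, RECOMMEND relationships). RecoSettings is a hypothetical helper for illustration only; the module parses its own configuration internally and this is not its code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/**
 * Hypothetical reader for the reco settings: it only illustrates which keys
 * are involved and the documented defaults.
 */
final class RecoSettings {
    static final String PREFIX = "com.graphaware.module.reco.";

    final String engineClass;
    final int maxRecommendations;
    final String relationshipType;

    private RecoSettings(String engineClass, int maxRecommendations, String relationshipType) {
        this.engineClass = engineClass;
        this.maxRecommendations = maxRecommendations;
        this.relationshipType = relationshipType;
    }

    static RecoSettings from(Properties props) {
        String engine = props.getProperty(PREFIX + "engine");
        if (engine == null || engine.trim().isEmpty()) {
            throw new IllegalArgumentException("No recommendation engine configured");
        }
        // Defaults as documented: 10 recommendations, RECOMMEND relationships.
        int max = Integer.parseInt(props.getProperty(PREFIX + "maxRecommendations", "10").trim());
        String relType = props.getProperty(PREFIX + "relationshipType", "RECOMMEND").trim();
        return new RecoSettings(engine.trim(), max, relType);
    }

    static RecoSettings parse(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText)); // lines starting with '#' are comments
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return from(props);
    }
}
```

Failing fast when no engine class is configured mirrors the fact that com.graphaware.module.reco.engine is the one setting without a stated default.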

Using Pre-Computed Recommendations

To serve pre-computed recommendations first, and only compute recommendations in real time when there aren't enough of them, we need to make a few tweaks to our setup. First, we override one more method in FriendsComputingEngine to indicate that it should only participate if there aren't enough pre-computed recommendations:

/**
 * {@link com.graphaware.reco.neo4j.engine.Neo4jTopLevelDelegatingRecommendationEngine} that computes friend recommendations.
 */
public class FriendsComputingEngine extends Neo4jTopLevelDelegatingRecommendationEngine {

    @Override
    protected List<RecommendationEngine<Node, Node>> engines() {
        return Arrays.<RecommendationEngine<Node, Node>>asList(
                new FriendsInCommon(),
                new RandomPeople()
        );
    }

    @Override
    protected List<PostProcessor<Node, Node>> postProcessors() {
        return Arrays.<PostProcessor<Node, Node>>asList(
                new RewardSameLabels(),
                new RewardSameLocation(),
                new PenalizeAgeDifference()
        );
    }

    @Override
    protected List<BlacklistBuilder<Node, Node>> blacklistBuilders() {
        return Arrays.<BlacklistBuilder<Node, Node>>asList(
                new ExistingRelationshipBlacklistBuilder(FRIEND_OF, BOTH)
        );
    }

    @Override
    protected List<Filter<Node, Node>> filters() {
        return Arrays.<Filter<Node, Node>>asList(
                new ExcludeSelf()
        );
    }

    @Override
    public ParticipationPolicy<Node, Node> participationPolicy(Context<Node, Node> context) {
        return ParticipationPolicy.IF_MORE_RESULTS_NEEDED;
    }
}
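The participationPolicy override above decides whether FriendsComputingEngine runs at all. The sketch below models that decision in isolation, assuming the policy is evaluated against the number of results already produced and the number requested. Participation is a hypothetical enum that mirrors the intent of ALWAYS and IF_MORE_RESULTS_NEEDED; it is not the library's ParticipationPolicy implementation.

```java
/**
 * Hypothetical model of two participation behaviours: ALWAYS runs the engine,
 * IF_MORE_RESULTS_NEEDED runs it only while fewer results than requested exist.
 */
enum Participation {
    ALWAYS,
    IF_MORE_RESULTS_NEEDED;

    /**
     * @param resultsSoFar number of recommendations already produced by earlier engines
     * @param limit        number of recommendations requested
     */
    boolean participates(int resultsSoFar, int limit) {
        switch (this) {
            case ALWAYS:
                return true;
            case IF_MORE_RESULTS_NEEDED:
                return resultsSoFar < limit;
            default:
                throw new IllegalStateException("Unknown policy " + this);
        }
    }
}
```

Under this model, with maxRecommendations=5 and five pre-computed recommendations already available, FriendsComputingEngine would be skipped entirely.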

Finally, we need a new top-level RecommendationEngine, exposed to our controllers or whichever component of the application consumes the recommendations. The new top-level engine first delegates to a Neo4jPrecomputedEngine, then to our FriendsComputingEngine. Since it is now the top-level engine, it is responsible for constructing Contexts, so the BlacklistBuilders and Filters must be provided to it as well.

/**
 * {@link com.graphaware.reco.neo4j.engine.Neo4jTopLevelDelegatingRecommendationEngine} that recommends friends by first trying to
 * read pre-computed recommendations from the graph, then (if there aren't enough results) by computing the friends in
 * real-time using {@link FriendsComputingEngine}.
 */
public final class FriendsRecommendationEngine extends Neo4jTopLevelDelegatingRecommendationEngine {

    @Override
    protected List<RecommendationEngine<Node, Node>> engines() {
        return Arrays.<RecommendationEngine<Node, Node>>asList(
                new Neo4jPrecomputedEngine(),
                new FriendsComputingEngine()
        );
    }

    @Override
    protected List<BlacklistBuilder<Node, Node>> blacklistBuilders() {
        return Arrays.<BlacklistBuilder<Node, Node>>asList(
                new ExistingRelationshipBlacklistBuilder(FRIEND_OF, BOTH)
        );
    }

    @Override
    protected List<Filter<Node, Node>> filters() {
        return Arrays.<Filter<Node, Node>>asList(
                new ExcludeSelf()
        );
    }
}
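The delegation order described above can be modelled without Neo4j. In this simplified sketch, scores and post-processing are omitted, a "delegate" is just a fixed list of candidate names, and the blacklist and filters are owned by the top level so they apply to every delegate's output. Delegate, TopLevelSketch, EXCLUDE_SELF and the demo data are hypothetical; only the ordering and the "only if more results are needed" rule follow the text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiPredicate;

/** Hypothetical delegate: a fixed list of candidates plus a participation flag. */
final class Delegate {
    final List<String> candidates;
    final boolean onlyIfMoreResultsNeeded;

    Delegate(List<String> candidates, boolean onlyIfMoreResultsNeeded) {
        this.candidates = candidates;
        this.onlyIfMoreResultsNeeded = onlyIfMoreResultsNeeded;
    }
}

/** Simplified model of a top-level delegating engine (no scoring, no post-processing). */
final class TopLevelSketch {
    // Counterpart of ExcludeSelf: keep a candidate only if it differs from the input.
    static final BiPredicate<String, String> EXCLUDE_SELF = (input, candidate) -> !input.equals(candidate);

    static List<String> recommend(String input, List<Delegate> delegates, Set<String> blacklist,
                                  List<BiPredicate<String, String>> filters, int limit) {
        Set<String> results = new LinkedHashSet<>(); // keeps delegate order, drops duplicates
        for (Delegate delegate : delegates) {
            // Skip an "if more results needed" delegate once earlier ones produced enough.
            if (delegate.onlyIfMoreResultsNeeded && results.size() >= limit) {
                continue;
            }
            for (String candidate : delegate.candidates) {
                if (blacklist.contains(candidate)) {
                    continue; // e.g. an existing FRIEND_OF relationship
                }
                boolean kept = true;
                for (BiPredicate<String, String> filter : filters) {
                    kept = kept && filter.test(input, candidate);
                }
                if (kept) {
                    results.add(candidate);
                }
            }
        }
        List<String> ordered = new ArrayList<>(results);
        return ordered.subList(0, Math.min(limit, ordered.size()));
    }

    // Usage: Alice is already friends with Dave; Bob and Carol are pre-computed.
    static List<String> demo(int limit) {
        List<Delegate> delegates = Arrays.asList(
                new Delegate(Arrays.asList("Bob", "Carol"), false),          // pre-computed
                new Delegate(Arrays.asList("Alice", "Dave", "Eve"), true));  // computed in real time
        Set<String> blacklist = new HashSet<>(Arrays.asList("Dave"));
        return recommend("Alice", delegates, blacklist, Arrays.asList(EXCLUDE_SELF), limit);
    }
}
```

With a limit of 3, the real-time delegate runs and contributes Eve (Alice is excluded as herself, Dave is blacklisted); with a limit of 2, the two pre-computed results suffice and it is skipped.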

Logging

To record the recommendations produced, add one or more Logger implementations (the provided ones or your own) to the top-level engine, e.g.:

/**
 * {@link com.graphaware.reco.neo4j.engine.Neo4jTopLevelDelegatingRecommendationEngine} that recommends friends by first trying to
 * read pre-computed recommendations from the graph, then (if there aren't enough results) by computing the friends in
 * real-time using {@link FriendsComputingEngine}.
 */
public final class FriendsRecommendationEngine extends Neo4jTopLevelDelegatingRecommendationEngine {

    @Override
    protected List<RecommendationEngine<Node, Node>> engines() {
        return Arrays.<RecommendationEngine<Node, Node>>asList(
                new Neo4jPrecomputedEngine(),
                new FriendsComputingEngine()
        );
    }

    @Override
    protected List<BlacklistBuilder<Node, Node>> blacklistBuilders() {
        return Arrays.<BlacklistBuilder<Node, Node>>asList(
                new ExistingRelationshipBlacklistBuilder(FRIEND_OF, BOTH)
        );
    }

    @Override
    protected List<Filter<Node, Node>> filters() {
        return Arrays.<Filter<Node, Node>>asList(
                new ExcludeSelf()
        );
    }

    @Override
    protected List<Logger<Node, Node>> loggers() {
        return Arrays.<Logger<Node, Node>>asList(
                new Slf4jRecommendationLogger<Node, Node>(),
                new Slf4jStatisticsLogger<Node, Node>()
        );
    }
}
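If the provided SLF4J loggers don't fit, a custom logger only has to turn each request and its results into a log record. The exact signature of the library's Logger interface isn't shown in this README, so the sketch below uses a hypothetical RecoLogger interface; LineRecoLogger and LoggerDemo are likewise illustrative names, not library classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical logger contract; the library's own Logger interface may look different. */
interface RecoLogger<IN, OUT> {
    void log(IN input, List<OUT> recommendations, long elapsedMillis);
}

/** Formats one line per request and hands it to a sink, e.g. System.out::println. */
final class LineRecoLogger<IN, OUT> implements RecoLogger<IN, OUT> {
    private final Consumer<String> sink;

    LineRecoLogger(Consumer<String> sink) {
        this.sink = sink;
    }

    @Override
    public void log(IN input, List<OUT> recommendations, long elapsedMillis) {
        sink.accept("Recommended " + recommendations + " for " + input + " in " + elapsedMillis + " ms");
    }
}

final class LoggerDemo {
    // Collects log lines in memory so they can be inspected.
    static List<String> run() {
        List<String> lines = new ArrayList<>();
        RecoLogger<String, String> logger = new LineRecoLogger<String, String>(lines::add);
        logger.log("Alice", Arrays.asList("Bob", "Carol"), 12L);
        return lines; // ["Recommended [Bob, Carol] for Alice in 12 ms"]
    }
}
```

Passing the sink in as a Consumer keeps the formatting testable and lets the same logger write to SLF4J, a file or a metrics pipeline.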

Job done!

License

Copyright (c) 2020 GraphAware

GraphAware is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.