Janusgraph as a productive environment for hundreds of millions of nodes #3781
Replies: 2 comments
-
JanusGraph can handle more than a quintillion edges and half as many vertices, so hundreds of millions of nodes is not that big a graph for it.
-
Also important to note is that JanusGraph OLAP operations depend on Apache Spark. For OLAP operations to be efficient, the entire graph has to be temporarily present in memory as Java objects, which might mean terabytes of cluster RAM in your case (assuming you need shuffling because the graph is globally connected). Also, a CQL storage backend will perform much better than HBase for OLAP operations because of the data partition sizes they use.
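For reference, an OLAP read from a CQL backend through SparkGraphComputer is driven by a Hadoop-Gremlin properties file. This is a minimal sketch along the lines of the sample `read-cql.properties` shipped with JanusGraph; the hostname, keyspace, and `spark.master` values are placeholders you would replace for your cluster:

```properties
# Expose the graph to TinkerPop as a HadoopGraph read via the CQL input format
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.cql.CqlInputFormat
gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat

# Storage backend the input format reads from (placeholders: adjust host/keyspace)
janusgraphmr.ioformat.conf.storage.backend=cql
janusgraphmr.ioformat.conf.storage.hostname=127.0.0.1
janusgraphmr.ioformat.conf.storage.cql.keyspace=janusgraph

# Spark settings; point spark.master at your cluster instead of local mode
spark.master=local[4]
spark.serializer=org.apache.spark.serializer.KryoSerializer
```

With such a file loaded in the Gremlin Console, OLAP traversals run via `graph.traversal().withComputer(SparkGraphComputer)`; the memory pressure described above comes from Spark materializing the vertex programs' state across the cluster.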
-
I work at a company that holds hundreds of millions of records for OLAP and other uses, currently implemented on AWS Neptune.
It's fair to say that the cost of this operation is HUGE, so I'm currently evaluating different in-house alternatives to alleviate costs and possibly implement optimizations for our data structure. Could JanusGraph efficiently handle a dataset of this size and operate on the number of nodes we currently use? What known limits does JanusGraph have?