
Imagine I have graph data that is beyond the size of a single machine.

How would you shard a graph database?

I asked on Hacker News, and people suggested sharding on a hash of the subject-predicate-object triple, à la RDF.
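
To make that suggestion concrete, here is a minimal sketch of how I understand it. The names `shard_for`, `shard_by_triple`, `shard_by_subject` and the value of `NUM_SHARDS` are mine, purely for illustration. It contrasts hashing the whole triple with hashing only the subject, which is one possible way to keep a vertex's outgoing edges together.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical cluster size, chosen only for illustration


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard index using a stable hash.

    Python's built-in hash() is salted per process for strings, so a
    cryptographic digest is used here to get the same answer on every machine.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shard_by_triple(s: str, p: str, o: str, num_shards: int = NUM_SHARDS) -> int:
    # Hash the whole triple: load spreads evenly, but the edges of a single
    # vertex end up scattered across shards.
    return shard_for("\x00".join((s, p, o)), num_shards)


def shard_by_subject(s: str, p: str, o: str, num_shards: int = NUM_SHARDS) -> int:
    # Hash only the subject: every outgoing edge of a vertex lands on the
    # same shard, at the cost of hot spots for high-degree vertices.
    return shard_for(s, num_shards)


# Made-up example data.
triples = [
    ("alice", "knows", "bob"),
    ("alice", "likes", "graphs"),
    ("bob", "knows", "carol"),
]

shards = {i: [] for i in range(NUM_SHARDS)}
for triple in triples:
    shards[shard_by_subject(*triple)].append(triple)
```

With subject hashing, both of `alice`'s triples are stored together, so a one-hop lookup from `alice` touches a single shard. With whole-triple hashing there is no such guarantee.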

I could partition the graph across replicas and query all replicas to aggregate answers.

But what about joins across machines?

Do I need to colocate join keys on all replicas?

Or do I just send the query to all replicas in parallel and aggregate the responses?
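
To illustrate the two options I am weighing, here is a toy, single-process sketch. It assumes triples are placed by a hash of the subject, and each `Shard` object stands in for one replica. `Shard`, `broadcast_hop`, `routed_hop` and `friends_of_friends` are hypothetical names, and the data is made up. A two-hop traversal is effectively a join, so each hop either fans out to every shard or is routed only to the shards that own the join keys.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

NUM_SHARDS = 3  # hypothetical cluster size


def shard_for(key, num_shards=NUM_SHARDS):
    """Stable hash of a key to a shard index."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class Shard:
    """Stand-in for one machine; holds the triples whose subject hashes to it."""

    def __init__(self):
        self.by_subject = {}

    def add(self, s, p, o):
        self.by_subject.setdefault(s, []).append((p, o))

    def objects(self, subjects, predicate):
        """Local lookup: objects reachable from any given subject via predicate."""
        return {
            o
            for s in subjects
            for (p, o) in self.by_subject.get(s, [])
            if p == predicate
        }


shards = [Shard() for _ in range(NUM_SHARDS)]
for s, p, o in [
    ("alice", "knows", "bob"),
    ("alice", "knows", "dave"),
    ("bob", "knows", "carol"),
    ("dave", "knows", "erin"),
]:
    shards[shard_for(s)].add(s, p, o)


def broadcast_hop(subjects, predicate):
    """Scatter-gather: ask every shard in parallel and union the partial answers."""
    with ThreadPoolExecutor(max_workers=NUM_SHARDS) as pool:
        parts = pool.map(lambda shard: shard.objects(subjects, predicate), shards)
        return set().union(*parts)


def routed_hop(subjects, predicate):
    """Routed: group the join keys by owning shard and contact only those shards."""
    keys_by_shard = {}
    for s in subjects:
        keys_by_shard.setdefault(shard_for(s), set()).add(s)
    result = set()
    for index, keys in keys_by_shard.items():
        result |= shards[index].objects(keys, predicate)
    return result


def friends_of_friends(start, hop):
    """Two-hop traversal; the output of hop one is the join key for hop two."""
    return hop(hop({start}, "knows"), "knows")


print(friends_of_friends("alice", broadcast_hop))
print(friends_of_friends("alice", routed_hop))
```

Both strategies return the same set. The difference is cost: the broadcast version contacts every shard on every hop, while the routed version contacts at most one shard per distinct join key, but only works because the placement function is known to the query layer.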

  • You haven't mentioned a database vendor, but I imagine sharding should be mostly transparent to you. Joining, querying — all those operations should be handled by the database so your code doesn't need to. Can you [edit] your question to include more information about the database vendor you are using? Bear in mind that questions about how to configure a database are likely off-topic. Tool questions belong on StackOverflow or another relevant community. – Greg Burghardt May 25 '23 at 11:21
  • I am the database vendor, in other words I am experimenting with database internals and I am curious about the implementation techniques of sharding graph data. – Samuel Squire May 26 '23 at 11:17
  • I see. This is more a database architecture question. I see two close-votes for "needing focus". Don't let multiple question marks fool you. You cannot properly shard data unless you consider the other questions in this post. – Greg Burghardt May 26 '23 at 12:11
