Graph databases: top 6 setups and configurations

A core concern for web developers and software engineers designing distributed applications is avoiding heavily congested network traffic.
The aim is to prevent bottlenecks on the public network that the web app or system runs on and to keep data handling flexible. It can also help blunt DDoS attempts against a web app account.
This is usually achieved with the caching and streaming mechanisms that ship with ready-made web app and content hosting tech stacks.
A major part of online content hosting infrastructure consists of graph-based distributed schemas and the graph database technologies that support them.
In brief, graph databases follow the basic rules below, which also mark their differences from traditional databases:
Graph databases use nodes and edges to represent data relationships, whereas a relational database uses tables, keys, and the columns that reference those keys.
A graph database has a flexible data schema (like NoSQL databases), so data types can be added or modified easily, whereas in a relational database the structure of a data set (data types, relations, etc.) cannot change as easily.
Graph databases are mostly used in recommendation systems, in ML and AI-driven systems for social networks such as Facebook, in fraud detection, and so on. Traditional database structures are more widespread and suit domains with stricter data handling, such as data warehousing and BI/analytics systems.
In terms of learning curve, graph database concepts are harder for early-career developers to grasp, whereas relational databases are easier to teach and are used by most people in general.
An example of how a graph database can be used in a real-life application can be found below:
Legend for the example: arrows are the graph database edges and circles are the nodes of the graph (a node is also called a vertex). The smallest multi-hop path is a V-shape in the graph, made of at least three nodes (circles) and two edges (arrows).
Querying a graph-based entity structure works as follows:
Retrieving the tutor of student A from STUDENTS_REGISTRY requires the classroom-to-tutor edge (together with the classroom and tutor nodes, as operation1) and the tutor-to-studentsregistry edge (together with the tutor and studentsregistry nodes, as operation2).
The set of nodes and edges for operation1 is called a graph link (E1) and links a tutor node to a classroom node. The set that links a tutor node to a studentsregistry node is graph link (E2). This chain can continue up to length N, covering all the nodes and edges in the graph.
Because these relations chain one after another, such a lookup amounts to a traversal that follows one link at a time. Some engines express the same traversal as linear algebra operations on an adjacency matrix.
This is an oversimplified view of how an actual graph database engine implements it.
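To make the two operations more concrete, here is a minimal in-memory sketch in Python. It is not how a real graph engine stores data; the `Graph` class, the edge labels `HAS_TUTOR` and `TEACHES`, and the node names are all assumptions made up for illustration.

```python
from collections import defaultdict


class Graph:
    """Tiny in-memory directed graph with labelled edges (illustrative only)."""

    def __init__(self):
        # node -> list of (edge_label, target_node)
        self._out = defaultdict(list)

    def add_edge(self, source, label, target):
        self._out[source].append((label, target))

    def neighbors(self, node, label):
        """Nodes reachable from `node` over one edge carrying `label`."""
        return [target for (edge_label, target) in self._out[node] if edge_label == label]


def tutors_of_student(graph, classroom, student):
    """Follow E1 (classroom -> tutor), then E2 (tutor -> registry entry)."""
    return [
        tutor
        for tutor in graph.neighbors(classroom, "HAS_TUTOR")  # operation1
        if student in graph.neighbors(tutor, "TEACHES")       # operation2
    ]


# Hypothetical sample data
g = Graph()
g.add_edge("classroom_1", "HAS_TUTOR", "tutor_smith")
g.add_edge("classroom_1", "HAS_TUTOR", "tutor_jones")
g.add_edge("tutor_smith", "TEACHES", "student_A")
g.add_edge("tutor_jones", "TEACHES", "student_B")

print(tutors_of_student(g, "classroom_1", "student_A"))  # ['tutor_smith']
```

Each hop only looks at the edges attached to the current node, which is the main reason relationship-heavy queries tend to be cheap in a graph model.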
More info on how graphs work can be found here...
...and here for graph databases
NOTE THAT: The example uses a digraph (directed graph). Directed edges are also what binary tree data structures use, and they indicate a stricter layout of information.
If an undirected graph were used, there would be no arrows giving a direction to the relations between the entities (nodes) of the system. This would indicate a more flexible data relationship schema. In practice, many graph databases (Neo4j among them) store directed relationships but let queries traverse them in either direction.
Whether a directed or an undirected graph database is used as the underlying data source, each has its perks and disadvantages in terms of workload handling, flexibility of data structuring, and exposure to DoS attacks directed at the data structures.
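The difference between the two kinds of graph can be sketched in a few lines of Python. The helper `build_adjacency` and the node names are hypothetical and only serve to show that an undirected edge is stored in both directions.

```python
def build_adjacency(edges, directed):
    """Build a node -> set-of-neighbours map from (source, target) pairs."""
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set())
        if not directed:
            # An undirected edge can be followed both ways.
            adjacency[target].add(source)
    return adjacency


edges = [("classroom", "tutor"), ("tutor", "students_registry")]

directed = build_adjacency(edges, directed=True)
undirected = build_adjacency(edges, directed=False)

print(sorted(directed["tutor"]))    # ['students_registry']
print(sorted(undirected["tutor"]))  # ['classroom', 'students_registry']
```

In the directed version the tutor node cannot reach the classroom, while in the undirected version it can.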
With that background, here is a list of the top 6 graph-based backing stores used by many content delivery/hosting stacks for distributing web content online:
1. Neo4j DB
One of the most popular graph database frameworks. It has its own query language and engine (known as Cypher); the open form of the language, openCypher, is also implemented by some proprietary products such as SAP HANA Graph.
It supports CSV imports, database interoperability, and both cloud-hosted and self-hosted deployments.
Aside from a large knowledge base with detailed documentation and tutorials, it also has video tutorials and guided learning that explain how to use its wide range of features.
It also supports query and graph visualization through its dashboard tool, NeoDash, and a local IDE called Neo4j Desktop.
General Info
Important features: Self-hosting via Docker, Kubernetes, or AWS/GCP/MS Azure Cloud, or fully managed cloud hosting via Neo4j AuraDB. It also includes IDE support via Neo4j Desktop and NeoDash for visualization.
CVEs (Common Vulnerabilities and Exposures) are also covered in its knowledge base and continuously checked by their open-source platform.
Where is applicable: General-purpose graph database, preferable for structuring LLM and machine learning workloads for business-level analysis.
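As a rough idea of what Cypher looks like, the sketch below builds a parameterized query for the tutor/student example from earlier. The labels (`Classroom`, `Tutor`, `Student`), relationship types (`HAS_TUTOR`, `TEACHES`), and the `name` property are assumptions invented for this example, not part of any real dataset.

```python
# Cypher pattern: (classroom)-[:HAS_TUTOR]->(tutor)-[:TEACHES]->(student)
TUTOR_QUERY = (
    "MATCH (c:Classroom {name: $classroom})"
    "-[:HAS_TUTOR]->(t:Tutor)"
    "-[:TEACHES]->(s:Student {name: $student}) "
    "RETURN t.name AS tutor"
)


def tutor_query(classroom, student):
    """Return the Cypher text and its parameters (hypothetical schema)."""
    return TUTOR_QUERY, {"classroom": classroom, "student": student}


query, params = tutor_query("classroom_1", "student_A")
print(query)
print(params)
```

With the official `neo4j` Python driver, a query and parameter dictionary like this would typically be passed to `session.run(query, params)`. That call needs a running database, so it is left out of the sketch.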
2. GraphDB
One of the most familiar graph databases, with built-in support for RDF schemas and SPARQL. This makes it easy to use with ontological frameworks/ontologies in more context-driven machine learning scenarios for supervised learning. It also supports clustered operation with built-in memory caching and Docker/Kubernetes/Helm deployments. The Kubernetes templates allow a higher level of detail and complexity for self-hosted instances.
It also includes a monitoring panel with partial SPARQL query syntax support, called GraphDB Workbench. Alongside its coding and administrative features, it includes a set of CLI commands that cover a wide range of operations, such as RDF schema validation.
Another important feature is its fine-grained access control via LDAP/Kerberos and other authentication schemes, although this is a hassle to set up and has to be done manually. Instructions are mainly provided by the official documentation and the large Stack Overflow support base referenced on their website.
General Info
Important features: Built-in support for SPARQL and RDF schemas, clustered deployments, and self-hosting via Kubernetes/Docker/Helm. It also has a built-in monitoring and administration tool with small IDE features for queries.
Where is applicable: Machine learning with ontological frameworks, AI/ranking algorithms, and the relevant taxonomies.
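RDF stores data as subject-predicate-object triples, and a SPARQL query is essentially a pattern with variables matched against those triples. The pure-Python sketch below imitates that idea without any RDF library; the `ex:` names and the `match` helper are made up for illustration and this is not how GraphDB evaluates queries internally.

```python
def match(triples, subject=None, predicate=None, obj=None):
    """Return triples fitting the pattern; None plays the role of a SPARQL variable."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


# Hypothetical triples in a made-up "ex:" namespace
triples = [
    ("ex:tutor_smith", "rdf:type", "ex:Tutor"),
    ("ex:tutor_smith", "ex:teaches", "ex:student_A"),
    ("ex:tutor_jones", "rdf:type", "ex:Tutor"),
    ("ex:tutor_jones", "ex:teaches", "ex:student_B"),
]

# Roughly: SELECT ?tutor WHERE { ?tutor ex:teaches ex:student_A }
tutors = [s for (s, _, _) in match(triples, predicate="ex:teaches", obj="ex:student_A")]
print(tutors)  # ['ex:tutor_smith']
```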
3. ApolloGraphQL + GraphOS
One of the most popular graph platforms for projects that involve several dev teams, multiple deployments, and DevOps scenarios is Apollo GraphQL. Strictly speaking it is not a graph database but a GraphQL API layer: its GraphOS Router exposes a single, flexible GraphQL API in front of multiple microservices at once. Its scalable infrastructure makes it easy to use across many development teams and wherever CI/CD automation is required in larger enterprise projects.
It also includes an in-browser IDE that can be set up locally, and GraphOS Router or Apollo Router Core can be self-hosted on Docker or Kubernetes.
General Info
Important features: Full in-browser IDE with graph query execution plans and visualizations. Flexible CI/CD integration with SDKs and RESTful API connectors via GraphOS Router (with gRPC, SOAP, etc.). It also has a good support base and useful documentation.
Where is applicable: Its wide range of features makes it useful in almost any machine learning or ranking distributed application, from medium-sized to larger enterprises. It is also useful where MLOps (Machine Learning Operations) or DevOps is required in a large team of web and back-end developers (on either GitHub or GitLab).
4. TigerGraph DB
TigerGraph is a more proprietary, enterprise-level graph database product that is also backed by a knowledge base of detailed docs and tutorials.
The paid products on their website are provided as default enterprise-level packages, while community editions are available for free.
It also supports Docker self-hosting, with online instructions for both Ubuntu and CentOS environments. In addition, Kubernetes self-hosting is available on AWS (EKS), Google Cloud (GKE), Red Hat OpenShift, and MS Azure (AKS).
The online docs go into a lot of detail about running it directly on Linux hardware, including the minimum supported hardware. The Linux distros it currently supports are listed below (note: it does not run on Windows or macOS).
Operating System | Supported |
---|---|
Red Hat (RHEL) 7.0 to 8.9 | ✅ |
Red Hat (RHEL) 9 | ✅ |
CentOS 6.5 to 8.0 | ✅ |
Ubuntu 16.04 LTS | ✅ |
Ubuntu 18.04 LTS | ✅ |
Ubuntu 20.04 LTS | ✅ |
Ubuntu 22.04 LTS | ✅ |
Debian 10 & 11 | ✅ |
SUSE 12 | ✅ |
Oracle Linux 8.0 to 8.4 | ✅ |
Windows (all versions) | ❌ |
macOS (Intel and M1 chip) | ❌ |
In addition, it supports a RESTful API out of the box for data-driven operations (similar to Apollo GraphQL) and highly refined built-in RBAC controls with multiple user roles and access levels. Its built-in RESTPP capability adds many interesting features, including open metrics and status monitoring, handling of hyper vertices, and linear operations for edge/vertex retrieval together with the underlying path-finding algorithms.
Most of the above can be monitored through a full-blown web IDE, GraphStudio, and a management interface provided by the Admin Portal dashboard, which is also accessible from a web browser.
Overall, a decent product for graph-based operations that also supports high-availability clustered configurations wherever required.
General Info
Important features: The most important parts of TigerGraph are its vast documentation resources (forums, developer site, documentation with knowledge bases) and its highly refined built-in capabilities, such as RBAC controls, a REST API, and monitoring.
Where is applicable: It can support a wide range of machine learning model setups for both startups (community edition) and larger companies (enterprise edition), as well as general algorithmic use. It can also be set up and used easily for MLOps and DevOps, and it is usable for adding BI alongside mapping applications, such as map traversal via routing and the relevant path-finding algorithms.
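For readers new to path finding, the sketch below shows the general idea with a plain breadth-first search over an unweighted graph. It is a generic textbook technique written in Python, not TigerGraph's implementation or API, and the `roads` map is hypothetical.

```python
from collections import deque


def shortest_path(adjacency, start, goal):
    """Breadth-first search: path with the fewest hops, or None if unreachable."""
    previous = {start: None}  # node -> node it was first reached from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back from the goal to the start to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in adjacency.get(node, ()):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical road map: each key lists the places directly reachable from it
roads = {
    "depot": ["north_gate", "market"],
    "north_gate": ["harbor"],
    "market": ["harbor", "station"],
    "harbor": ["station"],
    "station": [],
}

print(shortest_path(roads, "depot", "station"))  # ['depot', 'market', 'station']
```

Real routing usually needs edge weights (distance, travel time), in which case an algorithm such as Dijkstra's would replace the plain queue.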
5. MongoDB + graph-based lookup pipeline stage
Another graph-capable database is MongoDB. Not only does it have NoSQL schema flexibility, but its aggregation pipeline also includes a stage that lets the data be processed as a graph.
This is done with MongoDB's $graphLookup aggregation pipeline stage, which performs the relevant traversal directly on the underlying dataset.
What is supported for this:
- All the high availability and optimization features that a MongoDB cluster currently supports on the managed Atlas Cloud.
- All the administration panels and relevant DBMS tools for Windows and Linux provided by the same platform.
- Everything currently supported in terms of CLI and self-hosting for MongoDB.
- Based on its online documentation, MongoDB matches key-value pairs per NoSQL document to build and return a graph edge or a vertex via its graph-based aggregation pipeline. More info on this can be found here.
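The key-value matching described above can be sketched as follows. The pipeline uses the documented $graphLookup options (`from`, `startWith`, `connectFromField`, `connectToField`, `as`, `maxDepth`, `depthField`), while the `people` collection and its `name`/`mentor` fields are assumptions invented for the example. The second function is only a rough Python emulation for a simple chain of documents, to show what the stage computes.

```python
def build_graph_lookup_pipeline(start_name, max_depth=3):
    """Aggregation pipeline that follows 'mentor' links starting at one person."""
    return [
        {"$match": {"name": start_name}},
        {
            "$graphLookup": {
                "from": "people",              # hypothetical collection
                "startWith": "$mentor",        # value to start the traversal from
                "connectFromField": "mentor",  # field followed on each hop
                "connectToField": "name",      # field it is matched against
                "as": "mentor_chain",
                "maxDepth": max_depth,
                "depthField": "hops",
            }
        },
    ]


def emulate_mentor_chain(docs, start_name, max_depth=3):
    """Rough emulation for single-valued links; real results hold full documents."""
    by_name = {doc["name"]: doc for doc in docs}
    chain, current, hops = [], by_name[start_name].get("mentor"), 0
    while current in by_name and hops <= max_depth:
        chain.append({"name": current, "hops": hops})
        current = by_name[current].get("mentor")
        hops += 1
    return chain


# Hypothetical documents
people = [
    {"name": "student_A", "mentor": "tutor_smith"},
    {"name": "tutor_smith", "mentor": "head_of_dept"},
    {"name": "head_of_dept", "mentor": None},
]

print(build_graph_lookup_pipeline("student_A")[1]["$graphLookup"]["connectToField"])  # name
print(emulate_mentor_chain(people, "student_A"))
```

With PyMongo, a pipeline like this would normally be passed to `collection.aggregate(pipeline)` on a live database.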
Even though MongoDB handles key-value pairs in NoSQL documents well in a graph-traversal manner, it is still not a fully fledged graph database. An additional enhancement is the use of labels to tag multiple nodes together, which makes node matching easier and more in-depth.
Support was also recently added for out-of-the-box connectivity with a GraphQL-enabled database, as stated in the documentation. MongoDB Atlas Vector Search is also supported on managed MongoDB Atlas instances.
General Info
Important features: All features currently supported by MongoDB, plus additional CLI tools, an administration panel, and related software. It has a vast knowledge base, but one that is not very informative on how to use $graphLookup in depth or for real-world test cases.
Where is applicable: Most types of applications that require graph-based capabilities on a backend that currently runs on NoSQL infrastructure. Not recommended for higher-level AI or ML frameworks and AI models, as it lacks many built-in graph handling features out of the box.
6. Azure CosmosDB for Apache Gremlin
One of the top fully cloud-hosted options for graph databases is Azure Cosmos DB for Apache Gremlin, managed directly on Azure Cloud.
It is part of the cloud-hosted NoSQL Cosmos DB service and handles graph data structures directly. Its query language, Gremlin, is the graph traversal language of Apache TinkerPop.
The main advantage of being part of Cosmos DB is full integration with Azure Cloud and the benefits that come with it. These include using Azure Cloud Shell for scripting with Gremlin directly in the cloud, as well as everything you get from running a managed database-as-a-service (DBaaS) Azure resource.
SDKs and scripting languages supported on Azure so far are Java, Python, and .NET, with a good number of examples in the online documentation.
Some small, open-source-only community templates are also available for self-hosting Gremlin/TinkerPop directly via Docker. Another good feature of the Gremlin graph on Azure Cloud is bulk data ingestion with its bulk execution library, although this only works via a built-in .NET library and has some prerequisites, such as Maven/OpenJDK Gremlin API access to your database.
General Info
Important features: Fully managed web browser scripting environment via Azure Cloud Shell, a huge support base from both Microsoft and the open-source Apache side, and a decent number of languages and corresponding SDKs.
Where is applicable: General-purpose graph-based features for multiple scenarios and usages. These can be map-based path finding or routing for geospatial applications, ranking or recommendation engines, and social network interactions and sentiment analysis for social network apps. Because of the built-in MS capabilities on Azure, it can also be used for MLOps and DevOps in larger development teams.
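To give a feel for Gremlin, the sketch below builds a traversal string for the tutor/student example, starting at the student and walking the incoming `teaches` edges back to the tutor. The vertex labels, the edge label, and the `name` property are assumptions made up for illustration, and passing values as bindings assumes the endpoint accepts parameterized queries.

```python
def tutors_of_student_query(student_name):
    """Return a Gremlin traversal string and its bindings (hypothetical schema)."""
    query = (
        "g.V().has('student', 'name', studentName)"  # find the student vertex
        ".in('teaches')"                             # follow incoming 'teaches' edges
        ".hasLabel('tutor')"                         # keep only tutor vertices
        ".values('name')"                            # return their names
    )
    return query, {"studentName": student_name}


query, bindings = tutors_of_student_query("student_A")
print(query)
print(bindings)
```

With the `gremlinpython` client, a string and bindings like these would typically be sent with `client.submit(query, bindings)`, which needs a live Gremlin endpoint and is therefore not part of the sketch.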
Bonus: Using a traditional relational database schema as a graph database without built-in features. Judging by the following online example and another, more detailed one here, this is the most time-consuming approach and should be a last resort if creating a graph schema is a must for your application.
This more challenging approach involves mapping the nodes and edges to the corresponding keys of a relational table, so that a pointer record indicates the location (and direction) of the next node in the graph-like structure. This is usually done in a from -> to layout.
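A minimal sketch of the from -> to layout, using Python's built-in `sqlite3` module and a recursive query to walk the edges. The table and column names (`nodes`, `edges`, `from_id`, `to_id`) and the sample rows are assumptions chosen for the example; other relational databases have their own recursive query syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE nodes (id TEXT PRIMARY KEY);
    CREATE TABLE edges (
        from_id TEXT NOT NULL REFERENCES nodes(id),
        to_id   TEXT NOT NULL REFERENCES nodes(id),
        PRIMARY KEY (from_id, to_id)
    );
    """
)
conn.executemany(
    "INSERT INTO nodes VALUES (?)",
    [("classroom",), ("tutor",), ("students_registry",)],
)
conn.executemany(
    "INSERT INTO edges VALUES (?, ?)",
    [("classroom", "tutor"), ("tutor", "students_registry")],
)


def reachable_from(conn, start, max_depth=10):
    """Follow from -> to pointers; max_depth keeps cycles from looping forever."""
    return conn.execute(
        """
        WITH RECURSIVE reach(id, depth) AS (
            SELECT to_id, 1 FROM edges WHERE from_id = ?
            UNION
            SELECT e.to_id, r.depth + 1
            FROM edges e JOIN reach r ON e.from_id = r.id
            WHERE r.depth < ?
        )
        SELECT id, MIN(depth) AS hops FROM reach GROUP BY id ORDER BY hops, id
        """,
        (start, max_depth),
    ).fetchall()


print(reachable_from(conn, "classroom"))  # [('tutor', 1), ('students_registry', 2)]
```

Every extra hop is another self-join on the `edges` table, which is a large part of why this approach gets slow and awkward as the graph grows.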
WARNING - This is only useful if you really must persist graph data in a conventional database schema and there is no other option. Do not count on this for production usage.
This is my exhaustive listing of graph-based database features and relevant usage. Feel free to provide other ideas/tech stacks or constructive criticism where needed.