# Neo4j Tutorial: Mastering Variable Length Relationships and Path Algorithms

Introduction Welcome to this in-depth tutorial on one of Neo4j's most powerful features: working with paths and relationships of variable length. In this guide, we'll explore how to traverse complex networks, find connections between entities, and discover optimal routes through your graph data. Graph databases excel at analyzing connected data, and Neo4j provides specialized syntax and algorithms for exploring these connections efficiently. Whether you're building a social network, knowledge graph, logistics system, or recommendation engine, the techniques we'll cover in this tutorial will be invaluable for uncovering insights from your interconnected data. By the end of this tutorial, you'll understand how to: Query relationships of variable lengths Find the shortest path between nodes Discover all shortest paths between nodes Apply these techniques to solve real-world problems Let's dive in and explore the fascinating world of graph traversal in Neo4j! Setting Up Our Sample Database For this tutorial, we'll work with a transportation network representing cities connected by roads. This example will help us understand how to find routes between locations, a common use case for path finding in graphs. Let's create our sample database: // Create City nodes CREATE (nyc:City {name: "New York", population: 8400000}) CREATE (la:City {name: "Los Angeles", population: 3900000}) CREATE (chicago:City {name: "Chicago", population: 2700000}) CREATE (houston:City {name: "Houston", population: 2300000}) CREATE (phoenix:City {name: "Phoenix", population: 1600000}) CREATE (philly:City {name: "Philadelphia", population: 1500000}) CREATE (dallas:City {name: "Dallas", population: 1300000}) CREATE (sf:City {name: "San Francisco", population: 880000}) CREATE (denver:City {name: "Denver", population: 710000}) CREATE (boston:City {name: "Boston", population: 690000}) CREATE (seattle:City {name: "Seattle", population: 730000}) CREATE (atlanta:City {name: "Atlanta", population: 500000}) CREATE (miami:City {name: "Miami", population: 450000}) CREATE (austin:City {name: "Austin", population: 950000}) CREATE (dc:City {name: "Washington DC", population: 690000}) // Create ROAD_TO relationships with distances in miles MATCH (nyc:City {name: "New York"}), (philly:City {name: "Philadelphia"}) CREATE (nyc)-[:ROAD_TO {distance: 95}]->(philly) CREATE (philly)-[:ROAD_TO {distance: 95}]->(nyc) MATCH (nyc:City {name: "New York"}), (boston:City {name: "Boston"}) CREATE (nyc)-[:ROAD_TO {distance: 215}]->(boston) CREATE (boston)-[:ROAD_TO {distance: 215}]->(nyc) MATCH (philly:City {name: "Philadelphia"}), (dc:City {name: "Washington DC"}) CREATE (philly)-[:ROAD_TO {distance: 140}]->(dc) CREATE (dc)-[:ROAD_TO {distance: 140}]->(philly) MATCH (dc:City {name: "Washington DC"}), (atlanta:City {name: "Atlanta"}) CREATE (dc)-[:ROAD_TO {distance: 600}]->(atlanta) CREATE (atlanta)-[:ROAD_TO {distance: 600}]->(dc) MATCH (atlanta:City {name: "Atlanta"}), (miami:City {name: "Miami"}) CREATE (atlanta)-[:ROAD_TO {distance: 660}]->(miami) CREATE (miami)-[:ROAD_TO {distance: 660}]->(atlanta) MATCH (atlanta:City {name: "Atlanta"}), (dallas:City {name: "Dallas"}) CREATE (atlanta)-[:ROAD_TO {distance: 780}]->(dallas) CREATE (dallas)-[:ROAD_TO {distance: 780}]->(atlanta) MATCH (dallas:City {name: "Dallas"}), (houston:City {name: "Houston"}) CREATE (dallas)-[:ROAD_TO {distance: 240}]->(houston) CREATE (houston)-[:ROAD_TO {distance: 240}]->(dallas) MATCH (houston:City {name: "Houston"}), (austin:City {name: "Austin"}) CREATE (houston)-[:ROAD_TO {distance: 165}]->(austin) CREATE (austin)-[:ROAD_TO {distance: 165}]->(houston) MATCH (dallas:City {name: "Dallas"}), (austin:City {name: "Austin"}) CREATE (dallas)-[:ROAD_TO {distance: 195}]->(austin) CREATE (austin)-[:ROAD_TO {distance: 195}]->(dallas) MATCH (dallas:City {name: "Dallas"}), (phoenix:City {name: "Phoenix"}) CREATE (dallas)-[:ROAD_TO {distance: 1065}]->(phoenix) CREATE (phoenix)-[:ROAD_TO {distance: 1065}]->(dallas) MATCH (phoenix:City {name: "Phoenix"}), (la:City {name: "Los Angeles"}) CREATE (phoenix)-[:ROAD_TO {distance: 370}]->(la) CREATE (la)-[:ROAD_TO {distance: 370}]->(phoenix) MATCH (la:City {name: "Los Angeles"}), (sf:City {name: "San Francisco"}) CREATE (la)-[:ROAD_TO {distance: 380}]->(sf) CREATE (sf)-[:ROAD_TO {distance: 380}]->(la) MATCH (sf:City {name: "San Francisco"}), (seattle:City {name: "Seattle"}) CREATE (sf)-[:ROAD_TO {distance: 810}]->(seattle) CREATE (seattle)-[:ROAD_TO {distance: 810}]->(sf) MATCH (seattle:City {name: "Seattle"}), (denver:City {name: "Denver"}) CREATE (seattle)-[:ROAD_TO {distance: 1300}]->(denver) CREATE (denver)-[:ROAD_TO {distance: 1300}]->(seattle) MATCH (denver:City {name: "Denver"}), (dallas:City {name: "Dallas"}) CREATE (denver)-[:ROAD_TO {distance: 790}]->(dallas) CREATE (dallas)-[:ROAD_TO {distance: 790}]->(denver) MATCH (denver:City {name: "Denver"}), (chicago:City {name: "Chicago"}) CREATE (denver)-[:ROAD_TO {

Apr 15, 2025 - 04:20
 0
# Neo4j Tutorial: Mastering Variable Length Relationships and Path Algorithms

Introduction

Welcome to this in-depth tutorial on one of Neo4j's most powerful features: working with paths and relationships of variable length. In this guide, we'll explore how to traverse complex networks, find connections between entities, and discover optimal routes through your graph data.

Graph databases excel at analyzing connected data, and Neo4j provides specialized syntax and algorithms for exploring these connections efficiently. Whether you're building a social network, knowledge graph, logistics system, or recommendation engine, the techniques we'll cover in this tutorial will be invaluable for uncovering insights from your interconnected data.

By the end of this tutorial, you'll understand how to:

  • Query relationships of variable lengths
  • Find the shortest path between nodes
  • Discover all shortest paths between nodes
  • Apply these techniques to solve real-world problems

Let's dive in and explore the fascinating world of graph traversal in Neo4j!

Setting Up Our Sample Database

For this tutorial, we'll work with a transportation network representing cities connected by roads. This example will help us understand how to find routes between locations, a common use case for path finding in graphs.

Let's create our sample database:

// Create City nodes
CREATE (nyc:City {name: "New York", population: 8400000})
CREATE (la:City {name: "Los Angeles", population: 3900000})
CREATE (chicago:City {name: "Chicago", population: 2700000})
CREATE (houston:City {name: "Houston", population: 2300000})
CREATE (phoenix:City {name: "Phoenix", population: 1600000})
CREATE (philly:City {name: "Philadelphia", population: 1500000})
CREATE (dallas:City {name: "Dallas", population: 1300000})
CREATE (sf:City {name: "San Francisco", population: 880000})
CREATE (denver:City {name: "Denver", population: 710000})
CREATE (boston:City {name: "Boston", population: 690000})
CREATE (seattle:City {name: "Seattle", population: 730000})
CREATE (atlanta:City {name: "Atlanta", population: 500000})
CREATE (miami:City {name: "Miami", population: 450000})
CREATE (austin:City {name: "Austin", population: 950000})
CREATE (dc:City {name: "Washington DC", population: 690000})

// Create ROAD_TO relationships with distances in miles
MATCH (nyc:City {name: "New York"}), (philly:City {name: "Philadelphia"})
CREATE (nyc)-[:ROAD_TO {distance: 95}]->(philly)
CREATE (philly)-[:ROAD_TO {distance: 95}]->(nyc)

MATCH (nyc:City {name: "New York"}), (boston:City {name: "Boston"})
CREATE (nyc)-[:ROAD_TO {distance: 215}]->(boston)
CREATE (boston)-[:ROAD_TO {distance: 215}]->(nyc)

MATCH (philly:City {name: "Philadelphia"}), (dc:City {name: "Washington DC"})
CREATE (philly)-[:ROAD_TO {distance: 140}]->(dc)
CREATE (dc)-[:ROAD_TO {distance: 140}]->(philly)

MATCH (dc:City {name: "Washington DC"}), (atlanta:City {name: "Atlanta"})
CREATE (dc)-[:ROAD_TO {distance: 600}]->(atlanta)
CREATE (atlanta)-[:ROAD_TO {distance: 600}]->(dc)

MATCH (atlanta:City {name: "Atlanta"}), (miami:City {name: "Miami"})
CREATE (atlanta)-[:ROAD_TO {distance: 660}]->(miami)
CREATE (miami)-[:ROAD_TO {distance: 660}]->(atlanta)

MATCH (atlanta:City {name: "Atlanta"}), (dallas:City {name: "Dallas"})
CREATE (atlanta)-[:ROAD_TO {distance: 780}]->(dallas)
CREATE (dallas)-[:ROAD_TO {distance: 780}]->(atlanta)

MATCH (dallas:City {name: "Dallas"}), (houston:City {name: "Houston"})
CREATE (dallas)-[:ROAD_TO {distance: 240}]->(houston)
CREATE (houston)-[:ROAD_TO {distance: 240}]->(dallas)

MATCH (houston:City {name: "Houston"}), (austin:City {name: "Austin"})
CREATE (houston)-[:ROAD_TO {distance: 165}]->(austin)
CREATE (austin)-[:ROAD_TO {distance: 165}]->(houston)

MATCH (dallas:City {name: "Dallas"}), (austin:City {name: "Austin"})
CREATE (dallas)-[:ROAD_TO {distance: 195}]->(austin)
CREATE (austin)-[:ROAD_TO {distance: 195}]->(dallas)

MATCH (dallas:City {name: "Dallas"}), (phoenix:City {name: "Phoenix"})
CREATE (dallas)-[:ROAD_TO {distance: 1065}]->(phoenix)
CREATE (phoenix)-[:ROAD_TO {distance: 1065}]->(dallas)

MATCH (phoenix:City {name: "Phoenix"}), (la:City {name: "Los Angeles"})
CREATE (phoenix)-[:ROAD_TO {distance: 370}]->(la)
CREATE (la)-[:ROAD_TO {distance: 370}]->(phoenix)

MATCH (la:City {name: "Los Angeles"}), (sf:City {name: "San Francisco"})
CREATE (la)-[:ROAD_TO {distance: 380}]->(sf)
CREATE (sf)-[:ROAD_TO {distance: 380}]->(la)

MATCH (sf:City {name: "San Francisco"}), (seattle:City {name: "Seattle"})
CREATE (sf)-[:ROAD_TO {distance: 810}]->(seattle)
CREATE (seattle)-[:ROAD_TO {distance: 810}]->(sf)

MATCH (seattle:City {name: "Seattle"}), (denver:City {name: "Denver"})
CREATE (seattle)-[:ROAD_TO {distance: 1300}]->(denver)
CREATE (denver)-[:ROAD_TO {distance: 1300}]->(seattle)

MATCH (denver:City {name: "Denver"}), (dallas:City {name: "Dallas"})
CREATE (denver)-[:ROAD_TO {distance: 790}]->(dallas)
CREATE (dallas)-[:ROAD_TO {distance: 790}]->(denver)

MATCH (denver:City {name: "Denver"}), (chicago:City {name: "Chicago"})
CREATE (denver)-[:ROAD_TO {distance: 1000}]->(chicago)
CREATE (chicago)-[:ROAD_TO {distance: 1000}]->(denver)

MATCH (chicago:City {name: "Chicago"}), (nyc:City {name: "New York"})
CREATE (chicago)-[:ROAD_TO {distance: 790}]->(nyc)
CREATE (nyc)-[:ROAD_TO {distance: 790}]->(chicago)

Now we have a connected network of major US cities with roads between them and the distances in miles. Let's visualize this by returning all cities and roads:

MATCH (c1:City)-[r:ROAD_TO]->(c2:City)
WHERE id(c1) < id(c2)  // To avoid duplicate paths
RETURN c1, r, c2

This will show our transportation network, which we'll use to demonstrate variable-length path queries and shortest path algorithms.

Variable Length Relationships

Neo4j allows us to query paths of variable length using a special syntax with asterisks. This is incredibly powerful for exploring networks where you don't know in advance how many hops separate two nodes.

Basic Variable Length Syntax

The basic syntax for variable length relationships is:

-[relationship_type*min..max]->

Where:

  • relationship_type specifies the type of relationship to follow
  • min is the minimum number of hops (optional, defaults to 1)
  • max is the maximum number of hops (optional)

If you omit both min and max, the syntax becomes *, which means "any number of hops."

Let's explore some examples:

Finding Cities Within 2 Road Connections

To find all cities that can be reached from New York by traveling through at most 2 roads:

MATCH (nyc:City {name: "New York"})-[:ROAD_TO*1..2]->(destination:City)
RETURN destination.name AS City, 
       CASE 
         WHEN destination.name = "New York" THEN "Starting Point"
         ELSE "Destination" 
       END AS Type

This will return cities directly connected to New York and those that are two roads away.

Finding All Possible Routes Between Cities

To find all possible routes between New York and Miami (regardless of length):

MATCH path = (:City {name: "New York"})-[:ROAD_TO*]->(:City {name: "Miami"})
RETURN path, length(path) AS hops
ORDER BY hops
LIMIT 5

Warning: This query might be computationally expensive on large graphs as it explores all possible paths. It's usually better to limit the maximum path length:

MATCH path = (:City {name: "New York"})-[:ROAD_TO*1..6]->(:City {name: "Miami"})
RETURN path, length(path) AS hops
ORDER BY hops
LIMIT 5

Extracting Node Information from Paths

When working with paths, you can extract information about the nodes and relationships using functions like nodes() and relationships():

MATCH path = (:City {name: "New York"})-[:ROAD_TO*1..4]->(:City {name: "Miami"})
WITH path, length(path) AS hops
ORDER BY hops
LIMIT 1
RETURN [node IN nodes(path) | node.name] AS cities,
       [rel IN relationships(path) | rel.distance] AS distances,
       reduce(total = 0, distance IN [rel IN relationships(path) | rel.distance] | total + distance) AS totalDistance

This query finds a route from New York to Miami, extracts the names of cities along the route, the distance of each road segment, and calculates the total distance.

The Shortest Path Algorithm

While variable-length relationships allow us to find all paths between nodes, we often want to find the most efficient route. Neo4j provides built-in functions for this:

  • shortestPath(): Finds the path with the fewest number of hops
  • allShortestPaths(): Finds all paths with the minimum number of hops

Finding the Shortest Path Between Cities

To find the shortest path (fewest number of roads) between New York and Los Angeles:

MATCH path = shortestPath((start:City {name: "New York"})-[:ROAD_TO*]-(end:City {name: "Los Angeles"}))
RETURN [node IN nodes(path) | node.name] AS route,
       length(path) AS hops,
       reduce(total = 0, rel IN relationships(path) | total + rel.distance) AS totalDistance

Notice that we're using undirected relationship syntax ()-[]-() which allows the path to traverse relationships in either direction.

Finding All Shortest Paths

Sometimes multiple paths have the same (minimum) number of hops. To find all such paths:

MATCH paths = allShortestPaths((start:City {name: "Dallas"})-[:ROAD_TO*]-(end:City {name: "Chicago"}))
RETURN [node IN nodes(paths) | node.name] AS route,
       length(paths) AS hops,
       reduce(total = 0, rel IN relationships(paths) | total + rel.distance) AS totalDistance
ORDER BY totalDistance

This query returns all routes from Dallas to Chicago that have the minimum number of hops, sorted by total distance.

Advanced Path Finding Techniques

Now let's explore some more advanced techniques for working with paths in Neo4j.

Finding the Truly Shortest Path by Distance

The shortestPath() function finds the path with the fewest hops, but this may not be the shortest in terms of distance. To find the path with the minimum total distance:

MATCH path = (:City {name: "New York"})-[:ROAD_TO*]->(:City {name: "Los Angeles"})
WITH path, reduce(total = 0, rel IN relationships(path) | total + rel.distance) AS totalDistance
ORDER BY totalDistance
LIMIT 1
RETURN [node IN nodes(path) | node.name] AS route,
       length(path) AS hops,
       totalDistance

This query evaluates all paths from New York to Los Angeles and returns the one with the minimum total distance.

Finding Cities Within a Certain Distance

To find all cities within 500 miles of Chicago (considering cumulative road distances):

MATCH path = (:City {name: "Chicago"})-[:ROAD_TO*1..10]-(destination:City)
WHERE destination.name <> "Chicago"
WITH destination, path, 
     reduce(total = 0, rel IN relationships(path) | total + rel.distance) AS totalDistance
WHERE totalDistance <= 500
RETURN DISTINCT destination.name AS city, totalDistance
ORDER BY totalDistance

This query finds all cities connected to Chicago and calculates the total distance along each path, filtering for those within 500 miles.

Finding the Most Central City

One interesting application of path finding is determining the most "central" node in a network. We can use average shortest path length as a measure of centrality:

MATCH (origin:City)
MATCH (destination:City)
WHERE origin <> destination
MATCH path = shortestPath((origin)-[:ROAD_TO*]-(destination))
WITH origin, avg(length(path)) AS avgPathLength
RETURN origin.name AS city, avgPathLength
ORDER BY avgPathLength
LIMIT 5

This query calculates the average shortest path length from each city to all other cities, giving us a measure of which cities are most centrally located in our road network.

Practical Applications

Let's explore some practical applications of variable length relationships and path finding.

Travel Planning

Imagine we're planning a road trip from Boston to San Francisco and want to visit certain cities along the way:

MATCH (start:City {name: "Boston"}),
      (end:City {name: "San Francisco"}),
      (denver:City {name: "Denver"})
MATCH p1 = shortestPath((start)-[:ROAD_TO*]-(denver))
MATCH p2 = shortestPath((denver)-[:ROAD_TO*]-(end))
WITH nodes(p1) + tail(nodes(p2)) as route
RETURN [node IN route | node.name] AS roadTripCities,
       reduce(total = 0, i IN range(0, size(route)-2) |
           total + size([
               (route[i])-[r:ROAD_TO]-(route[i+1]) | r.distance
           ])
       ) AS totalDistance

This query finds the shortest route from Boston to San Francisco that passes through Denver.

Route Optimization

For a delivery business, finding the optimal route through multiple cities is essential:

MATCH (origin:City {name: "New York"})
WITH origin
MATCH (stop1:City {name: "Washington DC"})
MATCH (stop2:City {name: "Atlanta"})
MATCH (stop3:City {name: "Dallas"})
MATCH (destination:City {name: "Houston"})

MATCH path1 = shortestPath((origin)-[:ROAD_TO*]-(stop1))
MATCH path2 = shortestPath((stop1)-[:ROAD_TO*]-(stop2))
MATCH path3 = shortestPath((stop2)-[:ROAD_TO*]-(stop3))
MATCH path4 = shortestPath((stop3)-[:ROAD_TO*]-(destination))

WITH 
  [node IN nodes(path1) | node.name] +
  tail([node IN nodes(path2) | node.name]) +
  tail([node IN nodes(path3) | node.name]) +
  tail([node IN nodes(path4) | node.name]) AS route,

  reduce(total = 0, rel IN relationships(path1) | total + rel.distance) +
  reduce(total = 0, rel IN relationships(path2) | total + rel.distance) +
  reduce(total = 0, rel IN relationships(path3) | total + rel.distance) +
  reduce(total = 0, rel IN relationships(path4) | total + rel.distance) AS totalDistance

RETURN route, totalDistance

This query calculates the most efficient route for a delivery truck that needs to visit multiple cities in a specific order.

Detecting Clusters

We can use variable length paths to detect clusters or communities in our network:

// Find cities that are at most 2 road connections away from each other
MATCH (c1:City)
MATCH (c2:City)
WHERE id(c1) < id(c2)
MATCH path = shortestPath((c1)-[:ROAD_TO*1..2]-(c2))
WHERE length(path) <= 2
WITH c1, collect(c2) AS nearby
RETURN c1.name AS city, [city IN nearby | city.name] AS nearbyCities, size(nearby) AS clusterSize
ORDER BY clusterSize DESC

This query identifies cities that are closely connected, forming natural geographic clusters.

Best Practices and Optimization Tips

When working with path queries in Neo4j, keep these best practices in mind:

  1. Always limit path length: Queries with unbounded path length (*) can be expensive. Use a reasonable upper bound (*1..n) whenever possible:
   // Better than using *
   MATCH path = (a)-[:ROAD_TO*1..10]-(b)
  1. Use directed relationships when possible: If you know the direction of the relationship, use directed syntax (-> or <-) rather than undirected (-), as it narrows the search space:
   // More efficient if you know direction
   MATCH path = (a)-[:ROAD_TO*1..5]->(b)
  1. Filter early: Apply filters as early as possible in the query to reduce the amount of data processed:
   // Better approach
   MATCH (a:City {name: "New York"}), (b:City {name: "Los Angeles"})
   MATCH path = shortestPath((a)-[:ROAD_TO*]-(b))

   // Less efficient
   MATCH path = shortestPath((a)-[:ROAD_TO*]-(b))
   WHERE a.name = "New York" AND b.name = "Los Angeles"
  1. Use PROFILE to analyze query performance: Add PROFILE before your query to see how Neo4j executes it and identify bottlenecks:
   PROFILE MATCH path = shortestPath((a:City {name: "New York"})-[:ROAD_TO*]-(b:City {name: "Los Angeles"}))
   RETURN path
  1. Consider using the Graph Data Science library: For advanced path-finding algorithms, consider using Neo4j's Graph Data Science library, which offers optimized implementations of many algorithms:
   // Example using GDS (requires GDS library to be installed)
   CALL gds.alpha.shortestPath.stream({
     nodeQuery: 'MATCH (n:City) RETURN id(n) as id',
     relationshipQuery: 'MATCH (n)-[r:ROAD_TO]->(m) RETURN id(n) as source, id(m) as target, r.distance as weight',
     startNode: $startNodeId,
     endNode: $endNodeId,
     relationshipWeightProperty: 'weight'
   })
   YIELD nodeId, cost
   MATCH (n) WHERE id(n) = nodeId
   RETURN n.name AS city, cost

Common Pitfalls and How to Avoid Them

When working with variable length paths and path-finding algorithms, be aware of these common pitfalls:

  1. Memory intensive operations:
    Finding all paths between two nodes in a densely connected graph can consume a lot of memory. Always test on a small dataset first and use path length limits.

  2. Confused about directionality:
    Be clear about whether you want to traverse relationships in a specific direction (->, <-) or in either direction (-).

  3. Not considering relationship properties:
    shortestPath() finds the path with the fewest hops, not necessarily the path with the minimum weight (e.g., distance). Use custom aggregations to find weighted shortest paths.

  4. Running into performance issues:
    If your queries are slow, consider adding indexes on properties used in your filters and use more specific patterns to reduce the search space.

Conclusion

In this tutorial, we've explored Neo4j's powerful capabilities for working with variable length relationships and finding paths in graphs. These features are what make graph databases truly shine for connected data analysis.

We've learned how to:

  • Query relationships of variable lengths using the * syntax
  • Find the shortest path between nodes using shortestPath()
  • Discover all shortest paths using allShortestPaths()
  • Extract information from paths using functions like nodes() and relationships()
  • Calculate weighted paths using aggregation functions
  • Apply these techniques to solve real-world problems

Whether you're building a social network, a recommendation engine, a logistics system, or any application that deals with connected data, these path-finding capabilities will be invaluable in your Neo4j toolkit.