Chapter 2: Designing Data-Intensive Applications - Data models - Part 2 : Query Languages

May 1, 2025 - 08:07

In the first part of this chapter, we discussed relational databases and NoSQL systems. Now, let’s look at something fundamental to how we interact with data: query languages. This part covers the difference between declarative and imperative approaches, how that difference shows up in web development, and where MapReduce fits into the picture.

Declarative vs. Imperative Querying

When relational databases were introduced, they didn’t just offer a new way of storing data—they introduced a simpler way of querying it. SQL became the standard, and it’s a declarative query language, unlike earlier systems like IMS and CODASYL, which used imperative code.

What’s the Difference?

Here’s an example of imperative code in JavaScript to find all sharks in a dataset:

function getSharks() {
  var sharks = [];
  for (var i = 0; i < animals.length; i++) {
    if (animals[i].family === "Sharks") {
      sharks.push(animals[i]);
    }
  }
  return sharks;
}

Step by step, you’re telling the computer how to do it: loop through the data, check a condition, and store the result.

Now look at the declarative version:

SELECT * FROM animals WHERE family = 'Sharks';

You just specify what you want—“Give me all animals where the family is Sharks”—and let the database figure out how to get it.

Why Declarative Wins

Simplicity: Declarative languages let you focus on the result, not the process.
Optimization: The database can optimize how it executes the query (e.g., using indexes or parallel processing).
Parallel Execution: Declarative code doesn’t rely on strict execution order, making it easier to run across multiple cores or machines.
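
To make the optimization point concrete, here is a minimal sketch of a toy query function in plain JavaScript. The caller only says which family it wants; the function decides whether to scan every row or use an index. The names animals, buildFamilyIndex, and findByFamily are hypothetical and made up for this illustration. A real database planner is far more involved.

```javascript
// Hypothetical sample data; the field names follow the shark example above.
const animals = [
  { name: "Great White Shark", family: "Sharks" },
  { name: "Blue Whale", family: "Whales" },
  { name: "Tiger Shark", family: "Sharks" },
];

// Build a lookup from family name to matching rows (a toy "index").
function buildFamilyIndex(rows) {
  const index = new Map();
  for (const row of rows) {
    if (!index.has(row.family)) index.set(row.family, []);
    index.get(row.family).push(row);
  }
  return index;
}

// The caller states only WHAT it wants (a family). The function is free to
// choose HOW: use the index when one is supplied, otherwise scan every row.
function findByFamily(rows, family, index) {
  if (index) {
    return index.get(family) || [];
  }
  return rows.filter((row) => row.family === family);
}

const viaScan = findByFamily(animals, "Sharks");
const viaIndex = findByFamily(animals, "Sharks", buildFamilyIndex(animals));
console.log(viaScan.map((a) => a.name));
console.log(viaIndex.map((a) => a.name));
```

Both calls return the same rows. Because the caller never spelled out a loop, the execution strategy can change without the calling code changing, which is the property SQL gives the database.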

Declarative Queries on the Web

This difference isn’t just for databases—it’s everywhere, including web development. Let’s take a simple example:


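<ul>
  <li class="selected">
    <p>Sharks</p>
    <ul>
      <li>Great White Shark</li>
      <li>Tiger Shark</li>
      <li>Hammerhead Shark</li>
    </ul>
  </li>
  <li>
    <p>Whales</p>
    <ul>
      <li>Blue Whale</li>
      <li>Humpback Whale</li>
      <li>Fin Whale</li>
    </ul>
  </li>
</ul>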

You want to highlight the <p> element of the selected <li> with a blue background. In CSS (declarative), you’d write:

li.selected > p {
  background-color: blue;
}

This selector simply declares the pattern of elements to style.

In contrast, here’s an imperative JavaScript version:

var liElements = document.getElementsByTagName("li");
for (var i = 0; i < liElements.length; i++) {
  if (liElements[i].className === "selected") {
    var children = liElements[i].childNodes;
    for (var j = 0; j < children.length; j++) {
      var child = children[j];
      if (child.nodeType === Node.ELEMENT_NODE && child.tagName === "P") {
        child.setAttribute("style", "background-color: blue");
      }
    }
  }
}

Why Declarative is Better

Cleaner Code: The CSS snippet is shorter and easier to understand.
Dynamic Updates: If the selected class is later moved or removed, the browser re-evaluates the CSS rule automatically. The JavaScript version leaves the old inline style in place until more code removes it.
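
The dynamic-update point can be sketched without a browser. This is a simplified model, not real DOM code: items is a hypothetical stand-in for the two top-level list items, applyStyleOnce mimics the imperative loop, and computeStyle mimics a rule that is re-evaluated from current state.

```javascript
// Hypothetical stand-in for the two top-level <li> elements; not a real DOM.
const items = [
  { label: "Sharks", className: "selected", style: "" },
  { label: "Whales", className: "", style: "" },
];

// Imperative approach: write the style once onto whatever is selected now.
function applyStyleOnce(list) {
  for (const item of list) {
    if (item.className === "selected") {
      item.style = "background-color: blue";
    }
  }
}

// Declarative approach: the rule is evaluated against the current state
// every time it is needed, so it can never be out of date.
function computeStyle(item) {
  return item.className === "selected" ? "background-color: blue" : "";
}

applyStyleOnce(items);

// The selection moves from Sharks to Whales.
items[0].className = "";
items[1].className = "selected";

const staleStyles = items.map((item) => item.style);
const ruleStyles = items.map(computeStyle);
console.log(staleStyles); // Sharks is still blue, Whales is not
console.log(ruleStyles); // only Whales is blue
```

The imperative version would need extra code to clear the old style and apply the new one each time the selection changes.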

MapReduce Querying

So far, we’ve talked about declarative vs. imperative approaches in databases and web development. Let’s now look at distributed systems, starting with MapReduce.

MapReduce is a programming model popularized by Google for processing large datasets across many machines. It is neither fully declarative nor fully imperative: you write the query logic as small functions, and the framework decides when and where to call them.

Example Use Case

Say you’re a marine biologist tracking animal sightings. You want a report showing the number of sharks observed per month.

In PostgreSQL, you’d write:

SELECT date_trunc('month', observation_timestamp) AS observation_month,
       sum(num_animals) AS total_animals
FROM observations
WHERE family = 'Sharks'
GROUP BY observation_month;

In MongoDB’s MapReduce, it looks like this:

db.observations.mapReduce(
  function map() {
    var year = this.observationTimestamp.getFullYear();
    var month = this.observationTimestamp.getMonth() + 1;
    emit(year + "-" + month, this.numAnimals);
  },
  function reduce(key, values) {
    return Array.sum(values);
  },
  {
    query: { family: "Sharks" },
    out: "monthlySharkReport"
  }
);

Here’s what’s happening:

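Map Function: Called for each document that matches the query. It emits a key-value pair: a key such as "2025-5" (the month is not zero-padded) and that document’s numAnimals as the value.
Reduce Function: Receives the values emitted for a given key (the framework does the grouping) and combines them, here by summing them into a monthly total.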
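
To see what the framework does between the two functions, here is a minimal single-machine sketch in plain JavaScript. It is not MongoDB’s implementation: runMapReduce and the sample observations are hypothetical, and the document is passed as an argument instead of being bound to this.

```javascript
// Hypothetical observation documents, shaped like the collection above.
const observations = [
  { observationTimestamp: new Date(2025, 4, 3), family: "Sharks", numAnimals: 3 },
  { observationTimestamp: new Date(2025, 4, 20), family: "Sharks", numAnimals: 4 },
  { observationTimestamp: new Date(2025, 5, 1), family: "Sharks", numAnimals: 2 },
  { observationTimestamp: new Date(2025, 4, 9), family: "Whales", numAnimals: 1 },
];

// Same logic as the map function above, with the document passed explicitly.
function map(doc, emit) {
  const year = doc.observationTimestamp.getFullYear();
  const month = doc.observationTimestamp.getMonth() + 1;
  emit(year + "-" + month, doc.numAnimals);
}

// Same logic as the reduce function above, without the Array.sum helper.
function reduce(key, values) {
  return values.reduce((total, n) => total + n, 0);
}

// Toy framework: filter, call map per document, group by key, call reduce.
function runMapReduce(docs, query, mapFn, reduceFn) {
  const grouped = new Map();
  for (const doc of docs.filter(query)) {
    mapFn(doc, (key, value) => {
      if (!grouped.has(key)) grouped.set(key, []);
      grouped.get(key).push(value);
    });
  }
  const result = {};
  for (const [key, values] of grouped) {
    result[key] = reduceFn(key, values);
  }
  return result;
}

const report = runMapReduce(
  observations,
  (doc) => doc.family === "Sharks",
  map,
  reduce
);
console.log(report);
```

The filtering and grouping live in the framework, not in map or reduce. That is why the two functions stay small, and also why they have to be written to fit together.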

Pros and Cons

Flexibility: The Map and Reduce functions can include custom logic.
Complexity: Writing two coordinated functions is harder than a single declarative query.

Aggregation Pipelines

Recognizing the complexity of MapReduce, MongoDB introduced the aggregation pipeline in version 2.2. This is a declarative query language with a JSON-based syntax.

Here’s how our shark observation query would look:

db.observations.aggregate([
  { $match: { family: "Sharks" } },
  { $group: {
      _id: {
        year: { $year: "$observationTimestamp" },
        month: { $month: "$observationTimestamp" }
      },
      totalAnimals: { $sum: "$numAnimals" }
    }
  }
]);

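The aggregation pipeline expresses the same filter-and-group logic as the MapReduce version, but as a list of declarative stages instead of two hand-written functions. That makes it shorter to write and gives the database more room to optimize.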
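
The idea of a pipeline, where each stage takes documents in and passes documents on, can be sketched in plain JavaScript. This is an illustration of the concept only, not how MongoDB executes $match or $group. The names sightings, matchFamily, groupByMonth, and runPipeline are hypothetical, and the keys are simplified to strings like "2025-5" instead of the { year, month } object used above.

```javascript
// Hypothetical observation documents, shaped like the collection above.
const sightings = [
  { observationTimestamp: new Date(2025, 4, 3), family: "Sharks", numAnimals: 3 },
  { observationTimestamp: new Date(2025, 4, 20), family: "Sharks", numAnimals: 4 },
  { observationTimestamp: new Date(2025, 5, 1), family: "Sharks", numAnimals: 2 },
  { observationTimestamp: new Date(2025, 4, 9), family: "Whales", numAnimals: 1 },
];

// Stage factory loosely analogous to $match: keep rows of one family.
const matchFamily = (family) => (rows) =>
  rows.filter((row) => row.family === family);

// Stage loosely analogous to $group: sum numAnimals per year-month key.
const groupByMonth = (rows) => {
  const totals = new Map();
  for (const row of rows) {
    const year = row.observationTimestamp.getFullYear();
    const month = row.observationTimestamp.getMonth() + 1;
    const key = year + "-" + month;
    totals.set(key, (totals.get(key) || 0) + row.numAnimals);
  }
  return [...totals].map(([key, totalAnimals]) => ({ _id: key, totalAnimals }));
};

// Run the stages in order, feeding each stage's output into the next.
const runPipeline = (rows, stages) =>
  stages.reduce((current, stage) => stage(current), rows);

const monthlyTotals = runPipeline(sightings, [matchFamily("Sharks"), groupByMonth]);
console.log(monthlyTotals);
```

Because the pipeline is just a list of stage descriptions, stages can be added, removed, or reordered without rewriting the others.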

Takeaways

Declarative Query Languages: Abstract away the "how," making them easier to write and optimize.
Imperative Code: Useful in some cases but often verbose and harder to maintain.
MapReduce: A hybrid approach, powerful but more complex than declarative alternatives.
Aggregation Pipelines: Offer a declarative, user-friendly alternative to MapReduce for distributed querying.

Declarative querying isn’t just about writing less code—it’s about making systems smarter, more efficient, and easier to maintain. In the next chapter, we’ll continue exploring the tools and patterns that make modern data systems tick.

Conclusion

Understanding the distinction between declarative and imperative query languages is crucial for designing scalable and maintainable systems. Declarative approaches, like SQL and aggregation pipelines, allow for cleaner, more optimized queries that abstract away complexity—making them ideal for modern, data-intensive applications. Meanwhile, imperative models still have their place when fine-grained control or custom logic is needed, especially in distributed systems. Striking the right balance is key to building robust architectures.