Building Fault-Tolerant Order Processing with Paxos in a Distributed E-Commerce Store

In a distributed e-commerce system, ensuring that each order is processed exactly once—even when servers crash or messages get lost—is crucial. In this post, I’ll walk you through how the Paxos consensus algorithm is used in my Spring Boot backend to coordinate order creation across multiple servers and guarantee fault tolerance.
Why Paxos?
- Consistency under failures: Even if some nodes crash or the network is unreliable, Paxos ensures that a single, agreed-upon value (here, the “create order” operation) is chosen exactly once.
- No single point of failure: Any node can propose an order, and the algorithm will still reach consensus as long as a majority of nodes are up.
- Asynchronous & leaderless: There’s no fixed leader; any node can initiate the consensus process.
High-Level Flow
- Proposal: A client sends an order request (wrapped in a Proposal) to a set of Paxos servers.
- Promise: Each server replies with a Promise if it hasn’t promised a higher-numbered proposal.
- Accept: Once the proposer gathers a majority of promises, it sends an accept request.
- Learn: After a majority of servers accept, they “learn” the chosen operation and execute it (create the order).
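The post shows the server side of each phase, so here is a minimal sketch of the proposer side that ties the first three steps together. Everything in it is an assumption for illustration: SketchAcceptor, PrepareReply and SketchProposer are hypothetical names, the acceptors live in memory instead of behind remote calls, and the operation is a plain String. It follows the same promise and accept rules as the service code, plus the standard Paxos rule that a proposer adopts the highest-numbered value any server reports as already accepted.

```java
import java.util.List;

/** Reply to a prepare request (hypothetical type, not the post's Promise class). */
class PrepareReply {
    final boolean promised;
    final long acceptedId;       // -1 when nothing has been accepted yet
    final String acceptedValue;  // null when nothing has been accepted yet

    PrepareReply(boolean promised, long acceptedId, String acceptedValue) {
        this.promised = promised;
        this.acceptedId = acceptedId;
        this.acceptedValue = acceptedValue;
    }
}

/** In-memory stand-in for one Paxos server. */
class SketchAcceptor {
    private long maxId = 0;
    private long acceptedId = -1;
    private String acceptedValue = null;

    /** Promise phase: only IDs above everything seen so far are promised. */
    synchronized PrepareReply prepare(long id) {
        if (id <= maxId) {
            return new PrepareReply(false, -1, null);
        }
        maxId = id;
        return new PrepareReply(true, acceptedId, acceptedValue);
    }

    /** Accept phase: only the most recently promised ID is accepted. */
    synchronized boolean accept(long id, String value) {
        if (id != maxId) {
            return false;
        }
        acceptedId = id;
        acceptedValue = value;
        return true;
    }
}

class SketchProposer {
    /** Runs one round; returns the chosen value, or null if no majority was reached. */
    static String propose(List<SketchAcceptor> acceptors, long id, String value) {
        int majority = acceptors.size() / 2 + 1;

        // Phase 1: gather promises, adopting the highest-numbered accepted value if any.
        int promises = 0;
        long highestAcceptedId = -1;
        String toPropose = value;
        for (SketchAcceptor acceptor : acceptors) {
            PrepareReply reply = acceptor.prepare(id);
            if (!reply.promised) {
                continue;
            }
            promises++;
            if (reply.acceptedValue != null && reply.acceptedId > highestAcceptedId) {
                highestAcceptedId = reply.acceptedId;
                toPropose = reply.acceptedValue;
            }
        }
        if (promises < majority) {
            return null;
        }

        // Phase 2: the value is chosen once a majority of servers accept it.
        int accepts = 0;
        for (SketchAcceptor acceptor : acceptors) {
            if (acceptor.accept(id, toPropose)) {
                accepts++;
            }
        }
        return accepts >= majority ? toPropose : null;
    }
}
```

With three acceptors, calling SketchProposer.propose(cluster, 1, "create-order-A") chooses "create-order-A". A later call with ID 2 and a different operation still returns "create-order-A", because the proposer adopts the value the servers already accepted. That adoption step is what keeps a second proposer from overwriting a chosen order.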
Key Interface
All Paxos servers implement a simple interface (from com.arm.coordinator.common.PaxosServer):
public interface PaxosServer<T> {
    Promise promise(Proposal proposal);
    Boolean accept(Proposal proposal);
    Result learn(Proposal proposal, Integer userId);
}
The Promise Phase
The promise(...) method ensures that no server will accept proposals older than the one it has already promised. In PaxosOrderService:
@Override
public synchronized Promise promise(Proposal proposal) {
    serverLogger.info("Received a promise message");
    // Reject anything that is not newer than the highest ID seen so far
    if (proposal.getId() <= this.maxId) {
        return new Promise(Status.REJECTED, null);
    }
    this.maxId = proposal.getId();
    if (this.accepted != null) {
        // If we’ve already accepted something, let the proposer know
        return new Promise(Status.ACCEPTED,
                new Proposal(accepted.getId(), accepted.getOperation()));
    } else {
        // Otherwise, promise not to accept lower proposals
        return new Promise(Status.PROMISED, proposal);
    }
}
Here, maxId tracks the highest proposal ID seen so far. If the incoming proposal is newer, we record it and either report back the already-accepted proposal or promise to accept it later.
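The comparison against maxId only works if proposal IDs are totally ordered and two proposers never pick the same ID. The post doesn't show how IDs are generated, so the following is just one common scheme, sketched under my own assumptions: each server has a unique 0-based index, the cluster size is fixed, and the ID interleaves a local round counter with that index. ProposalIdGenerator is a hypothetical name.

```java
/** Generates proposal IDs that are unique per server and strictly increasing. */
class ProposalIdGenerator {
    private final int serverIndex; // 0-based index of this server, assumed unique
    private final int clusterSize; // total number of servers that may propose
    private long round = 0;

    ProposalIdGenerator(int serverIndex, int clusterSize) {
        if (serverIndex < 0 || serverIndex >= clusterSize) {
            throw new IllegalArgumentException("serverIndex must be in [0, clusterSize)");
        }
        this.serverIndex = serverIndex;
        this.clusterSize = clusterSize;
    }

    /** Returns round * clusterSize + serverIndex, starting from round 1. */
    synchronized long next() {
        round++;
        return round * clusterSize + serverIndex;
    }

    /** Skips ahead so the next ID exceeds an ID seen elsewhere, e.g. after a rejection. */
    synchronized void observe(long seenId) {
        round = Math.max(round, seenId / clusterSize);
    }
}
```

In a three-server cluster, server 0 produces 3, 6, 9, ... and server 1 produces 4, 7, 10, ..., so the two sequences never collide. After a REJECTED promise, a proposer can call observe with the competing ID and retry with next().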
The Accept Phase
Once a proposer has a majority of PROMISED responses, it issues an accept request:
@Override
public synchronized Boolean accept(Proposal proposal) {
    serverLogger.info("Received an accept message");
    if (proposal.getId() != this.maxId) {
        // Only accept if this is the latest promised proposal
        return false;
    }
    // Record that we’ve accepted this proposal
    accepted = new Proposal(proposal.getId(), proposal.getOperation());
    serverLogger.info("Proposal successfully accepted");
    return true;
}
By checking proposal.getId() == maxId, we ensure consistency: servers will only accept the proposal they most recently promised.
The Learn Phase
After a majority of servers accept, the operation is “learned” and executed exactly once:
@Override
public synchronized Result learn(Proposal proposal, Integer userId) {
    serverLogger.info("Received a learn message");
    OrderForm form = proposal.getOperation().getOrderForm();
    // Validate products, create Order and OrderProducts atomically
    Order order = orderService.create(new Order(...));
    // ... attach products, set status, etc.
    Result result = new Result(true, ResultCodeEnum.ALL_OKAY, "Order created successfully");
    return result;
}
This final step writes the order to the database only once, even if some servers already crashed after accepting.
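As far as the excerpt shows, learn calls orderService.create on every invocation, so a retried learn message could create the same order again unless something deduplicates it. One way to guard against that within a single server is to remember which proposal IDs have already been applied. This is a sketch under assumptions: IdempotentLearner and the createOrder hook are hypothetical, and the in-memory set stands in for something durable, such as a unique constraint on the proposal ID in the orders table.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongConsumer;

/** Applies each learned proposal at most once per server, keyed by proposal ID. */
class IdempotentLearner {
    // In-memory for illustration; a real service would need this to survive restarts.
    private final Set<Long> appliedIds = ConcurrentHashMap.newKeySet();
    private final LongConsumer createOrder; // hypothetical hook that performs the write

    IdempotentLearner(LongConsumer createOrder) {
        this.createOrder = createOrder;
    }

    /** Returns true if the order was created now, false if this ID was already applied. */
    boolean learn(long proposalId) {
        // add() returns false when the ID is already present, so duplicates are skipped.
        if (!appliedIds.add(proposalId)) {
            return false;
        }
        try {
            createOrder.accept(proposalId);
        } catch (RuntimeException e) {
            // Forget the ID so a later retry can attempt the write again.
            appliedIds.remove(proposalId);
            throw e;
        }
        return true;
    }
}
```

Calling learn(7) twice runs the createOrder hook once: the first call returns true and the second returns false.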
Why This Solves Fault Tolerance
- Majority Quorum: As long as a majority of servers respond, the proposal moves forward. Crashed or slow nodes can catch up later.
- Exactly-Once Semantics: By separating promise, accept, and learn phases, you avoid double-creation even if messages are retried.
- Recovery: New or recovering servers can learn the last accepted proposal and apply it to reach the same state.
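To make the majority quorum point concrete, here is the arithmetic as a small helper. Quorum is a hypothetical class name, not part of the project. The key property is that any two majorities of the same cluster share at least one server, which is how a later proposer finds out about a value that was already accepted.

```java
/** Majority-quorum arithmetic for a cluster of Paxos servers. */
class Quorum {
    /** Smallest number of servers that forms a majority. */
    static int majority(int clusterSize) {
        return clusterSize / 2 + 1;
    }

    /** Largest number of crashed servers that still leaves a majority running. */
    static int toleratedFailures(int clusterSize) {
        return clusterSize - majority(clusterSize);
    }

    /** True when the number of positive replies reaches a majority. */
    static boolean hasQuorum(int acks, int clusterSize) {
        return acks >= majority(clusterSize);
    }
}
```

A 3-server cluster needs 2 replies and tolerates 1 crash. A 5-server cluster needs 3 and tolerates 2. A 4-server cluster also needs 3 and still tolerates only 1, which is why odd cluster sizes are the usual choice.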
Next Steps & Extensions
- Persistent State: Store maxId and accepted in durable storage so servers can recover after a full restart.
- Leader Election: Build a distinguished proposer to reduce round-trip latency.
- Batching: Group multiple operations per consensus round for higher throughput.
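For the persistent-state item, a minimal sketch could look like the following. It is not how the project stores anything today: AcceptorStateStore, the properties file layout and the String operation are all assumptions. It writes to a temporary file and then renames it over the real one, so a reader never sees a half-written file. It does not force the data to disk, and whether an atomic move replaces an existing target is implementation-specific, so treat it as a starting point only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Properties;

/** Saves and restores acceptor state; the file layout is an assumption. */
class AcceptorStateStore {
    private final Path file;

    AcceptorStateStore(Path file) {
        this.file = file;
    }

    /** Writes the state to a sibling temp file, then renames it over the target. */
    void save(long maxId, long acceptedId, String acceptedOperation) throws IOException {
        Properties props = new Properties();
        props.setProperty("maxId", Long.toString(maxId));
        if (acceptedOperation != null) {
            props.setProperty("acceptedId", Long.toString(acceptedId));
            props.setProperty("acceptedOperation", acceptedOperation);
        }
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        try (OutputStream out = Files.newOutputStream(tmp)) {
            props.store(out, "paxos acceptor state");
        }
        Files.move(tmp, file,
                StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
    }

    /** Returns the saved properties, or an empty set if nothing has been saved yet. */
    Properties load() throws IOException {
        Properties props = new Properties();
        if (Files.exists(file)) {
            try (InputStream in = Files.newInputStream(file)) {
                props.load(in);
            }
        }
        return props;
    }
}
```

The idea is to call save before replying to a promise or accept request, so that a server never forgets a promise it has already sent, and to call load on startup to restore maxId and accepted.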
By weaving Paxos into your order service, you transform a vanilla Spring Boot app into a rock-solid, fault-tolerant distributed system. Feel free to explore the full code and fork it here: