Scaling/microservices approach to reading files from same directory


Jun 20, 2025 - 04:10

My company receives files via SFTP. We currently have a service running on a timer that:

  1. polls the inbound directory
  2. moves files to an 'In Progress' directory
  3. processes files (queueing messages for other microservices to handle)
  4. moves them to a 'Done' directory.
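For concreteness, one polling pass of the loop above can be sketched roughly like this (Python is used purely for illustration since our services are C#/.NET; directory names and the `process` callback are placeholders). Note that an atomic rename into 'In Progress' doubles as a claim: when two instances race for the same file, only one rename succeeds, so the filter conditions are not strictly required for correctness:

```python
import os
import shutil

def claim_and_process(inbound, in_progress, done, process):
    """One polling pass: try to claim each inbound file by atomically
    renaming it into in_progress; whichever instance's rename succeeds
    owns the file, so concurrent pollers never double-process it."""
    handled = []
    for name in sorted(os.listdir(inbound)):
        src = os.path.join(inbound, name)
        claimed = os.path.join(in_progress, name)
        try:
            # os.rename is atomic on POSIX when src and dst are on the same
            # filesystem; the loser of the race gets an error and skips on.
            os.rename(src, claimed)
        except OSError:
            continue
        process(claimed)  # e.g. queue messages for downstream microservices
        shutil.move(claimed, os.path.join(done, name))
        handled.append(name)
    return handled
```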

We run multiple copies of this service, and the solution in the past has been a mutually exclusive set of conditions (think alphabetical ranges or file types) to ensure that each service instance doesn't step on the others' toes. These conditions are set manually in configuration, which results in inefficiency when we receive a batch of files all matching a single instance's condition. (All instances are capable of processing all files.)
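To make the current partitioning scheme concrete, here is a rough sketch (Python for illustration only; the instance names and first-letter split are made up, not our real configuration). A burst of files landing entirely in one instance's slice leaves the other instances idle:

```python
# Each instance is configured with its own predicate; the predicates are
# hand-maintained to be mutually exclusive and collectively exhaustive.
INSTANCE_FILTERS = {
    "instance-1": lambda name: name[0].lower() <= "m",
    "instance-2": lambda name: name[0].lower() > "m",
}

def files_for(instance, names):
    """Return only the inbound files this instance is allowed to claim."""
    keep = INSTANCE_FILTERS[instance]
    return [n for n in names if keep(n)]
```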

My question is this: Is there an industry-standard way to scale up a file-polling service, ideally one that doesn't involve redrawing the disjoint conditions each time? When trying to imagine what a solution would look like, I keep getting stuck between:

"use a manager service since each service instance shouldn't need to care about what the other service instances are doing"

vs.

"avoid unnecessary complexity, and a manager service would require additional overhead, maintenance, and developer time"

I've also looked into C#'s FileSystemWatcher and other ways to turn new files into events and feed them into our existing event architecture, but FileSystemWatcher is known to be unreliable at high file volumes unless polling is kept as a backup. I also can't find anything in our SFTP client that would allow for event hook-ins, nor am I sure that would be an effective solution given the risk of premature reads (picking up a file before its upload has finished).
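Absent a completion signal from the SFTP side (e.g. upload-to-temp-name-then-rename), the usual guard against premature reads is to treat a file as complete only once its size has stopped changing across consecutive polls. A rough sketch (Python for illustration; the check count and interval are arbitrary):

```python
import os
import time

def wait_until_stable(path, checks=3, interval=1.0):
    """Guard against reading a file mid-upload: consider it complete only
    after its size is unchanged for `checks` consecutive polls.
    Returns the final observed size."""
    last = -1
    stable = 0
    while stable < checks:
        size = os.path.getsize(path)
        if size == last:
            stable += 1
        else:
            stable = 0
            last = size
        time.sleep(interval)
    return last
```

This adds latency proportional to `checks * interval`, so it is a trade-off between pickup speed and confidence that the upload has finished.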