Preferred approach to a matching process

I am currently in the process of putting together a matching algorithm. The matching process is as follows:

  • Query data is used to perform a "lookup" against a set of reference data to determine an applicable id, which is later used in a sales order
  • When the lookup data is queried, more than one result is often returned, because the query data typically yields a broad set of matches (meaning there is more than one id to choose from for the sales order). To identify a single unique row, a series of rules is applied (using the query data as the source) to narrow the result set, for example "in the absence of value x, use value y", and so on (see the sketch after this list)
  • Once the lookup result has been narrowed to a single unique row, the id is extracted and a sales order is created
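
To make the rule style concrete, here is a minimal sketch of roughly what I mean by "in the absence of value x then use value y". The field names (project_code, customer_group, region) are purely illustrative, not my real columns:

```python
def build_match_criteria(query: dict) -> list:
    """Return (field, value) match criteria in priority order, skipping values the query lacks."""
    criteria = []
    if query.get("project_code"):         # prefer the most specific value when present...
        criteria.append(("project_code", query["project_code"]))
    elif query.get("customer_group"):     # ...otherwise fall back to a broader one
        criteria.append(("customer_group", query["customer_group"]))
    if query.get("region"):
        criteria.append(("region", query["region"]))
    return criteria

# Example: the query has no project_code, so the broader customer_group is used instead.
print(build_match_criteria({"customer_group": "WHOLESALE", "region": "EU"}))
# [('customer_group', 'WHOLESALE'), ('region', 'EU')]
```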

I am interested in hearing of different/preferred approaches to this problem. Currently, I approach this as follows:

  • Call a simple data query against the "lookup" table on a "like for like" basis, comparing values that are guaranteed to match, to produce an initial reduced result set
  • With the resulting (disconnected) data set, I then apply a custom method that refines the result set further, repeatedly narrowing the query until only one row remains
  • If at any point a narrowing step produces a zero-count result set, I undo the last applied step and "roll back" to the previous query (sketched below)
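
For illustration, here is a stripped-down sketch of the narrow-and-roll-back step. The candidate rows and field names are hypothetical, and the real version works against the disconnected data set rather than plain dicts, but the control flow is the same:

```python
def narrow_to_single(candidates, criteria):
    """Apply (field, value) criteria in priority order; roll back any step that empties the set."""
    current = list(candidates)
    for field, value in criteria:
        narrowed = [row for row in current if row.get(field) == value]
        if not narrowed:           # over-narrowed: undo this step, keep the previous result set
            continue
        current = narrowed
        if len(current) == 1:      # a single unique row identified
            break
    return current[0]["id"] if len(current) == 1 else None

# Example usage with hypothetical reference rows:
candidates = [
    {"id": 101, "region": "EU", "customer_group": "RETAIL"},
    {"id": 102, "region": "EU", "customer_group": "WHOLESALE"},
]
criteria = [("region", "EU"), ("customer_group", "WHOLESALE")]
print(narrow_to_single(candidates, criteria))   # 102
```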

In principle I am happy with this approach, but I am aware there may be performance impacts with large volumes of data. In an ideal world, the up-front data query would be dependable enough to return only a single result, but I am also wary of over-bloating that layer with too much logic around how the various "rules" should be applied.

Can anyone suggest a different approach I should consider?

Thanks in advance