Achieving High Genericity in Code

Building a Highly Generic Tool for Copying Transactional Data
Years ago, I set out to build a tool to copy transactional data from one environment to another. A single transaction could span 350+ tables, though not all transactions used all of them; different transaction types touched different subsets. Given this complexity, coding it in Java seemed like a massive undertaking. It was a pet project I worked on during weekends, alongside my other, more important projects, so I had to find shortcuts and get the highest possible reuse out of every line of code.
The challenge?
Writing 100,000+ lines of code to handle individual table reads and inserts.
Maintaining this code as table structures changed almost every month to support new features.
Ensuring key values were correctly generated, as they relied on Oracle sequences rather than direct copies.
Minimizing manual effort, complexity, and long-term maintenance.
I needed a solution that required minimal code, could adapt dynamically to schema changes, and eliminated ongoing maintenance.
First Attempt: Scripts & Data Pump Utility
Initially, I explored using scripts and the Oracle Data Pump utility, which supports complex queries. While this seemed like a viable option, I hit a political roadblock — the DB team didn’t allow non-DB personnel to run complex queries. Teams in large corporations can be territorial, and I had to find another way.
The Breakthrough: A Highly Generic Approach
I had to design the solution for maximum reuse of code. I followed one of my core principles of genericity:
If you write even a single line of code, ensure it gets reused by wrapping it inside a function.
With that in mind, I designed the solution around a main orchestrator class and a second class that implemented the database operations with just two functions, plus a few supporting helpers inside the class:
A function to run a SELECT query.
A function to run an INSERT query.
No individual queries for 350+ tables, just a handful of highly generic functions! However, the queries needed to come from outside the code to avoid constant modifications.
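As a rough illustration, here is what those two generic operations might look like in plain JDBC. The class and method names are my own for this sketch, not the original implementation.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class GenericDbOperations {

    private final Connection connection;

    public GenericDbOperations(Connection connection) {
        this.connection = connection;
    }

    /** Runs any SELECT supplied by the config file and returns its rows. */
    public ResultSet runSelect(String selectSql) throws SQLException {
        Statement stmt = connection.createStatement();
        return stmt.executeQuery(selectSql);   // caller iterates over and closes the result
    }

    /** Runs any INSERT built at runtime and returns the affected row count. */
    public int runInsert(String insertSql) throws SQLException {
        try (PreparedStatement stmt = connection.prepareStatement(insertSql)) {
            return stmt.executeUpdate();
        }
    }
}
```

Every table, no matter which one, flows through these two entry points; the variation lives entirely in the SQL text that is passed in.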
Dynamic Query Generation Using a Config File
Instead of hardcoding queries, I structured everything using a configuration file that:
Contained SELECT statements for each table in the correct sequence of insertion.
Used subqueries to relate data back to the main primary key — sometimes with nested dependencies.
Ensured foreign key relationships were handled dynamically.
Eliminated code changes when tables evolved — only the config needed updating.
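To make this concrete, here is a hypothetical excerpt of such a config file. The format, table names, and queries are invented purely for illustration; the real file's layout and schema were specific to the application.

```
# One entry per table, in insertion order:
# TABLE_NAME | SELECT that gathers the rows belonging to one transaction
TXN_HEADER | SELECT * FROM TXN_HEADER WHERE TXN_ID = :txnId
TXN_LINE   | SELECT * FROM TXN_LINE   WHERE TXN_ID = :txnId
TXN_CHARGE | SELECT * FROM TXN_CHARGE WHERE LINE_ID IN
             (SELECT LINE_ID FROM TXN_LINE WHERE TXN_ID = :txnId)
```

The orchestrator simply walks this list top to bottom, running each SELECT against the source schema and inserting the returned rows into the target. Supporting a new table means adding one more line.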
Handling Schema Changes at Runtime
Since the tool worked dynamically at runtime, I couldn’t rely on an object model. But that wasn’t a problem — every result set contained field names, effectively allowing dynamic object modeling.
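In JDBC terms, that dynamic modeling falls out of ResultSetMetaData. A minimal sketch (not the original code):

```java
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.Map;

public class DynamicRowReader {

    /** Turns the current row of any ResultSet into a column-name -> value map. */
    public static Map<String, Object> readRow(ResultSet rs) throws SQLException {
        ResultSetMetaData meta = rs.getMetaData();
        Map<String, Object> row = new LinkedHashMap<>();
        for (int i = 1; i <= meta.getColumnCount(); i++) {
            row.put(meta.getColumnName(i), rs.getObject(i));  // no compile-time object model needed
        }
        return row;
    }
}
```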
The next challenge? Inserting data with the correct datatype.
Oracle's data dictionary (ALL_TAB_COLUMNS and related metadata views) provided the field details.
I mapped data types properly to prevent query failures.
Special handling for BLOBs, CLOBs, and other non-standard types was added.
This approach meant the tool looped over every field in every table, determined its type dynamically, and constructed the correct INSERT statements at runtime.
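A sketch of that loop, assuming the column types come from ResultSetMetaData rather than the data dictionary queries the original tool used; the LOB branches show the kind of special handling mentioned above.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Types;

public class DynamicInserter {

    /** Copies every row of the source result set into the given target table. */
    public static void copyRows(ResultSet source, Connection target, String table) throws SQLException {
        ResultSetMetaData meta = source.getMetaData();
        int cols = meta.getColumnCount();

        // Build "INSERT INTO table (c1, c2, ...) VALUES (?, ?, ...)" once per table.
        StringBuilder names = new StringBuilder();
        StringBuilder marks = new StringBuilder();
        for (int i = 1; i <= cols; i++) {
            names.append(i > 1 ? ", " : "").append(meta.getColumnName(i));
            marks.append(i > 1 ? ", " : "").append("?");
        }
        String sql = "INSERT INTO " + table + " (" + names + ") VALUES (" + marks + ")";

        try (PreparedStatement insert = target.prepareStatement(sql)) {
            while (source.next()) {
                for (int i = 1; i <= cols; i++) {
                    switch (meta.getColumnType(i)) {
                        case Types.BLOB:
                            insert.setBlob(i, source.getBlob(i));     // LOBs need dedicated setters
                            break;
                        case Types.CLOB:
                            insert.setClob(i, source.getClob(i));
                            break;
                        default:
                            insert.setObject(i, source.getObject(i)); // driver maps standard types
                    }
                }
                insert.executeUpdate();
            }
        }
    }
}
```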
Performance Bottleneck & Optimization
The first run took over 40 minutes, which was totally unacceptable. The solution? Multi-threading.
However, I strictly followed another personal principle for multi-threaded programming:
Do not share data or resources between threads if you want true parallel execution.
The existing config file wasn’t structured for parallel execution, so I:
Grouped tables logically, allowing independent execution.
Designed threads to run in parallel without dependencies.
Ensured data integrity despite concurrent operations.
This restructuring took a day of analyzing 350+ table relationships, but the payoff was enormous — execution time dropped from 40 minutes to under 1 minute!
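The threading model can be sketched roughly like this, assuming the config has already been split into independent groups; copyGroup is a hypothetical placeholder for the per-group copy routine.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelCopyRunner {

    public static void copyAll(List<List<String>> independentTableGroups) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(independentTableGroups.size());
        for (List<String> group : independentTableGroups) {
            // Each group is self-contained: its own connections, its own tables, no shared state.
            pool.submit(() -> copyGroup(group));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.MINUTES);
    }

    private static void copyGroup(List<String> tables) {
        // Hypothetical placeholder: open dedicated source/target connections here,
        // run each table's configured SELECT, and insert the rows into the target.
        for (String table : tables) {
            System.out.println(Thread.currentThread().getName() + " copying " + table);
        }
    }
}
```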
Connection Management
I didn't use HikariCP or any similar connection-pooling JAR (I can't recall why), but I had extensive C# experience handling database connections manually. So I implemented a lightweight connection-pooling mechanism: keeping a connection open for a limited number of operations before refreshing it. In practice this custom pooling worked well precisely because it was tailored to my use case.
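A rough sketch of that refresh-after-N-operations idea in plain JDBC; the class, threshold, and credential handling are illustrative, not the original code. In the threaded design, each worker would hold its own instance, so nothing is shared.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class RefreshingConnection {

    private final String url;
    private final String user;
    private final String password;
    private final int maxOpsPerConnection;   // e.g. refresh after every 500 operations

    private Connection current;
    private int opsOnCurrent;

    public RefreshingConnection(String url, String user, String password, int maxOpsPerConnection) {
        this.url = url;
        this.user = user;
        this.password = password;
        this.maxOpsPerConnection = maxOpsPerConnection;
    }

    /** Hands out an open connection, recycling it after a fixed number of operations. */
    public Connection get() throws SQLException {
        if (current == null || current.isClosed() || opsOnCurrent >= maxOpsPerConnection) {
            if (current != null && !current.isClosed()) {
                current.close();   // refresh: retire the used connection
            }
            current = DriverManager.getConnection(url, user, password);
            opsOnCurrent = 0;
        }
        opsOnCurrent++;
        return current;
    }
}
```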
Impact & Value Delivered
This tool dramatically improved the productivity of developers and QA teams:
✅ Drastically reduced test data setup time — from hours to mere minutes.
✅ Enabled easy replication of test scenarios across environments.
✅ Adapted dynamically to schema changes without modifying code.
✅ Required just ~5,000 lines of code instead of 100,000+ — a 95% reduction!
Whenever a new table was added, only a single line was added in the config file. The code remained untouched — a true victory in generic, maintainable software design.
And yes, for those curious — the UI was built in AngularJS.
Final Thoughts
This was one of my favorite pet projects because it embodied high genericity, efficiency, and maintainability. I love designing solutions that eliminate unnecessary work, reduce technical debt, and maximize long-term value, and this tool was a perfect example of that. I wanted to offer it to other groups with other applications, since the code contained no application-specific logic: only the config file had to be rewritten for the new database, and the connection configuration had to point to the new database so it would appear in the dropdown. It was essentially a product that any team working with any Oracle DB schema could use. Hope you learnt a thing or two; I always look back on it with a sense of pride and satisfaction.
LinkedIn: https://www.linkedin.com/in/vaseem-anjum-07514a7/
My new Pet Project is a No Code Low Code Digitization Platform, check it out here: https://www.digitifynow.com/