Introducing Syntrend - Synthetic Data made easy

Mar 19, 2025 - 21:22

Syntrend is a lightweight tool using an expressive project structure to generate randomized synthetic datasets for local development, quality assurance, load testing, and bug investigations.

✨ Introducing Syntrend!

I built Syntrend in response to a lack of tooling that helps developers build their products independently of production data pipelines. It's a synthetic data generator that uses YAML project files to generate random and calculated values and properties from expressions defined in the project.
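To make "random and calculated values" concrete, here is a minimal Python sketch of the general technique. It is not Syntrend's implementation or project format. The `PROJECT` dict stands in for a parsed YAML project file, and its keys (`type`, `min`, `max`, `choices`, `expression`) and the `generate` / `generate_record` helpers are hypothetical, chosen only for illustration.

```python
import random

# Hypothetical project definition, standing in for a parsed YAML project file.
# The keys used here are illustrative only and are NOT Syntrend's actual schema.
PROJECT = {
    "quantity": {"type": "int", "min": 1, "max": 5},
    "unit_price": {"type": "float", "min": 2.0, "max": 20.0},
    "region": {"type": "choice", "choices": ["north", "south", "east", "west"]},
    # A calculated property, derived from values generated earlier in the same record.
    "total": {"type": "expression", "expression": "round(quantity * unit_price, 2)"},
}


def generate_record(project, rng):
    """Generate one record, resolving properties in definition order."""
    record = {}
    for name, spec in project.items():
        kind = spec["type"]
        if kind == "int":
            record[name] = rng.randint(spec["min"], spec["max"])
        elif kind == "float":
            record[name] = round(rng.uniform(spec["min"], spec["max"]), 2)
        elif kind == "choice":
            record[name] = rng.choice(spec["choices"])
        elif kind == "expression":
            # Evaluate against the values generated so far, exposing only a few builtins.
            # Acceptable for a trusted local sketch; this is not a sandbox for untrusted input.
            scope = {"__builtins__": {"round": round, "min": min, "max": max}}
            record[name] = eval(spec["expression"], scope, dict(record))
        else:
            raise ValueError(f"unknown property type: {kind}")
    return record


def generate(project, count, seed=None):
    """Generate `count` records; a fixed seed makes the output reproducible."""
    rng = random.Random(seed)
    return [generate_record(project, rng) for _ in range(count)]


if __name__ == "__main__":
    for row in generate(PROJECT, 3, seed=42):
        print(row)
```

Because calculated properties are evaluated against values already generated in the same record, related fields stay consistent with each other while the inputs remain random.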

Its primary objectives are:

Be Lightweight: I want it to run anywhere a developer works.
Be Easy to Use: It should be easy to understand and use for all members of the team.
Be Environment Agnostic: Developers can work offline or online, in local or remote workspaces. QA engineers need to work across a wide variety of environments: local, remote, CI pipelines, and integrated systems.
Support As Many Data Types As Possible: Data takes many forms, so we should be able to generate data into those many forms using an extendable toolset.
Be Expressive: Data can have a personality, and that personality needs to come through wherever we are, regardless of how much, how many, or how crazy that data can be.

As of v0.3.0, the 45KB package can potentially generate terabytes of data from a single project, run after run, with each run expressing the different patterns and behaviours you would see in production.