ArkFlow - High-performance Rust stream processing engine
ArkFlow github: https://github.com/chenquan/arkflow High-performance Rust stream processing engine, providing powerful data stream processing capabilities, supporting multiple input/output sources and processors. Features High Performance: Built on Rust and Tokio async runtime, offering excellent performance and low latency Multiple Data Sources: Support for Kafka, MQTT, HTTP, files, and other input/output sources Powerful Processing Capabilities: Built-in SQL queries, JSON processing, Protobuf encoding/decoding, batch processing, and other processors Extensible: Modular design, easy to extend with new input, output, and processor components Installation Building from Source # Clone the repository git clone https://github.com/chenquan/arkflow.git cd arkflow # Build the project cargo build --release # Run tests cargo test Quick Start Create a configuration file config.yaml: logging: level: info streams: - input: type: "generate" context: '{ "timestamp": 1625000000000, "value": 10, "sensor": "temp_1" }' interval: 1s batch_size: 10 pipeline: thread_num: 4 processors: - type: "json_to_arrow" - type: "sql" query: "SELECT * FROM flow WHERE value >= 10" - type: "arrow_to_json" output: type: "stdout" Run ArkFlow: ./target/release/arkflow --config config.yaml Configuration Guide ArkFlow uses YAML format configuration files, supporting the following main configuration items: Top-level Configuration logging: level: info # Log level: debug, info, warn, error streams: # Stream definition list - input: # Input configuration # ... pipeline: # Processing pipeline configuration # ... output: # Output configuration # ... Input Components ArkFlow supports multiple input sources: Kafka: Read data from Kafka topics MQTT: Subscribe to messages from MQTT topics HTTP: Receive data via HTTP File: Read data from files Generator: Generate test data SQL: Query data from databases Example: input: type: kafka brokers: - localhost:9092 topics: - test-topic consumer_group: test-group client_id: arkflow start_from_latest: true Processors ArkFlow provides multiple data processors: JSON: JSON data processing and transformation SQL: Process data using SQL queries Protobuf: Protobuf encoding/decoding Batch Processing: Process messages in batches Example: pipeline: thread_num: 4 processors: - type: json_to_arrow - type: sql query: "SELECT * FROM flow WHERE value >= 10" - type: arrow_to_json Output Components ArkFlow supports multiple output targets: Kafka: Write data to Kafka topics MQTT: Publish messages to MQTT topics HTTP: Send data via HTTP File: Write data to files Standard Output: Output data to the console Example: output: type: kafka brokers: - localhost:9092 topic: output-topic client_id: arkflow-producer Examples Kafka to Kafka Data Processing streams: - input: type: kafka brokers: - localhost:9092 topics: - test-topic consumer_group: test-group pipeline: thread_num: 4 processors: - type: json_to_arrow - type: sql query: "SELECT * FROM flow WHERE value > 100" - type: arrow_to_json output: type: kafka brokers: - localhost:9092 topic: processed-topic Generate Test Data and Process streams: - input: type: "generate" context: '{ "timestamp": 1625000000000, "value": 10, "sensor": "temp_1" }' interval: 1ms batch_size: 10000 pipeline: thread_num: 4 processors: - type: "json_to_arrow" - type: "sql" query: "SELECT count(*) FROM flow WHERE value >= 10 group by sensor" - type: "arrow_to_json" output: type: "stdout" License ArkFlow is licensed under the Apache License 2.0.

ArkFlow
github: https://github.com/chenquan/arkflow
High-performance Rust stream processing engine, providing powerful data stream processing capabilities, supporting multiple input/output sources and processors.
Features
- High Performance: Built on Rust and Tokio async runtime, offering excellent performance and low latency
- Multiple Data Sources: Support for Kafka, MQTT, HTTP, files, and other input/output sources
- Powerful Processing Capabilities: Built-in SQL queries, JSON processing, Protobuf encoding/decoding, batch processing, and other processors
- Extensible: Modular design, easy to extend with new input, output, and processor components
Installation
Building from Source
# Clone the repository
git clone https://github.com/chenquan/arkflow.git
cd arkflow
# Build the project
cargo build --release
# Run tests
cargo test
Quick Start
- Create a configuration file
config.yaml
:
logging:
level: info
streams:
- input:
type: "generate"
context: '{ "timestamp": 1625000000000, "value": 10, "sensor": "temp_1" }'
interval: 1s
batch_size: 10
pipeline:
thread_num: 4
processors:
- type: "json_to_arrow"
- type: "sql"
query: "SELECT * FROM flow WHERE value >= 10"
- type: "arrow_to_json"
output:
type: "stdout"
- Run ArkFlow:
./target/release/arkflow --config config.yaml
Configuration Guide
ArkFlow uses YAML format configuration files, supporting the following main configuration items:
Top-level Configuration
logging:
level: info # Log level: debug, info, warn, error
streams: # Stream definition list
- input: # Input configuration
# ...
pipeline: # Processing pipeline configuration
# ...
output: # Output configuration
# ...
Input Components
ArkFlow supports multiple input sources:
- Kafka: Read data from Kafka topics
- MQTT: Subscribe to messages from MQTT topics
- HTTP: Receive data via HTTP
- File: Read data from files
- Generator: Generate test data
- SQL: Query data from databases
Example:
input:
type: kafka
brokers:
- localhost:9092
topics:
- test-topic
consumer_group: test-group
client_id: arkflow
start_from_latest: true
Processors
ArkFlow provides multiple data processors:
- JSON: JSON data processing and transformation
- SQL: Process data using SQL queries
- Protobuf: Protobuf encoding/decoding
- Batch Processing: Process messages in batches
Example:
pipeline:
thread_num: 4
processors:
- type: json_to_arrow
- type: sql
query: "SELECT * FROM flow WHERE value >= 10"
- type: arrow_to_json
Output Components
ArkFlow supports multiple output targets:
- Kafka: Write data to Kafka topics
- MQTT: Publish messages to MQTT topics
- HTTP: Send data via HTTP
- File: Write data to files
- Standard Output: Output data to the console
Example:
output:
type: kafka
brokers:
- localhost:9092
topic: output-topic
client_id: arkflow-producer
Examples
Kafka to Kafka Data Processing
streams:
- input:
type: kafka
brokers:
- localhost:9092
topics:
- test-topic
consumer_group: test-group
pipeline:
thread_num: 4
processors:
- type: json_to_arrow
- type: sql
query: "SELECT * FROM flow WHERE value > 100"
- type: arrow_to_json
output:
type: kafka
brokers:
- localhost:9092
topic: processed-topic
Generate Test Data and Process
streams:
- input:
type: "generate"
context: '{ "timestamp": 1625000000000, "value": 10, "sensor": "temp_1" }'
interval: 1ms
batch_size: 10000
pipeline:
thread_num: 4
processors:
- type: "json_to_arrow"
- type: "sql"
query: "SELECT count(*) FROM flow WHERE value >= 10 group by sensor"
- type: "arrow_to_json"
output:
type: "stdout"
License
ArkFlow is licensed under the Apache License 2.0.