Configuration Reference
YAML Structure
Luminara pipelines can be defined declaratively using YAML files. This is the recommended approach for production because it keeps the pipeline definition separate from application code.
The root structure of a `luminara.yaml` file:

    version: "1.0"
    pipeline:
      name: "example-etl"
      settings:
        concurrency: 4
        buffer_size: 1000
        timeout: 30s
      source: ...
      transform: ...
      sink: ...

Global Settings
- `version`: Required. The configuration schema version (must be "1.0").
- `pipeline.name`: Optional. A unique identifier for metrics and logging.
- `pipeline.settings.concurrency`: Default: 1. Number of worker coroutines per stage.
- `pipeline.settings.buffer_size`: Default: 100. Max queue size between stages before backpressure kicks in.
- `pipeline.settings.timeout`: Default: "60s". Global timeout for pipeline execution.
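The way these defaults combine with user-supplied values can be sketched in plain Python. This is an illustration only, not Luminara's loader: `resolve_settings` and `parse_duration` are hypothetical helpers, the configuration is assumed to be already parsed into a dict, and only the seconds suffix seen in this reference ("30s", "60s") is handled.

```python
import re

# Defaults copied from the Global Settings list.
DEFAULT_SETTINGS = {"concurrency": 1, "buffer_size": 100, "timeout": "60s"}


def parse_duration(value):
    """Convert a duration such as "30s" to whole seconds (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)s", str(value))
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    return int(match.group(1))


def resolve_settings(config):
    """Check the schema version, then merge user settings over the defaults."""
    if config.get("version") != "1.0":
        raise ValueError('version is required and must be "1.0"')
    pipeline = config.get("pipeline") or {}
    user_settings = pipeline.get("settings") or {}
    settings = {**DEFAULT_SETTINGS, **user_settings}
    settings["timeout_seconds"] = parse_duration(settings["timeout"])
    return settings


config = {
    "version": "1.0",
    "pipeline": {
        "name": "example-etl",
        "settings": {"concurrency": 4, "timeout": "30s"},
    },
}
resolved = resolve_settings(config)
print(resolved)
```

Here `buffer_size` is omitted from the input, so the resolved settings fall back to the documented default of 100.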
Stage Configuration
Sources
Defines where data enters the pipeline. Only one source is allowed per pipeline definition.

    source:
      type: "file"
      path: "/data/input.csv"
      format: "csv"
      encoding: "utf-8"

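To make the role of each field concrete, the sketch below shows roughly what a file source with `format: "csv"` has to do: open `path` with the given `encoding` and emit one record per row. `read_file_source` is a hypothetical function written for this illustration, and the fallback to UTF-8 when `encoding` is omitted is an assumption rather than documented behavior.

```python
import csv
import os
import tempfile


def read_file_source(source):
    """Yield one dict per CSV row, using the path and encoding from the config."""
    if source["type"] != "file" or source["format"] != "csv":
        raise ValueError("this sketch only handles type 'file' with format 'csv'")
    # Assumption: fall back to UTF-8 when no encoding is configured.
    encoding = source.get("encoding", "utf-8")
    with open(source["path"], newline="", encoding=encoding) as handle:
        yield from csv.DictReader(handle)


# Write a small sample file so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "input.csv")
    with open(path, "w", newline="", encoding="utf-8") as handle:
        handle.write("id,status\n1,active\n2,inactive\n")
    source = {"type": "file", "path": path, "format": "csv", "encoding": "utf-8"}
    records = list(read_file_source(source))

print(records)
```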
Transforms
A list of transformation steps applied sequentially.

    transform:
      - type: "filter"
        field: "status"
        value: "active"
      - type: "map"
        script: "record['timestamp'] = now()"

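Sequential application means that each step consumes the output of the step before it. A minimal Python sketch of that chaining follows. The names `filter_step`, `map_step`, and `apply_transforms` are hypothetical, and the map step hard-codes the effect of the script shown above instead of evaluating the script string, since how Luminara executes scripts is not described here.

```python
from datetime import datetime, timezone


def filter_step(step):
    """Build a stage that keeps records whose `field` equals `value`."""
    def run(records):
        for record in records:
            if record.get(step["field"]) == step["value"]:
                yield record
    return run


def map_step(step):
    """Build a stage that stamps each record with the current UTC time."""
    # Assumption: stands in for the script "record['timestamp'] = now()".
    def run(records):
        for record in records:
            updated = dict(record)
            updated["timestamp"] = datetime.now(timezone.utc).isoformat()
            yield updated
    return run


STEP_BUILDERS = {"filter": filter_step, "map": map_step}


def apply_transforms(records, steps):
    """Chain the configured steps in order, each wrapping the previous stream."""
    stream = iter(records)
    for step in steps:
        stream = STEP_BUILDERS[step["type"]](step)(stream)
    return stream


steps = [
    {"type": "filter", "field": "status", "value": "active"},
    {"type": "map", "script": "record['timestamp'] = now()"},
]
records = [{"id": 1, "status": "active"}, {"id": 2, "status": "inactive"}]
result = list(apply_transforms(records, steps))
print(result)
```

Because the filter runs first, the inactive record never reaches the map step and only the active record receives a timestamp.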
Sinks
Defines where processed data is sent. Multiple sinks are supported (fan-out).

    sink:
      - type: "console"
        format: "json"
      - type: "postgres"
        connection_string: "${DB_URL}"
        table: "events"

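Fan-out means that every processed record is delivered to each configured sink. The sketch below illustrates the idea with two stand-in sinks; `ConsoleSink`, `MemorySink`, and `fan_out` are hypothetical names for this example, and `MemorySink` only collects rows in a list where a real database sink would insert them into a table.

```python
import json


class ConsoleSink:
    """Prints each record as one JSON line."""

    def __init__(self):
        self.written = 0

    def write(self, record):
        print(json.dumps(record, sort_keys=True))
        self.written += 1


class MemorySink:
    """Stand-in for a database sink: keeps rows in a list."""

    def __init__(self, table):
        self.table = table
        self.rows = []

    def write(self, record):
        self.rows.append(dict(record))


def fan_out(records, sinks):
    """Deliver every record to every sink, in configuration order."""
    for record in records:
        for sink in sinks:
            sink.write(record)


console = ConsoleSink()
events = MemorySink(table="events")
fan_out([{"id": 1}, {"id": 2}], [console, events])
```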
Full Example
Here is a complete configuration for a pipeline that reads JSON logs from Kafka, keeps only error-level records, anonymizes user IDs, and writes the results to a database and a Slack channel:

    version: "1.0"
    pipeline:
      name: "error-logs-processor"
      settings:
        concurrency: 8
        buffer_size: 5000
        retry_policy:
          attempts: 3
          backoff: "exponential"
      source:
        type: "kafka"
        topic: "app-logs"
        brokers: "kafka-broker:9092"
        group_id: "log-processor-v1"
      transform:
        # Keep only error level logs
        - type: "filter"
          expression: "record['level'] == 'ERROR'"
        # Anonymize user IDs
        - type: "python"
          module: "utils.privacy"
          function: "mask_pii"
      sink:
        - type: "postgres"
          dsn: "postgresql://user:pass@db:5432/logs"
          table: "error_events"
          batch_size: 100
        - type: "slack"
          webhook_url: "${SLACK_WEBHOOK}"
          template: "New error detected: {{ record.message }}"

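The `python` transform above points at a function `mask_pii` in the module `utils.privacy`. The reference does not show that function, so the following is only a guess at what such a module might contain. It assumes the function receives one record as a dict and returns the transformed record, and that the user ID lives in a field named `user_id`. Both are assumptions made for illustration.

```python
# utils/privacy.py (hypothetical contents)
import hashlib


def mask_pii(record):
    """Return a copy of the record with the user ID replaced by a short digest."""
    masked = dict(record)
    # Assumption: the user ID is stored under the key "user_id".
    user_id = masked.get("user_id")
    if user_id is not None:
        digest = hashlib.sha256(str(user_id).encode("utf-8")).hexdigest()
        masked["user_id"] = digest[:12]
    return masked


original = {"level": "ERROR", "message": "disk full", "user_id": 42}
masked = mask_pii(original)
print(masked)
```

Returning a copy rather than mutating the input keeps the step safe to use when the same record is also sent to other consumers.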
Environment Variables
You can reference environment variables in your configuration using the `${VAR_NAME}` syntax. Use this for secrets such as passwords and API keys so that they are never stored in the configuration file itself.
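As a rough illustration of how this kind of substitution can work, the sketch below walks a parsed configuration and replaces each `${VAR_NAME}` placeholder with the matching environment value. `expand_env` is a hypothetical helper, not part of Luminara, and raising an error for an unset variable is an assumption about the desired behavior.

```python
import os
import re

# Matches ${VAR_NAME} where the name is a conventional shell identifier.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value, env=None):
    """Recursively replace ${VAR_NAME} placeholders in strings, lists, and dicts."""
    env = os.environ if env is None else env
    if isinstance(value, dict):
        return {key: expand_env(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_env(item, env) for item in value]
    if isinstance(value, str):
        def replace(match):
            name = match.group(1)
            # Assumption: an unset variable is treated as a configuration error.
            if name not in env:
                raise KeyError(f"environment variable {name} is not set")
            return env[name]
        return _PLACEHOLDER.sub(replace, value)
    return value


sinks = [{"type": "postgres", "connection_string": "${DB_URL}", "table": "events"}]
expanded = expand_env(sinks, env={"DB_URL": "postgresql://localhost/events"})
print(expanded[0]["connection_string"])
```

Passing `env` explicitly makes the helper easy to test; when it is omitted, values are read from `os.environ`.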