v0.8.2

Configuration Reference

YAML Structure

Luminara pipelines can be defined declaratively using YAML files. This is the recommended way to configure pipelines for production, allowing you to decouple pipeline configuration from application code.

The root structure of a `luminara.yaml` file:

version: "1.0"
pipeline:
  name: "example-etl"
  settings:
    concurrency: 4
    buffer_size: 1000
    timeout: "30s"
  source: ...
  transform: ...
  sink: ...
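A structure like the one above can be parsed and sanity-checked with PyYAML. This is an illustrative sketch, not Luminara's own loader (which may perform stricter schema validation):

```python
# Sketch: parse a Luminara config and check the schema version.
# Assumes PyYAML is available; Luminara may ship its own loader.
import yaml

def load_config(text):
    config = yaml.safe_load(text)
    if config.get("version") != "1.0":
        raise ValueError(f"unsupported config version: {config.get('version')!r}")
    return config
```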

Global Settings

version
Required. The configuration schema version (must be "1.0").
pipeline.name
Optional. A unique identifier for metrics and logging.
pipeline.settings.concurrency
Default: 1. Number of worker coroutines per stage.
pipeline.settings.buffer_size
Default: 100. Maximum queue size between stages before backpressure is applied.
pipeline.settings.timeout
Default: "60s". Global timeout for pipeline execution.
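All three settings are optional; omitted keys fall back to the defaults listed above. The merge behaves like a simple dictionary update, sketched here (the helper name is illustrative, not part of Luminara's internals):

```python
# Sketch: fill in documented defaults for pipeline.settings.
# Default values are taken from the reference above.
SETTINGS_DEFAULTS = {
    "concurrency": 1,    # worker coroutines per stage
    "buffer_size": 100,  # max queue size before backpressure
    "timeout": "60s",    # global pipeline timeout
}

def apply_defaults(settings):
    merged = dict(SETTINGS_DEFAULTS)
    merged.update(settings or {})
    return merged
```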

Stage Configuration

Sources

Defines where data enters the pipeline. Only one source is allowed per pipeline definition.

source:
  type: "file"
  path: "/data/input.csv"
  format: "csv"
  encoding: "utf-8"

Transforms

A list of transformation steps applied sequentially.

transform:
  - type: "filter"
    field: "status"
    value: "active"
  - type: "map"
    script: "record['timestamp'] = now()"
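Conceptually, each record flows through the transform list in order: a record dropped by the filter never reaches the map step. A minimal sketch of these filter-then-map semantics (illustrative only; the timestamp is fixed here rather than produced by now()):

```python
# Sketch: sequential filter/map over records, mirroring the
# transform list above. Field names match the YAML example.
def run_transforms(records):
    for record in records:
        # filter: keep only records where status == "active"
        if record.get("status") != "active":
            continue
        # map: attach a timestamp (fixed value for illustration)
        record["timestamp"] = "2024-01-01T00:00:00Z"
        yield record
```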

Sinks

Defines where processed data is sent. Multiple sinks are supported (fan-out).

sink:
  - type: "console"
    format: "json"
  - type: "postgres"
    connection_string: "${DB_URL}"
    table: "events"
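Fan-out means every processed record is delivered to every configured sink. A sketch of that delivery loop, with simple callables standing in for the real sink types:

```python
import json

# Sketch: fan-out delivery -- each record is sent to every sink.
# The callables stand in for real sink types (console, postgres).
def fan_out(records, sinks):
    for record in records:
        for sink in sinks:
            sink(record)

# Example: a console sink plus an in-memory stand-in for a database sink.
buffered = []
fan_out(
    [{"event": "signup"}, {"event": "login"}],
    [lambda r: print(json.dumps(r)), buffered.append],
)
```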

Full Example

Here is a complete configuration for a pipeline that reads JSON logs, filters errors, and writes to a database:

version: "1.0"

pipeline:
  name: "error-logs-processor"
  
  settings:
    concurrency: 8
    buffer_size: 5000
    retry_policy:
      attempts: 3
      backoff: "exponential"

  source:
    type: "kafka"
    topic: "app-logs"
    brokers: "kafka-broker:9092"
    group_id: "log-processor-v1"

  transform:
    # Keep only error level logs
    - type: "filter"
      expression: "record['level'] == 'ERROR'"
    
    # Anonymize user IDs
    - type: "python"
      module: "utils.privacy"
      function: "mask_pii"

  sink:
    - type: "postgres"
      connection_string: "${DB_URL}"
      table: "error_events"
      batch_size: 100
      
    - type: "slack"
      webhook_url: "${SLACK_WEBHOOK}"
      template: "New error detected: {{ record.message }}"

Environment Variables

You can reference environment variables in your configuration using the ${VAR_NAME} syntax. This is essential for secrets management: passwords and API keys should not be hardcoded in configuration files.
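A sketch of how ${VAR_NAME} substitution might be implemented (the regex approach is illustrative; Luminara's actual resolver may differ, e.g. in how it handles missing variables):

```python
import os
import re

# Sketch: expand ${VAR_NAME} references against the environment.
# Raising on missing variables makes misconfigured secrets fail fast.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")

def expand_env(value):
    def _lookup(match):
        name = match.group(1)
        if name not in os.environ:
            raise KeyError(f"environment variable {name} is not set")
        return os.environ[name]
    return _VAR.sub(_lookup, value)
```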