v0.8.2

API Reference

This page documents the public API for the Luminara framework. All asynchronous methods must be awaited; async generators (such as Source.read) are consumed with async for rather than awaited directly.


Module: luminara.pipeline

class Pipeline

The main orchestrator for data flows.

class Pipeline(source: Source, stages: List[Stage] = [], sink: Sink | List[Sink] | None = None, **kwargs)

Parameters:

source: Source
The data source object. Must inherit from luminara.stages.Source.
stages: List[Stage]
A list of transformation stages to apply in sequence.
sink: Sink | List[Sink]
The destination for processed data. Can be a single Sink or a list (for fan-out).
**kwargs
Additional configuration (e.g., buffer_size, concurrency).

Methods:

async run() -> None
Starts the pipeline and runs until the source is exhausted and all internal buffers are drained.
async stop() -> None
Gracefully shuts down the pipeline, allowing in-flight messages to complete.
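As a rough illustration of the run() flow, the sketch below wires a source, one stage, and a sink together. Numbers, Doubler, Collector, and the inlined orchestration loop are hypothetical stand-ins, not Luminara classes; in real code you would pass the equivalent objects to Pipeline and await pipeline.run().

```python
import asyncio

# Hypothetical stand-ins sketching what Pipeline.run() does conceptually:
# records flow source -> stages -> sink, and a stage returning None drops
# the record before it reaches the sink.
class Numbers:
    async def read(self):
        for i in range(4):
            yield {"n": i}

class Doubler:
    async def process(self, record):
        # Drop zero, double everything else.
        return None if record["n"] == 0 else {"n": record["n"] * 2}

class Collector:
    def __init__(self):
        self.out = []

    async def write(self, record):
        self.out.append(record)

async def run(source, stages, sink):
    async for record in source.read():
        for stage in stages:
            record = await stage.process(record)
            if record is None:
                break  # record was dropped by a stage
        else:
            await sink.write(record)

sink = Collector()
asyncio.run(run(Numbers(), [Doubler()], sink))
```

With the stand-ins above, sink.out ends up holding the doubled records for n = 1, 2, 3; the n = 0 record is filtered out by the stage.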

Module: luminara.stages

class Source(abc.ABC)

Base class for all data inputs.

Methods:

async read() -> AsyncGenerator[Dict, None]
Abstract. Must yield dictionary records. This is the entry point for data into the pipeline.
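A minimal source is an async generator over in-memory items. ListSource below is a hypothetical example (in real code it would subclass luminara.stages.Source); the collect() helper just shows how a caller consumes it with async for.

```python
import asyncio
from typing import AsyncGenerator, Dict

# Hypothetical source; a real one would subclass luminara.stages.Source.
class ListSource:
    def __init__(self, items):
        self.items = items

    async def read(self) -> AsyncGenerator[Dict, None]:
        # Yield one dictionary record per input item.
        for item in self.items:
            yield {"value": item}

async def collect():
    return [record async for record in ListSource([1, 2]).read()]

records = asyncio.run(collect())
```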

class Transform(abc.ABC)

Base class for data manipulation.

Methods:

async process(record: Dict) -> Dict | None
Abstract. Receives a record and returns a modified record. Return None to drop the record (filtering).
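The return-None contract makes a Transform double as a filter. DropNegatives below is a hypothetical example (a real one would subclass luminara.stages.Transform): it discards records with a negative "value" and doubles the rest.

```python
import asyncio
from typing import Dict, Optional

# Hypothetical transform; a real one would subclass luminara.stages.Transform.
class DropNegatives:
    async def process(self, record: Dict) -> Optional[Dict]:
        # Returning None drops the record from the pipeline (filtering).
        if record.get("value", 0) < 0:
            return None
        record["value"] *= 2
        return record

kept = asyncio.run(DropNegatives().process({"value": 3}))
dropped = asyncio.run(DropNegatives().process({"value": -1}))
```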

class Sink(abc.ABC)

Base class for data outputs.

Methods:

async write(record: Dict) -> None
Abstract. Receives a fully processed record and handles side effects (e.g., writing to a database or an external API).
async write_batch(records: List[Dict]) -> None
Optional. Override for batch processing. Default implementation calls write() for each record.
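The sketch below collects records in memory and spells out the documented default write_batch() behavior. MemorySink is hypothetical (a real sink would subclass luminara.stages.Sink and typically override write_batch only when the backend supports bulk writes).

```python
import asyncio
from typing import Dict, List

# Hypothetical sink; a real one would subclass luminara.stages.Sink.
class MemorySink:
    def __init__(self):
        self.records: List[Dict] = []

    async def write(self, record: Dict) -> None:
        self.records.append(record)

    async def write_batch(self, records: List[Dict]) -> None:
        # Mirrors the documented default: call write() once per record.
        for record in records:
            await self.write(record)

sink = MemorySink()
asyncio.run(sink.write_batch([{"a": 1}, {"a": 2}]))
```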

Module: luminara.schema

class Schema

Data validation utility built on Pydantic.

class Schema(model: Type[pydantic.BaseModel], on_error: str = "raise")

Parameters:

model
A Pydantic model class defining the expected data structure.
on_error
Action to take on validation failure: "raise" (default), "drop", or "log".
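To illustrate the on_error modes without depending on Pydantic, the stand-in below checks a single required key. SchemaSketch is hypothetical and only mimics the dispatch logic: "raise" re-raises the validation error, "drop" returns None, and "log" is assumed to report the failure as well.

```python
from typing import Dict, Optional

# Hypothetical stand-in for the on_error dispatch; the real Schema
# validates records against a Pydantic model class instead.
class SchemaSketch:
    def __init__(self, required_key: str, on_error: str = "raise"):
        self.required_key = required_key
        self.on_error = on_error

    def validate(self, record: Dict) -> Optional[Dict]:
        if self.required_key in record:
            return record
        if self.on_error == "raise":
            raise ValueError(f"missing field: {self.required_key}")
        # "drop" silently discards the record; "log" (assumption) would
        # also report the failure before discarding or passing it on.
        return None

ok = SchemaSketch("id", on_error="drop").validate({"id": 1})
dropped = SchemaSketch("id", on_error="drop").validate({})
```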