Intake service
Intake service is a Rust binary located in services/intake/. Its sole
responsibility is ingestion.
Initialization Phase
When the service boots up:
- It reads
pipeline.yaml. - It loops through all defined
sources. - If a source defines an Avro
schema_path, the service registers this schema with the Confluent Schema Registry. - It creates the required Kafka topics if they do not exist.
The Event Loop
The service continuously listens to the S3 bucket notification topic. When a file arrives:
- Routing: It compares the S3 file path against the YAML configuration to find a match.
- Parsing: It dynamically selects the correct parser (e.g., CSV, JSON)
- Emitting: It encodes the parsed data (in Avro or JSON) and publishes it to the designated Kafka topic.