Architecture¶
This page describes the architecture of the IoT Kafka Data Platform project.
Introduction¶
The IoT Kafka Data Platform project is a showcase of the data engineering capabilities of Aurelius Enterprise in the context of smart manufacturing and Industry 4.0. The project simulates a production line of a factory that produces metal parts. The line is equipped with sensors that measure various parameters of the production process. The data from the sensors is collected and processed in real time to enable various use cases such as predictive maintenance, advanced production planning, and process optimization.
Key Requirements¶
The IoT Kafka Data Platform project is designed to meet the following key requirements:
- Real-time Data Processing: The system must be able to process data from the sensors in real time to enable real-time monitoring and control of the production process.
- OT Integration: The system must be able to communicate with the equipment and sensors using common Operational Technology (OT) protocols such as OPC UA and MQTT.
- IT Integration: The system must be able to integrate with other enterprise systems like ERP and MES to enable end-to-end automation of the production process.
- Scalability: The system must be able to scale horizontally to handle a large number of sensors and a high volume of data.
- Fault Tolerance: The system must be fault-tolerant to ensure high availability and reliability.
- Security: The system must be secure to protect data integrity and prevent unauthorized access.
- Observability: The system must provide monitoring and logging capabilities to enable process control and troubleshooting.
Architecture Overview¶
The IoT Kafka Data Platform project consists of the following components:
graph TB
subgraph IIoT[<a href='#industrial-internet-of-things'>Industrial Internet of Things</a>]
EdgeDevices["Edge Devices"]
EdgeGateway["Edge Gateway"]
end
subgraph Kubernetes[<a href='#kubernetes-cluster'>Kubernetes Cluster</a>]
subgraph DataSource[<a href='#upstream-data-sources'>Upstream Data Sources</a>]
UpstreamPostgreSQL[("PostgreSQL")]
end
subgraph Platform[<a href='#data-streaming-platform'>Data Streaming Platform</a>]
KafkaConnect["Kafka Connect"]
Kafka["Apache Kafka"]
SchemaRegistry[("Schema Registry")]
subgraph StreamProcessing[<a href='#stream-processing'>Stream Processing</a>]
NodeRED["Node-RED"]
Microservices["Microservices"]
KSQL["KSQL"]
end
end
subgraph DownstreamConsumers[<a href='#downstream-consumers'>Downstream Consumers</a>]
DownstreamPostgreSQL[("TimescaleDB")]
Grafana
end
end
EdgeDevices -->|"OT Data"| EdgeGateway
EdgeGateway --->|"OPC UA / MQTT"| NodeRED
NodeRED --> Kafka
UpstreamPostgreSQL -->|"JSON"| KafkaConnect
Kafka <--> KafkaConnect
Microservices <--> Kafka
KSQL <--> Kafka
KafkaConnect -->|"AVRO"| DownstreamPostgreSQL
DownstreamPostgreSQL -->|"Data"| Grafana
SchemaRegistry -.->|"Schemas"| KSQL
SchemaRegistry -.->|"Schemas"| KafkaConnect
SchemaRegistry -.->|"Schemas"| Microservices
Industrial Internet of Things¶
Our partner Reniver provides the edge devices and gateway for the production line. The edge devices include sensors and actuators that measure various parameters of the production process. The edge gateway collects data from the edge devices and forwards it to the data streaming platform over OPC UA or MQTT.
Kubernetes Cluster¶
All components of the platform that are the responsibility of Aurelius Enterprise shall be deployed on a Kubernetes cluster. The cluster provides a scalable and fault-tolerant infrastructure that is supported by all major cloud providers and can also be deployed on-premises.
Upstream Data Sources¶
In addition to the data from the edge devices, the system can also ingest data from other upstream data sources such as ERP and MES systems. For the Expo project, we will use a PostgreSQL database as an example upstream data source. The database can be deployed on the Kubernetes cluster.
Data Streaming Platform¶
The data from the edge devices is ingested into a data streaming platform that aggregates data streams from edge devices and other upstream data sources, and publishes them to downstream consumers. The platform is built on Apache Kafka and deployed on a Kubernetes cluster.
Kafka Connect can be used to ingest data from upstream data sources and also to publish data to downstream consumers.
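To illustrate ingestion from the upstream PostgreSQL database, the sketch below registers a JDBC source connector against the Kafka Connect REST API. The host names, credentials, incrementing column, and topic prefix are placeholders, and the Confluent JDBC source connector plugin is assumed to be installed on the Connect workers.

```typescript
// Sketch: register a JDBC source connector via the Kafka Connect REST API.
// Assumes Node 18+ (global fetch), a Connect worker at connect:8083, and the
// Confluent JDBC connector plugin installed; all names below are placeholders.
const connector = {
  name: "upstream-postgres-source",
  config: {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "tasks.max": "1",
    "connection.url": "jdbc:postgresql://upstream-postgres:5432/factory",
    "connection.user": "connect",
    "connection.password": "********",
    // Incrementally poll new rows based on an auto-incrementing id column.
    "mode": "incrementing",
    "incrementing.column.name": "id",
    // Each table is published to a topic named <prefix><table>.
    "topic.prefix": "upstream.",
  },
};

async function registerConnector(): Promise<void> {
  const response = await fetch("http://connect:8083/connectors", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(connector),
  });
  if (!response.ok) {
    throw new Error(`Connector registration failed: ${response.status} ${await response.text()}`);
  }
}

registerConnector().catch(console.error);
```

Once registered, Kafka Connect polls the source tables and publishes new rows to the corresponding topics.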
Node-RED can also be used to connect to data sources for which no connector is available. For example, we use Node-RED to connect to the MQTT broker provided by Reniver since the MQTT connector for Kafka Connect is not available under a free-use license.
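For illustration only, the sketch below shows the equivalent bridge logic in TypeScript using the mqtt and kafkajs packages; in this project the step is implemented as a Node-RED flow, and the broker addresses and topic names here are assumptions.

```typescript
// Sketch of an MQTT-to-Kafka bridge equivalent to the Node-RED flow.
// Broker addresses and topic names are placeholders.
import { connect } from "mqtt";
import { Kafka } from "kafkajs";

const mqttClient = connect("mqtt://edge-gateway:1883");
const kafka = new Kafka({ clientId: "mqtt-bridge", brokers: ["kafka:9092"] });
const producer = kafka.producer();

async function main(): Promise<void> {
  await producer.connect();
  // Subscribe to all sensor topics published by the edge gateway.
  mqttClient.subscribe("sensors/#");
  mqttClient.on("message", (topic, payload) => {
    // Forward each MQTT message to Kafka as a JSON value, keyed by MQTT topic.
    producer
      .send({
        topic: "sensor-readings",
        messages: [{ key: topic, value: payload.toString() }],
      })
      .catch(console.error);
  });
}

main().catch(console.error);
```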
Stream Processing¶
Since Kafka connectivity is supported by many stream processing frameworks, the platform does not prescribe a specific stream processing solution. However, a microservice architecture is recommended, and the provided Kubernetes cluster is well suited to this purpose. A schema registry is also provided for data serialization and deserialization.
Recommended frameworks include KSQL or Node-RED for low-complexity use cases and NestJS with Redis for more complex ones. For use cases with very high data volumes, Apache Flink is recommended.
This example architecture uses a combination of KSQL, Node-RED and NestJS with Redis for stream processing.
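As a minimal sketch of such a microservice, the example below consumes a Kafka topic through the @nestjs/microservices Kafka transport; the topic, broker address, and consumer group are assumptions rather than the actual service configuration.

```typescript
// Sketch: a NestJS microservice consuming sensor events from Kafka.
// Topic, broker, and consumer-group names are placeholders.
import { Controller, Module } from "@nestjs/common";
import { NestFactory } from "@nestjs/core";
import { EventPattern, MicroserviceOptions, Payload, Transport } from "@nestjs/microservices";

@Controller()
class SensorEventsController {
  // Invoked for every record on the sensor-readings topic.
  @EventPattern("sensor-readings")
  handleReading(@Payload() reading: { sensorId: string; value: number }): void {
    // Apply use-case logic here, e.g. threshold checks for predictive maintenance.
    console.log(`Received reading from ${reading.sensorId}: ${reading.value}`);
  }
}

@Module({ controllers: [SensorEventsController] })
class AppModule {}

async function bootstrap(): Promise<void> {
  const app = await NestFactory.createMicroservice<MicroserviceOptions>(AppModule, {
    transport: Transport.KAFKA,
    options: {
      client: { brokers: ["kafka:9092"] },
      consumer: { groupId: "sensor-readings-service" },
    },
  });
  await app.listen();
}

bootstrap().catch(console.error);
```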
Downstream Consumers¶
These are the applications that consume the data streams from the data streaming platform. Examples include enterprise systems like ERP and MES, as well as custom dashboards and monitoring tools.
This example solution includes some dashboards in Grafana to illustrate the various use cases enabled by the architecture.
Recommended databases include PostgreSQL with the TimescaleDB extension, since it provides capabilities for both time-series and relational data, as well as scalability and resilience.
A popular alternative for time-series data is InfluxDB. It provides a more specialized solution for time-series data but lacks the relational capabilities of TimescaleDB.
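For example, a time-series table for sensor readings could be provisioned as a TimescaleDB hypertable, sketched below with the node-postgres (pg) client; the connection string, table, and column names are assumptions.

```typescript
// Sketch: provision a TimescaleDB hypertable for sensor readings using node-postgres.
// Connection string, table, and column names are placeholders.
import { Client } from "pg";

async function provision(): Promise<void> {
  const client = new Client({ connectionString: "postgres://postgres:********@timescaledb:5432/iot" });
  await client.connect();
  try {
    await client.query("CREATE EXTENSION IF NOT EXISTS timescaledb;");
    await client.query(`
      CREATE TABLE IF NOT EXISTS sensor_readings (
        time      TIMESTAMPTZ      NOT NULL,
        sensor_id TEXT             NOT NULL,
        value     DOUBLE PRECISION NOT NULL
      );
    `);
    // Turn the table into a hypertable partitioned on the time column.
    await client.query("SELECT create_hypertable('sensor_readings', 'time', if_not_exists => TRUE);");
  } finally {
    await client.end();
  }
}

provision().catch(console.error);
```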
Observability¶
The solution should provide monitoring and logging capabilities to enable process control and troubleshooting. The observability stack is yet to be determined.
Reniver Data Flow¶
The data flow from the edge devices to the data streaming platform is as follows:
graph LR
subgraph Reniver["Reniver"]
EdgeDevices["Edge Devices"]
EdgeGateway["Edge Gateway"]
end
subgraph Kubernetes
subgraph Platform["Data Streaming Platform"]
NodeRED["Node-RED"]
KafkaConnect["Kafka Connect"]
Kafka["Apache Kafka"]
SchemaRegistry[("Schema Registry")]
subgraph StreamProcessing["Stream Processing"]
KSQL["KSQL"]
end
end
subgraph DownstreamConsumers["Downstream Consumers"]
DownstreamPostgreSQL[("TimescaleDB")]
Grafana
end
end
EdgeDevices -->|"Events"| EdgeGateway
EdgeGateway -->|"OPC UA / MQTT"| NodeRED
NodeRED -->|"JSON"| Kafka
Kafka <-->|"AVRO"| KafkaConnect
Kafka <-->|"JSON / AVRO"| KSQL
KafkaConnect --->|"AVRO"| DownstreamPostgreSQL
DownstreamPostgreSQL -->|"Data"| Grafana
KSQL -.->|"Schemas"| SchemaRegistry
KafkaConnect -.->|"Schemas"| SchemaRegistry
The edge devices send events to the edge gateway, which forwards them to Node-RED over MQTT. Node-RED processes the events, converts them to JSON, and publishes them to Kafka. Before the data can be written to TimescaleDB, it must be converted to AVRO. This conversion is done with KSQL, and the resulting schema is stored in the Schema Registry. The data is then consumed through Kafka Connect using the JDBC sink connector and stored in TimescaleDB. TimescaleDB processes the data into several views that are then consumed by Grafana.
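The conversion step can be sketched as two KSQL statements, submitted here to the ksqlDB REST API from TypeScript: the first declares the JSON stream over the raw topic, and the second derives an AVRO-formatted stream, which causes ksqlDB to register the AVRO schema in the Schema Registry. The host, topic, and column names are assumptions.

```typescript
// Sketch: derive an AVRO stream from the JSON sensor topic via the ksqlDB REST API.
// Host, topic, and column names are placeholders.
const statements = `
  CREATE STREAM readings_json (
    sensor_id VARCHAR,
    value DOUBLE,
    ts BIGINT
  ) WITH (KAFKA_TOPIC='sensor-readings', VALUE_FORMAT='JSON');

  CREATE STREAM readings_avro
    WITH (KAFKA_TOPIC='sensor-readings-avro', VALUE_FORMAT='AVRO') AS
    SELECT * FROM readings_json;
`;

async function createStreams(): Promise<void> {
  const response = await fetch("http://ksqldb:8088/ksql", {
    method: "POST",
    headers: { "Content-Type": "application/vnd.ksql.v1+json" },
    body: JSON.stringify({ ksql: statements, streamsProperties: {} }),
  });
  if (!response.ok) {
    throw new Error(`ksqlDB request failed: ${response.status} ${await response.text()}`);
  }
}

createStreams().catch(console.error);
```

The JDBC sink connector can then consume the AVRO topic and write it to TimescaleDB using the registered schema.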
Release Process¶
To support an agile release process, the architecture features a CI/CD pipeline that adheres to the principles of a traditional DTAP (Development, Test, Acceptance, Production) process. The pipeline consists of the following stages:
Development¶
The development environment is used by developers to build and test new features. Each developer has their own development environment that is isolated from other developers. The development environment is deployed as a development container.
Test¶
The test environment is used to test new features before they are deployed to production. The test environment is set up as a GitHub Actions workflow that runs automated tests on the codebase using the development container as a base image.
The following types of testing are performed:
- Code Quality: Code is checked for formatting, linting, and other quality metrics.
- Unit Testing: Tests individual units of code in isolation to ensure they work correctly (see the example at the end of this section).
- Integration Testing: Tests the interaction between different units of code to ensure they work together.
- End-to-End Testing: Tests the entire system to ensure it works as expected.
- Peer Review: Code is reviewed by other developers to ensure it meets the project's coding standards.
graph LR
subgraph Local
subgraph Development
subgraph Developer
Impl["Implement feature"]
end
Precommit["Pre-commit"]
end
end
subgraph GitHub
subgraph CI
FeatureBranch["Feature branch"]
Premerge["Pre-merge"]
MainBranch["Main branch"]
end
end
Impl -->|"Changes"| Precommit
Precommit -->|"Formatting & linting"| FeatureBranch
FeatureBranch -->|"Pull request"| Premerge
Premerge -->|"Unit, integration & end-to-end testing\nPeer review"| MainBranch
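As a small illustration of the unit-testing stage, the sketch below tests a hypothetical payload-parsing helper, assuming Jest (for example via ts-jest) as the test runner; both the helper and the field names are made up for the example.

```typescript
// Sketch: a unit test for a hypothetical payload-parsing helper, assuming Jest.
interface SensorReading {
  sensorId: string;
  value: number;
  timestamp: number;
}

// Pure helper under test: parses a raw JSON sensor payload into a typed reading.
export function parseReading(raw: string): SensorReading {
  const { sensor_id, value, ts } = JSON.parse(raw);
  return { sensorId: sensor_id, value: Number(value), timestamp: Number(ts) };
}

describe("parseReading", () => {
  it("maps raw payload fields to a typed reading", () => {
    const raw = '{"sensor_id":"press-01","value":42.5,"ts":1700000000000}';
    expect(parseReading(raw)).toEqual({
      sensorId: "press-01",
      value: 42.5,
      timestamp: 1700000000000,
    });
  });
});
```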
Acceptance¶
The acceptance environment is used to validate new features with key users and stakeholders before they are released to a wider audience. The acceptance environment is a Kubernetes cluster that is built from a Helm chart tagged with a pre-release version number.
Production¶
The production environment is the live environment used by end-users. The production environment is a Kubernetes cluster that is built from a Helm chart tagged with a release version number.
Release Pipeline¶
The Helm chart depends on the Docker images of the various components of the architecture. These images are built using GitHub Actions workflows that are triggered by the push of a new tag to the repository. All images are tagged with the new version number and published to the GitHub Container Registry.
Once all Docker images are built, the Helm chart is updated to reference the new images. It is then published to the GitHub Container Registry with the same version number used for the Docker images.
Finally, the documentation is updated to reflect the new release and published to the project website on Cloudflare.
graph LR
subgraph Local
subgraph Developer
Release["Generate release"]
end
end
subgraph GitHub
subgraph Actions["GitHub Actions"]
Build["Build Docker Images"]
Distribute["Package Helm Chart"]
Documentation["Build Documentation"]
end
GHCR[("GitHub Container Registry")]
Cloudflare[("Cloudflare")]
end
Release -->|"tag (vX.Y.Z)"|Build
Build -->|"Docker image"| GHCR
Release -->|"tag (vX.Y.Z)"| Distribute
Distribute -->|"Helm chart"| GHCR
Release -->|"tag (vX.Y.Z)"| Documentation
Documentation -->|"Documentation"| Cloudflare