Tuesday, October 3, 2023

Data ingestion using AWS Step Functions, AWS IoT Events, and AWS IoT Analytics


A key trend we observe with industrial IoT projects is that new industrial equipment come with out-of-the-box cloud connectivity and allow near real-time processing of sensor data. This makes it possible to immediately generate KPIs allowing to monitor those equipment closely and to optimize their performance. However, a lot of industrial IoT data is still stored in legacy systems (e.g. SCADA systems, messaging systems, on-premises databases) with different formats (e.g. CSV, XML, MIME). This raises multiple challenges concerning the real-time management of that sensor data, such as:

  1. Data transformation orchestration: On-premises industrial data comes from different sources each having their own format and their own communication protocol. It is therefore necessary to find a means to orchestrate the management of all that incoming data, with a view to structure it before analysis.
  2. Event driven processing: Batch jobs can introduce latencies in the data management pipeline, which motivates the need for an event-driven architecture to ingest industrial data.
  3. Data enrichment with external sources: Data enrichment is sometimes a mandatory step in order to allow efficient analysis.
  4. Enhanced data analysis: Complex queries are sometimes needed to generate KPIs.
  5. Ease of maintenance: Companies do not always have the time or the internal resources to deploy and manage complex tools.
  6. Real-time processing of datasets: Some metrics needs to be constantly monitored, which implies immediate processing of datasets to take actions.
  7. Data integration with other services or applications: Finally, it will be necessary to automate the output of the analysis results to other services which need to process them in real time.

In this blog post we show how customers can use AWS Step Functions, AWS IoT Analytics, and AWS IoT Events as the basis of a lightweight architecture to address the aforementioned challenges in a scalable way.

Solution overview

In the use case we discuss in this blog post, we assume that a company is receiving sensor data from its industrial sites through CSV files that are stored in an Amazon S3 bucket. The company has to dynamically process those files to generate key performance indicators (KPIs) every five minutes. We assume the sensor data must be enriched with additional information, such as upper and lower threshold limits, before being stored in Amazon DynamoDB. The flow of data will be constantly monitored so that alarms can be raised if there are missing data records.

The AWS CloudFormation template provided with this blog provisions the resources needed to simulate this use case. It includes a built-in data simulator which feeds a Step Functions workflow with CSV files containing sensor data having four fields: 1/ timestamp, 2/ production order id, 3/ asset id, 4/ value. This data is then converted to JSON objects using AWS Lambda functions, which are then forwarded to AWS IoT Analytics to enrich the data (with upper and lower limits) and retrieve it every five minutes by creating a dataset. Finally, this AWS IoT Analytics dataset is sent automatically to AWS IoT Events to be monitored and stored in DynamoDB.

Data preparation

The simulated data that represents the data coming from an industrial site is stored in a CSV file. This data must then be converted into a JSON format in order to be ingested by AWS IoT Analytics. Step Functions offer a flexible service to prepare data. So, instead of building a custom monolithic application, we spread the complexity of the data transformation across three Lambda functions, which are sequenced via a JSON canvas. The first Lambda function will format the CSV file into JSON, the second Lambda function will format the data field and append a timestamp, and the third Lambda function will send the data to AWS IoT Analytics. In addition, we have implemented checks to redirect invalid files to an Amazon S3 bucket if there is an error during the execution of the first two Lambda functions to be troubleshooted and reprocessed later.

Eastlink Cloud Pvt. Ltd.
Tripureshwor, Kathmandu, Nepal



Related Articles

- Advertisment -spot_img

Recent Articles


Popular Articles

%d bloggers like this: