Understanding Amazon Kinesis Data Streams 🚀
Amazon Kinesis Data Streams (KDS) is a real-time data streaming service that collects and stores live data such as clickstreams, IoT device data, metrics, and logs. Producers write records into a stream, and consumer applications read them for further processing.
Key Features of Kinesis Data Streams 📊
- Retains data for 24 hours by default, extendable up to 365 days, so consumers can reprocess it if needed. Records are immutable and cannot be deleted until they expire.
- Provides data ordering guarantees for messages with the same partition key.
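- Supports up to 1MB per second (or 1,000 records per second) of data ingestion per shard.
- Producers can use the Kinesis Producer Library (KPL) to optimize data ingestion, and consumers can use the Kinesis Client Library (KCL) to simplify data consumption. Both are optional; the plain AWS SDK also works.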
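To make the partition key point concrete, here is a minimal sketch of writing a single record with the AWS SDK's `put_record` call instead of the KPL. The function name `put_reading`, the stream name, and the device ID are made up for illustration. The client is passed in so the sketch does not depend on AWS credentials.

```python
import json


def put_reading(client, stream_name, device_id, reading):
    """Write one reading to a stream, keyed by device ID.

    `client` is assumed to expose boto3's Kinesis `put_record` method.
    Using the device ID as the partition key keeps each device's
    records in order relative to one another.
    """
    data = json.dumps(reading).encode("utf-8")
    response = client.put_record(
        StreamName=stream_name,
        Data=data,
        PartitionKey=device_id,
    )
    # The response reports which shard stored the record and the
    # sequence number it was assigned within that shard.
    return response["ShardId"], response["SequenceNumber"]
```

With real credentials, `client` would be `boto3.client("kinesis")`, and a call might look like `put_reading(client, "sensor-events", "sensor-42", {"temp": 21.5})`.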
Capacity Modes 🏗️
Kinesis offers two capacity modes to suit different use cases:
- Provisioned Mode: You choose the number of shards. Each shard handles 1MB/s in and 2MB/s out. Scaling is manual, and pricing is based on the number of shards per hour.
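- On-Demand Mode: Capacity scales automatically based on the observed peak throughput over the last 30 days, with no manual capacity planning. Pricing is per stream per hour plus per GB of data in and out.
By default, the 2MB/s per shard output is shared between all consumer applications reading from that shard. With Enhanced Fan-Out, each registered consumer gets its own 2MB/s per shard, so multiple consumers can read in parallel without competing for throughput.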
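The shard arithmetic above can be sketched in a few lines. This is a rough sizing aid that models only the per-shard byte limits quoted in this section (1MB/s in, 2MB/s out); the function names are illustrative, and real sizing should also account for record-count limits and traffic spikes.

```python
import math

SHARD_WRITE_MB_S = 1.0  # per-shard ingest limit quoted above
SHARD_READ_MB_S = 2.0   # per-shard output limit quoted above


def shards_needed(ingest_mb_s):
    """Smallest shard count whose combined write limit covers the ingest rate."""
    return max(1, math.ceil(ingest_mb_s / SHARD_WRITE_MB_S))


def read_mb_s_per_consumer(shards, consumers, enhanced_fan_out=False):
    """Read throughput available to each consumer application.

    Without Enhanced Fan-Out the stream's total output is shared by all
    consumers; with it, every consumer gets the full amount.
    """
    total = shards * SHARD_READ_MB_S
    if enhanced_fan_out:
        return total
    return total / consumers
```

For example, ingesting 4.5MB/s needs 5 shards. Four consumers sharing those shards get 2.5MB/s each, while with Enhanced Fan-Out each one gets the full 10MB/s.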
Amazon Kinesis Data Firehose 🔥
Kinesis Data Firehose is a fully managed service designed to load streaming data into destinations such as Amazon S3, Redshift, OpenSearch, third-party services like Splunk, or custom HTTP endpoints.
Key Features 🛠️
- Near real-time data delivery with built-in buffering.
- Can convert records to columnar formats such as Parquet and ORC, and supports data compression.
- Enables custom transformations using AWS Lambda.
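As a sketch of what a Lambda transformation can look like: Firehose hands the function a batch of base64-encoded records, and expects each one back with the same `recordId`, a `result` status, and the (possibly modified) data. The transformation itself, uppercasing a `level` field in JSON payloads, is a hypothetical example and not something from this article.

```python
import base64
import json


def lambda_handler(event, context):
    """Firehose transformation: uppercase a hypothetical `level` field."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            payload["level"] = str(payload.get("level", "info")).upper()
            # Newline-delimit so records stay separable at the destination.
            data = (json.dumps(payload) + "\n").encode("utf-8")
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(data).decode("utf-8"),
            })
        except (ValueError, AttributeError):
            # Not valid JSON, or not a JSON object: return the original
            # bytes and flag the record as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

Returning a result of `Dropped` instead of `Ok` is the way to filter a record out deliberately.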
🔥 Difference Between Kinesis Data Streams and Firehose
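- Kinesis Data Streams ingests and stores streaming data in real time. You build or run the consumers yourself, and retained data can be replayed.
- Firehose is fully managed and delivers streaming data to storage and analytics destinations in near real time. It does not store data, so there is no replay.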
SQS vs SNS vs Kinesis 🔄
Understanding the differences between these AWS messaging services is crucial:
- SQS (Simple Queue Service): A consumer-pull model where messages are deleted after consumption. Ideal for decoupling microservices and handling large-scale message processing. Supports FIFO for ordering.
- SNS (Simple Notification Service): A publish-subscribe model where messages are pushed to multiple subscribers (up to 12.5M per topic). Data is not persisted after delivery.
- Kinesis: A real-time data streaming service with ordering at the shard level. Supports both standard pull and enhanced fan-out push models. Enables data replay for analytics and ETL.
🔑 Tip: Records with the same partition key are routed to the same shard, so using a stable key per data source (e.g., an IoT sensor's ID) guarantees ordered delivery for that source even when the stream has multiple shards.
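Why does the same key always reach the same shard? Kinesis takes the MD5 hash of the partition key as a 128-bit integer and picks the shard whose hash key range contains it. The sketch below mimics that lookup under one simplifying assumption: the hash space is split evenly across shards, which may not hold after resharding. The function name is illustrative.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def shard_for_key(partition_key, shard_count):
    """Index of the shard whose hash range contains MD5(partition_key).

    Assumes evenly sized hash ranges; a real stream reports its actual
    ranges per shard, and they can be uneven after splits and merges.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    range_size = HASH_SPACE // shard_count
    # Clamp so the few values left over by integer division land in the last shard.
    return min(hash_value // range_size, shard_count - 1)
```

Because the result depends only on the key and the shard layout, every record from `sensor-42` maps to one shard, while different sensors spread across the stream.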
Conclusion 🎯
Amazon Kinesis is a game-changer for real-time data streaming and analytics. Whether you need to ingest, process, or load data into analytics platforms, Kinesis Data Streams and Firehose offer flexible and scalable solutions. Choosing between SQS, SNS, and Kinesis depends on whether you need message queuing, pub-sub distribution, or real-time data analytics.
TL;DR 📝
- Kinesis Data Streams is used to collect and process real-time data.
- Kinesis Data Firehose loads data into S3, Redshift, OpenSearch, and third-party services.
- Provisioned Mode requires shard management; On-Demand Mode scales automatically.
- Enhanced Fan-Out gives each consumer dedicated read throughput, so multiple consumers can process data in parallel.
- SQS is a message queue, SNS is a pub/sub system, and Kinesis is for real-time data analytics.
- Use partition keys to maintain data order in Kinesis streams.