Domain 4: Analytics on Azure

Batch vs Streaming: Two Speeds of Data

Some data arrives in scheduled batches. Other data flows in continuously. Understanding the difference is key to designing the right analytics solution.

Two ways data arrives

Simple explanation

Batch data is like the morning newspaper. Streaming data is like a live news ticker.

The newspaper arrives once a day with yesterday’s news: that’s batch. You read it over breakfast. The news ticker scrolls constantly with updates as they happen: that’s streaming. You glance at it throughout the day.

Both deliver news. The difference is timing: batch comes in chunks on a schedule; streaming flows continuously in real time.

Batch processing

Batch processing collects data over time and processes it all at once on a schedule.

Priya’s FreshMart example: Sales data from 50 stores is collected throughout the day. At 2 AM, a pipeline extracts the day’s data, transforms it, and loads it into the data warehouse. By morning, yesterday’s dashboard is ready.
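The nightly run above can be sketched in a few lines of Python. This is a minimal illustration, not an Azure pipeline: the `DAY_SALES` records, the store IDs, and the `run_nightly_batch` function are all hypothetical names invented for the example. The point is that the whole day's data is already collected before any processing starts.

```python
from collections import defaultdict

# Hypothetical sample: one day's sales records, already collected from the stores.
DAY_SALES = [
    {"store": "S01", "amount": 12.50},
    {"store": "S02", "amount": 7.25},
    {"store": "S01", "amount": 30.00},
]


def run_nightly_batch(records):
    """Process the full day's records in a single scheduled run."""
    totals = defaultdict(float)
    for rec in records:  # transform: aggregate sales per store
        totals[rec["store"]] += rec["amount"]
    # load: a real pipeline would write to the data warehouse;
    # this sketch simply returns the summary.
    return dict(totals)


daily_summary = run_nightly_batch(DAY_SALES)
print(daily_summary)
```

Nothing is reported until the run finishes, which is why batch results lag the events they describe.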

Characteristics:

  • Data processed in scheduled chunks (hourly, daily, weekly)
  • Higher latency (minutes to hours between event and insight)
  • Can handle very large volumes efficiently
  • Simpler to implement and debug
  • Lower cost for high-volume processing

Common batch scenarios:

  • Nightly sales reports
  • Monthly billing calculations
  • Weekly inventory reconciliation
  • Historical trend analysis

Stream processing

Stream processing handles data as it arrives, event by event, in real time.

Tom’s Pacific Freight example: GPS data from 200 delivery trucks streams in every 10 seconds. A real-time engine processes each update immediately: tracking live positions, detecting delays, and alerting dispatchers.
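A rough Python sketch of the same idea follows. It is an illustration only: the `gps_events` generator stands in for a real event source, and the `minutes_behind` field and the 15-minute threshold are assumptions made up for the example. What matters is that each event is handled the moment it arrives, without waiting for the rest of the data.

```python
DELAY_THRESHOLD_MIN = 15  # assumed alerting threshold, not from the source


def gps_events():
    """Hypothetical event source; a real system would read from a message broker."""
    yield {"truck": "T1", "minutes_behind": 2}
    yield {"truck": "T2", "minutes_behind": 18}
    yield {"truck": "T1", "minutes_behind": 16}


def process_stream(events):
    """Handle each event as it arrives: update live state and raise alerts."""
    latest = {}  # most recent update per truck (live tracking)
    alerts = []  # messages for dispatchers
    for event in events:
        latest[event["truck"]] = event
        if event["minutes_behind"] > DELAY_THRESHOLD_MIN:
            alerts.append(
                f"Truck {event['truck']} is {event['minutes_behind']} min behind"
            )
    return latest, alerts


latest, alerts = process_stream(gps_events())
print(alerts)
```

In a live system the loop never ends, so alerts would be sent from inside the loop rather than returned at the end.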

Characteristics:

  • Data processed event-by-event as it arrives
  • Very low latency (milliseconds to seconds)
  • Handles continuous data flows (IoT, logs, social media)
  • More complex to build and maintain
  • Higher cost per event than batch

Common streaming scenarios:

  • Live vehicle tracking
  • Fraud detection (flag suspicious transactions instantly)
  • Social media sentiment monitoring
  • IoT sensor alerts (temperature exceeds threshold)
  • Real-time dashboards (live website traffic)

Batch vs stream processing

Feature        | Batch Processing                           | Stream Processing
Data arrival   | Collected over time, processed together    | Continuous flow, processed immediately
Latency        | Minutes to hours                           | Milliseconds to seconds
Volume per run | Large chunks                               | Individual events
Complexity     | Simpler                                    | More complex (ordering, exactly-once)
Cost           | Lower per unit of data                     | Higher per event
Azure services | Data Factory, Fabric pipelines, Databricks | Stream Analytics, Fabric Real-Time, Event Hubs
Example        | Nightly sales report                       | Live GPS tracking

Lambda and Kappa architectures

Some systems need both batch and streaming. Two common patterns:

  • Lambda architecture: Two parallel paths, a batch layer for historical accuracy and a speed layer for real-time results, whose outputs are merged to answer queries. More complex, but handles both needs.
  • Kappa architecture: A single streaming pipeline handles everything. This simplifies the architecture, but historical data can only be reprocessed by replaying the stream, so the events must be retained.
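
To make the Lambda merge step concrete, here is a small Python sketch under stated assumptions: `batch_view`, `speed_view`, and `query_total` are hypothetical names, and the figures are invented. It shows only the query-time merge of the two layers, not how each layer is built.

```python
# Batch view: per-store totals from the last nightly run (hypothetical values).
batch_view = {"S01": 1000.0, "S02": 750.0}

# Speed view: per-store totals for events that arrived since that run.
speed_view = {"S01": 25.0, "S03": 10.0}


def query_total(store):
    """Answer a query by combining the batch and speed views."""
    return batch_view.get(store, 0.0) + speed_view.get(store, 0.0)


print(query_total("S01"))  # 1000.0 from the batch layer plus 25.0 from the speed layer
```

A Kappa design would have no `batch_view`: the same totals would come from one streaming pipeline, rebuilt by replaying the stream when needed.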

For DP-900, just know these exist; the exam tests concepts, not architecture patterns in detail.

Exam tip: batch vs streaming recognition

The exam describes a scenario and asks which processing type is appropriate:

  • “Report on yesterday’s sales” → Batch
  • “Alert when a truck deviates from its route” → Streaming
  • “Process data every night at 2 AM” → Batch
  • “Show live website visitor count” → Streaming
  • “Monthly billing calculation” → Batch
  • “Detect credit card fraud in real time” → Streaming

Flashcards

Question

What is batch processing?

Answer

Processing data in scheduled chunks (hourly, daily, weekly). Data is collected over time and processed all at once. Higher latency (minutes to hours) but simpler and cheaper for large volumes.

Question

What is stream processing?

Answer

Processing data continuously as individual events arrive, in real time. Very low latency (milliseconds to seconds) but more complex and costly per event.

Question

Give one scenario where batch is better and one where streaming is better.

Answer

Batch: Nightly sales report aggregating millions of transactions (no urgency, high volume). Streaming: Fraud detection flagging suspicious credit card transactions immediately (time-critical, per-event).

Knowledge check

FreshMart wants to alert store managers immediately when a product's stock drops below the reorder threshold. Inventory sensors update every 30 seconds. Which processing approach?

Pacific Freight generates a monthly performance report comparing delivery times across all 200 drivers over the past 30 days. Which processing approach?

Next up: Real-Time Analytics on Azure (which Azure services handle streaming data?)