Domain 2 β€” Module 5 of 6 83%
12 of 30 overall
Domain 2: Design Data Storage Solutions Free ⏱ ~18 min read

Blob, Data Lake & Azure Files

Blob Storage, Azure Data Lake Storage, and Azure Files β€” choose the right unstructured storage service based on access patterns, performance needs, and cost constraints.

Choosing unstructured storage

Simple explanation

Blob Storage is a filing cabinet. Data Lake is a warehouse. Azure Files is a shared network drive.

All three store unstructured data, but they serve different purposes: Blob for application data (images, documents, backups), Data Lake for big data analytics (Hadoop, Spark, Synapse), Azure Files for lift-and-shift file shares (SMB/NFS β€” replaces on-prem file servers).

Service comparison

Blob Storage vs Data Lake Storage vs Azure Files
FactorBlob StorageData Lake Storage Gen2Azure Files
NamespaceFlat (container/blob)Hierarchical (directories/files)Hierarchical (shares/directories/files)
ProtocolREST API, SDKsREST API, ABFS driver (Hadoop)SMB 3.0, NFS 4.1, REST API
Access tiersHot, Cool, Cold, ArchiveHot, Cool, Cold, ArchiveHot, Cool (Transaction Optimised, Premium)
AnalyticsBasic β€” needs external computeOptimised β€” native Synapse/Spark/Databricks integrationNot designed for analytics
POSIX ACLsNoYes β€” fine-grained directory/file-level permissionsYes (NFS shares)
Windows mappingNo β€” API access onlyNo β€” API access onlyYes β€” map as drive letter (SMB)
Best forApp data, media, backups, static websitesBig data analytics, data lake patternsLift-and-shift file shares, shared config

πŸ—οΈ Priya’s storage architecture:

  • Blob Storage: Application documents, user uploads, backup archives
  • ADLS Gen2: Data lake for analytics β€” raw data β†’ curated data β†’ reporting (medallion architecture)
  • Azure Files: Migrated 15 on-prem file shares (SMB) β€” mapped as network drives for Windows users
Exam tip: ADLS Gen2 IS Blob Storage with hierarchical namespace

ADLS Gen2 is not a separate service β€” it’s a storage account with the hierarchical namespace feature enabled. This means you get all Blob Storage features (tiers, lifecycle management, redundancy) PLUS directory-level operations and POSIX ACLs. If the scenario needs analytics AND Blob features, recommend ADLS Gen2.

Storage redundancy

Azure Storage Redundancy Options
OptionCopiesRegion ScopeDurabilityBest For
LRS3 copies in one data centreSingle region, single zone11 nines (99.999999999%)Dev/test, non-critical data
ZRS3 copies across 3 availability zonesSingle region, three zones12 ninesProduction β€” survives data centre failure
GRS6 copies: 3 local (LRS) + 3 in paired region (LRS)Two regions16 ninesDR β€” survives regional disaster
GZRS6 copies: 3 across zones (ZRS) + 3 in paired region (LRS)Two regions, primary zone-redundant16 ninesMaximum durability β€” zone + region protection
RA-GRS/RA-GZRSSame as GRS/GZRS + read access to secondaryTwo regions, secondary readable16 ninesRead offloading + DR readiness

🏦 Elena’s redundancy choice: FinSecure Bank uses GZRS for all production storage β€” survives both a single data centre failure (zone redundancy) and a regional disaster (geo-redundancy). Customer-facing reports are served from the RA-GZRS secondary endpoint for read offloading (acceptable for reports that tolerate replication lag β€” reads from secondary are eventually consistent).

Access tiers and lifecycle management

TierStorage CostAccess CostMin RetentionBest For
HotHighestLowestNoneFrequently accessed data
CoolLowerHigher30 daysInfrequent access (monthly)
ColdEven lowerEven higher90 daysRare access (quarterly)
ArchiveLowestHighest (rehydration delay)180 daysCompliance archive, rarely if ever accessed

Lifecycle management rules

Automate tier transitions to optimise cost:

Rule: "age-based-tiering"
- Move to Cool after 30 days without access
- Move to Cold after 90 days
- Move to Archive after 180 days
- Delete after 2,555 days (7 years β€” compliance)
Well-Architected Framework connection

Cost Optimisation: Storage is one of the easiest places to save money. Lifecycle management rules can reduce storage costs by 50-80% by automatically moving data to cheaper tiers.

Reliability: Choose redundancy based on RPO requirements. GRS provides ~15 minutes RPO for regional failover. ZRS provides zone-level HA with zero RPO within the region.

Security: Immutable blobs (WORM storage) prevent deletion or modification β€” required for SEC 17a-4, FINRA, and similar regulations.

Data protection features

FeatureWhat It DoesUse Case
Soft deleteRecovers deleted blobs/containers within retention periodAccidental deletion recovery
Blob versioningKeeps previous versions of blobs automaticallyTrack changes, recover previous versions
Immutable storage (WORM)Prevents modification or deletion for a set periodCompliance: SEC 17a-4, FINRA, legal hold
Point-in-time restoreRestores block blobs to a previous stateRecover from corruption or accidental overwrite

Knowledge check

Question

What makes ADLS Gen2 different from regular Blob Storage?

Click or press Enter to reveal answer

Answer

ADLS Gen2 is Blob Storage with the hierarchical namespace feature enabled. This adds: directory-level operations (rename/delete directories atomically), POSIX ACLs (fine-grained permissions), and native integration with analytics tools (Synapse, Spark, Databricks). All Blob features (tiers, lifecycle, redundancy) still work.

Click to flip back

Question

What's the difference between GRS and GZRS?

Click or press Enter to reveal answer

Answer

GRS: 3 LRS copies locally + 3 LRS copies in the paired region. GZRS: 3 ZRS copies across zones locally + 3 LRS copies in the paired region. GZRS adds zone redundancy in the primary region β€” survives a data centre failure locally AND a regional disaster.

Click to flip back

Question

When should you recommend Azure Files over Blob Storage?

Click or press Enter to reveal answer

Answer

When the application needs SMB or NFS file share access β€” typically lift-and-shift of on-prem file servers. Windows apps that map network drives (Z: drive) need Azure Files. If it's application data accessed via REST API, Blob Storage is simpler and cheaper.

Click to flip back

Knowledge Check

πŸ—οΈ GlobalTech is migrating their data analytics platform. They need hierarchical directory structure, POSIX ACLs for team-level permissions, and native Spark/Synapse integration. Data also needs lifecycle tiering from Hot to Archive. Which service should Priya recommend?

Knowledge Check

🏦 Elena must store financial audit logs that cannot be modified or deleted for 7 years (SEC 17a-4 compliance). The logs are written once and rarely read. Which storage design should she recommend?


Next up: Storage is designed β€” now let’s connect the data together β€” Data Integration & Analytics.