Domain 1 β€” Module 5 of 5 100%
5 of 28 overall
Domain 1: Set Up and Configure an Azure Databricks Environment Free ⏱ ~15 min read

Tables, Views & External Catalogs

Managed tables, external tables, views, materialized views, foreign catalogs, DDL operations, and AI/BI Genie β€” the objects that live inside your Unity Catalog namespace.

Managed vs external tables

Simple explanation

Think of managed tables as company-owned furniture and external tables as rented furniture.

A managed table is like furniture the company bought. When you delete the table, Unity Catalog throws the furniture (data) away too β€” it owns everything.

An external table is like rented furniture. Unity Catalog knows where it is and who can use it, but when you delete the table registration, the actual data stays put β€” it belongs to someone else (your storage account).

FeatureManaged TableExternal Table
Data locationUC managed storageYour ADLS/S3 path
DROP behaviourDeletes metadata AND dataDeletes metadata ONLY
Lifecycle ownerUnity CatalogYou (external storage)
Default formatDelta LakeAny (Delta, Parquet, CSV, JSON)
Best forStandard lakehouse tablesShared data, legacy integration
VACUUM / OPTIMIZEFully supportedSupported for Delta external tables
CostStorage in UC-managed accountStorage in your account
-- Create a managed table (default)
CREATE TABLE prod_sales.curated.daily_revenue (
  sale_date DATE,
  region STRING,
  revenue DECIMAL(12,2)
)
COMMENT 'Daily revenue aggregated by region';

-- Create an external table pointing to existing data
CREATE TABLE prod_sales.raw.partner_transactions
  USING DELTA
  LOCATION 'abfss://raw-data@adlsaccount.dfs.core.windows.net/partner/transactions'
  COMMENT 'Partner transaction files β€” data owned by partner team';

Important: External table paths must be covered by a preconfigured external location in Unity Catalog, backed by a storage credential (typically an Access Connector with managed identity). Without this, the CREATE TABLE statement fails with an access denied error.

Exam decision tree: If you own the data and it’s part of your lakehouse β†’ managed table. If the data is shared, pre-existing, or owned by another team/system β†’ external table.

Views and materialized views

Standard views

A view is a saved SQL query. It doesn’t store data β€” it runs the query each time you access it:

CREATE VIEW prod_sales.reports.top_regions AS
SELECT region, SUM(revenue) AS total_revenue
FROM prod_sales.curated.daily_revenue
GROUP BY region
ORDER BY total_revenue DESC
LIMIT 10;

Materialized views

A materialized view stores the query result physically and refreshes it periodically:

CREATE MATERIALIZED VIEW prod_sales.reports.monthly_summary AS
SELECT
  DATE_TRUNC('month', sale_date) AS month,
  region,
  SUM(revenue) AS monthly_revenue
FROM prod_sales.curated.daily_revenue
GROUP BY 1, 2;
FeatureStandard ViewMaterialized View
Stores data?No β€” runs query each timeYes β€” stores precomputed results
Query speedDepends on underlying tablesFast (precomputed)
Data freshnessAlways currentStale until refreshed
RefreshN/AAutomatic or manual
CostOnly compute at query timeStorage + compute for refresh
Use caseSimple abstractions, security layersDashboards, repeated heavy queries
Supported inAll compute typesLakeflow Spark Declarative Pipelines, SQL warehouses

Mei Lin uses materialized views at Freshmart for the store manager dashboard β€” it queries pre-aggregated monthly sales instead of scanning millions of raw transactions each time.

Exam tip: Materialized views in Declarative Pipelines

Materialized views are first-class objects in Lakeflow Spark Declarative Pipelines. You define them declaratively and the pipeline handles incremental refresh automatically. This is a key exam concept β€” if a question asks about β€œautomatically refreshing precomputed results in a pipeline,” materialized views in Declarative Pipelines are the answer.

DDL operations on tables

The exam tests your knowledge of DDL (Data Definition Language) for both managed and external tables:

Common DDL operations

-- Add a column
ALTER TABLE prod_sales.curated.daily_revenue
  ADD COLUMN discount_pct DECIMAL(5,2);

-- Rename a column
ALTER TABLE prod_sales.curated.daily_revenue
  RENAME COLUMN discount_pct TO discount_percentage;

-- Drop a column
ALTER TABLE prod_sales.curated.daily_revenue
  DROP COLUMN discount_percentage;

-- Change column type (only widening allowed, e.g., INT to BIGINT)
ALTER TABLE prod_sales.curated.daily_revenue
  ALTER COLUMN revenue TYPE DECIMAL(15,2);

-- Add table comment
COMMENT ON TABLE prod_sales.curated.daily_revenue
  IS 'Daily revenue by region β€” refreshed nightly at 3 AM';

-- Drop a managed table (DELETES DATA)
DROP TABLE prod_sales.curated.daily_revenue;

-- Drop an external table (data files remain)
DROP TABLE prod_sales.raw.partner_transactions;

Key exam fact: Schema evolution in Delta Lake supports adding columns and widening types. Narrowing types (BIGINT to INT) or renaming columns requires enabling column mapping.

Schema evolution and column mapping

Delta Lake supports two levels of schema evolution:

OperationDefault BehaviourWith Column Mapping
Add columnβœ… Supportedβœ… Supported
Widen type (INT β†’ BIGINT)βœ… Supportedβœ… Supported
Rename column❌ Not supportedβœ… Supported
Drop column❌ Not supportedβœ… Supported
Reorder columns❌ Not supportedβœ… Supported

Enable column mapping:

ALTER TABLE my_table SET TBLPROPERTIES ('delta.columnMapping.mode' = 'name');

Exam tip: If the question involves renaming or dropping columns, you need column mapping enabled. This is a common trap.

Foreign catalogs

A foreign catalog connects Unity Catalog to external database systems, making their tables visible in the three-level namespace:

-- Step 1: Create a connection to an external system
CREATE CONNECTION sqlserver_erp
  TYPE SQLSERVER
  OPTIONS (
    host 'erp-server.database.windows.net',
    port '1433',
    user SECRET('kv-scope', 'erp-user'),
    password SECRET('kv-scope', 'erp-password')
  );

-- Step 2: Create a foreign catalog using that connection
CREATE FOREIGN CATALOG erp_data
  USING CONNECTION sqlserver_erp;

Once created, Ravi can query the ERP system’s tables as if they were local:

-- Query a foreign catalog table like any other UC table
SELECT * FROM erp_data.dbo.customers WHERE region = 'APAC';

Supported connection types include SQL Server, MySQL, PostgreSQL, Snowflake, BigQuery, and others.

Exam scenario: β€œRavi needs to join DataPulse’s lakehouse data with customer records in an on-premises SQL Server.” β†’ Create a connection + foreign catalog.

AI/BI Genie

AI/BI Genie is a natural-language interface for data exploration. Business users type questions in plain English and Genie generates SQL queries against your lakehouse tables.

As a data engineer, your job is to configure Genie instructions β€” metadata that helps Genie understand your data:

ConfigurationPurposeExample
Table descriptionsTell Genie what each table contains”Daily revenue by region, refreshed nightly”
Column descriptionsExplain non-obvious columns”region_code: ISO 3166-1 alpha-2 country code”
Sample questionsTeach Genie expected query patterns”What was last month’s revenue in APAC?”
Trusted assetsMark verified tables that Genie should preferFlag prod_sales.reports.* as trusted
-- Add descriptions that help Genie
COMMENT ON TABLE prod_sales.curated.daily_revenue
  IS 'Daily revenue aggregated by region. Updated nightly at 3 AM NZST. Use for revenue reporting queries.';

COMMENT ON COLUMN prod_sales.curated.daily_revenue.region
  IS 'Business region: APAC, EMEA, Americas';
Exam tip: Genie is about metadata, not coding

The exam won’t ask you to build Genie from scratch. It tests whether you understand that data engineers enable Genie by writing good descriptions, sample questions, and marking trusted assets. Think of it as documenting your data for AI consumption.

Question

What happens when you DROP a managed table vs. an external table?

Click or press Enter to reveal answer

Answer

Managed table: both metadata AND underlying data files are deleted. External table: only the metadata in Unity Catalog is removed β€” the data files in your storage location remain untouched.

Click to flip back

Question

What is a materialized view and where is it used?

Click or press Enter to reveal answer

Answer

A materialized view stores precomputed query results physically (unlike a standard view, which re-runs the query each time). Used in Lakeflow Spark Declarative Pipelines and SQL warehouses for dashboards and repeated heavy queries. Must be refreshed to stay current.

Click to flip back

Question

What is a foreign catalog in Unity Catalog?

Click or press Enter to reveal answer

Answer

A foreign catalog connects UC to an external database system (SQL Server, MySQL, PostgreSQL, Snowflake, etc.) via a CONNECTION object. External tables appear in the UC namespace and can be queried like local tables.

Click to flip back

Question

How do data engineers configure AI/BI Genie for data discovery?

Click or press Enter to reveal answer

Answer

Write descriptive COMMENT ON TABLE/COLUMN statements, provide sample questions, and mark trusted assets. Genie uses this metadata to generate accurate SQL from natural-language questions.

Click to flip back

Knowledge check

Knowledge Check

Dr. Sarah Okafor is migrating Athena Group's legacy data warehouse to the lakehouse. She wants to register existing Parquet files in ADLS Gen2 without moving them. If the team later removes the table from Unity Catalog, the files must remain intact. Which table type should she use?

Knowledge Check

Mei Lin wants to improve dashboard performance at Freshmart. The store manager dashboard runs a heavy aggregation query that scans 200 million rows. The same query runs 50 times per day with identical results (data refreshes nightly). What should she create?

Knowledge Check

TomΓ‘s needs NovaPay's fraud detection system to join lakehouse data with customer records stored in an Azure SQL Database. The SQL Database is managed by a separate team and cannot be migrated. How should TomΓ‘s make the SQL Database tables queryable in Unity Catalog?


Next up: Securing Unity Catalog: Who Gets What β€” privileges, table and column-level access control, and row-level security.