posted: 2023-02-05 11:30:46
This is the simplest recipe for getting started with a medallion-style Delta Lake (bronze/silver/gold layers) on Databricks.
The steps/commands can be summarized as:
1. Create a new Databricks workspace and launch a new cluster.
2. Create a new Delta table with the spark.sql('''CREATE TABLE ... USING DELTA''') command.
3. Use the spark.sql('''COPY INTO ...''') command to load data into the newly created Delta table (the Hive-style LOAD DATA command is not supported for Delta tables).
4. Use the spark.sql('''ALTER TABLE ... ADD COLUMNS ...''') command to add new columns to the Delta table as needed.
5. Use the spark.sql('''MERGE INTO ...''') command to merge new data into the Delta table.
6. Use the spark.sql('''OPTIMIZE ...''') command to compact small files and improve query performance.
7. Use the spark.sql('''VACUUM ...''') command to remove old, unreferenced data files from the Delta table.
8. Use the spark.sql('''DESCRIBE HISTORY ...''') command to view the transaction history of the Delta table.
9. Use the spark.sql('''DESCRIBE DETAIL ...''') command to view the details (location, size, schema) of the Delta table.
10. Use the spark.sql('''SHOW PARTITIONS ...''') command to view the partitions of the Delta table.
By following these steps, you can create a medallion-style Delta Lake on Databricks that is easy to maintain, optimize, and query for data analysis.
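The steps above can be sketched as SQL, as you would pass it to spark.sql in a Databricks notebook. The table names (bronze_events, silver_events), columns, path, and file format below are illustrative assumptions, not part of the original recipe:

```sql
-- Step 2: create a Delta table for the raw (bronze) layer.
-- Schema and names here are assumed for illustration.
CREATE TABLE IF NOT EXISTS bronze_events (
  event_id   STRING,
  event_time TIMESTAMP,
  payload    STRING
) USING DELTA;

-- Step 3: bulk-load raw files; COPY INTO skips files it has already loaded,
-- so reruns are idempotent. Path and format are assumptions.
COPY INTO bronze_events
FROM '/mnt/raw/events/'
FILEFORMAT = JSON;

-- Step 4: evolve the schema as new fields appear.
ALTER TABLE bronze_events ADD COLUMNS (source STRING);

-- Step 5: upsert bronze rows into a cleaned (silver) table.
CREATE TABLE IF NOT EXISTS silver_events (
  event_id   STRING,
  event_time TIMESTAMP,
  payload    STRING
) USING DELTA;

MERGE INTO silver_events AS t
USING bronze_events AS s
ON t.event_id = s.event_id
WHEN MATCHED THEN UPDATE SET t.event_time = s.event_time, t.payload = s.payload
WHEN NOT MATCHED THEN INSERT (event_id, event_time, payload)
  VALUES (s.event_id, s.event_time, s.payload);

-- Step 6: compact small files for faster reads.
OPTIMIZE silver_events;

-- Step 7: remove data files outside the default 7-day retention window.
VACUUM silver_events;

-- Steps 8-10: inspect history, metadata, and partitions.
-- (SHOW PARTITIONS only applies to partitioned tables.)
DESCRIBE HISTORY silver_events;
DESCRIBE DETAIL silver_events;
SHOW PARTITIONS silver_events;
```

Each statement can be run individually in a notebook cell via spark.sql('''...''') or in a Databricks SQL editor; the statements assume a running cluster with Delta Lake available (the default on Databricks).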