# Dagster & Databricks
You can orchestrate Databricks from Dagster in several ways, depending on your needs: through Databricks Connect, Dagster Pipes, the Dagster Databricks components, or a Dagster+ Databricks connection.
## Choosing an integration approach
| Approach | How it works | Choose when |
|---|---|---|
| Databricks Connect | Write Spark code in Dagster assets that executes on Databricks compute | You want to author Spark logic directly in your Dagster asset code and run it on Databricks compute |
| Dagster Pipes | Submit Databricks jobs and stream logs/metadata back to Dagster | You want to launch Databricks jobs from Dagster and capture their logs and metadata in the Dagster UI |
| DatabricksAssetBundleComponent | Reads your databricks.yml bundle config and creates Dagster assets from job tasks | You manage jobs with Databricks Asset Bundles and want their tasks represented as Dagster assets |
| DatabricksWorkspaceComponent | Connects to your workspace, discovers jobs, and exposes them as Dagster assets | You have jobs already defined in your Databricks workspace and want Dagster to discover and orchestrate them |
| Dagster Databricks Connection (Dagster+ only) | Automatically discover Databricks tables and catalogs as external assets in the Dagster+ UI | You use Dagster+ and want visibility into Databricks tables and catalogs without writing code |
## About Databricks
Databricks is a unified data analytics platform that simplifies and accelerates building big data and AI solutions. Built on Apache Spark, it supports a wide range of data sources and formats and provides tools to create, run, and manage data pipelines, making complex data engineering tasks easier to handle. Its collaborative, scalable environment suits data engineers, scientists, and analysts who need to process and analyze large datasets efficiently.