🎯 Objective Rather than having a long chain of if statements in a dbt model, we […]
🧹 How to Remove Tracked Files That Should Be Ignored in Git
When working on a project, it’s common to add files or directories to your .gitignore […]
Understanding __repr__ vs __str__ in Python – What’s the Difference?
If you’re coming from a Java background like me, you’re probably used to overriding toString() […]
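As a rough illustration of the distinction named in the title, here is a minimal sketch. The `Point` class and its fields are hypothetical and not taken from the post; the sketch only assumes standard Python behavior, where `__str__` plays a role close to Java's `toString()` and `__repr__` is the developer-facing form.

```python
class Point:
    """Hypothetical example class, not taken from the original post."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # Unambiguous, developer-facing form: used by repr(), the REPL,
        # and by containers when they display their items.
        return f"Point(x={self.x!r}, y={self.y!r})"

    def __str__(self):
        # Readable, user-facing form: used by str(), print(), and f-strings.
        return f"({self.x}, {self.y})"


p = Point(1, 2)
user_text = str(p)     # calls __str__
debug_text = repr(p)   # calls __repr__
in_list = str([p])     # a list formats its items with __repr__
```

If a class defines only `__repr__`, `str()` falls back to it, which is why `__repr__` is usually the first of the two worth implementing.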
🔥 Delta Live Tables (DLT) vs. Classic Pipelines (Delta Tables + Structured Streaming + Batch) in Databricks
When you develop a data pipeline in Databricks, you have two main options: […]
Understanding Fact Tables and Dimension Tables in a Dimensional Model
In a traditional data warehousing architecture—often guided by the Kimball methodology—data is organized around two […]
Databricks: Job Clusters vs. All-Purpose Clusters
In Databricks, clusters are distributed environments used to execute tasks or workloads. There are two […]
To convert the type of a column in Apache Spark, use cast, not convert.
The cast function allows you to change the data type of a column in a DataFrame […]
In Apache Spark, do executors accept jobs or tasks from the driver?
In Apache Spark, executors accept and execute tasks from the driver. Here’s a breakdown of […]
Apache Spark Glossary
Slot: often associated with a CPU core. Each physical core of […]
Redshift LOCK
Overview: There are three lock modes. AccessExclusiveLock: Acquired primarily during DDL operations, such as ALTER TABLE, DROP, or TRUNCATE. […]