
Posts

Session 7 data flow part 2

Data Flow Name: df_transform_hospital_admissions

Pipeline Steps:
- Source (HospitalAdmissionSource): Pulls data from ds_raw_hospital_admission.
- SelectReqdFields: Selects or renames specific fields such as country, indicator, etc.
- LookupCountry: Performs a lookup against CountrySource (likely from ds_country_lookup) to enrich the data.
- SelectReqdFields2: Refines the result further with a new set of selected or renamed fields.
- Split into Weekly and Daily: A Conditional Split divides the data into two branches:
  - Weekly (9 columns total)
  - Daily (filtering on the indicator column, likely via conditional logic)

Right Panel: Shows general properties.
- Name: df_transform_hospital_admissions
- Description: Empty

Bottom Panel (Data preview): Currently loading ("Fetching data…").
- Status: Data flow debug is enabled (green).
- Operation counts such as INSERT, UPDATE, DELETE, etc., are N/A, meaning this is likely a preview r...
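To make the data movement concrete, the steps can be approximated in plain Python. This is a minimal sketch of the logic only, not how ADF executes it (ADF runs the flow on Spark). Assumptions: the sample rows, the `date`/`reported_date`/`value` field names, the lookup columns, and the rule that indicators starting with "Weekly" go to the Weekly stream are all hypothetical, and the column set is trimmed (the real Weekly branch has 9 columns). The lookup is modeled as a left outer match, so a country missing from the lookup keeps its row with empty codes.

```python
# Minimal plain-Python sketch of the df_transform_hospital_admissions steps.
# ASSUMPTIONS (hypothetical): sample rows, field names other than
# country/indicator, lookup contents, and the "Weekly"-prefix split rule.

RAW_ROWS = [  # stand-in for ds_raw_hospital_admission (illustrative values)
    {"country": "France", "indicator": "Weekly new hospital admissions per 100k",
     "date": "2020-W14", "value": 10.5},
    {"country": "France", "indicator": "Daily hospital occupancy",
     "date": "2020-04-01", "value": 120.0},
    {"country": "Atlantis", "indicator": "Daily hospital occupancy",
     "date": "2020-04-01", "value": 7.0},
]

COUNTRY_LOOKUP = {  # stand-in for ds_country_lookup (hypothetical columns)
    "France": {"country_code_2_digit": "FR", "country_code_3_digit": "FRA"},
}


def select_reqd_fields(row):
    """SelectReqdFields: keep the needed columns and rename 'date'."""
    return {
        "country": row["country"],
        "indicator": row["indicator"],
        "reported_date": row["date"],
        "value": row["value"],
    }


def lookup_country(row, lookup):
    """LookupCountry: left outer match; unmatched rows get None codes."""
    match = lookup.get(row["country"], {})
    return {**row,
            "country_code_2_digit": match.get("country_code_2_digit"),
            "country_code_3_digit": match.get("country_code_3_digit")}


def select_reqd_fields2(row):
    """SelectReqdFields2: fix the final column set and order."""
    cols = ["country", "country_code_2_digit", "country_code_3_digit",
            "indicator", "reported_date", "value"]
    return {c: row[c] for c in cols}


def split_weekly_daily(rows):
    """Conditional Split: 'Weekly' indicators to one stream, the rest to Daily."""
    weekly, daily = [], []
    for r in rows:
        (weekly if r["indicator"].startswith("Weekly") else daily).append(r)
    return weekly, daily


def run(raw_rows, lookup):
    """Chain the steps in the same order as the data flow."""
    shaped = [select_reqd_fields2(lookup_country(select_reqd_fields(r), lookup))
              for r in raw_rows]
    return split_weekly_daily(shaped)


if __name__ == "__main__":
    weekly, daily = run(RAW_ROWS, COUNTRY_LOOKUP)
    print(len(weekly), len(daily))  # → 1 2
```

Keeping the unmatched "Atlantis" row (with None codes) rather than dropping it is what makes the lookup a left outer match; an inner match would silently lose rows.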

Transformation - section 6 - data flow

Feature from Slide / Explanation

✅ Code-free data transformations
Data Flows in ADF let you build transformations through a drag-and-drop visual interface, with no need to write Spark or SQL code.

✅ Executed on Data Factory-managed Databricks Spark clusters
Internally, ADF runs Data Flows on Azure Integration Runtimes backed by Apache Spark clusters that ADF provisions and manages, rather than on a Databricks workspace you control. While similar in concept, this is not the same as your own Databricks workspace.

✅ Benefits from ADF scheduling and monitoring
Data Flows are fully integrated into ADF pipelines, so you natively get ADF's orchestration, parameterization, logging, and alerting features.

⚠️ Important Clarification
Although the slide says "executed on Data Factory managed Databricks Spark clusters," this does not mean you're using your own Azure Databricks workspace. Rather: ADF Data Flows run on ADF-managed Spark clusters. Azure Databricks notebooks (which you trigger via an "Exe...
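In a pipeline, a data flow is invoked through an Execute Data Flow activity. The sketch below builds two activity definitions as Python dicts and serializes them to the JSON shape ADF pipelines use (as I understand the schema), to illustrate the clarification: the data flow activity states its Spark compute size inline, while a Databricks Notebook activity points at a linked service for a workspace you manage. The pipeline, linked-service, and notebook names and the 8-core "General" compute setting are hypothetical; check the ADF pipeline JSON reference before relying on any property.

```python
# Hypothetical sketch: a pipeline referencing the data flow vs. a Databricks
# notebook. All names and the compute sizing are illustrative assumptions.
import json

# Execute Data Flow activity: Spark compute is described inline and is
# provisioned by ADF; there is no linked service to a Databricks workspace.
execute_data_flow = {
    "name": "Execute df_transform_hospital_admissions",
    "type": "ExecuteDataFlow",
    "typeProperties": {
        "dataFlow": {
            "referenceName": "df_transform_hospital_admissions",
            "type": "DataFlowReference",
        },
        "compute": {"computeType": "General", "coreCount": 8},
    },
}

# Databricks Notebook activity: runs in YOUR workspace, reached through a
# linked service (hypothetical name and notebook path).
run_notebook = {
    "name": "Run notebook",
    "type": "DatabricksNotebook",
    "linkedServiceName": {
        "referenceName": "ls_my_databricks_workspace",
        "type": "LinkedServiceReference",
    },
    "typeProperties": {"notebookPath": "/Shared/hypothetical_notebook"},
}

pipeline = {
    "name": "pl_hypothetical_example",
    "properties": {"activities": [execute_data_flow, run_notebook]},
}

if __name__ == "__main__":
    # Print the JSON that would sit behind the pipeline in ADF Studio.
    print(json.dumps(pipeline, indent=2))
```

The design point is where the compute comes from: the data flow activity carries only a size hint for ADF-managed Spark, whereas the notebook activity cannot run without a linked service to an existing workspace.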