
Posts

Hands-on project for ADF

Azure Pipeline Creation and Configuration Steps

To log in to the Azure portal, a Microsoft account is required. After creating the account, sign in to the Azure portal and proceed with the following steps to build the data pipeline.

1. Create Resource Group and Storage Account

Create a Resource Group in Azure. Under the resource group, create the required resources for the pipeline.

Azure Data Lake Storage (ADLS Gen2)

Create a Storage Account. Enable Hierarchical Namespace to convert it into Data Lake Storage Gen2. Inside the storage account, create a container (e.g., blog-container) and organize data using folders and subfolders (these can be created dynamically or manually using the directory structure).

Storage Structure Example

Create a storage account named sdmm:

sdmm
  gold
    processing
    sales
  silver
    sd
    mm
  bronze
    sd
    mm

2. Azure Data Factory (ADF) Setup

Create an Azure Data Factory instance. Go to Managed Identities and enabl...
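Returning to the storage structure example in step 1, the folder layout can be written down as data before anything is created in the portal. The following is a minimal sketch, not the post's own method. It assumes the flattened list reads as gold (processing, sales), silver (sd, mm), and bronze (sd, mm), and the names LAYOUT and directory_paths are hypothetical. It only builds path strings and does not call Azure.

```python
from pathlib import PurePosixPath

# Assumed reading of the storage structure example: layer -> subfolders.
LAYOUT = {
    "gold": ["processing", "sales"],
    "silver": ["sd", "mm"],
    "bronze": ["sd", "mm"],
}


def directory_paths(layout):
    """Return every layer/subfolder path implied by the layout, in order."""
    paths = []
    for layer, subfolders in layout.items():
        for subfolder in subfolders:
            # PurePosixPath keeps forward slashes on every platform,
            # which matches how ADLS Gen2 directory paths are written.
            paths.append(str(PurePosixPath(layer) / subfolder))
    return paths


paths = directory_paths(LAYOUT)
for path in paths:
    print(path)
```

Keeping the layout in one dictionary makes it easy to compare against what actually exists in the container later.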

Session 7 data flow part 2

Data Flow Name: df_transform_hospital_admissions

Pipeline Steps:

Source (HospitalAdmissionSource): Pulls data from ds_raw_hospital_admission.
SelectReqdFields: Renames or selects specific fields: country, indicator, etc.
LookupCountry: Performs a lookup using CountrySource (likely from ds_country_lookup) to enrich the data.
SelectReqdFields2: Refines the result further with a new set of selected or renamed fields.
Split into Weekly and Daily: A Conditional Split divides the data into two branches:
  Weekly (9 columns total)
  Daily (filtering on indicator column, likely conditional logic)

Right Panel: Shows general properties. Name: df_transform_hospital_admissions. Description: Empty.

Bottom Panel (Data preview): Currently loading: “Fetching data…”. Status: Data flow debug is enabled (green). Operation counts like INSERT, UPDATE, DELETE, etc., are N/A, meaning this is likely a preview r...
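The select, lookup, and conditional split steps described above can be mimicked in plain Python to see what each transformation does to a row. This is only a sketch under stated assumptions: the sample rows, the lookup table, the country_code column, and the split condition (indicator contains "Weekly") are all made up for illustration, and the real data flow runs on Spark inside ADF rather than in code like this.

```python
# Made-up rows standing in for ds_raw_hospital_admission.
raw_rows = [
    {"country": "France", "indicator": "Daily hospital occupancy", "value": 10, "url": "n/a"},
    {"country": "France", "indicator": "Weekly new hospital admissions", "value": 20, "url": "n/a"},
    {"country": "Italy", "indicator": "Daily ICU occupancy", "value": 30, "url": "n/a"},
]

# Made-up lookup table standing in for CountrySource.
country_lookup = {"France": {"country_code": "FR"}, "Italy": {"country_code": "IT"}}


def select_fields(rows, fields):
    """Select step: keep only the named columns."""
    return [{field: row[field] for field in fields} for row in rows]


def lookup_country(rows, lookup):
    """Lookup step: left-outer style enrichment keyed on country."""
    enriched = []
    for row in rows:
        match = lookup.get(row["country"], {"country_code": None})
        enriched.append({**row, **match})
    return enriched


def conditional_split(rows):
    """Split step: rows matching the assumed Weekly condition go to one
    branch, everything else falls through to the Daily branch."""
    weekly = [row for row in rows if "Weekly" in row["indicator"]]
    daily = [row for row in rows if "Weekly" not in row["indicator"]]
    return weekly, daily


selected = select_fields(raw_rows, ["country", "indicator", "value"])
enriched = lookup_country(selected, country_lookup)
weekly, daily = conditional_split(enriched)
print(len(weekly), "weekly rows,", len(daily), "daily rows")
```

Each function maps to one box on the data flow canvas, so the order of calls at the bottom mirrors the order of the steps.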

Transformation - section 6 - data flow

Feature from Slide: ✅ Code-free data transformations
Explanation: Data Flows in ADF allow you to build transformations using a drag-and-drop visual interface, with no need for writing Spark or SQL code.

Feature from Slide: ✅ Executed on Data Factory-managed Databricks Spark clusters
Explanation: Internally, ADF uses Azure Integration Runtimes backed by Apache Spark clusters, managed by ADF, not Databricks itself. While it's similar in concept, this is not the same as your own Databricks workspace.

Feature from Slide: ✅ Benefits from ADF scheduling and monitoring
Explanation: Data Flows are fully integrated into ADF pipelines, so you get all the orchestration, parameterization, logging, and alerting features of ADF natively.

⚠️ Important Clarification

Although it says "executed on Data Factory managed Databricks Spark clusters," this does not mean you're using your own Azure Databricks workspace. Rather:

ADF Data Flows run on ADF-managed Spark clusters.
Azure Databricks notebooks (which you trigger via an "Exe...