
Session 7: Data Flow (Part 2)

 






Data Flow Name: df_transform_hospital_admissions

Pipeline Steps:

  1. Source (HospitalAdmissionSource):

    • Pulls data from ds_raw_hospital_admission.

  2. SelectReqdFields:

    • Renames or selects specific fields: country, indicator, etc.

  3. LookupCountry:

    • Performs a lookup using CountrySource (likely from ds_country_lookup) to enrich the data (a rough sketch of this enrichment follows this list).

  4. SelectReqdFields2:

    • Refines the result further with a new set of selected or renamed fields.

  5. Split into Weekly and Daily:

    • A Conditional Split divides the data into two branches:

      • Weekly (9 columns total)

      • Daily (filtered on the indicator column, likely via conditional logic)
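The SelectReqdFields and LookupCountry steps are not revisited in the breakdown further down, so here is a rough PySpark sketch of the equivalent select-and-enrich logic. The dataset paths and column names (e.g., reported_granularity) are assumptions for illustration, not the actual data flow definitions.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hospital_admissions_sketch").getOrCreate()

# Hypothetical inputs: raw admissions plus a small country reference dataset
admissions = spark.read.parquet("/raw/hospital_admissions")
countries  = spark.read.parquet("/lookup/ds_country_lookup")

# SelectReqdFields: keep (and optionally rename) only the columns needed downstream
selected = admissions.select(
    "country", "indicator", "reported_date", "reported_granularity", "value"
)

# LookupCountry: a lookup behaves like a left outer join on the matching key,
# enriching each record with reference columns (e.g., country_code, population)
enriched = selected.join(countries, on="country", how="left")
```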

Right Panel:

  • Shows general properties.

  • Name: df_transform_hospital_admissions.

  • Description: Empty.

Bottom Panel (Data preview):

  • Currently loading: “Fetching data…”.

  • Status: Data flow debug is enabled (green).

  • Operation counts (INSERT, UPDATE, DELETE, etc.) show N/A, which suggests this is a preview run or the data has not loaded yet.


🔁 Complete Transformation Breakdown


🟦 1. Source (ds_raw_hospital_admission)

  • What it does:

    • Reads raw hospital admission data from a source dataset (e.g., CSV, database).

    • Fields: country, reported_date, hospital_occupancy_count, icu_occupancy_count, etc.
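As an illustration only (the actual source is the ADF dataset ds_raw_hospital_admission), reading raw CSV files into a DataFrame could look like this in PySpark; the path and options are assumptions:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("read_raw_admissions").getOrCreate()

raw_admissions = (
    spark.read
    .option("header", True)        # first row holds the column names
    .option("inferSchema", True)   # let Spark derive the column types
    .csv("/raw/hospital_admissions/*.csv")  # hypothetical landing path
)
raw_admissions.printSchema()
```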


🟨 2. fields2 (Conditional Split)

  • What it does:

    • Splits incoming data into two branches: Weekly and Daily.

    • Based on a condition, likely a granularity flag in the data, for example:

      reported_granularity == 'weekly' => Weekly branch
      reported_granularity == 'daily'  => Daily branch
  • Why:

    • Enables separate transformation logic for weekly and daily reporting formats.
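In PySpark terms a conditional split is just two filters over the same input; a minimal sketch, continuing from the enriched DataFrame in the earlier sketch and assuming a reported_granularity column:

```python
from pyspark.sql import functions as F

# Two disjoint branches over the same input DataFrame
weekly = enriched.filter(F.col("reported_granularity") == "weekly")
daily  = enriched.filter(F.col("reported_granularity") == "daily")
```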


🟩 3. Weekly Branch

🔷 a. JoinWithDate (Join)

  • What it does:

    • Joins raw data with a Date Dimension (likely AggDimDate).

    • Join keys: reported_date from source and date from the dimension.

  • Why:

    • Enriches records with derived values like year_week, week_start_date, etc.
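A minimal PySpark sketch of this join, assuming the date dimension exposes date, year_week, and week_start_date columns (the path and column names are assumptions):

```python
# Continues the earlier sketches (same Spark session, `weekly` branch DataFrame)
dim_date = spark.read.parquet("/dim/agg_dim_date")  # hypothetical date-dimension path

joined_weekly = (
    weekly.join(dim_date, weekly["reported_date"] == dim_date["date"], how="inner")
          .drop(dim_date["date"])  # drop the duplicate join key from the dimension side
)
```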


🔷 b. PivotWeekly (Pivot)

  • What it does:

    • Pivots indicators (like hospital and ICU occupancy counts) into separate columns.

  • Group by:

    • Likely year_week, country

  • Values:

    • Transforms rows into a wider format with columns like:

      • hospital_occupancy_count

      • icu_occupancy_count

  • Why:

    • Aggregates and reshapes data for weekly reporting.
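As a rough PySpark equivalent (the actual ADF pivot settings are not visible here), grouping by week and country and spreading the indicator values into columns might look like this; the indicator values and the reported_year_week column are assumptions:

```python
from pyspark.sql import functions as F

# reported_year_week is assumed to come from the date-dimension join above
pivoted_weekly = (
    joined_weekly
    .groupBy("reported_year_week", "country")
    .pivot("indicator", ["hospital_occupancy_count", "icu_occupancy_count"])
    .agg(F.sum("value"))
)
```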


🔷 c. SortWeekly (Sort)

  • What it does:

    • Sorts the data by reported_year_week and country

  • Why:

    • Ensures data is consistently ordered before writing to sink.
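The PySpark equivalent is a single orderBy, continuing from the pivoted DataFrame above:

```python
# Deterministic ordering before the sink keeps the written output stable
sorted_weekly = pivoted_weekly.orderBy("reported_year_week", "country")
```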


🔷 d. SelectWeekly (Select)

  • What it does:

    • Keeps only required columns and renames as needed.

    • Final schema might include:

      • country, reported_year_week, hospital_occupancy_count, icu_occupancy_count

  • Why:

    • Cleans and prepares data for export.
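A PySpark sketch of the final projection; the exact output schema is an assumption based on the columns listed above:

```python
from pyspark.sql import functions as F

# Keep only the export columns; the casts are illustrative
selected_weekly = sorted_weekly.select(
    F.col("country"),
    F.col("reported_year_week"),
    F.col("hospital_occupancy_count").cast("long"),
    F.col("icu_occupancy_count").cast("long"),
)
```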


🔷 e. WeeklySink (Sink)

  • What it does:

    • Writes the transformed weekly data to a target dataset.

    • Sink: ds_processed_hospital_admission_weekly

  • Why:

    • Makes weekly data available for reporting/analytics.
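Expressed as a PySpark sketch (the real sink is the ADF dataset ds_processed_hospital_admission_weekly; the path and format here are assumptions):

```python
# Write the weekly output; hypothetical target path and format
(
    selected_weekly.write
    .mode("overwrite")
    .parquet("/processed/hospital_admissions/weekly")
)
```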


🟩 4. Daily Branch

🔷 a. PivotDaily (Pivot)

  • What it does:

    • Similar to PivotWeekly, but operates on daily granularity.

  • Group by:

    • reported_date, country

  • Why:

    • Converts long-format daily data into a wide format for daily analysis.


🔷 b. SortDaily (Sort)

  • What it does:

    • Sorts by reported_date and country

  • Why:

    • Ensures consistent ordering in the final output.


🔷 c. SelectDaily (Select)

  • What it does:

    • Selects relevant fields like:

      • country, reported_date, hospital_occupancy_count, icu_occupancy_count, population, source

  • Why:

    • Aligns with the target schema and ensures only the required fields are exported.


🔷 d. DailySink (Sink)

  • What it does:

    • Writes the final daily data to ds_processed_hospital_admission_daily

  • Why:

    • Makes daily data available for downstream use (dashboards, exports).
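Because the daily branch mirrors the weekly one, a single compact PySpark sketch covers it end to end; column names, indicator values, and the output path are assumptions:

```python
from pyspark.sql import functions as F

daily_out = (
    daily                                             # Daily branch from the conditional split
    .groupBy("reported_date", "country")              # PivotDaily
    .pivot("indicator", ["hospital_occupancy_count", "icu_occupancy_count"])
    .agg(F.sum("value"))
    .orderBy("reported_date", "country")              # SortDaily
    .select("country", "reported_date",               # SelectDaily
            "hospital_occupancy_count", "icu_occupancy_count")
)

# DailySink: hypothetical target path for ds_processed_hospital_admission_daily
daily_out.write.mode("overwrite").parquet("/processed/hospital_admissions/daily")
```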

Transformation | Type | Description
ds_raw_hospital_admission | Source | Loads raw hospital admission data
fields2 | Conditional Split | Splits data into Daily and Weekly pipelines
JoinWithDate | Join | Adds weekly context by joining with the date dimension
PivotWeekly | Pivot | Converts indicator rows into columns (weekly)
SortWeekly | Sort | Sorts by week and country
SelectWeekly | Select | Keeps/renames columns for export
WeeklySink | Sink | Outputs to the weekly processed dataset
PivotDaily | Pivot | Converts indicator rows into columns (daily)
SortDaily | Sort | Sorts by date and country
SelectDaily | Select | Keeps/renames columns for export
DailySink | Sink | Outputs to the daily processed dataset
