
ADF - ingestion from website and from the storage blob

 


On the ECDC website the reports are published weekly rather than daily, so for easier access the author provided the CSV files in a GitHub repository.





Always review the data before ingestion to understand its structure and contents.






Test the connection and click Create.


Now the linked service is created.

Next, create a new dataset.



  • Name: ds_cases_deaths_raw_csv_http - This is the unique identifier for this dataset within ADF. The naming convention suggests it's for raw CSV data related to cases and deaths, retrieved via HTTP.

  • Linked service: ls_http_opendata_ecdc_europa_eu - This is the connection information that tells ADF where to find the data. In this case, it's an HTTP linked service, pointing to a public data source from ECDC (European Centre for Disease Prevention and Control). The pencil icon next to it indicates it can be edited.

  • Relative URL: covid19/nationalcasedeath/csv/covid19/raw/main/ecdc_data/cases_deaths.csv - This is the specific path to the CSV file relative to the base URL defined in the linked service. It points to a file named cases_deaths.csv within a nested folder structure related to COVID-19 data. This suggests that the data is publicly available COVID-19 statistics.

  • First row as header: This checkbox is selected (indicated by the checkmark). This means that ADF will treat the first row of the cases_deaths.csv file as column headers, not as data. This is crucial for correct schema inference and data mapping.

  • Import schema:

    • From connection/store: This option is selected. ADF will try to infer the schema (column names and data types) directly from the CSV file based on the first row (headers) and data samples. This is common for initial setup.

    • From sample file: (Not selected) - You could provide a separate sample file to infer the schema from.

    • None: (Not selected) - You would manually define the schema.

In essence, this dataset defines how to access a specific CSV file containing COVID-19 cases and deaths data, located on an HTTP server managed by ECDC, and how to interpret its structure (first row as headers, schema inferred).
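For reference, here is a minimal sketch of what the underlying JSON definition of such a dataset might look like. The property names follow ADF's DelimitedText dataset schema, and the relative URL is copied from the configuration above, so adjust both to your own setup:

json

{
    "name": "ds_cases_deaths_raw_csv_http",
    "properties": {
        "linkedServiceName": {
            "referenceName": "ls_http_opendata_ecdc_europa_eu",
            "type": "LinkedServiceReference"
        },
        "type": "DelimitedText",
        "typeProperties": {
            "location": {
                "type": "HttpServerLocation",
                "relativeUrl": "covid19/nationalcasedeath/csv/covid19/raw/main/ecdc_data/cases_deaths.csv"
            },
            "columnDelimiter": ",",
            "firstRowAsHeader": true
        }
    }
}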

Now create another dataset for the sink.





Then click OK; the dataset is created.



Create a Copy activity that copies the data from the HTTP source dataset to the sink dataset.
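A minimal sketch of the Copy activity definition, assuming a hypothetical sink dataset named ds_cases_deaths_raw_csv_dl that points at the storage account:

json

{
    "name": "Copy cases and deaths data",
    "type": "Copy",
    "inputs": [
        { "referenceName": "ds_cases_deaths_raw_csv_http", "type": "DatasetReference" }
    ],
    "outputs": [
        { "referenceName": "ds_cases_deaths_raw_csv_dl", "type": "DatasetReference" }
    ],
    "typeProperties": {
        "source": {
            "type": "DelimitedTextSource",
            "storeSettings": { "type": "HttpReadSettings", "requestMethod": "GET" }
        },
        "sink": {
            "type": "DelimitedTextSink",
            "storeSettings": { "type": "AzureBlobFSWriteSettings" }
        }
    }
}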









🔹 Parameters vs Variables in ADF

Feature     | Parameters                          | Variables
Scope       | Pipeline level                      | Pipeline level
Mutability  | Immutable (can't change after set)  | Mutable (can change during pipeline)
Use Cases   | Input values for pipeline           | Temporary storage during pipeline

🔹 Using Parameters

1. Define Parameters

Go to the pipeline → Parameters tab → Click + New to add a parameter.

plaintext

Name: myParam
Type: String
Default value: (optional)
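In the pipeline's JSON this shows up as a parameters block, roughly as in the sketch below (the pipeline name and default value are just examples):

json

{
    "name": "pl_ingest_cases_deaths",
    "properties": {
        "parameters": {
            "myParam": {
                "type": "String",
                "defaultValue": "sample-value"
            }
        },
        "activities": []
    }
}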

2. Pass Parameters to Activities

For example, in a Copy Data activity:

  • Go to the Source or Sink.

  • In the dynamic content box (click the "Add dynamic content" link), you can use:


@pipeline().parameters.myParam

You can also pass parameters to datasets or linked services.
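For example, a dataset can declare its own parameter and use it in the relative URL via a dataset expression. A hedged sketch, with illustrative dataset and parameter names:

json

{
    "name": "ds_raw_csv_http",
    "properties": {
        "parameters": {
            "relativeURL": { "type": "string" }
        },
        "linkedServiceName": {
            "referenceName": "ls_http_opendata_ecdc_europa_eu",
            "type": "LinkedServiceReference"
        },
        "type": "DelimitedText",
        "typeProperties": {
            "location": {
                "type": "HttpServerLocation",
                "relativeUrl": {
                    "value": "@dataset().relativeURL",
                    "type": "Expression"
                }
            },
            "firstRowAsHeader": true
        }
    }
}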

3. Pass Parameters When Triggering a Pipeline

When running a pipeline manually or via another pipeline (using Execute Pipeline), you can provide parameter values.
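For instance, an Execute Pipeline activity can pass concrete values for the called pipeline's parameters (the pipeline name and value here are illustrative):

json

{
    "name": "Run ingest pipeline",
    "type": "ExecutePipeline",
    "typeProperties": {
        "pipeline": {
            "referenceName": "pl_ingest_cases_deaths",
            "type": "PipelineReference"
        },
        "parameters": {
            "myParam": "cases_deaths.csv"
        },
        "waitOnCompletion": true
    }
}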


🔹 Using Variables

1. Define Variables

In the pipeline → Variables tab → Click + New.

plaintext

Name: myVar
Type: String / Boolean / Array

2. Set Variable (Set Variable Activity)

  • Drag Set Variable activity into your pipeline.

  • In the settings:

    • Variable name: myVar

    • Value: Use dynamic content, for example:


'Hello World'
@concat('Folder/', pipeline().parameters.fileName)

3. Modify Variable (Append Variable Activity)

Used only for Array variables to add elements during the pipeline.
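A minimal sketch of an Append Variable activity, assuming the pipeline already has an Array variable named processedFiles:

json

{
    "name": "Append processed file",
    "type": "AppendVariable",
    "typeProperties": {
        "variableName": "processedFiles",
        "value": {
            "value": "@pipeline().parameters.fileName",
            "type": "Expression"
        }
    }
}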


🔹 Using Variables and Parameters in Activities

Here are a few examples of dynamic usage:

📌 In Copy Activity Source/Sink path:

adf

@concat('input/', pipeline().parameters.fileName)

📌 In If Condition Activity:

adf

@equals(variables('myVar'), 'expectedValue')

📌 In Stored Procedure Activity:

Pass parameter to SP:

adf

@variables('sqlParam')

🔹 Common Use Case

Suppose you want to pass a file name as a parameter and use it dynamically in a copy activity:

1. Define a parameter: fileName

2. In source dataset parameter:

adf

@pipeline().parameters.fileName

3. Set a variable based on this:

adf

@concat('processed/', pipeline().parameters.fileName)

✅ Summary

  • Parameters: Set once, used for configuration and input.

  • Variables: Used to store intermediate values during pipeline execution.

  • You can use expressions like @concat, @pipeline(), @variables(), etc., in most ADF activity fields via Dynamic Content.


Lookup activity and ForEach loop activity

[Lookup File List]
        ↓
   [ForEach File]
        ↓
[Copy / Databricks Activity]

For example, if we have 4 files to ingest, we can use a Lookup activity to read the full list of files in one go and then iterate over it with a ForEach loop activity.

Here we pass the output of the Lookup activity as the input (the Items setting) of the ForEach activity.
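A minimal sketch, assuming the Lookup activity is named Lookup File List and returns multiple rows (First row only unchecked), so its rows are exposed under output.value; the inner Copy / Set Variable activities would go inside the activities array:

json

{
    "name": "ForEach File",
    "type": "ForEach",
    "dependsOn": [
        { "activity": "Lookup File List", "dependencyConditions": [ "Succeeded" ] }
    ],
    "typeProperties": {
        "items": {
            "value": "@activity('Lookup File List').output.value",
            "type": "Expression"
        },
        "activities": []
    }
}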





To set variables inside the ForEach loop, we add a Set Variable activity within it.


In that activity, set the variable to the source URL taken from the current item of the Lookup output.
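A hedged sketch, assuming the Lookup file exposes a column named sourceURL and the pipeline has a String variable of the same name:

json

{
    "name": "Set source URL",
    "type": "SetVariable",
    "typeProperties": {
        "variableName": "sourceURL",
        "value": {
            "value": "@item().sourceURL",
            "type": "Expression"
        }
    }
}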



Now we are parameterizing the hardcoded URL in the linked service.
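A minimal sketch of a parameterized HTTP linked service, where the base URL comes from a linked-service parameter instead of being hardcoded (the names here are illustrative):

json

{
    "name": "ls_http_generic",
    "properties": {
        "type": "HttpServer",
        "parameters": {
            "sourceBaseURL": { "type": "String" }
        },
        "typeProperties": {
            "url": "@{linkedService().sourceBaseURL}",
            "authenticationType": "Anonymous"
        }
    }
}

The datasets built on this linked service can then supply the base URL (and relative URL) per source, which is what makes the metadata-driven approach below possible.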


Then, using the Copy activity, we ingest the data.

Metadata-driven Architecture 

There are 4 source URLs that we need to ingest with one pipeline and one trigger.


The Copy activity sits inside the ForEach activity, and a trigger is attached to that pipeline.
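A minimal sketch of a weekly schedule trigger attached to the pipeline (trigger name, pipeline name, and start time are placeholders):

json

{
    "name": "tr_ingest_ecdc_weekly",
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Week",
                "interval": 1,
                "startTime": "2021-01-01T00:00:00Z",
                "timeZone": "UTC"
            }
        },
        "pipelines": [
            {
                "pipelineReference": {
                    "referenceName": "pl_ingest_ecdc_data",
                    "type": "PipelineReference"
                }
            }
        ]
    }
}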


