Inside the DAGs folder there is a default .py file used for orchestration.
You can create any number of DAG files and upload them into this folder to create workflows.
If you click on one of these files, it takes you to Airflow monitoring: under Graph you can see the DAG and its operators (tasks), and under Code you can see the orchestration code.
If the DAG is shown in yellow, it is not working properly and has to be debugged; if it is green, it is working properly.

Here is an example DAG file:
from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime, timedelta
default_args = {
    'owner': 'your_name',
    'start_date': datetime(2023, 10, 26),
    'retries': 1,                         # retry a failed task once
    'retry_delay': timedelta(minutes=5),  # wait 5 minutes between retries
}

with DAG(
    dag_id='my_data_pipeline',            # DAG name
    schedule_interval='@daily',           # Schedule interval
    default_args=default_args,
    dagrun_timeout=timedelta(hours=2),    # fail the run if it takes longer than 2 hours
    catchup=False,                        # catchup determines whether Airflow should "catch up" on past scheduled runs that were missed
) as dag:
    task1 = BashOperator(
        task_id='extract_data',    # Task ID
        bash_command='echo "Extracting data..."',
    )
    task2 = BashOperator(
        task_id='transform_data',  # Task ID
        bash_command='echo "Transforming data..."',
    )
    task1 >> task2  # Define task dependencies; the >> operator defines the order in which tasks should be executed

Key Points:
- DAG parameters control the overall behavior of the workflow.
- Operator parameters define the individual steps within the workflow.
- task_id values must be unique within a DAG.
- The dag_id must be unique across all DAGs inside the Airflow environment.
- The schedule_interval is what tells Airflow when to run the DAG.
- The dagrun_timeout is a safety net: if a DAG run takes longer than this, the run is marked as failed.
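
As a quick sketch of how the >> operator can express more than a straight chain of tasks, here is a small illustration with hypothetical task and DAG names (extract_data, clean_data, validate_data, load_data are not part of the pipeline above): a task can fan out to a list of downstream tasks and fan back in.

from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime

with DAG(
    dag_id='dependency_patterns_example',  # hypothetical DAG, for illustration only
    schedule_interval=None,                # no schedule; trigger manually while experimenting
    start_date=datetime(2023, 10, 26),
    catchup=False,
) as dag:
    extract = BashOperator(task_id='extract_data', bash_command='echo "extract"')
    clean = BashOperator(task_id='clean_data', bash_command='echo "clean"')
    validate = BashOperator(task_id='validate_data', bash_command='echo "validate"')
    load = BashOperator(task_id='load_data', bash_command='echo "load"')

    # Fan out: clean_data and validate_data both run after extract_data.
    extract >> [clean, validate]
    # Fan in: load_data runs only after both clean_data and validate_data succeed.
    [clean, validate] >> load

In the Graph view this renders as a fan-out/fan-in shape rather than a single chain.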
I hope this helps clarify these important Airflow concepts.