Inside the DAGs folder there is a default Python (.py) file for orchestration. You can create any number of DAG files and upload them into this folder to create workflows.
If you click on one of the files, it takes you to Airflow monitoring: under Graph you can see the DAG and its operators (tasks), and under Code you can see the code for the orchestration.
If a DAG is yellow, it is not working properly and has to be debugged; if it is green, it is working properly.
from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime, timedelta

default_args = {
    'owner': 'your_name',
    'start_date': datetime(2023, 10, 26),
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

with DAG(
    dag_id='my_data_pipeline',          # DAG name
    schedule_interval='@daily',         # schedule interval
    default_args=default_args,
    catchup=False,                      # catchup determines whether Airflow should "catch up" on past scheduled runs that were missed
    dagrun_timeout=timedelta(hours=2),  # dagrun_timeout is a DAG-level parameter, so it is set here rather than in default_args
) as dag:

    task1 = BashOperator(
        task_id='extract_data',         # task ID; must be unique within the DAG
        bash_command='echo "Extracting data..."',
    )

    task2 = BashOperator(
        task_id='transform_data',       # task ID
        bash_command='echo "Transforming data..."',
    )

    task1 >> task2  # define task dependencies

The >> operator defines the order in which tasks should be executed.
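The same ordering can also be written in a couple of other ways. The sketch below is only an illustration, assuming Airflow 2.x: the dag_id and the extra load_data task are made up, and it shows the bitshift style next to the set_downstream() method and the chain() helper.

from airflow import DAG
from airflow.models.baseoperator import chain  # helper for chaining several tasks at once
from airflow.operators.bash import BashOperator
from datetime import datetime

with DAG(
    dag_id='dependency_styles_demo',   # hypothetical DAG, just for illustration
    schedule_interval=None,
    start_date=datetime(2023, 10, 26),
    catchup=False,
) as dag:

    extract = BashOperator(task_id='extract_data', bash_command='echo "extract"')
    transform = BashOperator(task_id='transform_data', bash_command='echo "transform"')
    load = BashOperator(task_id='load_data', bash_command='echo "load"')  # hypothetical extra task

    # Each of the lines below expresses "extract before transform, transform before load";
    # only one of them is needed.
    extract >> transform >> load           # bitshift chaining
    # extract.set_downstream(transform)    # method-style equivalent for a single edge
    # chain(extract, transform, load)      # chain() helper, handy for long sequences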
Key Points:
- DAG parameters control the overall behavior of the workflow.
- Operator parameters define the individual steps within the workflow.
- task_id values must be unique within a DAG.
- The dag_id must be unique across all DAGs inside the Airflow environment.
- The schedule_interval tells Airflow when to run the DAG.
- The dagrun_timeout is a safety net that caps how long a single run may take (see the sketch below for these parameters set explicitly).
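To tie these points together, here is a minimal sketch, again assuming Airflow 2.x; the dag_id, task IDs, and schedule are hypothetical and only there to show the DAG-level parameters set explicitly, with a cron expression instead of the @daily preset.

from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime, timedelta

with DAG(
    dag_id='nightly_sales_report',         # hypothetical name; must not clash with any other DAG
    schedule_interval='30 1 * * *',        # cron expression: run at 01:30 every day
    start_date=datetime(2023, 10, 26),
    catchup=False,                         # set to True if missed past runs should be backfilled
    dagrun_timeout=timedelta(hours=2),     # safety net: fail the run if it is still going after 2 hours
    default_args={'retries': 1, 'retry_delay': timedelta(minutes=5)},  # operator-level defaults
) as dag:

    pull = BashOperator(task_id='pull_sales', bash_command='echo "pulling sales data..."')
    report = BashOperator(task_id='build_report', bash_command='echo "building report..."')

    pull >> report

Dropping a file like this into the DAGs folder is enough for Airflow to pick it up on its next scan.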
I hope this helps clarify these important Airflow concepts.