Subqueries Explained

1. Scalar Subqueries:

  • Definition: A scalar subquery returns a single value (one row, one column).

  • Usage: They can be used anywhere a single value is expected, such as in WHERE clauses, SELECT lists, or HAVING clauses (a SELECT-list sketch follows the example below).

  • Example:

    SQL
    SELECT employee_id, first_name, last_name, salary
    FROM employees
    WHERE salary > (SELECT AVG(salary) FROM employees);
    
    • Explanation: The inner subquery (SELECT AVG(salary) FROM employees) calculates the average salary, which is a single value. The outer query then compares each employee's salary to this average.
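
  • Example (scalar subquery in the SELECT list): As noted above, a scalar subquery can also appear in the SELECT list. A minimal sketch that shows the company-wide average salary next to each employee (the alias company_avg_salary is only illustrative):

    SQL
    SELECT employee_id, first_name, last_name, salary,
           (SELECT AVG(salary) FROM employees) AS company_avg_salary -- illustrative alias
    FROM employees;

    • Explanation: The subquery evaluates to a single value, which is repeated as an extra column on every row of the outer query.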

2. Multiple-Row Subqueries:

  • Definition: These subqueries return multiple rows (typically one column).

  • Usage: They are used with operators like IN, NOT IN, ANY, and ALL (a NOT IN sketch follows the ALL example below).

  • Examples:

    • Using IN:

      SQL
      SELECT employee_id, first_name, last_name
      FROM employees
      WHERE department_id IN (SELECT department_id FROM departments WHERE location = 'New York');
      
      • Explanation: The subquery returns a list of department_ids located in New York. The outer query then selects employees whose department_id is in that list.
    • Using ANY (or SOME):

      SQL
      SELECT employee_id, first_name, last_name, salary
      FROM employees
      WHERE salary > ANY (SELECT salary FROM employees WHERE department_id = 30);
      
      • Explanation: This selects employees whose salary is greater than at least one salary in department 30.
    • Using ALL:

      SQL
      SELECT employee_id, first_name, last_name, salary
      FROM employees
      WHERE salary > ALL (SELECT salary FROM employees WHERE department_id = 30);
      
      • Explanation: This selects employees whose salary is greater than every salary in department 30.
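
    • Using NOT IN (a minimal sketch mirroring the IN example above):

      SQL
      SELECT employee_id, first_name, last_name
      FROM employees
      WHERE department_id NOT IN (SELECT department_id FROM departments WHERE location = 'New York');

      • Explanation: This selects employees whose department_id is not in the list returned by the subquery. Note that if the subquery returns any NULL department_id, NOT IN matches no rows at all.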

3. Multiple-Column Subqueries:

  • Definition: These subqueries return multiple columns and multiple rows.

  • Usage: They are often used for comparing multiple columns at once.

  • Example:

    SQL
    SELECT employee_id, first_name, last_name
    FROM employees
    WHERE (department_id, salary) IN (SELECT department_id, MAX(salary) FROM employees GROUP BY department_id);
    
    • Explanation: This query finds employees who have the maximum salary within their respective departments. The subquery returns pairs of department_id and MAX(salary). The outer query then finds employees whose (department_id, salary) pair matches one of those returned by the subquery.

4. Correlated Subqueries:

  • Definition: A correlated subquery depends on the outer query for its values. The inner query is executed once for each row processed by the outer query.

  • Usage: They are used when you need to compare values between the outer and inner queries.

  • Example:

    SQL
    SELECT employee_id, first_name, last_name, salary
    FROM employees e1
    WHERE salary > (SELECT AVG(salary) FROM employees e2 WHERE e2.department_id = e1.department_id);
    
    • Explanation: The subquery calculates the average salary for each employee's department. The e2.department_id = e1.department_id clause links the inner and outer queries, making it a correlated subquery.
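
  • Example (correlated subquery in the SELECT list): A correlated subquery can also appear in the SELECT list. A minimal sketch, using the same tables and columns as above; the alias dept_avg_salary is only illustrative:

    SQL
    SELECT e1.employee_id, e1.first_name, e1.last_name, e1.salary,
           (SELECT AVG(e2.salary)
            FROM employees e2
            WHERE e2.department_id = e1.department_id) AS dept_avg_salary -- illustrative alias
    FROM employees e1;

    • Explanation: The inner query is conceptually re-evaluated for each row of the outer query, adding each employee's department average as an extra column.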

5. Subqueries in the FROM Clause (Inline Views or Derived Tables):

  • Definition: The result of the subquery is treated as a temporary table (an inline view) within the FROM clause.

  • Usage: They are used to simplify complex queries by pre-processing data.

  • Example:

    SQL
    SELECT dept_name, avg_salary
    FROM (SELECT d.department_name AS dept_name, AVG(e.salary) AS avg_salary
          FROM employees e, departments d
          WHERE e.department_id = d.department_id
          GROUP BY d.department_name)
    WHERE avg_salary > 60000;
    
    • Explanation: The inner subquery calculates the average salary for each department and assigns it the alias avg_salary. The outer query then filters these results, keeping only departments whose average salary exceeds 60,000.
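
  • Example (same inline view with explicit JOIN syntax): The query above uses the older comma-join syntax and no alias for the derived table, which Oracle accepts. A minimal sketch of the same query with an explicit JOIN and a derived-table alias (dept_avg, illustrative), which databases such as MySQL and PostgreSQL require:

    SQL
    SELECT dept_name, avg_salary
    FROM (SELECT d.department_name AS dept_name, AVG(e.salary) AS avg_salary
          FROM employees e
          JOIN departments d ON e.department_id = d.department_id
          GROUP BY d.department_name) dept_avg -- illustrative derived-table alias
    WHERE avg_salary > 60000;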

6. EXISTS and NOT EXISTS Subqueries:

  • Definition: These subqueries check for the existence (or absence) of rows that satisfy a given condition.

  • Usage: They are used to test for the presence of related data.

  • Example:

    SQL
    SELECT department_name
    FROM departments d
    WHERE EXISTS (SELECT 1 FROM employees e WHERE e.department_id = d.department_id);
    
    • Explanation: This query finds departments that have at least one employee. The EXISTS subquery returns true if there is at least one employee in the department.
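
  • Example (NOT EXISTS): A minimal sketch that flips the test to find departments with no employees:

    SQL
    SELECT department_name
    FROM departments d
    WHERE NOT EXISTS (SELECT 1 FROM employees e WHERE e.department_id = d.department_id);

    • Explanation: NOT EXISTS returns true only when the subquery finds no matching rows, so only departments without any employees are returned.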
