
SQL HackerRank Interview Questions

 

YouTube: Complex SQL Query Breakdown Step By Step Detailed Explanation



Answer (print the names of employees earning more than 2000 per month who have been employed for fewer than 10 months, sorted by ascending employee_id):

SELECT name
FROM Employee
WHERE salary > 2000 AND months < 10
ORDER BY employee_id;
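
A minimal sketch for trying this locally, assuming the usual HackerRank Employee schema (employee_id, name, months, salary); the sample rows below are made up for illustration:

    CREATE TABLE Employee (
        employee_id INT PRIMARY KEY,
        name        VARCHAR(50),
        months      INT,  -- total months employed
        salary      INT   -- monthly salary
    );

    -- Only Rose and Patrick satisfy salary > 2000 AND months < 10
    INSERT INTO Employee (employee_id, name, months, salary) VALUES
        (1, 'Rose',     6, 3500),
        (2, 'Angela',  12, 5000),
        (3, 'Patrick',  8, 2100),
        (4, 'Frank',    5, 1800);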





Answer (for each contest, sum the submissions, accepted submissions, views, and unique views across all of its challenges, excluding contests where every total is zero, ordered by contest_id):

SELECT
    con.contest_id,
    con.hacker_id, 
    con.name, 
    SUM(total_submissions) AS total_submissions, 
    SUM(total_accepted_submissions) AS total_accepted_submissions, 
    SUM(total_views) AS total_views, 
    SUM(total_unique_views) AS total_unique_views
FROM
    contests con 
JOIN
    colleges col ON con.contest_id = col.contest_id 
JOIN
    challenges cha ON col.college_id = cha.college_id 
LEFT JOIN
    (SELECT challenge_id, 
            SUM(total_views) AS total_views, 
            SUM(total_unique_views) AS total_unique_views
     FROM view_stats 
     GROUP BY challenge_id) vs 
    ON cha.challenge_id = vs.challenge_id 
LEFT JOIN
    (SELECT challenge_id, 
            SUM(total_submissions) AS total_submissions, 
            SUM(total_accepted_submissions) AS total_accepted_submissions 
     FROM submission_stats 
     GROUP BY challenge_id) ss 
    ON cha.challenge_id = ss.challenge_id
GROUP BY
    con.contest_id, con.hacker_id, con.name
HAVING
    SUM(total_submissions) != 0 OR 
    SUM(total_accepted_submissions) != 0 OR
    SUM(total_views) != 0 OR
    SUM(total_unique_views) != 0
ORDER BY
    contest_id;

Query Explanation:

1. Base Tables (a sketch DDL follows this list):

  • Contests: Contains contest details (contest_id, hacker_id, name).

  • Colleges: Links contests to colleges (college_id, contest_id).

  • Challenges: Represents coding challenges within colleges (challenge_id, college_id).

  • View_Stats: Contains view statistics for challenges (challenge_id, total_views, total_unique_views).

  • Submission_Stats: Contains submission statistics for challenges (challenge_id, total_submissions, total_accepted_submissions).
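
For reference, a plausible sketch of these five tables; the column names come from the problem, while the types are assumptions:

    CREATE TABLE contests         (contest_id INT, hacker_id INT, name VARCHAR(100));
    CREATE TABLE colleges         (college_id INT, contest_id INT);
    CREATE TABLE challenges       (challenge_id INT, college_id INT);
    CREATE TABLE view_stats       (challenge_id INT, total_views INT, total_unique_views INT);
    CREATE TABLE submission_stats (challenge_id INT, total_submissions INT, total_accepted_submissions INT);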


2. Subqueries:

  • vs Subquery:

    (SELECT challenge_id,
            SUM(total_views) AS total_views,
            SUM(total_unique_views) AS total_unique_views
     FROM view_stats
     GROUP BY challenge_id)
    • Aggregates total_views and total_unique_views for each challenge.

    • Groups by challenge_id to ensure each challenge has a single row of aggregated stats.

  • ss Subquery:

    (SELECT challenge_id,
            SUM(total_submissions) AS total_submissions,
            SUM(total_accepted_submissions) AS total_accepted_submissions
     FROM submission_stats
     GROUP BY challenge_id)
    • Aggregates total_submissions and total_accepted_submissions for each challenge.

    • Groups by challenge_id so each challenge contributes a single row of aggregated stats (the sketch below shows why this pre-aggregation matters).
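
Why pre-aggregate at all? view_stats and submission_stats can each hold several rows per challenge, and joining both raw tables directly cross-multiplies those rows, inflating every SUM. A small sketch of the trap, using the same table names as above:

    -- If a challenge has 2 rows in view_stats and 3 in submission_stats,
    -- this join produces 2 x 3 = 6 rows for that challenge, so each view
    -- total is counted 3 times and each submission total 2 times.
    SELECT cha.challenge_id,
           SUM(vs.total_views)       AS inflated_views,
           SUM(ss.total_submissions) AS inflated_submissions
    FROM challenges cha
    LEFT JOIN view_stats       vs ON cha.challenge_id = vs.challenge_id
    LEFT JOIN submission_stats ss ON cha.challenge_id = ss.challenge_id
    GROUP BY cha.challenge_id;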


3. Main Query:

  • Joins:

    • Contests is joined with Colleges to link contests to colleges.

    • Colleges is joined with Challenges to link colleges to challenges.

    • Challenges is left-joined with the vs subquery to include view stats for each challenge.

    • Challenges is left-joined with the ss subquery to include submission stats for each challenge.

  • Aggregation:

    • The SUM function calculates the total values for:

      • total_submissions

      • total_accepted_submissions

      • total_views

      • total_unique_views

    • These sums are calculated for each contest (con.contest_id).

  • Grouping:

    • The results are grouped by con.contest_id, con.hacker_id, and con.name to ensure each contest is represented as a single row.

  • Filtering:

    • The HAVING clause keeps only contests with at least one non-zero sum (an explicit COALESCE variant follows this list):

      HAVING SUM(total_submissions) != 0 OR
             SUM(total_accepted_submissions) != 0 OR
             SUM(total_views) != 0 OR
             SUM(total_unique_views) != 0
  • Ordering:

    • The results are sorted in ascending order of contest_id.
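
If you want the NULL handling in that HAVING clause to be explicit (a contest with no stats at all produces NULL sums, which the != 0 comparisons already exclude because NULL != 0 is unknown), an equivalent variant could use COALESCE; this is a sketch, not part of the original answer:

    HAVING COALESCE(SUM(total_submissions), 0) != 0 OR
           COALESCE(SUM(total_accepted_submissions), 0) != 0 OR
           COALESCE(SUM(total_views), 0) != 0 OR
           COALESCE(SUM(total_unique_views), 0) != 0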



Example Walkthrough:

Input Data:

  • Contests:

    contest_id | hacker_id | name
    -----------|-----------|----------
    1          | 101       | Contest A
    2          | 102       | Contest B

  • Colleges:

    college_id | contest_id
    -----------|-----------
    1          | 1
    2          | 2

  • Challenges:

    challenge_id | college_id
    -------------|-----------
    1            | 1
    2            | 2

  • View_Stats:

    challenge_id | total_views | total_unique_views
    -------------|-------------|-------------------
    1            | 100         | 50
    2            | 200         | 100

  • Submission_Stats:

    challenge_id | total_submissions | total_accepted_submissions
    -------------|-------------------|---------------------------
    1            | 10                | 5
    2            | 20                | 10

Query Execution:

  1. Joins:

    • Contests is joined with Colleges and Challenges to link contests to challenges.

    • Challenges is left-joined with vs and ss to include view and submission stats.

  2. Aggregation:

    • For contest_id = 1:

      • total_submissions = 10

      • total_accepted_submissions = 5

      • total_views = 100

      • total_unique_views = 50

    • For contest_id = 2:

      • total_submissions = 20

      • total_accepted_submissions = 10

      • total_views = 200

      • total_unique_views = 100

  3. Filtering:

    • Both contests have non-zero sums, so they are included.

  4. Ordering:

    • Results are sorted by contest_id.

Output:

contest_id | hacker_id | name      | total_submissions | total_accepted_submissions | total_views | total_unique_views
-----------|-----------|-----------|-------------------|----------------------------|-------------|-------------------
1          | 101       | Contest A | 10                | 5                          | 100         | 50
2          | 102       | Contest B | 20                | 10                         | 200         | 100
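
To reproduce this walkthrough against the sketch schema above (values taken straight from the input tables):

    INSERT INTO contests         VALUES (1, 101, 'Contest A'), (2, 102, 'Contest B');
    INSERT INTO colleges         VALUES (1, 1), (2, 2);
    INSERT INTO challenges       VALUES (1, 1), (2, 2);
    INSERT INTO view_stats       VALUES (1, 100, 50), (2, 200, 100);
    INSERT INTO submission_stats VALUES (1, 10, 5), (2, 20, 10);
    -- Running the answer query now returns exactly the two output rows above.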

Key Points:

  • The query uses subqueries to pre-aggregate stats for each challenge.

  • LEFT JOIN ensures that challenges without stats are still included; their stat columns come through as NULL, which SUM skips (a sum over only NULLs is NULL, not 0).

  • The HAVING clause filters out contests whose sums are all zero or all NULL (see the sketch after this list).

  • The result is sorted by contest_id.
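
A quick sketch of the NULL behaviour those last points rely on (MySQL-style syntax assumed):

    -- SUM skips NULLs, and a sum over only NULLs is NULL, not 0.
    SELECT SUM(v) AS s               -- returns NULL
    FROM (SELECT NULL AS v) AS t;

    -- NULL != 0 evaluates to NULL (unknown), which HAVING treats as false,
    -- so a contest whose sums are all NULL is filtered out.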
