SQL points to remember

1. In a WHERE condition, RIGHT(CITY, 2) checks the last two characters of the value; likewise, LEFT(CITY, 2) checks the first two.

Example: SELECT DISTINCT CITY FROM STATION WHERE LEFT(CITY, 1) NOT IN ('a','e','i','o','u','A','E','I','O','U') AND RIGHT(CITY, 1) NOT IN ('a','e','i','o','u','A','E','I','O','U');

This returns each distinct city whose first character (LEFT(CITY, 1)) is not a vowel and whose last character (RIGHT(CITY, 1)) is not a vowel either. Both lowercase and uppercase vowels are listed so the filter still works when the comparison is case-sensitive.
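A minimal runnable sketch of the query above, assuming SQLite through Python's built-in sqlite3 module rather than MySQL. SQLite has no LEFT() or RIGHT() function, so SUBSTR(CITY, 1, 1) and SUBSTR(CITY, -1) stand in for them. The sample city names and the helper name cities_without_vowel_ends are made up for illustration.

```python
import sqlite3

VOWELS = "('a','e','i','o','u','A','E','I','O','U')"

def cities_without_vowel_ends(cities):
    """Return distinct cities that neither start nor end with a vowel.

    SUBSTR(CITY, 1, 1) plays the role of LEFT(CITY, 1) and
    SUBSTR(CITY, -1) plays the role of RIGHT(CITY, 1) in SQLite.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE STATION (CITY TEXT)")
    conn.executemany("INSERT INTO STATION (CITY) VALUES (?)", [(c,) for c in cities])
    rows = conn.execute(
        f"""
        SELECT DISTINCT CITY FROM STATION
        WHERE SUBSTR(CITY, 1, 1) NOT IN {VOWELS}
          AND SUBSTR(CITY, -1) NOT IN {VOWELS}
        ORDER BY CITY
        """
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]

print(cities_without_vowel_ends(["Boston", "Atlanta", "Tulsa", "Denver", "Denver", "Omaha"]))
# ['Boston', 'Denver']
```

Atlanta and Omaha are dropped by the first-character test, Tulsa by the last-character test, and DISTINCT collapses the duplicate Denver.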

2. To match a pattern, use LIKE: _ matches exactly one character and % matches any sequence of zero or more characters. To compare against a list of several exact values, use the IN keyword instead.
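A small sketch contrasting LIKE with _ and %, and IN, again assuming SQLite via Python's sqlite3 module. The STATION rows and the cities() helper are hypothetical. SQLite's LIKE is case-insensitive for ASCII letters by default, while other databases may follow the column's collation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STATION (CITY TEXT)")
conn.executemany(
    "INSERT INTO STATION (CITY) VALUES (?)",
    [("Rome",), ("Rose",), ("Roseville",), ("Paris",), ("Oslo",)],
)

def cities(where_clause):
    """Run SELECT CITY FROM STATION with the given WHERE clause, sorted.

    where_clause is a trusted literal here; never interpolate user input.
    """
    sql = f"SELECT CITY FROM STATION WHERE {where_clause} ORDER BY CITY"
    return [row[0] for row in conn.execute(sql)]

# _ matches exactly one character: 'Ro' + any single char + 'e'
one_char = cities("CITY LIKE 'Ro_e'")          # ['Rome', 'Rose']
# % matches any sequence, including an empty one
any_chars = cities("CITY LIKE 'Ros%'")         # ['Rose', 'Roseville']
# IN compares against a list of exact values
in_list = cities("CITY IN ('Paris', 'Oslo')")  # ['Oslo', 'Paris']
print(one_char, any_chars, in_list)
```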

3. If a query involves more than three tables and you don't know how to join them, go back to the ER (entity-relationship) diagram. Visualize what each table is used for, work out how the tables relate to one another, and practice accordingly.

4. Use dbdiagram.io, a database relationship diagram design tool, to map out the relationships between the tables.
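If the schema actually declares foreign keys, the relationships can also be read straight from the database instead of drawn by hand. This sketch assumes SQLite (via Python's sqlite3) and a hypothetical miniature of the contest schema from item 5, and uses SQLite's PRAGMA foreign_key_list. Practice datasets often leave foreign keys undeclared, in which case you still have to infer the links from matching column names such as challenge_id.

```python
import sqlite3

# Hypothetical miniature schema with the relationships declared as foreign keys.
SCHEMA = """
CREATE TABLE contests   (contest_id INTEGER PRIMARY KEY, hacker_id INTEGER, name TEXT);
CREATE TABLE colleges   (college_id INTEGER PRIMARY KEY,
                         contest_id INTEGER REFERENCES contests(contest_id));
CREATE TABLE challenges (challenge_id INTEGER PRIMARY KEY,
                         college_id INTEGER REFERENCES colleges(college_id));
CREATE TABLE view_stats (challenge_id INTEGER REFERENCES challenges(challenge_id),
                         total_views INTEGER, total_unique_views INTEGER);
CREATE TABLE submission_stats (challenge_id INTEGER REFERENCES challenges(challenge_id),
                               total_submissions INTEGER,
                               total_accepted_submissions INTEGER);
"""

def relationships(conn):
    """Return (child_table, child_column, parent_table, parent_column) tuples."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    edges = []
    for table in tables:
        # Each PRAGMA row: id, seq, table, from, to, on_update, on_delete, match
        for row in conn.execute(f"PRAGMA foreign_key_list({table})"):
            edges.append((table, row[3], row[2], row[4]))
    return edges

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
for child, col, parent, parent_col in relationships(conn):
    print(f"{child}.{col} -> {parent}.{parent_col}")
```

Reading the printed edges from the stats tables upward gives the same chain that item 5 describes.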

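5. For advanced joins across more than three tables, trace the relationships from the detail tables upward: submission_stats and view_stats each link to challenges (on challenge_id), challenges link to colleges (on college_id), and colleges link to contests (on contest_id). The query is then written in the opposite direction, starting from contests and joining down to the stats tables: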

SELECT
    con.contest_id,
    con.hacker_id,
    con.name,
    SUM(ss.total_submissions) AS total_submissions,
    SUM(ss.total_accepted_submissions) AS total_accepted_submissions,
    SUM(vs.total_views) AS total_views,
    SUM(vs.total_unique_views) AS total_unique_views
FROM contests con
JOIN colleges col ON con.contest_id = col.contest_id
JOIN challenges cha ON col.college_id = cha.college_id
-- Pre-aggregate view_stats per challenge so that several rows for one
-- challenge are not multiplied against its submission_stats rows.
LEFT JOIN (
    SELECT challenge_id,
           SUM(total_views) AS total_views,
           SUM(total_unique_views) AS total_unique_views
    FROM view_stats
    GROUP BY challenge_id
) vs ON cha.challenge_id = vs.challenge_id
-- Same idea for submission_stats.
LEFT JOIN (
    SELECT challenge_id,
           SUM(total_submissions) AS total_submissions,
           SUM(total_accepted_submissions) AS total_accepted_submissions
    FROM submission_stats
    GROUP BY challenge_id
) ss ON cha.challenge_id = ss.challenge_id
GROUP BY con.contest_id, con.hacker_id, con.name
-- Keep only contests where at least one total is non-zero
-- (NULL sums, from challenges with no stats at all, never pass).
HAVING SUM(ss.total_submissions) != 0
    OR SUM(ss.total_accepted_submissions) != 0
    OR SUM(vs.total_views) != 0
    OR SUM(vs.total_unique_views) != 0
ORDER BY con.contest_id;
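A runnable sketch of the contest query on tiny, made-up data, assuming SQLite via Python's sqlite3 module. It also runs a naive version that joins the raw stats tables directly, to show why the per-challenge subqueries matter: challenge 101 has two rows in both stats tables, so the naive join produces 2 × 2 = 4 rows for it and double-counts every total. Contest 2 has no stats at all, so its sums are NULL and the HAVING clause drops it.

```python
import sqlite3

SETUP = """
CREATE TABLE contests (contest_id INTEGER, hacker_id INTEGER, name TEXT);
CREATE TABLE colleges (college_id INTEGER, contest_id INTEGER);
CREATE TABLE challenges (challenge_id INTEGER, college_id INTEGER);
CREATE TABLE view_stats (challenge_id INTEGER, total_views INTEGER, total_unique_views INTEGER);
CREATE TABLE submission_stats (challenge_id INTEGER, total_submissions INTEGER,
                               total_accepted_submissions INTEGER);
INSERT INTO contests VALUES (1, 100, 'Alpha'), (2, 200, 'Beta');
INSERT INTO colleges VALUES (10, 1), (20, 2);
INSERT INTO challenges VALUES (101, 10), (102, 10), (201, 20);
-- challenge 101 has two rows in BOTH stats tables; 201 has no stats
INSERT INTO view_stats VALUES (101, 5, 3), (101, 7, 4), (102, 2, 1);
INSERT INTO submission_stats VALUES (101, 4, 2), (101, 6, 3);
"""

PRE_AGGREGATED = """
SELECT con.contest_id, con.hacker_id, con.name,
       SUM(ss.total_submissions), SUM(ss.total_accepted_submissions),
       SUM(vs.total_views), SUM(vs.total_unique_views)
FROM contests con
JOIN colleges col ON con.contest_id = col.contest_id
JOIN challenges cha ON col.college_id = cha.college_id
LEFT JOIN (SELECT challenge_id, SUM(total_views) AS total_views,
                  SUM(total_unique_views) AS total_unique_views
           FROM view_stats GROUP BY challenge_id) vs
       ON cha.challenge_id = vs.challenge_id
LEFT JOIN (SELECT challenge_id, SUM(total_submissions) AS total_submissions,
                  SUM(total_accepted_submissions) AS total_accepted_submissions
           FROM submission_stats GROUP BY challenge_id) ss
       ON cha.challenge_id = ss.challenge_id
GROUP BY con.contest_id, con.hacker_id, con.name
HAVING SUM(ss.total_submissions) != 0 OR SUM(ss.total_accepted_submissions) != 0
    OR SUM(vs.total_views) != 0 OR SUM(vs.total_unique_views) != 0
ORDER BY con.contest_id
"""

# Joins the raw stats tables directly, so rows fan out before summing.
NAIVE = """
SELECT con.contest_id, SUM(ss.total_submissions), SUM(vs.total_views)
FROM contests con
JOIN colleges col ON con.contest_id = col.contest_id
JOIN challenges cha ON col.college_id = cha.college_id
LEFT JOIN view_stats vs ON cha.challenge_id = vs.challenge_id
LEFT JOIN submission_stats ss ON cha.challenge_id = ss.challenge_id
WHERE con.contest_id = 1
GROUP BY con.contest_id
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
correct = conn.execute(PRE_AGGREGATED).fetchall()
naive = conn.execute(NAIVE).fetchall()
print(correct)  # [(1, 100, 'Alpha', 10, 5, 14, 8)]
print(naive)    # [(1, 20, 26)]
```

The naive query reports 20 submissions and 26 views for contest 1 instead of 10 and 14, which is exactly the inflation the subqueries prevent.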

6. Subqueries can be used within joins to filter, aggregate, or otherwise shape data before it is joined, as the view_stats and submission_stats subqueries in item 5 do. For example, a subquery can first find the customers who spent the most in each product category, and that result can then be joined with other tables.
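A sketch of the customer example, assuming SQLite via Python's sqlite3 module and hypothetical customers and orders tables. The subqueries compute each customer's spend per category and the maximum per category before anything is joined to customers; only the final join picks up the customer name. If two customers tie for the top spend in a category, both rows are returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (customer_id INTEGER, category TEXT, amount INTEGER);
INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben'), (3, 'Chen');
INSERT INTO orders VALUES
  (1, 'Books', 30), (1, 'Books', 20), (2, 'Books', 40),
  (2, 'Games', 15), (3, 'Games', 60), (1, 'Games', 10);
""")

TOP_SPENDER_PER_CATEGORY = """
SELECT t.category, c.name, t.total_spent
FROM (
    -- subquery 1: total spend per customer per category
    SELECT customer_id, category, SUM(amount) AS total_spent
    FROM orders
    GROUP BY customer_id, category
) t
JOIN (
    -- subquery 2: the highest per-customer total in each category
    SELECT category, MAX(total_spent) AS max_spent
    FROM (SELECT customer_id, category, SUM(amount) AS total_spent
          FROM orders
          GROUP BY customer_id, category) per_customer
    GROUP BY category
) m ON t.category = m.category AND t.total_spent = m.max_spent
-- only now join to customers to pick up the name
JOIN customers c ON c.customer_id = t.customer_id
ORDER BY t.category
"""

top_spenders = conn.execute(TOP_SPENDER_PER_CATEGORY).fetchall()
print(top_spenders)  # [('Books', 'Asha', 50), ('Games', 'Chen', 60)]
```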


Keerthana Blogs
