
Posts

2.a Flattening the JSON file

Think of JSON processing as this journey: Raw JSON → Python objects → Flatten → Clean → DataFrame → Spark → Production pipeline

Goal | In Local Python | In PySpark (Databricks) | In Cloud SQL (Snowflake/BigQuery)
Read a file | json.load(f) | spark.read.json() | COPY INTO / Storage Integration
Go inside an object | data["key"]["subkey"] | df.select("key.subkey") | SELECT column:key.subkey
Turn a list into rows | for item in my_list: | explode(col("my_list")) | LATERAL FLATTEN() / UNNEST()

Phase 1: JSON Fundamentals

1. JSON Data Types

You need to immediately recognize how JSON maps to Python.

JSON | Python | Example
Object | dict | {"name":"John"}
Array | list | [1,2,3]
String | str | "London"
Number | int/float | 100
Boolean | bool | true
Null | None | null

Example: { "employee":{ "id":100, "name":"John" }, "skills":[ "Python", "Spark" ] }

Python sees this as: { "employee...
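The "Local Python" column can be sketched on the example object from this excerpt. This is a minimal illustration using only the standard library. The `flatten` helper and the dotted key names (such as `employee.id`) are assumptions made for the example and do not come from the post itself.

```python
import json

# The example document from the excerpt, as a raw JSON string.
raw = '{"employee": {"id": 100, "name": "John"}, "skills": ["Python", "Spark"]}'


def flatten(obj, prefix=""):
    """Flatten nested dicts into one dict with dotted keys (hypothetical helper).

    Lists are left untouched so they can be turned into rows afterwards.
    """
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


data = json.loads(raw)                    # JSON object -> Python dict
employee_name = data["employee"]["name"]  # go inside an object

flat = flatten(data)

# Turn the "skills" list into rows: one row per list item,
# repeating the scalar columns on every row.
scalars = {k: v for k, v in flat.items() if k != "skills"}
rows = [{**scalars, "skill": skill} for skill in flat["skills"]]

for row in rows:
    print(row)
```

Each printed row repeats the employee fields and carries one skill, which is the same shape that `explode` produces in PySpark for a list column.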

2. Things a data engineer should know

A data engineer should know how to read and write Python code that extracts data from internal and external applications.

1. Understand what an API call means. When a user searches in the front end (GUI), the results are fetched from the database through an API call.
   a. Know how to write Python code that fetches the details through API calls. This is possible using the Requests library in Python.
2. Understand the company's use case. Example: a telecom company (news channel). Companies such as news channels and weather report services work as shown in the diagram. This is achieved when company X exposes certain APIs to another company, which then gets the details by sending API calls.

The structure of an API call is generally the same across industries. In telecom, APIs are commonly used to manage subscribers, retrieve usage, send SMS, provision services, or check network status.

General Structure of an API Call

HTTP Method + URL + Headers + Paramete...
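The parts named in that structure (HTTP method, URL, headers, parameters) can be sketched in Python. To keep the example runnable offline, it uses the standard library's `urllib` instead of the Requests library mentioned in the post, and it only assembles the request without sending it. The host `api.example.com`, the `/subscribers/usage` path, the `subscriber_id` parameter, and the token are all hypothetical placeholders, not a real telecom API.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_api_request(base_url, path, params, token):
    """Assemble (but do not send) a GET request from its parts."""
    # URL = base address + resource path
    url = f"{base_url.rstrip('/')}/{path.lstrip('/')}"
    # Parameters are encoded into the query string
    if params:
        url = f"{url}?{urlencode(params)}"
    # Headers carry authentication and the expected response format
    headers = {
        "Authorization": f"Bearer {token}",
        "Accept": "application/json",
    }
    return Request(url, headers=headers, method="GET")


# Hypothetical telecom-style call: retrieve usage for one subscriber.
req = build_api_request(
    base_url="https://api.example.com",
    path="/subscribers/usage",
    params={"subscriber_id": "12345"},
    token="DEMO_TOKEN",
)

print(req.get_method(), req.full_url)
print(req.get_header("Authorization"))
```

With the Requests library, the same pieces are passed as `requests.get(url, headers=headers, params=params)`, and the JSON body of the response is read with `response.json()`.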