Spark Session
First, import SparkSession from pyspark.sql:
from pyspark.sql import SparkSession
Define a Spark session:
spark = SparkSession.builder.appName("name").getOrCreate()
SparkSession methods --> read, createDataFrame, table, sql (writing is done through df.write on a DataFrame, not on the session)
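A minimal sketch of a couple of these methods (the app name "demo", the sample rows, and the output path "people_out" are illustrative); read, table, and sql are covered in the sections below:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("demo").getOrCreate()

# createDataFrame: build a DataFrame from local Python data
df = spark.createDataFrame([(1, "Alice"), (2, "Bob")], ["id", "name"])
df.show()

# write lives on the DataFrame (df.write), not on the session itself
df.write.mode("overwrite").csv("people_out")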
Read functions (spark.read)
df_csv = spark.read.csv("file_name")
df_json = spark.read.json("file_name")
df_text = spark.read.text("file_name")
Reader options are attached before the format call with .option()/.options(), as in the sketch below.
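A minimal sketch of the two ways to pass reader options (reusing the sample file name from the delimiter example below):

# .option() sets one key at a time and can be chained
df_a = spark.read.option("header", "true").option("delimiter", "|").csv("file_without_header.csv")

# .options() takes several keyword arguments at once
df_b = spark.read.options(header="true", delimiter="|").csv("file_without_header.csv")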
multiLine --> for JSON records that span multiple lines --> df_multiline_json = spark.read.option("multiLine", "true").json("multiline_json.json")
delimiter --> df_pipe_delimited = spark.read.option("delimiter", "|").csv("file_without_header.csv")
header --> df_no_header = spark.read.csv("file_with_header.csv") # Default header=false, so the header row is read as data
df_header = spark.read.option("header", "true").csv("file_with_header.csv")
schema --> if there is no header (or we want exact column types), we construct the schema explicitly with StructType
# schema option
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

custom_schema = StructType([
    StructField("person_id", IntegerType(), True),
    StructField("full_name", StringType(), True),
    StructField("age", IntegerType(), True)
])
df_with_schema = spark.read.option("header", "true").schema(custom_schema).csv("file_with_header.csv")
inferSchema --> df_infer_schema = spark.read.option("header", "true").option("inferSchema", "true").csv("file_with_header.csv")
df_infer_schema.show()
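To compare the explicit schema with the inferred one, printSchema() can be called on the DataFrames produced by the reads above (a small sketch):

df_with_schema.printSchema()    # types come from custom_schema
df_infer_schema.printSchema()   # types guessed by Spark from the data
df_header.printSchema()         # without inferSchema, every CSV column is read as string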
format().load("file_name") --> use this when we need to explicitly mention the file format and then load the file
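A minimal sketch of the format/load style, reusing the same files from above:

# Equivalent to spark.read.csv(...) but with the format named explicitly
df_fmt_csv = spark.read.format("csv").option("header", "true").load("file_with_header.csv")

# Works for any built-in source: csv, json, text, parquet, orc, ...
df_fmt_json = spark.read.format("json").option("multiLine", "true").load("multiline_json.json")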
df.createOrReplaceTempView("view_name")
spark.table("view_name")
This lets us switch between the DataFrame and SQL sides: register a DataFrame as a temporary view, then read it back with spark.table() or query it with spark.sql(), as in the sketch below.
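A minimal round-trip sketch (the view name "people_view" is illustrative; df_header comes from the header example above):

# Register a DataFrame as a temporary view
df_header.createOrReplaceTempView("people_view")

# Read the view back as a DataFrame
df_from_view = spark.table("people_view")

# Or query it with SQL
spark.sql("SELECT * FROM people_view LIMIT 5").show()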