
Python - Pandas Library

Pandas is a Python library.

Pandas is used to analyze data.


A Pandas Series is like a column in a table.
import pandas as pd

a = [1, 7, 2]

myvar = pd.Series(a)

print(myvar[0])  # output: 1

You can also name your own labels with the index argument:

myvar = pd.Series(a, index = ["x", "y", "z"])

Labels

If nothing else is specified, the values are labeled with their index number: the first value has index 0, the second value has index 1, and so on.

This label can be used to access a specified value.
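For example, with the custom labels set above, return the value labeled "y":

print(myvar["y"])  # output: 7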

You can also create a Series from a dictionary; the keys become the labels:

calories = {"day1": 420, "day2": 380, "day3": 390}

myvar = pd.Series(calories)

print(myvar)

output:

day1    420
day2    380
day3    390
dtype: int64
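To include only some of the items from the dictionary, pass the wanted keys as the index argument:

myvar = pd.Series(calories, index = ["day1", "day2"])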

DataFrames

Data sets in Pandas are usually two-dimensional tables, called DataFrames.

A Series is like a column; a DataFrame is the whole table.

data = {
  "calories": [420, 380, 390],
  "duration": [50, 40, 45]
}

df = pd.DataFrame(data)

print(df)

Pandas uses the loc attribute to return one or more specified rows:

print(df.loc[0])

  calories    420
  duration     50
  Name: 0, dtype: int64
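loc also accepts a list of indexes, in which case the result is a DataFrame instead of a Series:

print(df.loc[[0, 1]])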
Use to_string() to print the entire DataFrame.
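For example:

print(df.to_string())  # prints every row instead of a truncated view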

1. Importing pandas:

Python
import pandas as pd
import numpy as np # Often used with pandas

2. Creating DataFrames:

  • From a dictionary:
Python
data = {'col1': [1, 2, 3], 'col2': ['a', 'b', 'c']}
df = pd.DataFrame(data)
  • From a list of lists:
Python
data = [[1, 'a'], [2, 'b'], [3, 'c']]
df = pd.DataFrame(data, columns=['col1', 'col2'])
  • From a CSV file:
Python
df = pd.read_csv('data.csv')
  • From an Excel file:
Python
df = pd.read_excel('data.xlsx')

3. Basic DataFrame Operations:

  • Viewing data:
Python
df.head()       # First 5 rows
df.tail()       # Last 5 rows
df.info()       # DataFrame info
df.describe()   # Summary statistics
df.shape        # (rows, columns)
df.columns      # Column names
df.index        # Index values
  • Selecting data:
Python
df['col1']       # Select column 'col1'
df[['col1', 'col2']] # Select multiple columns
df.loc[0]         # Select row by label (index)
df.iloc[0]        # Select row by integer position
df[df['col1'] > 1] # Boolean indexing (filtering)
  • Adding/removing columns:
Python
df['new_col'] = [4, 5, 6] # Add a new column
df.drop('col1', axis=1)    # Remove column 'col1' (returns a new DataFrame)
  • Adding/removing rows:
Python
df = pd.concat([df, pd.DataFrame([{'col1': 4, 'col2': 'd'}])], ignore_index=True) # add a row (DataFrame.append was removed in pandas 2.0)
df = df.drop(0) # remove row with index 0 (returns a new DataFrame)
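Putting the operations above together, a minimal sketch using the same df built from a dictionary in section 2:

Python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': ['a', 'b', 'c']})

print(df[df['col1'] > 1])     # filter: rows where col1 > 1
df['col3'] = df['col1'] * 10  # add a derived column
df = df.drop('col2', axis=1)  # drop a column, keeping the returned copy
print(df)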

4. Data Manipulation:

  • Sorting:
Python
df.sort_values(by='col1')
  • Grouping:
Python
df.groupby('col1').mean()
  • Applying functions:
Python
df['col1'].apply(lambda x: x * 2)
  • Handling missing values:
Python
df.isnull()       # Check for missing values
df.dropna()       # Remove rows with missing values
df.fillna(0)      # Fill missing values with 0
  • String operations (for string columns):
Python
df['col2'].str.upper()       # convert to upper case
df['col2'].str.contains('a') # Boolean Series: True where the string contains 'a'
  • Merging/Joining:
Python
pd.merge(df1, df2, on='common_col') # Merge DataFrames
pd.concat([df1,df2]) #combine dataframes vertically
df1.join(df2, how='left') # join DataFrames on their indexes (left join)
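A minimal merge sketch, reusing the df1/df2 and 'common_col' names from above with made-up data:

Python
import pandas as pd

df1 = pd.DataFrame({'common_col': [1, 2, 3], 'left_val': ['a', 'b', 'c']})
df2 = pd.DataFrame({'common_col': [2, 3, 4], 'right_val': ['x', 'y', 'z']})

merged = pd.merge(df1, df2, on='common_col')  # inner join by default
print(merged)  # only keys 2 and 3 survive, since they appear in both frames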

5. Time Series (if applicable):

  • Datetime conversion:
Python
df['date'] = pd.to_datetime(df['date'])
  • Resampling:
Python
df.resample('M', on='date').mean() #resample to monthly data.
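A small resampling sketch, assuming a 'date' column of daily values (the 'value' column is made up for illustration):

Python
import pandas as pd

df = pd.DataFrame({
    'date': pd.date_range('2024-01-01', periods=90, freq='D'),
    'value': range(90),
})

monthly = df.resample('M', on='date').mean()  # one row per month-end ('ME' in newer pandas)
print(monthly)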

6. Saving Data:

  • To CSV:
Python
df.to_csv('output.csv', index=False)
  • To Excel:
Python
df.to_excel('output.xlsx', index=False)

Important Notes:

  • axis=0 refers to rows, and axis=1 refers to columns.
  • inplace=True modifies the DataFrame directly, without creating a copy.
  • Always check the pandas documentation for the most up-to-date information.
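To make the axis note concrete, a tiny sketch with a hypothetical two-column DataFrame:

Python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [10, 20]})

print(df.sum(axis=0))  # down the rows: one total per column
print(df.sum(axis=1))  # across the columns: one total per row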
