Pandas is a Python library.
Pandas is used to analyze data.
A Pandas Series is like a column in a table.
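import pandas as pd
a = [1, 7, 2]  # example values (assumed; the original does not show the list)
myvar = pd.Series(a)
print(myvar[0])  # prints 1
# Alternatively, give the values your own labels:
myvar = pd.Series(a, index=["x", "y", "z"])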
Labels
If nothing else is specified, the values are labeled with their index number: the first value has index 0, the second value has index 1, and so on.
This label can be used to access a specified value.
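As a small illustration of label-based access, the sketch below builds a Series with the custom labels shown earlier and reads one value by its label. The list values are assumed for the example.

```python
import pandas as pd

# Example values are assumptions; only the labels come from the text above.
a = [1, 7, 2]
labeled = pd.Series(a, index=["x", "y", "z"])

# Access a value by its label instead of its position.
second = labeled["y"]
print(second)  # prints 7

# Without custom labels, the default labels are 0, 1, 2, ...
default = pd.Series(a)
print(list(default.index))  # prints [0, 1, 2]
```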
You can also create a Series from a dictionary. The keys of the dictionary become the labels.
calories = {"day1": 420, "day2": 380, "day3": 390}
myvar = pd.Series(calories)
print(myvar)
Output:
day1    420
day2    380
day3    390
dtype: int64
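When only some of the dictionary entries are wanted, the index argument can name the keys to keep. A minimal sketch, reusing the calories dictionary from above:

```python
import pandas as pd

calories = {"day1": 420, "day2": 380, "day3": 390}

# Passing index keeps only the listed keys, in the listed order.
subset = pd.Series(calories, index=["day1", "day2"])
print(subset)
```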
DataFrames
Data sets in Pandas are usually two-dimensional tables, called DataFrames.
A Series is like a column; a DataFrame is the whole table.
data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}
myvar = pd.DataFrame(data)
print(myvar)
Pandas uses the loc attribute to return one or more specified rows.
print(myvar.loc[0])
Output:
calories    420
duration     50
Name: 0, dtype: int64
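To show the "one or more rows" part, the sketch below passes loc a single label and then a list of labels. The named index ("day1" and so on) is a hypothetical choice for illustration.

```python
import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45],
}
df = pd.DataFrame(data)

# A single label returns one row as a Series.
row = df.loc[0]

# A list of labels returns those rows as a DataFrame.
rows = df.loc[[0, 1]]
print(rows)

# With named index labels (hypothetical names), loc takes those labels.
named = pd.DataFrame(data, index=["day1", "day2", "day3"])
print(named.loc["day2"])
```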
Use to_string() to print the entire DataFrame. By default, printing a large DataFrame shows only its first and last rows.
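A rough sketch of the difference, assuming pandas' default display settings (which abbreviate frames longer than 60 rows). The column names are made up for the example.

```python
import pandas as pd

# Build a frame long enough that print(df) would abbreviate it.
df = pd.DataFrame({"n": range(100), "square": [i * i for i in range(100)]})

short = repr(df)        # abbreviated view, as shown by print(df)
full = df.to_string()   # every row rendered

print(len(full.splitlines()))  # prints 101: one header line plus 100 rows
```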
1. Importing pandas:
import pandas as pd
import numpy as np # Often used with pandas
2. Creating DataFrames:
- From a dictionary:
data = {'col1': [1, 2, 3], 'col2': ['a', 'b', 'c']}
df = pd.DataFrame(data)
- From a list of lists:
data = [[1, 'a'], [2, 'b'], [3, 'c']]
df = pd.DataFrame(data, columns=['col1', 'col2'])
- From a CSV file:
df = pd.read_csv('data.csv')
- From an Excel file:
df = pd.read_excel('data.xlsx')
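The dictionary and list-of-lists forms above describe the same table, which the sketch below checks. Because no data.csv ships with this post, an in-memory buffer stands in for the file; the CSV text is an assumption that mirrors the same three rows.

```python
import io
import pandas as pd

# The same table built two ways.
from_dict = pd.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", "c"]})
from_rows = pd.DataFrame([[1, "a"], [2, "b"], [3, "c"]], columns=["col1", "col2"])
print(from_dict.equals(from_rows))  # prints True

# read_csv accepts a path or a file-like object; the buffer replaces
# 'data.csv' here so the example is self-contained.
csv_text = "col1,col2\n1,a\n2,b\n3,c\n"
from_csv = pd.read_csv(io.StringIO(csv_text))
print(from_csv)
```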
3. Basic DataFrame Operations:
- Viewing data:
df.head() # First 5 rows
df.tail() # Last 5 rows
df.info() # DataFrame info
df.describe() # Summary statistics
df.shape # (rows, columns)
df.columns # Column names
df.index # Index values
- Selecting data:
df['col1'] # Select column 'col1'
df[['col1', 'col2']] # Select multiple columns
df.loc[0] # Select row by label (index)
df.iloc[0] # Select row by integer position
df[df['col1'] > 1] # Boolean indexing (filtering)
- Adding/removing columns:
df['new_col'] = [4, 5, 6] # Add a new column
df.drop('col1', axis=1) # Remove column 'col1' (returns a new DataFrame)
- Adding/removing rows:
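df = pd.concat([df, pd.DataFrame([{'col1': 4, 'col2': 'd'}])], ignore_index=True) # Add a row (DataFrame.append was removed in pandas 2.0)
df.drop(0) # Remove the row with index label 0 (returns a new DataFrame)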
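The operations in this section can be tried together. The sketch below is one way to see how loc and iloc differ after filtering, and that drop and concat return new objects rather than changing df.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", "c"]})

# Boolean indexing keeps the original index labels (1 and 2 here).
filtered = df[df["col1"] > 1]

# loc looks up by label, iloc by position; after filtering they differ.
by_position = filtered.iloc[0]   # first remaining row (label 1)
by_label = filtered.loc[2]       # row whose label is 2

# drop returns a new DataFrame; df itself is unchanged until reassigned.
without_col1 = df.drop("col1", axis=1)

# Add a row by concatenating a one-row DataFrame.
new_row = pd.DataFrame([{"col1": 4, "col2": "d"}])
extended = pd.concat([df, new_row], ignore_index=True)
print(extended)
```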
4. Data Manipulation:
- Sorting:
df.sort_values(by='col1')
- Grouping:
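df.groupby('col1').mean(numeric_only=True) # Mean of the numeric columns per group (non-numeric columns raise an error in pandas 2.0+ otherwise)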
- Applying functions:
df['col1'].apply(lambda x: x * 2)
- Handling missing values:
df.isnull() # Check for missing values
df.dropna() # Remove rows with missing values
df.fillna(0) # Fill missing values with 0
- String operations (for string columns):
df['col2'].str.upper() # Convert to upper case
df['col2'].str.contains('a') # Boolean Series: True where the string contains 'a'
- Merging/Joining:
pd.merge(df1, df2, on='common_col') # Merge DataFrames
pd.concat([df1, df2]) # Combine DataFrames vertically
df1.join(df2, how='left') # Join on the index; pass on='key_col' to match a column of df1 against df2's index
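To see several of these operations work on one table, here is a minimal sketch. The data, column names (team, points, city) and values are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column names and values are made up for illustration.
scores = pd.DataFrame({
    "team": ["red", "blue", "red", "blue"],
    "points": [10.0, np.nan, 30.0, 20.0],
})

# Replace the missing value with 0, then sort by points, highest first.
filled = scores.fillna({"points": 0})
ranked = filled.sort_values(by="points", ascending=False)

# Average points per team; only the numeric column is aggregated.
per_team = filled.groupby("team")["points"].mean()

# Merge in a second table on the shared 'team' column.
cities = pd.DataFrame({"team": ["red", "blue"], "city": ["Oslo", "Lima"]})
merged = pd.merge(filled, cities, on="team")
print(per_team)
print(merged)
```

Selecting the points column before calling mean() avoids the question of how non-numeric columns are handled, which has changed between pandas versions.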
5. Time Series (if applicable):
- Datetime conversion:
df['date'] = pd.to_datetime(df['date'])
- Resampling:
df.resample('M', on='date').mean() # Resample to monthly means (pandas 2.2+ deprecates 'M' in favor of 'ME')
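A short sketch of conversion followed by resampling. The dates and values are made up, and the "MS" (month start) alias is swapped in for 'M' here because it is accepted without warnings across recent pandas versions; it labels each month by its first day rather than its last.

```python
import pandas as pd

# Hypothetical readings; the dates and values are made up.
df = pd.DataFrame({
    "date": ["2024-01-10", "2024-01-20", "2024-02-05", "2024-02-25"],
    "value": [10, 20, 30, 50],
})

# Strings become datetime values, which resample requires.
df["date"] = pd.to_datetime(df["date"])

# Group by calendar month and average the values in each month.
monthly = df.resample("MS", on="date")["value"].mean()
print(monthly)
```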
6. Saving Data:
- To CSV:
df.to_csv('output.csv', index=False)
- To Excel:
df.to_excel('output.xlsx', index=False)
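The effect of index=False is easiest to see on a round trip. The sketch below writes to a temporary directory (an assumption made so the example leaves no files behind) and reads the file back both ways.

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", "c"]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "output.csv")

    # index=False omits the row labels, so no extra column appears on reload.
    df.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

    # Keeping the index writes it as an unnamed first column.
    df.to_csv(path)
    with_index = pd.read_csv(path)

print(list(reloaded.columns))    # prints ['col1', 'col2']
print(list(with_index.columns))  # prints ['Unnamed: 0', 'col1', 'col2']
```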
Important Notes:
- axis=0 refers to rows, and axis=1 refers to columns.
- inplace=True modifies the DataFrame directly, without creating a copy.
- Always check the pandas documentation for the most up-to-date information.
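The axis and inplace notes can be checked with a tiny table. This is only a sketch; the column names a and b are made up.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [10, 20]})

# axis=0 works down the rows, giving one result per column.
col_totals = df.sum(axis=0)   # a -> 3, b -> 30

# axis=1 works across the columns, giving one result per row.
row_totals = df.sum(axis=1)   # row 0 -> 11, row 1 -> 22

# Without inplace=True, drop returns a new DataFrame and df is unchanged.
smaller = df.drop("b", axis=1)

# With inplace=True, df itself is modified and the call returns None.
result = df.drop("b", axis=1, inplace=True)
print(list(df.columns), result)  # prints ['a'] None
```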