Python Pandas for Data Analysis (Learn what Matters)

What is Pandas?
Pandas is a Python library designed to handle structured data easily. It offers powerful data structures, namely Series and DataFrames, to manage and manipulate data effectively. With Pandas, you can clean, analyze, and even visualize data in a few lines of code.
Why Use Pandas?
Efficient Data Handling: Process large datasets with ease.
Versatility: Handle various data formats, including CSV, JSON, Excel, and SQL.
Integration: Works seamlessly with other data science libraries like NumPy, SciPy, and scikit-learn.
Installing Pandas
To start using Pandas, install it using pip:
pip install pandas
Getting Started with Pandas
By convention, Pandas is imported under the alias pd:
import pandas as pd
Pandas Data Structures: Series and DataFrames
Pandas has two main data structures:
i. Series - A one-dimensional labeled array (values plus an index).
ii. DataFrame - A tabular, spreadsheet-like structure made of rows and columns.
Series
A Pandas Series is a one-dimensional array that can hold any data type.
# Creating a Series
data = pd.Series([10, 20, 30, 40])
print(data)
# output -
# 0 10
# 1 20
# 2 30
# 3 40
# dtype: int64
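The index of a Series does not have to be the default 0, 1, 2, ... numbering. The sketch below uses made-up labels ("a" to "d" are an assumption for illustration, not part of the example above) to show label-based and position-based lookup side by side.

```python
import pandas as pd

# Hypothetical labels chosen only for illustration
scores = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

print(scores["b"])     # lookup by label
print(scores.iloc[0])  # lookup by position
print(scores.index)    # the labels attached to the values
```

Using .iloc for positions keeps the two kinds of lookup clearly separated once the labels are no longer plain integers.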
DataFrame
A DataFrame is a two-dimensional structure (table) with labeled axes (rows and columns). It’s the most commonly used structure in Pandas.
import numpy as np
import pandas as pd
dic1 = {
"name":['sutapa', 'gojo', 'Suguru', 'levi', 'naruto'],
"marks": [89, 56, 34, 90, 32],
"city": ["Kalyani", "Shibuiya", "Keisen", "Wall Maria", "Konoha"]
}
df = pd.DataFrame(dic1)
print(df)
# output -
# name marks city
# 0 sutapa 89 Kalyani
# 1 gojo 56 Shibuiya
# 2 Suguru 34 Keisen
# 3 levi 90 Wall Maria
# 4 naruto 32 Konoha
# Gives a table-like structure
Note - A DataFrame is a collection of Series: each column is a Series.
print(df['name'])
#output -
#0 sutapa
#1 gojo
#2 Suguru
#3 levi
#4 naruto
#Name: name, dtype: object
print(type(df['name']))
# pandas.core.series.Series (each column of a DataFrame is a Series)
# The same column can also be selected by its position in df.columns:
print(df[df.columns[0]])
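The relationship also works in the other direction: separate Series can be combined into a DataFrame. This is a minimal sketch using pd.concat; the variable names and the two-row data are assumptions made for brevity.

```python
import pandas as pd

# Two small Series; the 'name' argument becomes the column label
names = pd.Series(["sutapa", "gojo"], name="name")
marks = pd.Series([89, 56], name="marks")

# axis=1 places each Series side by side as a column,
# aligning rows on the shared index (0 and 1 here)
combined = pd.concat([names, marks], axis=1)

print(combined)
print(type(combined["marks"]))  # selecting a column gives back a Series
```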
Exporting a DataFrame to CSV
The DataFrame.to_csv() method exports a DataFrame to a CSV (Comma-Separated Values) file. CSV is a plain-text format, and spreadsheet programs such as Excel can open it.
df.to_csv('students.csv')
# This creates 'students.csv' in the current working directory
# with the same data as 'df', including the index column

Without Index: To write the .csv file without the index column, pass index=False -
df.to_csv('students_NoIndex.csv', index=False)
# The file is written without the index column
Without Headers: To write the .csv file without the header row, pass header=False. Keep the .csv extension in the file name; without it the output is saved as an extensionless text file.
df.to_csv('students_NoHeaders.csv', header=False)

Specifying Columns: Use the columns parameter to save only specific columns of the DataFrame.
# Exporting only specific columns
df.to_csv('students_SpecificColumns.csv', columns=['name', 'city'])
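To see what each option changes without creating files, to_csv() can be called with no path, in which case it returns the CSV text as a string. The sketch below uses a shortened two-row version of the student data; the variable names are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({"name": ["sutapa", "gojo"], "marks": [89, 56]})

# With no file path, to_csv() returns the CSV content as a string
with_index = df.to_csv()                               # index column included
no_index = df.to_csv(index=False)                      # index column dropped
no_header = df.to_csv(index=False, header=False)       # header row dropped too
only_name = df.to_csv(index=False, columns=["name"])   # a single column

for label, text in [("default", with_index), ("index=False", no_index),
                    ("header=False", no_header), ("columns=['name']", only_name)]:
    print(f"--- {label} ---")
    print(text)
```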
Reading Files in Pandas
- Reading a CSV File
The most commonly used reader is pd.read_csv() for CSV files. Pandas also provides functions to read Excel, JSON, SQL databases, and other formats.
import pandas as pd
# Read a CSV file
df = pd.read_csv('data.csv')
- Reading an Excel File
df = pd.read_excel('data.xlsx', sheet_name='Sheet1')
- Reading a JSON File
df = pd.read_json('data.json')
- Reading a Plain Text File
# Reading a tab-separated text file
df = pd.read_csv('data.txt', sep='\t')
- Reading a SQL File
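SQL data is usually read through a database connection rather than from a file path. The sketch below is one possible approach, assuming an in-memory SQLite database created with Python's built-in sqlite3 module; the table name students and its columns are made up for illustration.

```python
import sqlite3
import pandas as pd

# Build a throwaway in-memory database (hypothetical table and data)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, marks INTEGER)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [("sutapa", 89), ("gojo", 56)],
)
conn.commit()

# Run a query and load the result set into a DataFrame
df_sql = pd.read_sql_query("SELECT name, marks FROM students", conn)
print(df_sql)

conn.close()
```

For other databases, Pandas generally expects a SQLAlchemy connection in place of the sqlite3 one used here.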
Some Extra Functions In Pandas
.head() & .tail() - Used to quickly preview the first and last few rows of a DataFrame (5 rows by default).
# Show the first 5 rows
df.head()
# Show the last 5 rows
df.tail()
# Show only the first two rows
df.head(2)
# Show only the last two rows
df.tail(2)

.describe() - Provides a quick summary of the main statistical metrics for the numerical columns in a DataFrame.
# Describe the numerical columns with 'count', 'mean', 'std', 'min', 'max', etc.
df.describe()
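As a concrete illustration, here is a sketch that runs describe() on the student data from the DataFrame example earlier. The result is itself a DataFrame, so individual statistics can be picked out with .loc; the variable name summary is an assumption.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["sutapa", "gojo", "Suguru", "levi", "naruto"],
    "marks": [89, 56, 34, 90, 32],
})

# Only the numeric column ('marks') appears in the summary by default
summary = df.describe()
print(summary)

# Individual statistics are addressed by row label and column name
print(summary.loc["mean", "marks"])  # (89 + 56 + 34 + 90 + 32) / 5 = 60.2
print(summary.loc["max", "marks"])   # 90.0
```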

Creating Random Series & DataFrame
Random Series using pd.Series & np.random
import pandas as pd
import numpy as np
Random_series = pd.Series(np.random.rand(6))
print(Random_series)
# output -
# 0 0.740176
# 1 0.527784
# 2 0.751823
# 3 0.774766
# 4 0.416964
# 5 0.816490
# dtype: float64
Random DataFrame using pd.DataFrame & np.random
import pandas as pd
import numpy as np
Random_dataframe = pd.DataFrame(np.random.rand(4,6))
print(Random_dataframe)
# output -
#           0         1         2         3         4         5
# 0  0.344185  0.204012  0.034631  0.176049  0.795942  0.045355
# 1  0.128220  0.390108  0.532313  0.969730  0.877553  0.508515
# 2  0.732563  0.080579  0.589364  0.077826  0.708684  0.215115
# 3  0.508971  0.029887  0.487922  0.316253  0.827630  0.901426
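Random values differ on every run, so the printed numbers above will not match yours. If repeatable output is needed, one option is NumPy's seeded Generator API. The seed value 42 and the column labels below are arbitrary choices for this sketch.

```python
import numpy as np
import pandas as pd

# A seeded generator produces the same sequence on every run
rng = np.random.default_rng(42)

random_series = pd.Series(rng.random(6))

# Optional column labels make the random table easier to read
random_df = pd.DataFrame(rng.random((4, 6)), columns=list("ABCDEF"))

print(random_series)
print(random_df)
```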
Converting a DataFrame to a NumPy Array
using to_numpy()
data = {
'A': [1, 2, 3],
'B': ['a', 'b', 'c']
}
df = pd.DataFrame(data)
# Convert to NumPy array
numpy_array = df.to_numpy()
print(numpy_array)
output -
[[1 'a']
 [2 'b']
 [3 'c']]
# The columns have mixed types, so the array has dtype object and the numbers stay integers
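Because a NumPy array holds a single dtype, the result depends on which columns are converted. The following sketch contrasts the mixed-type DataFrame with a numeric-only selection; the variable names mixed and numeric are assumptions.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": ["a", "b", "c"]})

# Integers and strings together fall back to the generic object dtype
mixed = df.to_numpy()
print(mixed.dtype)

# Selecting only the numeric column keeps a numeric dtype
numeric = df[["A"]].to_numpy()
print(numeric.dtype)
print(numeric.shape)  # still two-dimensional: 3 rows, 1 column
```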
Converting a Series to a NumPy Array
series = pd.Series([10, 20, 30])
# Convert Series to NumPy array
numpy_array = series.to_numpy()
print(numpy_array)
output -
[10 20 30]
Transposing a DataFrame
using .T
data = {
'A': [1, 2, 3],
'B': [4, 5, 6],
'C': [7, 8, 9]
}
df = pd.DataFrame(data)
# Transpose the DataFrame
df_transposed = df.T
print(df_transposed)
output -
0 1 2
A 1 2 3
B 4 5 6
C 7 8 9
Note - For a DataFrame, axis=0 refers to the rows (the index) and axis=1 refers to the columns.
df.sort_index(axis=1, ascending=False)
# output -
# C B A
#0 7 4 1
#1 8 5 2
#2 9 6 3
# Here the column labels are sorted in descending order because we passed axis=1
df.sort_index(axis=0, ascending=False)
# output
# A B C
#2 3 6 9
#1 2 5 8
#0 1 4 7
# Here the row labels are sorted in descending order because we passed axis=0
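Note that sort_index() orders by labels, not by the values in the cells, and it returns a new DataFrame instead of changing the original. The sketch below makes this visible with a deliberately shuffled index and column order, both made up for illustration.

```python
import pandas as pd

# Columns and row labels are intentionally out of order
df = pd.DataFrame({"B": [4, 5, 6], "A": [1, 2, 3]}, index=[2, 0, 1])

by_rows = df.sort_index(axis=0)  # row labels become 0, 1, 2
by_cols = df.sort_index(axis=1)  # column labels become A, B

print(by_rows)
print(by_cols)
print(df.index.tolist())  # the original DataFrame is unchanged
```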



