Numpy for Data Analysis (Learn what Matters)
A Complete Guide to Using Numpy in Python for Data Analysis
Numpy
NumPy is a powerful Python library essential for numerical computing, especially useful in scientific and data analysis tasks. It provides a flexible, efficient array object called ndarray, which allows for fast operations on large arrays and matrices of numeric data. NumPy arrays are more efficient than Python lists for numerical operations.
Installing
To install NumPy on your computer, create a folder, open Windows PowerShell there, and run this command:
pip install numpy
(Install Python and Jupyter Notebook first if they are not already installed on your computer.
To install Python, go to https://www.python.org/downloads/ and download it.
To install Jupyter Notebook, run this command in your terminal:
pip install jupyter
To launch Jupyter Notebook, run:
jupyter notebook
Then click the ‘New’ option and select ‘Python 3’ from the drop-down.)
Why we use Numpy
Numpy saves space by storing numbers efficiently.
Numpy provides ready-made functions like argmax and argmin.
Numpy allows flexible data types.
Numpy is fast, making data processing quicker.
Numpy is easy to learn and provides good memory management.
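To make the space-saving point a bit more concrete, here is a rough sketch that compares the memory used by a Python list and a NumPy array holding the same 1000 integers. The names py_list and np_arr are just for illustration, and the list estimate is approximate (it depends on the Python version).

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# Bytes used by the NumPy data buffer: 8 bytes per int64 element.
numpy_bytes = np_arr.nbytes

# Rough estimate for the list: the list object itself plus every int object it refers to.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

print("NumPy array bytes:", numpy_bytes)
print("Python list bytes (approx.):", list_bytes)

# One vectorized expression replaces an explicit Python loop.
doubled = np_arr * 2
```

The exact list size will differ between machines, but the array should come out noticeably smaller.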
(Why we use Jupyter Notebook
Jupyter Notebook is an open-source tool for data analysis, providing visualizations and live coding.
Jupyter Notebook supports multiple programming languages and allows easy sharing of notebooks.
To learn more about Jupyter Notebook, visit my blog post about it → https://sutto.hashnode.dev/jupyter-notebook-basics-learn-what-you-need )
Using Numpy
Importing Numpy -
import numpy as np
creating an array -
myarr = np.array([2, 45, 67,90])
myarr
accessing a particular element by its index number
myarr[1]
Checking Datatypes
myarr.dtype
# output - dtype('int64')
changing a value in array
myarr[1] = 78
Checking shape/structure of the array
myarr.shape
# output - (4,)
# (myarr is a 1D array with 4 elements)
Checking the size (total number of elements) of the array
myarr.size
# output - 4
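For comparison, here is a small sketch of how shape and size look for a 2D array. The array name grid is made up for this example.

```python
import numpy as np

# A 1D array: shape is a one-element tuple.
vec = np.array([2, 45, 67, 90])
print(vec.shape)   # (4,)
print(vec.size)    # 4

# A hypothetical 3x4 array built from the numbers 0..11.
grid = np.arange(12).reshape(3, 4)
print(grid.shape)  # (3, 4) -> 3 rows, 4 columns
print(grid.size)   # 12     -> 3 * 4 elements in total
```

So shape tells you the length of each dimension, and size is the product of those lengths.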
Some Numpy Datatypes -
Integer Types
np.int8: 8-bit integer (-128 to 127)
np.int16: 16-bit integer
np.int32: 32-bit integer
np.int64: 64-bit integer
Floating Point Types
np.float16: Half precision float
np.float32: Single precision float
np.float64: Double precision float
Boolean Type
np.bool_: Boolean (True or False)
String Types
np.bytes_: Fixed-size ASCII (byte) string (this type was called np.string_ before NumPy 2.0, where that alias was removed)
Object Type
np.object_: For generic Python objects. Useful for arrays of mixed data types or arbitrary Python objects.
Date and Time Types
np.datetime64: Date and time (supports various granularities, e.g., year, month, day, hour)
np.timedelta64: Differences in dates and times
Additional Types
np.void: Used for raw data types and structured arrays
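Here is a short sketch of a few of these datatypes in use. The variable names are only for illustration; the dtype is set with the dtype argument or changed with .astype().

```python
import numpy as np

# An 8-bit integer array uses 1 byte per element.
small = np.array([1, 2, 3], dtype=np.int8)
print(small.dtype, small.itemsize)

# Converting to double precision uses 8 bytes per element.
as_float = small.astype(np.float64)
print(as_float.dtype, as_float.itemsize)

# Boolean array.
flags = np.array([True, False], dtype=np.bool_)

# Subtracting two datetime64 values gives a timedelta64.
start = np.datetime64("2024-01-01")
end = np.datetime64("2024-01-10")
gap = end - start
print(gap)

# An object array can hold mixed Python objects.
mixed = np.array([1, "two", 3.0], dtype=object)
print(mixed.dtype)
```

Smaller integer types save memory but hold a smaller range of values, so pick the dtype that fits your data.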
Numpy Axis
In NumPy, axes refer to the directions along which operations can be performed on an array.
Axis 0 refers to the first dimension (often the rows).
Axis 1 refers to the second dimension (often the columns).
Higher dimensions follow similarly with Axis 2, Axis 3, and so on.
For a 1D array (a vector), there is only Axis 0
For a 2D array (matrix), the axes will be: Axis 0 (rows) & Axis 1 (columns)
For a 3D array, the axes extend to a third dimension, where: Axis 0 (depth (or the "first level" of the 3D array)), Axis 1 (rows) & Axis 2 (columns)
a = [[1,2,3], [4,5,6],[7,8,9]]
arr = np.array(a)
print(arr)
Summing Across an Axis
- Sum along Axis 0 (down the rows, giving one total per column)
sum_axis_0 = arr.sum(axis=0)
print(sum_axis_0)
# output -
# [12 15 18]
- Sum along Axis 1 (across the columns, giving one total per row)
sum_axis_1 = arr.sum(axis=1)
print(sum_axis_1)
# output -
# [ 6 15 24]
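The same idea extends to 3D arrays. As a rough sketch (the array name cube is just an assumption for this example), summing along an axis removes that axis from the shape:

```python
import numpy as np

# A hypothetical 3D array: 2 "levels", each with 3 rows and 4 columns.
cube = np.arange(24).reshape(2, 3, 4)

# Summing along an axis collapses that axis.
sum_0 = cube.sum(axis=0)  # adds the 2 levels together   -> shape (3, 4)
sum_1 = cube.sum(axis=1)  # adds the 3 rows in each level -> shape (2, 4)
sum_2 = cube.sum(axis=2)  # adds the 4 columns in each row -> shape (2, 3)

print(sum_0.shape, sum_1.shape, sum_2.shape)
```

A handy way to remember it: the axis you pass is the one that disappears from the result.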
Methods of creating Arrays in Numpy
There are 5 general mechanisms for creating arrays:
Converting from Python Data Structures: Creating arrays by converting other Python structures like lists or tuples.
ex-
list_data = [1, 2, 3, 4, 5]
array_from_list = np.array(list_data)
print("Array from list:", array_from_list)
Built-in NumPy Array Creation Functions: Using NumPy's intrinsic functions, such as arange, ones, and zeros, to create arrays directly.
ex of -
np.arange() - Array with a Range of Values
import numpy as np
arr = np.arange(0, 10, 2) # Starts at 0, ends before 10, step size of 2
print("np.arange:", arr)
# Output: [0 2 4 6 8]
arr2 = np.arange(34)
# output -
# array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
# 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33])
np.ones() - Array Filled with Ones
arr = np.ones((3, 3)) # 3x3 array filled with 1
print("np.ones:\n", arr)
# Output:
# [[1. 1. 1.]
# [1. 1. 1.]
# [1. 1. 1.]]
np.zeros() - Array Filled with Zeros
arr = np.zeros((2, 4)) # 2x4 array filled with 0
print("np.zeros:\n", arr)
# Output:
# [[0. 0. 0. 0.]
# [0. 0. 0. 0.]]
np.full() - Array with a Specific Constant Value
arr = np.full((2, 2), 7) # 2x2 array filled with the value 7
print("np.full:\n", arr)
# Output:
# [[7 7]
# [7 7]]
np.eye() - Identity Matrix
arr = np.eye(3) # 3x3 identity matrix
print("np.eye:\n", arr)
# Output:
# [[1. 0. 0.]
# [0. 1. 0.]
# [0. 0. 1.]]
np.linspace() - Array of Linearly Spaced Values
arr = np.linspace(0, 1, 5) # 5 equally spaced values from 0 to 1, inclusive
print("np.linspace:", arr)
# Output: [0. 0.25 0.5 0.75 1. ]
np.diag() - Array with Specified Diagonal Values
arr = np.diag([1, 2, 3]) # Diagonal array with 1, 2, 3
print("np.diag:\n", arr)
# Output:
# [[1 0 0]
# [0 2 0]
# [0 0 3]]
np.empty() - Creates an array without initializing its values, so it contains whatever happened to be in memory (arbitrary values, not truly random ones)
emp = np.empty((2,3))
print(emp)
# output -
# array([[1.9180575e-316, 0.0000000e+000, 0.0000000e+000],
# [0.0000000e+000, 0.0000000e+000, 0.0000000e+000]])
np.empty_like() - Creates a new array with the same shape and data type as an existing array, but without initializing any values.
em_like = np.empty_like(arr)
print(em_like)
np.identity() - gives identity matrix
ide = np.identity(3)
print(ide)
#output -
# [[1. 0. 0.]
# [0. 1. 0.]
# [0. 0. 1.]]
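np.eye() and np.identity() give the same result for a square matrix, but np.eye() is a little more flexible. A small sketch of the difference (variable names are just for illustration):

```python
import numpy as np

# For a square matrix, both functions return the same identity matrix.
same = np.array_equal(np.eye(3), np.identity(3))
print(same)

# np.eye() can also build a non-square matrix (2 rows, 3 columns)...
rect = np.eye(2, 3)
print(rect)

# ...or shift the diagonal with k (k=1 is one step above the main diagonal).
shifted = np.eye(3, k=1)
print(shifted)
```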
Loading Arrays from Files: Reading arrays from disk, whether from standard formats (like .npy or .csv) or custom file formats.
Creating Arrays from Raw Data: Initializing arrays from raw bytes using strings or buffers.
Using Specialized Library Functions: Leveraging functions from libraries like random to generate arrays with specific properties.
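The last three mechanisms have no example above, so here is a minimal sketch of each. The file names (data.npy, data.csv) and the seed are made up for this example, and the files are written to a temporary folder so nothing is left behind.

```python
import os
import tempfile
import numpy as np

data = np.arange(6).reshape(2, 3)

# 3) Loading arrays from files: save first, then read back.
with tempfile.TemporaryDirectory() as tmp:
    npy_path = os.path.join(tmp, "data.npy")   # hypothetical file name
    np.save(npy_path, data)
    from_npy = np.load(npy_path)

    csv_path = os.path.join(tmp, "data.csv")   # hypothetical file name
    np.savetxt(csv_path, data, delimiter=",", fmt="%d")
    from_csv = np.loadtxt(csv_path, delimiter=",")  # loadtxt returns floats by default

# 4) Creating an array from raw bytes.
from_bytes = np.frombuffer(b"\x01\x02\x03", dtype=np.uint8)

# 5) Using the random module: integers from 0 up to (not including) 10.
rng = np.random.default_rng(seed=42)
samples = rng.integers(0, 10, size=(2, 3))

print(from_npy)
print(from_csv)
print(from_bytes)
print(samples.shape)
```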
IMPORTANT METHODS
.reshape() - changes the shape of an existing array without altering its data.
arr = np.arange(18)
print(arr)
# output -
# array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17])
# reshaping the array
arr.reshape(3,6)
# output -
# array([[ 0, 1, 2, 3, 4, 5],
# [ 6, 7, 8, 9, 10, 11],
# [12, 13, 14, 15, 16, 17]])
(reshape() returns a new array object and leaves ‘arr’ itself unchanged, so printing ‘arr’ still gives the previous result. To update it, assign the result back:
arr = arr.reshape(3,6)
Now ‘arr’ holds the reshaped result.)
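Two more details about reshape() are worth a quick sketch: passing -1 lets NumPy work out one dimension for you, and for a simple array like this one the reshaped result shares its data with the original. The variable names here are only for illustration.

```python
import numpy as np

arr = np.arange(18)

# -1 means "figure this dimension out": 18 elements / 3 rows = 6 columns.
auto = arr.reshape(3, -1)
print(auto.shape)

# Here the reshaped array is a view on the same data,
# so changing it also changes the original.
view = arr.reshape(3, 6)
view[0, 0] = 99
print(arr[0])

# The new shape must hold exactly the same number of elements.
error_message = ""
try:
    arr.reshape(4, 5)  # 4 * 5 = 20, but arr has 18 elements
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

If you need an independent copy, call .copy() on the reshaped array.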
.ravel() - flattens a multi-dimensional array into a 1D array (the reverse of reshaping a 1D array into several dimensions).
arr.ravel()
# output - array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17])
.argmax() - gives the index of the maximum value in an array.
newArr = np.array([1,5,75,86,0])
newArr.argmax()
# output - np.int64(3)
.argmin() - gives the index of the minimum value in an array.
newArr.argmin()
# output - np.int64(4)
.argsort() - returns the indices that would sort an array. In other words, it gives you the indices that you can use to rearrange the elements of the array in sorted order.
newArr.argsort()
# output - array([4, 0, 1, 2, 3])
Using argmax() on a 2D array (without specifying axis):
(For a 2D array, NumPy first flattens the array into 1D, then returns the index of the maximum value in that flattened array.)
arr = np.array([[1, 3, 7], [2, 5, 6]])
# Find the index of the maximum value in the flattened array
np.argmax(arr)
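The flattened index is not always what you want. As a small sketch, np.unravel_index() can convert it back into a (row, column) position:

```python
import numpy as np

arr = np.array([[1, 3, 7], [2, 5, 6]])

# Index of the maximum in the flattened array [1, 3, 7, 2, 5, 6].
flat_index = np.argmax(arr)
print(flat_index)  # 2

# Convert the flat index back to a (row, column) pair.
row, col = np.unravel_index(flat_index, arr.shape)
print(row, col)       # 0 2
print(arr[row, col])  # 7
```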
Using argmax() on a 2D array with axis:
arr = np.array([[1, 3, 7], [2, 5, 6]])
# Find the index of the maximum value in each column (along axis 0)
np.argmax(arr, axis=0)
# Index of the maximum values along axis 0: [1 1 0]
# Find the index of the maximum value in each row (along axis 1)
np.argmax(arr, axis=1)
# Index of the maximum values along axis 1: [2 2]
Using argsort() on a 2D array
arr_2d = np.array([[3, 1, 2], [5, 4, 6]])
# Sort along axis 0 (columns)
sorted_indices_axis_0 = arr_2d.argsort(axis=0)
print(sorted_indices_axis_0)
# output -
# [[0 0 0]
# [1 1 1]]
# Sort along axis 1 (rows)
sorted_indices_axis_1 = arr_2d.argsort(axis=1)
print(sorted_indices_axis_1)
#output
# [[1 2 0]
# [1 0 2]]
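Since argsort() only returns indices, here is a rough sketch of how those indices can be used to actually get the sorted values. np.take_along_axis() is used for the 2D case; the variable names are just for illustration.

```python
import numpy as np

new_arr = np.array([1, 5, 75, 86, 0])

# Use the indices from argsort() to reorder the array.
order = new_arr.argsort()
sorted_arr = new_arr[order]        # ascending:  [ 0  1  5 75 86]
descending = new_arr[order[::-1]]  # descending: [86 75  5  1  0]
print(sorted_arr, descending)

# For a 2D array, take_along_axis applies the per-row indices row by row.
arr_2d = np.array([[3, 1, 2], [5, 4, 6]])
row_order = arr_2d.argsort(axis=1)
rows_sorted = np.take_along_axis(arr_2d, row_order, axis=1)
print(rows_sorted)
# [[1 2 3]
#  [4 5 6]]
```

This pattern is useful when you want to sort one array by the order of another, for example names sorted by score.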
Attributes in Numpy
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) # the 3x3 array used in the examples below
.T - transpose of an array
arr.T
# output -
# array([[1, 4, 7],
# [2, 5, 8],
# [3, 6, 9]])
.flat - gives a 1D iterator over a multi-dimensional array
for item in arr.flat:
    print(item)
# output - 1 2 3 4 5 6 7 8 9 (one value per line)
.ndim - shows the number of dimensions
arr.ndim
# output -
# 2
There are more, like .size, .shape, .dtype, .nbytes (how many bytes the array consumes), etc.
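A quick sketch that ties a few of these attributes together. The dtype is set explicitly to int64 here (an assumption for this example) so the byte counts are the same on every platform.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=np.int64)

print(arr.ndim)      # 2 dimensions
print(arr.size)      # 9 elements
print(arr.itemsize)  # 8 bytes per int64 element
print(arr.nbytes)    # 72 = size * itemsize

# .flat walks through the elements row by row.
flat_values = list(arr.flat)

# .T swaps rows and columns.
transposed = arr.T
print(transposed[0])  # first column of arr: [1 4 7]
```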
Matrix operation on Numpy
Matrix Addition
For matrix addition, the matrices must have the same dimensions (i.e., the same number of rows and columns).
The addition is performed element-wise, meaning each element in one matrix is added to the element in the same position in the other matrix.
# Define two matrices (2x3)
A = np.array([[1, 2, 3],
[4, 5, 6]])
B = np.array([[7, 8, 9],
[10, 11, 12]])
# Perform matrix addition
print(A + B)
#output
# [[ 8 10 12]
# [14 16 18]]
(If we try this with plain Python lists instead of NumPy arrays, + simply joins the two lists together:
# without numpy (np)
print([56,8] + [2,9])
#output
# [56, 8, 2, 9] )
Matrix Multiplication
Multiplication in NumPy can be done in two ways:
Element-wise multiplication (using * or np.multiply()), which multiplies corresponding elements.
Matrix multiplication (using @ or np.dot()), which follows the rules of linear algebra. [dot multiplication]
In element-wise multiplication, each element in one matrix is multiplied by the corresponding element in the other matrix, so both matrices must have the same shape (here, the 2x3 matrices A and B from the addition example).
print(A * B)
# output
# [[ 7 16 27]
# [40 55 72]]
Matrix multiplication (dot multiplication) follows the rules of linear algebra. The number of columns in the first matrix must match the number of rows in the second matrix, and the result has dimensions (rows of A, columns of B). The 2x3 matrix A cannot be multiplied by the 2x3 matrix B, so B is redefined here as a 3x2 matrix.
# Redefine B as a 3x2 matrix
B = np.array([[7, 8],
[9, 10],
[11, 12]])
# Perform matrix multiplication
C = A @ B
print(C)
#output
# [[ 58 64]
# [139 154]]
(
First row, first column: (1*7) + (2*9) + (3*11) = 58
First row, second column: (1*8) + (2*10) + (3*12) = 64
Second row, first column: (4*7) + (5*9) + (6*11) = 139
Second row, second column: (4*8) + (5*10) + (6*12) = 154
)
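To double-check the arithmetic above, here is a small self-contained sketch. It assumes A is the 2x3 matrix from the addition example and B is a 3x2 matrix, which is what the worked sums imply. It also shows what happens when the shapes do not line up.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])      # shape (2, 3)
B = np.array([[7, 8],
              [9, 10],
              [11, 12]])       # shape (3, 2)

# @ and np.dot give the same result for 2D arrays.
C = A @ B
same_as_dot = np.array_equal(C, np.dot(A, B))
print(C.shape)  # (2, 2) -> (rows of A, columns of B)

# First row, first column computed by hand.
manual = sum(int(A[0, k]) * int(B[k, 0]) for k in range(3))
print(manual)  # 58

# Mismatched shapes raise an error: (2, 3) @ (2, 3) is not allowed.
shape_error = False
try:
    A @ A
except ValueError:
    shape_error = True
print(shape_error)
```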
Square root of a matrix
In NumPy, we can calculate the square root of each element in a matrix using np.sqrt().
# Define a matrix
A = np.array([[4, 9, 16],
[25, 36, 49]])
# Compute the element-wise square root
print( np.sqrt(A))
#output
# [[2. 3. 4.]
# [5. 6. 7.]]
Sum of every element in Matrix
.sum() is a method that calculates the sum of array elements. It can be applied to the entire array or along a specific axis (giving one total per column or per row).
- Sum of All Elements
# Define a 2D array
arr = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
total_sum = arr.sum()
print("Sum of all elements:", total_sum)
#output
# Sum of all elements: 45
# 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 = 45.
Sum down the ROWS (axis=0): the rows are added together, giving one total per column (like one extra row of totals at the bottom)
print(arr.sum(axis=0))
#output
# [12 15 18]
First column: 1 + 4 + 7 = 12
Second column: 2 + 5 + 8 = 15
Third column: 3 + 6 + 9 = 18
Sum across the COLUMNS (axis=1): the columns are added together, giving one total per row (like one extra column of totals on the right)
print(arr.sum(axis=1))
# output
# [ 6 15 24]
Explanation:
First row: 1 + 2 + 3 = 6
Second row: 4 + 5 + 6 = 15
Third row: 7 + 8 + 9 = 24
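If you actually want the totals attached to the matrix as an extra row or column, one possible way (a sketch, not the only approach) is to keep the summed axis with keepdims=True and then stack the result onto the array:

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

# keepdims=True keeps the summed axis with length 1.
col_totals = arr.sum(axis=0, keepdims=True)  # shape (1, 3)
row_totals = arr.sum(axis=1, keepdims=True)  # shape (3, 1)

# Attach the totals as one extra row or one extra column.
with_total_row = np.vstack([arr, col_totals])  # shape (4, 3)
with_total_col = np.hstack([arr, row_totals])  # shape (3, 4)

print(with_total_row)
print(with_total_col)
```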
Minimum and Maximum element in matrix
.min() Method - finds the minimum value of a matrix
arr = np.array([[1, 2, 3], [4, 0, 1], [5, 0, 3]])
print(arr.min())
# output
# 0
.max() Method - finds the maximum value of a matrix
print(arr.max())
# output
# 5
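Just like .sum(), both methods accept an axis argument. A short sketch with the same matrix:

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 0, 1],
                [5, 0, 3]])

# Minimum of each column (axis=0 collapses the rows).
col_min = arr.min(axis=0)
print(col_min)  # [1 0 1]

# Maximum of each row (axis=1 collapses the columns).
row_max = arr.max(axis=1)
print(row_max)  # [3 4 5]
```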
Find specific element in Matrix
np.where()
The np.where() function is a powerful tool for selecting elements based on conditions. It allows you to find the indices where a condition is True and optionally replace values based on conditions.
arr = np.array([[1, 2, 3], [4, 0, 1], [5, 0, 3]])
# Find indices where elements are greater than 3
result = np.where(arr > 3)
print(result)
# output
# (array([1, 2]), array([0, 0]))
Explanation:
This output shows that elements 4 and 5 are greater than 3 at positions (1, 0) and (2, 0).
array([1, 2]): Row indices where elements are greater than 3.
- Row 1 contains 4.
- Row 2 contains 5.
array([0, 0]): Column indices where elements are greater than 3.
- Column 0 for both elements.
( Now,
type(np.where(arr>3))
# output
# tuple
A TUPLE is an ordered, immutable collection of elements in Python. Tuples are similar to lists but with two key differences:
Immutability: Once created, the elements of a tuple cannot be changed, added, or removed.
Syntax: Tuples are defined using parentheses () rather than square brackets []. )
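The "optionally replace values" part of np.where() was not shown above, so here is a small sketch. With three arguments, np.where(condition, x, y) picks x where the condition is True and y elsewhere. The replacement value -1 is just an arbitrary choice for this example.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 0, 1],
                [5, 0, 3]])

# Unpack the tuple into row indices and column indices.
rows, cols = np.where(arr > 3)
positions = list(zip(rows.tolist(), cols.tolist()))
print(positions)  # [(1, 0), (2, 0)]

# Use the indices to pull out the matching values.
values = arr[rows, cols]
print(values)  # [4 5]

# Three-argument form: replace values greater than 3 with -1, keep the rest.
replaced = np.where(arr > 3, -1, arr)
print(replaced)
# [[ 1  2  3]
#  [-1  0  1]
#  [-1  0  3]]
```

Note that the three-argument form returns a new array and leaves arr unchanged.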
Counting specific element in Numpy
np.count_nonzero() counts the non-zero elements in an array.
arr = np.array([0,0,1, 2, 3, 1, 1, 4, 1])
np.count_nonzero(arr)
# output
# 7
You can also use np.count_nonzero() to count the occurrences of a specific value: the comparison arr == 1 produces a boolean array, and every True counts as non-zero.
count_ones = np.count_nonzero(arr == 1)
# count all the elements with '1' value
print(count_ones)
# Output: 4
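If you want the count of every distinct value at once instead of one value at a time, np.unique() with return_counts=True is one option. A minimal sketch (the dictionary name count_by_value is just for illustration):

```python
import numpy as np

arr = np.array([0, 0, 1, 2, 3, 1, 1, 4, 1])

# Distinct values and how many times each one appears.
values, counts = np.unique(arr, return_counts=True)
count_by_value = dict(zip(values.tolist(), counts.tolist()))
print(count_by_value)  # {0: 2, 1: 4, 2: 1, 3: 1, 4: 1}

# Any condition works with count_nonzero, not only equality.
count_gt_one = np.count_nonzero(arr > 1)
print(count_gt_one)  # 3 (the values 2, 3 and 4)
```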
Find indices of non-zero elements
The .nonzero() method (also available as the function np.nonzero()) is used to find the indices of non-zero elements in an array. It returns a tuple of arrays, one for each dimension of the input array, containing the indices of the elements that are non-zero.
# With 1D array
arr = np.array([0, 2, 0, 3, 0, 4])
print(np.nonzero(arr))
# Output: (array([1, 3, 5]),)
# the non-zero values are located at indices 1, 3, and 5.
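For a 2D array the tuple has two entries, one for rows and one for columns. A short sketch with a made-up 2x3 array:

```python
import numpy as np

# With 2D array (hypothetical example values)
arr_2d = np.array([[0, 2, 0],
                   [3, 0, 4]])

rows, cols = np.nonzero(arr_2d)
print(rows)  # [0 1 1]
print(cols)  # [1 0 2]

# The same indices can be used to fetch the non-zero values themselves.
nonzero_values = arr_2d[rows, cols]
print(nonzero_values)  # [2 3 4]
```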
This is not everything. There are many methods and attributes that are not covered in this blog, and you can find all of them in the official NumPy documentation.