Numpy for Data Analysis (Learn what Matters)

A Complete Guide to Using Numpy in Python for Data Analysis

Numpy

NumPy is a powerful Python library essential for numerical computing, especially useful in scientific and data analysis tasks. It provides a flexible, efficient array object called ndarray, which allows for fast operations on large arrays and matrices of numeric data. NumPy arrays are more efficient than Python lists for numerical operations.

Installing

To install NumPy on your computer, open a terminal (for example, Windows PowerShell) in your project folder and run this command:

pip install numpy

(Python must already be installed for pip to work. If Python or Jupyter Notebook is not installed on your computer, install them first.)

To install Python, go to https://www.python.org/downloads/ and download it.

To install Jupyter Notebook, run this command in your terminal:

pip install jupyter

Now, to launch Jupyter Notebook, run this:

jupyter notebook

Now click the ‘New’ option and select ‘Python 3’ from the drop-down menu.

Why we use Numpy

  • Numpy saves space by storing numbers in compact, fixed-size types inside one contiguous block of memory.

  • Numpy provides ready-made functions like argmax and argmin.

  • Numpy allows flexible data types.

  • Numpy is fast, making data processing quicker.

  • Numpy is easy to learn and provides good memory management.
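To make the space-saving point concrete, here is a minimal sketch that compares the memory used by a Python list and by a NumPy array holding the same 1,000 integers. The names py_list, np_arr, list_bytes and array_bytes are made up for this illustration, and the exact list figure depends on your Python build.

```python
import sys
import numpy as np

py_list = list(range(1000))
np_arr = np.arange(1000, dtype=np.int64)  # fixed 8 bytes per number

# A list stores pointers to separate int objects, so count both parts.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)
array_bytes = np_arr.nbytes  # size of the raw data buffer only

print("list bytes :", list_bytes)
print("array bytes:", array_bytes)  # 8000
```

The array figure counts only the data buffer, so treat this as a rough comparison rather than an exact measurement.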

Why we use Jupyter Notebook

  • Jupyter Notebook is an open-source tool for data analysis, providing visualizations and live coding.

  • Jupyter Notebook supports multiple programming languages and allows easy sharing of notebooks.

  • To learn more about Jupyter Notebook, visit my blog post about it → https://sutto.hashnode.dev/jupyter-notebook-basics-learn-what-you-need

Using Numpy

Importing Numpy -

import numpy as np

creating an array -

myarr = np.array([2, 45, 67,90])
myarr

Accessing a particular element by its index number (indexing starts at 0)

myarr[1]

Checking Datatypes

myarr.dtype
# output - dtype('int64')

Changing a value in the array

myarr[1] = 78

Checking the shape/structure of the array

myarr.shape
# output - (4,)

Checking the size of the array (total number of elements)

myarr.size
# output - 4
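For a 2D array, .shape has one entry per dimension and .size is the product of those entries. A small sketch, where grid is a made-up example array:

```python
import numpy as np

grid = np.array([[1, 2, 3, 4],
                 [5, 6, 7, 8],
                 [9, 10, 11, 12]])  # 3 rows, 4 columns

print(grid.shape)  # (3, 4)
print(grid.size)   # 12
print(len(grid))   # 3, the length of the first dimension only
```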

Some Numpy Datatypes -

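Integer Types

  • np.int8, np.int16, np.int32, np.int64: Signed integers of 8, 16, 32 and 64 bits

  • np.uint8, np.uint16, np.uint32, np.uint64: Unsigned integers of 8, 16, 32 and 64 bits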

Floating Point Types

  • np.float16: Half precision float

  • np.float32: Single precision float

  • np.float64: Double precision float

Boolean Type

  • np.bool_: Boolean (True or False)

String Types

  • np.bytes_: Fixed-size ASCII (byte) string. It was also available as np.string_ before NumPy 2.0, which removed that alias.

Object Type

  • np.object_: For generic Python objects. Useful for arrays of mixed data types or arbitrary Python objects.

Date and Time Types

  • np.datetime64: Date and time (supports various granularities, e.g., year, month, day, hour)

  • np.timedelta64: Differences in dates and times

Additional Types

  • np.void: Used for raw data types and structured arrays
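As a rough illustration of how these types are used in practice, the sketch below requests a dtype at creation time, converts with astype(), and subtracts two dates. The variable names (small, as_float, flags, days, gap) are only for this example.

```python
import numpy as np

# Ask for a specific dtype when creating the array
small = np.array([1, 2, 3], dtype=np.int8)
print(small.dtype, small.itemsize)  # int8 1

# Convert an existing array to another dtype (returns a new array)
as_float = small.astype(np.float32)
print(as_float.dtype)  # float32

# Booleans and dates have their own dtypes
flags = np.array([True, False])
days = np.array(["2024-01-01", "2024-01-31"], dtype="datetime64[D]")
gap = days[1] - days[0]  # a np.timedelta64 value
print(flags.dtype)  # bool
print(gap)          # 30 days
```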

Numpy Axis

In NumPy, axes refer to the directions along which operations can be performed on an array.

  • Axis 0 refers to the first dimension (often the rows).

  • Axis 1 refers to the second dimension (often the columns).

  • Higher dimensions follow similarly with Axis 2, Axis 3, and so on.

For a 1D array (a vector, not a matrix), there is only Axis 0

For a 2D array (matrix), the axes will be: Axis 0 (rows) & Axis 1 (columns)

For a 3D array, the axes extend to a third dimension, where: Axis 0 (depth (or the "first level" of the 3D array)), Axis 1 (rows) & Axis 2 (columns)

a = [[1,2,3], [4,5,6],[7,8,9]]
arr = np.array(a)
print(arr)

Summing Across an Axis

  1. Sum along Axis 0 (adds down the rows, giving one total per column)

     sum_axis_0 = arr.sum(axis=0)
     print(sum_axis_0)
     # output -
     # [12 15 18]

  2. Sum along Axis 1 (adds across the columns, giving one total per row)

     sum_axis_1 = arr.sum(axis=1)
     print(sum_axis_1)
     # output -
     # [ 6 15 24]
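The same idea extends to a 3D array: summing along an axis removes that axis from the shape. This is a small sketch, with cube as an illustrative name and np.ones() used only to get an array of the right shape.

```python
import numpy as np

cube = np.ones((2, 3, 4))  # 2 blocks (depth), 3 rows, 4 columns

print(cube.sum(axis=0).shape)  # (3, 4)  depth collapsed
print(cube.sum(axis=1).shape)  # (2, 4)  rows collapsed
print(cube.sum(axis=2).shape)  # (2, 3)  columns collapsed
```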
    

Methods of creating Arrays in Numpy

There are 5 general mechanisms for creating arrays:

  • Converting from Python Data Structures: Creating arrays by converting other Python structures like lists or tuples.

    Example:

      list_data = [1, 2, 3, 4, 5]
      array_from_list = np.array(list_data)
      print("Array from list:", array_from_list)
    

  • Built-in NumPy Array Creation Functions: Using NumPy's intrinsic functions, such as arange, ones, and zeros, to create arrays directly.

Examples:

np.arange() - Array with a Range of Values

import numpy as np

arr = np.arange(0, 10, 2)  # Starts at 0, ends before 10, step size of 2
print("np.arange:", arr)
# Output: [0 2 4 6 8]
arr2 = np.arange(34)  # one argument: starts at 0, ends before 34, step size of 1
arr2
# output - 
# array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
#        17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33])

np.ones() - Array Filled with Ones

arr = np.ones((3, 3))  # 3x3 array filled with 1
print("np.ones:\n", arr)
# Output:
# [[1. 1. 1.]
#  [1. 1. 1.]
#  [1. 1. 1.]]

np.zeros() - Array Filled with Zeros

arr = np.zeros((2, 4))  # 2x4 array filled with 0
print("np.zeros:\n", arr)
# Output:
# [[0. 0. 0. 0.]
#  [0. 0. 0. 0.]]

np.full() - Array with a Specific Constant Value

arr = np.full((2, 2), 7)  # 2x2 array filled with the value 7
print("np.full:\n", arr)
# Output:
# [[7 7]
#  [7 7]]

np.eye() - Identity Matrix

arr = np.eye(3)  # 3x3 identity matrix
print("np.eye:\n", arr)
# Output:
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]

np.linspace() - Array of Linearly Spaced Values

arr = np.linspace(0, 1, 5)  # 5 values from 0 to 1, inclusive (equally spaced)
print("np.linspace:", arr)
# Output: [0.   0.25 0.5  0.75 1.  ]

np.diag() - Array with Specified Diagonal Values

arr = np.diag([1, 2, 3])  # Diagonal array with 1, 2, 3
print("np.diag:\n", arr)
# Output:
# [[1 0 0]
#  [0 2 0]
#  [0 0 3]]

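np.empty() - Uninitialized Array (allocates the array without setting its values, so it contains whatever happened to be in memory; the values are arbitrary leftovers, not proper random numbers)

emp = np.empty((2,3))
print(emp)
# output (will differ from run to run), for example -
# [[1.9180575e-316 0.0000000e+000 0.0000000e+000]
#  [0.0000000e+000 0.0000000e+000 0.0000000e+000]]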

np.empty_like() - creates a new array with the same shape and data type as an existing array, but without initializing any values.

em_like = np.empty_like(arr)
print(em_like)

np.identity() - Identity Matrix (always square)

ide = np.identity(3)
print(ide)
# output - 
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]
  • Loading Arrays from Files: Reading arrays from disk, whether from standard formats (like .npy or .csv) or custom file formats.

  • Creating Arrays from Raw Data: Initializing arrays from raw bytes using strings or buffers.

  • Using Specialized Library Functions: Leveraging functions from libraries like random to generate arrays with specific properties.
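The last three mechanisms have no example above, so here is a hedged sketch of each. To keep it self-contained it uses in-memory buffers (io.StringIO and io.BytesIO) in place of real files on disk; with real files you would pass a file path instead. The seed value and variable names are arbitrary choices for this illustration.

```python
import io
import numpy as np

# 1) Loading from files: CSV text read with np.loadtxt
csv_text = io.StringIO("1,2,3\n4,5,6")  # stands in for a .csv file
from_csv = np.loadtxt(csv_text, delimiter=",")
print(from_csv.shape)  # (2, 3)

# .npy round trip with np.save / np.load
buffer = io.BytesIO()  # stands in for a .npy file
np.save(buffer, np.array([10, 20, 30]))
buffer.seek(0)
from_npy = np.load(buffer)
print(from_npy)  # [10 20 30]

# 2) Creating from raw bytes with np.frombuffer
from_bytes = np.frombuffer(b"\x01\x02\x03", dtype=np.uint8)
print(from_bytes)  # [1 2 3]

# 3) Using the random module (seeded so the run is repeatable)
rng = np.random.default_rng(seed=42)
random_ints = rng.integers(0, 10, size=5)  # 5 integers from 0 to 9
print(random_ints.shape)  # (5,)
```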

SOME IMPORTANT METHODS

.reshape() - change the shape of an existing array without altering its data.

arr = np.arange(18)
print(arr) 
# output -
# [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17]

# reshaping the array 
arr.reshape(3,6)
# output - 
# array([[ 0,  1,  2,  3,  4,  5],
#        [ 6,  7,  8,  9, 10, 11],
#        [12, 13, 14, 15, 16, 17]])

(reshape() returns a new array object and does not change arr itself, so printing arr still gives the original 1D result. To keep the reshaped version, assign it back:

arr = arr.reshape(3,6)

Now arr holds the reshaped result.)

.ravel() - flattens a multi-dimensional array into a 1D array (the opposite of the .reshape() call above).

arr.ravel()
# output - array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17])
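Two details about reshaping are worth a quick sketch: passing -1 lets NumPy work out one dimension for you, and the total number of elements must stay the same. The example also compares ravel() with flatten(), a related method not covered above that always returns a copy. The names flat and table are illustrative.

```python
import numpy as np

flat = np.arange(18)

# -1 lets NumPy work out that dimension: 18 elements / 3 rows = 6 columns
table = flat.reshape(3, -1)
print(table.shape)  # (3, 6)

# The element count must match, otherwise reshape raises ValueError
try:
    flat.reshape(4, 5)
except ValueError as err:
    print("cannot reshape:", err)

# ravel() returns a view when it can; flatten() always returns a copy
print(np.shares_memory(table, table.ravel()))    # True
print(np.shares_memory(table, table.flatten()))  # False
```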

.argmax() - gives the index of the maximum value in an array.

newArr = np.array([1,5,75,86,0])
newArr.argmax()
# output - np.int64(3)

.argmin() - gives the index of the minimum value in an array.

newArr.argmin()
# output - np.int64(4)

.argsort() - returns the indices that would sort an array. In other words, it gives you the indices that you can use to rearrange the elements of the array in sorted order.

newArr.argsort()
# output - array([4, 0, 1, 2, 3])
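A short sketch of what those indices are good for: indexing the array with them gives the sorted values, and the same order can be applied to a second, related array. The labels array is a made-up companion array for illustration.

```python
import numpy as np

values = np.array([1, 5, 75, 86, 0])
order = values.argsort()       # array([4, 0, 1, 2, 3])

print(values[order])           # [ 0  1  5 75 86]
print(values[order[::-1]])     # [86 75  5  1  0]  (descending)

# Reorder a second array the same way
labels = np.array(["a", "b", "c", "d", "e"])
print(labels[order])           # ['e' 'a' 'b' 'c' 'd']
```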

Using argmax() on a 2D array (without specifying axis):

(For a 2D array, argmax() first flattens the array into a 1D array, then returns the index of the maximum value in that flattened array.)

arr = np.array([[1, 3, 7], [2, 5, 6]])
# Find the index of the maximum value in the flattened array
np.argmax(arr)
# output - np.int64(2)

Using argmax() on a 2D array with axis:

arr = np.array([[1, 3, 7], [2, 5, 6]])
# Find the index of the maximum value in each column (along axis 0)
np.argmax(arr, axis=0)
# Index of the maximum values along axis 0: [1 1 0]

# Find the index of the maximum value in each row (along axis 1)
np.argmax(arr, axis=1)
# Index of the maximum values along axis 1: [2 2]
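If you want the row and column of the maximum instead of its position in the flattened array, one option is np.unravel_index(), which converts a flat index back into coordinates for a given shape. A minimal sketch (flat_index, row and col are illustrative names):

```python
import numpy as np

arr = np.array([[1, 3, 7], [2, 5, 6]])

flat_index = np.argmax(arr)                         # 2
row, col = np.unravel_index(flat_index, arr.shape)  # back to 2D coordinates
print(row, col)        # 0 2
print(arr[row, col])   # 7
```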

Using argsort() on a 2D array

arr_2d = np.array([[3, 1, 2], [5, 4, 6]])

# Indices that would sort each column (along axis 0)
sorted_indices_axis_0 = arr_2d.argsort(axis=0)
print(sorted_indices_axis_0)
# output - 
# [[0 0 0]
#  [1 1 1]]


# Indices that would sort each row (along axis 1)
sorted_indices_axis_1 = arr_2d.argsort(axis=1)
print(sorted_indices_axis_1)
# output -
# [[1 2 0]
#  [1 0 2]]
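To turn those 2D indices back into sorted values, plain indexing is not enough; np.take_along_axis() is one way to apply them row by row. A small sketch, assuming the same arr_2d as above:

```python
import numpy as np

arr_2d = np.array([[3, 1, 2], [5, 4, 6]])

idx = arr_2d.argsort(axis=1)                           # per-row sort order
sorted_rows = np.take_along_axis(arr_2d, idx, axis=1)  # apply it row by row
print(sorted_rows)
# [[1 2 3]
#  [4 5 6]]
```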

Attributes in Numpy

  1. .T - transpose of an array (rows become columns)

     arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
     arr.T
     # output -
     # array([[1, 4, 7],
     #        [2, 5, 8],
     #        [3, 6, 9]])
    
  2. .flat - a 1D iterator over a multi-dimensional array.

     for item in arr.flat: 
         print(item)
     # output - 
     # 1
     # 2
     # 3
     # 4
     # 5
     # 6
     # 7
     # 8
     # 9
    
  3. .ndim - shows the number of dimensions

     arr.ndim
     # output - 
     # 2
    

There are more attributes, such as .size, .shape, .dtype and .nbytes (how many bytes the array's data consumes).
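Here is a quick sketch of those remaining attributes on a 3x3 array. The dtype is set explicitly to int64 so the byte counts are the same on every platform; .itemsize (bytes per element) is included because .nbytes is simply .size times .itemsize.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=np.int64)

print(arr.shape)     # (3, 3)
print(arr.size)      # 9
print(arr.dtype)     # int64
print(arr.itemsize)  # 8  bytes per element
print(arr.nbytes)    # 72 = 9 elements * 8 bytes
```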

Matrix operations in Numpy

  • Matrix Addition

    For matrix addition, the matrices must have the same dimensions (i.e., the same number of rows and columns).

    The addition is performed element-wise, meaning each element in one matrix is added to the element in the same position in the other matrix.

# Define two matrices (2x3)
A = np.array([[1, 2, 3],
              [4, 5, 6]])

B = np.array([[7, 8, 9],
              [10, 11, 12]])

# Perform matrix addition
print(A + B)

#output
# [[ 8 10 12]
#  [14 16 18]]

By contrast, if we try this with plain Python lists instead of NumPy arrays, + simply joins the two lists into one longer list:

# without numpy (np)
print([56,8] + [2,9])

#output
# [56, 8, 2, 9]
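The same-dimensions rule can be checked directly. In this sketch, wrong_shape is a made-up 3x2 matrix that does not match A's 2x3 shape, so the addition raises a ValueError. One exception worth knowing: NumPy does allow adding a single number, which it applies to every element.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])
wrong_shape = np.array([[1, 2],
                        [3, 4],
                        [5, 6]])  # 3x2, does not match A's 2x3

try:
    A + wrong_shape
except ValueError as err:
    print("addition failed:", err)

# A single number is added to every element
print(A + 10)
# [[11 12 13]
#  [14 15 16]]
```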
  • Matrix Multiplication

    Matrix multiplication in NumPy can be done in two ways:

    1. Element-wise multiplication (using * or np.multiply()), which multiplies corresponding elements.

    2. Matrix multiplication (using @, np.matmul() or np.dot()), which follows the rules of linear algebra (rows multiplied by columns).

  • In Element-wise multiplication each element in one matrix is multiplied by the corresponding element in the other matrix.

      print(A * B)
    
      # output
      #  [[ 7 16 27]
      #  [40 55 72]]
    
  • Matrix multiplication (dot product) follows the rules of linear algebra. The number of columns in the first matrix must match the number of rows in the second matrix, and the result has dimensions (rows of A, columns of B). A and B above are both 2x3, so A @ B would raise an error; here we multiply A by a 3x2 matrix instead.

      # Define a 3x2 matrix so the inner dimensions match (2x3 @ 3x2)
      B2 = np.array([[7, 8],
                     [9, 10],
                     [11, 12]])

      # Perform matrix multiplication
      C = A @ B2
      print(C)
      # output
      # [[ 58  64]
      #  [139 154]]

    How each entry is computed:

  • First row, first column: (1*7) + (2*9) + (3*11) = 58

  • First row, second column: (1*8) + (2*10) + (3*12) = 64

  • Second row, first column: (4*7) + (5*9) + (6*11) = 139

  • Second row, second column: (4*8) + (5*10) + (6*12) = 154
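The shape rule for matrix multiplication can be verified with a short sketch. Here M is a hypothetical 3x2 matrix chosen so that the inner dimensions match; the example also checks that @, np.dot() and np.matmul() agree for 2D arrays, and shows the error raised when the shapes do not line up.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])        # shape (2, 3)
M = np.array([[7, 8],
              [9, 10],
              [11, 12]])         # shape (3, 2), hypothetical second matrix

product = A @ M                  # (2, 3) @ (3, 2) -> (2, 2)
print(product.shape)             # (2, 2)
print(np.array_equal(product, np.dot(A, M)))     # True
print(np.array_equal(product, np.matmul(A, M)))  # True

# Inner dimensions that do not match raise ValueError
try:
    A @ A                        # (2, 3) @ (2, 3)
except ValueError as err:
    print("matmul failed:", err)
```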

  • Square root of a matrix

    In NumPy, we can calculate the square root of each element in a matrix using np.sqrt().

# Define a matrix
A = np.array([[4, 9, 16],
              [25, 36, 49]])

# Compute the element-wise square root
print( np.sqrt(A))

#output
# [[2. 3. 4.]
#  [5. 6. 7.]]

Sum of every element in a Matrix

.sum() is a method that calculates the sum of elements along a specified axis of an array. It can be applied to an entire array or along specific rows or columns.

  1. Sum of All Elements
# Define a 2D array
arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

total_sum = arr.sum()
print("Sum of all elements:", total_sum)

#output
# Sum of all elements: 45
# 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 = 45.
  2. Sum of each COLUMN (axis=0): the values are added down the rows, so the result looks like one extra row of totals

     print(arr.sum(axis=0))
     #output
     # [12 15 18]
    
    • First column: 1 + 4 + 7 = 12

    • Second column: 2 + 5 + 8 = 15

    • Third column: 3 + 6 + 9 = 18

  3. Sum of each ROW (axis=1): the values are added across the columns, so the result looks like one extra column of totals

     print(arr.sum(axis=1))
     # output
     # [ 6 15 24]
    

    Explanation:

    • First row: 1 + 2 + 3 = 6

    • Second row: 4 + 5 + 6 = 15

    • Third row: 7 + 8 + 9 = 24
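The "extra row" and "extra column" picture can be made literal by attaching the totals to the original array. This is only an illustrative sketch: np.vstack(), np.hstack() and the keepdims=True option are not covered elsewhere in this post, and the variable names are made up.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

col_totals = arr.sum(axis=0)                 # [12 15 18]
row_totals = arr.sum(axis=1, keepdims=True)  # shape (3, 1), a column

with_total_row = np.vstack([arr, col_totals])  # totals as an extra row
with_total_col = np.hstack([arr, row_totals])  # totals as an extra column

print(with_total_row)
# [[ 1  2  3]
#  [ 4  5  6]
#  [ 7  8  9]
#  [12 15 18]]
print(with_total_col)
# [[ 1  2  3  6]
#  [ 4  5  6 15]
#  [ 7  8  9 24]]
```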

Minimum and Maximum element in matrix

  1. .min() Method - finds the minimum value of a matrix

     arr = np.array([[1, 2, 3],
                     [4, 0, 1],
                     [5, 0, 3]])
     print(arr.min())
     # output
     # 0
    
  2. .max() Method - finds the maximum value of a matrix

       print(arr.max())
       # output
       # 5
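Like .sum(), both methods accept an axis argument. A brief sketch on the same matrix:

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 0, 1],
                [5, 0, 3]])

print(arr.min(axis=0))  # [1 0 1]  smallest value in each column
print(arr.max(axis=1))  # [3 4 5]  largest value in each row
```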
      
  • Find specific elements in a Matrix with np.where()

    The np.where() function is a powerful tool for selecting elements based on conditions. It lets you find the indices where a condition is True and, optionally, replace values based on that condition.

      arr = np.array([[1, 2, 3],
                      [4, 0, 1],
                      [5, 0, 3]])
    
      # Find indices where elements are greater than 3
      result = np.where(arr > 3)
      print(result)
    
      # output 
      # (array([1, 2]), array([0, 0]))
    

    Explanation:

    • This output shows that elements 4 and 5 are greater than 3 at positions (1, 0) and (2, 0).

    • array([1, 2]): Row indices where elements are greater than 3.

      • Row 1 contains 4.

      • Row 2 contains 5.

    • array([0, 0]): Column indices where elements are greater than 3.

      • Column 0 for both elements.

Note that np.where() called with only a condition returns a tuple:

type(np.where(arr>3))
# output 
# tuple

A TUPLE is an ordered, immutable collection of elements in Python. Tuples are similar to lists but with two key differences:

  1. Immutability: Once created, the elements of a tuple cannot be changed, added, or removed.

  2. Syntax: Tuples are defined using parentheses () rather than square brackets [].
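Two follow-ups on np.where(), sketched on the same matrix: pairing the row and column arrays from the tuple into (row, column) positions, and the three-argument form np.where(condition, x, y), which picks x where the condition is True and y elsewhere. The replacement value -1 is an arbitrary choice for illustration.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 0, 1],
                [5, 0, 3]])

# Pair up the row and column indices from the returned tuple
rows, cols = np.where(arr > 3)
for r, c in zip(rows, cols):
    print((int(r), int(c)), "->", int(arr[r, c]))
# (1, 0) -> 4
# (2, 0) -> 5

# Three-argument form: replace values greater than 3 with -1
replaced = np.where(arr > 3, -1, arr)
print(replaced)
# [[ 1  2  3]
#  [-1  0  1]
#  [-1  0  3]]
```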

  • Counting specific element in Numpy

    np.count_nonzero() counts the non-zero elements in an array. Combined with a condition, it can also count the occurrences of a specific value.

      arr = np.array([0,0,1, 2, 3, 1, 1, 4, 1])
      np.count_nonzero(arr)
    
      # output
      # 7
    

    To count the occurrences of a specific value, pass a condition to np.count_nonzero(). The comparison arr == 1 produces a boolean array, and every True counts as non-zero.

count_ones = np.count_nonzero(arr == 1)
# count all the elements with '1' value
print(count_ones)  
# Output: 4
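If you need the count of every distinct value at once, one alternative (not covered above) is np.unique() with return_counts=True. The sketch below also shows np.count_nonzero() with a different condition.

```python
import numpy as np

arr = np.array([0, 0, 1, 2, 3, 1, 1, 4, 1])

values, counts = np.unique(arr, return_counts=True)
print(values)  # [0 1 2 3 4]
print(counts)  # [2 4 1 1 1]

# Any condition works with count_nonzero, e.g. elements greater than 1
print(np.count_nonzero(arr > 1))  # 3
```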
  • Find indices of non-zero elements

    The np.nonzero() function (also available as the .nonzero() method) finds the indices of non-zero elements in an array. It returns a tuple of arrays, one for each dimension of the input array, containing the indices of the elements that are non-zero.

# With 1D array
arr = np.array([0, 2, 0, 3, 0, 4])
print(np.nonzero(arr))
 # Output: (array([1, 3, 5]),)
# the non-zero values are located at indices 1, 3, and 5.
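With a 2D array the tuple has two entries, one for rows and one for columns. A minimal sketch, where grid is a made-up example array:

```python
import numpy as np

grid = np.array([[0, 2, 0],
                 [3, 0, 4]])

rows, cols = np.nonzero(grid)
print(rows)              # [0 1 1]
print(cols)              # [1 0 2]
print(grid[rows, cols])  # [2 3 4]  the non-zero values themselves
```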

This is not everything. There are many methods and attributes that are not covered in this blog, so you can find the rest on the official NumPy website:

Numpy array methods and attributes