L8: NumPy

Bogdan G. Popescu

bogdan.popescu@johncabot.edu

John Cabot University

Introduction to NumPy

Characteristics

NumPy is a library that provides tools for working with high-performance multidimensional arrays. It:

offers a powerful tool to manipulate multidimensional arrays
provides tools to perform mathematical and logical operations, including linear algebra
is more memory efficient and executes numerical operations faster than native Python lists

The first thing we do is to import the relevant library:

import numpy as np

Introduction to NumPy

What is an Array?

An array is a structure for storing and retrieving data.

For instance, if each element of the data were a number, we might visualize a “one-dimensional” array like a list, that is, a single row of numbers.

A two-dimensional array would be like a table:

1	2	3	4	5
6	7	8	9	10
11	12	13	14	15
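As a minimal sketch, both pictures can be built with np.array. The variable names one_d and two_d and the values in the one-dimensional example are illustrative and not part of the original slides.

```python
import numpy as np

# A one-dimensional array: a single row of numbers, like a list
one_d = np.array([1, 2, 3, 4, 5])

# A two-dimensional array: the 3 x 5 table shown above
two_d = np.array([[1, 2, 3, 4, 5],
                  [6, 7, 8, 9, 10],
                  [11, 12, 13, 14, 15]])

print(one_d.ndim, two_d.ndim)   # 1 2
print(two_d[1, 2])              # 8 (second row, third column)
```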

A three-dimensional array would be like a set of tables, perhaps stacked as though they were printed on separate pages.

In NumPy, this idea is generalized to an arbitrary number of dimensions, so the fundamental array class is called ndarray, which stands for “N-dimensional array”.
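The "stack of tables" picture can be sketched as follows. The sizes (3 pages of 2 rows and 3 columns) and the name pages are assumptions chosen only for illustration.

```python
import numpy as np

# Three "pages", each a table with 2 rows and 3 columns (illustrative sizes)
pages = np.arange(18).reshape(3, 2, 3)

print(type(pages).__name__)  # ndarray
print(pages.ndim)            # 3
print(pages[0])              # the first page, itself a 2-D array
```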

Introduction to NumPy

NumPy Restrictions

Most NumPy arrays have some restrictions.

All elements of the array must be of the same type of data.
Once created, the total size of the array can’t change.
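The shape must be “rectangular”, not “jagged”: for example, each row of a two-dimensional array must have the same number of columns.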
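A short sketch of the first two restrictions in practice. The names mixed and six are hypothetical; the exact wording of the error message may differ between NumPy versions, so it is printed rather than assumed.

```python
import numpy as np

# Mixed ints and floats are stored with one common type (here, float)
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)  # float64

# Reshaping cannot change the total number of elements
six = np.arange(6)
try:
    six.reshape(4, 2)  # 8 slots requested, only 6 elements available
except ValueError as e:
    print("reshape failed:", e)
```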

Introduction to NumPy

Arrays

The easiest way to create an array is to use np.array

For example, we can create an array below:

a = np.array([[0,1,2,3],[3,2,1,0],[1,1,1,1]])  
a

array([[0, 1, 2, 3],
       [3, 2, 1, 0],
       [1, 1, 1, 1]])

Here are attributes that tell us more about these arrays:

shape: tuple with the array dimensions
ndim: number of dimensions of the array
size: number of elements of the array
dtype: type of data of array

Introduction to NumPy

Array Example

This was our original array

array([[0, 1, 2, 3],
       [3, 2, 1, 0],
       [1, 1, 1, 1]])

Let us see how these attributes work in practice:

a.shape

(3, 4)

a.ndim

2

a.size

12

a.dtype

dtype('int64')

Performance Comparison: NumPy Arrays vs. Lists

One of the main advantages of NumPy over native Python lists is performance.

NumPy arrays are implemented in C and optimized for high-performance computation

Python lists are more general and flexible but can be slower for numerical tasks.

Let’s compare the time taken to perform an element-wise addition on a large array using both Python lists and NumPy arrays.

import numpy as np
import time

# Create a list and an array with one million elements each
array_size = 10**6
python_list = list(range(array_size))
numpy_array = np.arange(array_size)

We time the same element-wise addition, first for the Python list and then for the NumPy array.

# Python list addition (perf_counter is a clock intended for timing short intervals)
start = time.perf_counter()
python_result = [x + x for x in python_list]
python_time = time.perf_counter() - start
python_time

0.023259878158569336

# NumPy array addition
start = time.perf_counter()
numpy_result = numpy_array + numpy_array
numpy_time = time.perf_counter() - start
numpy_time

0.0038080215454101562

python_time, numpy_time

(0.023259878158569336, 0.0038080215454101562)
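A single measurement like this can be noisy. As a hedged alternative, the standard-library timeit module repeats each statement several times; the array size, the repeat counts, and the names list_time and numpy_time below are assumptions chosen so that the sketch runs quickly, and the timings will differ from machine to machine.

```python
import timeit
import numpy as np

array_size = 10**5  # smaller than the example above so the sketch runs quickly
python_list = list(range(array_size))
numpy_array = np.arange(array_size)

# Run each statement 20 times, repeat that 3 times, and keep the best total
list_time = min(timeit.repeat(lambda: [x + x for x in python_list],
                              number=20, repeat=3))
numpy_time = min(timeit.repeat(lambda: numpy_array + numpy_array,
                               number=20, repeat=3))

print(f"list:  {list_time:.4f} s for 20 runs")
print(f"numpy: {numpy_time:.4f} s for 20 runs")
```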

Performance Comparison: NumPy Arrays vs. Lists

Why is NumPy faster?

Contiguous Memory

NumPy arrays are stored in contiguous blocks of memory. Accessing their data is more efficient than with Python lists, whose elements are separate objects that can be scattered across memory.

Low-Level Optimization

NumPy operations are implemented in C, allowing the use of highly optimized libraries and leveraging CPU vectorization techniques.

Vectorization

NumPy can perform operations on entire arrays without the need for explicit loops, allowing for faster execution.
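To make the idea of vectorization concrete, the sketch below computes the same result with an explicit loop and with a single array expression. The names values, looped, and vectorized are illustrative.

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0])

# Explicit loop: one Python-level operation per element
looped = np.empty_like(values)
for i in range(len(values)):
    looped[i] = values[i] * 2 + 1

# Vectorized: the same arithmetic expressed on the whole array at once
vectorized = values * 2 + 1

print(np.array_equal(looped, vectorized))  # True
```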

Operations with arrays

Element-wise Operations in NumPy

Element-wise operations allow you to perform calculations on each corresponding element in two arrays or between an array and a scalar.

Key Examples:

Addition (+): Adds corresponding elements.
Subtraction (-): Subtracts corresponding elements.
Multiplication (*): Multiplies corresponding elements.
Division (/): Divides corresponding elements.
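Subtraction and division work the same way as addition and multiplication. A small sketch with illustrative arrays x and y:

```python
import numpy as np

x = np.array([10, 20, 30])
y = np.array([2, 4, 5])

diff_result = x - y   # [8, 16, 25]
div_result = x / y    # [5., 5., 6.] (dividing integer arrays with / gives floats)

print(diff_result, div_result)
```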

Operations with arrays

Element-wise Operations Examples

Here are examples of Addition and Multiplication

import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Element-wise operations
sum_result = a + b    # [5, 7, 9]
mul_result = a * b    # [4, 10, 18]

sum_result, mul_result

(array([5, 7, 9]), array([ 4, 10, 18]))

Operations with arrays

Scalar Operations in NumPy

You can also apply operations between an array and a scalar, which applies the operation to each element of the array.

# Scalar operations
scalar_add = a + 10   # [11, 12, 13]
scalar_mul = a * 2    # [2, 4, 6]

scalar_add, scalar_mul

(array([11, 12, 13]), array([2, 4, 6]))

Indexing and Slicing

Intro

We learned about slicing and indexing in the case of lists.

We can use similar methods in the case of NumPy arrays.

The difference is that a slice of a NumPy array is a view of the original data, not a copy: any change made through the slice also modifies the original array.
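The contrast with lists can be sketched as follows. The names numbers, list_slice, arr, and arr_slice are illustrative.

```python
import numpy as np

# Slicing a list creates a new, independent list
numbers = [0, 1, 2, 3, 4]
list_slice = numbers[1:3]
list_slice[0] = 99
print(numbers)   # [0, 1, 2, 3, 4] (unchanged)

# Slicing an array creates a view onto the same data
arr = np.arange(5)
arr_slice = arr[1:3]
arr_slice[0] = 99
print(arr)       # [ 0 99  2  3  4] (the original changed too)
```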

Indexing and Slicing

Example

Let us create a new array:

a = np.arange(5)
a

array([0, 1, 2, 3, 4])

We can easily replace elements within this array

a[2:4] = 10
a

array([ 0,  1, 10, 10,  4])

If we still want to keep the original array, we can make a copy

b = a.copy()
b

array([ 0,  1, 10, 10,  4])

We can now change b:

b[:2] = 1
b

array([ 1,  1, 10, 10,  4])

This means that a stays the same, while b has changed:

a

array([ 0,  1, 10, 10,  4])

b

array([ 1,  1, 10, 10,  4])

Multidimensional arrays

Examples

We can create multidimensional arrays by reshaping a one-dimensional array. Here we create 8 blocks, each containing 2 rows and 2 columns:

a = np.arange(32).reshape(8, 2, 2)
a

array([[[ 0,  1],
        [ 2,  3]],

       [[ 4,  5],
        [ 6,  7]],

       [[ 8,  9],
        [10, 11]],

       [[12, 13],
        [14, 15]],

       [[16, 17],
        [18, 19]],

       [[20, 21],
        [22, 23]],

       [[24, 25],
        [26, 27]],

       [[28, 29],
        [30, 31]]])

Multidimensional arrays

Column Extraction

We can extract the first column from the first block

a[0,:,0]

array([0, 2])

We can extract the second column in the first block.

a[0,:,1]

array([1, 3])

What if we want to extract the first column in the eighth block?

a[7,:,0]

array([28, 30])
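The index order in these examples is (block, row, column). As a sketch of the same pattern applied to other selections (the names last_block, first_row, and single_value are illustrative):

```python
import numpy as np

a = np.arange(32).reshape(8, 2, 2)

last_block = a[-1]          # whole eighth block, same as a[7]
first_row = a[0, 0, :]      # first row of the first block
single_value = a[7, 1, 0]   # eighth block, second row, first column

print(last_block)    # rows [28 29] and [30 31]
print(first_row)     # [0 1]
print(single_value)  # 30
```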

Numpy functions

Creating Default Arrays

NumPy has specific functions to create arrays filled with default values:

zeros: creates an array filled with 0’s.
ones: creates an array filled with 1’s.
eye: creates an identity matrix of size n.
empty: creates an uninitialized array of the specified shape and dtype.
full: creates an array filled with a specified constant value.

Let us look at some examples:

np.zeros((3,3))

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

np.ones((5,1))

array([[1.],
       [1.],
       [1.],
       [1.],
       [1.]])

np.eye(4)

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])
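The remaining two functions, full and empty, can be sketched in the same way. The names sevens and scratch are illustrative; because empty does not initialize memory, the sketch assigns values before reading them.

```python
import numpy as np

# full: every element set to the given constant
sevens = np.full((2, 3), 7)
print(sevens)

# empty: memory is allocated but not initialized, so the values are arbitrary
scratch = np.empty((2, 2))
scratch[:] = 0.5   # assign before reading anything from it
print(scratch)
```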

Universal Functions

Intro

Universal functions (ufuncs) are functions that operate on arrays element by element: the function is applied to each element of the array and the results are returned as a new array.

a = np.arange(5)
a

array([0, 1, 2, 3, 4])

b = np.sqrt(a)
b

array([0.        , 1.        , 1.41421356, 1.73205081, 2.        ])

np.exp(b)

array([1.        , 2.71828183, 4.11325038, 5.65223367, 7.3890561 ])

np.add(a,a)

array([0, 2, 4, 6, 8])

np.multiply(a,a)

array([ 0,  1,  4,  9, 16])
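As a brief sketch, the arithmetic operators give the same results as the corresponding universal functions, and binary ufuncs such as np.maximum also work element by element:

```python
import numpy as np

a = np.arange(5)

# The arithmetic operators give the same results as the matching ufuncs
print(np.array_equal(np.add(a, a), a + a))        # True
print(np.array_equal(np.multiply(a, a), a * a))   # True

# Binary ufuncs also work element by element, e.g. the larger of two values
print(np.maximum(a, 2))   # [2 2 2 3 4]
```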

Other Operations

Mathematical and Statistical Methods

There is a set of mathematical functions that compute statistics over an entire array.

These include sum, mean, and std (standard deviation).

a = np.arange(9).reshape(3,3)
a

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

Means

a.mean()

4.0

np.mean(a)

4.0

Other Operations

Mathematical and Statistical Methods

The previous operations have been performed on the entire matrix. It is possible to specify the axis, as shown below:

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

Sum of each column (axis=0)

# Array method
a.sum(axis=0)
# NumPy function
np.sum(a, axis=0)

array([ 9, 12, 15])

Sum of each row (axis=1)

# Array method
a.sum(axis=1)
# NumPy function
np.sum(a, axis=1)

array([ 3, 12, 21])
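The same axis argument works for the other statistical methods. A minimal sketch using the standard deviation and the maximum on the same 3 x 3 array:

```python
import numpy as np

a = np.arange(9).reshape(3, 3)

print(a.std())          # standard deviation of all nine values
print(a.std(axis=0))    # one value per column
print(a.max(axis=1))    # largest value in each row: [2 5 8]
```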

Other Operations

Exercises

Let us imagine that we have a small dataset representing student scores in different subjects.

# Columns: Math, Physics, Chemistry, Biology
[[85, 78, 92, 88],
  [79, 85, 86, 91],
  [90, 88, 76, 85],
  [70, 80, 85, 89],
  [88, 92, 90, 94]]

Tasks:

Create a numpy array from the given dataset.
Compute the mean score for each subject.
Identify the highest and lowest scores in each subject.
Determine the highest score across all subjects.

Other Operations

Exercise Answers

# Task 1: Create the numpy array
scores = np.array([[85, 78, 92, 88],
    [79, 85, 86, 91],
    [90, 88, 76, 85],
    [70, 80, 85, 89],
    [88, 92, 90, 94]])

# Task 2: Compute the mean score for each subject
mean_scores = scores.mean(axis=0)
print("Mean scores for each subject:", mean_scores)

Mean scores for each subject: [82.4 84.6 85.8 89.4]

# Task 3: Identify the highest and lowest scores in each subject
highest_scores = scores.max(axis=0)
lowest_scores = scores.min(axis=0)
print("Highest scores for each subject:", highest_scores)
print("Lowest scores for each subject:", lowest_scores)

Highest scores for each subject: [90 92 92 94]
Lowest scores for each subject: [70 78 76 85]

# Task 4: Determine the highest score across all subjects.
highest_score = scores.max()
print("Highest score across all subjects:", highest_score)

Highest score across all subjects: 94

Common Errors in NumPy

1. Shape Mismatch Error

While working with NumPy, you may encounter some common errors.

A frequent one occurs when the shapes of two arrays are incompatible for an operation such as addition or multiplication.

NumPy requires that the arrays have the same shape, or shapes that are compatible under broadcasting.

Example:

a = np.array([1, 2, 3])           # shape (3,)
b = np.array([[1, 2], [3, 4]])    # shape (2, 2)

# Trying to add arrays with incompatible shapes
try:
    result = a + b
except ValueError as e:
    print(e)

operands could not be broadcast together with shapes (3,) (2,2)

Fix: Ensure that the arrays have compatible shapes, or reshape them with .reshape() or np.expand_dims() so that they can be broadcast.
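A minimal sketch of such a fix, using illustrative arrays row and table that are not part of the original example. Adding row to table directly would fail, because their trailing dimensions (3 and 2) differ.

```python
import numpy as np

row = np.array([1, 2, 3])           # shape (3,)
table = np.array([[10, 20],
                  [30, 40],
                  [50, 60]])        # shape (3, 2)

# Turning row into a column of shape (3, 1) makes the shapes compatible
column = row.reshape(3, 1)          # same as np.expand_dims(row, axis=1)
result = column + table

print(result.shape)  # (3, 2)
print(result)
```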

Common Errors in NumPy

2. Type Error

NumPy is strict about data types (dtype).

Sometimes, operations fail if the types are incompatible (e.g., trying to multiply a string array by an integer).

a = np.array(['1', '2', '3'])
try:
    result = a * 2
except TypeError as e:
    print(e)

ufunc 'multiply' did not contain a loop with signature matching types (dtype('<U1'), dtype('int64')) -> None

Fix: Convert the array to the correct type using .astype().

a = a.astype(int)
result = a * 2
result

array([2, 4, 6])

Common Errors in NumPy

3. Index Error

NumPy raises an IndexError if you try to access an element outside the bounds of the array.

a = np.array([1, 2, 3])
try:
    print(a[5])
except IndexError as e:
    print(e)

index 5 is out of bounds for axis 0 with size 3

Fix: Ensure that you access elements within the valid range of indices.
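One way to apply this fix is to check the index before using it. The sketch below is only an illustration; the variable index and the fallback value None are assumptions.

```python
import numpy as np

a = np.array([1, 2, 3])
index = 5  # hypothetical index that may be out of range

# Valid positions run from -len(a) to len(a) - 1
if -len(a) <= index < len(a):
    value = a[index]
else:
    value = None
    print(f"index {index} is outside the valid range for size {len(a)}")

print(a[-1])  # 3 (negative indices count from the end)
```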