# Lecture 12: NumPy Arrays and Vectorized Computing
## Introduction to NumPy Arrays

**Learning Objectives:**
- Understand why NumPy arrays are essential for numerical computing
- Master array creation and basic operations
- Learn array indexing and slicing techniques
- Apply vectorized operations for efficient computation
- Understand broadcasting and array shapes
- Build practical applications using NumPy arrays

## Setup and Imports

NumPy is Python's fundamental package for scientific computing. It provides fast, efficient array operations that form the foundation of data science and machine learning. Think of NumPy arrays as supercharged lists optimized for mathematical operations.

In [None]:
# Import NumPy - the foundation of numerical computing in Python
import numpy as np
import time  # For performance comparisons

# Check NumPy version
print(f"NumPy version: {np.__version__}")

## Part 1: Why NumPy Arrays Matter

NumPy arrays are like athletic versions of Python lists: faster, more compact, and designed specifically for mathematical operations. A list is a versatile container that can hold any mix of objects, whereas a NumPy array stores elements of a single data type in one block of memory, so operations on it run in compiled code rather than in a Python loop. On large numerical data this typically makes array operations many times faster. Let's measure the difference.

In [None]:
# Performance comparison: Lists vs Arrays
size = 1000000  # One million elements

# Using Python lists
list_data = list(range(size))
start_time = time.perf_counter()  # perf_counter is a high-resolution clock meant for timing
list_result = [x * 2 for x in list_data]  # Double each element
list_time = time.perf_counter() - start_time

print(f"List operation time: {list_time:.4f} seconds")

In [None]:
# Using NumPy arrays
array_data = np.arange(size)
start_time = time.perf_counter()  # perf_counter is a high-resolution clock meant for timing
array_result = array_data * 2  # Vectorized operation - no loops!
array_time = time.perf_counter() - start_time

print(f"Array operation time: {array_time:.4f} seconds")
print(f"NumPy is {list_time/array_time:.0f}x faster!")
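
A single timed run can be noisy, because other processes on the machine affect the measurement. As a rough sketch (not the only way to benchmark), the standard-library `timeit` module can repeat each operation and report the best run. The names `list_best` and `array_best` are illustrative.

```python
import timeit
import numpy as np

size = 1_000_000
list_data = list(range(size))
array_data = np.arange(size)

# Run each operation 5 times and keep the fastest run to reduce noise
list_best = min(timeit.repeat(lambda: [x * 2 for x in list_data], number=1, repeat=5))
array_best = min(timeit.repeat(lambda: array_data * 2, number=1, repeat=5))

print(f"List (best of 5): {list_best:.4f} seconds")
print(f"Array (best of 5): {array_best:.4f} seconds")
```

The exact numbers depend on your machine, so treat the ratio as an estimate rather than a fixed speedup.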

### Exercise 1.1: Speed Test
Try calculating the square of each element in a list vs array with 500,000 elements.

In [None]:
# Your code here
# Create a list and array with 500,000 elements
# Time squaring each element
# Compare the results

In [None]:
# Solution
test_size = 500000

# List approach
test_list = list(range(test_size))
start = time.perf_counter()
squared_list = [x**2 for x in test_list]
list_square_time = time.perf_counter() - start

# Array approach
test_array = np.arange(test_size)
start = time.perf_counter()
squared_array = test_array**2  # Vectorized squaring
array_square_time = time.perf_counter() - start

print(f"List squaring: {list_square_time:.4f} seconds")
print(f"Array squaring: {array_square_time:.4f} seconds")
print(f"Speedup: {list_square_time/array_square_time:.0f}x")

## Part 2: Creating NumPy Arrays

NumPy provides multiple ways to create arrays, each suited to different needs. Just like you might choose different containers for different purposes (a toolbox for tools, a lunchbox for food), you'll choose different array creation methods based on your data source and requirements.

In [None]:
# Creating arrays from lists
grades = [85, 92, 78, 95, 88]
grade_array = np.array(grades)
print(f"Grade array: {grade_array}")
print(f"Type: {type(grade_array)}")
print(f"Data type: {grade_array.dtype}")

In [None]:
# Creating arrays with initialization functions
zeros = np.zeros(5)  # Array of zeros
ones = np.ones(10)   # Array of ones
empty = np.empty(3)  # Uninitialized array (faster but contains garbage)

print(f"Zeros: {zeros}")
print(f"Ones: {ones}")
print(f"Empty: {empty}")

In [None]:
# Creating sequences with arange and linspace
# arange: like range() but returns an array
counting = np.arange(0, 10)  # 0 to 9
evens = np.arange(0, 20, 2)  # Even numbers 0 to 18

# linspace: evenly spaced values between start and end
linear = np.linspace(0, 1, 5)  # 5 values from 0 to 1

print(f"Counting: {counting}")
print(f"Evens: {evens}")
print(f"Linear space: {linear}")
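
One difference between the two functions is easy to miss: `arange` excludes the stop value (like `range()`), while `linspace` includes it by default. The short sketch below illustrates this. The variable names are only for illustration.

```python
import numpy as np

# arange takes a step size and excludes the stop value
steps = np.arange(0, 1, 0.25)
print(steps)   # values: 0, 0.25, 0.5, 0.75

# linspace takes a number of points and includes the endpoint by default
points = np.linspace(0, 1, 5)
print(points)  # values: 0, 0.25, 0.5, 0.75, 1

# endpoint=False makes linspace exclude the stop value as well
half_open = np.linspace(0, 1, 4, endpoint=False)
print(half_open)  # values: 0, 0.25, 0.5, 0.75
```

When you know how many points you want, `linspace` is usually the safer choice, because a floating-point step in `arange` can make the number of elements hard to predict.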

### Exercise 2.1: Array Creation
Create arrays for the following:
1. An array of 10 zeros
2. An array of odd numbers from 1 to 19
3. An array of 6 evenly spaced values from 0 to 100

In [None]:
# Your code here

In [None]:
# Solution
# 1. Array of 10 zeros
ten_zeros = np.zeros(10)
print(f"Ten zeros: {ten_zeros}")

# 2. Odd numbers from 1 to 19
odds = np.arange(1, 20, 2)
print(f"Odd numbers: {odds}")

# 3. Six evenly spaced values from 0 to 100
spaced_values = np.linspace(0, 100, 6)
print(f"Evenly spaced: {spaced_values}")

## Part 3: Array Properties and Data Types

NumPy arrays have several important properties that describe their structure and content. Understanding these properties is like knowing the specifications of a tool - it helps you use the array effectively and avoid errors.

In [None]:
# Array properties
sample_array = np.array([1, 2, 3, 4, 5, 6])

print(f"Array: {sample_array}")
print(f"Shape: {sample_array.shape}")      # Dimensions of the array
print(f"Size: {sample_array.size}")        # Total number of elements
print(f"Dimensions: {sample_array.ndim}")  # Number of dimensions
print(f"Data type: {sample_array.dtype}")  # Type of elements
print(f"Item size: {sample_array.itemsize} bytes")  # Bytes per element

In [None]:
# Specifying data types
integers = np.array([1, 2, 3, 4], dtype=np.int32)
floats = np.array([1, 2, 3, 4], dtype=np.float64)
booleans = np.array([True, False, True], dtype=bool)

print(f"Integers: {integers}, dtype: {integers.dtype}")
print(f"Floats: {floats}, dtype: {floats.dtype}")
print(f"Booleans: {booleans}, dtype: {booleans.dtype}")

In [None]:
# Type conversion
int_array = np.array([1, 2, 3, 4])
float_array = int_array.astype(float)  # Convert to float

print(f"Original: {int_array}, dtype: {int_array.dtype}")
print(f"Converted: {float_array}, dtype: {float_array.dtype}")
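
Converting in the other direction, from float to int, deserves care: `astype(int)` discards the fractional part instead of rounding. A minimal sketch with made-up values:

```python
import numpy as np

measurements = np.array([1.9, -1.9, 2.5, 3.0])

# astype(int) truncates toward zero
truncated = measurements.astype(int)
print(f"Truncated: {truncated}")  # values: 1, -1, 2, 3

# Round first if you want the nearest integer
# (np.round rounds halves to the nearest even value, so 2.5 becomes 2)
rounded = np.round(measurements).astype(int)
print(f"Rounded: {rounded}")      # values: 2, -2, 2, 3
```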

### Exercise 3.1: Array Properties
Create an array of temperatures [72.5, 68.3, 75.9, 71.2] and display all its properties.

In [None]:
# Your code here

In [None]:
# Solution
temps = np.array([72.5, 68.3, 75.9, 71.2])

print(f"Temperature array: {temps}")
print(f"Shape: {temps.shape}")
print(f"Size: {temps.size}")
print(f"Dimensions: {temps.ndim}")
print(f"Data type: {temps.dtype}")
print(f"Bytes per element: {temps.itemsize}")
print(f"Total bytes: {temps.nbytes}")

## Part 4: Array Indexing and Slicing

Accessing elements in NumPy arrays works similarly to lists, but with additional powerful features. It's like having a more sophisticated addressing system - not just house numbers, but also the ability to select entire neighborhoods or specific patterns of houses.

In [None]:
# Basic indexing
scores = np.array([85, 92, 78, 95, 88, 76, 91])

print(f"All scores: {scores}")
print(f"First score: {scores[0]}")
print(f"Last score: {scores[-1]}")
print(f"Third score: {scores[2]}")

In [None]:
# Slicing arrays
print(f"First three scores: {scores[:3]}")
print(f"Middle scores: {scores[2:5]}")
print(f"Every other score: {scores[::2]}")
print(f"Reversed scores: {scores[::-1]}")

In [None]:
# Boolean indexing - very powerful!
passing = scores > 80  # Create boolean mask
print(f"Passing mask: {passing}")
print(f"Passing scores: {scores[passing]}")

# Direct boolean indexing
high_scores = scores[scores >= 90]
print(f"High scores (>= 90): {high_scores}")
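
Array slicing differs from list slicing in one important way. A basic slice of an array is a view onto the same data, so modifying the slice modifies the original. Boolean indexing returns a new, independent array. The sketch below uses a separate array, `quiz_scores`, so the `scores` array above is left untouched.

```python
import numpy as np

quiz_scores = np.array([85, 92, 78, 95, 88, 76, 91])

# A basic slice is a view: it shares memory with the original array
first_three = quiz_scores[:3]
first_three[0] = 100
print(f"Original after editing the slice: {quiz_scores}")  # first element is now 100

# Boolean indexing makes a copy: changes do not reach the original
high = quiz_scores[quiz_scores >= 90]
high[0] = 0
print(f"Original after editing the copy: {quiz_scores}")   # unchanged by this step

# Use .copy() when you want an independent slice
safe = quiz_scores[:3].copy()
safe[1] = -1
```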

### Exercise 4.1: Array Selection
Given an array of daily temperatures, select:
1. All temperatures above 75°F
2. The first 3 days
3. Every other day's temperature

In [None]:
# Your code here
daily_temps = np.array([68, 72, 75, 78, 82, 79, 71, 69, 76, 80])

In [None]:
# Solution
daily_temps = np.array([68, 72, 75, 78, 82, 79, 71, 69, 76, 80])

# 1. Temperatures above 75°F
hot_days = daily_temps[daily_temps > 75]
print(f"Temperatures above 75°F: {hot_days}")

# 2. First 3 days
first_three = daily_temps[:3]
print(f"First 3 days: {first_three}")

# 3. Every other day
alternating = daily_temps[::2]
print(f"Every other day: {alternating}")

## Part 5: Vectorized Operations

Vectorized operations are NumPy's superpower. Instead of writing loops to process each element, you can operate on entire arrays at once. It's like the difference between painting a fence one plank at a time versus using a spray gun - the same result, but much faster and more efficient.

In [None]:
# Arithmetic operations on arrays
prices = np.array([10.99, 25.50, 8.75, 15.00, 32.99])

# Apply 20% discount to all prices at once
discounted = prices * 0.8
print(f"Original prices: ${prices}")
print(f"After 20% discount: ${discounted}")

In [None]:
# Multiple array operations
quantities = np.array([2, 1, 3, 2, 1])
total_cost = prices * quantities  # Element-wise multiplication

print(f"Prices: ${prices}")
print(f"Quantities: {quantities}")
print(f"Total cost per item: ${total_cost}")
print(f"Grand total: ${total_cost.sum():.2f}")

In [None]:
# Mathematical functions
angles = np.array([0, 30, 45, 60, 90])  # Degrees
radians = np.radians(angles)  # Convert to radians
sines = np.sin(radians)  # Calculate sine values

print(f"Angles (degrees): {angles}")
print(f"Sine values: {np.round(sines, 3)}")
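
Multiplying `prices` by a single number, as above, is the simplest case of broadcasting: NumPy stretches the smaller operand to match the shape of the larger one. The same rule applies to arrays of different shapes, as long as their dimensions are compatible. The following sketch uses hypothetical quiz data to show two common cases.

```python
import numpy as np

# Hypothetical data: 3 students (rows) x 4 quizzes (columns)
quiz_scores = np.array([[80, 90, 70, 85],
                        [60, 75, 88, 92],
                        [95, 85, 79, 68]])

# One bonus per quiz: shape (4,) is applied to every row
curve = np.array([5, 0, 10, 2])
curved = quiz_scores + curve           # (3, 4) + (4,) -> (3, 4)

# One bonus per student: shape (3, 1) is applied to every column
student_bonus = np.array([[1], [2], [3]])
adjusted = quiz_scores + student_bonus  # (3, 4) + (3, 1) -> (3, 4)

print(f"Shapes: {quiz_scores.shape}, {curve.shape}, {student_bonus.shape}")
print(f"Curved:\n{curved}")
print(f"Adjusted:\n{adjusted}")
```

Shapes are compared from the last dimension backwards. Two dimensions are compatible when they are equal or when one of them is 1.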

### Exercise 5.1: Temperature Conversion
Convert an array of Fahrenheit temperatures to Celsius using the formula: C = (F - 32) * 5/9

In [None]:
# Your code here
fahrenheit = np.array([32, 68, 86, 104, 212])

In [None]:
# Solution
fahrenheit = np.array([32, 68, 86, 104, 212])

# Vectorized conversion - no loops needed!
celsius = (fahrenheit - 32) * 5/9

print(f"Fahrenheit: {fahrenheit}°F")
print(f"Celsius: {celsius}°C")
print(f"Rounded: {np.round(celsius, 1)}°C")

## Part 6: Array Functions and Aggregations

NumPy provides a rich set of functions for analyzing and summarizing array data. These functions are like having a team of specialized analysts who can quickly tell you everything about your data - the average, the extremes, the spread, and more.

In [None]:
# Statistical functions
test_scores = np.array([78, 85, 92, 67, 95, 88, 73, 91, 82, 79])

print(f"Test scores: {test_scores}")
print(f"Mean: {test_scores.mean():.2f}")
print(f"Median: {np.median(test_scores):.2f}")
print(f"Standard deviation: {test_scores.std():.2f}")
print(f"Variance: {test_scores.var():.2f}")
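
By default, `std()` and `var()` compute the population statistics (dividing by N). If the scores are a sample from a larger group, you may want the sample versions (dividing by N - 1), which the `ddof` parameter selects. A short sketch:

```python
import numpy as np

test_scores = np.array([78, 85, 92, 67, 95, 88, 73, 91, 82, 79])

population_std = test_scores.std()      # ddof=0: divide by N
sample_std = test_scores.std(ddof=1)    # ddof=1: divide by N - 1

print(f"Population std: {population_std:.2f}")
print(f"Sample std: {sample_std:.2f}")
```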

In [None]:
# Min, max, and range
print(f"Minimum score: {test_scores.min()}")
print(f"Maximum score: {test_scores.max()}")
print(f"Range: {np.ptp(test_scores)}")  # Peak-to-peak (max - min); the .ptp() method was removed in NumPy 2.0

# Finding positions
print(f"Position of min: {test_scores.argmin()}")
print(f"Position of max: {test_scores.argmax()}")

In [None]:
# Cumulative operations
daily_sales = np.array([120, 150, 180, 95, 200, 175, 160])
cumulative_sales = daily_sales.cumsum()  # Running total

print(f"Daily sales: ${daily_sales}")
print(f"Cumulative: ${cumulative_sales}")
print(f"Total sales: ${daily_sales.sum()}")

### Exercise 6.1: Grade Analysis
Analyze an array of student grades to find:
1. The class average
2. The highest and lowest grades
3. How many students passed (grade >= 70)

In [None]:
# Your code here
grades = np.array([85, 67, 92, 78, 55, 91, 73, 88, 69, 95, 82, 77])

In [None]:
# Solution
grades = np.array([85, 67, 92, 78, 55, 91, 73, 88, 69, 95, 82, 77])

# 1. Class average
average = grades.mean()
print(f"Class average: {average:.2f}")

# 2. Highest and lowest
highest = grades.max()
lowest = grades.min()
print(f"Highest grade: {highest}")
print(f"Lowest grade: {lowest}")

# 3. Students who passed
passing = grades >= 70
num_passing = passing.sum()  # True counts as 1
print(f"Students who passed: {num_passing} out of {len(grades)}")
print(f"Pass rate: {num_passing/len(grades)*100:.1f}%")

## Part 7: Random Numbers with NumPy

NumPy's random module is like a sophisticated dice-rolling system that can generate random numbers following various patterns and distributions. This is essential for simulations, testing, and machine learning applications.

In [None]:
# Set random seed for reproducibility
np.random.seed(42)

# Random integers
dice_rolls = np.random.randint(1, 7, size=10)  # 10 dice rolls
print(f"Dice rolls: {dice_rolls}")

# Random floats between 0 and 1
random_floats = np.random.random(5)
print(f"Random floats: {random_floats}")

In [None]:
# Normal distribution (bell curve)
# Generate heights with mean=170cm, std=10cm
heights = np.random.normal(170, 10, 100)

print(f"Average height: {heights.mean():.1f} cm")
print(f"Std deviation: {heights.std():.1f} cm")
print(f"Tallest: {heights.max():.1f} cm")
print(f"Shortest: {heights.min():.1f} cm")

In [None]:
# Random choice from array
colors = np.array(['red', 'blue', 'green', 'yellow', 'purple'])
random_colors = np.random.choice(colors, size=8)
print(f"Random color selection: {random_colors}")

# Shuffling an array
cards = np.arange(1, 14)  # 1 to 13 (like card values)
np.random.shuffle(cards)
print(f"Shuffled cards: {cards}")
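
The `np.random.seed` and `np.random.randint` functions used above belong to NumPy's legacy random interface, which still works. Newer code often uses a `Generator` object created with `np.random.default_rng` instead. As a sketch, the same kinds of draws look like this with a generator (the variable names are illustrative):

```python
import numpy as np

# A seeded Generator gives reproducible results without global state
rng = np.random.default_rng(42)

dice_rolls = rng.integers(1, 7, size=10)   # upper bound is exclusive, like randint
random_floats = rng.random(5)              # floats in [0, 1)
heights = rng.normal(170, 10, 100)         # mean=170, std=10, 100 samples

cards = np.arange(1, 14)
rng.shuffle(cards)                         # shuffles in place

print(f"Dice rolls: {dice_rolls}")
print(f"Shuffled cards: {cards}")
```

Because each generator carries its own state, two parts of a program can use separate generators without affecting each other's random sequences.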

### Exercise 7.1: Dice Simulation
Simulate rolling two dice 1000 times and calculate:
1. The average sum
2. How often you roll a 7
3. The most common sum

In [None]:
# Your code here

In [None]:
# Solution
# Simulate 1000 rolls of two dice
rolls = 1000
die1 = np.random.randint(1, 7, rolls)
die2 = np.random.randint(1, 7, rolls)
sums = die1 + die2

# 1. Average sum
avg_sum = sums.mean()
print(f"Average sum: {avg_sum:.2f}")

# 2. How often we roll a 7
sevens = (sums == 7).sum()
print(f"Rolled 7: {sevens} times ({sevens/rolls*100:.1f}%)")

# 3. Most common sum
unique_sums, counts = np.unique(sums, return_counts=True)
most_common_idx = counts.argmax()
print(f"Most common sum: {unique_sums[most_common_idx]} (occurred {counts[most_common_idx]} times)")

## Comprehensive Exercise: Student Grade Analyzer

Build a complete grade analysis system using NumPy arrays.

In [None]:
# Your task: Create a grade analysis system that:
# 1. Generates random test scores for 30 students (60-100 range)
# 2. Calculates letter grades (A: 90+, B: 80-89, C: 70-79, D: 60-69)
# 3. Finds class statistics (mean, median, std)
# 4. Identifies students who need help (below 70)
# 5. Calculates the grade distribution

In [None]:
# Solution
# 1. Generate random scores
np.random.seed(42)  # For reproducibility
num_students = 30
scores = np.random.randint(60, 101, num_students)

print(f"Test scores for {num_students} students:")
print(scores)
print()

# 2. Calculate letter grades
def get_letter_grades(scores):
    """Convert numeric scores to letter grades."""
    grades = np.empty(len(scores), dtype='U1')  # Unicode string of length 1
    grades[scores >= 90] = 'A'
    grades[(scores >= 80) & (scores < 90)] = 'B'
    grades[(scores >= 70) & (scores < 80)] = 'C'
    grades[(scores >= 60) & (scores < 70)] = 'D'
    return grades

letter_grades = get_letter_grades(scores)
print(f"Letter grades: {letter_grades}")
print()

# 3. Class statistics
print("Class Statistics:")
print(f"Mean score: {scores.mean():.2f}")
print(f"Median score: {np.median(scores):.2f}")
print(f"Standard deviation: {scores.std():.2f}")
print(f"Highest score: {scores.max()}")
print(f"Lowest score: {scores.min()}")
print()

# 4. Students needing help
struggling = scores < 70
struggling_count = struggling.sum()
print(f"Students needing help (< 70): {struggling_count}")
print(f"Their scores: {scores[struggling]}")
print()

# 5. Grade distribution
print("Grade Distribution:")
unique_grades, counts = np.unique(letter_grades, return_counts=True)
for grade, count in zip(unique_grades, counts):
    percentage = count / num_students * 100
    print(f"{grade}: {count} students ({percentage:.1f}%)")
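
The letter-grade step can also be written with `np.select`, which picks the first matching condition for each element. This is an alternative sketch, not a replacement for the solution above. Note that with `default='D'`, any score below 70 (including scores below 60, which the task never generates) is labeled `'D'`. The array `sample_scores` is made up for illustration.

```python
import numpy as np

sample_scores = np.array([95, 83, 71, 64, 90, 80, 70, 60])

# Conditions are checked in order; the first True one determines the grade
conditions = [sample_scores >= 90, sample_scores >= 80, sample_scores >= 70]
choices = ['A', 'B', 'C']
letters = np.select(conditions, choices, default='D')

print(f"Scores:  {sample_scores}")
print(f"Letters: {letters}")
```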

## Summary and Key Takeaways

You've learned the fundamentals of NumPy arrays:

1. **Performance**: For numerical operations on large data, NumPy arrays are typically much faster than Python lists
2. **Creation**: Multiple ways to create arrays (from lists, zeros, ones, arange, linspace)
3. **Indexing**: Powerful selection capabilities including boolean indexing
4. **Vectorization**: Operate on entire arrays without writing loops
5. **Functions**: Rich set of mathematical and statistical functions
6. **Random Numbers**: Generate random data for simulations and testing

NumPy is the foundation of scientific computing in Python. These skills will be essential as you move into data science, machine learning, and scientific programming!