# Assignment 4: Sports Analytics with NumPy

**Course**: INF 605 - Introduction to Programming - Python  
**Assignment**: NumPy Data Analysis Challenge (Lectures 12-13, 15)  
**Due Date**: Sunday, November 9, 2025  
**Total Points**: 100 points

---

## Welcome to the Google Data Lab!

Congratulations! You've been selected as a data analytics intern at Google's Advanced Data Analytics Lab. Under the direction of Dr. Sarah Chen, Senior Data Scientist, you'll be helping modernize how our company analyzes performance metrics.

The lab's mission is to provide coaches and athletes with data-driven insights to improve performance, prevent injuries, and make strategic decisions. You'll be working with real performance data, applying NumPy's powerful numerical computing capabilities to solve practical problems.

Why NumPy? Plain Python loops are slow when processing large datasets. NumPy runs its calculations in optimized, compiled C code, which typically makes them 10-100 times faster than the equivalent loops. Many professional sports organizations, from NBA teams to top football (soccer) clubs, use Python tools such as NumPy in their analytics work.

This assignment has **8 problems** that progressively build your NumPy skills, from basic array operations to building a complete analytics system.

### How to Complete Each Problem

1. **Read the situation** - Dr. Chen will explain what analysis is needed
2. **Review the requirements** - Understand the technical specifications
3. **Study the examples** - See exactly how your functions should work
4. **Read the hints** - Get specific guidance on NumPy functions to use
5. **Write your code** - Replace `pass` with your implementation
6. **Test thoroughly** - Run the provided test code to verify correctness
7. **Submit when ready** - Your code will be automatically graded

### Tips for Success

- **Start early**: Begin with Problems 1-2 to build confidence
- **Use NumPy functions**: Avoid Python loops - NumPy has faster built-in functions
- **Test frequently**: Run test code after implementing each function
- **Read hints carefully**: They tell you exactly which NumPy functions to use
- **Check documentation**: Use `help(np.function_name)` or online NumPy docs
- **Ask for help**: Attend office hours if you get stuck

### Important Rules

- Do not change function names or parameters
- Do not modify test code
- Import only `numpy` - no other libraries allowed
- Your code must use NumPy array operations (no Python loops where NumPy can do it)

Let's get started with your first assignment from Dr. Chen!

---

## Problem 1: Player Statistics Array Creation (10 points)

### The Situation

Dr. Chen greets you on your first day at the Data Lab. "Welcome to the team! I'm excited to have you here. Let's start with something fundamental - organizing our basketball team's scoring data."

She pulls up last night's game statistics on her computer screen. "We have the points scored by each of our eight players: 12, 8, 15, 22, 6, 18, 14, and 10 points. In traditional Python, we'd use a list to store these numbers. But here's the thing - when you're analyzing performance data, you need speed. Lists are fine for small datasets, but we're dealing with season-long statistics, multiple teams, and real-time analysis during games."

Dr. Chen opens a Jupyter notebook. "That's where NumPy comes in. NumPy arrays are specifically designed for numerical operations. They're stored more efficiently in memory, and all the mathematical operations are implemented in highly optimized C code. This means calculations that might take seconds with lists happen in milliseconds with NumPy."

She shows you a comparison: "For example, averaging a million numbers with a plain Python loop takes tens of milliseconds on a typical laptop. NumPy does the same calculation in about a millisecond or less. The exact figures depend on the machine, but speedups of 10-100 times are common. For real-time game analysis, this speed difference is critical."
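
The speed claim is easy to check on your own machine by timing both approaches. The sketch below is a side experiment, not assignment code: it uses the standard-library `time.perf_counter()` (the assignment itself only allows `numpy`), and the timings it prints will vary from machine to machine, so treat them as a rough indication only.

```python
import time
import numpy as np

# One million random numbers, stored once as a list and once as an array
values = np.random.default_rng(0).random(1_000_000)
values_list = values.tolist()

# Plain Python loop over the list
start = time.perf_counter()
total = 0.0
for v in values_list:
    total += v
loop_mean = total / len(values_list)
loop_seconds = time.perf_counter() - start

# Vectorized NumPy call on the array
start = time.perf_counter()
numpy_mean = np.mean(values)
numpy_seconds = max(time.perf_counter() - start, 1e-9)  # guard against a zero reading

print(f"Python loop: {loop_seconds * 1000:.1f} ms")
print(f"NumPy:       {numpy_seconds * 1000:.2f} ms")
print(f"Speedup:     {loop_seconds / numpy_seconds:.0f}x")
```

Both approaches compute the same average; only the time taken differs.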

"Your first task is simple but important: take our basketball scoring data and create a NumPy array. Once we have the data in array format, we can use all of NumPy's powerful functions for analysis. I also want you to understand the basic properties of arrays - their shape tells us how the data is organized, the dtype tells us what type of numbers we're storing, and the length tells us how many data points we have."
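
As a quick illustration of those three properties, here is a sketch using hypothetical rebound counts rather than the assignment data. It also shows one subtlety worth knowing: `len()` counts rows, while `.size` counts every element, which only matters once arrays have more than one dimension.

```python
import numpy as np

# Hypothetical rebound counts for five players (not the assignment data)
rebounds = np.array([7, 3, 11, 5, 9])
print(rebounds.shape)   # (5,) -> one dimension holding 5 values
print(rebounds.dtype)   # an integer dtype; its exact width is platform-dependent
print(len(rebounds))    # 5

# A 2D array (3 players x 2 games) has a two-part shape
two_games = np.array([[7, 8], [3, 4], [11, 10]])
print(two_games.shape)  # (3, 2) -> 3 rows (players), 2 columns (games)
print(len(two_games))   # 3: len() counts rows, not total elements
print(two_games.size)   # 6: .size counts every element
```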

### Your Task

Create two functions to work with player statistics:

**Function 1: `create_player_stats(scores_list)`**
- Takes a Python list of integer scores as input
- Converts the list to a NumPy array
- Returns the NumPy array

**Function 2: `get_array_info(arr)`**
- Takes a NumPy array as input
- Returns a dictionary with three keys:
  - `'shape'`: The shape tuple of the array
  - `'dtype'`: The data type of the array elements
  - `'length'`: The number of elements in the array

### Examples

```python
scores = create_player_stats([12, 8, 15, 22, 6, 18, 14, 10])
print(scores)
# Output: [12  8 15 22  6 18 14 10]

info = get_array_info(scores)
print(info)
# Output: {'shape': (8,), 'dtype': dtype('int64'), 'length': 8}
# (the dtype may show as int32 on Windows with NumPy versions before 2.0)

print(scores[3])  # Access fourth player's score
# Output: 22
```

```python
import numpy as np

# YOUR CODE HERE: Implement both functions

def create_player_stats(scores_list):
    pass  # Delete this and write your implementation

def get_array_info(arr):
    pass  # Delete this and write your implementation

# Test your code

# Test data from last night's game
game_scores = [12, 8, 15, 22, 6, 18, 14, 10]

# Create the array
scores_array = create_player_stats(game_scores)

# Get array information
info = get_array_info(scores_array)
```

---

## Problem 2: Basic Statistical Analysis (10 points)

### The Situation

"Excellent work!" Dr. Chen says, reviewing your array creation code. "You've got the data properly structured. Now let's do what we do best here at the lab - extract meaningful insights from the numbers."

She pulls up a coaching report template on her screen. "Every time a coach walks into this lab, they want to know three fundamental statistics about their team's performance: What's the average? Who performed the best? Who needs the most support? These are the starting points for every analysis we do."

Dr. Chen continues: "In traditional Python, you'd write a loop to sum all the values and divide by the count for the average, then write another loop to find the maximum... it gets tedious and slow. NumPy has built-in functions for all of this. `np.mean()` calculates the average in one line. `np.max()` and `np.min()` return the highest and lowest values. And here's a useful trick: `np.argmax()` and `np.argmin()` don't return the value at all. They return its position in the array, which tells you which player index scored the most or least (if two players tie, you get the first one)."
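
To see the difference between a value and its position, here is a short sketch on hypothetical assist counts (not the game data). Note the tie: two players have 9 assists, and `np.argmax()` reports the first of them.

```python
import numpy as np

# Hypothetical assist counts for six players
assists = np.array([4, 9, 2, 9, 6, 1])

print(np.mean(assists))    # 5.166666666666667
print(np.max(assists))     # 9 -> the value
print(np.argmax(assists))  # 1 -> the position of the first 9
print(np.min(assists))     # 1
print(np.argmin(assists))  # 5
print(assists[np.argmax(assists)])  # 9: indexing with argmax recovers the value
```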

She shows you last week's report: "The basketball coach wants to know: What's the team's average points per player? This tells us if we're distributing scoring well across the team or relying too heavily on one or two stars. Who's our top scorer? They might deserve recognition or extra defensive attention from opponents. Who scored the least? Maybe they need additional practice time or are having an off game."

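"One more thing," Dr. Chen adds, "always round your averages to 2 decimal places. Coaches don't need to see 13.125 - they want 13.12. It's cleaner and more professional in reports. And yes, 13.12 rather than 13.13: both `np.round()` and Python's built-in `round()` send an exact halfway value to the nearest even digit."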
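
The rounding rule is worth verifying once, because "round half to even" (sometimes called banker's rounding) surprises many people. This is only a demonstration of the behavior; the assignment does not say which rounding function you must use.

```python
import numpy as np

mean_points = np.mean(np.array([12, 8, 15, 22, 6, 18, 14, 10]))
print(mean_points)               # 13.125 (105 / 8, exactly representable)
print(np.round(mean_points, 2))  # 13.12: the halfway case goes to the even digit 2
print(round(13.125, 2))          # 13.12: Python's built-in round() agrees

# The same rule applies when rounding to whole numbers
print(np.round(2.5), np.round(3.5))  # 2.0 4.0
```

If you want a plain Python number in your dictionary rather than a NumPy scalar, wrap the result in `float()`.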

### Your Task

Implement `calculate_basic_stats(scores)` that:
- Takes a NumPy array of scores as input
- Calculates the mean (average) score
- Finds the maximum score
- Finds the minimum score
- Finds the index of the top-scoring player
- Finds the index of the lowest-scoring player
- Returns all statistics in a dictionary

### Examples

```python
scores = np.array([12, 8, 15, 22, 6, 18, 14, 10])
stats = calculate_basic_stats(scores)

print(stats)
# Output (NumPy 2.x prints these scalars as e.g. np.float64(13.12) and np.int64(22)):
# {
#     'mean': 13.12,
#     'max': 22,
#     'min': 6,
#     'top_player_index': 3,
#     'lowest_player_index': 4
# }
```

```python
# YOUR CODE HERE: Implement the statistical analysis function

def calculate_basic_stats(scores):
    pass  # Delete this and write your implementation

# Test your code

# Use the same game data
game_scores = np.array([12, 8, 15, 22, 6, 18, 14, 10])

stats = calculate_basic_stats(game_scores)

# Verify the actual player scores at those indices
print(game_scores[stats['top_player_index']])     # should equal stats['max']
print(game_scores[stats['lowest_player_index']])  # should equal stats['min']
```

---

## Problem 3: Team Season Records Analysis (15 points)

### The Situation

Dr. Chen walks over to your desk with a laptop and a look of urgency. "We have a meeting with the women's soccer coach in 30 minutes, and she needs a season performance analysis. I've got her team's game-by-game data right here."

She shows you a spreadsheet with 10 rows, each representing a game. "For each game, we tracked goals scored and goals allowed. The coach wants to understand the season at a glance: What's their goal differential for each game? Did they dominate, barely win, or struggle? How many total goals did they score versus allow? Most importantly - what's their win-loss-tie record?"

Dr. Chen continues: "This is where 2D arrays become powerful. Think of each row as a game, and each column as a statistic. The first column is goals scored, the second is goals allowed. With NumPy, we can slice out entire columns in one operation. `games[:, 0]` gives us all the goals scored - that colon-zero means 'all rows, first column'. Then we can do array arithmetic: subtract goals allowed from goals scored to get the differential for every game simultaneously. No loops needed!"

She sketches on a notepad: "A positive differential means they won - they scored more than they allowed. Negative means they lost. Zero means a tie. We can use boolean operations to count these automatically. In NumPy, when you do `differential > 0`, it creates a True/False array. Then `np.sum()` counts how many True values there are - that's your win count! Same logic for losses and ties."

"The soccer coach also wants to know their offensive and defensive performance. Sum up the first column to get total goals scored - that's offensive power. Sum the second column for total goals allowed - that's defensive vulnerability. These aggregate statistics help identify whether they need to work on scoring more goals or preventing goals."
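
The column slicing and boolean counting described above can be sketched on a small, hypothetical hockey record (four games, not the soccer data). `True` counts as 1 when summed, which is what turns a comparison into a count.

```python
import numpy as np

# Hypothetical hockey results: each row is [goals_scored, goals_allowed]
hockey = np.array([[4, 2], [1, 3], [2, 2], [5, 0]])

scored = hockey[:, 0]     # all rows, first column  -> [4 1 2 5]
allowed = hockey[:, 1]    # all rows, second column -> [2 3 2 0]
diff = scored - allowed   # element-wise subtraction -> [ 2 -2  0  5]

print(diff > 0)           # [ True False False  True]
print(np.sum(diff > 0))   # 2 wins
print(np.sum(diff < 0))   # 1 loss
print(np.sum(diff == 0))  # 1 tie
print(np.sum(scored), np.sum(allowed))  # 12 7
```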

### Your Task

Implement `analyze_season_records(games_data)` that:
- Takes a list of lists, where each inner list is `[goals_scored, goals_allowed]`
- Converts the data to a NumPy 2D array
- Extracts columns for goals scored and goals allowed
- Calculates the goal differential for each game (scored - allowed)
- Calculates total goals scored and total goals allowed for the season
- Counts wins (differential > 0), losses (differential < 0), and ties (differential == 0)
- Returns all results in a dictionary

### Examples

```python
games = [[3, 1], [2, 2], [1, 0], [2, 3], [4, 1], 
         [1, 1], [3, 2], [0, 2], [2, 1], [3, 0]]

results = analyze_season_records(games)
print(results)
# Output:
# {
#     'differential': array([2, 0, 1, -1, 3, 0, 1, -2, 1, 3]),
#     'total_scored': 21,
#     'total_allowed': 13,
#     'wins': 6,
#     'losses': 2,
#     'ties': 2
# }
```

```python
# YOUR CODE HERE: Implement the season records analysis function

def analyze_season_records(games_data):
    pass  # Delete this and write your implementation

# Test your code

# Women's soccer season data (10 games)
# Each game: [goals_scored, goals_allowed]
season_games = [
    [3, 1],   # Game 1: Won 3-1
    [2, 2],   # Game 2: Tied 2-2
    [1, 0],   # Game 3: Won 1-0
    [2, 3],   # Game 4: Lost 2-3
    [4, 1],   # Game 5: Won 4-1
    [1, 1],   # Game 6: Tied 1-1
    [3, 2],   # Game 7: Won 3-2
    [0, 2],   # Game 8: Lost 0-2
    [2, 1],   # Game 9: Won 2-1
    [3, 0]    # Game 10: Won 3-0
]

results = analyze_season_records(season_games)

# Calculate winning percentage
total_games = results['wins'] + results['losses'] + results['ties']
win_pct = (results['wins'] / total_games) * 100
print(f"Winning percentage: {win_pct:.1f}%")
```

---

## Problem 4: Player Performance Matrix Reshaping (15 points)

### The Situation

Dr. Chen brings you a new challenge from the baseball team. "The baseball coach just emailed us their batting statistics, but there's a problem - the data came in the wrong format. It's all in one long list, and we need it organized as a matrix."

She shows you the email: "We have 9 players on the team, and we're tracking 4 statistics for each player: hits, runs, RBIs, and batting average. That's 36 numbers total (9 players times 4 stats). But the data export gave us one continuous array of 36 values. We need to reshape this into a proper 9 by 4 matrix where each row represents a player and each column represents a statistic."

Dr. Chen pulls up her teaching notes. "This is a perfect use case for NumPy's `reshape()` function. Think of an array's data like a ribbon - it's stored linearly in memory. Reshaping is like folding that ribbon into rows and columns. The key is that the total number of elements must match. 9 players times 4 stats equals 36 elements, so we can reshape our 36-element array into a 9-by-4 matrix."

She continues: "Once we have the matrix properly shaped, we can do powerful analysis. We can calculate the average of each column to see team-wide stats - like average hits across all players. We can also calculate row averages to see each player's overall performance across all four categories. The `axis` parameter in NumPy is crucial here: axis=0 means 'down the columns' (giving us averages per statistic), and axis=1 means 'across the rows' (giving us averages per player)."

"There's also this operation called transpose," Dr. Chen explains, showing a diagram. "If we transpose our 9-by-4 matrix, it becomes 4-by-9. This flips rows and columns - now each row is a statistic and each column is a player. Sometimes you need data in one orientation, sometimes the other, depending on what analysis you're doing. NumPy makes this trivial with the `.T` attribute."
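
Reshaping, the `axis` parameter, and transposing can all be seen on a tiny, hypothetical export of 3 players × 2 stats (points, rebounds). The last part shows what happens when the element count doesn't match the requested shape.

```python
import numpy as np

# Hypothetical flat export: 3 players x 2 stats (points, rebounds)
flat = [10, 4, 20, 8, 15, 6]

matrix = np.array(flat).reshape(3, 2)   # 3 rows (players), 2 columns (stats)
print(matrix)
# [[10  4]
#  [20  8]
#  [15  6]]

print(matrix.mean(axis=0))  # per-stat averages (down the columns): [15.  6.]
print(matrix.mean(axis=1))  # per-player averages (across the rows): [ 7.  14.  10.5]
print(np.argmax(matrix.mean(axis=1)))  # 1 -> player with the highest row average

print(matrix.T.shape)       # (2, 3): stats become rows, players become columns

# The element count must match: 6 values cannot fill a 4 x 2 matrix
try:
    np.array(flat).reshape(4, 2)
except ValueError as err:
    print("reshape failed:", err)
```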

### Your Task

Implement `organize_batting_stats(flat_data)` that:
- Takes a list of 36 numbers (9 players × 4 statistics)
- Reshapes it into a 9×4 matrix (players as rows, stats as columns)
- Calculates the average of each statistic (mean along axis 0)
- Calculates the average for each player (mean along axis 1)
- Finds which player has the highest overall average
- Transposes the matrix to 4×9
- Returns all results in a dictionary

### Examples

```python
# 36 numbers: 9 players × 4 stats each
data = [45, 23, 38, 0.312,  # Player 0: hits, runs, RBIs, avg
        52, 28, 42, 0.325,  # Player 1
        ...
        40, 20, 33, 0.295]  # Player 8

result = organize_batting_stats(data)
# Returns:
# {
#     'matrix': array([[45, 23, 38, 0.312],
#                      [52, 28, 42, 0.325],
#                      ...]),  # 9×4
#     'avg_per_stat': array([...]),      # 4 averages
#     'avg_per_player': array([...]),    # 9 averages
#     'best_player_idx': 3,               # index of best player
#     'transposed': array([[...], ...])   # 4×9
# }
```

```python
# YOUR CODE HERE: Implement the batting statistics organizer

def organize_batting_stats(flat_data):
    pass  # Delete this and write your implementation

# Test your code

# Batting statistics for 9 players × 4 stats
# Stats: [Hits, Runs, RBIs, Batting Average]
batting_data = [
    45, 23, 38, 0.312,  # Player 0
    52, 28, 42, 0.325,  # Player 1
    38, 19, 31, 0.285,  # Player 2
    61, 35, 48, 0.342,  # Player 3 (best overall)
    42, 21, 36, 0.301,  # Player 4
    48, 25, 40, 0.318,  # Player 5
    35, 17, 28, 0.276,  # Player 6
    55, 30, 45, 0.335,  # Player 7
    40, 20, 33, 0.295   # Player 8
]

result = organize_batting_stats(batting_data)

stat_names = ['Hits', 'Runs', 'RBIs', 'Batting Avg']
for i, (stat, avg) in enumerate(zip(stat_names, result['avg_per_stat'])):
    print(f"Team average {stat}: {avg:.2f}")

for i, avg in enumerate(result['avg_per_player']):
    print(f"Player {i} average: {avg:.2f}")
```


---

## Problem 5: Advanced Performance Filtering (20 points)

### The Situation

Dr. Chen calls an urgent meeting in the conference room. The team lead is present, looking concerned. "We have a critical task," the team lead begins. "The company is reviewing our development program and our athlete support services. We need to identify which student-athletes are excelling, which ones need academic or performance support, and everything in between."

Dr. Chen pulls up a database. "We have performance scores for 50 student-athletes across all sports, on a 0-100 scale. These scores combine performance metrics, academic standing, and leadership. The team lead has given us specific criteria to analyze."

The team lead lists the requirements: "First, we need to identify top performers - anyone scoring above 75. They qualify for continued scholarship support. Second, find our elite athletes - the top 10 percent. They might be candidates for additional training opportunities or leadership roles. Third, flag anyone below 60. These students need immediate intervention - extra tutoring, modified training schedules, or counseling support. Fourth, there's a middle group between 70 and 85 who are doing well but have room to grow. We want to target them for development programs."

Dr. Chen turns to you: "This is where NumPy's boolean indexing becomes incredibly powerful. In traditional Python, you'd write multiple loops to filter data. With NumPy, you create what's called a boolean mask - an array of True and False values based on conditions. For example, `scores > 75` creates a mask where True appears for every score above 75. Then you can use that mask to filter the original array: `scores[scores > 75]` gives you only those high scores."

She continues: "You can also combine conditions using the `&` operator for AND and `|` for OR. But here's the tricky part - you need parentheses around each condition. So `(scores >= 70) & (scores <= 85)` finds everyone in that improvement range. NumPy's `np.where()` function is also useful - it lets you categorize values. You can create labels like 'High', 'Medium', 'Low' based on score thresholds."
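
Here is a small sketch of masks, combined conditions, and `np.where()` on six hypothetical scores. Two `np.where()` calls are nested to get three labels from two thresholds; the comment explains why the parentheses matter.

```python
import numpy as np

# Hypothetical scores for six athletes
scores = np.array([55, 62, 78, 91, 70, 85])

high_mask = scores > 75
print(high_mask)            # [False False  True  True False  True]
print(scores[high_mask])    # [78 91 85]

# Parentheses are required: & binds more tightly than >= and <=
middle = scores[(scores >= 70) & (scores <= 85)]
print(middle)               # [78 70 85]

# Nested np.where builds three labels from two thresholds
labels = np.where(scores >= 75, 'High',
                  np.where(scores >= 60, 'Medium', 'Low'))
print(labels)  # ['Low' 'Medium' 'High' 'High' 'Medium' 'High']
```

When there are many categories, `np.select()` (a list of conditions, a list of labels, and a `default`) can be easier to read than deeply nested `np.where()` calls.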

"Finally," Dr. Chen adds, "we're using `np.percentile(scores, 90)` to find the 90th percentile threshold. Roughly 90% of athletes score below this value, and roughly the top 10% score at or above it. By default NumPy interpolates between neighboring scores, so the split is rarely exact. Because the cutoff is relative to the data, it picks out the top group whether this year's scores run high or low."
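
To see the interpolation at work, here is a sketch with ten hypothetical scores, 1 through 10. By default `np.percentile()` interpolates linearly between the two nearest sorted values, so the 90th percentile lands between 9 and 10 rather than on an actual score.

```python
import numpy as np

scores = np.arange(1, 11)          # hypothetical scores 1, 2, ..., 10
threshold = np.percentile(scores, 90)
print(threshold)                   # approximately 9.1 (between 9 and 10)

elite = scores[scores >= threshold]
print(elite)                       # [10] -> the top 10% of 10 athletes
```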

### Your Task

Implement `analyze_athlete_performance(scores)` that:
- Filters athletes scoring above 75 (top performers) and reports their count and average score (`scholarship_count` and `scholarship_avg` in the example below)
- Calculates the 90th percentile and identifies elite athletes (top 10%, scoring at or above that threshold)
- Finds struggling athletes scoring below 60
- Identifies improvement candidates (scores between 70 and 85, inclusive)
- Creates categories for all athletes: 'High' (>= 75), 'Medium' (60 to below 75), 'Low' (< 60)
- Returns comprehensive analysis in a dictionary

### Examples

```python
scores = np.array([45, 67, 89, 92, 58, 73, 81, ...])  # 50 athletes

result = analyze_athlete_performance(scores)
# Returns:
# {
#     'scholarship_count': 15,
#     'scholarship_avg': 83.4,
#     'elite_count': 5,
#     'elite_threshold': 88.0,
#     'struggling_count': 8,
#     'improvement_count': 12,
#     'categories': array(['Low', 'Medium', 'High', ...])
# }
```


```python
# YOUR CODE HERE: Implement the athlete performance analyzer

def analyze_athlete_performance(scores):
    pass  # Delete this and write your implementation

# Test your code

# Generate realistic test data (50 student-athletes)
np.random.seed(42)  # For reproducible results
athlete_scores = np.random.randint(40, 100, size=50)

result = analyze_athlete_performance(athlete_scores)

unique, counts = np.unique(result['categories'], return_counts=True)
for category, count in zip(unique, counts):
    print(f"{category}: {count} athletes")

# Verify categorization
for i in range(10):
    print(f"Athlete {i}: {result['categories'][i]} (score: {athlete_scores[i]})")
```


---

## Problem 6: Multi-Sport Performance Comparison (20 points)

### The Situation

Dr. Chen rushes into the lab with exciting news. "The company is expanding our analytics program! We're no longer just analyzing one sport at a time - we need a unified system that works across basketball, soccer, and swimming. Each sport has completely different metrics, but we need to compare athlete performance fairly."

She pulls up three datasets on her screen: "Here's the challenge: Basketball tracks points scored per game. Soccer combines goals and assists. Swimming measures race times in seconds. These are fundamentally different - higher is better for basketball and soccer, but lower is better for swimming. How do we compare a basketball player who averages 18 points to a swimmer who completes races in 52 seconds? They're not comparable in raw form."

Dr. Chen explains the solution: "We normalize everything to a 0-100 scale. For sports where higher is better, we use standard min-max normalization: take each value, subtract the minimum, divide by the range (max minus min), and multiply by 100. This maps the worst performance to 0 and the best to 100. For swimming, where lower times are better, we invert the scale - the fastest swimmer gets 100, the slowest gets 0."
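
The normalization rule translates almost directly into NumPy. `normalize_0_100` below is a hypothetical helper, not part of the required API. It also handles one case the description leaves open: if every value is identical the range is zero, and the sketch assumes everyone should land at the midpoint (50) rather than dividing by zero.

```python
import numpy as np

def normalize_0_100(values, lower_is_better=False):
    """Hypothetical helper: min-max scale values to 0-100 (inverted if lower is better)."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    if hi == lo:
        # Assumption: with no spread, place everyone at the midpoint
        return np.full_like(values, 50.0)
    scaled = (values - lo) / (hi - lo) * 100   # worst -> 0, best -> 100
    # For times, 100 - scaled equals (max - x) / (max - min) * 100
    return 100 - scaled if lower_is_better else scaled

print(normalize_0_100([10, 15, 20]))                              # [  0.  50. 100.]
print(normalize_0_100([50.0, 55.0, 60.0], lower_is_better=True))  # [100.  50.   0.]
```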

She continues: "Once normalized, we can identify the MVP for each sport. We calculate each player's average performance across all their games. For basketball and soccer, the highest average wins. For swimming, the lowest average time wins - remember, faster is better. We also want consistency metrics. Standard deviation tells us how reliable an athlete is. A player who scores 20, 18, 19, 21 is more consistent than one who scores 30, 10, 25, 15, even if their averages are similar."
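
Here is the consistency comparison from the paragraph above, computed with `axis=1` so each row (player) gets its own average and standard deviation. Note that `np.std()` computes the population standard deviation (`ddof=0`) by default. The last two lines show how the MVP index flips between `np.argmax()` and `np.argmin()` depending on whether higher or lower is better.

```python
import numpy as np

# The two scoring patterns from the text, one row per player
games = np.array([[20, 18, 19, 21],
                  [30, 10, 25, 15]])

averages = games.mean(axis=1)  # [19.5 20. ]
spread = games.std(axis=1)     # about [1.118 7.906]: lower means more consistent
print(averages, spread)

print(np.argmax(averages))  # 1 -> MVP when higher is better
print(np.argmin(averages))  # 0 -> MVP when lower is better (e.g. swim times)
```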

"Finally," Dr. Chen adds, "we use broadcasting to compare individuals to team averages. If the team average is 15 points and a player averages 18, they're above average. Broadcasting lets NumPy compare each player's average to the team average without writing loops - it automatically expands the team average to match the array shape and does element-wise comparison."
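
Below is a sketch of both kinds of broadcasting on hypothetical data (3 players × 2 games). The first compares an array with a single number; the second subtracts a length-2 row of per-game means from every row of a 3 × 2 matrix.

```python
import numpy as np

# Hypothetical 3 players x 2 games
data = np.array([[10, 12],
                 [20, 22],
                 [15, 17]])

player_avgs = data.mean(axis=1)   # [11. 21. 16.]
team_avg = data.mean()            # 16.0 (mean of all six values)

above = player_avgs > team_avg    # the scalar is broadcast across all players
print(above)                      # [False  True False]
print(np.sum(above))              # 1 (16.0 is not strictly above 16.0)

# Broadcasting between arrays: subtract each game's mean (shape (2,))
# from every row of the (3, 2) matrix
print(data - data.mean(axis=0))
# [[-5. -5.]
#  [ 5.  5.]
#  [ 0.  0.]]
```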

### Your Task

Implement `analyze_multi_sport_performance(sport_data, sport_type)` that:
- Takes a 2D array (players × games) and sport type ('basketball', 'soccer', or 'swimming')
- Calculates player averages (mean across games, axis=1)
- Calculates consistency (standard deviation across games, axis=1)
- Calculates overall team average
- Normalizes all scores to 0-100 scale (invert for swimming)
- Identifies the MVP index (highest average for basketball/soccer, lowest for swimming)
- Counts how many players are above team average
- Returns comprehensive analysis dictionary

### Examples

```python
# Basketball: 12 players × 5 games (higher scores = better)
basketball = np.array([[15, 18, 16, 20, 17], ...])
result = analyze_multi_sport_performance(basketball, 'basketball')

# Swimming: 10 swimmers × 6 races (lower times = better)
swimming = np.array([[52.3, 51.8, 52.1, ...], ...])
result = analyze_multi_sport_performance(swimming, 'swimming')
```

```python
# YOUR CODE HERE: Implement the multi-sport analyzer

def analyze_multi_sport_performance(sport_data, sport_type):
    pass  # Delete this and write your implementation

# Test your code with different sports

# Basketball data: 12 players × 5 games (points scored)
np.random.seed(42)
basketball_data = np.random.randint(10, 30, size=(12, 5))

b_result = analyze_multi_sport_performance(basketball_data, 'basketball')

# Soccer data: 15 players × 8 games (goals + assists)
soccer_data = np.random.randint(0, 6, size=(15, 8))

s_result = analyze_multi_sport_performance(soccer_data, 'soccer')

# Swimming data: 10 swimmers × 6 races (times in seconds - lower is better)
swimming_data = np.random.uniform(50.0, 60.0, size=(10, 6))

sw_result = analyze_multi_sport_performance(swimming_data, 'swimming')
```


---

## Problem 7: Season Trend Analysis with Linear Algebra (20 points)

### The Situation

Dr. Chen walks into the lab looking excited. "The volleyball coach just called with an urgent question: 'Is my team getting better or worse over the season?' We have their game-by-game performance scores for 20 matches, but just looking at the numbers doesn't tell the story. We need to use linear algebra to find the trend."

She pulls up a whiteboard and draws a scatter plot. "Imagine plotting game number on the x-axis and performance score on the y-axis. The points will scatter around, but there's usually an underlying trend - the team might be improving, declining, or staying steady. Linear regression fits a straight line through these points that best represents the overall pattern. The slope of that line tells us everything: positive means improving, negative means declining, zero means no trend."

Dr. Chen continues: "NumPy's `polyfit()` function does the heavy lifting. We give it our x-values (game numbers 1 through 20) and y-values (performance scores), and tell it we want a degree 1 polynomial - which is just a straight line. It returns two coefficients, the slope and then the y-intercept (NumPy lists the highest-power coefficient first). These define our trend line with the formula: performance = slope × game_number + intercept."
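
A minimal sketch of fitting and predicting, using four hypothetical games rather than the volleyball data. The coefficients are unpacked in the order `np.polyfit()` returns them, and `np.polyval()` is shown as an equivalent way to evaluate the line.

```python
import numpy as np

games = np.array([1, 2, 3, 4])            # hypothetical x-values: game numbers
scores = np.array([3.0, 5.0, 4.0, 6.0])   # hypothetical y-values: performance

slope, intercept = np.polyfit(games, scores, 1)  # highest power first
print(slope, intercept)                          # approximately 0.8 and 2.5

future = np.array([5, 6, 7])
print(slope * future + intercept)                # approximately [6.5 7.3 8.1]
print(np.polyval([slope, intercept], future))    # same predictions via polyval
```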

She writes on the board: "Once we have the slope and intercept, we can do three powerful things. First, calculate the correlation coefficient using `np.corrcoef()`. It returns a 2×2 correlation matrix, so the coefficient between game numbers and scores is the off-diagonal entry `[0, 1]`. This number between -1 and 1 tells us how well the data fits our line. Values close to 1 mean a strong positive trend, close to -1 mean a strong negative trend, and close to 0 mean no clear trend. Second, we can predict future games by plugging game numbers 21, 22, 23 into our trend equation. Third, we calculate residuals - the differences between actual scores and our trend line - to see how far individual games stray from the trend."
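
Continuing with the same four hypothetical games, the sketch below extracts the correlation from the 2×2 matrix and computes residuals. It also shows why a plain average of residuals is not informative: for a least-squares line with an intercept, the residuals always average to roughly zero, so the mean *absolute* residual is a more useful measure of spread.

```python
import numpy as np

games = np.array([1, 2, 3, 4])
scores = np.array([3.0, 5.0, 4.0, 6.0])
slope, intercept = np.polyfit(games, scores, 1)

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is r
r = np.corrcoef(games, scores)[0, 1]
print(r)   # approximately 0.8

residuals = scores - (slope * games + intercept)   # actual - predicted
print(residuals)                  # approximately [-0.3  0.9 -0.9  0.3]
print(residuals.mean())           # approximately 0: least-squares residuals cancel out
print(np.abs(residuals).mean())   # approximately 0.6: a more informative spread measure
```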

"The coach can use this analysis for strategic planning," Dr. Chen explains. "If the slope is positive and the correlation is strong, they know their training program is working. If it's negative, they need to change their approach. If there's no clear trend, performance might be erratic and they need to focus on consistency."

### Your Task

Implement `analyze_season_trends(game_scores)` that:
- Takes a 1D array of 20 game performance scores
- Creates game numbers array (1 through 20)
- Fits a linear trend line (degree 1 polynomial)
- Calculates the correlation coefficient
- Predicts scores for games 21, 22, 23
- Calculates residuals (actual - predicted)
- Determines if team is improving (slope > 0)
- Returns comprehensive trend analysis

### Examples

```python
# Improving team: scores increase over season
scores = np.array([65, 68, 67, 70, 72, 71, 74, 75, ...])
result = analyze_season_trends(scores)
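# Returns (illustrative values; yours depend on the full season's scores):
# {
#     'slope': 0.85,          # Improving by 0.85 points per game
#     'intercept': 63.2,
#     'correlation': 0.94,    # Strong positive correlation
#     'improving': True,
#     'predictions': array([81.1, 82.0, 82.8]),  # Games 21-23
#     'avg_residual': 2.3     # likely the mean *absolute* residual; raw least-squares residuals average to ~0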
# }
```

```python
# YOUR CODE HERE: Implement the season trend analyzer

def analyze_season_trends(game_scores):
    pass  # Delete this and write your implementation

# Test your code with different trend scenarios

# Scenario 1: Improving team
improving_scores = np.array([
    65, 68, 67, 70, 72, 71, 74, 75, 73, 76,
    78, 77, 80, 79, 81, 83, 82, 84, 85, 86
])

result1 = analyze_season_trends(improving_scores)

# Scenario 2: Declining team
declining_scores = np.array([
    85, 83, 84, 81, 80, 78, 79, 76, 75, 74,
    72, 73, 70, 69, 68, 66, 67, 65, 63, 62
])

result2 = analyze_season_trends(declining_scores)

# Scenario 3: Stable team (no clear trend)
np.random.seed(42)
stable_scores = np.random.normal(75, 5, 20)  # Mean 75, std 5

result3 = analyze_season_trends(stable_scores)
```


---

## Problem 8: Complete Athletics Analytics System (30 points)

### The Situation

It's your final week at the Data Lab, and Dr. Chen calls you into her office with the team lead. "You've done exceptional work this project cycle," the team lead begins. "Every piece you've built - from basic statistics to trend analysis - has been invaluable. But now we need you to bring it all together."

Dr. Chen pulls up a system architecture diagram on her screen. "The product team needs a complete, integrated analytics platform. Right now, when a coach comes to us with data, we run different Python scripts for different analyses. We need one unified system that can:

1. Import data from any sport in any format - lists, arrays, even CSV-like nested lists
2. Store that data efficiently with proper organization by sport
3. Calculate comprehensive statistics - means, medians, standard deviations, percentiles
4. Rank players automatically, handling both 'higher is better' and 'lower is better' sports
5. Identify performance trends across seasons
6. Generate complete reports that coaches can actually use"

The team lead adds: "This system will be used daily. Basketball, soccer, swimming, track, volleyball - every sport. It needs to be flexible enough to handle different data structures but consistent enough that coaches get reliable results every time."

Dr. Chen explains the design: "We're using a class-based approach for organization. Think of the class as a container that holds all the performance data and provides methods to analyze it. You'll import data into a dictionary where sport names are keys and NumPy arrays are values. Each analysis method - statistics, rankings, trends - operates on the data for one sport at a time. The `generate_report()` method brings everything together, creating a comprehensive analysis package."
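
The dictionary-of-arrays design can be sketched with a deliberately tiny class. `ScoreVault`, `store()`, and `team_mean()` are hypothetical names chosen to differ from the methods the assignment requires. `np.asarray()` is used here as one way to accept either lists (including nested lists) or existing arrays.

```python
import numpy as np

class ScoreVault:
    """Hypothetical mini-container illustrating the dictionary-of-arrays design."""

    def __init__(self):
        self.data = {}  # sport name -> NumPy array

    def store(self, sport, values):
        # np.asarray converts lists (including nested lists) and leaves arrays as-is
        self.data[sport] = np.asarray(values)

    def team_mean(self, sport):
        # Each analysis method looks up one sport's array by name
        return float(np.mean(self.data[sport]))

vault = ScoreVault()
vault.store('volleyball', [[21, 25], [18, 23]])
print(vault.data['volleyball'].shape)  # (2, 2)
print(vault.team_mean('volleyball'))   # 21.75
```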

She continues: "For rankings, you need to handle the direction. Basketball scores should be ranked high-to-low, so the highest scorer is ranked first. Swimming times should be ranked low-to-high, so the fastest swimmer is first. Use `np.argsort()` - it returns indices in sorted order. For descending order (basketball), reverse the result with `[::-1]`. For ascending order (swimming), use it as-is."
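
Here is a short sketch of both ranking directions on four hypothetical player averages. `np.argsort()` returns the indices that would sort the array in ascending order, and `[::-1]` flips them to descending.

```python
import numpy as np

averages = np.array([18.2, 22.5, 15.0, 20.1])  # hypothetical player averages

ascending = np.argsort(averages)   # [2 0 3 1]: smallest first (e.g. fastest swim time)
descending = ascending[::-1]       # [1 3 0 2]: largest first (e.g. top scorer)
print(ascending, descending)
print(descending[:3])              # [1 3 0]: the top three indices
```

One design note: reversing an ascending sort also reverses the order of tied players. If ties should keep their original order, `np.argsort(-averages, kind='stable')` is one alternative.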

"The trend analysis method should only work with 2D arrays," Dr. Chen adds. "If a coach gives you player data across multiple games - like a 12 by 5 array for 12 players over 5 games - you can analyze how the team's average performance changed over those games. Calculate the mean score for each game (axis=0 averages down the columns), then fit a trend line through those game averages. If the data is just a 1D array, return None - there's no temporal trend to analyze."
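
The trend step can be sketched as a standalone function. `game_trend` and its dictionary keys are hypothetical, and the sketch assumes games are the columns of a players × games array, as Dr. Chen describes, so `axis=0` yields one average per game.

```python
import numpy as np

def game_trend(data):
    """Hypothetical helper: trend of the per-game team average, or None for 1D data."""
    data = np.asarray(data)
    if data.ndim != 2:
        return None                          # no games axis -> no trend to analyze
    game_avgs = data.mean(axis=0)            # one average per game (column)
    game_numbers = np.arange(1, data.shape[1] + 1)
    slope, _ = np.polyfit(game_numbers, game_avgs, 1)
    return {'slope': slope, 'improving': bool(slope > 0), 'game_averages': game_avgs}

print(game_trend([[10, 12, 14], [20, 22, 24]]))  # slope approximately 2.0, improving True
print(game_trend([10, 12, 14]))                  # None
```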

The team lead concludes: "This system represents everything we hoped to build when we created this lab. Show us what you've learned, and create something the whole product team can use for years to come."

### Your Task

Create an `AthleticsAnalyticsSystem` class with the following methods:

**`__init__(self)`**
- Initialize `self.sports_data` as an empty dictionary

**`import_sport_data(self, sport_name, data, data_type='array')`**
- If `data_type` is 'list', convert data to NumPy array
- Otherwise, store data as-is
- Save to `self.sports_data[sport_name]`

**`calculate_team_statistics(self, sport_name)`**
- Calculate mean, median, std, min, max, 25th percentile, 75th percentile
- Return dictionary with all statistics

**`rank_players(self, sport_name, reverse=False)`**
- If data is 2D, calculate player averages (mean across games)
- If data is 1D, use scores directly
- Sort indices: ascending if `reverse=True` (swimming), descending otherwise
- Return array of ranked player indices

**`identify_trends(self, sport_name)`**
- Only works with 2D data (returns None for 1D)
- Calculate average score per game (mean across players, axis=0)
- Fit linear trend through game averages
- Return dictionary with slope, improving status, game averages

**`generate_report(self, sport_name)`**
- Call all analysis methods for the sport
- Combine results into comprehensive report dictionary
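- Include the sport name, statistics, top 3 players, and trends (if available); the example and test code below read these as `report['statistics']`, `report['top_3_players']`, and `report['trends']`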

### Examples

```python
import numpy as np

system = AthleticsAnalyticsSystem()

# Import basketball data (12 players × 5 games)
basketball = np.random.randint(10, 30, size=(12, 5))
system.import_sport_data('basketball', basketball)

# Generate complete report
report = system.generate_report('basketball')
print(report['statistics']['mean'])  # Team average
print(report['top_3_players'])       # e.g. [3, 7, 1] (player indices; varies with the random data)
print(report['trends']['improving']) # True/False
```

```python
import numpy as np

# YOUR CODE HERE: Implement the complete AthleticsAnalyticsSystem class

class AthleticsAnalyticsSystem:
    pass  # Delete this and implement the entire class

# Comprehensive test of the complete system

system = AthleticsAnalyticsSystem()

# Import basketball data
np.random.seed(42)
basketball_data = np.random.randint(10, 30, size=(12, 5))
system.import_sport_data('basketball', basketball_data)

# Import soccer data
soccer_data = np.random.randint(0, 6, size=(15, 8))
system.import_sport_data('soccer', soccer_data)

# Import swimming data
swimming_data = np.random.uniform(50.0, 60.0, size=(10, 6))
system.import_sport_data('swimming', swimming_data)

# Generate and display reports
b_report = system.generate_report('basketball')

for i, player_idx in enumerate(b_report['top_3_players'], 1):
    avg = np.mean(basketball_data[player_idx])
    print(f"Basketball #{i}: Player {player_idx} (avg: {avg:.1f} pts)")

if b_report['trends']:
    print(f"Basketball trend: {'improving' if b_report['trends']['improving'] else 'declining'}")

s_report = system.generate_report('soccer')

for i, player_idx in enumerate(s_report['top_3_players'], 1):
    avg = np.mean(soccer_data[player_idx])
    print(f"Soccer #{i}: Player {player_idx} (avg: {avg:.1f} goals)")

sw_report = system.generate_report('swimming')

# For swimming, use reverse=True to rank by lowest time first
swimming_rankings = system.rank_players('swimming', reverse=True)
for i, swimmer_idx in enumerate(swimming_rankings[:3], 1):
    avg = np.mean(swimming_data[swimmer_idx])
    print(f"Swimming #{i}: Swimmer {swimmer_idx} (avg: {avg:.2f}s)")
```


---

## Congratulations on Completing Your Analytics Internship!

### What You've Accomplished

**Problem 1 (10 pts) - Player Statistics Arrays**
- Mastered NumPy array creation from Python lists
- Learned to inspect array properties: shape, dtype, length
- Understood why NumPy arrays are faster than Python lists

**Problem 2 (10 pts) - Basic Statistical Analysis**
- Applied fundamental NumPy statistical functions: mean, max, min
- Used argmax and argmin to find positions of extreme values
- Calculated meaningful sports statistics for coaching reports

**Problem 3 (15 pts) - Season Records Analysis**
- Worked with 2D NumPy arrays representing game-by-game data
- Mastered array slicing to extract columns of data
- Performed element-wise arithmetic for differential calculations
- Used boolean operations to count wins, losses, and ties

**Problem 4 (15 pts) - Matrix Reshaping**
- Reshaped 1D arrays into 2D matrices for better organization
- Calculated statistics along different axes (axis=0 vs axis=1)
- Applied transpose operations to flip data perspectives
- Identified top performers using aggregated metrics

**Problem 5 (20 pts) - Advanced Filtering**
- Implemented boolean indexing for sophisticated data filtering
- Combined multiple conditions with the `&` (AND) and `|` (OR) operators
- Calculated percentiles to identify elite performers
- Used `np.where` for multi-level categorization
- Created actionable insights for athlete support programs

**Problem 6 (20 pts) - Multi-Sport Comparison**
- Normalized data across different sports for fair comparisons
- Handled inverted scales (swimming where lower is better)
- Applied broadcasting for efficient team-average comparisons
- Calculated consistency metrics with standard deviation
- Built flexible analysis supporting multiple sports

**Problem 7 (20 pts) - Trend Analysis**
- Fit least-squares trend lines with `np.polyfit`
- Calculated correlation coefficients to measure trend strength
- Made predictions using fitted mathematical models
- Computed residuals to assess model accuracy
- Provided actionable coaching insights from mathematical analysis

**Problem 8 (30 pts) - Complete Analytics System**
- Integrated all NumPy concepts into one cohesive system
- Built a class-based architecture for professional code organization
- Implemented flexible data import supporting multiple formats
- Created comprehensive reporting combining all analysis types
- Delivered a production-ready tool the product team can actually use

### Skills You've Mastered

**NumPy Fundamentals:**
- Array creation, reshaping, and transposition
- Indexing, slicing, and boolean masking
- Array arithmetic and broadcasting
- Axis-based operations for multidimensional data

**Statistical Analysis:**
- Descriptive statistics: mean, median, std, min, max
- Percentile calculations and rank ordering
- Correlation analysis and trend identification
- Data normalization for cross-domain comparisons

**Professional Development:**
- Writing efficient, vectorized code (no unnecessary loops)
- Building reusable, modular analysis systems
- Creating comprehensive reports from raw data
- Applying mathematical concepts to real-world problems

### Why These Skills Matter

You now have the foundational skills used by data scientists and analysts in:
- **Sports Analytics**: NBA and NFL franchises and professional soccer clubs employ analysts like you
- **Finance**: Stock market analysis, risk assessment, portfolio optimization
- **Healthcare**: Patient data analysis, medical imaging, research studies
- **Research**: Scientific computing, simulation, experimental data analysis
- **Engineering**: Signal processing, optimization, computer vision

NumPy is the foundation for Python's entire data science ecosystem:
- **pandas**: Built on NumPy for data manipulation (coming next!)
- **scikit-learn**: Machine learning algorithms use NumPy arrays
- **TensorFlow/PyTorch**: Deep learning frameworks built on NumPy concepts
- **Matplotlib**: The standard plotting library, which works directly with NumPy arrays

### Dr. Chen's Final Words

"I'm incredibly proud of the work you've done this project cycle. When you started, you were learning basic Python syntax. Now you're building professional-grade analytics systems that could genuinely help our product team make better decisions.

You've learned to think like a data scientist: start with messy data, apply mathematical transformations, extract insights, and communicate results clearly. These are the exact skills that companies around the world are looking for.

More importantly, you've learned that data analysis isn't just about running calculations - it's about answering real questions that matter to people. Every function you wrote in this assignment solves an actual problem that coaches, athletes, and administrators face.

Keep this momentum going. The next part of the course builds on NumPy with pandas for even more powerful data analysis. You're ready for it.

Excellent work. Welcome to the world of data science."

---

*Assignment 4 for INF 605 - Introduction to Programming - Python*  
*Google - Prof. Rongyu Lin*  
*Google Data Analytics Lab - Dr. Chen, Senior Data Scientist*