# Final Project Topic Bank

These examples are meant to spark ideas, not to limit you. You are encouraged to propose your own question if it is statistically meaningful and feasible.

## How to Use This List

When browsing ideas, ask yourself:
- What is the real question?
- What variables would I need?
- Which methods from class would help answer it?
- Can I explain the result in plain language?

## Example Directions

| Area | Example question | Possible methods | Possible data sources |
|---|---|---|---|
| Sports | Do teams perform differently at home versus away? | confidence intervals, paired comparisons, regression | sports-reference, Kaggle sports data |
| Sports | Does rest time affect shooting percentage or scoring? | regression, correlation, visualization | NBA, WNBA, or NCAA game logs |
| Housing | Which apartment features are most associated with rent? | regression, residual analysis, visualization | Zillow, city open-data portals |
| Transportation | Do average delays differ by airline or airport? | ANOVA, confidence intervals, boxplots | U.S. Bureau of Transportation Statistics |
| Public health | Are county-level health outcomes associated with income or access variables? | correlation, regression, limitations discussion | CDC, County Health Rankings, data.gov |
| Education | Do outcomes differ across course sections, modalities, or majors? | proportion tests, chi-square, ANOVA | instructor-approved class or campus data |
| Business | Are customer ratings different across product categories? | ANOVA, confidence intervals, categorical summaries | public review datasets, Kaggle |
| Marketing | Is conversion rate different across two campaigns? | confidence intervals for proportions, two-proportion hypothesis tests | simulated or public A/B testing data |
| Environment | Is air quality related to temperature, wind, or season? | regression, grouped comparisons, visualization | EPA, NOAA, local open-data portals |
| Media | Do longer movies receive higher ratings? | regression, correlation, outlier analysis | IMDb-style datasets, TMDb exports |
| Politics and policy | Are turnout rates associated with demographic or geographic variables? | regression, chi-square, proportions | election datasets, Census data |
| Campus life | Are wait times, attendance, or survey responses different across times or groups? | confidence intervals, ANOVA, chi-square | self-collected or instructor-approved campus data |
| Probability and risk | How variable would outcomes be in a short season, hiring pipeline, or screening process? | probability models, simulation, expected value | simulated data with a real motivating context |
| Operations | What is the chance a service system runs over capacity? | Poisson or other distributions, simulation | queueing-style public or simulated data |
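
To make one row concrete, here is a minimal sketch of how the Marketing question (is conversion rate different across two campaigns?) might be set up in Python. It is only an illustration, not a required approach: the function name `two_proportion_summary` and the counts in the usage example are made up for demonstration, and the code assumes independent samples that are large enough for the normal approximation to be reasonable.

```python
import math


def two_proportion_summary(conv_a, n_a, conv_b, n_b, z_crit=1.96):
    """Compare two conversion rates (hypothetical helper for illustration).

    conv_a, conv_b: number of conversions in each campaign
    n_a, n_b:       number of visitors in each campaign
    z_crit:         critical value for the interval (1.96 gives about 95%)
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    diff = p_b - p_a

    # Confidence interval for p_b - p_a uses the unpooled standard error.
    se_ci = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    ci = (diff - z_crit * se_ci, diff + z_crit * se_ci)

    # Test of H0: p_a = p_b uses the pooled standard error.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se_test = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = diff / se_test

    # Two-sided p-value from the standard normal CDF, written with math.erf.
    normal_cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - normal_cdf)

    return {"p_a": p_a, "p_b": p_b, "diff": diff, "ci": ci, "z": z, "p_value": p_value}


if __name__ == "__main__":
    # Made-up counts, for illustration only (not real campaign data).
    result = two_proportion_summary(conv_a=120, n_a=1000, conv_b=150, n_b=1000)
    for name, value in result.items():
        print(name, value)
```

Whatever tool you use, the plain-language interpretation still matters most: report the estimated difference, the interval around it, and what that difference would mean for the people running the campaigns.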

## Advice for Choosing a Good Topic

A good topic usually has these features:
- one main question,
- a dataset you can actually explain,
- at least one comparison or relationship worth interpreting,
- a result that would matter to a real audience.

A weak topic usually has at least one of these problems:
- too many unrelated questions,
- a dataset with no context,
- methods chosen only because they look advanced,
- a question that cannot be answered with the available variables.
