BADM 554 - Data Foundations
Program-level details: See program/CURRICULUM.md
Credits: 4 | Term: Fall 1 (Weeks 1-8)
Course Vision
Students develop proficiency in SQL, relational data modeling, and Python data wrangling so they can design, query, and maintain data systems. By course end, each student will have designed a normalized database schema, built a complete ETL pipeline in Python, and gained a working understanding of cloud data infrastructure.
Learning Outcomes (L-C-E Framework)
Literacy (Foundational Awareness)
- L1: Understand relational database concepts (tables, keys, relationships) and explain why normalization matters
- L2: Read and interpret SQL queries to identify what data they retrieve
- L3: Recognize when data needs cleaning and describe common data quality issues
Competency (Applied Skills)
- C1: Write SQL queries to extract, transform, and aggregate data from relational databases
- C2: Design a normalized database schema (3NF) for a given business problem
- C3: Use Python (pandas, sqlalchemy) to build a complete ETL pipeline
- C4: Connect to cloud databases and run queries in browser-based SQL environments
Expertise (Advanced Application)
- E1: Evaluate design trade-offs (normalization vs. denormalization) for specific use cases
- E2: Optimize queries for performance and design effective indexes
- E3: Build a production-ready data pipeline with error handling and logging
Week-by-Week Breakdown
| Week | Topic | Lectures | Project Work | Studio Session | Assessment |
|---|---|---|---|---|---|
| 1 | Relational databases + SQL SELECT fundamentals | 3 videos (30 min each) | Project 1A: Design database for case | Project kickoff - database design walkthrough | Quiz (L1-L2) |
| 2 | JOINs, subqueries, aggregations | 3 videos | Project 1 work: Write queries | SQL deep-dive - JOIN patterns + common mistakes | Quiz (C1) |
| 3 | Data modeling + Entity-Relationship diagrams | 2 videos + Jupyter notebook | Project 1 work: Refine schema | ER diagram workshop - using Lucidchart/DrawIO | Project 1 due |
| 4 | Python fundamentals + pandas intro | 3 videos | Project 2A: Data wrangling | Pandas fundamentals - DataFrame operations in Jupyter | DataCamp assignment |
| 5 | ETL pipelines + Python (pandas, sqlalchemy) | 3 videos | Project 2 work: Build ETL | ETL pipeline workshop - error handling, logging | Code review (peer) |
| 6 | NoSQL basics (MongoDB) + API data ingestion | 2 videos | Project 2 work: Complete ETL | MongoDB + APIs - document databases, Yelp/weather API | Mid-course checkpoint |
| 7 | Indexing, optimization, cloud databases | 2 videos | Project 3A: Cloud database setup | AWS RDS setup - cloud database hands-on | AWS setup quiz |
| 8 | Data quality, governance, synthesis | 1 video + review | Project 3 complete + portfolio reflection | Final presentations - each student demos pipeline | Projects 2 & 3 due |
Projects (3 per course)
Project 1: Relational Database Design (Weeks 1-3, Individual, 20% of grade)
Problem Statement: You’ve been hired by a local e-commerce startup to design their database. They currently track customers, orders, products, and reviews in spreadsheets. Your task: design a normalized relational schema.
Deliverables:
- Entity-Relationship diagram (ERD) showing tables, keys, relationships
- SQL schema definition (CREATE TABLE statements)
- Normalization analysis (identify 3NF violations in original data)
- 5 sample SQL queries demonstrating the schema works
- GitHub repo with schema documentation
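To make the schema deliverable concrete, here is a minimal sketch of two tables created through sqlalchemy against a local PostgreSQL instance; the connection URL, table names, and columns are illustrative assumptions, not the required design.

```python
# Minimal illustration only: a two-table slice of a possible e-commerce schema.
# The connection URL, table names, and columns are placeholder assumptions.
from sqlalchemy import create_engine, text

engine = create_engine("postgresql://user:password@localhost:5432/ecommerce")  # placeholder credentials

ddl = """
CREATE TABLE customers (
    customer_id SERIAL PRIMARY KEY,
    email       VARCHAR(255) NOT NULL UNIQUE,
    full_name   VARCHAR(255) NOT NULL
);

CREATE TABLE orders (
    order_id    SERIAL PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    order_date  DATE NOT NULL,
    status      VARCHAR(50) NOT NULL
);
"""

with engine.begin() as conn:  # begin() commits the DDL if every statement succeeds
    for statement in ddl.split(";"):
        if statement.strip():
            conn.execute(text(statement))
```

A full submission would cover all four entities (customers, orders, products, reviews), plus any junction tables needed to reach 3NF.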
Rubric (5 dimensions):
| Dimension | Excellent (A) | Proficient (B) | Developing (C) |
|---|---|---|---|
| Schema Design | 3NF normalized, handles all requirements | Mostly 3NF with 1-2 denormalization choices | Normalization issues present |
| SQL Queries | All 5 queries correct and efficient | 4 of 5 queries correct | 2 or more queries with errors |
| Documentation | Clear ERD + written justification | Basic ERD, minimal explanation | Incomplete ERD |
| Code Quality | Well-organized GitHub repo, clear comments | Adequate organization | Messy file structure |
| Business Understanding | Explains why design serves business needs | Mentions business context | No business rationale |
Project 2: ETL Pipeline in Python (Weeks 4-6, Individual, 30% of grade)
Problem Statement: Build a complete Extract-Transform-Load pipeline in Python. You’ll fetch data from multiple sources (CSV, API, database), clean and standardize it, then load it into a normalized database.
Datasets Available:
- Option A: Yelp API (business, review, user data)
- Option B: US Census Bureau API (demographic data by region)
- Option C: Stock market data (Yahoo Finance API)
- Option D: Weather + transportation data (public APIs)
Deliverables:
- Python script (pandas + sqlalchemy) that runs end-to-end
- Jupyter notebook explaining each step
- README with setup instructions (dependencies, how to configure API keys and credentials)
- Error handling documentation (what happens if API fails?)
- GitHub repo with all code + test datasets
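As a rough shape for the pipeline (a sketch under stated assumptions, not a required structure): the example below reads a CSV, cleans it with pandas, and loads it with sqlalchemy. The file path, column names, and target table are placeholders.

```python
# Sketch of an extract-transform-load flow with basic error handling and logging.
# "raw_orders.csv", the column names, and the target table are placeholder assumptions.
import logging

import pandas as pd
from sqlalchemy import create_engine

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("etl")


def extract(path: str) -> pd.DataFrame:
    """Read the raw source file; let the caller decide how to handle failures."""
    return pd.read_csv(path)


def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Standardize column names, drop duplicates, and parse dates."""
    df = df.rename(columns=str.lower).drop_duplicates()
    df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")
    return df.dropna(subset=["order_date"])


def load(df: pd.DataFrame, engine) -> None:
    """Append cleaned rows into the target table."""
    df.to_sql("orders_clean", engine, if_exists="append", index=False)


def main() -> None:
    engine = create_engine("sqlite:///etl_demo.db")  # swap for MySQL/PostgreSQL/RDS in a real run
    try:
        raw = extract("raw_orders.csv")
        clean = transform(raw)
        load(clean, engine)
        logger.info("Loaded %d rows", len(clean))
    except FileNotFoundError:
        logger.error("Source file missing; aborting this run")
        raise


if __name__ == "__main__":
    main()
```

Swapping `extract` for an API call (e.g., `requests.get` against your chosen dataset option) keeps the same transform/load structure.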
Rubric (5 dimensions):
| Dimension | Excellent (A) | Proficient (B) | Developing (C) |
|---|---|---|---|
| Code Quality | Modular, documented, type hints | Good structure, minimal comments | Spaghetti code |
| Data Handling | Robust error handling, validates data | Handles happy path, basic validation | Fails on edge cases |
| Python Fluency | Efficient pandas/sqlalchemy usage | Standard approach, minor inefficiencies | Verbose or incorrect usage |
| Testing | Unit tests present, test data included | Manual testing documented | No testing shown |
| Documentation | Clear setup guide + code comments | Basic README | Minimal documentation |
Project 3: Cloud Database + Optimization (Weeks 7-8, Team of 3-4, 25% of grade)
Problem Statement: Deploy your ETL pipeline to AWS RDS (cloud database) as a team. Optimize it for performance. Build an automated job that runs nightly to update the database with fresh data.
Deliverables:
- AWS RDS database live and accessible
- Python script that updates data nightly (cron job or AWS Lambda)
- Performance analysis (query times before/after optimization)
- Indexes designed to improve 3 key queries
- Architecture diagram (data flow, storage, compute)
- Team oral defense: present database design + optimization results (20% of Project 3 grade)
- Peer evaluation of team contributions
- GitHub repo with all code + AWS setup guide
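One way to produce the before/after numbers for the performance analysis is sketched below; the RDS endpoint, table, column, and index names are placeholder assumptions, and the single index shown only illustrates the kind of change teams will justify for their three key queries.

```python
# Time one query before and after adding an index, using sqlalchemy against RDS.
# The connection string, table, column, and index names are placeholder assumptions;
# real credentials should come from environment variables, not source code.
import time

from sqlalchemy import create_engine, text

engine = create_engine("postgresql://user:password@your-instance.rds.amazonaws.com:5432/etl_db")

QUERY = text("SELECT COUNT(*) FROM orders_clean WHERE order_date >= :start")


def time_query(runs: int = 5) -> float:
    """Average wall-clock time for the benchmark query."""
    timings = []
    with engine.connect() as conn:
        for _ in range(runs):
            start = time.perf_counter()
            conn.execute(QUERY, {"start": "2025-01-01"}).scalar()
            timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)


before = time_query()

with engine.begin() as conn:
    conn.execute(text("CREATE INDEX IF NOT EXISTS idx_orders_date ON orders_clean (order_date)"))

after = time_query()
print(f"avg query time: {before:.4f}s before index, {after:.4f}s after")
```

Repeating this measurement for each of the three key queries gives the numbers to report in the performance analysis.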
Rubric (5 dimensions):
| Dimension | Excellent (A) | Proficient (B) | Developing (C) |
|---|---|---|---|
| AWS Implementation | RDS properly configured, secure, monitored | RDS works, basic configuration | Connection/setup issues |
| Optimization | 30%+ performance gain, indexes chosen strategically | 10-30% improvement, reasonable indexes | Minimal or no optimization |
| Oral Defense | Clear explanation, confident demo, handles Q&A well | Adequate presentation | Unclear or unprepared |
| Costs | Careful cost optimization, documented | Aware of costs but not optimized | Unexpected high costs |
| Documentation | Complete architecture diagram + setup guide | Basic documentation | Incomplete instructions |
AI Tools Integration
Where AI Accelerates Learning:
- Weeks 1-3 (SQL Fundamentals): Use Claude/ChatGPT to:
  - Explain SQL errors (“Why does this JOIN return NULL?”)
  - Generate sample SQL queries for practice
  - Validate your schema design
  - Suggest refactoring for clarity
- Weeks 4-6 (Python ETL): Use AI to:
  - Debug pandas errors (“How do I reshape this DataFrame?”)
  - Optimize code performance (“How do I vectorize this loop?”; see the sketch after this list)
  - Generate error handling patterns
  - Suggest pandas functions for complex transformations
- Weeks 7-8 (Cloud & Optimization): Use AI to:
  - Troubleshoot AWS configuration issues
  - Suggest index strategies for slow queries
  - Review security practices
  - Generate Lambda function code for automation
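To make the “vectorize this loop” prompt concrete, the kind of rewrite an assistant typically suggests looks like the sketch below; the DataFrame and column names are invented for illustration.

```python
# Illustrative before/after for replacing a Python loop with a vectorized pandas expression.
# The DataFrame and column names are made up for this example.
import pandas as pd

df = pd.DataFrame({"price": [10.0, 25.0, 40.0], "quantity": [2, 1, 3]})

# Loop version: readable but slow on large frames.
totals = []
for _, row in df.iterrows():
    totals.append(row["price"] * row["quantity"])
df["total_loop"] = totals

# Vectorized version: one column-wise operation, no Python-level loop.
df["total_vec"] = df["price"] * df["quantity"]

assert df["total_loop"].equals(df["total_vec"])
```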
Studio Session Topics:
- Week 1: ER diagram design walkthrough + SQL SELECT fundamentals
- Week 2: Advanced SQL patterns (window functions, CTEs, complex JOINs)
- Week 3: Database normalization decisions + denormalization trade-offs
- Week 4: Pandas vs. SQL + when to use each
- Week 5: Error handling patterns in Python + logging best practices
- Week 6: API authentication + managing credentials safely
- Week 7: Query optimization + index strategies
- Week 8: Final presentations + portfolio feedback
Assessment Summary
| Component | Weight | Notes |
|---|---|---|
| Project 1 (Database Design) | 20% | Individual, schema-focused |
| Project 2 (ETL Pipeline) | 30% | Individual, Python-focused |
| Project 3 (Cloud + Optimization) | 25% | Team (includes oral defense), AWS-focused |
| Quizzes + DataCamp | 15% | Formative, spread across weeks |
| Studio participation + peer review | 10% | Weekly attendance, code review |
No traditional exam: assessment is based entirely on projects, quizzes/DataCamp work, and studio participation.
Technology Stack
- Database: SQLFiddle (learning), MySQL/PostgreSQL locally or AWS RDS (projects)
- Python Libraries: pandas, sqlalchemy, requests, and the standard-library logging module
- Cloud: AWS RDS (relational), AWS Free Tier or student credits
- Tools: Jupyter Notebooks, GitHub, Lucidchart/DrawIO for ERDs
- Data Sources: Yelp API, Census Bureau, Yahoo Finance, public datasets
Prerequisites & Assumptions
- No SQL experience required
- Python fundamentals helpful (loops, functions, data types)
- Comfortable installing software + troubleshooting
Last Updated: February 2026