
BADM 554 - Data Foundations

Program-level details: See program/CURRICULUM.md

Credits: 4 | Term: Fall 1 (Weeks 1-8)

Course Vision

Students master SQL, relational data modeling, and Python data wrangling so they can design, query, and maintain data systems. By the end of the course, students will have designed a normalized database schema, built a complete ETL pipeline in Python, and worked through the basics of cloud data infrastructure.

Learning Outcomes (L-C-E Framework)

Literacy (Foundational Awareness)

Competency (Applied Skills)

Expertise (Advanced Application)

Week-by-Week Breakdown

| Week | Topic | Lectures | Project Work | Studio Session | Assessment |
|------|-------|----------|--------------|----------------|------------|
| 1 | Relational databases + SQL SELECT fundamentals | 3 videos (30 min each) | Project 1A: Design database for case | Project kickoff - database design walkthrough | Quiz (L1-L2) |
| 2 | JOINs, subqueries, aggregations | 3 videos | Project 1 work: Write queries | SQL deep-dive - JOIN patterns + common mistakes | Quiz (C1) |
| 3 | Data modeling + Entity-Relationship diagrams | 2 videos + Jupyter notebook | Project 1 work: Refine schema | ER diagram workshop - using Lucidchart/DrawIO | Project 1 due |
| 4 | Python fundamentals + pandas intro | 3 videos | Project 2A: Data wrangling | Pandas fundamentals - DataFrame operations in Jupyter | DataCamp assignment |
| 5 | ETL pipelines + Python (pandas, sqlalchemy) | 3 videos | Project 2 work: Build ETL | ETL pipeline workshop - error handling, logging | Code review (peer) |
| 6 | NoSQL basics (MongoDB) + API data ingestion | 2 videos | Project 2 work: Complete ETL | MongoDB + APIs - document databases, Yelp/weather API | Mid-course checkpoint |
| 7 | Indexing, optimization, cloud databases | 2 videos | Project 3A: Cloud database setup | AWS RDS setup - cloud database hands-on | AWS setup quiz |
| 8 | Data quality, governance, synthesis | 1 video + review | Project 3 complete + portfolio reflection | Final presentations - each student demos pipeline | Projects 2 & 3 due |

Projects (3 per course)

Project 1: Relational Database Design (Weeks 1-3, Individual, 20% of grade)

Problem Statement: You’ve been hired by a local e-commerce startup to design their database. They currently track customers, orders, products, and reviews in spreadsheets. Your task: design a normalized relational schema.
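
For orientation only, the sketch below shows one possible shape of a normalized design expressed as SQLAlchemy models (the course's Python stack). The table and column names are illustrative assumptions, not the expected answer.

```python
# Illustrative sketch, not a model solution: one way the startup's spreadsheet
# data could be normalized. Table and column names are assumptions.
from sqlalchemy import (
    Column, DateTime, ForeignKey, Integer, Numeric, String, create_engine,
)
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Customer(Base):
    __tablename__ = "customers"
    customer_id = Column(Integer, primary_key=True)
    email = Column(String, unique=True, nullable=False)
    name = Column(String, nullable=False)

class Product(Base):
    __tablename__ = "products"
    product_id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    unit_price = Column(Numeric(10, 2), nullable=False)

class Order(Base):
    __tablename__ = "orders"
    order_id = Column(Integer, primary_key=True)
    customer_id = Column(Integer, ForeignKey("customers.customer_id"), nullable=False)
    ordered_at = Column(DateTime, nullable=False)

class OrderItem(Base):
    # Junction table resolving the many-to-many between orders and products.
    __tablename__ = "order_items"
    order_id = Column(Integer, ForeignKey("orders.order_id"), primary_key=True)
    product_id = Column(Integer, ForeignKey("products.product_id"), primary_key=True)
    quantity = Column(Integer, nullable=False)

class Review(Base):
    __tablename__ = "reviews"
    review_id = Column(Integer, primary_key=True)
    customer_id = Column(Integer, ForeignKey("customers.customer_id"), nullable=False)
    product_id = Column(Integer, ForeignKey("products.product_id"), nullable=False)
    rating = Column(Integer, nullable=False)
    body = Column(String)

if __name__ == "__main__":
    # Create the tables in a local SQLite file for quick experimentation.
    engine = create_engine("sqlite:///ecommerce.db")
    Base.metadata.create_all(engine)
```

Keeping order lines in their own order_items table, rather than repeating product columns inside orders, is the kind of decision the Schema Design dimension of the rubric rewards.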

Deliverables:

Rubric (5 dimensions):

| Dimension | Excellent (A) | Proficient (B) | Developing (C) |
|-----------|---------------|----------------|----------------|
| Schema Design | 3NF normalized, handles all requirements | Mostly 3NF with 1-2 denormalization choices | Normalization issues present |
| SQL Queries | All queries correct + efficient | 4/5 queries correct | 3+ query errors |
| Documentation | Clear ERD + written justification | Basic ERD, minimal explanation | Incomplete ERD |
| Code Quality | Well-organized GitHub repo, clear comments | Adequate organization | Messy file structure |
| Business Understanding | Explains why design serves business needs | Mentions business context | No business rationale |

Project 2: ETL Pipeline in Python (Weeks 4-6, Individual, 30% of grade)

Problem Statement: Build a complete Extract-Transform-Load pipeline in Python. You’ll fetch data from multiple sources (CSV, API, database), clean and standardize it, then load it into a normalized database.
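
A minimal pandas + SQLAlchemy sketch of the three stages is shown below. The file name, API endpoint, join key, and column names are placeholders you would replace, and a real submission needs the error handling, logging, and validation the rubric calls for.

```python
# Minimal ETL sketch (assumptions: a local CSV named sales.csv, a JSON API that
# returns a list of records, and a SQLite target database). Not a full solution.
import pandas as pd
import requests
from sqlalchemy import create_engine

def extract() -> pd.DataFrame:
    csv_df = pd.read_csv("sales.csv")                       # placeholder file name
    api_rows = requests.get("https://example.com/api/stores", timeout=10).json()
    api_df = pd.DataFrame(api_rows)                         # placeholder endpoint
    return csv_df.merge(api_df, on="store_id", how="left")  # assumed join key

def transform(df: pd.DataFrame) -> pd.DataFrame:
    df.columns = [c.strip().lower() for c in df.columns]    # standardize names
    df = df.drop_duplicates()
    df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")
    return df.dropna(subset=["order_date", "store_id"])     # basic validation

def load(df: pd.DataFrame) -> None:
    engine = create_engine("sqlite:///warehouse.db")        # swap for your target DB
    df.to_sql("sales_clean", engine, if_exists="replace", index=False)

if __name__ == "__main__":
    load(transform(extract()))
```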

Datasets Available:

Deliverables:

Rubric (5 dimensions):

| Dimension | Excellent (A) | Proficient (B) | Developing (C) |
|-----------|---------------|----------------|----------------|
| Code Quality | Modular, documented, type hints | Good structure, minimal comments | Spaghetti code |
| Data Handling | Robust error handling, validates data | Handles happy path, basic validation | Fails on edge cases |
| Python Fluency | Efficient pandas/sqlalchemy usage | Standard approach, minor inefficiencies | Verbose or incorrect usage |
| Testing | Unit tests present, test data included | Manual testing documented | No testing shown |
| Documentation | Clear setup guide + code comments | Basic README | Minimal documentation |
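
For the Testing dimension, a minimal pytest sketch is shown below. It assumes your transform step lives in a module named etl.py (a hypothetical name) and standardizes column names and dates as in the pipeline sketch above.

```python
# Sketch of one unit test for the transform step (assumes pytest and a module
# named etl.py that exposes transform(); adjust names to your own project).
import pandas as pd
from etl import transform

def test_transform_drops_rows_with_bad_dates():
    raw = pd.DataFrame({
        "Order_Date": ["2025-01-03", "not a date"],
        "Store_ID": [1, 2],
    })
    clean = transform(raw)
    assert list(clean.columns) == ["order_date", "store_id"]  # names standardized
    assert len(clean) == 1                                     # invalid date removed
```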

Project 3: Cloud Database + Optimization (Weeks 7-8, Team of 3-4, 25% of grade)

Problem Statement: As a team, deploy your ETL pipeline to AWS RDS (a managed cloud database service), optimize it for performance, and build an automated job that runs nightly to update the database with fresh data.
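
A minimal sketch of such a nightly job is below, assuming credentials come from environment variables, the Project 2 pipeline is importable from a module named etl.py (hypothetical), and the target is a PostgreSQL RDS instance. Scheduling (cron, EventBridge, etc.) and the actual RDS configuration are left to your team.

```python
# Sketch of a nightly refresh job against a PostgreSQL RDS instance.
# Assumptions: DB credentials in environment variables, and a Project 2 module
# named etl.py exposing extract() and transform().
import os
from sqlalchemy import create_engine, text
from etl import extract, transform

def refresh() -> None:
    url = (
        f"postgresql+psycopg2://{os.environ['DB_USER']}:{os.environ['DB_PASSWORD']}"
        f"@{os.environ['DB_HOST']}:5432/{os.environ['DB_NAME']}"
    )
    engine = create_engine(url)
    df = transform(extract())
    df.to_sql("sales_clean", engine, if_exists="replace", index=False)
    with engine.begin() as conn:
        # Example index on an assumed filter column; choose indexes from your own
        # slow-query analysis (EXPLAIN ANALYZE) rather than copying this one.
        conn.execute(text(
            "CREATE INDEX IF NOT EXISTS idx_sales_clean_order_date "
            "ON sales_clean (order_date)"
        ))

if __name__ == "__main__":
    refresh()  # schedule this script nightly (e.g., cron, or EventBridge + Lambda)
```

Note that the index is created after the load: to_sql(if_exists="replace") drops and recreates the table, which would discard any pre-existing index.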

Deliverables:

Rubric (5 dimensions):

| Dimension | Excellent (A) | Proficient (B) | Developing (C) |
|-----------|---------------|----------------|----------------|
| AWS Implementation | RDS properly configured, secure, monitored | RDS works, basic configuration | Connection/setup issues |
| Optimization | 30%+ performance gain, indexes chosen strategically | 10-20% improvement, reasonable indexes | Minimal or no optimization |
| Oral Defense | Clear explanation, confident demo, handles Q&A well | Adequate presentation | Unclear or unprepared |
| Costs | Careful cost optimization, documented | Aware of costs but not optimized | Unexpected high costs |
| Documentation | Complete architecture diagram + setup guide | Basic documentation | Incomplete instructions |

AI Tools Integration

Where AI Accelerates Learning:

  1. Week 1 (SQL Fundamentals): Use Claude/ChatGPT to:
    • Explain SQL errors (“Why does this JOIN return NULL?”)
    • Generate sample SQL queries for practice
    • Validate your schema design
    • Suggest refactoring for clarity
  2. Weeks 4-5 (Python ETL): Use AI to:
    • Debug pandas errors (“How do I reshape this DataFrame?”)
    • Optimize code performance (“How do I vectorize this loop?”; see the before/after sketch after this list)
    • Generate error handling patterns
    • Suggest pandas functions for complex transformations
  3. Weeks 7-8 (Cloud & Optimization): Use AI to:
    • Troubleshoot AWS configuration issues
    • Suggest index strategies for slow queries
    • Review security practices
    • Generate Lambda function code for automation
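
As an example of the vectorization prompt above, a small before/after sketch (with made-up column names) looks like this:

```python
# Tiny before/after vectorization example (column names are made up).
import pandas as pd

df = pd.DataFrame({"quantity": [2, 5, 1], "unit_price": [9.99, 3.50, 20.00]})

# Before: explicit Python loop, slow on large DataFrames.
totals = []
for _, row in df.iterrows():
    totals.append(row["quantity"] * row["unit_price"])
df["total_loop"] = totals

# After: vectorized column arithmetic, the kind of rewrite AI assistants suggest.
df["total_vectorized"] = df["quantity"] * df["unit_price"]

assert (df["total_loop"] == df["total_vectorized"]).all()
```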

Studio Session Topics:

Assessment Summary

| Component | Weight | Notes |
|-----------|--------|-------|
| Project 1 (Database Design) | 20% | Individual, schema-focused |
| Project 2 (ETL Pipeline) | 30% | Individual, Python-focused |
| Project 3 (Cloud + Optimization) | 25% | Team (includes oral defense), AWS-focused |
| Quizzes + DataCamp | 15% | Formative, spread across weeks |
| Studio participation + peer review | 10% | Weekly attendance, code review |

No traditional exam. All assessment is project-based + participation.

Technology Stack

Prerequisites & Assumptions


Last Updated: February 2026