
BADM 554 - Data Foundations

Program-level details: See program/CURRICULUM.md

Credits: 4 | Term: Fall 1 (Weeks 1-8)

Course Vision

Students master SQL, relational data modeling, and Python data wrangling so they can design, query, and maintain data systems. By the end of the course, students will have designed a normalized database schema, built a complete ETL pipeline in Python, and worked through the basics of cloud data infrastructure.

Learning Outcomes (L-C-E Framework)

Literacy (Foundational Awareness)

Competency (Applied Skills)

Expertise (Advanced Application)

Week-by-Week Breakdown

| Week | Topic | Lectures | Project Work | Studio Session | Assessment |
|------|-------|----------|--------------|----------------|------------|
| 1 | Relational databases + SQL SELECT fundamentals | 3 videos (30 min each) | Project 1A: Design database for case | Project kickoff - database design walkthrough | Quiz (L1-L2) |
| 2 | JOINs, subqueries, aggregations | 3 videos | Project 1 work: Write queries | SQL deep-dive - JOIN patterns + common mistakes | Quiz (C1) |
| 3 | Data modeling + Entity-Relationship diagrams | 2 videos + Jupyter notebook | Project 1 work: Refine schema | ER diagram workshop - using Lucidchart/DrawIO | Project 1 due |
| 4 | Python fundamentals + pandas intro | 3 videos | Project 2A: Data wrangling | Pandas fundamentals - DataFrame operations in Jupyter | DataCamp assignment |
| 5 | ETL pipelines + Python (pandas, sqlalchemy) | 3 videos | Project 2 work: Build ETL | ETL pipeline workshop - error handling, logging | Code review (peer) |
| 6 | NoSQL basics (MongoDB) + API data ingestion | 2 videos | Project 2 work: Complete ETL | MongoDB + APIs - document databases, Yelp/weather API | Mid-course checkpoint |
| 7 | Indexing, optimization, cloud databases | 2 videos | Project 3A: Cloud database setup | AWS RDS setup - cloud database hands-on | AWS setup quiz |
| 8 | Data quality, governance, synthesis | 1 video + review | Project 3 complete + portfolio reflection | Final presentations - each student demos pipeline | Projects 2 & 3 due |

Projects (3 per course)

Project 1: Relational Database Design (Weeks 1-3, Individual, 20% of grade)

Problem Statement: You’ve been hired by a local e-commerce startup to design their database. They currently track customers, orders, products, and reviews in spreadsheets. Your task: design a normalized relational schema.
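
For orientation only, the sketch below shows one possible shape of a normalized design expressed as SQLAlchemy models (the course's Python stack). The table and column names are illustrative assumptions, not the expected answer.

```python
# Illustrative sketch, not a model solution: one way the startup's spreadsheet
# data could be normalized. Table and column names are assumptions.
from sqlalchemy import (
    Column, DateTime, ForeignKey, Integer, Numeric, String, create_engine,
)
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Customer(Base):
    __tablename__ = "customers"
    customer_id = Column(Integer, primary_key=True)
    email = Column(String, unique=True, nullable=False)
    name = Column(String, nullable=False)

class Product(Base):
    __tablename__ = "products"
    product_id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    unit_price = Column(Numeric(10, 2), nullable=False)

class Order(Base):
    __tablename__ = "orders"
    order_id = Column(Integer, primary_key=True)
    customer_id = Column(Integer, ForeignKey("customers.customer_id"), nullable=False)
    ordered_at = Column(DateTime, nullable=False)

class OrderItem(Base):
    # Junction table resolving the many-to-many between orders and products.
    __tablename__ = "order_items"
    order_id = Column(Integer, ForeignKey("orders.order_id"), primary_key=True)
    product_id = Column(Integer, ForeignKey("products.product_id"), primary_key=True)
    quantity = Column(Integer, nullable=False)

class Review(Base):
    __tablename__ = "reviews"
    review_id = Column(Integer, primary_key=True)
    customer_id = Column(Integer, ForeignKey("customers.customer_id"), nullable=False)
    product_id = Column(Integer, ForeignKey("products.product_id"), nullable=False)
    rating = Column(Integer, nullable=False)
    body = Column(String)

if __name__ == "__main__":
    # Create the tables in a local SQLite file for quick experimentation.
    engine = create_engine("sqlite:///ecommerce.db")
    Base.metadata.create_all(engine)
```

Keeping order lines in their own order_items table, rather than repeating product columns inside orders, is the kind of decision the Schema Design dimension of the rubric rewards.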

Deliverables:

Rubric (5 dimensions):

| Dimension | Excellent (A) | Proficient (B) | Developing (C) |
|-----------|---------------|----------------|----------------|
| Schema Design | 3NF normalized, handles all requirements | Mostly 3NF with 1-2 denormalization choices | Normalization issues present |
| SQL Queries | All queries correct + efficient | 4/5 queries correct | 3+ query errors |
| Documentation | Clear ERD + written justification | Basic ERD, minimal explanation | Incomplete ERD |
| Code Quality | Well-organized GitHub repo, clear comments | Adequate organization | Messy file structure |
| Business Understanding | Explains why design serves business needs | Mentions business context | No business rationale |

Project 2: ETL Pipeline in Python (Weeks 4-6, Individual, 30% of grade)

Problem Statement: Build a complete Extract-Transform-Load pipeline in Python. You’ll fetch data from multiple sources (CSV, API, database), clean and standardize it, then load it into a normalized database.
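
A minimal pandas + SQLAlchemy sketch of the three stages is shown below. The file name, API endpoint, join key, and column names are placeholders you would replace, and a real submission needs the error handling, logging, and validation the rubric calls for.

```python
# Minimal ETL sketch (assumptions: a local CSV named sales.csv, a JSON API that
# returns a list of records, and a SQLite target database). Not a full solution.
import pandas as pd
import requests
from sqlalchemy import create_engine

def extract() -> pd.DataFrame:
    csv_df = pd.read_csv("sales.csv")                       # placeholder file name
    api_rows = requests.get("https://example.com/api/stores", timeout=10).json()
    api_df = pd.DataFrame(api_rows)                         # placeholder endpoint
    return csv_df.merge(api_df, on="store_id", how="left")  # assumed join key

def transform(df: pd.DataFrame) -> pd.DataFrame:
    df.columns = [c.strip().lower() for c in df.columns]    # standardize names
    df = df.drop_duplicates()
    df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")
    return df.dropna(subset=["order_date", "store_id"])     # basic validation

def load(df: pd.DataFrame) -> None:
    engine = create_engine("sqlite:///warehouse.db")        # swap for your target DB
    df.to_sql("sales_clean", engine, if_exists="replace", index=False)

if __name__ == "__main__":
    load(transform(extract()))
```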

Datasets Available:

Deliverables:

Rubric (5 dimensions):

| Dimension | Excellent (A) | Proficient (B) | Developing (C) |
|-----------|---------------|----------------|----------------|
| Code Quality | Modular, documented, type hints | Good structure, minimal comments | Spaghetti code |
| Data Handling | Robust error handling, validates data | Handles happy path, basic validation | Fails on edge cases |
| Python Fluency | Efficient pandas/sqlalchemy usage | Standard approach, minor inefficiencies | Verbose or incorrect usage |
| Testing | Unit tests present, test data included | Manual testing documented | No testing shown |
| Documentation | Clear setup guide + code comments | Basic README | Minimal documentation |
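
For the Testing dimension, a minimal pytest sketch is shown below. It assumes your transform step lives in a module named etl.py (a hypothetical name) and standardizes column names and dates as in the pipeline sketch above.

```python
# Sketch of one unit test for the transform step (assumes pytest and a module
# named etl.py that exposes transform(); adjust names to your own project).
import pandas as pd
from etl import transform

def test_transform_drops_rows_with_bad_dates():
    raw = pd.DataFrame({
        "Order_Date": ["2025-01-03", "not a date"],
        "Store_ID": [1, 2],
    })
    clean = transform(raw)
    assert list(clean.columns) == ["order_date", "store_id"]  # names standardized
    assert len(clean) == 1                                     # invalid date removed
```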

Project 3: Cloud Database + Optimization (Weeks 7-8, Team of 3-4, 25% of grade)

Problem Statement: As a team, deploy your ETL pipeline to AWS RDS (a managed cloud database service), optimize it for performance, and build an automated job that runs nightly to update the database with fresh data.
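
A minimal sketch of such a nightly job is below, assuming credentials come from environment variables, the Project 2 pipeline is importable from a module named etl.py (hypothetical), and the target is a PostgreSQL RDS instance. Scheduling (cron, EventBridge, etc.) and the actual RDS configuration are left to your team.

```python
# Sketch of a nightly refresh job against a PostgreSQL RDS instance.
# Assumptions: DB credentials in environment variables, and a Project 2 module
# named etl.py exposing extract() and transform().
import os
from sqlalchemy import create_engine, text
from etl import extract, transform

def refresh() -> None:
    url = (
        f"postgresql+psycopg2://{os.environ['DB_USER']}:{os.environ['DB_PASSWORD']}"
        f"@{os.environ['DB_HOST']}:5432/{os.environ['DB_NAME']}"
    )
    engine = create_engine(url)
    df = transform(extract())
    df.to_sql("sales_clean", engine, if_exists="replace", index=False)
    with engine.begin() as conn:
        # Example index on an assumed filter column; choose indexes from your own
        # slow-query analysis (EXPLAIN ANALYZE) rather than copying this one.
        conn.execute(text(
            "CREATE INDEX IF NOT EXISTS idx_sales_clean_order_date "
            "ON sales_clean (order_date)"
        ))

if __name__ == "__main__":
    refresh()  # schedule this script nightly (e.g., cron, or EventBridge + Lambda)
```

Note that the index is created after the load: to_sql(if_exists="replace") drops and recreates the table, which would discard any pre-existing index.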

Deliverables:

Rubric (5 dimensions):

| Dimension | Excellent (A) | Proficient (B) | Developing (C) |
|-----------|---------------|----------------|----------------|
| AWS Implementation | RDS properly configured, secure, monitored | RDS works, basic configuration | Connection/setup issues |
| Optimization | 30%+ performance gain, indexes chosen strategically | 10-20% improvement, reasonable indexes | Minimal or no optimization |
| Oral Defense | Clear explanation, confident demo, handles Q&A well | Adequate presentation | Unclear or unprepared |
| Costs | Careful cost optimization, documented | Aware of costs but not optimized | Unexpected high costs |
| Documentation | Complete architecture diagram + setup guide | Basic documentation | Incomplete instructions |

AI Tools Integration

Where AI Accelerates Learning:

  1. Week 1 (SQL Fundamentals): Use Claude/ChatGPT to:
    • Explain SQL errors (“Why does this JOIN return NULL?”)
    • Generate sample SQL queries for practice
    • Validate your schema design
    • Suggest refactoring for clarity
  2. Weeks 4-5 (Python ETL): Use AI to:
    • Debug pandas errors (“How do I reshape this DataFrame?”)
    • Optimize code performance (“How do I vectorize this loop?”; see the before/after sketch after this list)
    • Generate error handling patterns
    • Suggest pandas functions for complex transformations
  3. Weeks 7-8 (Cloud & Optimization): Use AI to:
    • Troubleshoot AWS configuration issues
    • Suggest index strategies for slow queries
    • Review security practices
    • Generate Lambda function code for automation
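
As an example of the vectorization prompt above, a small before/after sketch (with made-up column names) looks like this:

```python
# Tiny before/after vectorization example (column names are made up).
import pandas as pd

df = pd.DataFrame({"quantity": [2, 5, 1], "unit_price": [9.99, 3.50, 20.00]})

# Before: explicit Python loop, slow on large DataFrames.
totals = []
for _, row in df.iterrows():
    totals.append(row["quantity"] * row["unit_price"])
df["total_loop"] = totals

# After: vectorized column arithmetic, the kind of rewrite AI assistants suggest.
df["total_vectorized"] = df["quantity"] * df["unit_price"]

assert (df["total_loop"] == df["total_vectorized"]).all()
```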

Studio Session Topics:

Assessment Summary

| Component | Weight | Notes |
|-----------|--------|-------|
| Project 1 (Database Design) | 20% | Individual, schema-focused |
| Project 2 (ETL Pipeline) | 30% | Individual, Python-focused |
| Project 3 (Cloud + Optimization) | 25% | Team (includes oral defense), AWS-focused |
| Quizzes + DataCamp | 15% | Formative, spread across weeks |
| Studio participation + peer review | 10% | Weekly attendance, code review |

No traditional exam. All assessment is project-based + participation.

Technology Stack

Prerequisites & Assumptions


Last Updated: February 2026