BADM 554 - Enterprise Database Management
Program-level details: See program/curriculum.md
Status: In Development. Instructor is actively building content; recording starts April 15, 2026. Structure is stable. Proposed MSBAi name: Data Foundations (pending formal rename approval)
| Credits: 4 | Term: Fall 2026 (Weeks 1-8) | Instructor: Vishal |
Course Vision
Students master SQL, relational data modeling, and Python data wrangling to become proficient at designing, querying, and maintaining data systems. By course end, students build a complete ETL pipeline using Python, design a normalized database schema, and understand cloud data infrastructure basics.
Learning Outcomes (L-C-E Framework)
Literacy (Foundational Awareness)
- L1: Understand relational database concepts (tables, keys, relationships) and explain why normalization matters
- L2: Read and interpret SQL queries to identify what data they retrieve
- L3: Recognize when data needs cleaning and describe common data quality issues
Competency (Applied Skills)
- C1: Write SQL queries to extract, transform, and aggregate data from relational databases
- C2: Design a normalized database schema (3NF) for a given business problem
- C3: Use Python (pandas, sqlalchemy) to build a complete ETL pipeline
- C4: Connect to cloud databases and run queries in browser-based SQL environments
Expertise (Advanced Application)
- E1: Evaluate design trade-offs (normalization vs. denormalization) for specific use cases
- E2: Optimize queries for performance and design effective indexes
- E3: Build a production-ready data pipeline with error handling and logging
Week-by-Week Breakdown
| Week | Topic | Lectures | Weekly Assignment | Project Milestone | Studio Session |
|---|---|---|---|---|---|
| 1 | Relational databases + SQL SELECT fundamentals | 3 videos (30 min each) | SQL fundamentals exercises + quiz (L1-L2) | M1: Team formation + project dataset selection | Project kickoff - database design walkthrough |
| 2 | JOINs, subqueries, aggregations | 3 videos | SQL query problem set + quiz (C1) | M2: ERD draft + normalization analysis | SQL deep-dive - JOIN patterns + common mistakes |
| 3 | Data modeling + Entity-Relationship diagrams | 2 videos + Jupyter notebook | ER diagram case study + peer review | M3: Finalized schema (CREATE TABLE) + sample queries | ER diagram workshop - using Lucidchart/DrawIO |
| 4 | Python for SQL users — pandas + DataFrames | 3 videos | DataCamp assignment + pandas lab | M4: Data source identification + extraction script | SQL→Python translation - pandas mirrors SQL: SELECT→filter, GROUP BY→groupby, JOIN→merge |
| 5 | ETL pipelines + Python (pandas, sqlalchemy) | 3 videos | ETL coding exercises + peer code review | M5: Working ETL pipeline (extract + transform) | ETL pipeline workshop - error handling, logging |
| 6 | Data quality, cleaning + API integration | 2 videos | API integration lab + discussion post | M6: Complete ETL with error handling + load to local DB | Data quality + APIs - messy real-world data, Yelp/weather API ingestion |
| 7 | Indexing, optimization, cloud databases | 2 videos | Optimization quiz + index design exercise | M7: GCP Cloud SQL deployment + performance benchmarks | GCP Cloud SQL setup - cloud database hands-on |
| 8 | Data quality, governance, synthesis | 1 video + review | Portfolio reflection | Final deliverable + oral defense | Final presentations - team demos pipeline + Q&A |
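The Week 4 SQL→pandas mapping (SELECT→filter, GROUP BY→groupby, JOIN→merge) can be sketched with toy DataFrames; the table and column names below are illustrative, not course datasets.

```python
import pandas as pd

orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "customer_id": [10, 10, 20],
    "total": [25.0, 40.0, 15.0],
})
customers = pd.DataFrame({
    "customer_id": [10, 20],
    "name": ["Ada", "Grace"],
})

# SELECT * FROM orders WHERE total > 20          → boolean filter
big = orders[orders["total"] > 20]

# SELECT customer_id, SUM(total) GROUP BY ...    → groupby + aggregation
totals = orders.groupby("customer_id", as_index=False)["total"].sum()

# ... JOIN customers ON customer_id              → merge
joined = totals.merge(customers, on="customer_id")
print(joined)
```

Each pandas line mirrors one SQL clause, which is why the Week 4 studio treats the translation as a lookup table rather than a new skill.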
Team Project: Cloud-Deployed Data Pipeline (Team of 3, Weeks 1-8)
Problem Statement: Your team has been hired by a local e-commerce startup to design, build, and deploy a production-ready data pipeline. The company tracks customers, orders, products, and reviews in spreadsheets and scattered APIs. Your task: design a normalized database, build an ETL pipeline to populate it from multiple sources, deploy it to GCP Cloud SQL, and optimize for performance.
Datasets Available:
- Option A: Yelp API (business, review, user data)
- Option B: US Census Bureau API (demographic data by region)
- Option C: Stock market data (Yahoo Finance API)
- Option D: Weather + transportation data (public APIs)
Project Milestones (20-30% of grade)
| Milestone | Due | Deliverables |
|---|---|---|
| M1: Team formation + dataset selection | Week 1 | Team contract, dataset choice with justification |
| M2: ERD draft + normalization analysis | Week 2 | Draft ERD, identify 3NF violations in source data |
| M3: Finalized schema + sample queries | Week 3 | CREATE TABLE statements, 5 SQL queries demonstrating schema, GitHub repo |
| M4: Data source identification + extraction | Week 4 | Python extraction script using pandas (mirroring SQL queries), data inventory |
| M5: Working ETL pipeline (extract + transform) | Week 5 | pandas + sqlalchemy pipeline with transform logic |
| M6: Complete ETL with error handling | Week 6 | End-to-end pipeline loading to local DB, error handling + logging |
| M7: GCP Cloud SQL deployment + benchmarks | Week 7 | Live Cloud SQL database, performance baseline, index strategy |
Each milestone is assessed with brief instructor feedback to keep teams on track. Milestones build cumulatively — each incorporates and extends prior work.
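The M5-M6 expectation (transform logic plus error handling and logging) follows a standard extract→transform→load shape. A minimal sketch, using the stdlib sqlite3 module in place of a Cloud SQL connection and invented column names; the course stack (pandas + sqlalchemy) uses the same pattern:

```python
import logging
import sqlite3
import pandas as pd

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("etl")

def extract():
    # In a real pipeline this would pull from an API or CSV export.
    return pd.DataFrame({"order_id": [1, 2, 3], "total": ["25.0", "bad", "15.0"]})

def transform(df):
    # Coerce malformed values to NaN, then log and drop them instead of crashing.
    df = df.copy()
    df["total"] = pd.to_numeric(df["total"], errors="coerce")
    bad = int(df["total"].isna().sum())
    if bad:
        log.warning("dropping %d malformed rows", bad)
    return df.dropna(subset=["total"])

def load(df, conn):
    df.to_sql("orders", conn, if_exists="replace", index=False)
    log.info("loaded %d rows", len(df))

conn = sqlite3.connect(":memory:")
try:
    load(transform(extract()), conn)
finally:
    conn.close()
```

The try/finally around the connection and the warn-and-drop policy in transform are the two habits the M6 rubric line ("error handling + logging") is looking for.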
Final Project Deliverable (15-20% of grade)
Due: Week 8
Deliverables:
- GCP Cloud SQL database live and accessible
- Python ETL script that updates data nightly (cron job or Cloud Functions)
- Performance analysis (query times before/after optimization)
- Indexes designed to improve 3 key queries
- Architecture diagram (data flow, storage, compute)
- Jupyter notebook documenting the full pipeline with explanations
- Peer evaluation of team contributions
- GitHub repo with all code, schema documentation, and GCP setup guide
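The "performance analysis" and "indexes" deliverables boil down to one measurement loop: time a query, add an index on the filter column, time it again. A sketch using stdlib sqlite3 with an invented orders table; Cloud SQL (MySQL/PostgreSQL) uses the same CREATE INDEX and query-plan ideas:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, i % 1000, float(i % 50)) for i in range(100_000)],
)

def bench(query, args):
    # Time one execution of the query and return (row count, seconds).
    start = time.perf_counter()
    rows = conn.execute(query, args).fetchall()
    return len(rows), time.perf_counter() - start

q = "SELECT * FROM orders WHERE customer_id = ?"
n_before, t_before = bench(q, (42,))

# Index the column used in the WHERE clause, then re-run the same query.
conn.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")
n_after, t_after = bench(q, (42,))
print(f"before: {t_before:.4f}s  after: {t_after:.4f}s  rows: {n_after}")
```

For the final report, EXPLAIN (or sqlite's EXPLAIN QUERY PLAN) confirms the index is actually used, which is stronger evidence than wall-clock time alone.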
Rubric (5 dimensions):
| Dimension | Excellent (A) | Proficient (B) | Developing (C) |
|---|---|---|---|
| Schema Design | 3NF normalized, handles all requirements, justified trade-offs | Mostly 3NF with minor issues | Normalization problems or missing relationships |
| ETL Pipeline | Modular, documented, robust error handling, efficient pandas/sqlalchemy | Good structure, handles happy path | Fragile pipeline, minimal error handling |
| GCP Implementation | Cloud SQL properly configured, secure, 30%+ performance gain | Cloud SQL works, 10-20% improvement | Connection/setup issues, minimal optimization |
| Documentation | Complete architecture diagram, clear setup guide, code comments | Adequate documentation | Incomplete or unclear instructions |
| Business Understanding | Explains design decisions in business terms, cost-aware | Mentions business context | No business rationale |
Oral Defense (20-25% of grade)
Week 8 studio session. Each team presents their pipeline design, demonstrates the live system, and fields Q&A from instructor and peers.
Rubric (3 dimensions):
| Dimension | Excellent (A) | Proficient (B) | Developing (C) |
|---|---|---|---|
| Technical Explanation | Clear walkthrough of design decisions, trade-offs articulated | Adequate explanation of system | Unclear or surface-level description |
| Live Demo | System works end-to-end, handles follow-up queries confidently | Demo works with minor issues | Demo fails or cannot answer questions |
| Individual Contribution | Each member explains their role with depth | Members can describe their work | Uneven participation or vague answers |
Weekly Assignments (30-40% of grade)
Practice exercises, labs, discussions, and peer reviews that build skills for the project. Completed individually unless noted.
| Week | Assignment | Format |
|---|---|---|
| 1 | SQL fundamentals exercises + quiz | Practice problems + timed quiz |
| 2 | SQL query problem set + quiz | JOINs, subqueries, aggregations |
| 3 | ER diagram case study + peer review | Design exercise + review a classmate’s ERD |
| 4 | DataCamp assignment + pandas lab | DataCamp module + SQL→pandas translation exercises |
| 5 | ETL coding exercises + peer code review | Coding problems + review a classmate’s pipeline code |
| 6 | Data quality + API integration lab | Clean a messy real-world dataset + build an API connector |
| 7 | Optimization quiz + index design exercise | Timed quiz + design indexes for given queries |
| 8 | Portfolio reflection | Written reflection on learning journey and skill growth |
AI Tools Integration
Where AI Accelerates Learning:
- Week 1 (SQL Fundamentals): Use Claude/ChatGPT to:
  - Explain SQL errors (“Why does this JOIN return NULL?”)
  - Generate sample SQL queries for practice
  - Validate your schema design
  - Suggest refactoring for clarity
- Weeks 4-5 (Python ETL): Use AI to:
  - Debug pandas errors (“How do I reshape this DataFrame?”)
  - Optimize code performance (“How do I vectorize this loop?”)
  - Generate error handling patterns
  - Suggest pandas functions for complex transformations
- Weeks 7-8 (Cloud & Optimization): Use AI to:
  - Troubleshoot GCP Cloud SQL configuration issues
  - Suggest index strategies for slow queries
  - Review security practices
  - Generate Cloud Functions code for automation
Three-Options Prompting Protocol (Weeks 4-8, AIAS Level 2-3):
For all milestone submissions (M4-M7) and the final deliverable, students must use the three-options format when using AI for design or architectural decisions:
- Prompt AI for three distinct options (e.g., “Give me three different approaches to structuring this ETL pipeline”)
- Summarize each option briefly
- Select one and justify the choice in 2-3 sentences based on your specific project context
Submission format:
AI Prompt: [paste exact prompt]
Option A: [1-sentence summary]
Option B: [1-sentence summary]
Option C: [1-sentence summary]
Chosen: [A/B/C]
Rationale: [why this option fits your context better than the others]
This replaces generic AI attribution (“I used Copilot”) with evidence of evaluative judgment. Options must differ on at least one meaningful dimension (e.g., performance vs. simplicity vs. maintainability). The three-options protocol does not apply to Weeks 1-3 (pre-AI / AIAS Level 0-1 phases).
Studio Session Topics:
- Week 1: ER diagram design walkthrough + SQL SELECT fundamentals
- Week 2: Advanced SQL patterns (window functions, CTEs, complex JOINs)
- Week 3: Database normalization decisions + denormalization trade-offs
- Week 4: SQL→Python translation — SELECT→filter, GROUP BY→groupby, JOIN→merge
- Week 5: Error handling patterns in Python + logging best practices
- Week 6: Data quality + API authentication — cleaning messy data, managing credentials safely
- Week 7: Query optimization + index strategies + GCP Cloud SQL setup
- Week 8: Final presentations + portfolio feedback
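The Week 2 studio topics (CTEs, window functions) can be previewed in miniature with stdlib sqlite3; the orders table and values here are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(10, 25.0), (10, 40.0), (20, 15.0), (20, 60.0)],
)

# A CTE names an intermediate result; a window function ranks rows
# without collapsing them the way GROUP BY does.
rows = conn.execute("""
    WITH customer_totals AS (
        SELECT customer_id, SUM(total) AS spend
        FROM orders
        GROUP BY customer_id
    )
    SELECT customer_id, spend,
           RANK() OVER (ORDER BY spend DESC) AS spend_rank
    FROM customer_totals
""").fetchall()
print(rows)  # → [(20, 75.0, 1), (10, 65.0, 2)]
```

Window functions require SQLite 3.25+, which ships with all supported Python versions; MySQL 8+ and PostgreSQL support the same syntax.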
Assessment Summary
| Component | Weight Range | Notes |
|---|---|---|
| Weekly assignments (exercises, quizzes, labs, peer reviews) | 30-40% | Individual, skill-building |
| Project milestones (M1-M7) | 20-30% | Team, progressive deliverables |
| Final project deliverable | 15-20% | Team, culminating system |
| Oral defense | 20-25% | Team, live demo + Q&A |
| Studio participation | 5-10% | Weekly attendance + engagement |
No traditional exam. All assessment is project work plus studio participation.
AI Usage Levels (AIAS)
| Assessment | AIAS Level | AI Permitted |
|---|---|---|
| Quizzes (Weeks 1, 2, 7) | 0 | No AI |
| Weekly labs + exercises | 1 | AI for ideation and exploration only |
| Project milestones M1-M3 (schema/ERD) | 1 | AI for ideation and schema brainstorming only |
| Project milestones M4-M6 (ETL pipeline) | 2 | AI for code assistance and debugging, with attribution |
| Project milestone M7 + final deliverable (cloud deployment) | 3 | AI as full collaborator for GCP troubleshooting and code review, with attribution |
| Oral defense | 0 | No AI |
| Studio participation | 1 | AI for exploration during exercises |
Technology Stack
- Database: SQLFiddle (learning), MySQL/PostgreSQL locally or GCP Cloud SQL (projects)
- Python Libraries: pandas, sqlalchemy, requests, logging
- Cloud: GCP Cloud SQL (relational), GCP Free Tier or student credits
- IDE: VS Code with GitHub Copilot; Google Colab (browser alternative)
- Notebooks: Jupyter Notebooks (via Colab or VS Code)
- Version Control: GitHub
- Diagramming: Lucidchart/DrawIO for ERDs
- Data Sources: Yelp API, Census Bureau, Yahoo Finance, public datasets
Prerequisites & Assumptions
- No SQL experience required — this course starts from zero
- Python fundamentals required (loops, functions, data types) — covered in the pre-program Python orientation
- Comfortable installing software + troubleshooting (Jupyter setup guide provided in onboarding)
Bridge Module: Python & Tools Orientation (Pre-Course, ~10 hours)
Complete before Week 1. Available in Canvas as a self-paced module with self-check quizzes.
| Unit | Topics | Format | Self-Check |
|---|---|---|---|
| 1. Python Basics (3 hrs) | Variables, data types, loops, conditionals, functions | Video walkthroughs + coding exercises in Jupyter | Quiz: write a function that filters a list |
| 2. Data Structures (2 hrs) | Lists, dictionaries, reading CSV files | Jupyter exercises with sample datasets | Quiz: load a CSV and answer 3 questions about it |
| 3. Jupyter Notebook Workflow (2 hrs) | Cells, markdown, running code, exporting notebooks | Guided walkthrough | Quiz: create a notebook with code + markdown cells |
| 4. GitHub Essentials (2 hrs) | Create account, clone repo, commit, push, write a README | Step-by-step tutorial | Quiz: push a Jupyter notebook to a new repo |
| 5. Environment Setup (1 hr) | Install Python, Jupyter, VS Code; verify pandas/sqlalchemy | Checklist + troubleshooting guide | Checkpoint: run provided test script successfully |
Readiness check: Students who pass all 5 self-check quizzes (70% threshold) are cleared for Week 1. Students who don’t pass receive targeted resources and can retake.
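The Unit 1 self-check ("write a function that filters a list") is at roughly this level; the function name and values are illustrative.

```python
def filter_over(values, threshold):
    """Return only the values strictly greater than threshold."""
    return [v for v in values if v > threshold]

print(filter_over([3, 8, 1, 12, 5], 4))  # → [8, 12, 5]
```

Students comfortable writing this kind of function from scratch are ready for Week 1; the same list-comprehension pattern reappears as the pandas boolean filter in Week 4.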
Pedagogical Notes for Faculty
Design suggestions grounded in program research — not requirements. Adapt to your course and teaching style. Full references in reference/articles/.
The scenic route (cognitive friction) Pre-AI phases aren’t punishment — they’re the scenic route. Schultz’s dopamine research shows the brain registers learning through prediction errors (the gap between expectation and outcome), not through frictionless delivery. When students design their ERD by hand before asking Copilot to generate CREATE TABLE statements, that struggle is the mechanism — not the obstacle. Consider where in each week students should formulate their own approach before consulting AI. The AIAS progression in this course (0→1→2→3 across weeks) already builds this in; the key is framing it for students as investment, not restriction. → Machulla (2026), Schultz et al. (1997)
The IKEA effect (completion matters) Students value what they build — but only when labor leads to a finished artifact. Building and then discarding produces no ownership effect. The M1→M7→Final Deliverable pipeline already does this well: each milestone closes a loop. The oral defense is the ultimate completion signal — the moment where effort converts to demonstrated competence. When possible, make each milestone feel like a working thing (a queryable schema, a running pipeline), not just a checkpoint document. → Norton, Mochon & Ariely (2012)
Prompt sophistication as skill progression Anthropic data shows r > 0.92 correlation between prompt sophistication and output quality, and multi-turn interactions succeed 67% of the time vs. 49% for single-turn. This course is most students’ first encounter with AI-assisted coding (AIAS 1→3). Consider explicitly teaching prompt patterns early — not as a sidebar, but as a core data foundations skill: “asking the right question of an AI is the same skill as writing the right SQL query.” → Means (2026, “How You Ask”)
Three AI iterations before human review For project milestones M4-M7 (ETL pipeline work at AIAS 2-3), consider requiring students to iterate with AI at least 3 times before submitting for human feedback. This builds the habit of treating AI as a drafting partner, not an answer machine, and produces richer AI Attribution Logs. → Means (2026, “Practice Gap”)
Attack your assessments Before the semester starts, have a confident AI user (TA, LD, or yourself) attempt each major assignment using current AI tools from a student’s perspective. Where can AI complete the task without genuine understanding? Those are the spots to add pre-AI phases or shift weight toward the oral defense. Repeat each semester — AI capabilities shift fast. → Furze (2026)
| Course Sequence: ← Previous: (first course) | Next: BDI 513 — Data Storytelling → |