Last updated: May 29, 2026

BADM 554 - Enterprise Database Management

Program-level details: See program/curriculum.md

Status: In Development Instructor actively building content; recording starts April 15, 2026. Structure stable.

Live LD team data (CLOs, module items, video lengths): see badm554/sync/ — machine-generated snapshots from the canonical Box Excel files (Course Map.xlsx, Instructional Activity Roster.xlsx). Last synced: 2026-05-29 (roster now populated — 8 modules, 136 min).

Credits: 4

Term: Fall 2026 (Weeks 1-8)

Instructor: Vishal

Course Vision

Learners arrive able to query data. They leave able to design it. BADM 554 follows an OLAP-first arc: weeks 1–3 build fluency querying a real analytical warehouse (BigQuery public datasets), weeks 4–6 shift to designing and building that warehouse from transactional source data, and weeks 7–8 validate, document, and defend the result. By course end, each team has a working ETL pipeline, a dimensional schema, and a defensible oral presentation — a portfolio artifact, not a tutorial completion.

Learning Outcomes

Two naming schemes are used in parallel: the L-C-E framework (Vishal’s design language, used in week-by-week planning and this file) and CLO numbers (Cheng’s LD framework, used in Course Map.xlsx and the activity roster). The mapping below is the bridge. When Cheng populates the Excel, he uses CLO numbers; this file shows both.

Excel amendment needed (for Cheng): Add an “L-C-E Label” column to the Course Overview CLO table in Course Map.xlsx. Populate with the L-C-E labels below. This lets the sync files carry both schemes and eliminates ambiguity for K-ai.

CLO #	L-C-E Label	Outcome
CLO 1	L1 + L2	Understand relational database concepts (tables, keys, relationships, normalization) and read/interpret SQL queries
CLO 2	L3	Recognize when data needs cleaning and describe common data quality issues
CLO 3	C1	Write SQL queries — SELECT through window functions and CTEs — to extract, transform, and aggregate data
CLO 4	C2	Design both a normalized 3NF schema and a dimensional (star) schema for a given analytical question
CLO 5	C3 + C4	Build a complete ETL pipeline (Python + pandas) and connect to cloud and local databases
CLO 6	C5	Choose and justify an appropriate SQL access pattern (notebook vs. CLI/MCP agent) for a given task
CLO 7	E1 + E2 + E3	Evaluate design trade-offs; build a production-ready pipeline; justify decisions in terms a technical stakeholder can evaluate
CLO 8	P1	Document the evolution of at least one key design decision across milestones (Black Box Window requirement)

L-C-E detail (for planning, pedagogy, and AIAS labels)

Literacy (Foundational Awareness)

L1: Understand relational database concepts (tables, keys, relationships) and explain why normalization matters
L2: Read and interpret SQL queries to identify what data they retrieve
L3: Recognize when data needs cleaning and describe common data quality issues

Competency (Applied Skills)

C1: Write SQL queries — SELECT, JOIN, GROUP BY, window functions, CTEs — to extract and aggregate data
C2: Design a normalized schema (3NF) and a dimensional schema (star) for a given analytical question
C3: Build a complete ETL pipeline using Python (pandas, sqlalchemy) from a transactional source to a dimensional warehouse
C4: Connect to and query cloud and local databases in multiple environments (Colab notebook, CLI, MCP agent)
C5: Choose the appropriate SQL access pattern (notebook vs. CLI/MCP) for a given task and articulate the trade-off

Expertise (Advanced Application)

E1: Evaluate design trade-offs (normalization vs. denormalization, 3NF vs. star) for specific use cases
E2: Build a production-ready data pipeline with validation checks, error handling, and documentation
E3: Justify data architecture and pipeline design decisions to a technical hiring manager — cost, scalability, fitness for the analytical use case

Process Learning Outcome (Black Box — see assessment_strategy.md §3.5)

P1: Document the evolution of at least one key design decision across milestones — including the initial approach, what prompted a revision or pivot, and what was learned from the change

Note on query optimization: Learners gain awareness that slow queries exist and why (indexes, scan patterns), covered in Week 7. Deep optimization is a DBA/data engineering concern beyond the scope of this first course — revisited in later courses as appropriate.

Course Arc (3 + 3 + 2)

Phase	Weeks	Schema lens	What learners do
1 — Question + Warehouse SQL	1–3	Read OLAP / dimensional	Query BigQuery public datasets; learn SELECT through window functions + CTEs; pick a stakeholder question and dataset
2 — Build Analytical Data Product	4–6	Build OLAP star; read OLTP 3NF	Design a dimensional schema; read Sakila (3NF reference) + personal-data exports (real-world source); build ETL pipeline into DuckDB; provider-group GitHub schema negotiation
3 — Validate + Defend + Awareness	7–8	Validate end-to-end; NoSQL awareness	Add validation checks + documentation; oral defense; awareness of when relational isn’t the answer

Every week: 4 framing videos (5–8 min) → curated practice → Jupyter lab → Live Session (90 min) → Project Studio (90 min) → concept assignment → InScribe post → project milestone.

Week-by-Week Breakdown

Week	Topic	CLO focus	Project milestone
1	Warehouse SQL: SELECT / WHERE / GROUP BY on BigQuery public data	CLO 1, 3	M1 Launch — team reg + dataset scoping (formative)
2	JOINs + aggregates: combining tables to answer real questions	CLO 1, 3	M1 Initial Pitch (5%) — 5-min team pitch: stakeholder, framing, 3–5 questions
3	Window functions, CTEs + parallel access patterns (CLI/MCP introduced)	CLO 3, 6	Proposal Draft — peer-reviewed in Studio (formative)
4	Dimensional modeling: star schemas, grain, facts and dimensions	CLO 4	M2 Proposal Final (15%) — schema design + ETL plan + feedback-closure memo; Wk 4 peer pulse (1%)
5	Source-data reality: 3NF (Sakila) vs personal-data exports (real-world mess)	CLO 2, 4, 5	Analysis Draft — Studio progress check-in (formative)
6	ETL: 3NF → DuckDB warehouse + provider-group GitHub schema negotiation	CLO 5, 6	Peer review of draft data product (participation)
7	Validation, documentation, AI Attribution Log wrap-up	CLO 2, 7	Revision + feedback-closure note (feeds Wk 8 final)
8	NoSQL awareness + Final Deliverable + Individual Oral Defense	CLO 7, 8	Final (17%) + Oral (20%) + Wk 8 peer eval (4%)

Team Project: Stakeholder-Driven Analytical Data Product (Team of max 3, Weeks 1–8)

Learner teams pick a stakeholder + dataset pairing from a curated list or propose their own (subject to instructor approval). The pairing defines a real analytical question the team will answer with a designed, built, and validated data product.

Curated dataset pairings (BigQuery public datasets — no download required):

NYC Taxi Trips + city transportation planner
Austin 311 Service Requests + city operations director
Stack Overflow survey + engineering hiring manager
USA Names (SSA) + school district demographer
Iowa Liquor Sales + regional retail buyer
Google Analytics sample + e-commerce growth analyst

Weeks 5–6 source data: Learners also work with their own personal-data exports (Spotify, YouTube, Strava) as a real-world messy source — uniquely AI-resistant because the data is individual. Provider-groups (learners with the same export source) collaborate via a shared GitHub repo to negotiate a shared schema.

3NF reference dataset (Weeks 5–6): Sakila via the Pagila Postgres port — a clean textbook 3NF schema used as a contrast to the real-world mess of personal-data exports.

Project Milestones

Milestone	Week	Weight	Deliverables
M1 Launch	1	Formative	Team registration, dataset + stakeholder pick, 1-page scoping notes
M1 Initial Pitch	2	5%	5-min pitch in Live Session: stakeholder, 3–5 questions, dataset rationale
Proposal Draft	3	Formative	Draft proposal, peer-reviewed in Studio
M2 Proposal Final	4	15%	Final proposal: stakeholder, questions, dimensional schema design, ETL plan, AI Attribution checkpoint, feedback-closure memo
Analysis Draft	5	Formative	Studio progress check-in
Peer Review	6	Participation	Submit draft data product for structured peer review
Revision	7	Formative	Revised draft + feedback-closure note
Final Deliverable	8	17%	Working ETL + dimensional schema + 2–3 validated analyses + GitHub repo + README
Individual Oral Defense	8	20%	12–15 min team presentation + individual Q&A (AIAS 0, no AI)
Wk 8 Peer Evaluation	8	4%	Summative anonymous teammate rating

Final Deliverable (17% of grade) — key components

DuckDB warehouse with fact + dimension tables, populated from source data
ETL notebook (idempotent, documented, with validation checks)
2–3 analyses answering the stakeholder’s original questions
Full AI Attribution Log + 1-paragraph reflection on how AI use evolved M1 → Final
GitHub repo with all code, schema documentation, and setup guide
Architecture diagram (data flow: source → transform → warehouse → analysis)

Rubric (6 dimensions):

Dimension	Excellent (A)	Proficient (B)	Developing (C)
Schema Design	Star schema properly grained, dimensions clean, trade-offs justified	Mostly correct schema with minor issues	Grain problems or missing relationships
ETL Pipeline	Modular, idempotent, documented, robust validation checks	Good structure, handles happy path	Fragile pipeline, minimal error handling
Source-Data Handling	Personal-data and/or BigQuery source read correctly; provenance documented	Source read works, provenance partial	Source integration missing or broken
Documentation	Complete architecture diagram, clear setup guide, code comments	Adequate documentation	Incomplete or unclear
Business Understanding	Analyses answer the stakeholder question; decisions explained in business terms	Mentions business context	No business rationale
Learning Trajectory	Clear pivot or error recovery with explicit rationale — shows how thinking evolved M1 → Final	Some revision with partial explanation	Final product with no visible iteration

Oral Defense (20-25% of grade)

Week 8 studio session. Each team presents their pipeline design, demonstrates the live system, and fields Q&A from instructor and peers.

Rubric (3 dimensions):

Dimension	Excellent (A)	Proficient (B)	Developing (C)
Technical Explanation	Clear walkthrough of design decisions, trade-offs articulated	Adequate explanation of system	Unclear or surface-level description
Live Demo	System works end-to-end, handles follow-up queries confidently	Demo works with minor issues	Demo fails or cannot answer questions
Individual Contribution	Each member explains their role with depth	Members can describe their work	Uneven participation or vague answers

Weekly Assignments (30-40% of grade)

Practice exercises, labs, discussions, and peer reviews that build skills for the project. Completed individually unless noted.

Week	Assignment	Format
1	SQL fundamentals exercises + quiz	Practice problems + timed quiz
2	SQL query problem set + quiz	JOINs, subqueries, aggregations
3	ER diagram case study + peer review	Design exercise + review a classmate’s ERD
4	DataCamp assignment + pandas lab	DataCamp module + SQL→pandas translation exercises
5	ETL coding exercises + peer code review	Coding problems + review a classmate’s pipeline code
6	Data quality + API integration lab	Clean a messy real-world dataset + build an API connector
7	Cloud deployment lab + automation exercise	Deploy pipeline to GCP BigQuery + configure automated scheduling
8	Portfolio reflection	Written reflection on learning journey and skill growth

AI Tools Integration

Where AI Accelerates Learning:

Week 1 (SQL Fundamentals): Use Claude/ChatGPT to:
- Explain SQL errors (“Why does this JOIN return NULL?”)
- Generate sample SQL queries for practice
- Validate your schema design
- Suggest refactoring for clarity
Week 4-5 (Python ETL): Use AI to:
- Debug pandas errors (“How do I reshape this DataFrame?”)
- Optimize code performance (“How do I vectorize this loop?”)
- Generate error handling patterns
- Suggest pandas functions for complex transformations
Week 7-8 (Cloud & Optimization): Use AI to:
- Troubleshoot GCP BigQuery configuration issues
- Suggest index strategies for slow queries
- Review security practices
- Generate Cloud Functions code for automation

Three-Options Prompting Protocol (Weeks 4-8, AIAS Level 2-3):

For all milestone submissions (M4-M7) and the final deliverable, students must use the three-options format when using AI for design or architectural decisions:

Prompt AI for three distinct options (e.g., “Give me three different approaches to structuring this ETL pipeline”)
Summarize each option briefly
Select one and justify the choice in 2-3 sentences based on your specific project context

Submission format:

AI Prompt: [paste exact prompt]
Option A: [1-sentence summary]
Option B: [1-sentence summary]
Option C: [1-sentence summary]
Chosen: [A/B/C]
Rationale: [why this option fits your context better than the others]

This replaces generic AI attribution (“I used Copilot”) with evidence of evaluative judgment. Options must differ on at least one meaningful dimension (e.g., performance vs. simplicity vs. maintainability). The three-options protocol does not apply to Weeks 1-3 (pre-AI / AIAS Level 0-1 phases).

Per-Milestone AI Attribution Log (program-standard format):

Aligned with design/project_milestone_template.md. Required at four checkpoints:

Milestone	What to document
M2 (Wk 2)	ERD design decisions — what AI suggested, what you changed, why
M4 (Wk 4)	Extraction script — AI-suggested approaches accepted vs. rejected, with rationale
M6 (Wk 6)	ETL pipeline — which logic was AI-generated vs. human-specified; flag any AI output accepted without verification
Final (Wk 8)	Full Attribution Log covering all AI use across the project + 1-paragraph reflection: how did AI use evolve from M1 to Final? What would you do differently?

Format for each entry:

Milestone: [M2 / M4 / M6 / Final]
AI Tool Used: [Claude / Copilot / ChatGPT / other]
Prompt (summarized): [what you asked]
Output Used: [what you kept]
Output Rejected: [what you discarded and why]
Human Decision: [what judgment you applied that AI could not]

Studio Session Topics:

Week 1: ER diagram design walkthrough + SQL SELECT fundamentals
Week 2: Advanced SQL patterns (window functions, CTEs, complex JOINs)
Week 3: Database normalization decisions + denormalization trade-offs
Week 4: SQL→Python translation — SELECT→filter, GROUP BY→groupby, JOIN→merge
Week 5: Error handling patterns in Python + logging best practices
Week 6: Data quality + API authentication — cleaning messy data, managing credentials safely
Week 7: GCP BigQuery setup + pipeline automation — deploy, connect, schedule, troubleshoot
Week 8: Final presentations + portfolio feedback

Assessment Summary

Component	Weight Range	Notes
Weekly assignments (exercises, quizzes, labs)	30–40%	Individual, skill-building
Project milestones (M1 Wk2, M2 Wk4 + formatives)	20–30%	Team
Final project deliverable (Wk 8)	15–20%	Team
Individual oral defense (Wk 8)	20–25%	Individual, AIAS 0
Peer evaluation	5%	Wk 4 formative 1% + Wk 8 summative 4%
InScribe Live Session engagement	3–5%	Weekly posts

No traditional exam. All assessment is project-based, skills-based, and participation.

AI Usage Levels (AIAS)

Assessment	AIAS Level	AI Permitted
Quizzes (Weeks 1, 2)	0	No AI
Weekly labs + exercises	1	AI for ideation and exploration only
Project milestones M1-M3 (schema/ERD)	1	AI for ideation and schema brainstorming only
Project milestones M4-M6 (ETL pipeline)	2	AI for code assistance and debugging, with attribution
Project milestone M7 + final deliverable (cloud deployment)	3	AI as full collaborator for GCP troubleshooting and code review, with attribution
Oral defense	0	No AI
Studio participation	1	AI for exploration during exercises

Technology Stack

Query targets: BigQuery public datasets (Wks 1–4, cloud notebook); local Postgres with Sakila/Pagila (Wks 5–6, 3NF reference); personal-data exports in DuckDB (Wks 5–6, real-world source)
Analytical warehouse: DuckDB (local, embedded — fast setup, no server; learners build their star schema here)
Python libraries: pandas, sqlalchemy, psycopg2, requests, duckdb
Access patterns: Jupyter/Colab notebook (primary Wks 1–4); VS Code + bq CLI + MCP agent (introduced Wk 3, load-bearing Wks 4–7)
MCP config: Course-provided pin-versioned bundle (BigQuery + Postgres + DuckDB + filesystem + GitHub MCP servers) — imported in Wk 3
IDE: VS Code with GitHub Copilot; Google Colab (browser fallback)
Version control: GitHub (individual repos Wks 1–5; provider-group shared repos Wk 6)
Diagramming: Lucidchart or DrawIO for ERDs and architecture diagrams

Prerequisites & Assumptions

No SQL experience required — this course starts from zero
Python fundamentals required (loops, functions, data types) — covered in the pre-program Python orientation
Comfortable installing software + troubleshooting (Jupyter setup guide provided in onboarding)

Preparatory Coursework Alignment (Staggered Model, 2026-05-18)

Per the program-wide staggered preparatory coursework model (see program/curriculum.md “Preparatory Coursework” and DECISIONS.md 2026-05-18), students arrive in BADM 554 with the following prep status:

Module	Required by	Tier
GitHub + VS Code (DataCamp)	Before Day 1 (BADM 554 Week 1)	Gated — completion required
Gen AI literacy	End of Week 4	Gated — runs in parallel with this course
Python & Tools prep (Coursera — specific course TBD)	Before BDI 513 (Week 5)	Self-diagnostic; students may still be working through during weeks 1-4
Stats MOOCs 1 + 2	Before FIN 550 (Week 9)	Self-diagnostic

Implications for BADM 554:

Students have GitHub + VS Code set up by Day 1 (course can assume the environment)
Some students may still be building Python fluency through Week 4 (especially career pivoters); design weekly assignments accordingly
Gen AI literacy may be in-progress during the first 4 weeks; calibrate AI-aware framing in early weeks
Stats are not yet assumed; defer any statistics-dependent content to BDI 513 or later

Faculty office hours during Weeks 1-3 are recommended for students still finishing Python self-diagnostic prep — see Ron Guymon’s 2026-04-15 note on zero-coding-experience learners.

Pedagogical Notes for Faculty

Design suggestions grounded in program research — not requirements. Adapt to your course and teaching style. Full references in reference/articles/.

The scenic route (cognitive friction) Pre-AI phases aren’t punishment — they’re the scenic route. Schultz’s dopamine research shows the brain registers learning through prediction errors (the gap between expectation and outcome), not through frictionless delivery. When students design their ERD by hand before asking Copilot to generate CREATE TABLE statements, that struggle is the mechanism — not the obstacle. Consider where in each week students should formulate their own approach before consulting AI. The AIAS progression in this course (0→1→2→3 across weeks) already builds this in; the key is framing it for students as investment, not restriction. → Machulla (2026), Schultz et al. (1997)

The IKEA effect (completion matters) Students value what they build — but only when labor leads to a finished artifact. Building and then discarding produces no ownership effect. The M1→M7→Final Deliverable pipeline already does this well: each milestone closes a loop. The oral defense is the ultimate completion signal — the moment where effort converts to demonstrated competence. When possible, make each milestone feel like a working thing (a queryable schema, a running pipeline), not just a checkpoint document. → Norton, Mochon & Ariely (2012)

Prompt sophistication as skill progression Anthropic data shows r > 0.92 correlation between prompt sophistication and output quality, and multi-turn interactions succeed 67% of the time vs. 49% for single-turn. This course is most students’ first encounter with AI-assisted coding (AIAS 1→3). Consider explicitly teaching prompt patterns early — not as a sidebar, but as a core data foundations skill: “asking the right question of an AI is the same skill as writing the right SQL query.” → Means (2026, “How You Ask”)

Three AI iterations before human review For project milestones M4-M7 (ETL pipeline work at AIAS 2-3), consider requiring students to iterate with AI at least 3 times before submitting for human feedback. This builds the habit of treating AI as a drafting partner, not an answer machine, and produces richer AI Attribution Logs. → Means (2026, “Practice Gap”)

Attack your assessments Before the semester starts, have a confident AI user (TA, LD, or yourself) attempt each major assignment using current AI tools from a student’s perspective. Where can AI complete the task without genuine understanding? Those are the spots to add pre-AI phases or shift weight toward the oral defense. Repeat each semester — AI capabilities shift fast. → Furze (2026)

The cognitive offloading U-curve (Zone 2 is the danger zone) Learning outcomes follow a three-zone pattern: Zone 1 (no AI) = full cognitive load, slow but real learning; Zone 2 (scattered, unstructured AI use) = worse outcomes than no AI at all — coordination overhead without genuine cognitive reallocation; Zone 3 (committed, structured delegation) = superior outcomes through deliberate offloading of routine work while investing freed capacity in higher-order reasoning. The AIAS progression in this course (0→1→2→3) is designed to move learners through Zone 1 early, through Zone 2 quickly, and into Zone 3 by Weeks 5-8. The risk is AIAS Level 1 (“AI for ideation only”) becoming a Zone 2 trap if students engage with AI vaguely and inconsistently. Mitigate by giving Level 1 tasks a clear structure: “Before using AI, write your own answer. Then ask AI. Then compare and justify any changes.” That sequence is Zone 3 behavior, even at low AI permission levels. Separately: a 2026 study of 912 learners found that “partnership orientation” — treating AI as a thinking partner rather than an answer machine — independently predicted deeper learning regardless of how much AI was used. The mental model learners bring to AI matters as much as the permission level. Frame Week 1’s AI orientation around this: “Using AI well is the same skill as writing the right SQL query — the quality of your question determines the quality of the answer.” → Hardman (2026), “The Cognitive Offloading Paradox”; Wang & Zhang (2026)

AI as material for thinking, not a shortcut around cognition (design thinking checklist) Before finalizing any assignment or milestone, apply the three diagnostic questions from assessment_strategy.md §3.4: (1) What is the learning outcome this task targets? (2) What cognitive work must remain with the student to achieve that outcome? (3) What kind of AI assistance makes that cognitive work more visible, more rigorous, or more equitable — rather than replacing it? For BADM 554: the cognitive work that must stay with students is schema design judgment (Wks 1-3) and pipeline architecture decisions (Wks 4-7). AI assistance at those stages should make the student’s reasoning more explicit — not substitute for it. The per-milestone attribution log is the mechanism. → Vander (2026), “Design Thinking for AI-Integrated Pedagogy”

Faculty action required: Before T&L course design review, document your answers to the three diagnostic questions above in writing for this course. See assessment_strategy.md §3.4 for the submission format.

Recruiting Talking Points

Drafted by Vishal Sachdev. Adopted as the model for program-wide faculty talking points (Ravi Mehta directive, 2026-05-27). See DECISIONS.md “Faculty course talking points: three-question framework for recruiting.”

What does the course cover? BADM 554 is the first course of MSBAi: 8 weeks of database fluency for working professionals pivoting toward analyst work. Students learn SQL, schema design, and ETL on real enterprise data; no prior SQL required. Each week pairs a stakeholder conversation with a pre-AI / AI-mediated / post-AI workflow that treats AI as a thinking partner, not a shortcut.

What do students build? Teams of three adopt a stakeholder and a real dataset in Week 1, then spend 8 weeks building a portfolio GitHub repository: a dimensional warehouse, an ETL notebook ingesting from a live public data source, an AI Attribution Log, and a cloud-hosted version of the warehouse with a queryable URL. Each student defends the team’s work in an individual oral examination at Week 8.

Why does it matter for careers? SQL and schema judgment are baseline. What separates hireable analysts is talking to a non-technical stakeholder, working with AI without being replaced by it, and shipping a defensible artifact another person can audit. BADM 554 builds those three skills explicitly.

Course Sequence: ← Previous: (first course)

Next: BDI 513 — Data Storytelling →

MSBAi Curriculum Site

MSBAi - Online Master of Science in Business Analytics

AI-First curriculum design documentation for the MSBAi program launching Fall 2026