Last updated: May 29, 2026

← MSBAi Home

BADM 554 - Enterprise Database Management

Program-level details: See program/curriculum.md

Status: In Development Instructor actively building content; recording starts April 15, 2026. Structure stable.

Live LD team data (CLOs, module items, video lengths): see badm554/sync/ — machine-generated snapshots from the canonical Box Excel files (Course Map.xlsx, Instructional Activity Roster.xlsx). Last synced: 2026-05-29 (roster now populated — 8 modules, 136 min).

Credits: 4 Term: Fall 2026 (Weeks 1-8) Instructor: Vishal

Course Vision

Learners arrive able to query data. They leave able to design it. BADM 554 follows an OLAP-first arc: weeks 1–3 build fluency querying a real analytical warehouse (BigQuery public datasets), weeks 4–6 shift to designing and building that warehouse from transactional source data, and weeks 7–8 validate, document, and defend the result. By course end, each team has a working ETL pipeline, a dimensional schema, and a defensible oral presentation — a portfolio artifact, not a tutorial completion.

Learning Outcomes

Two naming schemes are used in parallel: the L-C-E framework (Vishal’s design language, used in week-by-week planning and this file) and CLO numbers (Cheng’s LD framework, used in Course Map.xlsx and the activity roster). The mapping below is the bridge. When Cheng populates the Excel, he uses CLO numbers; this file shows both.

Excel amendment needed (for Cheng): Add an “L-C-E Label” column to the Course Overview CLO table in Course Map.xlsx. Populate with the L-C-E labels below. This lets the sync files carry both schemes and eliminates ambiguity for K-ai.

CLO # L-C-E Label Outcome
CLO 1 L1 + L2 Understand relational database concepts (tables, keys, relationships, normalization) and read/interpret SQL queries
CLO 2 L3 Recognize when data needs cleaning and describe common data quality issues
CLO 3 C1 Write SQL queries — SELECT through window functions and CTEs — to extract, transform, and aggregate data
CLO 4 C2 Design both a normalized 3NF schema and a dimensional (star) schema for a given analytical question
CLO 5 C3 + C4 Build a complete ETL pipeline (Python + pandas) and connect to cloud and local databases
CLO 6 C5 Choose and justify an appropriate SQL access pattern (notebook vs. CLI/MCP agent) for a given task
CLO 7 E1 + E2 + E3 Evaluate design trade-offs; build a production-ready pipeline; justify decisions in terms a technical stakeholder can evaluate
CLO 8 P1 Document the evolution of at least one key design decision across milestones (Black Box Window requirement)

L-C-E detail (for planning, pedagogy, and AIAS labels)

Literacy (Foundational Awareness)

Competency (Applied Skills)

Expertise (Advanced Application)

Process Learning Outcome (Black Box — see assessment_strategy.md §3.5)

Note on query optimization: Learners gain awareness that slow queries exist and why (indexes, scan patterns), covered in Week 7. Deep optimization is a DBA/data engineering concern beyond the scope of this first course — revisited in later courses as appropriate.

Course Arc (3 + 3 + 2)

Phase Weeks Schema lens What learners do
1 — Question + Warehouse SQL 1–3 Read OLAP / dimensional Query BigQuery public datasets; learn SELECT through window functions + CTEs; pick a stakeholder question and dataset
2 — Build Analytical Data Product 4–6 Build OLAP star; read OLTP 3NF Design a dimensional schema; read Sakila (3NF reference) + personal-data exports (real-world source); build ETL pipeline into DuckDB; provider-group GitHub schema negotiation
3 — Validate + Defend + Awareness 7–8 Validate end-to-end; NoSQL awareness Add validation checks + documentation; oral defense; awareness of when relational isn’t the answer

Every week: 4 framing videos (5–8 min) → curated practice → Jupyter lab → Live Session (90 min) → Project Studio (90 min) → concept assignment → InScribe post → project milestone.

Week-by-Week Breakdown

Week Topic CLO focus Project milestone
1 Warehouse SQL: SELECT / WHERE / GROUP BY on BigQuery public data CLO 1, 3 M1 Launch — team reg + dataset scoping (formative)
2 JOINs + aggregates: combining tables to answer real questions CLO 1, 3 M1 Initial Pitch (5%) — 5-min team pitch: stakeholder, framing, 3–5 questions
3 Window functions, CTEs + parallel access patterns (CLI/MCP introduced) CLO 3, 6 Proposal Draft — peer-reviewed in Studio (formative)
4 Dimensional modeling: star schemas, grain, facts and dimensions CLO 4 M2 Proposal Final (15%) — schema design + ETL plan + feedback-closure memo; Wk 4 peer pulse (1%)
5 Source-data reality: 3NF (Sakila) vs personal-data exports (real-world mess) CLO 2, 4, 5 Analysis Draft — Studio progress check-in (formative)
6 ETL: 3NF → DuckDB warehouse + provider-group GitHub schema negotiation CLO 5, 6 Peer review of draft data product (participation)
7 Validation, documentation, AI Attribution Log wrap-up CLO 2, 7 Revision + feedback-closure note (feeds Wk 8 final)
8 NoSQL awareness + Final Deliverable + Individual Oral Defense CLO 7, 8 Final (17%) + Oral (20%) + Wk 8 peer eval (4%)

Team Project: Stakeholder-Driven Analytical Data Product (Team of max 3, Weeks 1–8)

Learner teams pick a stakeholder + dataset pairing from a curated list or propose their own (subject to instructor approval). The pairing defines a real analytical question the team will answer with a designed, built, and validated data product.

Curated dataset pairings (BigQuery public datasets — no download required):

Weeks 5–6 source data: Learners also work with their own personal-data exports (Spotify, YouTube, Strava) as a real-world messy source — uniquely AI-resistant because the data is individual. Provider-groups (learners with the same export source) collaborate via a shared GitHub repo to negotiate a shared schema.

3NF reference dataset (Weeks 5–6): Sakila via the Pagila Postgres port — a clean textbook 3NF schema used as a contrast to the real-world mess of personal-data exports.

Project Milestones

Milestone Week Weight Deliverables
M1 Launch 1 Formative Team registration, dataset + stakeholder pick, 1-page scoping notes
M1 Initial Pitch 2 5% 5-min pitch in Live Session: stakeholder, 3–5 questions, dataset rationale
Proposal Draft 3 Formative Draft proposal, peer-reviewed in Studio
M2 Proposal Final 4 15% Final proposal: stakeholder, questions, dimensional schema design, ETL plan, AI Attribution checkpoint, feedback-closure memo
Analysis Draft 5 Formative Studio progress check-in
Peer Review 6 Participation Submit draft data product for structured peer review
Revision 7 Formative Revised draft + feedback-closure note
Final Deliverable 8 17% Working ETL + dimensional schema + 2–3 validated analyses + GitHub repo + README
Individual Oral Defense 8 20% 12–15 min team presentation + individual Q&A (AIAS 0, no AI)
Wk 8 Peer Evaluation 8 4% Summative anonymous teammate rating

Final Deliverable (17% of grade) — key components

Rubric (6 dimensions):

Dimension Excellent (A) Proficient (B) Developing (C)
Schema Design Star schema properly grained, dimensions clean, trade-offs justified Mostly correct schema with minor issues Grain problems or missing relationships
ETL Pipeline Modular, idempotent, documented, robust validation checks Good structure, handles happy path Fragile pipeline, minimal error handling
Source-Data Handling Personal-data and/or BigQuery source read correctly; provenance documented Source read works, provenance partial Source integration missing or broken
Documentation Complete architecture diagram, clear setup guide, code comments Adequate documentation Incomplete or unclear
Business Understanding Analyses answer the stakeholder question; decisions explained in business terms Mentions business context No business rationale
Learning Trajectory Clear pivot or error recovery with explicit rationale — shows how thinking evolved M1 → Final Some revision with partial explanation Final product with no visible iteration

Oral Defense (20-25% of grade)

Week 8 studio session. Each team presents their pipeline design, demonstrates the live system, and fields Q&A from instructor and peers.

Rubric (3 dimensions):

Dimension Excellent (A) Proficient (B) Developing (C)
Technical Explanation Clear walkthrough of design decisions, trade-offs articulated Adequate explanation of system Unclear or surface-level description
Live Demo System works end-to-end, handles follow-up queries confidently Demo works with minor issues Demo fails or cannot answer questions
Individual Contribution Each member explains their role with depth Members can describe their work Uneven participation or vague answers

Weekly Assignments (30-40% of grade)

Practice exercises, labs, discussions, and peer reviews that build skills for the project. Completed individually unless noted.

Week Assignment Format
1 SQL fundamentals exercises + quiz Practice problems + timed quiz
2 SQL query problem set + quiz JOINs, subqueries, aggregations
3 ER diagram case study + peer review Design exercise + review a classmate’s ERD
4 DataCamp assignment + pandas lab DataCamp module + SQL→pandas translation exercises
5 ETL coding exercises + peer code review Coding problems + review a classmate’s pipeline code
6 Data quality + API integration lab Clean a messy real-world dataset + build an API connector
7 Cloud deployment lab + automation exercise Deploy pipeline to GCP BigQuery + configure automated scheduling
8 Portfolio reflection Written reflection on learning journey and skill growth

AI Tools Integration

Where AI Accelerates Learning:

  1. Week 1 (SQL Fundamentals): Use Claude/ChatGPT to:
    • Explain SQL errors (“Why does this JOIN return NULL?”)
    • Generate sample SQL queries for practice
    • Validate your schema design
    • Suggest refactoring for clarity
  2. Week 4-5 (Python ETL): Use AI to:
    • Debug pandas errors (“How do I reshape this DataFrame?”)
    • Optimize code performance (“How do I vectorize this loop?”)
    • Generate error handling patterns
    • Suggest pandas functions for complex transformations
  3. Week 7-8 (Cloud & Optimization): Use AI to:
    • Troubleshoot GCP BigQuery configuration issues
    • Suggest index strategies for slow queries
    • Review security practices
    • Generate Cloud Functions code for automation

Three-Options Prompting Protocol (Weeks 4-8, AIAS Level 2-3):

For all milestone submissions (M4-M7) and the final deliverable, students must use the three-options format when using AI for design or architectural decisions:

  1. Prompt AI for three distinct options (e.g., “Give me three different approaches to structuring this ETL pipeline”)
  2. Summarize each option briefly
  3. Select one and justify the choice in 2-3 sentences based on your specific project context

Submission format:

AI Prompt: [paste exact prompt]
Option A: [1-sentence summary]
Option B: [1-sentence summary]
Option C: [1-sentence summary]
Chosen: [A/B/C]
Rationale: [why this option fits your context better than the others]

This replaces generic AI attribution (“I used Copilot”) with evidence of evaluative judgment. Options must differ on at least one meaningful dimension (e.g., performance vs. simplicity vs. maintainability). The three-options protocol does not apply to Weeks 1-3 (pre-AI / AIAS Level 0-1 phases).

Per-Milestone AI Attribution Log (program-standard format):

Aligned with design/project_milestone_template.md. Required at four checkpoints:

Milestone What to document
M2 (Wk 2) ERD design decisions — what AI suggested, what you changed, why
M4 (Wk 4) Extraction script — AI-suggested approaches accepted vs. rejected, with rationale
M6 (Wk 6) ETL pipeline — which logic was AI-generated vs. human-specified; flag any AI output accepted without verification
Final (Wk 8) Full Attribution Log covering all AI use across the project + 1-paragraph reflection: how did AI use evolve from M1 to Final? What would you do differently?

Format for each entry:

Milestone: [M2 / M4 / M6 / Final]
AI Tool Used: [Claude / Copilot / ChatGPT / other]
Prompt (summarized): [what you asked]
Output Used: [what you kept]
Output Rejected: [what you discarded and why]
Human Decision: [what judgment you applied that AI could not]

Studio Session Topics:

Assessment Summary

Component Weight Range Notes
Weekly assignments (exercises, quizzes, labs) 30–40% Individual, skill-building
Project milestones (M1 Wk2, M2 Wk4 + formatives) 20–30% Team
Final project deliverable (Wk 8) 15–20% Team
Individual oral defense (Wk 8) 20–25% Individual, AIAS 0
Peer evaluation 5% Wk 4 formative 1% + Wk 8 summative 4%
InScribe Live Session engagement 3–5% Weekly posts

No traditional exam. All assessment is project-based, skills-based, and participation.

AI Usage Levels (AIAS)

Assessment AIAS Level AI Permitted
Quizzes (Weeks 1, 2) 0 No AI
Weekly labs + exercises 1 AI for ideation and exploration only
Project milestones M1-M3 (schema/ERD) 1 AI for ideation and schema brainstorming only
Project milestones M4-M6 (ETL pipeline) 2 AI for code assistance and debugging, with attribution
Project milestone M7 + final deliverable (cloud deployment) 3 AI as full collaborator for GCP troubleshooting and code review, with attribution
Oral defense 0 No AI
Studio participation 1 AI for exploration during exercises

Technology Stack

Prerequisites & Assumptions

Preparatory Coursework Alignment (Staggered Model, 2026-05-18)

Per the program-wide staggered preparatory coursework model (see program/curriculum.md “Preparatory Coursework” and DECISIONS.md 2026-05-18), students arrive in BADM 554 with the following prep status:

Module Required by Tier
GitHub + VS Code (DataCamp) Before Day 1 (BADM 554 Week 1) Gated — completion required
Gen AI literacy End of Week 4 Gated — runs in parallel with this course
Python & Tools prep (Coursera — specific course TBD) Before BDI 513 (Week 5) Self-diagnostic; students may still be working through during weeks 1-4
Stats MOOCs 1 + 2 Before FIN 550 (Week 9) Self-diagnostic

Implications for BADM 554:

Faculty office hours during Weeks 1-3 are recommended for students still finishing Python self-diagnostic prep — see Ron Guymon’s 2026-04-15 note on zero-coding-experience learners.


Pedagogical Notes for Faculty

Design suggestions grounded in program research — not requirements. Adapt to your course and teaching style. Full references in reference/articles/.

The scenic route (cognitive friction) Pre-AI phases aren’t punishment — they’re the scenic route. Schultz’s dopamine research shows the brain registers learning through prediction errors (the gap between expectation and outcome), not through frictionless delivery. When students design their ERD by hand before asking Copilot to generate CREATE TABLE statements, that struggle is the mechanism — not the obstacle. Consider where in each week students should formulate their own approach before consulting AI. The AIAS progression in this course (0→1→2→3 across weeks) already builds this in; the key is framing it for students as investment, not restriction. → Machulla (2026), Schultz et al. (1997)

The IKEA effect (completion matters) Students value what they build — but only when labor leads to a finished artifact. Building and then discarding produces no ownership effect. The M1→M7→Final Deliverable pipeline already does this well: each milestone closes a loop. The oral defense is the ultimate completion signal — the moment where effort converts to demonstrated competence. When possible, make each milestone feel like a working thing (a queryable schema, a running pipeline), not just a checkpoint document. → Norton, Mochon & Ariely (2012)

Prompt sophistication as skill progression Anthropic data shows r > 0.92 correlation between prompt sophistication and output quality, and multi-turn interactions succeed 67% of the time vs. 49% for single-turn. This course is most students’ first encounter with AI-assisted coding (AIAS 1→3). Consider explicitly teaching prompt patterns early — not as a sidebar, but as a core data foundations skill: “asking the right question of an AI is the same skill as writing the right SQL query.” → Means (2026, “How You Ask”)

Three AI iterations before human review For project milestones M4-M7 (ETL pipeline work at AIAS 2-3), consider requiring students to iterate with AI at least 3 times before submitting for human feedback. This builds the habit of treating AI as a drafting partner, not an answer machine, and produces richer AI Attribution Logs. → Means (2026, “Practice Gap”)

Attack your assessments Before the semester starts, have a confident AI user (TA, LD, or yourself) attempt each major assignment using current AI tools from a student’s perspective. Where can AI complete the task without genuine understanding? Those are the spots to add pre-AI phases or shift weight toward the oral defense. Repeat each semester — AI capabilities shift fast. → Furze (2026)

The cognitive offloading U-curve (Zone 2 is the danger zone) Learning outcomes follow a three-zone pattern: Zone 1 (no AI) = full cognitive load, slow but real learning; Zone 2 (scattered, unstructured AI use) = worse outcomes than no AI at all — coordination overhead without genuine cognitive reallocation; Zone 3 (committed, structured delegation) = superior outcomes through deliberate offloading of routine work while investing freed capacity in higher-order reasoning. The AIAS progression in this course (0→1→2→3) is designed to move learners through Zone 1 early, through Zone 2 quickly, and into Zone 3 by Weeks 5-8. The risk is AIAS Level 1 (“AI for ideation only”) becoming a Zone 2 trap if students engage with AI vaguely and inconsistently. Mitigate by giving Level 1 tasks a clear structure: “Before using AI, write your own answer. Then ask AI. Then compare and justify any changes.” That sequence is Zone 3 behavior, even at low AI permission levels. Separately: a 2026 study of 912 learners found that “partnership orientation” — treating AI as a thinking partner rather than an answer machine — independently predicted deeper learning regardless of how much AI was used. The mental model learners bring to AI matters as much as the permission level. Frame Week 1’s AI orientation around this: “Using AI well is the same skill as writing the right SQL query — the quality of your question determines the quality of the answer.”Hardman (2026), “The Cognitive Offloading Paradox”; Wang & Zhang (2026)

AI as material for thinking, not a shortcut around cognition (design thinking checklist) Before finalizing any assignment or milestone, apply the three diagnostic questions from assessment_strategy.md §3.4: (1) What is the learning outcome this task targets? (2) What cognitive work must remain with the student to achieve that outcome? (3) What kind of AI assistance makes that cognitive work more visible, more rigorous, or more equitable — rather than replacing it? For BADM 554: the cognitive work that must stay with students is schema design judgment (Wks 1-3) and pipeline architecture decisions (Wks 4-7). AI assistance at those stages should make the student’s reasoning more explicit — not substitute for it. The per-milestone attribution log is the mechanism. → Vander (2026), “Design Thinking for AI-Integrated Pedagogy”

Faculty action required: Before T&L course design review, document your answers to the three diagnostic questions above in writing for this course. See assessment_strategy.md §3.4 for the submission format.


Recruiting Talking Points

Drafted by Vishal Sachdev. Adopted as the model for program-wide faculty talking points (Ravi Mehta directive, 2026-05-27). See DECISIONS.md “Faculty course talking points: three-question framework for recruiting.”

What does the course cover? BADM 554 is the first course of MSBAi: 8 weeks of database fluency for working professionals pivoting toward analyst work. Students learn SQL, schema design, and ETL on real enterprise data; no prior SQL required. Each week pairs a stakeholder conversation with a pre-AI / AI-mediated / post-AI workflow that treats AI as a thinking partner, not a shortcut.

What do students build? Teams of three adopt a stakeholder and a real dataset in Week 1, then spend 8 weeks building a portfolio GitHub repository: a dimensional warehouse, an ETL notebook ingesting from a live public data source, an AI Attribution Log, and a cloud-hosted version of the warehouse with a queryable URL. Each student defends the team’s work in an individual oral examination at Week 8.

Why does it matter for careers? SQL and schema judgment are baseline. What separates hireable analysts is talking to a non-technical stakeholder, working with AI without being replaced by it, and shipping a defensible artifact another person can audit. BADM 554 builds those three skills explicitly.


Course Sequence: ← Previous: (first course) Next: BDI 513 — Data Storytelling