BADM 554 - Enterprise Database Management
Program-level details: See program/curriculum.md
Status: In Development Instructor actively building content; recording starts April 15, 2026. Structure stable.
Live LD team data (CLOs, module items, video lengths): see
badm554/sync/— machine-generated snapshots from the canonical Box Excel files (Course Map.xlsx,Instructional Activity Roster.xlsx). Last synced: 2026-05-29 (roster now populated — 8 modules, 136 min).
| Credits: 4 | Term: Fall 2026 (Weeks 1-8) | Instructor: Vishal |
Course Vision
Learners arrive able to query data. They leave able to design it. BADM 554 follows an OLAP-first arc: weeks 1–3 build fluency querying a real analytical warehouse (BigQuery public datasets), weeks 4–6 shift to designing and building that warehouse from transactional source data, and weeks 7–8 validate, document, and defend the result. By course end, each team has a working ETL pipeline, a dimensional schema, and a defensible oral presentation — a portfolio artifact, not a tutorial completion.
Learning Outcomes
Two naming schemes are used in parallel: the L-C-E framework (Vishal’s design language, used in week-by-week planning and this file) and CLO numbers (Cheng’s LD framework, used in Course Map.xlsx and the activity roster). The mapping below is the bridge. When Cheng populates the Excel, he uses CLO numbers; this file shows both.
Excel amendment needed (for Cheng): Add an “L-C-E Label” column to the Course Overview CLO table in Course Map.xlsx. Populate with the L-C-E labels below. This lets the sync files carry both schemes and eliminates ambiguity for K-ai.
| CLO # | L-C-E Label | Outcome |
|---|---|---|
| CLO 1 | L1 + L2 | Understand relational database concepts (tables, keys, relationships, normalization) and read/interpret SQL queries |
| CLO 2 | L3 | Recognize when data needs cleaning and describe common data quality issues |
| CLO 3 | C1 | Write SQL queries — SELECT through window functions and CTEs — to extract, transform, and aggregate data |
| CLO 4 | C2 | Design both a normalized 3NF schema and a dimensional (star) schema for a given analytical question |
| CLO 5 | C3 + C4 | Build a complete ETL pipeline (Python + pandas) and connect to cloud and local databases |
| CLO 6 | C5 | Choose and justify an appropriate SQL access pattern (notebook vs. CLI/MCP agent) for a given task |
| CLO 7 | E1 + E2 + E3 | Evaluate design trade-offs; build a production-ready pipeline; justify decisions in terms a technical stakeholder can evaluate |
| CLO 8 | P1 | Document the evolution of at least one key design decision across milestones (Black Box Window requirement) |
L-C-E detail (for planning, pedagogy, and AIAS labels)
Literacy (Foundational Awareness)
- L1: Understand relational database concepts (tables, keys, relationships) and explain why normalization matters
- L2: Read and interpret SQL queries to identify what data they retrieve
- L3: Recognize when data needs cleaning and describe common data quality issues
Competency (Applied Skills)
- C1: Write SQL queries — SELECT, JOIN, GROUP BY, window functions, CTEs — to extract and aggregate data
- C2: Design a normalized schema (3NF) and a dimensional schema (star) for a given analytical question
- C3: Build a complete ETL pipeline using Python (pandas, sqlalchemy) from a transactional source to a dimensional warehouse
- C4: Connect to and query cloud and local databases in multiple environments (Colab notebook, CLI, MCP agent)
- C5: Choose the appropriate SQL access pattern (notebook vs. CLI/MCP) for a given task and articulate the trade-off
Expertise (Advanced Application)
- E1: Evaluate design trade-offs (normalization vs. denormalization, 3NF vs. star) for specific use cases
- E2: Build a production-ready data pipeline with validation checks, error handling, and documentation
- E3: Justify data architecture and pipeline design decisions to a technical hiring manager — cost, scalability, fitness for the analytical use case
Process Learning Outcome (Black Box — see assessment_strategy.md §3.5)
- P1: Document the evolution of at least one key design decision across milestones — including the initial approach, what prompted a revision or pivot, and what was learned from the change
Note on query optimization: Learners gain awareness that slow queries exist and why (indexes, scan patterns), covered in Week 7. Deep optimization is a DBA/data engineering concern beyond the scope of this first course — revisited in later courses as appropriate.
Course Arc (3 + 3 + 2)
| Phase | Weeks | Schema lens | What learners do |
|---|---|---|---|
| 1 — Question + Warehouse SQL | 1–3 | Read OLAP / dimensional | Query BigQuery public datasets; learn SELECT through window functions + CTEs; pick a stakeholder question and dataset |
| 2 — Build Analytical Data Product | 4–6 | Build OLAP star; read OLTP 3NF | Design a dimensional schema; read Sakila (3NF reference) + personal-data exports (real-world source); build ETL pipeline into DuckDB; provider-group GitHub schema negotiation |
| 3 — Validate + Defend + Awareness | 7–8 | Validate end-to-end; NoSQL awareness | Add validation checks + documentation; oral defense; awareness of when relational isn’t the answer |
Every week: 4 framing videos (5–8 min) → curated practice → Jupyter lab → Live Session (90 min) → Project Studio (90 min) → concept assignment → InScribe post → project milestone.
Week-by-Week Breakdown
| Week | Topic | CLO focus | Project milestone |
|---|---|---|---|
| 1 | Warehouse SQL: SELECT / WHERE / GROUP BY on BigQuery public data | CLO 1, 3 | M1 Launch — team reg + dataset scoping (formative) |
| 2 | JOINs + aggregates: combining tables to answer real questions | CLO 1, 3 | M1 Initial Pitch (5%) — 5-min team pitch: stakeholder, framing, 3–5 questions |
| 3 | Window functions, CTEs + parallel access patterns (CLI/MCP introduced) | CLO 3, 6 | Proposal Draft — peer-reviewed in Studio (formative) |
| 4 | Dimensional modeling: star schemas, grain, facts and dimensions | CLO 4 | M2 Proposal Final (15%) — schema design + ETL plan + feedback-closure memo; Wk 4 peer pulse (1%) |
| 5 | Source-data reality: 3NF (Sakila) vs personal-data exports (real-world mess) | CLO 2, 4, 5 | Analysis Draft — Studio progress check-in (formative) |
| 6 | ETL: 3NF → DuckDB warehouse + provider-group GitHub schema negotiation | CLO 5, 6 | Peer review of draft data product (participation) |
| 7 | Validation, documentation, AI Attribution Log wrap-up | CLO 2, 7 | Revision + feedback-closure note (feeds Wk 8 final) |
| 8 | NoSQL awareness + Final Deliverable + Individual Oral Defense | CLO 7, 8 | Final (17%) + Oral (20%) + Wk 8 peer eval (4%) |
Team Project: Stakeholder-Driven Analytical Data Product (Team of max 3, Weeks 1–8)
Learner teams pick a stakeholder + dataset pairing from a curated list or propose their own (subject to instructor approval). The pairing defines a real analytical question the team will answer with a designed, built, and validated data product.
Curated dataset pairings (BigQuery public datasets — no download required):
- NYC Taxi Trips + city transportation planner
- Austin 311 Service Requests + city operations director
- Stack Overflow survey + engineering hiring manager
- USA Names (SSA) + school district demographer
- Iowa Liquor Sales + regional retail buyer
- Google Analytics sample + e-commerce growth analyst
Weeks 5–6 source data: Learners also work with their own personal-data exports (Spotify, YouTube, Strava) as a real-world messy source — uniquely AI-resistant because the data is individual. Provider-groups (learners with the same export source) collaborate via a shared GitHub repo to negotiate a shared schema.
3NF reference dataset (Weeks 5–6): Sakila via the Pagila Postgres port — a clean textbook 3NF schema used as a contrast to the real-world mess of personal-data exports.
Project Milestones
| Milestone | Week | Weight | Deliverables |
|---|---|---|---|
| M1 Launch | 1 | Formative | Team registration, dataset + stakeholder pick, 1-page scoping notes |
| M1 Initial Pitch | 2 | 5% | 5-min pitch in Live Session: stakeholder, 3–5 questions, dataset rationale |
| Proposal Draft | 3 | Formative | Draft proposal, peer-reviewed in Studio |
| M2 Proposal Final | 4 | 15% | Final proposal: stakeholder, questions, dimensional schema design, ETL plan, AI Attribution checkpoint, feedback-closure memo |
| Analysis Draft | 5 | Formative | Studio progress check-in |
| Peer Review | 6 | Participation | Submit draft data product for structured peer review |
| Revision | 7 | Formative | Revised draft + feedback-closure note |
| Final Deliverable | 8 | 17% | Working ETL + dimensional schema + 2–3 validated analyses + GitHub repo + README |
| Individual Oral Defense | 8 | 20% | 12–15 min team presentation + individual Q&A (AIAS 0, no AI) |
| Wk 8 Peer Evaluation | 8 | 4% | Summative anonymous teammate rating |
Final Deliverable (17% of grade) — key components
- DuckDB warehouse with fact + dimension tables, populated from source data
- ETL notebook (idempotent, documented, with validation checks)
- 2–3 analyses answering the stakeholder’s original questions
- Full AI Attribution Log + 1-paragraph reflection on how AI use evolved M1 → Final
- GitHub repo with all code, schema documentation, and setup guide
- Architecture diagram (data flow: source → transform → warehouse → analysis)
Rubric (6 dimensions):
| Dimension | Excellent (A) | Proficient (B) | Developing (C) |
|---|---|---|---|
| Schema Design | Star schema properly grained, dimensions clean, trade-offs justified | Mostly correct schema with minor issues | Grain problems or missing relationships |
| ETL Pipeline | Modular, idempotent, documented, robust validation checks | Good structure, handles happy path | Fragile pipeline, minimal error handling |
| Source-Data Handling | Personal-data and/or BigQuery source read correctly; provenance documented | Source read works, provenance partial | Source integration missing or broken |
| Documentation | Complete architecture diagram, clear setup guide, code comments | Adequate documentation | Incomplete or unclear |
| Business Understanding | Analyses answer the stakeholder question; decisions explained in business terms | Mentions business context | No business rationale |
| Learning Trajectory | Clear pivot or error recovery with explicit rationale — shows how thinking evolved M1 → Final | Some revision with partial explanation | Final product with no visible iteration |
Oral Defense (20-25% of grade)
Week 8 studio session. Each team presents their pipeline design, demonstrates the live system, and fields Q&A from instructor and peers.
Rubric (3 dimensions):
| Dimension | Excellent (A) | Proficient (B) | Developing (C) |
|---|---|---|---|
| Technical Explanation | Clear walkthrough of design decisions, trade-offs articulated | Adequate explanation of system | Unclear or surface-level description |
| Live Demo | System works end-to-end, handles follow-up queries confidently | Demo works with minor issues | Demo fails or cannot answer questions |
| Individual Contribution | Each member explains their role with depth | Members can describe their work | Uneven participation or vague answers |
Weekly Assignments (30-40% of grade)
Practice exercises, labs, discussions, and peer reviews that build skills for the project. Completed individually unless noted.
| Week | Assignment | Format |
|---|---|---|
| 1 | SQL fundamentals exercises + quiz | Practice problems + timed quiz |
| 2 | SQL query problem set + quiz | JOINs, subqueries, aggregations |
| 3 | ER diagram case study + peer review | Design exercise + review a classmate’s ERD |
| 4 | DataCamp assignment + pandas lab | DataCamp module + SQL→pandas translation exercises |
| 5 | ETL coding exercises + peer code review | Coding problems + review a classmate’s pipeline code |
| 6 | Data quality + API integration lab | Clean a messy real-world dataset + build an API connector |
| 7 | Cloud deployment lab + automation exercise | Deploy pipeline to GCP BigQuery + configure automated scheduling |
| 8 | Portfolio reflection | Written reflection on learning journey and skill growth |
AI Tools Integration
Where AI Accelerates Learning:
- Week 1 (SQL Fundamentals): Use Claude/ChatGPT to:
- Explain SQL errors (“Why does this JOIN return NULL?”)
- Generate sample SQL queries for practice
- Validate your schema design
- Suggest refactoring for clarity
- Week 4-5 (Python ETL): Use AI to:
- Debug pandas errors (“How do I reshape this DataFrame?”)
- Optimize code performance (“How do I vectorize this loop?”)
- Generate error handling patterns
- Suggest pandas functions for complex transformations
- Week 7-8 (Cloud & Optimization): Use AI to:
- Troubleshoot GCP BigQuery configuration issues
- Suggest index strategies for slow queries
- Review security practices
- Generate Cloud Functions code for automation
Three-Options Prompting Protocol (Weeks 4-8, AIAS Level 2-3):
For all milestone submissions (M4-M7) and the final deliverable, students must use the three-options format when using AI for design or architectural decisions:
- Prompt AI for three distinct options (e.g., “Give me three different approaches to structuring this ETL pipeline”)
- Summarize each option briefly
- Select one and justify the choice in 2-3 sentences based on your specific project context
Submission format:
AI Prompt: [paste exact prompt]
Option A: [1-sentence summary]
Option B: [1-sentence summary]
Option C: [1-sentence summary]
Chosen: [A/B/C]
Rationale: [why this option fits your context better than the others]
This replaces generic AI attribution (“I used Copilot”) with evidence of evaluative judgment. Options must differ on at least one meaningful dimension (e.g., performance vs. simplicity vs. maintainability). The three-options protocol does not apply to Weeks 1-3 (pre-AI / AIAS Level 0-1 phases).
Per-Milestone AI Attribution Log (program-standard format):
Aligned with design/project_milestone_template.md. Required at four checkpoints:
| Milestone | What to document |
|---|---|
| M2 (Wk 2) | ERD design decisions — what AI suggested, what you changed, why |
| M4 (Wk 4) | Extraction script — AI-suggested approaches accepted vs. rejected, with rationale |
| M6 (Wk 6) | ETL pipeline — which logic was AI-generated vs. human-specified; flag any AI output accepted without verification |
| Final (Wk 8) | Full Attribution Log covering all AI use across the project + 1-paragraph reflection: how did AI use evolve from M1 to Final? What would you do differently? |
Format for each entry:
Milestone: [M2 / M4 / M6 / Final]
AI Tool Used: [Claude / Copilot / ChatGPT / other]
Prompt (summarized): [what you asked]
Output Used: [what you kept]
Output Rejected: [what you discarded and why]
Human Decision: [what judgment you applied that AI could not]
Studio Session Topics:
- Week 1: ER diagram design walkthrough + SQL SELECT fundamentals
- Week 2: Advanced SQL patterns (window functions, CTEs, complex JOINs)
- Week 3: Database normalization decisions + denormalization trade-offs
- Week 4: SQL→Python translation — SELECT→filter, GROUP BY→groupby, JOIN→merge
- Week 5: Error handling patterns in Python + logging best practices
- Week 6: Data quality + API authentication — cleaning messy data, managing credentials safely
- Week 7: GCP BigQuery setup + pipeline automation — deploy, connect, schedule, troubleshoot
- Week 8: Final presentations + portfolio feedback
Assessment Summary
| Component | Weight Range | Notes |
|---|---|---|
| Weekly assignments (exercises, quizzes, labs) | 30–40% | Individual, skill-building |
| Project milestones (M1 Wk2, M2 Wk4 + formatives) | 20–30% | Team |
| Final project deliverable (Wk 8) | 15–20% | Team |
| Individual oral defense (Wk 8) | 20–25% | Individual, AIAS 0 |
| Peer evaluation | 5% | Wk 4 formative 1% + Wk 8 summative 4% |
| InScribe Live Session engagement | 3–5% | Weekly posts |
No traditional exam. All assessment is project-based, skills-based, and participation.
AI Usage Levels (AIAS)
| Assessment | AIAS Level | AI Permitted |
|---|---|---|
| Quizzes (Weeks 1, 2) | 0 | No AI |
| Weekly labs + exercises | 1 | AI for ideation and exploration only |
| Project milestones M1-M3 (schema/ERD) | 1 | AI for ideation and schema brainstorming only |
| Project milestones M4-M6 (ETL pipeline) | 2 | AI for code assistance and debugging, with attribution |
| Project milestone M7 + final deliverable (cloud deployment) | 3 | AI as full collaborator for GCP troubleshooting and code review, with attribution |
| Oral defense | 0 | No AI |
| Studio participation | 1 | AI for exploration during exercises |
Technology Stack
- Query targets: BigQuery public datasets (Wks 1–4, cloud notebook); local Postgres with Sakila/Pagila (Wks 5–6, 3NF reference); personal-data exports in DuckDB (Wks 5–6, real-world source)
- Analytical warehouse: DuckDB (local, embedded — fast setup, no server; learners build their star schema here)
- Python libraries: pandas, sqlalchemy, psycopg2, requests, duckdb
- Access patterns: Jupyter/Colab notebook (primary Wks 1–4); VS Code +
bqCLI + MCP agent (introduced Wk 3, load-bearing Wks 4–7) - MCP config: Course-provided pin-versioned bundle (BigQuery + Postgres + DuckDB + filesystem + GitHub MCP servers) — imported in Wk 3
- IDE: VS Code with GitHub Copilot; Google Colab (browser fallback)
- Version control: GitHub (individual repos Wks 1–5; provider-group shared repos Wk 6)
- Diagramming: Lucidchart or DrawIO for ERDs and architecture diagrams
Prerequisites & Assumptions
- No SQL experience required — this course starts from zero
- Python fundamentals required (loops, functions, data types) — covered in the pre-program Python orientation
- Comfortable installing software + troubleshooting (Jupyter setup guide provided in onboarding)
Preparatory Coursework Alignment (Staggered Model, 2026-05-18)
Per the program-wide staggered preparatory coursework model (see program/curriculum.md “Preparatory Coursework” and DECISIONS.md 2026-05-18), students arrive in BADM 554 with the following prep status:
| Module | Required by | Tier |
|---|---|---|
| GitHub + VS Code (DataCamp) | Before Day 1 (BADM 554 Week 1) | Gated — completion required |
| Gen AI literacy | End of Week 4 | Gated — runs in parallel with this course |
| Python & Tools prep (Coursera — specific course TBD) | Before BDI 513 (Week 5) | Self-diagnostic; students may still be working through during weeks 1-4 |
| Stats MOOCs 1 + 2 | Before FIN 550 (Week 9) | Self-diagnostic |
Implications for BADM 554:
- Students have GitHub + VS Code set up by Day 1 (course can assume the environment)
- Some students may still be building Python fluency through Week 4 (especially career pivoters); design weekly assignments accordingly
- Gen AI literacy may be in-progress during the first 4 weeks; calibrate AI-aware framing in early weeks
- Stats are not yet assumed; defer any statistics-dependent content to BDI 513 or later
Faculty office hours during Weeks 1-3 are recommended for students still finishing Python self-diagnostic prep — see Ron Guymon’s 2026-04-15 note on zero-coding-experience learners.
Pedagogical Notes for Faculty
Design suggestions grounded in program research — not requirements. Adapt to your course and teaching style. Full references in reference/articles/.
The scenic route (cognitive friction) Pre-AI phases aren’t punishment — they’re the scenic route. Schultz’s dopamine research shows the brain registers learning through prediction errors (the gap between expectation and outcome), not through frictionless delivery. When students design their ERD by hand before asking Copilot to generate CREATE TABLE statements, that struggle is the mechanism — not the obstacle. Consider where in each week students should formulate their own approach before consulting AI. The AIAS progression in this course (0→1→2→3 across weeks) already builds this in; the key is framing it for students as investment, not restriction. → Machulla (2026), Schultz et al. (1997)
The IKEA effect (completion matters) Students value what they build — but only when labor leads to a finished artifact. Building and then discarding produces no ownership effect. The M1→M7→Final Deliverable pipeline already does this well: each milestone closes a loop. The oral defense is the ultimate completion signal — the moment where effort converts to demonstrated competence. When possible, make each milestone feel like a working thing (a queryable schema, a running pipeline), not just a checkpoint document. → Norton, Mochon & Ariely (2012)
Prompt sophistication as skill progression Anthropic data shows r > 0.92 correlation between prompt sophistication and output quality, and multi-turn interactions succeed 67% of the time vs. 49% for single-turn. This course is most students’ first encounter with AI-assisted coding (AIAS 1→3). Consider explicitly teaching prompt patterns early — not as a sidebar, but as a core data foundations skill: “asking the right question of an AI is the same skill as writing the right SQL query.” → Means (2026, “How You Ask”)
Three AI iterations before human review For project milestones M4-M7 (ETL pipeline work at AIAS 2-3), consider requiring students to iterate with AI at least 3 times before submitting for human feedback. This builds the habit of treating AI as a drafting partner, not an answer machine, and produces richer AI Attribution Logs. → Means (2026, “Practice Gap”)
Attack your assessments Before the semester starts, have a confident AI user (TA, LD, or yourself) attempt each major assignment using current AI tools from a student’s perspective. Where can AI complete the task without genuine understanding? Those are the spots to add pre-AI phases or shift weight toward the oral defense. Repeat each semester — AI capabilities shift fast. → Furze (2026)
The cognitive offloading U-curve (Zone 2 is the danger zone) Learning outcomes follow a three-zone pattern: Zone 1 (no AI) = full cognitive load, slow but real learning; Zone 2 (scattered, unstructured AI use) = worse outcomes than no AI at all — coordination overhead without genuine cognitive reallocation; Zone 3 (committed, structured delegation) = superior outcomes through deliberate offloading of routine work while investing freed capacity in higher-order reasoning. The AIAS progression in this course (0→1→2→3) is designed to move learners through Zone 1 early, through Zone 2 quickly, and into Zone 3 by Weeks 5-8. The risk is AIAS Level 1 (“AI for ideation only”) becoming a Zone 2 trap if students engage with AI vaguely and inconsistently. Mitigate by giving Level 1 tasks a clear structure: “Before using AI, write your own answer. Then ask AI. Then compare and justify any changes.” That sequence is Zone 3 behavior, even at low AI permission levels. Separately: a 2026 study of 912 learners found that “partnership orientation” — treating AI as a thinking partner rather than an answer machine — independently predicted deeper learning regardless of how much AI was used. The mental model learners bring to AI matters as much as the permission level. Frame Week 1’s AI orientation around this: “Using AI well is the same skill as writing the right SQL query — the quality of your question determines the quality of the answer.” → Hardman (2026), “The Cognitive Offloading Paradox”; Wang & Zhang (2026)
AI as material for thinking, not a shortcut around cognition (design thinking checklist) Before finalizing any assignment or milestone, apply the three diagnostic questions from assessment_strategy.md §3.4: (1) What is the learning outcome this task targets? (2) What cognitive work must remain with the student to achieve that outcome? (3) What kind of AI assistance makes that cognitive work more visible, more rigorous, or more equitable — rather than replacing it? For BADM 554: the cognitive work that must stay with students is schema design judgment (Wks 1-3) and pipeline architecture decisions (Wks 4-7). AI assistance at those stages should make the student’s reasoning more explicit — not substitute for it. The per-milestone attribution log is the mechanism. → Vander (2026), “Design Thinking for AI-Integrated Pedagogy”
Faculty action required: Before T&L course design review, document your answers to the three diagnostic questions above in writing for this course. See assessment_strategy.md §3.4 for the submission format.
Recruiting Talking Points
Drafted by Vishal Sachdev. Adopted as the model for program-wide faculty talking points (Ravi Mehta directive, 2026-05-27). See DECISIONS.md “Faculty course talking points: three-question framework for recruiting.”
What does the course cover? BADM 554 is the first course of MSBAi: 8 weeks of database fluency for working professionals pivoting toward analyst work. Students learn SQL, schema design, and ETL on real enterprise data; no prior SQL required. Each week pairs a stakeholder conversation with a pre-AI / AI-mediated / post-AI workflow that treats AI as a thinking partner, not a shortcut.
What do students build? Teams of three adopt a stakeholder and a real dataset in Week 1, then spend 8 weeks building a portfolio GitHub repository: a dimensional warehouse, an ETL notebook ingesting from a live public data source, an AI Attribution Log, and a cloud-hosted version of the warehouse with a queryable URL. Each student defends the team’s work in an individual oral examination at Week 8.
Why does it matter for careers? SQL and schema judgment are baseline. What separates hireable analysts is talking to a non-technical stakeholder, working with AI without being replaced by it, and shipping a defensible artifact another person can audit. BADM 554 builds those three skills explicitly.
| Course Sequence: ← Previous: (first course) | Next: BDI 513 — Data Storytelling → |