BADM 576 - Data Science and Analytics (ML II)
Program-level details: See program/curriculum.md
Status: Draft. Initial outline pending instructor review. Proposed MSBAi name: Data Science & Machine Learning (pending formal rename approval)
| Credits: 4 | Term: Fall 2027 (Weeks 1-8) | Instructor: Zilong |
Course Vision
Building on supervised learning foundations from FIN 550 (ML I), students master advanced ML techniques and the full deployment lifecycle. This course covers advanced ensembles, unsupervised learning, NLP/text analytics, time series, neural networks, and model deployment with MLOps and LLMOps. By course end, students can build, deploy, and monitor production ML systems.
Learning Outcomes (L-C-E Framework)
Literacy (Foundational Awareness)
- L1: Understand advanced ML paradigms (unsupervised learning, deep learning, NLP) and when each applies
- L2: Explain the ML deployment lifecycle (training, serving, monitoring, retraining)
- L3: Recognize ethical issues in ML deployment (bias, fairness, transparency, model drift)
Competency (Applied Skills)
- C1: Apply advanced ensemble methods and regularization techniques to improve model performance
- C2: Implement advanced unsupervised learning (K-means, DBSCAN, hierarchical clustering, PCA, t-SNE) beyond introductory segmentation
- C3: Build NLP/text analytics pipelines (TF-IDF, embeddings, text classification)
- C4: Deploy and monitor ML models in production with MLOps practices
Expertise (Advanced Application)
- E1: Build end-to-end ML systems from raw data to production deployment with monitoring
- E2: Integrate LLMOps practices (agentic AI deployment, model evaluation, prompt management) alongside traditional MLOps
- E3: Evaluate models for fairness, bias, and ethical deployment with comprehensive model documentation
Week-by-Week Breakdown
| Week | Topic | Lectures | Project Work | Studio Session | Assessment |
|---|---|---|---|---|---|
| 1 | Advanced ensembles + regularization | 2 videos | Team formation + problem scoping | Regularization deep-dive - Ridge, Lasso, ElasticNet, ensemble tuning | Weekly assignment 1 |
| 2 | Unsupervised learning: clustering + dimensionality reduction | 3 videos | EDA + initial segmentation | Clustering workshop - K-means, DBSCAN, hierarchical, PCA, t-SNE | Weekly assignment 2 + Milestone M1 |
| 3 | NLP/text analytics | 2 videos | Text analytics for project | NLP with scikit-learn - TF-IDF, word embeddings, text classification | Weekly assignment 3 |
| 4 | Time series analysis | 3 videos | Forecasting component | Time series workshop - ARIMA, Prophet, evaluation metrics | Weekly assignment 4 + Milestone M2 |
| 5 | Neural networks intro | 2 videos | Neural net model for project | Neural networks - architectures, Keras/TensorFlow basics | Weekly assignment 5 |
| 6 | Deep learning applications | 2 videos | System architecture + API design | Deep learning - CNNs for tabular data, transfer learning | Weekly assignment 6 + Milestone M3 |
| 7 | Model deployment + MLOps + LLMOps | 2 videos | Deploy model + monitoring | ML in production - Docker, APIs, monitoring, agentic AI deployment | Final deliverable work |
| 8 | Ethics, synthesis, portfolio showcase | 1 video | Final deliverable + reflection | Ethics in ML - bias, fairness, model cards + final presentations | Final deliverable + team oral defense |
Team Project: Production ML System (Team of 3)
One major team project runs across all 8 weeks. Teams build a production-ready ML system that incorporates advanced analytics, forecasting, deep learning, and full deployment with MLOps/LLMOps practices.
Problem Options:
- Customer intelligence platform (segmentation + churn prediction + demand forecasting)
- Financial analytics system (market analysis + sentiment + price forecasting)
- Operations intelligence platform (supply chain optimization + anomaly detection + demand forecasting)
- Student choice (approved)
Weekly Assignments (Weeks 1-6, Individual)
Hands-on exercises that build technical skills feeding into the team project:
| Week | Assignment | Focus |
|---|---|---|
| 1 | Ensemble methods lab | Ridge, Lasso, ElasticNet comparison; advanced ensemble tuning |
| 2 | Clustering + dimensionality reduction | K-means, DBSCAN, hierarchical clustering; PCA, t-SNE visualization |
| 3 | Text analytics pipeline | TF-IDF, embeddings, text classification with scikit-learn |
| 4 | Time series forecasting | ARIMA, Prophet models; MAE, MAPE, RMSE evaluation |
| 5 | Neural network fundamentals | Architecture design, Keras/TensorFlow basics, training documentation |
| 6 | Deep learning application | CNNs for tabular data, transfer learning, comparison with traditional methods |
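As a preview of the Assignment 1 comparison, here is a minimal sketch of Ridge vs. Lasso vs. ElasticNet in scikit-learn. The synthetic dataset and the `alpha` values are illustrative only, not assignment parameters; the point is that the L1 penalty drives irrelevant coefficients exactly to zero while L2 only shrinks them:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, Lasso, ElasticNet
from sklearn.model_selection import cross_val_score

# Synthetic data: 20 features but only 5 informative, so L1-style
# penalties have irrelevant coefficients to shrink all the way to zero.
X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=10.0, random_state=42)

models = {
    "Ridge (L2)": Ridge(alpha=1.0),
    "Lasso (L1)": Lasso(alpha=1.0),
    "ElasticNet (L1+L2)": ElasticNet(alpha=1.0, l1_ratio=0.5),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    model.fit(X, y)
    n_zero = int(np.sum(np.isclose(model.coef_, 0.0)))
    print(f"{name}: mean CV R^2 = {scores.mean():.3f}, zeroed coefficients = {n_zero}")
```

Comparing the `zeroed coefficients` count across the three models is the core of the "regularization deep-dive" in the Week 1 studio: Lasso performs feature selection, Ridge does not, and ElasticNet interpolates between them via `l1_ratio`.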
Rubric per assignment (3 dimensions):
| Dimension | Excellent (A) | Proficient (B) | Developing (C) |
|---|---|---|---|
| Technical Execution | Correct implementation, well-tuned parameters | Functional code, reasonable choices | Incomplete or poorly tuned |
| Interpretation | Clear business insights from results | Adequate explanation | Minimal interpretation |
| Code Quality | Clean, commented, reproducible | Readable code | Disorganized or undocumented |
Project Milestones (Progressive, Team)
Milestones build progressively toward the final deployed system:
| Milestone | Due | Deliverable |
|---|---|---|
| M1: Problem Scoping + Data | End of Week 2 | Problem definition, dataset selection, EDA, initial clustering/segmentation analysis, team charter |
| M2: Model Development | End of Week 4 | Trained models (ensemble + time series + baseline neural net), evaluation metrics, model comparison |
| M3: System Architecture | End of Week 6 | System design document, API specification, deployment plan, LLMOps integration plan |
Rubric per milestone (3 dimensions):
| Dimension | Excellent (A) | Proficient (B) | Developing (C) |
|---|---|---|---|
| Progress | On track, all deliverables complete | Most deliverables complete | Behind schedule or incomplete |
| Technical Depth | Rigorous analysis, justified decisions | Sound approach | Superficial or unjustified |
| Team Collaboration | Clear task division, all members contributing | Adequate collaboration | Uneven contributions |
Final Project Deliverable (Week 7-8, Team)
Deliverables:
- Model Documentation:
  - Model card (what it does, performance, limitations, ethical considerations)
  - Data sheet (dataset provenance, bias analysis)
  - System design document (inputs, outputs, dependencies)
- Deployment:
  - Containerized model (Docker)
  - REST API (Flask/FastAPI)
  - Cloud deployment (AWS Lambda, Heroku, or similar)
- Monitoring + Operations:
  - Unit tests + integration tests
  - Performance monitoring (track model accuracy over time)
  - Data drift detection (alert if the input distribution changes)
  - Retraining strategy (how often, and on what triggers, to retrain)
- LLMOps Component:
  - Integration of agentic AI concepts (how LLM-based tools fit into the ML system)
  - Prompt management and versioning strategy (if applicable)
- Fairness + Ethics Analysis:
  - Evaluate the model for bias (across demographic groups if applicable)
  - Document limitations + intended use cases
  - Identify risks + mitigation strategies
- Peer evaluation of team contributions
- GitHub repo with all code + tests + Dockerfile + documentation
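One possible shape for the REST API deliverable, sketched here with Flask (either Flask or FastAPI is acceptable). The `predict_churn` scoring rule is a hypothetical stand-in for a trained model that a real system would load at startup (e.g. via `joblib.load`); the feature names and threshold are invented for illustration:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def predict_churn(features):
    # Hypothetical stand-in for a trained model; a real deployment
    # would call model.predict_proba on a validated feature vector.
    score = 0.3 * features["tenure_years"] + 0.7 * features["monthly_usage"]
    return {"churn_risk": "high" if score > 5 else "low", "score": round(score, 2)}

@app.route("/predict", methods=["POST"])
def predict():
    # Accept a JSON feature payload and return the model's scoring as JSON.
    payload = request.get_json()
    return jsonify(predict_churn(payload))

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)  # in production: gunicorn inside Docker
```

A sketch like this is also a natural seam for the testing deliverable: Flask's built-in `test_client()` lets the unit tests exercise the endpoint without starting a server or a container.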
Rubric (5 dimensions):
| Dimension | Excellent (A) | Proficient (B) | Developing (C) |
|---|---|---|---|
| Deployment | Production-ready, containerized, accessible | Works on cloud | Local only |
| MLOps/LLMOps | Comprehensive monitoring, drift detection, LLM integration | Basic tracking | No monitoring |
| Model Quality | Multiple well-tuned models with rigorous comparison | Functional models | Single or poorly tuned model |
| Fairness Analysis | Thoughtful bias evaluation + mitigation | Addresses fairness | Ignores fairness |
| Documentation | Model card + system design complete | Adequate docs | Minimal documentation |
Oral Defense (Week 8, Team)
Teams present their deployed system, demonstrate the live API, walk through the model card, and answer questions on design decisions, fairness analysis, and deployment trade-offs.
Rubric (3 dimensions):
| Dimension | Excellent (A) | Proficient (B) | Developing (C) |
|---|---|---|---|
| Technical Depth | Clear explanation of architecture, model choices, and trade-offs | Adequate explanation | Superficial or confused |
| Live Demo | Confident demo of deployed system, handles edge cases | System works but limited demo | Demo fails or only screenshots |
| Q&A | Handles questions confidently, demonstrates deep understanding | Answers most questions | Unable to answer or deflects |
AI Tools Integration
Weeks 1-3 (Weekly Assignments + Project M1):
- Use Claude/ChatGPT to:
  - Explain regularization trade-offs
  - Debug clustering and NLP pipeline issues
  - Suggest dimensionality reduction approaches
  - Generate feature engineering code
Weeks 4-6 (Weekly Assignments + Project M2-M3):
- Use AI to:
  - Explain ARIMA parameter selection
  - Debug neural network training issues
  - Suggest architecture choices
  - Generate evaluation code
Weeks 7-8 (Final Deliverable + Deployment):
- Use AI to:
  - Write Docker/API code
  - Generate monitoring and drift detection code
  - Create model cards and documentation
  - Review design for production readiness
Studio Session Topics:
- Week 1: Regularization + advanced ensembles
- Week 2: Clustering + dimensionality reduction visualization
- Week 3: NLP pipelines + text feature engineering
- Week 4: Time series decomposition + ARIMA
- Week 5: Neural network training + debugging
- Week 6: Deep learning applications + transfer learning
- Week 7: Model deployment + containerization + LLMOps
- Week 8: ML ethics + fairness + team presentations
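To illustrate the Week 3 studio pipeline (TF-IDF features feeding a text classifier), here is a minimal scikit-learn sketch. The six-document corpus and its positive/negative labels are invented for illustration; the studio uses a real labeled dataset:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Tiny hypothetical customer-feedback corpus: 1 = positive, 0 = negative.
texts = [
    "great product fast shipping",
    "love the quality excellent service",
    "terrible support very disappointed",
    "awful experience never again",
    "excellent value highly recommend",
    "poor quality waste of money",
]
labels = [1, 1, 0, 0, 1, 0]

# Pipeline keeps vectorization and classification as one fit/predict object,
# which is what makes the model straightforward to serialize and deploy later.
pipe = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    ("clf", LogisticRegression()),
])
pipe.fit(texts, labels)

print(pipe.predict(["excellent product great quality"]))
```

Wrapping the vectorizer and classifier in a single `Pipeline` is deliberate: it prevents train/serve skew, because the exact fitted TF-IDF vocabulary travels with the model into Week 7's deployment work.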
Assessment Summary
| Component | Weight | Notes |
|---|---|---|
| Weekly assignments | 30% | Weeks 1-6, individual |
| Project milestones | 25% | M1 (Wk 2), M2 (Wk 4), M3 (Wk 6), team |
| Final project deliverable | 20% | Weeks 7-8, team |
| Oral defense | 20% | Week 8, team |
| Studio participation | 5% | Weekly attendance + peer feedback |
No traditional exam. One major team project with weekly individual skill-building assignments.
AI Usage Levels (AIAS)
| Assessment | AIAS Level | AI Permitted |
|---|---|---|
| Weekly Assignments | 2 | AI for debugging, parameter guidance, code explanation — with attribution |
| Project Milestones | 2 | AI for EDA, model selection guidance, architecture suggestions — with attribution |
| Final Project Deliverable | 3 | AI as collaborator for Docker/API code, model cards, monitoring scripts — with full disclosure |
| Oral Defense | 0 | No AI |
| Studio Participation | 1 | AI for exploration during exercises |
Technology Stack
- ML Libraries: scikit-learn, XGBoost, LightGBM, Keras/TensorFlow
- Data: pandas, NumPy, feature-engine
- Text: scikit-learn (TF-IDF), Gensim, spaCy (optional)
- Time Series: statsmodels, Prophet
- Deep Learning: Keras, TensorFlow
- Deployment: Docker, Flask/FastAPI, AWS Lambda
- Monitoring: Evidently (for ML monitoring), custom scripts
- Testing: pytest, unittest
- IDE: VS Code with GitHub Copilot; Google Colab (browser alternative)
- Notebooks: Jupyter Notebooks (via Colab or VS Code)
- Version Control: GitHub
Prerequisites
- Completion of FIN 550 (supervised ML foundations) + BADM 558 (cloud infrastructure)
- Comfortable with Python programming + SQL
Bridge Module: ML Refresher (Pre-Course, ~3 hours)
Complete before Week 1. Available in Canvas at the start of Fall 2027. There is an approximately 8-month gap between ML I (FIN 550, Fall 2026) and ML II (this course). This module helps students rebuild fluency before diving into advanced topics.
| Unit | Topics | Format | Self-Check |
|---|---|---|---|
| 1. Supervised Learning Review (1 hr) | Train/test splits, cross-validation, overfitting/underfitting, bias-variance tradeoff | Narrated Jupyter notebook walkthrough | Quiz: identify overfitting in a learning curve, explain train/test split |
| 2. Model Evaluation Refresher (1 hr) | Accuracy, precision, recall, F1, ROC/AUC, confusion matrix, regression metrics (MAE, RMSE, R²) | Interactive Jupyter exercises with pre-built models | Quiz: interpret a confusion matrix, choose the right metric for a scenario |
| 3. Core Algorithms Quick Review (1 hr) | Linear/logistic regression, decision trees, random forest, gradient boosting — when to use each | Cheat sheet + short exercises comparing model outputs | Quiz: given a problem description, recommend an algorithm and justify |
Readiness check: Students who pass all 3 self-check quizzes (70% threshold) are ready for Week 1. This module is strongly recommended for all students, not just those who feel rusty.
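As a taste of Units 1-2 in miniature, the following sketch runs the hold-out workflow end to end on scikit-learn's built-in breast cancer dataset (chosen here purely for illustration; the actual bridge notebooks may use different data). Scaling is included because logistic regression converges poorly on unscaled features:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hold out a test split the model never sees during training, then read
# the four confusion-matrix cells and F1 off the held-out predictions.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
print(f"TP={tp} FP={fp} FN={fn} TN={tn}  F1={f1_score(y_test, y_pred):.3f}")
```

Students who can explain why the split is stratified, what each confusion-matrix cell means, and when F1 beats plain accuracy have covered the core of Units 1 and 2.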
| Course Sequence: ← BADM 557 — Business Intelligence | Next: Capstone → |