Contents
- The "Hidden Technical Debt" Problem in ML
- What is MLOps? The Three-Way Intersection
- The 5 Pillars of MLOps Every Data Scientist Needs
- Pillar 1: Version Control for Data and Models — DVC and Git
- Pillar 2: CI/CD for ML — Automating Pipelines and Deployment
- Pillar 3: Model Monitoring — Detecting Data Drift Before It Kills Your Business
- Pillar 4: Feature Stores — Centralising the Intelligence Layer
- Pillar 5: Scalability — From Jupyter Notebook to Cloud Production
- Traditional DevOps vs. MLOps: Understanding the Difference
- MLOps Data Scientists Need to Know 2026: The Career Impact
- The Salary Trajectory Is Unambiguous
- The Path to Lead and Architect Roles
- The Mumbai-Specific Context
- Building Your MLOps Skill Stack: The 90-Day Plan
- Deploy Your First Model: Your Next Step
Here is a number that should change how you think about your career: 80% of machine learning models that are built never reach production.
Not because the models are bad. Not because the business problem was wrong. Because the infrastructure, processes, and operational discipline to take a model from a Jupyter notebook to a live, monitored, production system were never built. The model works perfectly on the training data. It works on the test data. Then it sits on a data scientist's laptop — or at best, in a Git repository — while the business waits for value that never arrives.
This is the silent killer of AI projects. And knowing how to prevent it — knowing mlops data scientists need to know 2026 — is what separates the data scientists who are offered Lead and Architect roles at Mumbai's top BFSI and FinTech firms from the ones stuck at mid-level for three consecutive performance cycles.
The "Hidden Technical Debt" Problem in ML
In 2015, Google published a research paper titled "Machine Learning: The High-Interest Credit Card of Technical Debt." Its central argument: building an ML model is the easy part. The infrastructure surrounding that model — data pipelines, serving systems, monitoring, retraining workflows, dependency management — is often 90% of the total engineering effort, and it is the part that is almost never planned for during the initial model development phase.
This "hidden technical debt" manifests in ways that are familiar to anyone who has worked in a real data science team:
- The notebook handoff problem: A data scientist builds an excellent model in a Jupyter notebook and "hands it over to engineering to deploy." Engineering receives 800 lines of exploratory notebook code, undocumented dependencies, hardcoded file paths, and no reproducibility guarantee. Months pass.
- The retraining gap: A credit risk model trained in January 2026 starts making worse predictions in September 2026 — not because it was poorly built, but because customer behaviour shifted after a macroeconomic event and no one built a system to detect this. The business continues relying on a model that is silently degrading.
- The experiment reproducibility failure: The model that was deployed three months ago produced result X. Someone asks why. Nobody can reproduce it — the data has been updated, the package versions are not recorded, and the exact training code has been overwritten.
- The feature inconsistency bug: The model was trained using features calculated one way. The production system calculates them slightly differently. The model performs worse in production than in development, and tracking down the discrepancy takes weeks.
Every one of these failures is preventable. MLOps is the discipline of preventing them.
What is MLOps? The Three-Way Intersection
MLOps is a portmanteau of Machine Learning and DevOps — but its full definition requires a third domain: Data Engineering.

Machine Learning contributes the core goal: building models that learn from data and generate predictions of business value.
DevOps contributes the operational discipline: version control, automated testing, continuous integration and deployment, infrastructure as code, and monitoring that catches failures before they affect users.
Data Engineering contributes the data foundation: reliable pipelines that deliver clean, consistent, versioned data to models in both training and production environments.
MLOps sits at the intersection of all three. It is the practice of applying software engineering rigour to the full lifecycle of a machine learning system — from data collection through model training, deployment, monitoring, and retraining — so that models deliver value reliably, can be updated safely, and can be debugged when they fail.
Concretely, MLOps answers questions like:
- How do we ensure that the model we deploy is exactly the model we tested?
- How do we know when a deployed model's performance is degrading and needs retraining?
- How do we retrain a model on new data without breaking the production system?
- How do we collaborate as a team on model development without overwriting each other's work?
- How do we run the same model training pipeline on a local machine, a staging server, and production cloud infrastructure without code changes?
The 5 Pillars of MLOps Every Data Scientist Needs
Pillar 1: Version Control for Data and Models — DVC and Git
Software engineers version control their code. Data scientists need to version control three things: their code, their data, and their models — and these three artefacts must be linkable, so you can always reconstruct exactly which version of the code, trained on which version of the data, produced which model.
Git handles code versioning — this is familiar. DVC (Data Version Control) extends Git's principles to large data files and model artefacts that cannot be stored in Git repositories.
# Initialise DVC in your project
dvc init
# Track a large dataset
dvc add data/credit_applications_jan2026.csv
# The .dvc file is committed to Git; the actual data goes to remote storage (S3, GCS, etc.)
git add data/credit_applications_jan2026.csv.dvc .gitignore
git commit -m "Add January 2026 credit application training data"
# Track the trained model
dvc add models/credit_risk_v2.pkl
git add models/credit_risk_v2.pkl.dvc
git commit -m "Train credit risk model v2 on Jan 2026 data"
With this setup, any team member can check out any historical commit and reproduce the exact experiment — same code, same data, same model — months or years later. The notebook handoff problem disappears because everything is reproducible and documented.
Model Registry: Tools like MLflow Model Registry or Weights & Biases extend this further — providing a centralised catalogue of all trained models, their performance metrics, their training parameters, and their deployment status (staging, production, archived).
import mlflow
import mlflow.sklearn
from sklearn.ensemble import GradientBoostingClassifier
# Log an experiment run
with mlflow.start_run():
model = GradientBoostingClassifier(n_estimators=200, max_depth=5)
model.fit(X_train, y_train)
mlflow.log_param("n_estimators", 200)
mlflow.log_param("max_depth", 5)
mlflow.log_metric("auc_roc", roc_auc_score(y_test, model.predict_proba(X_test)[:,1]))
mlflow.log_metric("precision", precision_score(y_test, model.predict(X_test)))
mlflow.sklearn.log_model(model, "credit_risk_model")
Every experiment is tracked. You can compare runs, roll back to a previous version, and promote the best model to production with a single command.
Pillar 2: CI/CD for ML — Automating Pipelines and Deployment
In software engineering, CI/CD (Continuous Integration / Continuous Deployment) means that every code change is automatically tested and, if it passes, deployed. The same principle applies to ML — but with additional complexity because you are not just testing code, you are testing the behaviour of a model trained on data.
What a CI/CD pipeline for ML automates:
- Data validation: Automatically check that incoming training data conforms to expected schema, value ranges, and statistical distributions before training begins. A schema violation that would cause a silent model failure is caught and flagged immediately.
- Model training: Trigger a model retrain when new data arrives or when a schedule is met — without manual intervention
- Model evaluation: Automatically compare the new model's performance against the current production model on a held-out evaluation set. Block deployment if performance regresses.
- Deployment: If the new model passes evaluation, automatically package it, update the serving infrastructure, and route traffic to it
Tools in Mumbai's production AI stack:
- GitHub Actions or GitLab CI for pipeline orchestration
- Apache Airflow or Prefect for scheduling data and training pipelines
- Docker for packaging the model and its dependencies into a reproducible container
- Kubernetes for orchestrating those containers at scale
# .github/workflows/ml_pipeline.yml — simplified example
name: ML Training Pipeline
on:
schedule:
- cron: '0 2 * * 1' # Run every Monday at 2 AM
push:
paths:
- 'src/train.py'
- 'src/features.py'
jobs:
train-and-evaluate:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Pull latest training data
run: dvc pull data/training_data.csv.dvc
- name: Run data validation
run: python src/validate_data.py
- name: Train model
run: python src/train.py --output models/new_model.pkl
- name: Evaluate against production baseline
run: python src/evaluate.py --new models/new_model.pkl --baseline models/production_model.pkl
- name: Deploy if improved
run: python src/deploy.py --model models/new_model.pkl
if: ${{ steps.evaluate.outputs.improved == 'true' }}
This pipeline runs automatically, validates data quality, trains the model, evaluates it against the production baseline, and deploys it only if performance improves — all without a data scientist manually running anything.
Pillar 3: Model Monitoring — Detecting Data Drift Before It Kills Your Business
Deploying a model is not the end of the work. It is the beginning of a different kind of work: ensuring the model continues to perform as expected in a changing real world.
Data drift is what happens when the statistical distribution of the inputs your model receives in production diverges from the distribution it was trained on. This is not a failure of the model — it worked correctly for its training data. It is a failure to recognise that the world changes.
A credit risk model trained on customer behaviour from 2023–2024 will encounter different patterns in 2026 — post-interest-rate changes, post-economic shifts, post-new-product-launches. A fraud detection model trained before a new type of UPI fraud pattern emerged will miss that new pattern entirely. A churn prediction model for a telecom will degrade when the company launches a new pricing structure that changes the features that previously indicated churn risk.
The two types of drift to monitor:
Feature drift (covariate shift): The distribution of input features changes. The average loan amount being submitted to your credit model has shifted from ₹2.5L to ₹8.5L because the bank launched a new product tier. The model was not trained on this range.
Concept drift (label drift): The relationship between features and the outcome you are predicting changes. Pre-COVID, certain spending patterns predicted high credit risk. Post-COVID, those same patterns might be associated with normal, recovered behaviour.
Implementing drift detection in Python:
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset, TargetDriftPreset
import pandas as pd
# Reference data = what the model was trained on
reference_data = pd.read_parquet("data/training_data_2025.parquet")
# Current data = what is arriving in production today
current_data = pd.read_parquet("data/production_data_march2026.parquet")
# Generate drift report
report = Report(metrics=[DataDriftPreset(), TargetDriftPreset()])
report.run(reference_data=reference_data, current_data=current_data)
report.save_html("drift_report_march2026.html")
# Programmatically check if drift exceeds threshold
report_dict = report.as_dict()
drift_detected = report_dict["metrics"][0]["result"]["dataset_drift"]
if drift_detected:
# Trigger alert and retraining pipeline
send_slack_alert("⚠️ Data drift detected in credit risk model. Initiating retraining.")
trigger_retraining_pipeline()
Tools used in Mumbai's production environment: Evidently AI, WhyLabs, Arize AI, and for large-scale deployments, custom monitoring built on top of cloud-native observability stacks (AWS CloudWatch, Azure Monitor, Google Cloud Monitoring).
Pillar 4: Feature Stores — Centralising the Intelligence Layer
A feature store is a centralised repository for the computed features that ML models depend on. It solves one of the most insidious problems in team-based data science: the same feature — say, "customer's average transaction value over the last 30 days" — is defined and computed differently by three different data scientists working on three different models. The models appear to work correctly in isolation but behave unpredictably when their outputs are combined.
The feature store ensures that every model that needs a given feature retrieves it from a single, authoritative, versioned source — eliminating inconsistency between training and serving (the "training-serving skew" problem) and enabling feature reuse across the team.
The training-serving skew problem concretely:
# Training time: calculate feature using pandas on historical data
train_data['avg_txn_30d'] = (
train_data.groupby('customer_id')['txn_amount']
.transform(lambda x: x.rolling(30).mean())
)
# Serving time: calculate same feature using SQL in the production database
# (subtle difference: 'rolling 30' in pandas uses row count; SQL uses calendar days)
# The feature values differ slightly — the model behaves differently in production vs. dev
A feature store eliminates this problem by making the feature computation logic a first-class artefact — defined once, tested once, and used consistently in both training and serving.
Popular feature stores: Feast (open source), Tecton, Databricks Feature Store, Hopsworks. AWS SageMaker and Vertex AI both include managed feature stores.
Pillar 5: Scalability — From Jupyter Notebook to Cloud Production
The final pillar is the one that most visibly separates a data scientist who "knows MLOps" from one who only knows modelling: the ability to move a model from a local development environment to a scalable cloud production system.
The containerisation layer — Docker:
Docker packages a model, its Python environment, all dependencies, and the serving code into a single, portable container that runs identically on your laptop, a colleague's machine, a staging server, and a cloud production cluster.
# Dockerfile for a FastAPI model serving endpoint
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY models/credit_risk_v3.pkl models/
COPY src/serve.py .
EXPOSE 8080
CMD ["uvicorn", "serve:app", "--host", "0.0.0.0", "--port", "8080"]
# src/serve.py — FastAPI model serving endpoint
from fastapi import FastAPI
from pydantic import BaseModel
import pickle
import numpy as np
app = FastAPI()
with open("models/credit_risk_v3.pkl", "rb") as f:
model = pickle.load(f)
class LoanApplication(BaseModel):
annual_income: float
loan_amount: float
employment_years: float
existing_debt_ratio: float
credit_history_months: int
@app.post("/predict")
def predict_default_risk(application: LoanApplication):
features = np.array([[
application.annual_income,
application.loan_amount,
application.employment_years,
application.existing_debt_ratio,
application.credit_history_months
]])
probability = model.predict_proba(features)[0][1]
return {
"default_probability": round(probability, 4),
"risk_tier": "HIGH" if probability > 0.35 else "MEDIUM" if probability > 0.15 else "LOW"
}
The orchestration layer — Kubernetes:
When your model needs to serve thousands of requests per second (as a fraud detection model at Razorpay or NPCI does), a single Docker container is not enough. Kubernetes orchestrates multiple containers, handles load balancing, auto-scales capacity up during peak periods, and maintains availability if individual containers fail.
Cloud ML platforms for Mumbai's market:
- AWS SageMaker — the most commonly used managed ML platform in Mumbai's BFSI sector. Handles training, hosting, monitoring, and auto-scaling with minimal infrastructure management.
- Azure Machine Learning — dominant in enterprises using the Microsoft stack (HDFC and Axis use Azure heavily).
- Google Vertex AI — increasingly used at data-intensive e-commerce and analytics firms.
Traditional DevOps vs. MLOps: Understanding the Difference
| Dimension | Traditional DevOps | MLOps |
|---|---|---|
| Primary artefact | Application code | Code + Data + Model (three artefacts) |
| Version control | Git for code | Git for code + DVC for data/models |
| Testing | Unit tests, integration tests | Data validation, model performance tests, bias tests |
| Build trigger | Code commit | Code commit OR new data OR scheduled retraining |
| Deployment criteria | Tests pass | Tests pass AND model outperforms production baseline |
| Monitoring | Uptime, latency, error rate | Uptime + feature drift + prediction drift + model accuracy |
| Failure mode | Application crashes | Model silently degrades (no crash, wrong predictions) |
| Rollback strategy | Deploy previous code version | Roll back to previous model version + previous data pipeline |
| Key challenge | System reliability | Model performance in a changing world |
| Team composition | Dev + Ops | Data Scientist + ML Engineer + Data Engineer + DevOps |
The critical insight in this table is the failure mode row. Traditional software fails loudly — the application crashes, the API returns a 500 error, users cannot access the system. ML failures are silent — the system keeps running, keeps returning predictions, but those predictions are quietly becoming worse. No alarm sounds. No error log is generated. Business decisions are being made on increasingly unreliable model outputs, and the only signal is a slow degradation in business metrics that takes weeks to trace back to the model.
This silent failure mode is why model monitoring — Pillar 3 — is not optional. It is the entire point.
MLOps Data Scientists Need to Know 2026: The Career Impact
The Salary Trajectory Is Unambiguous
Mumbai's 2026 data science compensation data tells a clear story. Data scientists who understand MLOps — who can not only build models but deploy, monitor, and maintain them in production — command a salary premium that consistently runs 30–50% above the baseline for their experience level.
The mechanic is straightforward: a data scientist who can take a project from exploration to production independently has replaced what would otherwise be a two or three-person handoff between a data scientist, an ML engineer, and a DevOps engineer. That breadth of capability is scarce, is extremely valuable to organisations trying to move faster, and is priced accordingly.
Salary comparison — Mumbai, mid-level (4–6 years):
| Profile | Salary Range |
|---|---|
| Data Scientist, no MLOps skills | ₹18L–₹24L |
| Data Scientist with MLOps skills (Docker, SageMaker, MLflow) | ₹26L–₹36L |
| ML Engineer / MLOps Specialist | ₹30L–₹45L |
| Senior ML Engineer / ML Platform Lead | ₹45L–₹65L |
The Path to Lead and Architect Roles
In Mumbai's BFSI and FinTech environment, the promotion from Senior Data Scientist to Lead or Principal requires demonstrating more than modelling skill. It requires the ability to:
- Design the ML system architecture — not just which model to use, but how data flows from source to feature to model to prediction to monitoring, and how that pipeline scales
- Own production reliability — take accountability for a model's live performance, not just its development-time accuracy
- Enable the team — build the MLOps infrastructure that makes every data scientist on the team more productive and their models more reliable
None of these capabilities are demonstrated by notebook work. All of them are demonstrated by MLOps competency.
The data scientists who reach ₹40L+ Lead roles at Fractal Analytics, Jio Platforms, or HDFC's AI CoE are almost uniformly the ones who built end-to-end MLOps skills — not just the ones with the highest Kaggle scores.
The Mumbai-Specific Context
Mumbai's BFSI sector creates a particular urgency around MLOps that other cities' data science markets do not feel as strongly. When a credit risk model deployed at HDFC Bank influences loan approvals for lakhs of customers monthly, or when a fraud detection model at NPCI processes billions of UPI transactions annually, the consequences of model failure — both to customers and to regulatory compliance — are significant.
The RBI's increasing scrutiny of AI model governance in financial services means that BFSI firms are actively building audit trails, model versioning systems, and monitoring dashboards that satisfy both business performance requirements and regulatory reporting requirements. Data scientists who can build and maintain these systems are not just better engineers — they are a compliance asset.
Building Your MLOps Skill Stack: The 90-Day Plan
If you currently have strong modelling skills but limited MLOps exposure, here is the practical 90-day path to building the fundamentals:
Days 1–20: Git and DVC mastery Set up a project with full Git + DVC versioning. Practise tracking data changes, model versions, and linking them to code commits. Goal: every experiment you run from this point should be reproducible from the commit history.
Days 21–40: Docker and model serving Containerise an existing model project with Docker. Build a FastAPI endpoint that serves predictions. Run the container locally. Deploy it to a free-tier cloud instance. Goal: your model can receive an HTTP request and return a prediction from anywhere.
Days 41–60: MLflow experiment tracking and model registry Add MLflow logging to your training scripts. Build a model registry with at least three versions of a model and practice promoting a challenger model to production. Goal: all your experiments are logged; you can answer "which version of the model is currently in production and why was it chosen?" in under 60 seconds.
Days 61–80: Monitoring with Evidently Implement a drift detection report that runs weekly against production data. Set up an alert that triggers when feature drift exceeds a threshold. Goal: you have a working monitoring system that would have caught a real drift scenario.
Days 81–90: Cloud deployment on AWS SageMaker or Azure ML Deploy your containerised model to a cloud ML platform. Configure auto-scaling. Set up CloudWatch or Azure Monitor dashboards. Goal: your model is live on a cloud endpoint, auto-scales under load, and generates performance metrics you can show in a portfolio.
Deploy Your First Model: Your Next Step
Reading about MLOps is the beginning of understanding it. Building it is the beginning of owning it.
TechPaathshala's Applied MLOps & Production AI Bootcamp is an intensive, hands-on programme for data scientists and ML engineers who are ready to close the gap between "I can build models" and "I can deploy, monitor, and scale them in production" — the gap that is currently worth ₹8L–₹20L annually in Mumbai's job market.
In the bootcamp, you will:
- Deploy your first model to the cloud — a complete, production-grade ML system using Docker, FastAPI, and AWS SageMaker or Azure ML, starting from a model you build in the programme and ending with a live API endpoint that auto-scales under load
- Build a full CI/CD pipeline for ML — using GitHub Actions to automate data validation, model training, performance evaluation, and conditional deployment — so your model retrains and redeploys automatically when new data arrives
- Implement real-time drift monitoring — using Evidently AI to detect and alert on feature drift and prediction drift, with a retraining trigger that responds to detected degradation
- Master MLflow for experiment tracking — logging every training run, building a model registry, and practising the model promotion workflow from staging to production
- Work on Mumbai-specific BFSI case studies — credit risk model deployment with RBI-aligned audit trails, fraud detection pipeline monitoring, and customer analytics system scaling — so your portfolio speaks directly to the problems Mumbai's top employers are hiring to solve
👉 Apply for TechPaathshala's Applied MLOps & Production AI Bootcamp — and take your first model to production before your next salary negotiation.
TechPaathshala is a Mumbai-based technology education platform helping data scientists and ML engineers build the production AI skills that Mumbai's BFSI and FinTech sector is actively hiring for — from model building to cloud-scale deployment and monitoring.

