Contents
- What SQL Actually Is — In Plain Language
- Why Databases Instead of Spreadsheets?
- The Five SQL Concepts That Cover 80% of Analyst Work
- 1. SELECT and FROM — Retrieving Data
- 2. WHERE — Filtering Rows
- 3. GROUP BY and Aggregate Functions — Summarising Data
- 4. JOIN — Combining Data from Multiple Tables
- 5. Subqueries and CTEs — Writing Readable Complex Queries
- A Real-World Analyst Scenario: Putting It All Together
- Why SQL Is Non-Negotiable for Data Analysts in India in 2026
- How to Start: Your SQL Learning Roadmap
- The Honest Answer to "How Long Does SQL Take to Learn?"
Here is something that surprises almost every person who is new to data analytics.
You do not need to know Python to get your first data analyst job. You do not need machine learning. You do not need a statistics degree. You do not need to understand neural networks or build dashboards in Tableau on Day 1.
But SQL? You need SQL. Almost always. Almost everywhere.
Walk through the job descriptions for data analyst roles in Mumbai right now — at banks, at startups, at consulting firms, at e-commerce companies — and you will find one skill listed more consistently than any other. Not Power BI. Not Python. Not Excel. SQL. In 2026, SQL is still the single most reliable prerequisite for a data analyst career in India, and it has been for over a decade.
The good news — and this is genuinely good news for complete beginners and career switchers — is that SQL is one of the most learnable technical skills in existence. It is not a programming language in the way Python or Java is. It does not require a computer science background. It is a query language: a way of asking questions to a database in a structured, English-like syntax that most people can read and understand within days of first exposure.
This guide is for anyone who has never written a SQL query and wants to understand what it is, why it matters, and how to start building this skill in a way that translates directly into job readiness.
What SQL Actually Is — In Plain Language
SQL stands for Structured Query Language. It is the standard language used to communicate with relational databases — the organised, table-based systems where most of the world's business data lives.
Think about the data that exists behind any business you interact with daily:
- Your bank has a database of accounts, transactions, and customers
- Flipkart has a database of products, orders, and users
- Your company has a database of employees, salaries, and attendance
- A hospital has a database of patients, appointments, and diagnoses
All of this data lives in tables — rows and columns, like a very structured spreadsheet. SQL is the language you use to ask questions about those tables: retrieve specific rows, combine data from multiple tables, calculate totals and averages, find patterns, filter by conditions, and sort results.
If you have ever written an Excel formula like =SUMIF(A:A, "Mumbai", B:B) — summing column B wherever column A says "Mumbai" — you already understand the concept behind SQL. You are asking the data a specific question and getting a specific answer. SQL does this, but for databases with millions of rows, multiple connected tables, and far more flexibility than any spreadsheet can offer.
Why Databases Instead of Spreadsheets?
A natural question from anyone with an Excel background: if I can already analyse data in spreadsheets, why do I need SQL?
Scale. Excel handles thousands of rows comfortably. It struggles with hundreds of thousands and breaks down entirely at millions. Real business databases routinely have tens of millions of rows. A SQL query on a properly indexed database can return results from a 50-million-row table in seconds. Excel cannot.
Structure and relationships. Business data is not a single flat table. A customer has orders. Each order has line items. Each line item has a product. Each product has a category and a supplier. These relationships are modelled across multiple connected tables in a relational database. SQL is designed specifically to navigate and combine these relationships. Spreadsheets are not.
Repeatability and sharing. A SQL query is code — it can be saved, shared, version-controlled, and scheduled to run automatically. The VLOOKUP formula you built in Excel to combine two sheets is a one-off operation that breaks when the sheet structure changes. A SQL JOIN is a repeatable, reliable operation on the underlying database structure.
Access. As a data analyst, the data you need to analyse almost always lives in a database — not in a file someone sends you. SQL is the key that opens the door to that data.
The Five SQL Concepts That Cover 80% of Analyst Work
SQL has many commands and functions. But the foundational layer — the part that covers the vast majority of what a working data analyst does every day — is built from five core concepts. Learn these well, and you have the skill that most analyst job descriptions are checking for.
1. SELECT and FROM — Retrieving Data
Every SQL query starts with these two words. SELECT tells the database which columns you want. FROM tells it which table to look in.
SELECT first_name, last_name, city
FROM customers;
This query retrieves three columns — first_name, last_name, and city — from a table called customers. Simple. English-readable. Intuitive.
To retrieve all columns without naming them individually:
SELECT *
FROM customers;
The * is shorthand for "everything." Useful for exploration, but in production queries you should always name the specific columns you need — it makes the query faster and the output easier to understand.
2. WHERE — Filtering Rows
SELECT and FROM retrieve data. WHERE filters it — telling the database which rows you want, not just which columns.
SELECT first_name, last_name, city
FROM customers
WHERE city = 'Mumbai';
This retrieves only the rows where the city is Mumbai. The WHERE clause can use comparison operators (=, !=, >, <, >=, <=), string matching (LIKE), range checking (BETWEEN), and list membership (IN).
-- Customers from Mumbai or Pune
SELECT first_name, last_name, city
FROM customers
WHERE city IN ('Mumbai', 'Pune');
-- Orders above ₹10,000
SELECT order_id, customer_id, order_amount
FROM orders
WHERE order_amount > 10000;
-- Orders placed in January 2026
SELECT order_id, order_date, order_amount
FROM orders
WHERE order_date BETWEEN '2026-01-01' AND '2026-01-31';
-- Products whose name contains "premium"
SELECT product_name, price
FROM products
WHERE product_name LIKE '%premium%';
The -- prefix is a SQL comment — a note to yourself or others reading the query. It does not affect execution.
3. GROUP BY and Aggregate Functions — Summarising Data
This is where SQL becomes genuinely powerful for analysis. Instead of retrieving individual rows, GROUP BY collapses rows into groups and lets you calculate summaries — totals, averages, counts, maximums, minimums — for each group.
The functions used for these summaries are called aggregate functions:
COUNT()— counts rowsSUM()— adds up valuesAVG()— calculates an averageMAX()— finds the highest valueMIN()— finds the lowest value
-- Total sales amount by city
SELECT city, SUM(order_amount) AS total_sales
FROM orders
GROUP BY city;
This query produces one row per city, with the total order amount for each. The AS total_sales part renames the calculated column — giving it a readable label in the output.
-- Number of orders and average order value by product category
SELECT
category,
COUNT(order_id) AS number_of_orders,
AVG(order_amount) AS average_order_value,
MAX(order_amount) AS largest_order
FROM orders
GROUP BY category
ORDER BY total_revenue DESC;
ORDER BY sorts the results. DESC means descending (highest to lowest). ASC (the default) means ascending.
This kind of query — group by a dimension, calculate a metric, sort by the result — is the backbone of most business reporting. Revenue by region. Orders by product category. Customers by acquisition channel. Transactions by month. If you understand GROUP BY with aggregate functions, you understand the core of analytical SQL.
4. JOIN — Combining Data from Multiple Tables
This is the concept that intimidates beginners the most — and the one that separates SQL from spreadsheet analysis most clearly. JOIN is how you combine data from two or more tables based on a shared column.
Imagine a database with two tables:
customers table:
| customer_id | first_name | city |
|---|---|---|
| 1 | Priya | Mumbai |
| 2 | Rahul | Pune |
| 3 | Anita | Mumbai |
orders table:
| order_id | customer_id | order_amount |
|---|---|---|
| 101 | 1 | 4500 |
| 102 | 1 | 8200 |
| 103 | 3 | 3100 |
The orders table has a customer_id column that connects it to the customers table. To get a combined view — the customer's name alongside their order details — you use a JOIN:
SELECT
c.first_name,
c.city,
o.order_id,
o.order_amount
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id;
Result:
| first_name | city | order_id | order_amount |
|---|---|---|---|
| Priya | Mumbai | 101 | 4500 |
| Priya | Mumbai | 102 | 8200 |
| Anita | Mumbai | 103 | 3100 |
The c and o are aliases — shorthand names for the tables that make the query easier to read. ON c.customer_id = o.customer_id is the join condition — the column that links the two tables.
Rahul (customer_id = 2) does not appear in the result because they have no orders in the orders table. This is an INNER JOIN (the default) — it only returns rows that have matching records in both tables.
The JOIN types every analyst should know:
-- INNER JOIN: only rows with matches in both tables (shown above)
SELECT c.first_name, o.order_amount
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id;
-- LEFT JOIN: all rows from the left table, matched rows from the right
-- Customers WITHOUT orders will appear with NULL in the order columns
SELECT c.first_name, o.order_amount
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id;
-- This LEFT JOIN is how you find customers who have never placed an order
SELECT c.first_name, c.city
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_id IS NULL;
Understanding JOINs is what allows you to combine data from across an organisation's database — connecting customer records with transaction records, product records with sales records, employee records with performance records — to answer the business questions that matter.
5. Subqueries and CTEs — Writing Readable Complex Queries
As your analytical questions get more complex, your queries need to build on intermediate results. There are two standard ways to do this: subqueries and Common Table Expressions (CTEs).
Subqueries embed one query inside another:
-- Find customers whose total spend is above the average customer spend
SELECT customer_id, SUM(order_amount) AS total_spend
FROM orders
GROUP BY customer_id
HAVING SUM(order_amount) > (
SELECT AVG(customer_total)
FROM (
SELECT customer_id, SUM(order_amount) AS customer_total
FROM orders
GROUP BY customer_id
) AS customer_totals
);
This works, but nested subqueries become hard to read quickly. CTEs are the more readable alternative.
CTEs (Common Table Expressions) use the WITH keyword to define named intermediate results that subsequent queries can reference:
-- Same query, written with a CTE — much more readable
WITH customer_totals AS (
SELECT customer_id, SUM(order_amount) AS total_spend
FROM orders
GROUP BY customer_id
),
average_spend AS (
SELECT AVG(total_spend) AS avg_spend
FROM customer_totals
)
SELECT
ct.customer_id,
ct.total_spend
FROM customer_totals ct, average_spend av
WHERE ct.total_spend > av.avg_spend
ORDER BY ct.total_spend DESC;
The logic is identical. The CTE version names each step, making the query self-documenting and significantly easier to debug when something goes wrong.
CTEs are the standard in production analyst SQL — every data analyst role worth having will expect you to be comfortable reading and writing them.
A Real-World Analyst Scenario: Putting It All Together
Here is the kind of question a business stakeholder asks a data analyst — and the SQL that answers it.
The question: "Which cities generated the most revenue last quarter, and how many unique customers placed orders in each city?"
WITH quarterly_orders AS (
-- Step 1: Filter to last quarter's orders only
SELECT
o.order_id,
o.customer_id,
o.order_amount,
o.order_date
FROM orders o
WHERE o.order_date BETWEEN '2025-10-01' AND '2025-12-31'
),
city_metrics AS (
-- Step 2: Join with customers to get city, then aggregate
SELECT
c.city,
SUM(qo.order_amount) AS total_revenue,
COUNT(DISTINCT qo.customer_id) AS unique_customers,
COUNT(qo.order_id) AS total_orders
FROM quarterly_orders qo
JOIN customers c ON qo.customer_id = c.customer_id
GROUP BY c.city
)
-- Step 3: Return results, highest revenue first
SELECT
city,
total_revenue,
unique_customers,
total_orders,
ROUND(total_revenue / unique_customers, 0) AS revenue_per_customer
FROM city_metrics
ORDER BY total_revenue DESC
LIMIT 10;
This query uses every concept covered above: SELECT, FROM, WHERE, JOIN, GROUP BY, aggregate functions (SUM, COUNT, ROUND), CTEs, and ORDER BY. It answers a real business question in a way that could go directly into a Power BI or Tableau data source.
This is analyst SQL. It is learnable. And writing queries like this is what you are working toward.
Why SQL Is Non-Negotiable for Data Analysts in India in 2026
The question of whether SQL is still relevant in an era of Python, AI, and automated analytics tools comes up frequently. The answer is unambiguous: yes, more than ever.
It is in every job description. Scan Naukri, LinkedIn, and Instahire for data analyst roles in Mumbai, Bengaluru, and Pune. SQL appears as a required skill in over 90% of postings — consistently more than Python, Power BI, or Tableau individually.
It is the language of databases, and databases are everywhere. Every organisation that stores structured data — which is every organisation — uses a relational database or a system that speaks SQL. PostgreSQL, MySQL, Microsoft SQL Server, Google BigQuery, Amazon Redshift, Snowflake — these are all SQL-based systems. The specific dialect varies slightly; the core language is the same.
AI tools do not eliminate SQL — they use it. AI-powered analytics tools generate SQL queries on your behalf. Being able to read, understand, and validate that generated SQL is itself a skill. Analysts who do not understand SQL cannot verify whether the AI-generated query is actually answering the question correctly.
It is the skill that gives you data independence. An analyst who cannot write SQL depends on a data engineer or developer to extract the data they need. An analyst who can write SQL can answer their own questions, iterate on their own analysis, and move at the speed of the business problem rather than the speed of a data request queue.
How to Start: Your SQL Learning Roadmap
The path from "I have never written SQL" to "I can confidently write analyst-level SQL" is shorter than most people expect. Here is a realistic, structured roadmap.
Week 1 — The Fundamentals
SELECT,FROM,WHEREwith various filter conditionsORDER BYfor sortingLIMITfor controlling result size- Practice tool: SQLiteOnline.com (no installation, runs in browser) or DB Fiddle
Week 2 — Aggregation
GROUP BYwithCOUNT,SUM,AVG,MAX,MINHAVING(filtering groups, not rows — the GROUP BY equivalent of WHERE)DISTINCTfor removing duplicates- Practice: Find a public dataset (Indian census data, sales datasets on Kaggle) and write 10 aggregation queries
Week 3 — Joins
INNER JOIN,LEFT JOIN,RIGHT JOIN- Joining three or more tables
- Using table aliases
- Practice: Build a small sample database (customers + orders + products) and join across all three tables
Week 4 — Intermediate Techniques
- Subqueries (in
WHEREclause, inFROMclause) - CTEs with the
WITHkeyword - Window functions:
ROW_NUMBER(),RANK(),LAG(),LEAD()(advanced but high-value for analyst roles) - String functions:
UPPER(),LOWER(),TRIM(),CONCAT(),SUBSTRING() - Date functions:
YEAR(),MONTH(),DATEDIFF(),DATE_TRUNC()
By end of Month 1: Write 5 queries that answer real business questions on a dataset you care about. Publish them to GitHub. This is the beginning of your SQL portfolio.
The tools to practice on:
- SQLiteOnline.com — browser-based, no setup, good for beginners
- PostgreSQL (free, open source) — the standard for most analyst roles
- Google BigQuery — free tier available, used heavily in Mumbai's analytics ecosystem
- MySQL Workbench — widely used in India's SME sector
The Honest Answer to "How Long Does SQL Take to Learn?"
For the fundamentals covered in this guide — SELECT, WHERE, JOIN, GROUP BY, CTEs — most dedicated learners reach practical proficiency in 4–6 weeks at 45–60 minutes per day.
This is not a certification milestone. It is the level at which you can answer real analytical questions independently, write queries that would pass a basic SQL screening test, and start contributing to a data team from Day 1.
Advanced SQL — window functions, query optimisation, indexing strategy, stored procedures — takes longer and is learned through practice on real problems, not through a structured beginner curriculum. The intermediate level is enough to start. The advanced level comes with experience.
The most important thing about the SQL learning timeline is this: it starts when you open a SQL editor and write your first query. Not when you finish reading about SQL. Not when you feel ready. When you write the first line.

