SQL for Beginners — Why Every Data Analyst Must Know It in 2026

Written by: Techpaathshala
20 Min Read
SQL for Beginners — Why Every Data Analyst Must Know It in 2026

Here is something that surprises almost every person who is new to data analytics.

You do not need to know Python to get your first data analyst job. You do not need machine learning. You do not need a statistics degree. You do not need to understand neural networks or build dashboards in Tableau on Day 1.

But SQL? You need SQL. Almost always. Almost everywhere.

Walk through the job descriptions for data analyst roles in Mumbai right now — at banks, at startups, at consulting firms, at e-commerce companies — and you will find one skill listed more consistently than any other. Not Power BI. Not Python. Not Excel. SQL. In 2026, SQL is still the single most reliable prerequisite for a data analyst career in India, and it has been for over a decade.

The good news — and this is genuinely good news for complete beginners and career switchers — is that SQL is one of the most learnable technical skills in existence. It is not a programming language in the way Python or Java is. It does not require a computer science background. It is a query language: a way of asking questions to a database in a structured, English-like syntax that most people can read and understand within days of first exposure.

This guide is for anyone who has never written a SQL query and wants to understand what it is, why it matters, and how to start building this skill in a way that translates directly into job readiness.


What SQL Actually Is — In Plain Language

SQL stands for Structured Query Language. It is the standard language used to communicate with relational databases — the organised, table-based systems where most of the world's business data lives.

Think about the data that exists behind any business you interact with daily:

  • Your bank has a database of accounts, transactions, and customers
  • Flipkart has a database of products, orders, and users
  • Your company has a database of employees, salaries, and attendance
  • A hospital has a database of patients, appointments, and diagnoses

All of this data lives in tables — rows and columns, like a very structured spreadsheet. SQL is the language you use to ask questions about those tables: retrieve specific rows, combine data from multiple tables, calculate totals and averages, find patterns, filter by conditions, and sort results.

If you have ever written an Excel formula like =SUMIF(A:A, "Mumbai", B:B) — summing column B wherever column A says "Mumbai" — you already understand the concept behind SQL. You are asking the data a specific question and getting a specific answer. SQL does this, but for databases with millions of rows, multiple connected tables, and far more flexibility than any spreadsheet can offer.

Advertisement

Why Databases Instead of Spreadsheets?

A natural question from anyone with an Excel background: if I can already analyse data in spreadsheets, why do I need SQL?

Scale. Excel handles thousands of rows comfortably. It struggles with hundreds of thousands and breaks down entirely at millions. Real business databases routinely have tens of millions of rows. A SQL query on a properly indexed database can return results from a 50-million-row table in seconds. Excel cannot.

Structure and relationships. Business data is not a single flat table. A customer has orders. Each order has line items. Each line item has a product. Each product has a category and a supplier. These relationships are modelled across multiple connected tables in a relational database. SQL is designed specifically to navigate and combine these relationships. Spreadsheets are not.

Repeatability and sharing. A SQL query is code — it can be saved, shared, version-controlled, and scheduled to run automatically. The VLOOKUP formula you built in Excel to combine two sheets is a one-off operation that breaks when the sheet structure changes. A SQL JOIN is a repeatable, reliable operation on the underlying database structure.

Access. As a data analyst, the data you need to analyse almost always lives in a database — not in a file someone sends you. SQL is the key that opens the door to that data.


The Five SQL Concepts That Cover 80% of Analyst Work

SQL has many commands and functions. But the foundational layer — the part that covers the vast majority of what a working data analyst does every day — is built from five core concepts. Learn these well, and you have the skill that most analyst job descriptions are checking for.


1. SELECT and FROM — Retrieving Data

Every SQL query starts with these two words. SELECT tells the database which columns you want. FROM tells it which table to look in.

SELECT first_name, last_name, city
FROM customers;

This query retrieves three columns — first_name, last_name, and city — from a table called customers. Simple. English-readable. Intuitive.

To retrieve all columns without naming them individually:

SELECT *
FROM customers;

The * is shorthand for "everything." Useful for exploration, but in production queries you should always name the specific columns you need — it makes the query faster and the output easier to understand.


2. WHERE — Filtering Rows

SELECT and FROM retrieve data. WHERE filters it — telling the database which rows you want, not just which columns.

SELECT first_name, last_name, city
FROM customers
WHERE city = 'Mumbai';

This retrieves only the rows where the city is Mumbai. The WHERE clause can use comparison operators (=, !=, >, <, >=, <=), string matching (LIKE), range checking (BETWEEN), and list membership (IN).

-- Customers from Mumbai or Pune
SELECT first_name, last_name, city
FROM customers
WHERE city IN ('Mumbai', 'Pune');

-- Orders above ₹10,000
SELECT order_id, customer_id, order_amount
FROM orders
WHERE order_amount > 10000;

-- Orders placed in January 2026
SELECT order_id, order_date, order_amount
FROM orders
WHERE order_date BETWEEN '2026-01-01' AND '2026-01-31';

-- Products whose name contains "premium"
SELECT product_name, price
FROM products
WHERE product_name LIKE '%premium%';

The -- prefix is a SQL comment — a note to yourself or others reading the query. It does not affect execution.


3. GROUP BY and Aggregate Functions — Summarising Data

This is where SQL becomes genuinely powerful for analysis. Instead of retrieving individual rows, GROUP BY collapses rows into groups and lets you calculate summaries — totals, averages, counts, maximums, minimums — for each group.

The functions used for these summaries are called aggregate functions:

  • COUNT() — counts rows
  • SUM() — adds up values
  • AVG() — calculates an average
  • MAX() — finds the highest value
  • MIN() — finds the lowest value
-- Total sales amount by city
SELECT city, SUM(order_amount) AS total_sales
FROM orders
GROUP BY city;

This query produces one row per city, with the total order amount for each. The AS total_sales part renames the calculated column — giving it a readable label in the output.

-- Number of orders and average order value by product category
SELECT 
    category,
    COUNT(order_id) AS number_of_orders,
    AVG(order_amount) AS average_order_value,
    MAX(order_amount) AS largest_order
FROM orders
GROUP BY category
ORDER BY total_revenue DESC;

ORDER BY sorts the results. DESC means descending (highest to lowest). ASC (the default) means ascending.

This kind of query — group by a dimension, calculate a metric, sort by the result — is the backbone of most business reporting. Revenue by region. Orders by product category. Customers by acquisition channel. Transactions by month. If you understand GROUP BY with aggregate functions, you understand the core of analytical SQL.


4. JOIN — Combining Data from Multiple Tables

This is the concept that intimidates beginners the most — and the one that separates SQL from spreadsheet analysis most clearly. JOIN is how you combine data from two or more tables based on a shared column.

Imagine a database with two tables:

customers table:

customer_idfirst_namecity
1PriyaMumbai
2RahulPune
3AnitaMumbai

orders table:

order_idcustomer_idorder_amount
10114500
10218200
10333100

The orders table has a customer_id column that connects it to the customers table. To get a combined view — the customer's name alongside their order details — you use a JOIN:

SELECT 
    c.first_name,
    c.city,
    o.order_id,
    o.order_amount
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id;

Result:

first_namecityorder_idorder_amount
PriyaMumbai1014500
PriyaMumbai1028200
AnitaMumbai1033100

The c and o are aliases — shorthand names for the tables that make the query easier to read. ON c.customer_id = o.customer_id is the join condition — the column that links the two tables.

Rahul (customer_id = 2) does not appear in the result because they have no orders in the orders table. This is an INNER JOIN (the default) — it only returns rows that have matching records in both tables.

The JOIN types every analyst should know:

-- INNER JOIN: only rows with matches in both tables (shown above)
SELECT c.first_name, o.order_amount
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id;

-- LEFT JOIN: all rows from the left table, matched rows from the right
-- Customers WITHOUT orders will appear with NULL in the order columns
SELECT c.first_name, o.order_amount
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id;

-- This LEFT JOIN is how you find customers who have never placed an order
SELECT c.first_name, c.city
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_id IS NULL;

Understanding JOINs is what allows you to combine data from across an organisation's database — connecting customer records with transaction records, product records with sales records, employee records with performance records — to answer the business questions that matter.


5. Subqueries and CTEs — Writing Readable Complex Queries

As your analytical questions get more complex, your queries need to build on intermediate results. There are two standard ways to do this: subqueries and Common Table Expressions (CTEs).

Subqueries embed one query inside another:

-- Find customers whose total spend is above the average customer spend
SELECT customer_id, SUM(order_amount) AS total_spend
FROM orders
GROUP BY customer_id
HAVING SUM(order_amount) > (
    SELECT AVG(customer_total)
    FROM (
        SELECT customer_id, SUM(order_amount) AS customer_total
        FROM orders
        GROUP BY customer_id
    ) AS customer_totals
);

This works, but nested subqueries become hard to read quickly. CTEs are the more readable alternative.

CTEs (Common Table Expressions) use the WITH keyword to define named intermediate results that subsequent queries can reference:

-- Same query, written with a CTE — much more readable
WITH customer_totals AS (
    SELECT customer_id, SUM(order_amount) AS total_spend
    FROM orders
    GROUP BY customer_id
),
average_spend AS (
    SELECT AVG(total_spend) AS avg_spend
    FROM customer_totals
)
SELECT 
    ct.customer_id,
    ct.total_spend
FROM customer_totals ct, average_spend av
WHERE ct.total_spend > av.avg_spend
ORDER BY ct.total_spend DESC;

The logic is identical. The CTE version names each step, making the query self-documenting and significantly easier to debug when something goes wrong.

CTEs are the standard in production analyst SQL — every data analyst role worth having will expect you to be comfortable reading and writing them.


A Real-World Analyst Scenario: Putting It All Together

Here is the kind of question a business stakeholder asks a data analyst — and the SQL that answers it.

The question: "Which cities generated the most revenue last quarter, and how many unique customers placed orders in each city?"

WITH quarterly_orders AS (
    -- Step 1: Filter to last quarter's orders only
    SELECT 
        o.order_id,
        o.customer_id,
        o.order_amount,
        o.order_date
    FROM orders o
    WHERE o.order_date BETWEEN '2025-10-01' AND '2025-12-31'
),
city_metrics AS (
    -- Step 2: Join with customers to get city, then aggregate
    SELECT 
        c.city,
        SUM(qo.order_amount)        AS total_revenue,
        COUNT(DISTINCT qo.customer_id) AS unique_customers,
        COUNT(qo.order_id)          AS total_orders
    FROM quarterly_orders qo
    JOIN customers c ON qo.customer_id = c.customer_id
    GROUP BY c.city
)
-- Step 3: Return results, highest revenue first
SELECT 
    city,
    total_revenue,
    unique_customers,
    total_orders,
    ROUND(total_revenue / unique_customers, 0) AS revenue_per_customer
FROM city_metrics
ORDER BY total_revenue DESC
LIMIT 10;

This query uses every concept covered above: SELECT, FROM, WHERE, JOIN, GROUP BY, aggregate functions (SUM, COUNT, ROUND), CTEs, and ORDER BY. It answers a real business question in a way that could go directly into a Power BI or Tableau data source.

This is analyst SQL. It is learnable. And writing queries like this is what you are working toward.


Why SQL Is Non-Negotiable for Data Analysts in India in 2026

The question of whether SQL is still relevant in an era of Python, AI, and automated analytics tools comes up frequently. The answer is unambiguous: yes, more than ever.

It is in every job description. Scan Naukri, LinkedIn, and Instahire for data analyst roles in Mumbai, Bengaluru, and Pune. SQL appears as a required skill in over 90% of postings — consistently more than Python, Power BI, or Tableau individually.

It is the language of databases, and databases are everywhere. Every organisation that stores structured data — which is every organisation — uses a relational database or a system that speaks SQL. PostgreSQL, MySQL, Microsoft SQL Server, Google BigQuery, Amazon Redshift, Snowflake — these are all SQL-based systems. The specific dialect varies slightly; the core language is the same.

AI tools do not eliminate SQL — they use it. AI-powered analytics tools generate SQL queries on your behalf. Being able to read, understand, and validate that generated SQL is itself a skill. Analysts who do not understand SQL cannot verify whether the AI-generated query is actually answering the question correctly.

It is the skill that gives you data independence. An analyst who cannot write SQL depends on a data engineer or developer to extract the data they need. An analyst who can write SQL can answer their own questions, iterate on their own analysis, and move at the speed of the business problem rather than the speed of a data request queue.


How to Start: Your SQL Learning Roadmap

The path from "I have never written SQL" to "I can confidently write analyst-level SQL" is shorter than most people expect. Here is a realistic, structured roadmap.

Week 1 — The Fundamentals

  • SELECT, FROM, WHERE with various filter conditions
  • ORDER BY for sorting
  • LIMIT for controlling result size
  • Practice tool: SQLiteOnline.com (no installation, runs in browser) or DB Fiddle

Week 2 — Aggregation

  • GROUP BY with COUNT, SUM, AVG, MAX, MIN
  • HAVING (filtering groups, not rows — the GROUP BY equivalent of WHERE)
  • DISTINCT for removing duplicates
  • Practice: Find a public dataset (Indian census data, sales datasets on Kaggle) and write 10 aggregation queries

Week 3 — Joins

  • INNER JOIN, LEFT JOIN, RIGHT JOIN
  • Joining three or more tables
  • Using table aliases
  • Practice: Build a small sample database (customers + orders + products) and join across all three tables

Week 4 — Intermediate Techniques

  • Subqueries (in WHERE clause, in FROM clause)
  • CTEs with the WITH keyword
  • Window functions: ROW_NUMBER(), RANK(), LAG(), LEAD() (advanced but high-value for analyst roles)
  • String functions: UPPER(), LOWER(), TRIM(), CONCAT(), SUBSTRING()
  • Date functions: YEAR(), MONTH(), DATEDIFF(), DATE_TRUNC()

By end of Month 1: Write 5 queries that answer real business questions on a dataset you care about. Publish them to GitHub. This is the beginning of your SQL portfolio.

The tools to practice on:

  • SQLiteOnline.com — browser-based, no setup, good for beginners
  • PostgreSQL (free, open source) — the standard for most analyst roles
  • Google BigQuery — free tier available, used heavily in Mumbai's analytics ecosystem
  • MySQL Workbench — widely used in India's SME sector

The Honest Answer to "How Long Does SQL Take to Learn?"

For the fundamentals covered in this guide — SELECT, WHERE, JOIN, GROUP BY, CTEs — most dedicated learners reach practical proficiency in 4–6 weeks at 45–60 minutes per day.

This is not a certification milestone. It is the level at which you can answer real analytical questions independently, write queries that would pass a basic SQL screening test, and start contributing to a data team from Day 1.

Advanced SQL — window functions, query optimisation, indexing strategy, stored procedures — takes longer and is learned through practice on real problems, not through a structured beginner curriculum. The intermediate level is enough to start. The advanced level comes with experience.

The most important thing about the SQL learning timeline is this: it starts when you open a SQL editor and write your first query. Not when you finish reading about SQL. Not when you feel ready. When you write the first line.

Share This Article

Leave a Reply