Skip to main content

Command Palette

Search for a command to run...

SQL for Data Analysis (Learn what matters)

Published
6 min read
SQL for Data Analysis (Learn what matters)

SQL Queries

  1.   show tables;
    

    It shows all the tables available in dataset

  2.   desc products;
    

    Describe a particular table, here ‘products’ is a table

  3.   select * from yearly_sales;
    

    ‘*’ refers everything, this means showing everything in ‘yearly_sales’ table

  4.   select Product, Cost_per_packet from products;
    

    This means show ‘Product’ & ‘Cost_per_packet’ column from ‘products’ table

  5.   select SaleDate, Amount, packetes, Amount / packetes from yearly_sales;
    

    Here we are showing some columns from ‘yearly_sales’ table & an extra column for calculating ‘Amount per packets’.

  6.   select SaleDate, Amount, packets, Amount / packetes 'Amount_per_packet' from yearly_sales;
    

    or

     select SaleDate, Amount, packetes, Amount / packets as 'Amount_per_packet' from yearly_sales;
    

    Giving the new column a specific name

  7.   select * from yearly_sales where Amount > 20000;
    

    writing conditions, show yearly_sales data only where the column Amount is above 20,000. It is like FILTERING in excel

  8.   select * from yearly_sales
      where Amount > 20000
      order by Amount;
    

    data by order. This will show the data for ‘yearly_sales’ table where the ‘amount’ is grater than 20,000 and then arrange them by the ‘amount’ data in ascending order (by default).

  9.   select * from yearly_sales
      where Amount > 20000
      order by Amount desc;
    

    in descending order.

  10. select * from yearly_sales
    where LocID='L4';
    

    yearly_sales data where ‘LocID’ is ‘L4’.

  11. select * from yearly_sales
    where LocID='L4'
    order by Pincode desc;
    

    order by ‘Pincode’ in Descending order

  12.  select * from yearly_sales
     where Amount>10000 and Curr_Date>='2022-01-01';
    

    here, we want to see all the data where value>10000 & year is after 2022. (using ‘and’ operation)

  13.  select * from yearly_sales
     where Amount>10000 or year(Curr_Date) = 2022;
    

    ‘or’ operation

  14.  select * from yearly_sales
     where Amount>10000 and year(Curr_Date) = 2022;
    

    here, we want to see all the data where value>10000 & year=2022, (here the date is in this format -

    2022-01-05 00:00:00

  15. select * from yearly_sales
    where packets>0 and packets<=100;
    

    or,

    (using ‘between’ keyword)

    select * from yearly_sales
    where Boxes between 0 and 100; 
    order by Boxes;
    

    Here, we get the data where No. of packets is between 0 to 100.

  16. select SaleDate, weekday(SaleDate) as `week_day` ,Amount, Customers, Boxes
    from yearly_sales
    where weekday(SaleDate) = 4;
    

    Here, we want to show all the shipments that happend on Thrusday (using ‘weekday’ function).

  17. select * from students
    where Team in ('Hufflepuff', 'Ravenclaw');
    

    alternative of ‘or’ operator - ‘in clause’. Here we can see the data from ‘students’ table where the team is either ‘Hufflepuff’ or ‘Ravenclaw’.

  18. select * from students
    where Student like 'B%';
    

    ‘like’ operator. Here we will get every students name that starts with ‘B’.

  19. select * from students
    where Student like '%B%';
    

    here we will get every student data which has ‘B’ anywhere in their name.

  20. select SaleDate, Amount, Customers, Boxes,
            case    when amount < 1000 then 'under 1000'
                    when amount < 5000 then 'under 5000'
                    when amount < 10000 then 'under 10000'
            else 'more than 10,000'
            end as 'Amount category'
    from yearly_sales;
    

    or

    select *,
            case    when amount < 1000 then 'under 1000'
                    when amount < 5000 then 'under 5000'
                    when amount < 10000 then 'under 10000'
            else 'more than 10,000'
            end as 'Amount category'
    from yearly_sales;
    

    Here, we made a ‘amount’ category where every amount upto 1000 having a label of ‘under 1000’,

    ‘amount’ between 1000 to 5000 having the labelof ‘under 5000’

    ‘amount’ between 5000 to 10,000 having the label ‘under 10,000’

    & anything more than 10,000 having a label of ‘more than 10,000’

Joins (between multiple tables)

suppose there is two tables - ‘students’ and ‘results’. Common column is ‘RollNo’

Joining the both table

select r.total, r.grade, s.full_name
from results as r
join students as s on s.RollNo = s.RollNo;

There is two types of join - Left Join & Right Join

Joining multiple tables:

suppose there is an another table named ‘classrooms’

we want to see student name from ‘students’ table, total marks from ‘results’ table & classroom name from ‘classrooms’ table -

select r.total, r.grade, s.full_name, c.room_name
from results as r
join students as s on s.RollNo = r.RollNo
join classrooms as c on c.RoomNm = r.RoomNm;

with conditions,

select s.full_name, c.room_name, r.total, r.grade
from results r
join students s on s.RollNo = r.RollNo
join classrooms c on c.RoomNm = r.RoomNm
where r.total < 100
and c.RoomNm = "Gryffindor";

Now if you want all the students data who are not included in a specific house/room -

select s.full_name, c.room_name, r.total, r.grade
from results r
join students s on s.RollNo = r.RollNo
join classrooms c on c.RoomNm = r.RoomNm
where c.RoomNm = '';

or,

select s.full_name, c.room_name, r.total, r.grade
from results r
join students s on s.RollNo = r.RollNo
join classrooms c on c.RoomNm = r.RoomNm
where c.RoomNm is null;

Now if you want data except the students who are not specifid in a perticular house/room -

(‘<>’ means ‘not is epual to’) [also you can use the ‘having’ option]

select s.full_name, c.room_name, r.total, r.grade
from results r
join students s on s.RollNo = r.RollNo
join classrooms c on c.RoomNm = r.RoomNm
where c.RoomNm <> '';

Now I want the marks and grade only of those students who are from India or Bangaladesh -

select s.full_name, r.total, r.grade
from results r
join students s on s.RollNo = r.RollNo
join countries co on co.CountryId = r.CountryId
where co.Country in ("India", "Bangaladesh")
order by r.total desc;

Using Group By

Now suppose I want total marks group by Houses/Rooms

select RoomNm, sum(total) as sum
from results 
group by RoomNm;

result -

RoomNm

sum

Gryffindor

67923

Hufflepuff

54542

Ravenclaw

89565

Slytherin

45675

It is like Pivot table in excel.

also,

select RoomNm, sum(total) as sum, avg(total) as Avg
from results 
group by RoomNm;

using two ‘group by’

select c.Country, s.RoomNm, sum(total) as sum, avg(total) as Avg
from results as r
join students as s on r.RollNo = s.RollNo
join countries as c on c.CountryId = r.CountryId
group by s.RoomNm, c.Country;

also sorting it -

select c.Country, s.RoomNm, sum(total) as sum, avg(total) as Avg
from results as r
join students as s on r.RollNo = s.RollNo
join countries as c on c.CountryId = r.CountryId
group by s.RoomNm, c.Country
order by c.Country;

Showing the Top 10 Students

select s.Student, r.total
from results r
join students s on s.RollNo = r.RollNo
order by r.total desc
limit 10;