13 SQL Statements for 80% of Your Data Science Tasks
SQL is a programming language used to manage and manipulate data in databases. It is often used by data analysts and data scientists to get insights from large datasets. This article covers 13 essential SQL statements that can be used to do 80% of data science tasks, such as filtering, sorting, grouping, and aggregating data. It is useful for both beginners of SQL, as it provides valuable insights.
1. SELECT
The SELECT statement is used to get information from a database. You should learn how to use SELECT to filter, sort, and organize data using different commands like WHERE, ORDER BY, and GROUP BY. Here is an example of a SELECT statement:
SELECT column1, column2, column3
FROM table_name
WHERE condition;
This example will get all the data from the columns named column1, column2, and column3 from the table called customers where the customer's age is 18 or older.
SELECT *
FROM customers
WHERE age >= 18;
2. JOIN
To combine data from two or more tables in a database, you should use the JOIN statement. You should also learn how to use different types of JOINs, such as INNER, LEFT, RIGHT, and FULL OUTER, to get data from multiple tables. Examples of JOIN statements include:
2.1. INNER JOIN
An INNER JOIN will only return rows where there is a match between the columns in both tables. For example, if you have two tables, Table A and Table B, and both tables have a column called "ID", an INNER JOIN will only return the rows where the "ID" in Table A matches the "ID" in Table B.
SELECT orders.order_id, customers.customer_name
FROM orders
INNER JOIN customers
ON orders.customer_id = customers.customer_id;
This means that a new table will be created that combines information from the orders table and the customers
table. The new table will only include the order_id
and customer_name
columns, and will only include rows where the customer_id
in the orders table matches the customer_id
in the customers table.
2.2. LEFT JOIN
A LEFT JOIN will combine two tables, the left table and the right table. It will return all the rows from the left table, and any matching rows from the right table. If there are no matches in the right table, the result will include empty values. For example, if you have two tables, one with customer information and one with order information, a LEFT JOIN will combine the two tables and return all the customer information, along with any order information that matches the customer.
SELECT customers.customer_name, orders.order_id
FROM customers
LEFT JOIN orders
ON customers.customer_id = orders.customer_id;
In this example, the customers table and the orders table are connected by the customer_id
column. The resulting table will include all the information from the customers table and the information from the orders table that matches the customer_id
. If there is no match in the orders table, the order_id
column will be empty.
2.3. RIGHT JOIN
A RIGHT JOIN will take all the rows from the right table and any matching rows from the left table. If there are no matches in the left table, the result will include empty values. For example:
SELECT customers.customer_name, orders.order_id
FROM customers
RIGHT JOIN orders
ON customers.customer_id = orders.customer_id;
The orders table and customers
table will be joined together using the customer_id
column. The resulting table will include all the rows from the orders
table and the matching rows from the customers
table. If there is no customer with the same customer_id
in the customers table, the customer_name
column in the resulting table will be empty.
2.4. OUTER JOIN
An OUTER JOIN in SQL is a way to combine two tables and get all the rows from one or both tables, even if there are no matches between the two tables. There are two types of OUTER JOINs: LEFT OUTER JOIN and RIGHT OUTER JOIN.
Here’s an example of a LEFT OUTER JOIN:
SELECT customers.customer_name, orders.order_id
FROM customers
LEFT OUTER JOIN orders
ON customers.customer_id = orders.customer_id;
In this example, the customers
table and the orders table are connected using the customer_id
column. The resulting table will have all the information from the customers table and the information from the orders table that matches the customer_id
. If there is no matching information in the orders
table, the order_id
column will be empty.
Here’s an example of a RIGHT OUTER JOIN:
SELECT customers.customer_name, orders.order_id
FROM customers
RIGHT OUTER JOIN orders
ON customers.customer_id = orders.customer_id;
This means that the orders
table and the customers table will be connected using the customer_id
column. The resulting table will have all the rows from the orders
table and the rows from the customers table that match. If there is no match, the customer_name
column will be empty.
If the database does not support RIGHT OUTER JOIN
, the same result can be achieved by using a LEFT OUTER JOIN
and changing the order of the tables.
3. WHERE
The WHERE statement allows you to select only the data that meets certain criteria. You should practice using it to make sure you can get the data you need.
Using the "where" statement in SQL, we can select only those employees from the "employees" table who work in the "Sales" department and have a salary greater than $50,000.
SELECT name, department, salary
FROM employees
WHERE department = 'Sales' AND salary > 50000;
This query will show the names, departments, and salaries of all employees who work in the Sales department and make more than $50,000.
4. GROUP BY
The GROUP BY statement allows us to group the employees by department and calculate the average salary for each department. This way, we can analyze the data by categories and get a better understanding of the salaries in each department.
SELECT department, AVG(salary) as avg_salary
FROM employees
GROUP BY department;
This query will give a list of all the departments and the average salary for each department. It will do this by adding up all the salaries of the employees in each department and dividing it by the number of employees in that department. It will group the employees by department and use the AVG function to calculate the average salary for each department.
department | avg_salary
-----------------------
Sales | 65000
Marketing | 55000
Engineering| 80000
The Sales department has an average salary of $65,000, the Marketing department has an average salary of $55,000, and the Engineering department has an average salary of $80,000.
5. HAVING
The GROUP BY statement is used to group data together, and the HAVING statement is used to filter the grouped data based on certain conditions. It is important to understand how to use the HAVING statement to filter data.
This means that we are looking at a table called "orders" which has columns for "order_id", "customer_id", "product_id", and "quantity". We want to find customers who have ordered a total of at least 50 units of products. We can use the GROUP BY clause to group the orders by customer and calculate the total quantity of each product ordered by each customer. We can then use the HAVING clause to filter the results to only include customers who have ordered a total quantity of at least 50 units.
SELECT customer_id, SUM(quantity) AS total_quantity
FROM orders
GROUP BY customer_id
HAVING SUM(quantity) >= 50;
Get a list of all customers who have ordered at least 50 units of products in total, showing the total quantity of products they have ordered. The output of the query would look something like this:
customer_id | total_quantity
---------------------------
123 | 60
456 | 70
In this example, customer 123 and customer 456 both ordered at least 50 units of products. Customer 123 ordered 60 units and customer 456 ordered 70 units.
6. Window Function
Window functions in SQL are used to do calculations on a group of rows that are related to the row you are currently looking at. These functions are applied to a subset of rows from a table, based on a certain condition or division. Examples of window functions in SQL include:
6.1. ROW_NUMBER():
This function will give each row in a partition a unique number in order. The way to use this function is: ROW_NUMBER()
SELECT column1, column2, ..., ROW_NUMBER() OVER (ORDER BY column1) AS row_num
FROM table_name;
This query will give a list of results with an extra column called "row_num
" that shows the order of the results based on the values in "column1
".
6.2. SUM():
The SUM() function will add up all the values in a column within a partition. The syntax for using this function is: SUM(column_name)
SELECT column1, column2, ..., SUM(column3) OVER (PARTITION BY column1) AS column3_sum
FROM table_name;
This query will give a result with an extra column that shows the total of "column3" for each group based on the values of "column1".
6.3. RANK():
The RANK() function will give each row in a partition a rank based on the values of a certain column. The way to use this function is: RANK(column_name)
.
SELECT column1, column2, ..., RANK() OVER (PARTITION BY column1 ORDER BY column3 DESC) AS rank_num
FROM table_name;
This query will give a list of results with an extra column called "rank_num" that shows the order of each row within each group, based on the size of "column3" from biggest to smallest.
6.4. AVG():
This function calculates the average of a column within a partition. The syntax for the AVG()
function is:
SELECT column1, column2, ..., AVG(column3) OVER (PARTITION BY column1) AS column3_avg
FROM table_name;
This query will give us a list of results with an extra column that shows the average of one column for each group based on the values of another column. The exact syntax may be different depending on the database we are using.
7. UNION
The UNION operator in SQL allows you to combine the results of two or more SELECT statements into one result set. The SELECT statements must have the same number of columns and the data types must match. Any duplicate rows will be removed from the result.
Here’s an example of using the UNION operator in SQL: If we have two tables, customers and employees, with columns for name and city, we can use the UNION operator to make a list of all people who live in New York City from both tables.
SELECT name, city
FROM customers
WHERE city = 'New York'
UNION
SELECT name, city
FROM employees
WHERE city = 'New York';
This query will give a list of everyone who lives in New York City, both customers and employees. It will get the customers who live there from the first SELECT statement and the employees from the second SELECT statement. It will then combine the two lists and remove any duplicates.
The output of the query would look something like this:
name | city
-------------------
John Smith | New York
Jane Doe | New York
Bob Johnson | New York
Samantha Lee| New York
In this example, we can see that there are four people who live in New York City, two from the “customers” table and two from the “employees” table, and the UNION operator has combined the results of the two SELECT statements into a single result set.
8. CREATE
The CREATE statement is used to make a new table, view, or other database object. For example, if we want to make a table called "customers" with columns for "id", "name", "email", and "phone", we can use the CREATE statement to do this.
CREATE TABLE customers (
id INT PRIMARY KEY,
name VARCHAR(50),
email VARCHAR(100),
phone VARCHAR(20)
);
This query will create a table called "customers" with four columns: "id", "name", "email", and "phone". The "id" column will be an integer and will be the primary key of the table. The "name" column will be a string with a maximum length of 50 characters, the "email" column will be a string with a maximum length of 100 characters, and the "phone" column will be a string with a maximum length of 20 characters.
After the query is executed, we can add new rows to the "customers" table and get data from it.
INSERT INTO customers (id, name, email, phone)
VALUES (1, 'John Doe', 'johndoe@example.com', '555-555-1234');
SELECT * FROM customers;
This query will add a new customer to the customers table with an ID of 1, a name of John Doe, an email of johndoe@example.com
, and a phone number of 555-555-1234
. The second query will get all the customers from the customers table, including the new customer we just added.
id | name | email | phone
--------------------------------------------
1 | John Doe | johndoe@example.com | 555-555-1234
In this example, we made a new table in a database and added a new row to it using the CREATE statement.
9. INSERT
The INSERT statement is used to add new data to a database table. You should learn how to use it. For example, this is how you would use the INSERT statement in SQL:
We can add a new student to the "students" table with an ID of 1234, a name of "John Doe", a major in "Computer Science", and a GPA of 3.5 by using the INSERT statement.
INSERT INTO students (id, name, major, gpa)
VALUES (1234, 'John Doe', 'Computer Science', 3.5);
This query will add a new student to the "students" table with an ID, name, major, and GPA that are specified in the query.
id | name | major | gpa
-----------------------------------------
1234 | John Doe | Computer Science | 3.5
In this example, we added a new student to the "students" table using the INSERT command.
10. UPDATE
The UPDATE statement is used to change existing data in a database table. It is important to learn how to use UPDATE to change the values of one or more columns in a table. For example, here is an example of using the UPDATE statement in SQL:
We can use the UPDATE statement to change the major and GPA of a student with an ID of 1234 in the "students" table, which has columns for "id", "name", "major", and "gpa".
UPDATE students
SET major = 'Mathematics', gpa = 3.7
WHERE id = 1234;
This query will change the major and GPA of the student with an ID of 1234 in the students table. It will use the WHERE clause to specify which row to update and the SET keyword to specify which columns to update.
id | name | major | gpa
--------------------------------------
1234 | John Doe | Mathematics | 3.7
We changed the major and GPA of one student in the students table using the UPDATE statement.
11. DELETE
The DELETE statement is a way to remove data from a database table. It is important to learn how to use it correctly. For example, this is how you would use the DELETE statement in SQL:
We can use the DELETE statement to remove a student with an ID of 1234 from the "students" table, which has columns for "id", "name", "major", and "gpa".
DELETE FROM students
WHERE id = 1234;
This query will delete the row with an ID of 1234 from the "students" table. The WHERE clause is used to specify which row should be deleted, in this case the row with an ID of 1234. After the query is executed, the row with an ID of 1234 will no longer be in the "students" table.
id | name | major | gpa
--------------------------------------
5678 | Jane Doe | Computer Science | 3.5
In this example, we used the DELETE statement to delete one row from the "students" table.
12. DROP
The DROP statement is a way to delete a table or other object from a database. It is important to learn how to use DROP to get rid of any unnecessary tables or objects in the database. The syntax for the DROP statement varies depending on the type of object being deleted, but some common examples are:
12.1. DROP TABLE:
The DROP TABLE statement is used to delete an existing table, including all its data and indexes. The syntax for this statement is:
DROP TABLE table_name;
12.2. DROP INDEX:
This statement that you can use the DROP INDEX statement to remove an index from a table. The syntax for this statement is:
DROP INDEX index_name ON table_name;
12.3. DROP VIEW:
This statement is used to delete an existing view. The syntax for the DROP VIEW statement is:
DROP VIEW view_name;
12.4. DROP PROCEDURE:
This statement is used to delete an existing stored procedure. The syntax for the DROP PROCEDURE statement is:
DROP PROCEDURE procedure_name;
Be aware that the way you write the DROP statement may be different depending on the type of database you are using. Be very careful when using the DROP statement as it will delete the object you specify and all the data and indexes related to it. Make sure you have a backup of your data before using the DROP statement.
13. ALTER
The ALTER statement is used to change the way a database table or other database object looks. You should learn how to use ALTER to add or remove columns, change the type of data stored in a column, or make other changes to the table. The syntax for the ALTER statement varies depending on the type of object being modified, but some common examples are:
13.1. ALTER TABLE:
The ALTER TABLE statement can be used to change the structure of an existing table, like adding or deleting columns, changing data types, or setting constraints. The syntax for the ALTER TABLE statement is:
ALTER TABLE table_name
ADD column_name data_type [constraint],
MODIFY column_name data_type [constraint],
DROP column_name,
ADD CONSTRAINT constraint_name constraint_definition,
DROP CONSTRAINT constraint_name;
13.2. ALTER INDEX:
The ALTER INDEX statement can be used to change the structure of an existing index, like adding or removing columns or changing the type of index. The syntax for this statement is:
ALTER INDEX index_name
ADD column_name,
DROP column_name;
13.3. ALTER VIEW:
The ALTER VIEW statement can be used to change the definition of an existing view. The syntax for this statement is: ALTER VIEW view_name AS SELECT statement.
ALTER VIEW view_name
AS select_statement;
The way you write the ALTER statement may be different depending on the type of database you are using.
Mình hy vọng bạn thích bài viết này và học thêm được điều gì đó mới.
Donate mình một ly cafe hoặc 1 cây bút bi để mình có thêm động lực cho ra nhiều bài viết hay và chất lượng hơn trong tương lai nhé. À mà nếu bạn có bất kỳ câu hỏi nào thì đừng ngại comment hoặc liên hệ mình qua: Zalo - 0374226770 hoặc Facebook. Mình xin cảm ơn.
Momo: NGUYỄN ANH TUẤN - 0374226770
TPBank: NGUYỄN ANH TUẤN - 0374226770 (hoặc 01681423001)
All rights reserved