Final Project 1: Exercise SQL — Analyzing Tokopaedi Data

Ramadhian Ekaputra
11 min read3 days ago

--

What is SQL?

Structured Query Language (SQL) is a programming language for storing and processing information in a relational database. A relational database stores information in the form of tables, with rows and columns representing the different data attributes and the various relationships between data values. You can use SQL statements to store, update, delete, search, and retrieve information from databases. You can also use SQL to maintain and optimize database performance.

Why SQL?

Structured Query Language (SQL) is a popular query language that is often used in all types of applications. Data analysts and developers learn and use SQL because it integrates well with various programming languages. For example, they can embed SQL queries with the Java programming language to build high-performance data processing applications with major SQL database systems, such as PostgreSQL or MS SQL Server. SQL is also quite easy to learn because it uses common English keywords in its statements.

Exercise SQL:

The following dataset will be used:

Dataset Details for Query

The data used is data from Tokopedia (not real data). The explanation of the dataset is as follows:

order_detail:

  1. id → unique number of order / order_id
  2. customer_id → unique number of the customer
  3. order_date → date when the transaction was made
  4. sku_id → unique number of the product (sku is stock keeping unit)
  5. price → the price indicated in the price tagging
  6. qty_ordered → the quantity of the item purchased by the customer
  7. before_discount → the total price value of the product (price * qty_ordered)
  8. discount_amount → total product discount value
  9. after_discount → the total price value of the product when it has been reduced by the discount
  10. is_gross → indicates the customer has not yet paid for the order
  11. is_valid → indicates the customer has made payment
  12. is_net → indicates the transaction has been completed
  13. payment_id → unique number of the payment method

sku_detail:

  1. id → unique number of the product (can be used as key during join)
  2. sku_name → name of the product
  3. base_price → the price of the item listed in the price tagging
  4. cogs → cost of goods sold / total cost to sell 1 product
  5. category → product category

customer_detail:

  1. id → unique number of the customer
  2. registered_date → the date the customer started registering as a member

Payment_detail:

  1. id → unique number of the payment method
  2. payment_method → the payment method used

Questions:

Number 1:

Q: Selama transaksi yang terjadi selama 2021, pada bulan apa total nilai transaksi
(after_discount) paling besar? Gunakan is_valid = 1 untuk memfilter data transaksi.
Source table: order_detail

Nomor 2
Q: Selama transaksi pada tahun 2022, kategori apa yang menghasilkan nilai transaksi paling besar? Gunakan is_valid = 1 untuk memfilter data transaksi.
Source table: order_detail, sku_detail

Nomor 3
Q: Bandingkan nilai transaksi dari masing-masing kategori pada tahun 2021 dengan 2022. Sebutkan kategori apa saja yang mengalami peningkatan dan kategori apa yang mengalami penurunan nilai transaksi dari tahun 2021 ke 2022. Gunakan is_valid = 1 untuk memfilter data transaksi.
Source table: order_detail, sku_detail

Nomor 4
Q: Tampilkan top 5 metode pembayaran yang paling populer digunakan selama 2022
(berdasarkan total unique order). Gunakan is_valid = 1 untuk memfilter data transaksi.
Source table: order_detail, payment_method

Nomor 5
Q: Urutkan dari ke-5 produk ini berdasarkan nilai transaksinya.
1. Samsung
2. Apple
3. Sony
4. Huawei
5. Lenovo
Gunakan is_valid = 1 untuk memfilter data transaksi.
Source table: order_detail, sku_detail

Import The Table

  1. Download the dataset from this link: https://drive.google.com/drive/folders/1ElqO0TR7vYs9X2GwvZX8YFWYBoRcBvpY
  2. The dataset are: customer_detail.txt, order_detail.txt, payment_detail.txt, and sku_detail.txt
  3. Go To https://sqliteonline.com/
Click the Import button
Click Open File
Browse all the 4 items. Click open.
On the “Custom name” choose ‘First line’, Click Ok

1CASE 1: During the transactions that occurred during 2021, in which month was the total transaction value (after_discount) the largest?

  1. Identify the Table and Columns
  2. Formulate the query
  3. The data indicates that August 2021 recorded the highest total transaction value at $227,862,744
  4. This peak suggests a potential seasonal or promotional impact, possibly linked to marketing campaigns, sales events, or increased seasonal demand
--- Question 1 ----
--- Q: During the transactions that occurred in 2021, in which month was the total transaction value (after_discount) the largest? Use is_valid = 1 to filter transaction data. ----

SELECT TO_CHAR(order_date, 'MM') AS month, SUM(after_discount) AS total_value
FROM order_detail
WHERE is_valid = 1 AND TO_CHAR(order_date, 'YYYY') = '2021'
GROUP BY month
ORDER BY total_value DESC
LIMIT 1;

2CASE 2: During transactions in 2022, which category generated the most transaction value?

  1. Identify the Table and Columns
  2. Formulate the query
  3. The category “Mobile & Tablets” generated the highest transaction value in 2022.
  4. High consumer demand for mobile devices and tablets as essential tools for work, education, and entertainment will affect the transaction in 2022.
--- Question 2 ---
--- Q: During transactions in 2022, which category generated the highest transaction value? Use is_valid = 1 to filter transaction data. ---

SELECT sku_detail.category, SUM(order_detail.after_discount) AS total_value
FROM order_detail
JOIN sku_detail ON order_detail.sku_id = sku_detail.id
WHERE order_detail.is_valid = 1 AND TO_CHAR(order_detail.order_date, 'YYYY') = '2022'
GROUP BY sku_detail.category
ORDER BY total_value DESC
LIMIT 1;

3CASE 3: Compare the transaction value of each category in 2021 with 2022. State which categories have increased, and which categories have decreased in transaction value from 2021 to 2022.

  1. Identify the Table and Columns
  2. Formulate the query

Significant Growth in Electronics:

The Electronics category demonstrated the highest increase in transaction value from 2021 to 2022, indicating a strong market demand for electronic products. This could be attributed to technological advancements, increased remote work, or enhanced marketing strategies targeting tech-savvy consumers.

Stable Performance in Beauty Products:

The Beauty category showed steady performance with minimal change in transaction value between 2021 and 2022. This suggests a consistent consumer interest and stable market conditions in this category, reflecting ongoing demand for home improvement and kitchen products.

Decline in Books Products:

The Books category experienced a significant decrease in transaction value from 2021 to 2022. This decline may be due to shifting consumer preferences, economic factors, or increased competition. It highlights a need for strategic adjustments in marketing or product offerings to regain market share in this segment.

-- Question 3 ---
--- Q: Compare the transaction values of each category in 2021 with 2022. Mention which categories experienced an increase and which experienced a decrease in transaction value from 2021 to 2022. Use is_valid = 1 to filter transaction data. ---

WITH sales_2021 AS (
SELECT sku_detail.category, SUM(order_detail.after_discount) AS total_value_2021
FROM order_detail
JOIN sku_detail ON order_detail.sku_id = sku_detail.id
WHERE order_detail.is_valid = 1 AND EXTRACT(YEAR FROM order_detail.order_date) = 2021
GROUP BY sku_detail.category
),
sales_2022 AS (
SELECT sku_detail.category, SUM(order_detail.after_discount) AS total_value_2022
FROM order_detail
JOIN sku_detail ON order_detail.sku_id = sku_detail.id
WHERE order_detail.is_valid = 1 AND EXTRACT(YEAR FROM order_detail.order_date) = 2022
GROUP BY sku_detail.category
),
percentage_difference AS (
SELECT
COALESCE(sales_2021.category, sales_2022.category) AS category,
COALESCE(sales_2021.total_value_2021, 0) AS total_value_2021,
COALESCE(sales_2022.total_value_2022, 0) AS total_value_2022,
CASE
WHEN COALESCE(sales_2021.total_value_2021, 0) = 0 THEN NULL
ELSE ((COALESCE(sales_2022.total_value_2022, 0) - COALESCE(sales_2021.total_value_2021, 0)) / COALESCE(sales_2021.total_value_2021, 0)) * 100
END AS percentage_change,
CASE
WHEN (COALESCE(sales_2022.total_value_2022, 0) - COALESCE(sales_2021.total_value_2021, 0)) > 0 THEN 'Increase'
WHEN (COALESCE(sales_2022.total_value_2022, 0) - COALESCE(sales_2021.total_value_2021, 0)) < 0 THEN 'Decrease'
ELSE 'No Change'
END AS change_trend
FROM sales_2021
FULL OUTER JOIN sales_2022 ON sales_2021.category = sales_2022.category
)
SELECT
category,
total_value_2021,
total_value_2022,
percentage_change,
change_trend
FROM percentage_difference
ORDER BY percentage_change DESC;

Top Performers

Top Performers (Percentage Change > 50%)

  • Mobile & Tablets
  • Men Fashion
  • Entertainment
--- Top Performers (Percentage Change > 50%) sql ---

WITH sales_2021 AS (
SELECT sku_detail.category, SUM(order_detail.after_discount) AS total_value_2021
FROM order_detail
JOIN sku_detail ON order_detail.sku_id = sku_detail.id
WHERE order_detail.is_valid = 1 AND EXTRACT(YEAR FROM order_detail.order_date) = 2021
GROUP BY sku_detail.category
),
sales_2022 AS (
SELECT sku_detail.category, SUM(order_detail.after_discount) AS total_value_2022
FROM order_detail
JOIN sku_detail ON order_detail.sku_id = sku_detail.id
WHERE order_detail.is_valid = 1 AND EXTRACT(YEAR FROM order_detail.order_date) = 2022
GROUP BY sku_detail.category
),
percentage_difference AS (
SELECT
COALESCE(sales_2021.category, sales_2022.category) AS category,
COALESCE(sales_2021.total_value_2021, 0) AS total_value_2021,
COALESCE(sales_2022.total_value_2022, 0) AS total_value_2022,
CASE
WHEN COALESCE(sales_2021.total_value_2021, 0) = 0 THEN NULL
ELSE ((COALESCE(sales_2022.total_value_2022, 0) - COALESCE(sales_2021.total_value_2021, 0)) / COALESCE(sales_2021.total_value_2021, 0)) * 100
END AS percentage_change,
CASE
WHEN (COALESCE(sales_2022.total_value_2022, 0) - COALESCE(sales_2021.total_value_2021, 0)) > 0 THEN 'Increase'
WHEN (COALESCE(sales_2022.total_value_2022, 0) - COALESCE(sales_2021.total_value_2021, 0)) < 0 THEN 'Decrease'
ELSE 'No Change'
END AS change_trend
FROM sales_2021
FULL OUTER JOIN sales_2022 ON sales_2021.category = sales_2022.category
)
SELECT
category,
total_value_2021,
total_value_2022,
percentage_change,
change_trend
FROM percentage_difference
WHERE percentage_change > 50
ORDER BY percentage_change DESC
LIMIT 3;

Steady Performance

Steady Performers (Percentage Change Between -10% and 10%)

  • Kids & Baby
  • Beauty & Grooming
--- Steady Performers (Percentage Change Between -10% and 10%) ---

WITH sales_2021 AS (
SELECT sku_detail.category, SUM(order_detail.after_discount) AS total_value_2021
FROM order_detail
JOIN sku_detail ON order_detail.sku_id = sku_detail.id
WHERE order_detail.is_valid = 1 AND EXTRACT(YEAR FROM order_detail.order_date) = 2021
GROUP BY sku_detail.category
),
sales_2022 AS (
SELECT sku_detail.category, SUM(order_detail.after_discount) AS total_value_2022
FROM order_detail
JOIN sku_detail ON order_detail.sku_id = sku_detail.id
WHERE order_detail.is_valid = 1 AND EXTRACT(YEAR FROM order_detail.order_date) = 2022
GROUP BY sku_detail.category
),
percentage_difference AS (
SELECT
COALESCE(sales_2021.category, sales_2022.category) AS category,
COALESCE(sales_2021.total_value_2021, 0) AS total_value_2021,
COALESCE(sales_2022.total_value_2022, 0) AS total_value_2022,
CASE
WHEN COALESCE(sales_2021.total_value_2021, 0) = 0 THEN NULL
ELSE ((COALESCE(sales_2022.total_value_2022, 0) - COALESCE(sales_2021.total_value_2021, 0)) / COALESCE(sales_2021.total_value_2021, 0)) * 100
END AS percentage_change,
CASE
WHEN (COALESCE(sales_2022.total_value_2022, 0) - COALESCE(sales_2021.total_value_2021, 0)) > 0 THEN 'Increase'
WHEN (COALESCE(sales_2022.total_value_2022, 0) - COALESCE(sales_2021.total_value_2021, 0)) < 0 THEN 'Decrease'
ELSE 'No Change'
END AS change_trend
FROM sales_2021
FULL OUTER JOIN sales_2022 ON sales_2021.category = sales_2022.category
)
SELECT
category,
total_value_2021,
total_value_2022,
percentage_change,
change_trend
FROM percentage_difference
WHERE percentage_change BETWEEN -10 AND 10
ORDER BY percentage_change DESC
LIMIT 3;

Declining Categories

Declining Categories (Percentage Change < -30%)

  • Others
  • Books
--- Declining Categories (Percentage Change < -50%) ---

WITH sales_2021 AS (
SELECT sku_detail.category, SUM(order_detail.after_discount) AS total_value_2021
FROM order_detail
JOIN sku_detail ON order_detail.sku_id = sku_detail.id
WHERE order_detail.is_valid = 1 AND EXTRACT(YEAR FROM order_detail.order_date) = 2021
GROUP BY sku_detail.category
),
sales_2022 AS (
SELECT sku_detail.category, SUM(order_detail.after_discount) AS total_value_2022
FROM order_detail
JOIN sku_detail ON order_detail.sku_id = sku_detail.id
WHERE order_detail.is_valid = 1 AND EXTRACT(YEAR FROM order_detail.order_date) = 2022
GROUP BY sku_detail.category
),
percentage_difference AS (
SELECT
COALESCE(sales_2021.category, sales_2022.category) AS category,
COALESCE(sales_2021.total_value_2021, 0) AS total_value_2021,
COALESCE(sales_2022.total_value_2022, 0) AS total_value_2022,
CASE
WHEN COALESCE(sales_2021.total_value_2021, 0) = 0 THEN NULL
ELSE ((COALESCE(sales_2022.total_value_2022, 0) - COALESCE(sales_2021.total_value_2021, 0)) / COALESCE(sales_2021.total_value_2021, 0)) * 100
END AS percentage_change,
CASE
WHEN (COALESCE(sales_2022.total_value_2022, 0) - COALESCE(sales_2021.total_value_2021, 0)) > 0 THEN 'Increase'
WHEN (COALESCE(sales_2022.total_value_2022, 0) - COALESCE(sales_2021.total_value_2021, 0)) < 0 THEN 'Decrease'
ELSE 'No Change'
END AS change_trend
FROM sales_2021
FULL OUTER JOIN sales_2022 ON sales_2021.category = sales_2022.category
)
SELECT
category,
total_value_2021,
total_value_2022,
percentage_change,
change_trend
FROM percentage_difference
WHERE percentage_change < -30
ORDER BY percentage_change ASC
LIMIT 3;

Moderate Growth

Moderate Growth (Percentage Change Between 10% and 50%)

  • Appliances
  • Computing
  • Soghaat
--- Moderate Growth (Percentage Change Between 10% and 50%) ---

WITH sales_2021 AS (
SELECT sku_detail.category, SUM(order_detail.after_discount) AS total_value_2021
FROM order_detail
JOIN sku_detail ON order_detail.sku_id = sku_detail.id
WHERE order_detail.is_valid = 1 AND EXTRACT(YEAR FROM order_detail.order_date) = 2021
GROUP BY sku_detail.category
),
sales_2022 AS (
SELECT sku_detail.category, SUM(order_detail.after_discount) AS total_value_2022
FROM order_detail
JOIN sku_detail ON order_detail.sku_id = sku_detail.id
WHERE order_detail.is_valid = 1 AND EXTRACT(YEAR FROM order_detail.order_date) = 2022
GROUP BY sku_detail.category
),
percentage_difference AS (
SELECT
COALESCE(sales_2021.category, sales_2022.category) AS category,
COALESCE(sales_2021.total_value_2021, 0) AS total_value_2021,
COALESCE(sales_2022.total_value_2022, 0) AS total_value_2022,
CASE
WHEN COALESCE(sales_2021.total_value_2021, 0) = 0 THEN NULL
ELSE ((COALESCE(sales_2022.total_value_2022, 0) - COALESCE(sales_2021.total_value_2021, 0)) / COALESCE(sales_2021.total_value_2021, 0)) * 100
END AS percentage_change,
CASE
WHEN (COALESCE(sales_2022.total_value_2022, 0) - COALESCE(sales_2021.total_value_2021, 0)) > 0 THEN 'Increase'
WHEN (COALESCE(sales_2022.total_value_2022, 0) - COALESCE(sales_2021.total_value_2021, 0)) < 0 THEN 'Decrease'
ELSE 'No Change'
END AS change_trend
FROM sales_2021
FULL OUTER JOIN sales_2022 ON sales_2021.category = sales_2022.category
)
SELECT
category,
total_value_2021,
total_value_2022,
percentage_change,
change_trend
FROM percentage_difference
WHERE percentage_change BETWEEN 10 AND 50
ORDER BY percentage_change DESC
LIMIT 3;

4 CASE 4: Show the top 5 most popular payment methods used during 2022 (based on total unique orders).

  1. Identify the Table and Columns
  2. Formulate the query
  3. Cash on Delivery was the most popular payment method in 2022, with a significant lead over all other methods.
  4. This suggests a strong preference among customers for paying in cash upon receiving their orders.
  5. The significant disparity between the first and subsequent payment methods suggests a customer base that overwhelmingly prefers to pay upon delivery rather than using digital payment or credit-based payment systems
--- Question 4 ---
--- Q: Display the top 5 most popular payment methods used during 2022 (based on total unique orders). Use is_valid = 1 to filter transaction data. ---

SELECT payment_detail.payment_method, COUNT(DISTINCT order_detail.id) AS total_orders
FROM order_detail
JOIN payment_detail ON order_detail.payment_id = payment_detail.id
WHERE order_detail.is_valid = 1 AND TO_CHAR(order_detail.order_date, 'YYYY') = '2022'
GROUP BY payment_detail.payment_method
ORDER BY total_orders DESC
LIMIT 5;

5 CASE 5: Sort these 5 products by their transaction value. Samsung Apple Sony Huawei Lenovo

  1. Identify the Table and Columns
  2. Formulate the query
  3. The transaction values highlight the market performance and sales contribution of each brand within the Mobile & Tablets sector for the given year.
  4. Samsung emerges with the highest transaction value, indicating strong consumer demand for higher-priced products relative to Apple and Huawei in this category.
  5. The higher transaction values for Samsung and Apple suggest a potentially stronger consumer preference or market presence compared to Huawei in the Mobile & Tablets segment during 2022.
--- Question 5 ---
--- Q: Rank these 5 products based on their transaction values.

--- Samsung ---
--- Apple ---
--- Sony ---
--- Huawei ---
--- Lenovo ---

WITH a AS (
SELECT
CASE
WHEN LOWER (sd.sku_name) LIKE '%samsung%' THEN 'Samsung'
WHEN LOWER (sd.sku_name) LIKE '%iphone%' OR LOWER (sd.sku_name) LIKE '%ipad%'
OR LOWER (sd.sku_name) LIKE '%macbook%' OR LOWER (sd.sku_name) LIKE '%apple%'
THEN 'Apple'
WHEN LOWER (sd.sku_name) LIKE '%sony%' THEN 'Sony'
WHEN LOWER (sd.sku_name) LIKE '%huawei%' THEN 'Huawei'
WHEN LOWER (sd.sku_name) LIKE '%lenovo%' THEN 'Lenovo'
END AS product_name,
ROUND(SUM(od.after_discount)) AS total_sales
FROM order_detail AS od
LEFT JOIN sku_detail AS sd
ON od.sku_id = sd.id
WHERE
is_valid = 1
GROUP BY
1
ORDER BY 2 DESC
)
SELECT
a.*
FROM a
WHERE product_name notnull

--

--