📖 Problem Statement

The accounting department of a company that sells motorcycle parts operates three warehouses in a large metropolitan area. The company want’s their sales data analysed and want to capture sales by payment method. They also want to know the average unit price for each product line.

Objectives

  1. What are the total sales for each payment method?
  2. What is the average unit price for each product line?
  3. Create plots to visualize findings for questions 1 and 2.
  4. Investigate further (e.g., average purchase value by client type, total purchase value by product line, etc.)
  5. Summarize the findings.

💾 Data Summary

#load libraries
library(tidyverse)
library(lubridate)

#load data
df <- read_csv("C:/Users/Adejumo/Downloads/sales_data.csv")

head(df)
## # A tibble: 6 x 8
##   date       warehouse client_type product_line        quantity unit_price total
##   <date>     <chr>     <chr>       <chr>                  <dbl>      <dbl> <dbl>
## 1 2021-06-01 Central   Retail      Miscellaneous              8       16.8  135.
## 2 2021-06-01 North     Retail      Breaking system            9       19.3  174.
## 3 2021-06-01 North     Retail      Suspension & tract~        8       32.9  263.
## 4 2021-06-01 North     Wholesale   Frame & body              16       37.8  605.
## 5 2021-06-01 Central   Retail      Engine                     2       60.5  121.
## 6 2021-06-01 North     Wholesale   Suspension & tract~       40       37.4 1495.
## # ... with 1 more variable: payment <chr>
skimr::skim(df)
Table 1: Data summary
Name df
Number of rows 1000
Number of columns 8
_______________________
Column type frequency:
character 4
Date 1
numeric 3
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
warehouse 0 1 4 7 0 3 0
client_type 0 1 6 9 0 2 0
product_line 0 1 6 21 0 6 0
payment 0 1 4 11 0 3 0

Variable type: Date

skim_variable n_missing complete_rate min max median n_unique
date 0 1 2021-06-01 2021-08-28 2021-07-14 89

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
quantity 0 1 9.39 9.66 1.00 4.00 6.50 10.00 40.00 ▇▂▁▁▁
unit_price 0 1 30.32 12.26 10.03 21.09 28.57 37.92 66.62 ▆▇▅▂▁
total 0 1 289.11 345.23 10.35 93.69 178.36 321.69 2546.33 ▇▁▁▁▁

The sales data has the following 1000 items and 8 fields:
- “date” - The date, from June to August 2021.
- “warehouse” - The company operates three warehouses: North, Central, and West.
- “client_type” - There are two types of customers: Retail and Wholesale.
- “product_line” - Type of products purchased.
- “quantity” - How many items were purchased.
- “unit_price” - Price per item sold.
- “total” - Total sale = quantity * unit_price.
- “payment” - How the client paid: Cash, Credit card, Transfer.
There are no missing values in the data.

Exploratory Data Analysis

Total sale for each payment method

df %>% 
  group_by(payment) %>% 
  summarize(total = sum(total)) 
## # A tibble: 3 x 2
##   payment       total
##   <chr>         <dbl>
## 1 Cash         19199.
## 2 Credit card 110272.
## 3 Transfer    159642.

Let’s visualize the results and see the percentage of total sales in each payment method.

df %>% 
  group_by(payment) %>% 
  summarize(total = sum(total)) %>% 
  mutate(perc_sales = (total/sum(total))*100) %>% 
  ggplot(aes(x = reorder(payment, perc_sales), y = perc_sales, fill = payment)) +
  geom_col() +
  geom_text(aes(label = scales::comma(perc_sales))) +
  xlab("Payment Method") +
  ylab("Percentage of total Sales")

55% of total sales were paid through bank transfer, 38% through credit card, just 7% were paid through cash. Most customers prefer to use bank transfer and credit cards for payment.

Average unit price for each product line

The average unit price per product line is the calculated by dividing the total number of sales for each product line by the total number of units sold.

df %>% 
  group_by(product_line) %>% 
  summarize(avg_unit_price = mean(unit_price)) %>% 
  arrange(desc(avg_unit_price)) %>% 
  ggplot(aes(x = reorder(product_line, avg_unit_price), 
             y = avg_unit_price, fill = product_line)) +
  geom_col() +
  geom_text(aes(label = scales::comma(avg_unit_price))) +
  coord_flip() +
  xlab("Product Line") +
  ylab("Average Unit Price")

Products under the Engine product line are more expensive with an average unit price of over 60 dollars, Breaking system having the least expensive products with an average unit price of 17.7 dollars.

Average purchase value by client type

The average purchase value(APV) is giving by the total number of sales divided by the number of orders in each client type.

df %>% 
  group_by(client_type) %>% 
  summarize(avg_pur_value = sum(total)/sum(quantity)) 
## # A tibble: 2 x 2
##   client_type avg_pur_value
##   <chr>               <dbl>
## 1 Retail               30.7
## 2 Wholesale            30.8

Clients who purchase on wholesale have the highest average purchase value of 30.82 dollars while clients in retail have a purchase value of 30.72 dollars. Seems both clients have similar demands but the demand of clients on wholesale is more high.

Total purchase value by product line.

This is the total sales made in each product line

df %>% 
  group_by(product_line) %>% 
  summarize(total_sales = sum(total)) %>% 
  mutate(perc_sales = total_sales/sum(total_sales)*100) %>% 
  ggplot(aes(x = reorder(product_line, perc_sales), 
             y = perc_sales, fill = product_line)) +
  geom_col() + 
  coord_flip() +
  geom_text(aes(label = scales::comma(perc_sales))) +
  xlab("Product Line") +
  ylab("Percentage of total sales")

Most sales were made from products in Suspension and traction which constitute 25.25% of total sales. Products in Engine product line and Miscellaneous have the lowest amount of sales with 13.12% and 9.4% respectively.

Month with the highest total sales

df %>%
  group_by(month = month(date, label = T)) %>% 
  summarize(total = sum(total)) %>% 
  arrange(desc(total))
## # A tibble: 3 x 2
##   month   total
##   <ord>   <dbl>
## 1 Aug   100245.
## 2 Jun    95320.
## 3 Jul    93548.

Though sales dropped in the month of July, it sky rocketed in the month of August.

Warehouse with the highest total sales

df %>% 
  group_by(warehouse) %>% 
  summarize(total_sales = sum(total)) %>% 
  mutate(perc_sales = (total_sales/sum(total_sales))*100) %>% 
  ggplot(aes(x = reorder(warehouse, perc_sales),
             y = perc_sales, fill = warehouse)) +
  geom_col() +
  xlab("Warehouse") +
  ylab("Percentage of total sales") +
  geom_text(aes(label = scales::comma(perc_sales)))

Warehouse in Central of the metropolitan area have more sales than warehouses in other regions with 49 % of total sales made, which is almost close to half of the total sales. The warehouse in the western region have the lowest amount of sales with 16% of total sales.

Conclusion

The company compared to previous months have made a lot of sales with majority from the warehouse in the central region, this is a good sign indicating increase in sales. Most clients prefer to pay through bank transfer or credit card and this is due to the fact that most of them are wholesale and retail and they purchase in bulk which involves huge amount of money, it is more secure than carrying cash. As expected products in Engine are much more expensive than other parts of a motorcycle. Products under suspension and traction appeared to be in demand than other products line.
From this report, the company is doing well in terms of sales but will need to improve sales in the western region, low sales in that area might be due to low demand compared to other regions or high competition which will be advisable to look into.

If you find the insights in this analysis interesting, please upvote.