Investigating Model Vehicle Sales Data

Executive Summary

Our client is a wholesaler that sells scale-model vehicles to businesses. Using both SQL and Python techniques, I investigated several aspects of the company's operations. Important findings include:


I used these findings to develop several recommendations for the company moving forward:


Introduction

In this SQL-based (SQLite) project, I investigate data provided by a wholesaler that purchases scale-model vehicles (motorcycles, airplanes, vehicles, and trains) from manufacturers and sells them to retailers.

There are about 20 major companies operating within this market, including Mattel Inc., Amalgam Collection, Hamleys of London Ltd., and Welly Die Casting Factory Ltd. Together, these companies and others represent a scale-model vehicle market estimated at USD 4.2B in 2023 by Technavio. The consumer market is primarily in the US, EU, and Japan, and is expected to grow by about 6% each year through 2028.


Exploring the Database

The database consists of 8 tables, as shown in figure 1. These tables are of varying size (table 1), with the largest table containing the order details (5 attributes, 2,996 rows) and the smallest table containing information about the product lines (4 attributes, 7 rows).

Figure 1: Database schema

Table 1: Database table shapes

Individual Table Descriptions

Each table serves the general purpose described below:


Data Analysis

Identifying Low-Stock Items

One possible way to identify items that are at risk of running out of stock is to compare the quantity ordered (regardless of delivery status) with the quantity on-hand. To accomplish this with the tables we are given, we need to bring the stock information (in the products table) together with the line-item information (in the orderdetails table). Since we are interested in the individual products, this information can be grouped by product.

The basic query we can use to accomplish this task is:

SELECT p.productcode,

       p.productName,

       quantityInStock AS num_stocked,

       SUM(quantityordered) AS num_ordered,

       ROUND((SUM(quantityordered) * 1.0) / quantityinstock, 2) AS stock_fraction,

       buyprice AS buy_price,

       priceeach AS sell_price,

       priceeach * SUM(quantityordered) AS revenue,

       (priceeach - buyprice) * SUM(quantityordered) AS profit,

       ROUND((priceeach - buyprice)/priceeach, 2) AS profit_margin

  FROM products AS p

 INNER JOIN orderdetails AS od

    ON p.productcode = od.productcode

 GROUP BY p.productcode


Using this base code, we can add different filters to find important information on the data. For example, if we order by stock_fraction ascending, it is clear that we are extremely overstocked in 1995 Honda Civics and 2002 Chevy Corvettes. These are items that we do not need to order anytime soon. Several other important observations follow.

Back-Orders

If we filter for stock_fractions greater than 1.0, we pull out all of the products that are on back order. The number of items on backorder can be added to the query by adding the following code to the SELECT statement:

CASE

    WHEN SUM(quantityordered) - quantityinstock <= 0

    THEN 0

    ELSE SUM(quantityordered) - quantityinstock

END AS num_backordered


There are 11 products currently on backorder (table 2):

Table 2: Products currently on backorder

With this information about backordered items, we can identify the following:

High-margin, Low-stock Products

The most important items to keep in stock are those that are both in demand and highly profitable. These items can be identified by filtering for high stock_fraction and high profit_margin. Filtering with 0.6 and 0.5, respecively, we find two very profitable items that are out of stock (table 3).

Table 3: High-margin, low-stock products

Customer Segmentation

Marketing efforts are most effective when they are tailored to a subgroup of the overall customer base. In this case, I perform customer segmentation to identify VIP customers and unengaged customers. By separating these out, the company can plan marketing strategies to best reach each of these groups individually.

In evaluating the customer base and their relative contributions to revenue, I will connect customer and sales data together by joining the products, orderdetails, and orders tables. The basic query will take this form, with additional filters added to perform specific tasks:

SELECT o.customernumber,

       COUNT(*) AS num_orders,

       ROUND(SUM(od.quantityordered * od.priceeach), 2) AS total_revenue,

       ROUND(SUM(od.quantityordered * (od.priceeach - p.buyprice)), 2) AS total_profit

  FROM orders AS o

 INNER JOIN orderdetails AS od

    ON o.ordernumber = od.ordernumber

 INNER JOIN products AS p

    ON p.productcode = od.productcode

 GROUP BY o.customernumber

Identifying VIP Customers

Using the total_revenue and total_profit data, we can easily pull out VIP customers. The distribution of the calculated total_profit and total_revenue show that there is a clear cutoff for VIP customers (figure 3) at around $100,000 of total profit. This exclusive group includes only two customers:

Figure 3: Distribution of customers by total profit

Stratifying Remaining Customers

Removing the two upper outliers from the analysis, we can see additional details within the larger group of customers (figure 4). Specifically, I chose to separate the remaining customers into three groups:

These stratification thresholds were chosen based on the visual structure of figure 4 (a possible improvement to this technique would be to assign customer strata by percentile cutoffs)


Figure 4: VIP-free distribution of customers by total profit

With these groups in mind (VIP, high, middle, low), each of these groups can be addressed in different ways. The VIP's should be catered to individually, perhaps with special events. The low buyers can be targeted with promotions that encouraging bulk purchasing. The middle buyers can be receive typical promotions, while the high buyers should receive preferential attention from associates.

Growth Evaluation

Revenue Growth

All of the orders placed have associated dates, so we can stratify the numbers by date and get an idea of the company's growth over the covered time period. In this case, monthly stratification was most effective, and that was accomplished by parsing the string dates (in YYYY-MM-DD) format using the following code which reformatted the string into a single integer:

CAST(SUBSTR(orderdate, 1, 4) AS INT) * 100 

+ CAST(SUBSTR(orderdate, 6, 2) AS INT) AS year_month


The data was then grouped by year_month and plotted to reveal year-on-year trends (figure 5). This reveals the following trends:


Figure 5: Revenue trends

The high revenue in October and November can likely be attributed to the holiday shopping season. Since there is not a year-on-year increase in holiday spending, this is an area that the company could use marketing techniques to improve.

Customer Growth

In addition to revenue growth, customer growth is another important metric for company growth. To do this, I created a table of customer numbers along with their year and month from the orders table. I took this table and processed it using Pandas to find the growth in unique customer numbers during the observed time (figure 6)

Figure 6: Customer growth trends

Importance of Vehicle's Country of Origin

In the world of motor vehicles, different countries have different stereotypes. German cars are well-designed and high performance. Italian cars are luxurious. Japanese cars are reliable. To evaluate if there was any correlation between the country of origin of a car and the corresponding model's performance at the company. To perform this analysis, I first created a view that contained added researched information about the different car models. This view added information to the dataset about the vehicle's make and country of origin. The entire query can be found at ./code/views/view-make_country.sql, and example code showing three makes and three countries of origin is shown below:

DROP VIEW IF EXISTS make_and_country;


CREATE VIEW make_and_country

AS

SELECT CASE

            WHEN productName LIKE '%Harley%' THEN 'Harley-Davidson'

            WHEN productName LIKE '%Renault%' THEN 'Renault'

            WHEN productName LIKE '%Guzzi%' THEN 'Moto Guzzi'

            ...

            ELSE NULL

       END as make,

       CASE

            WHEN productName LIKE '%Harley%' THEN 'United States'

            WHEN productName LIKE '%Renault%' THEN 'France'

            WHEN productName LIKE '%Guzzi%' THEN 'Italy'

            ...

            ELSE NULL

       END as origin,

       productcode,

       productname

  FROM products


Using this view as a basis for further analysis, I identified that the company carried mostly American vehicles (46 different models), Italian cars have the highest average quantity on hand, and French models have the largest profit margin. In addition to this, I identified that French car models have the highest profit margin, while models of United Kingdom vehicles have the smallest (table a).

Table a: Stock and profit-margin information by country or origin

Slow-moving Products

If we instead sort the table by low stock_fractions, we find many products where less than 10% of our inventory has been sold (table b):

Table b: Slow-moving products