Vaidik Sirsate - April 25, 2023
Super Store Sales Analysis
This is a detailed article about my Exploratory Data Analysis of Super Store Sales Dataset available on Kaggle. I will be diving into the full process of data cleaning, analyzing and visualizing the data, along with my final suggestions and summary of the data.
Table of Content :
Information about the data
Questions to be Answered
Data Cleaning
Analyzing Data
Finished Project Dashboard
Insights and Recommendations
Information about the Data:
The dataset contains information of sales transaction of Super Store recorded in a period between January, 2015 and December, 2018. The dataset contains 18 variables describing 9800 observations of the transaction. I am performing Exploratory Data Analysis on this dataset to uncover some interesting insights.
Here there are some questions for analysis:
Which region is driving more sales?
What is the sales revenue generated by each category?
What are the sales figures for each city?
How much each product segment is contributing to the company's revenue?
Who are the most valuable customers?
What are the top 10 cities by sales?
What is the monthly sales trend?
What are the top sub categories by Sales?
Process:
For this project, I am using Microsoft Excel only to perform all the operations including data Cleaning, Analysis, and Visual presentation through Dashboard Creation.
Data Cleaning:
Data Cleaning is a process of ensuring data integrity, which means we have to make sure that data is complete, correct, and consistent for the analysis. This dataset is almost clean but as a standard process, I performed all the checks.
Duplicate Data: It has no duplicate records.
Consistency: In this I ensure appropriate column names, data types of all the fields, and consistent formatting.
Missing Values: Only Postal Code column has a missing value. And, on further inspection I found that data is missing for only one location which is, Vermont, Burlington, United States, So, we can find it in Google and update it.
So, I searched it in Google , and it is 05401 and updated it.
Now, we have to update it in excel but there is one problem, it’s starting with zero and in excel it doesn’t show zero in the beginning of the number of data types. So, for it we have set a custom number format as five times zero to cover the length of the postal code. As 00000. And, here is the final result.
Extra Spaces in text: I ensure no leading and trailing spaces in text data using power query.
Outliers: In this dataset we have only one numeric column "Sales", Which has some outliers starting from rom $10,000/- .
This is the dummy data, we can not cross check the data points. So, Now, we to deal with outlier we have couple of options.
Delete outliers: Delete all the rows having outliers.
Median imputation: It's numerical data and highly skewed so median imputation is more appropriate.
Winsorize the data points: Replacing extreme value with less extreme values as 95 and 5 percentile.
Since the data is dummy and my Excel is hanging again and again so, I am analyzing as it is, not performing any of the above mentioned operations.
Now, data is ready for Analysis phase!
Analyzing Data:
For this purpose I will use pivot table and pivot chart, powerful tools available in MS Excel.
- Which region is driving more sales?
According to the sales data, the West region emerged as the top performer, generating an impressive revenue of roughly $710,000. The East region came in second place with sales of approximately $670,000, while the Central region secured the third position with sales of $493,000.
In contrast, the South region had the lowest sales figures, with revenue of approximately $389,000, which is nearly half of the sales generated by the West region.
2. What is the sales revenue generated by each category?
The revenue distribution across the Furniture, Office Supplies, and Technology categories indicates a balanced contribution to the total revenue, with Furniture accounting for 32%, Office Supplies for 31%, and Technology taking the lead with the highest contribution at 37%.
3. What are the sales figures for each State?
The Sales by State map provides a clear visual representation of the company's extensive coverage across almost all the states in the United States. Notably, California and Texas stand out with darker shades, indicating significant sales figures of around $446,000 and $169,000, respectively.
4. How much each product segment is contributing to the company's revenue?
In terms of revenue generated by the different customer segments, the Consumer Segment takes the lead with an impressive contribution of around $1,148k. The Corporate Segment comes in second with revenues of $688k, while the Home Offices segment trails behind with the lowest revenue generation of approximately $425k.
5. Who are the most valuable customers?
The company's top 10 customers have generated sales figures ranging from $12k to $25k.
6. What are the top 10 cities by Revenue?
The bar plot provides a visual representation of the top 10 cities generating the highest revenue for the company. New York City secures the top position, generating around $252k, while Detroit stands at the 10th position, contributing the revenue of approximately $42k.
7. What is the monthly sales trend?
The time series graph displays the quarterly sales figures for the years 2015 to 2018, depicting a clear trend in sales data across all years. The sales trend follows a pattern of being lowest in the first quarter and gradually increasing to reach its peak in the fourth quarter.
8. What are the top sub categories by Sales?
The top 5 subcategories generating the highest sales revenue for the company are Phone, Chair, Storage, Tables, and Binders. The Phone subcategory takes the lead with approximately $328k in sales, followed closely by the Chair subcategory with $323k. The Storage subcategory secures the third position with sales of around $219k, while Tables and Binders round off the top 5 with sales figures of approximately $203k and $200k respectively.
Dashboard:
The Sales Performance Dashboard presents a comprehensive overview of the company's sales performance through a series of graphs and charts. The dashboard provides visual representations for quick and easy analysis of sales data.
Insights and Recommendations:
Below are general insights and recommendations from the analysis:
1). Allocate more resources to the West and East regions, best performing States and Cities: It might be beneficial to allocate more resources such as marketing budgets, sales personnel, and inventory to these regions to further boost their sales and profitability. Analyzing customer preferences, identifying popular products, and ensuring that the company has sufficient inventory to meet demand and maintain a good relationship with customers and this can help can help identify common trends or preferences that could be replicated in other locations.
And, analyze the reasons for low sales in the South region could help identify factors such as customer preferences, competition, and market saturation that might be affecting the region's low sales figures.
2). Increase marketing efforts for top-selling subcategories: Since the Phone and Chair subcategories generate the highest sales revenue, the company could increase its marketing efforts for these subcategories to capture even more market share. This could involve targeted advertising campaigns or offering promotions to incentivize customers to purchase these products. Consider expanding product offerings for top-selling subcategories: If the Phone and Chair subcategories continue to be top-sellers, the company could consider expanding its product offerings for these categories to meet customer demand. This could involve introducing new product lines or variations of existing products.
Identify reasons for lower sales in other subcategories: This could involve analyzing customer feedback or market research to identify areas for improvement or new product opportunities.
3). Focus on maximizing revenue from the Consumer Segment: The Consumer Segment generates the highest revenue, indicating that there might be a higher demand for products in this segment. Therefore, it might be beneficial to focus on maximizing revenue from this segment by analyzing customer preferences, market trends, and sales data to identify the most popular products within this segment.
Identify ways to increase revenue from the Corporate and Home Offices Segment: there may be opportunities for improvement such as sales strategies and pricing, Analyzing customer needs and preferences within this segment and identifying gaps in the product offerings can help improve revenue generation from this segment can help identify ways to increase revenue from this segment.
4). Leveraging Seasonal Trend: The trend of sales being lowest in the first quarter and gradually increasing to reach its peak in the fourth quarter indicates that the company could adjust its marketing and sales strategies accordingly. For example, the company could plan marketing campaigns and promotions to drive sales during the first quarter when sales are typically low, and focus on maintaining inventory levels during the fourth quarter when sales are typically high.
The company could introduce new products or variations of existing products that are popular during the peak season to capture the increased demand. Historical sales data can be used to forecast future sales and inform decisions around inventory management, production schedules, and staffing levels. By analyzing sales trends and forecasting future demand, the company can optimize its operations and minimize the risk of stockouts or overstocking.
5). Develop strategies to retain top customers: It might be beneficial to develop strategies to retain these customers, such as providing personalized offers, loyalty programs, or excellent customer service. Identify factors contributing to high sales from top customers such as customer characteristics, purchase behavior's, the company could use this insight to tailor marketing and sales strategies to get similar potential customers.
Thank you!