This project examines the impact of marketing investments, particularly influencer marketing, on sales performance using machine learning and statistical modeling in Python. The analysis explores relationships between TV, radio, and social media advertising, influencer tiers, and sales revenue. The goal is to provide data-driven recommendations for optimizing marketing budgets. The project leverages Python libraries such as pandas, NumPy, statsmodels, scikit-learn, and Seaborn for data analysis, modeling, and visualization.
The Influencer Impact and Marketing Sales Modeling project aims to:
- Analyze Marketing Investments: Evaluate how TV, radio, and social media spending affect sales performance.
- Quantify Influencer Impact: Assess the effectiveness of different influencer tiers (Mega, Macro, Micro, Nano) on revenue generation.
- Build Predictive Models: Construct regression models to determine the contribution of each marketing channel to sales.
- Identify Market Segments: Use K-means clustering to uncover natural groupings in campaigns based on spending patterns.
- Deliver Actionable Insights: Provide marketing budget allocation strategies to maximize return on investment (ROI).
The dataset consists of historical marketing and sales data from various promotional campaigns. Key features include:
- TV: Investment level in TV advertisements (Low, Medium, High).
- Radio: Budget allocated to radio promotions (in millions of dollars).
- Social Media: Budget allocated to social media marketing (in millions of dollars).
- Influencer: Type of influencer used in the campaign (Micro, Macro, Mega, Nano).
- Sales: Total revenue generated from the marketing campaign (in millions of dollars).
The data processing and analysis steps include:
- Data Cleaning: Handled missing values, encoded categorical variables (TV and Influencer), and standardized numerical features for clustering.
- Exploratory Data Analysis (EDA): Summarized key trends and relationships in the dataset, visualized distributions of marketing budgets and sales, and analyzed mean sales across different TV promotion levels and influencer tiers.
- Regression Modeling:
- Simple Linear Regression: Modeled the relationship between individual marketing channels and sales.
- Multiple Linear Regression: Predicted sales using TV, radio, and influencer types as predictors.
- Statistical Testing:
- Validated regression assumptions, including linearity, normality, homoscedasticity, and multicollinearity.
- Conducted Tukey's HSD post hoc test to compare sales across TV promotion levels.
- K-means Clustering: Determined the optimal number of clusters using the elbow method and silhouette scores, segmented marketing campaigns based on budget allocation patterns, and analyzed cluster characteristics and their impact on sales performance.
- Visualization: Created regression plots to illustrate relationships between marketing budgets and sales, generated residual plots to check model assumptions, and visualized cluster distributions and sales performance across marketing segments.
- TV advertising has the strongest impact on sales, with high TV budgets leading to significantly higher revenue.
- Radio promotions have a positive and statistically significant relationship with sales, though the impact per dollar spent is lower than TV.
- The multiple regression model (R² = 0.900) confirms that TV and radio budgets are key predictors of sales.
- Identified six distinct clusters based on budget allocation patterns.
- The highest-performing cluster includes campaigns with high TV budgets and Mega/Macro influencers, combined with moderate to high radio and social media spending.
- Campaigns with low TV budgets and Micro/Nano influencers consistently showed lower sales performance.
- Tukey's HSD test confirmed significant differences in sales across TV promotion levels (p < 0.001).
- All regression models met key statistical assumptions, ensuring reliable results.
- Combined regression modeling and clustering for both predictive insights and natural market segmentation.
- Identified optimal marketing strategies (High TV + Mega/Macro influencers + Radio) for maximizing sales.
- Applied Tukey's HSD test to validate significant sales differences across TV spending levels.
- Used multiple methods (elbow, silhouette) to determine the optimal number of clusters.
- Created visualizations that effectively communicate relationships between marketing investments and sales.
- Expand Data Incorporation: Include variables like seasonality, product categories, and geographic segmentation for deeper analysis.
- Test Interaction Effects: Examine potential synergies between TV, radio, and social media advertising.
- Refine Clustering Techniques: Explore hierarchical clustering or K-means for ordinal data to enhance segmentation accuracy.
- Develop Predictive Models: Implement machine learning techniques like random forests or gradient boosting to forecast sales trends.
- Optimize Budget Allocation: Build optimization models to recommend the best allocation strategy across marketing channels.
- Track Long-Term Performance: Monitor how changes in budget distribution impact sales over time.
This analysis provides data-driven insights to optimize marketing investments, helping businesses allocate resources effectively and maximize ROI. The findings guide strategic decisions on channel selection, influencer partnerships, and budget distribution to drive sustained revenue growth.