Employee Retention Analysis
Data audit and exploratory analysis of HR survey data using Python to evaluate employee satisfaction, retention patterns, and data quality limitations.
Python · Data Cleaning · EDA · Visualization
Key Insights
- Data quality issues significantly limited the reliability of workforce analysis: roughly 75% of columns required cleaning, restructuring, or removal before meaningful analysis could begin.
- Attrition patterns showed weak direct correlation with satisfaction metrics, suggesting that turnover signals may be obscured by inconsistent survey scales and incomplete records.
- Department structure and salary distribution appeared more strongly associated with attrition patterns than individual satisfaction indicators.
Project Overview
This project examined an HR dataset to investigate patterns related to employee attrition and retention across roles and departments. The work focused on auditing dataset quality, cleaning inconsistent survey data, and analyzing key workforce variables to evaluate whether the available data could reliably explain turnover patterns.
The analysis combined exploratory data analysis, data cleaning, and statistical visualization to examine relationships between attrition, salary distribution, department structure, and employee satisfaction indicators.
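As a rough illustration of the audit step, a first pass over column quality might look like the sketch below. The function name `audit_columns`, the column names, and the toy data are hypothetical and are not taken from the project's dataset.

```python
import pandas as pd


def audit_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Summarize per-column quality signals for a first-pass audit."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),                       # storage type of each column
        "missing_pct": df.isna().mean().mul(100).round(1),    # share of missing values, in percent
        "n_unique": df.nunique(dropna=True),                  # distinct non-missing values
    })


# Hypothetical toy data standing in for an HR survey extract.
hr = pd.DataFrame({
    "department": ["Sales", "sales ", "IT", None],
    "salary": [52000, 48000, None, 61000],
    "satisfaction": [3, 4, 0, 5],
})

report = audit_columns(hr)
print(report)
```

A table like this makes it easier to spot columns with heavy missingness or suspicious category counts (here, "Sales" and "sales " are counted as two distinct values) before deciding what to clean, restructure, or drop.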
What I Worked On
- Conducted extensive exploratory data analysis to audit the quality of HR and survey datasets before performing workforce analysis.
- Identified major data quality issues including missing values, inconsistent survey scales, duplicated categories, and incoherent records.
- Performed large-scale data cleaning including restructuring or removing low-value columns, resolving salary outliers, and normalizing categorical variables.
- Applied targeted imputation where appropriate and inferred missing values from related variables.
- Investigated inconsistencies in survey scoring systems (e.g., satisfaction scales ranging from 0–5 vs. 1–4) to ensure scores were interpreted correctly.
- Developed visual analyses using Matplotlib and Seaborn to explore relationships between attrition, salary levels, departments, and satisfaction metrics.
- Structured the project as an HR data audit evaluating whether available data could reliably explain employee retention patterns.
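Several of the cleaning steps above can be sketched in a few lines of pandas. This is a minimal illustration under stated assumptions, not the project's actual code: the column names and toy values are hypothetical, min-max rescaling is just one possible way to put 0–5 and 1–4 scales on a common footing, and department-level median imputation is an assumed example of inferring missing values from a related variable.

```python
import pandas as pd


def rescale_to_unit(scores: pd.Series, low: float, high: float) -> pd.Series:
    """Map scores from a known [low, high] survey scale onto [0, 1]."""
    return (scores - low) / (high - low)


# Hypothetical toy data; none of these values come from the project.
hr = pd.DataFrame({
    "department": ["Sales", " sales", "IT", "it", "IT"],
    "salary": [50000.0, None, 70000.0, 80000.0, None],
    "job_satisfaction": [0, 5, 3, 1, 4],   # assumed to use a 0 to 5 scale
    "env_satisfaction": [1, 4, 2, 3, 4],   # assumed to use a 1 to 4 scale
})

# Normalize categorical labels so duplicated categories collapse into one.
hr["department"] = hr["department"].str.strip().str.lower()

# Put both satisfaction measures on a shared 0 to 1 scale.
hr["job_satisfaction_norm"] = rescale_to_unit(hr["job_satisfaction"], 0, 5)
hr["env_satisfaction_norm"] = rescale_to_unit(hr["env_satisfaction"], 1, 4)

# Fill missing salaries with the median salary of the same department.
dept_median = hr.groupby("department")["salary"].transform("median")
hr["salary"] = hr["salary"].fillna(dept_median)

print(hr)
```

Rescaling only makes sense once the true range of each scale has been confirmed, which is why checking the scoring systems comes before any comparison across satisfaction columns.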