Exploration of Attrition
A project to predict employee turnover or attrition using data science.
DDSAnalytics Exploration of Attrition
Sakava Kiv
2023-04-11
Shiny App
DDSAnalytics is an analytics company that specializes in talent management solutions for Fortune 100 companies. Talent management is defined as the iterative process of developing and retaining employees. It may include workforce planning, employee training programs, identifying high-potential employees and reducing/preventing voluntary employee turnover (attrition). To gain a competitive edge over its competition, DDSAnalytics is planning to leverage data science for talent management. The executive leadership has identified predicting employee turnover as its first application of data science for talent management. Before the business green lights the project, they have tasked your data science team to conduct an analysis of existing employee data. The business is also interested in learning about any job role specific trends that may exist in the data set (e.g., “Data Scientists have the highest job satisfaction”). You can also provide any other interesting trends and observations from your analysis. The analysis should be backed up by robust experimentation and appropriate visualization. Experiments and analysis must be conducted in R. You will also be asked to build a model to predict attrition.
- We will analyze the features for the reasons behind employee attrition.
- We will identify the top three factors that lead to employee attrition.
- We will build a model to predict attrition.
Description of the data
- Education: 1 ‘Below College’, 2 ‘College’, 3 ‘Bachelor’, 4 ‘Master’, 5 ‘Doctor’
- EnvironmentSatisfaction: 1 ‘Low’, 2 ‘Medium’, 3 ‘High’, 4 ‘Very High’
- JobInvolvement: 1 ‘Low’, 2 ‘Medium’, 3 ‘High’, 4 ‘Very High’
- JobSatisfaction: 1 ‘Low’, 2 ‘Medium’, 3 ‘High’, 4 ‘Very High’
- PerformanceRating: 1 ‘Low’, 2 ‘Good’, 3 ‘Excellent’, 4 ‘Outstanding’
- RelationshipSatisfaction: 1 ‘Low’, 2 ‘Medium’, 3 ‘High’, 4 ‘Very High’
- WorkLifeBalance: 1 ‘Bad’, 2 ‘Good’, 3 ‘Better’, 4 ‘Best’
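The integer codes above are ordinal, so it can help to attach the labels before plotting or tabulating. The following is a minimal sketch under stated assumptions: the `toy` data frame and the two label vectors are illustrative names that do not appear in the original analysis, and the rows are made up.

```r
# Toy rows standing in for the real data (values are illustrative only).
toy <- data.frame(
  Education       = c(4L, 3L, 2L, 1L, 5L),
  JobSatisfaction = c(4L, 3L, 4L, 1L, 2L)
)

# Label sets copied from the coding list above.
education_labels    <- c("Below College", "College", "Bachelor", "Master", "Doctor")
satisfaction_labels <- c("Low", "Medium", "High", "Very High")

# factor() maps each integer code to its label; ordered = TRUE keeps the ranking,
# so comparisons such as "Low" < "High" behave as expected.
toy$Education <- factor(toy$Education, levels = 1:5,
                        labels = education_labels, ordered = TRUE)
toy$JobSatisfaction <- factor(toy$JobSatisfaction, levels = 1:4,
                              labels = satisfaction_labels, ordered = TRUE)

print(toy)
```

Keeping the factors ordered means tables and bar charts list the levels in rank order rather than alphabetically.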
Get a feel for the data
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.1 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
## Loading required package: lattice
##
##
## Attaching package: 'caret'
##
##
## The following object is masked from 'package:purrr':
##
## lift
## Bucket CreationDate
## 1 firstbucket.sk 2023-03-24T07:16:41.000Z
## Bucket: ddsproject1
##
## $Contents
## Key: Case2PredictionsClassifyEXAMPLE.csv
## LastModified: 2023-04-01T23:26:57.000Z
## ETag: "bd1de75effe9449f7d49a4de5116205a"
## Size (B): 3012
## Owner: 244d58669cf167603091f3194f7e19dc964453ccdec5c5eb3645ee5152f563e4
## Storage class: STANDARD
##
## $Contents
## Key: Case2PredictionsRegressEXAMPLE.csv
## LastModified: 2023-04-01T23:26:56.000Z
## ETag: "a0f1f01c30e2cd00488822ad3c9aa6fe"
## Size (B): 3187
## Owner: 244d58669cf167603091f3194f7e19dc964453ccdec5c5eb3645ee5152f563e4
## Storage class: STANDARD
##
## $Contents
## Key: CaseStudy2-data.csv
## LastModified: 2023-04-01T23:26:59.000Z
## ETag: "d68dd080517407fb3a4f05d91fed27d7"
## Size (B): 138428
## Owner: 244d58669cf167603091f3194f7e19dc964453ccdec5c5eb3645ee5152f563e4
## Storage class: STANDARD
##
## $Contents
## Key: CaseStudy2CompSet No Attrition.csv
## LastModified: 2023-04-01T23:27:00.000Z
## ETag: "6c9d92b8a6fc5fd805ff0a5d4dfddde0"
## Size (B): 47686
## Owner: 244d58669cf167603091f3194f7e19dc964453ccdec5c5eb3645ee5152f563e4
## Storage class: STANDARD
##
## $Contents
## Key: CaseStudy2CompSet No Salary.csv
## LastModified: 2023-04-03T01:43:23.000Z
## ETag: "30f4c83e1ebb33b19fe6b44c377a1fbd"
## Size (B): 46614
## Owner:
## Storage class: STANDARD
##
## $Contents
## Key: CaseStudy2CompSet No Salary.xlsx
## LastModified: 2023-04-01T23:27:02.000Z
## ETag: "bdcb211847739638a631f828a3278339"
## Size (B): 56381
## Owner: 244d58669cf167603091f3194f7e19dc964453ccdec5c5eb3645ee5152f563e4
## Storage class: STANDARD
## ID Age Attrition BusinessTravel DailyRate Department
## 1 1 32 No Travel_Rarely 117 Sales
## 2 2 40 No Travel_Rarely 1308 Research & Development
## 3 3 35 No Travel_Frequently 200 Research & Development
## 4 4 32 No Travel_Rarely 801 Sales
## 5 5 24 No Travel_Frequently 567 Research & Development
## 6 6 27 No Travel_Frequently 294 Research & Development
## DistanceFromHome Education EducationField EmployeeCount EmployeeNumber
## 1 13 4 Life Sciences 1 859
## 2 14 3 Medical 1 1128
## 3 18 2 Life Sciences 1 1412
## 4 1 4 Marketing 1 2016
## 5 2 1 Technical Degree 1 1646
## 6 10 2 Life Sciences 1 733
## EnvironmentSatisfaction Gender HourlyRate JobInvolvement JobLevel
## 1 2 Male 73 3 2
## 2 3 Male 44 2 5
## 3 3 Male 60 3 3
## 4 3 Female 48 3 3
## 5 1 Female 32 3 1
## 6 4 Male 32 3 3
## JobRole JobSatisfaction MaritalStatus MonthlyIncome
## 1 Sales Executive 4 Divorced 4403
## 2 Research Director 3 Single 19626
## 3 Manufacturing Director 4 Single 9362
## 4 Sales Executive 4 Married 10422
## 5 Research Scientist 4 Single 3760
## 6 Manufacturing Director 1 Divorced 8793
## MonthlyRate NumCompaniesWorked Over18 OverTime PercentSalaryHike
## 1 9250 2 Y No 11
## 2 17544 1 Y No 14
## 3 19944 2 Y No 11
## 4 24032 1 Y No 19
## 5 17218 1 Y Yes 13
## 6 4809 1 Y No 21
## PerformanceRating RelationshipSatisfaction StandardHours StockOptionLevel
## 1 3 3 80 1
## 2 3 1 80 0
## 3 3 3 80 0
## 4 3 3 80 2
## 5 3 3 80 0
## 6 4 3 80 2
## TotalWorkingYears TrainingTimesLastYear WorkLifeBalance YearsAtCompany
## 1 8 3 2 5
## 2 21 2 4 20
## 3 10 2 3 2
## 4 14 3 3 14
## 5 6 2 3 6
## 6 9 4 2 9
## YearsInCurrentRole YearsSinceLastPromotion YearsWithCurrManager
## 1 2 0 3
## 2 7 4 9
## 3 2 2 2
## 4 10 5 7
## 5 3 1 3
## 6 7 1 7
## [1] 870 36
##
## Welch Two Sample t-test
##
## data: Age by Attrition
## t = 4.1509, df = 184.91, p-value = 5.05e-05
## alternative hypothesis: true difference in means between group No and group Yes is not equal to 0
## 95 percent confidence interval:
## 1.902905 5.350324
## sample estimates:
## mean in group No mean in group Yes
## 37.41233 33.78571
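Output of this shape comes from R's `t.test()` with a formula interface, which runs Welch's unequal-variance test by default. The sketch below uses synthetic ages (the `toy` data frame, group sizes, and distribution parameters are assumptions for illustration), so its numbers will not match the output above.

```r
set.seed(1)

# Synthetic stand-in: 60 employees who stayed and 20 who left.
toy <- data.frame(
  Age       = c(rnorm(60, mean = 37, sd = 9), rnorm(20, mean = 34, sd = 9)),
  Attrition = rep(c("No", "Yes"), times = c(60, 20))
)

# var.equal defaults to FALSE, which gives the Welch two-sample test.
welch <- t.test(Age ~ Attrition, data = toy)

print(welch$method)    # name of the test that was run
print(welch$estimate)  # group means for "No" and "Yes"
print(welch$conf.int)  # 95% confidence interval for the difference in means
```

On the real data the same call would be applied to the full data frame instead of `toy`.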
## [1] "ID" "Age"
## [3] "Attrition" "BusinessTravel"
## [5] "DailyRate" "Department"
## [7] "DistanceFromHome" "Education"
## [9] "EducationField" "EmployeeCount"
## [11] "EmployeeNumber" "EnvironmentSatisfaction"
## [13] "Gender" "HourlyRate"
## [15] "JobInvolvement" "JobLevel"
## [17] "JobRole" "JobSatisfaction"
## [19] "MaritalStatus" "MonthlyIncome"
## [21] "MonthlyRate" "NumCompaniesWorked"
## [23] "Over18" "OverTime"
## [25] "PercentSalaryHike" "PerformanceRating"
## [27] "RelationshipSatisfaction" "StandardHours"
## [29] "StockOptionLevel" "TotalWorkingYears"
## [31] "TrainingTimesLastYear" "WorkLifeBalance"
## [33] "YearsAtCompany" "YearsInCurrentRole"
## [35] "YearsSinceLastPromotion" "YearsWithCurrManager"
## 'data.frame': 870 obs. of 36 variables:
## $ ID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Age : int 32 40 35 32 24 27 41 37 34 34 ...
## $ Attrition : chr "No" "No" "No" "No" ...
## $ BusinessTravel : chr "Travel_Rarely" "Travel_Rarely" "Travel_Frequently" "Travel_Rarely" ...
## $ DailyRate : int 117 1308 200 801 567 294 1283 309 1333 653 ...
## $ Department : chr "Sales" "Research & Development" "Research & Development" "Sales" ...
## $ DistanceFromHome : int 13 14 18 1 2 10 5 10 10 10 ...
## $ Education : int 4 3 2 4 1 2 5 4 4 4 ...
## $ EducationField : chr "Life Sciences" "Medical" "Life Sciences" "Marketing" ...
## $ EmployeeCount : int 1 1 1 1 1 1 1 1 1 1 ...
## $ EmployeeNumber : int 859 1128 1412 2016 1646 733 1448 1105 1055 1597 ...
## $ EnvironmentSatisfaction : int 2 3 3 3 1 4 2 4 3 4 ...
## $ Gender : chr "Male" "Male" "Male" "Female" ...
## $ HourlyRate : int 73 44 60 48 32 32 90 88 87 92 ...
## $ JobInvolvement : int 3 2 3 3 3 3 4 2 3 2 ...
## $ JobLevel : int 2 5 3 3 1 3 1 2 1 2 ...
## $ JobRole : chr "Sales Executive" "Research Director" "Manufacturing Director" "Sales Executive" ...
## $ JobSatisfaction : int 4 3 4 4 4 1 3 4 3 3 ...
## $ MaritalStatus : chr "Divorced" "Single" "Single" "Married" ...
## $ MonthlyIncome : int 4403 19626 9362 10422 3760 8793 2127 6694 2220 5063 ...
## $ MonthlyRate : int 9250 17544 19944 24032 17218 4809 5561 24223 18410 15332 ...
## $ NumCompaniesWorked : int 2 1 2 1 1 1 2 2 1 1 ...
## $ Over18 : chr "Y" "Y" "Y" "Y" ...
## $ OverTime : chr "No" "No" "No" "No" ...
## $ PercentSalaryHike : int 11 14 11 19 13 21 12 14 19 14 ...
## $ PerformanceRating : int 3 3 3 3 3 4 3 3 3 3 ...
## $ RelationshipSatisfaction: int 3 1 3 3 3 3 1 3 4 2 ...
## $ StandardHours : int 80 80 80 80 80 80 80 80 80 80 ...
## $ StockOptionLevel : int 1 0 0 2 0 2 0 3 1 1 ...
## $ TotalWorkingYears : int 8 21 10 14 6 9 7 8 1 8 ...
## $ TrainingTimesLastYear : int 3 2 2 3 2 4 5 5 2 3 ...
## $ WorkLifeBalance : int 2 4 3 3 3 2 2 3 3 2 ...
## $ YearsAtCompany : int 5 20 2 14 6 9 4 1 1 8 ...
## $ YearsInCurrentRole : int 2 7 2 10 3 7 2 0 1 2 ...
## $ YearsSinceLastPromotion : int 0 4 2 5 1 1 0 0 0 7 ...
## $ YearsWithCurrManager : int 3 9 2 7 3 7 3 0 0 7 ...
See the number of Yes and No for attrition respectively in a table
#see the number of Yes and No for attrition respectively
table(case_study$Attrition)
##
## No Yes
## 730 140
See the percentage for Attrition via a pie chart
##
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
##
## discard
## The following object is masked from 'package:readr':
##
## col_factor
Observation 1
- The data set contains only integer and character (string) features: 27 are numerical and 9 are categorical.
- Attrition is the target variable, and the classes are imbalanced: far fewer employees left than stayed. The pie chart shows that of the 870 employees, 16% (140) left the company, while 84% (730) are still working there.
- There are no missing values, so the data set is complete, which is ideal.
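The class percentages and the missing-value check can be reproduced with a few lines of base R. This is a sketch, not the original chunk: the attrition vector is rebuilt from the counts reported above (730 No, 140 Yes), and the small `toy` data frame with one deliberate `NA` is a made-up example.

```r
# Rebuild the target column from the reported counts.
attrition <- rep(c("No", "Yes"), times = c(730, 140))

counts <- table(attrition)
pct    <- round(100 * prop.table(counts), 1)
print(pct)  # No 83.9, Yes 16.1

# Missing values per column; a complete data set returns all zeros.
toy <- data.frame(Age = c(32, 40, NA), Attrition = c("No", "No", "Yes"))
missing_per_column <- colSums(is.na(toy))
print(missing_per_column)
```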
## 'data.frame': 870 obs. of 27 variables:
## $ ID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Age : int 32 40 35 32 24 27 41 37 34 34 ...
## $ DailyRate : int 117 1308 200 801 567 294 1283 309 1333 653 ...
## $ DistanceFromHome : int 13 14 18 1 2 10 5 10 10 10 ...
## $ Education : int 4 3 2 4 1 2 5 4 4 4 ...
## $ EmployeeCount : int 1 1 1 1 1 1 1 1 1 1 ...
## $ EmployeeNumber : int 859 1128 1412 2016 1646 733 1448 1105 1055 1597 ...
## $ EnvironmentSatisfaction : int 2 3 3 3 1 4 2 4 3 4 ...
## $ HourlyRate : int 73 44 60 48 32 32 90 88 87 92 ...
## $ JobInvolvement : int 3 2 3 3 3 3 4 2 3 2 ...
## $ JobLevel : int 2 5 3 3 1 3 1 2 1 2 ...
## $ JobSatisfaction : int 4 3 4 4 4 1 3 4 3 3 ...
## $ MonthlyIncome : int 4403 19626 9362 10422 3760 8793 2127 6694 2220 5063 ...
## $ MonthlyRate : int 9250 17544 19944 24032 17218 4809 5561 24223 18410 15332 ...
## $ NumCompaniesWorked : int 2 1 2 1 1 1 2 2 1 1 ...
## $ PercentSalaryHike : int 11 14 11 19 13 21 12 14 19 14 ...
## $ PerformanceRating : int 3 3 3 3 3 4 3 3 3 3 ...
## $ RelationshipSatisfaction: int 3 1 3 3 3 3 1 3 4 2 ...
## $ StandardHours : int 80 80 80 80 80 80 80 80 80 80 ...
## $ StockOptionLevel : int 1 0 0 2 0 2 0 3 1 1 ...
## $ TotalWorkingYears : int 8 21 10 14 6 9 7 8 1 8 ...
## $ TrainingTimesLastYear : int 3 2 2 3 2 4 5 5 2 3 ...
## $ WorkLifeBalance : int 2 4 3 3 3 2 2 3 3 2 ...
## $ YearsAtCompany : int 5 20 2 14 6 9 4 1 1 8 ...
## $ YearsInCurrentRole : int 2 7 2 10 3 7 2 0 1 2 ...
## $ YearsSinceLastPromotion : int 0 4 2 5 1 1 0 0 0 7 ...
## $ YearsWithCurrManager : int 3 9 2 7 3 7 3 0 0 7 ...
## [1] 870 27
##
## Attaching package: 'reshape2'
## The following object is masked from 'package:tidyr':
##
## smiths
## Warning in cor(case_study_numeric): the standard deviation is zero
Observation 2
- JobLevel appears to be a crucial feature, and you’d better believe we’ll be delving deeper into it with some exploratory analysis.
- Brace yourself, because we discovered some positively correlated relationships: TotalWorkingYears has a positive relationship with both JobLevel and MonthlyIncome, indicated by the dark shade of red in the correlation plot.
- And that’s not all, folks: we also found a positive relationship between YearsAtCompany and both YearsInCurrentRole and YearsWithCurrManager.
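The warning `the standard deviation is zero` appears because `cor()` cannot compute a correlation for a constant column, and the `str()` output shows EmployeeCount is always 1 and StandardHours is always 80. One way to avoid it is to drop constant columns first. The sketch below uses the six rows printed by `head()` earlier; the names `toy`, `is_constant`, and `toy_varying` are illustrative, not from the original code.

```r
# First six rows of selected numeric columns, as printed by head() above.
toy <- data.frame(
  TotalWorkingYears = c(8, 21, 10, 14, 6, 9),
  JobLevel          = c(2, 5, 3, 3, 1, 3),
  MonthlyIncome     = c(4403, 19626, 9362, 10422, 3760, 8793),
  EmployeeCount     = c(1, 1, 1, 1, 1, 1),
  StandardHours     = c(80, 80, 80, 80, 80, 80)
)

# Flag columns whose standard deviation is zero (every value identical).
is_constant <- vapply(toy, function(col) sd(col) == 0, logical(1))

# Keep only the columns that vary, then correlate.
toy_varying <- toy[, !is_constant, drop = FALSE]
cor_matrix  <- cor(toy_varying)

print(round(cor_matrix, 2))
```

With the constant columns removed, the correlation matrix contains no `NA` rows or columns.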
EDA
Analysis of Categorical Features
## BusinessTravel n
## 1 Non-Travel 94
## 2 Travel_Frequently 158
## 3 Travel_Rarely 618
## `summarise()` has grouped output by 'BusinessTravel'. You can override using
## the `.groups` argument.
## Department n
## 1 Human Resources 35
## 2 Research & Development 562
## 3 Sales 273
## `summarise()` has grouped output by 'Department'. You can override using the
## `.groups` argument.
## EducationField n
## 1 Human Resources 15
## 2 Life Sciences 358
## 3 Marketing 100
## 4 Medical 270
## 5 Other 52
## 6 Technical Degree 75
## `summarise()` has grouped output by 'EducationField'. You can override using
## the `.groups` argument.
## Gender n
## 1 Female 354
## 2 Male 516
## `summarise()` has grouped output by 'Gender'. You can override using the
## `.groups` argument.
## MaritalStatus n
## 1 Divorced 191
## 2 Married 410
## 3 Single 269
## `summarise()` has grouped output by 'MaritalStatus'. You can override using the
## `.groups` argument.
## JobRole n
## 1 Healthcare Representative 76
## 2 Human Resources 27
## 3 Laboratory Technician 153
## 4 Manager 51
## 5 Manufacturing Director 87
## 6 Research Director 51
## 7 Research Scientist 172
## 8 Sales Executive 200
## 9 Sales Representative 53
## `summarise()` has grouped output by 'JobRole'. You can override using the
## `.groups` argument.
## OverTime n
## 1 No 618
## 2 Yes 252
## `summarise()` has grouped output by 'OverTime'. You can override using the
## `.groups` argument.
##
## Female Male
## No 252 366
## Yes 102 150
Observation 3
- Most employees travel rarely (618 of 870), so this group also accounts for the largest number of departures. Surveying employees who rarely travel to see whether they would like to travel more might help reduce attrition here.
- R&D is the largest department (562 employees), but employees in the Sales department, or in roles such as Sales Executive or Sales Representative, tend to leave earlier.
- Males have a higher attrition than females.
- More men than women work overtime (150 vs. 102), although the share is about 29% in both groups.
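Because the groups differ widely in size, raw counts of departures can mislead; comparing the attrition rate within each category is usually more informative. A minimal base R sketch follows, using a made-up eight-row `toy` data frame (an assumption for illustration, not the real data).

```r
# Toy data: eight employees with an overtime flag and an attrition outcome.
toy <- data.frame(
  OverTime  = c("No", "No", "No", "No", "Yes", "Yes", "Yes", "No"),
  Attrition = c("No", "No", "Yes", "No", "Yes", "No", "Yes", "No")
)

# Cross-tabulate, then convert each row to proportions (margin = 1).
counts <- table(toy$OverTime, toy$Attrition, dnn = c("OverTime", "Attrition"))
rates  <- prop.table(counts, margin = 1)

print(counts)
print(round(rates, 2))  # "Yes" column: 0.20 without overtime, 0.67 with overtime
```

The same pattern works for BusinessTravel, Department, JobRole, or Gender by swapping the first column.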
Analysis of Numerical Features
Distribution of Age with a mean age vertical line
Ordinal Features
## Education n
## 1 1 98
## 2 2 182
## 3 3 324
## 4 4 240
## 5 5 26
Observation 4
- More employees hold a Bachelor’s degree (324) than any other education level. Employees with a Bachelor’s degree might have higher expectations of the company, and we will explore whether this dataset supports that as a reason for attrition.
- EnvironmentSatisfaction is mostly rated ‘High’ or ‘Very High’ (520 of 870 employees). Only 172 employees rate the work environment ‘Low’, so dissatisfaction with the environment is unlikely to be a major cause of changing jobs.
- JobInvolvement is high for most employees: 514 of 870 rate it 3 ‘High’.
## EnvironmentSatisfaction n
## 1 1 172
## 2 2 178
## 3 3 258
## 4 4 262
## JobInvolvement n
## 1 1 47
## 2 2 228
## 3 3 514
## 4 4 81
## `geom_smooth()` using formula = 'y ~ x'
Observation 5
- There seems to be a positive linear relationship: as age and experience increase, so does income.
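A straight-line fit is one way to put a number on this relationship. The sketch below fits `lm()` to the six rows printed by `head()` earlier, which is far too few rows for a real conclusion and serves only to show the call; `toy` and `fit` are illustrative names.

```r
# First six rows of the relevant columns, as printed by head() above.
toy <- data.frame(
  TotalWorkingYears = c(8, 21, 10, 14, 6, 9),
  MonthlyIncome     = c(4403, 19626, 9362, 10422, 3760, 8793)
)

# Ordinary least squares: income as a linear function of experience.
fit <- lm(MonthlyIncome ~ TotalWorkingYears, data = toy)

# The slope estimates the change in monthly income per extra working year.
print(coef(fit))
print(summary(fit)$r.squared)
```

A positive slope agrees with the upward trend in the smoothed plot; fitting on all 870 rows would give the estimate that matters.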
## JobLevel n
## 1 1 329
## 2 2 312
## 3 3 132
## 4 4 60
## 5 5 37
## `summarise()` has grouped output by 'JobLevel'. You can override using the
## `.groups` argument.
## NumCompaniesWorked n
## 1 0 111
## 2 1 320
## 3 2 74
## 4 3 91
## 5 4 85
## 6 5 43
## 7 6 39
## 8 7 46
## 9 8 28
## 10 9 33
## `summarise()` has grouped output by 'NumCompaniesWorked'. You can override
## using the `.groups` argument.
## StockOptionLevel n
## 1 0 379
## 2 1 355
## 3 2 81
## 4 3 55
## `summarise()` has grouped output by 'StockOptionLevel'. You can override using
## the `.groups` argument.
## Warning in cor(case_study_numeric): the standard deviation is zero
Observations: Factors Responsible for Employee Attrition
- OverTime has the strongest relationship with Attrition. Employees who work overtime are more likely to change or leave their job early. We observed this during the categorical variable analysis as well.
- Age is the second-highest factor. Senior employees may be retiring, or employees with a Bachelor’s degree may have higher expectations of the organization and feel burned out. Note that the Welch t-test on Age by Attrition above found that employees who left were younger on average (33.8 vs. 37.4 years).
- MonthlyIncome is the third factor contributing to employee attrition.
Modeling
## Attrition n
## 1 No 730
## 2 Yes 140
## Attrition
## classifications No Yes
## No 709 93
## Yes 21 47
## Confusion Matrix and Statistics
##
## Attrition
## classifications No Yes
## No 709 93
## Yes 21 47
##
## Accuracy : 0.869
## 95% CI : (0.8447, 0.8907)
## No Information Rate : 0.8391
## P-Value [Acc > NIR] : 0.00809
##
## Kappa : 0.3875
##
## Mcnemar's Test P-Value : 2.936e-11
##
## Sensitivity : 0.9712
## Specificity : 0.3357
## Pos Pred Value : 0.8840
## Neg Pred Value : 0.6912
## Prevalence : 0.8391
## Detection Rate : 0.8149
## Detection Prevalence : 0.9218
## Balanced Accuracy : 0.6535
##
## 'Positive' Class : No
##
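The statistics in the output above follow directly from the four cell counts of the confusion matrix. As a worked check, the sketch below rebuilds the table by hand and recomputes them in base R; `cm` and the metric variable names are illustrative, and ‘No’ is treated as the positive class, as in the output.

```r
# Rows are predictions, columns are the true Attrition values (counts from above).
cm <- matrix(c(709, 21, 93, 47), nrow = 2,
             dimnames = list(predicted = c("No", "Yes"),
                             actual    = c("No", "Yes")))

accuracy    <- sum(diag(cm)) / sum(cm)               # (709 + 47) / 870
sensitivity <- cm["No", "No"]   / sum(cm[, "No"])    # 709 / 730, true "No" found
specificity <- cm["Yes", "Yes"] / sum(cm[, "Yes"])   # 47 / 140, true "Yes" found
balanced    <- (sensitivity + specificity) / 2

print(round(c(accuracy = accuracy, sensitivity = sensitivity,
              specificity = specificity, balanced = balanced), 4))
# accuracy 0.8690, sensitivity 0.9712, specificity 0.3357, balanced 0.6535
```

The low specificity shows that this model finds only about a third of the employees who actually left, even though overall accuracy looks high.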
## [1] 870 36
## [1] 730 36
## [1] 140 36
## [1] 140 36
## [1] 280 36
##
## classifications No Yes
## No 103 34
## Yes 37 106
## Confusion Matrix and Statistics
##
##
## classifications No Yes
## No 103 34
## Yes 37 106
##
## Accuracy : 0.7464
## 95% CI : (0.6912, 0.7963)
## No Information Rate : 0.5
## P-Value [Acc > NIR] : <2e-16
##
## Kappa : 0.4929
##
## Mcnemar's Test P-Value : 0.8124
##
## Sensitivity : 0.7357
## Specificity : 0.7571
## Pos Pred Value : 0.7518
## Neg Pred Value : 0.7413
## Prevalence : 0.5000
## Detection Rate : 0.3679
## Detection Prevalence : 0.4893
## Balanced Accuracy : 0.7464
##
## 'Positive' Class : No
##
## [1] 730 36
## [1] 1460 36
##
## classifications No Yes
## No 528 1
## Yes 202 729
## Confusion Matrix and Statistics
##
##
## classifications No Yes
## No 528 1
## Yes 202 729
##
## Accuracy : 0.861
## 95% CI : (0.8421, 0.8783)
## No Information Rate : 0.5
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.7219
##
## Mcnemar's Test P-Value : < 2.2e-16
##
## Sensitivity : 0.7233
## Specificity : 0.9986
## Pos Pred Value : 0.9981
## Neg Pred Value : 0.7830
## Prevalence : 0.5000
## Detection Rate : 0.3616
## Detection Prevalence : 0.3623
## Balanced Accuracy : 0.8610
##
## 'Positive' Class : No
##
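Observations from the KNN prediction model using Age and MonthlyIncome as predictors of Attrition
- With oversampling and k = 3, the model classifies Attrition with an accuracy of about 86%, a sensitivity of about 72%, and a specificity of nearly 100% (0.9986). Because the positive class is ‘No’, sensitivity here is the share of employees who stayed that the model identifies correctly, and specificity is the share of employees who left that it identifies correctly.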
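The oversampling step and a k = 3 KNN fit can be sketched as follows. This is an illustration under stated assumptions rather than the original code: the data are synthetic, the feature scaling is an added step, and leave-one-out evaluation with `knn.cv()` from the `class` package is assumed because the report does not show how the model was evaluated.

```r
library(class)  # provides knn.cv(); a recommended package bundled with R

set.seed(42)

# Synthetic, imbalanced stand-in for the real data (80 "No", 20 "Yes").
n_no  <- 80
n_yes <- 20
toy <- data.frame(
  Age           = c(rnorm(n_no, 38, 8), rnorm(n_yes, 31, 8)),
  MonthlyIncome = c(rnorm(n_no, 7000, 2500), rnorm(n_yes, 4000, 2000)),
  Attrition     = factor(rep(c("No", "Yes"), times = c(n_no, n_yes)))
)

# Oversample the minority class with replacement until both classes match.
no_rows  <- toy[toy$Attrition == "No", ]
yes_rows <- toy[toy$Attrition == "Yes", ]
yes_over <- yes_rows[sample(nrow(yes_rows), nrow(no_rows), replace = TRUE), ]
balanced <- rbind(no_rows, yes_over)

# Put both predictors on the same scale so income does not dominate distances.
features <- scale(balanced[, c("Age", "MonthlyIncome")])

# Leave-one-out KNN classification with k = 3.
classifications <- knn.cv(features, balanced$Attrition, k = 3)
cm <- table(classifications, Attrition = balanced$Attrition)
print(cm)
```

One caution with this setup: oversampling duplicates minority rows, so a duplicated ‘Yes’ row often finds exact copies of itself among its nearest neighbours. That can inflate the score for the ‘Yes’ class, and scoring on rows held out before oversampling would give a more realistic estimate.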