Exploration of Attrition

A project to predict employee turnover or attrition using data science.

DDSAnalytics Exploration of Attrition

Shiny App

DDSAnalytics is an analytics company that specializes in talent management solutions for Fortune 100 companies. Talent management is defined as the iterative process of developing and retaining employees. It may include workforce planning, employee training programs, identifying high-potential employees and reducing/preventing voluntary employee turnover (attrition). To gain a competitive edge over its competition, DDSAnalytics is planning to leverage data science for talent management. The executive leadership has identified predicting employee turnover as its first application of data science for talent management. Before the business green lights the project, they have tasked your data science team to conduct an analysis of existing employee data. The business is also interested in learning about any job role specific trends that may exist in the data set (e.g., “Data Scientists have the highest job satisfaction”). You can also provide any other interesting trends and observations from your analysis. The analysis should be backed up by robust experimentation and appropriate visualization. Experiments and analysis must be conducted in R. You will also be asked to build a model to predict attrition.

  • We will analyze the features to find the reasons behind employee attrition.
  • We will identify the top three factors that lead to employee attrition.
  • We will build a model to predict attrition.

Description of the data

  • Education 1 ‘Below College’ 2 ‘College’ 3 ‘Bachelor’ 4 ‘Master’ 5 ‘Doctor’

  • EnvironmentSatisfaction 1 ‘Low’ 2 ‘Medium’ 3 ‘High’ 4 ‘Very High’

  • JobInvolvement 1 ‘Low’ 2 ‘Medium’ 3 ‘High’ 4 ‘Very High’

  • JobSatisfaction 1 ‘Low’ 2 ‘Medium’ 3 ‘High’ 4 ‘Very High’

  • PerformanceRating 1 ‘Low’ 2 ‘Good’ 3 ‘Excellent’ 4 ‘Outstanding’

  • RelationshipSatisfaction 1 ‘Low’ 2 ‘Medium’ 3 ‘High’ 4 ‘Very High’

  • WorkLifeBalance 1 ‘Bad’ 2 ‘Good’ 3 ‘Better’ 4 ‘Best’
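The coded scales above can be attached to the integer columns as ordered factors, which keeps plots and tables readable. The following is a minimal sketch, assuming a small example vector in place of the real `Education` column; the names `education`, `education_labels` and `education_f` are illustrative and not taken from the project code.

```r
# Example values standing in for the Education column (codes 1 to 5)
education <- c(4, 3, 2, 4, 1, 2, 5)

# Labels taken from the data description above
education_labels <- c("Below College", "College", "Bachelor", "Master", "Doctor")

# ordered = TRUE preserves the ranking, so comparisons such as < work on the labels
education_f <- factor(education, levels = 1:5,
                      labels = education_labels, ordered = TRUE)

print(table(education_f))
```

The same pattern would apply to the four-level satisfaction scales by swapping in `levels = 1:4` and the matching labels.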

Get a feel for the data

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.1     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (http://conflicted.r-lib.org/) to force all conflicts to become errors
## Loading required package: lattice
## 
## 
## Attaching package: 'caret'
## 
## 
## The following object is masked from 'package:purrr':
## 
##     lift
##           Bucket             CreationDate
## 1 firstbucket.sk 2023-03-24T07:16:41.000Z
## Bucket: ddsproject1 
## 
## $Contents
## Key:            Case2PredictionsClassifyEXAMPLE.csv 
## LastModified:   2023-04-01T23:26:57.000Z 
## ETag:           "bd1de75effe9449f7d49a4de5116205a" 
## Size (B):       3012 
## Owner:          244d58669cf167603091f3194f7e19dc964453ccdec5c5eb3645ee5152f563e4 
## Storage class:  STANDARD 
## 
## $Contents
## Key:            Case2PredictionsRegressEXAMPLE.csv 
## LastModified:   2023-04-01T23:26:56.000Z 
## ETag:           "a0f1f01c30e2cd00488822ad3c9aa6fe" 
## Size (B):       3187 
## Owner:          244d58669cf167603091f3194f7e19dc964453ccdec5c5eb3645ee5152f563e4 
## Storage class:  STANDARD 
## 
## $Contents
## Key:            CaseStudy2-data.csv 
## LastModified:   2023-04-01T23:26:59.000Z 
## ETag:           "d68dd080517407fb3a4f05d91fed27d7" 
## Size (B):       138428 
## Owner:          244d58669cf167603091f3194f7e19dc964453ccdec5c5eb3645ee5152f563e4 
## Storage class:  STANDARD 
## 
## $Contents
## Key:            CaseStudy2CompSet No Attrition.csv 
## LastModified:   2023-04-01T23:27:00.000Z 
## ETag:           "6c9d92b8a6fc5fd805ff0a5d4dfddde0" 
## Size (B):       47686 
## Owner:          244d58669cf167603091f3194f7e19dc964453ccdec5c5eb3645ee5152f563e4 
## Storage class:  STANDARD 
## 
## $Contents
## Key:            CaseStudy2CompSet No Salary.csv 
## LastModified:   2023-04-03T01:43:23.000Z 
## ETag:           "30f4c83e1ebb33b19fe6b44c377a1fbd" 
## Size (B):       46614 
## Owner:          
## Storage class:  STANDARD 
## 
## $Contents
## Key:            CaseStudy2CompSet No Salary.xlsx 
## LastModified:   2023-04-01T23:27:02.000Z 
## ETag:           "bdcb211847739638a631f828a3278339" 
## Size (B):       56381 
## Owner:          244d58669cf167603091f3194f7e19dc964453ccdec5c5eb3645ee5152f563e4 
## Storage class:  STANDARD
##   ID Age Attrition    BusinessTravel DailyRate             Department
## 1  1  32        No     Travel_Rarely       117                  Sales
## 2  2  40        No     Travel_Rarely      1308 Research & Development
## 3  3  35        No Travel_Frequently       200 Research & Development
## 4  4  32        No     Travel_Rarely       801                  Sales
## 5  5  24        No Travel_Frequently       567 Research & Development
## 6  6  27        No Travel_Frequently       294 Research & Development
##   DistanceFromHome Education   EducationField EmployeeCount EmployeeNumber
## 1               13         4    Life Sciences             1            859
## 2               14         3          Medical             1           1128
## 3               18         2    Life Sciences             1           1412
## 4                1         4        Marketing             1           2016
## 5                2         1 Technical Degree             1           1646
## 6               10         2    Life Sciences             1            733
##   EnvironmentSatisfaction Gender HourlyRate JobInvolvement JobLevel
## 1                       2   Male         73              3        2
## 2                       3   Male         44              2        5
## 3                       3   Male         60              3        3
## 4                       3 Female         48              3        3
## 5                       1 Female         32              3        1
## 6                       4   Male         32              3        3
##                  JobRole JobSatisfaction MaritalStatus MonthlyIncome
## 1        Sales Executive               4      Divorced          4403
## 2      Research Director               3        Single         19626
## 3 Manufacturing Director               4        Single          9362
## 4        Sales Executive               4       Married         10422
## 5     Research Scientist               4        Single          3760
## 6 Manufacturing Director               1      Divorced          8793
##   MonthlyRate NumCompaniesWorked Over18 OverTime PercentSalaryHike
## 1        9250                  2      Y       No                11
## 2       17544                  1      Y       No                14
## 3       19944                  2      Y       No                11
## 4       24032                  1      Y       No                19
## 5       17218                  1      Y      Yes                13
## 6        4809                  1      Y       No                21
##   PerformanceRating RelationshipSatisfaction StandardHours StockOptionLevel
## 1                 3                        3            80                1
## 2                 3                        1            80                0
## 3                 3                        3            80                0
## 4                 3                        3            80                2
## 5                 3                        3            80                0
## 6                 4                        3            80                2
##   TotalWorkingYears TrainingTimesLastYear WorkLifeBalance YearsAtCompany
## 1                 8                     3               2              5
## 2                21                     2               4             20
## 3                10                     2               3              2
## 4                14                     3               3             14
## 5                 6                     2               3              6
## 6                 9                     4               2              9
##   YearsInCurrentRole YearsSinceLastPromotion YearsWithCurrManager
## 1                  2                       0                    3
## 2                  7                       4                    9
## 3                  2                       2                    2
## 4                 10                       5                    7
## 5                  3                       1                    3
## 6                  7                       1                    7
## [1] 870  36
## 
##  Welch Two Sample t-test
## 
## data:  Age by Attrition
## t = 4.1509, df = 184.91, p-value = 5.05e-05
## alternative hypothesis: true difference in means between group No and group Yes is not equal to 0
## 95 percent confidence interval:
##  1.902905 5.350324
## sample estimates:
##  mean in group No mean in group Yes 
##          37.41233          33.78571
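The output above has the shape of a call to `t.test()` with a formula; in R, the default `var.equal = FALSE` gives the Welch variant. The sketch below runs the same kind of test on synthetic data, since the real data frame is not reproduced here. The data frame `toy` and its group means are assumptions made purely for illustration.

```r
set.seed(1)

# Synthetic stand-in for the real data: ages for stayers ("No") and leavers ("Yes")
toy <- data.frame(
  Age = c(rnorm(50, mean = 37, sd = 9), rnorm(20, mean = 34, sd = 9)),
  Attrition = rep(c("No", "Yes"), times = c(50, 20))
)

# var.equal = FALSE is the default, so this is Welch's unequal-variance t-test
result <- t.test(Age ~ Attrition, data = toy)

print(result$method)    # name of the test that was run
print(result$estimate)  # group means
print(result$conf.int)  # 95% confidence interval for the difference in means
```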
##  [1] "ID"                       "Age"                     
##  [3] "Attrition"                "BusinessTravel"          
##  [5] "DailyRate"                "Department"              
##  [7] "DistanceFromHome"         "Education"               
##  [9] "EducationField"           "EmployeeCount"           
## [11] "EmployeeNumber"           "EnvironmentSatisfaction" 
## [13] "Gender"                   "HourlyRate"              
## [15] "JobInvolvement"           "JobLevel"                
## [17] "JobRole"                  "JobSatisfaction"         
## [19] "MaritalStatus"            "MonthlyIncome"           
## [21] "MonthlyRate"              "NumCompaniesWorked"      
## [23] "Over18"                   "OverTime"                
## [25] "PercentSalaryHike"        "PerformanceRating"       
## [27] "RelationshipSatisfaction" "StandardHours"           
## [29] "StockOptionLevel"         "TotalWorkingYears"       
## [31] "TrainingTimesLastYear"    "WorkLifeBalance"         
## [33] "YearsAtCompany"           "YearsInCurrentRole"      
## [35] "YearsSinceLastPromotion"  "YearsWithCurrManager"
## 'data.frame':    870 obs. of  36 variables:
##  $ ID                      : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Age                     : int  32 40 35 32 24 27 41 37 34 34 ...
##  $ Attrition               : chr  "No" "No" "No" "No" ...
##  $ BusinessTravel          : chr  "Travel_Rarely" "Travel_Rarely" "Travel_Frequently" "Travel_Rarely" ...
##  $ DailyRate               : int  117 1308 200 801 567 294 1283 309 1333 653 ...
##  $ Department              : chr  "Sales" "Research & Development" "Research & Development" "Sales" ...
##  $ DistanceFromHome        : int  13 14 18 1 2 10 5 10 10 10 ...
##  $ Education               : int  4 3 2 4 1 2 5 4 4 4 ...
##  $ EducationField          : chr  "Life Sciences" "Medical" "Life Sciences" "Marketing" ...
##  $ EmployeeCount           : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ EmployeeNumber          : int  859 1128 1412 2016 1646 733 1448 1105 1055 1597 ...
##  $ EnvironmentSatisfaction : int  2 3 3 3 1 4 2 4 3 4 ...
##  $ Gender                  : chr  "Male" "Male" "Male" "Female" ...
##  $ HourlyRate              : int  73 44 60 48 32 32 90 88 87 92 ...
##  $ JobInvolvement          : int  3 2 3 3 3 3 4 2 3 2 ...
##  $ JobLevel                : int  2 5 3 3 1 3 1 2 1 2 ...
##  $ JobRole                 : chr  "Sales Executive" "Research Director" "Manufacturing Director" "Sales Executive" ...
##  $ JobSatisfaction         : int  4 3 4 4 4 1 3 4 3 3 ...
##  $ MaritalStatus           : chr  "Divorced" "Single" "Single" "Married" ...
##  $ MonthlyIncome           : int  4403 19626 9362 10422 3760 8793 2127 6694 2220 5063 ...
##  $ MonthlyRate             : int  9250 17544 19944 24032 17218 4809 5561 24223 18410 15332 ...
##  $ NumCompaniesWorked      : int  2 1 2 1 1 1 2 2 1 1 ...
##  $ Over18                  : chr  "Y" "Y" "Y" "Y" ...
##  $ OverTime                : chr  "No" "No" "No" "No" ...
##  $ PercentSalaryHike       : int  11 14 11 19 13 21 12 14 19 14 ...
##  $ PerformanceRating       : int  3 3 3 3 3 4 3 3 3 3 ...
##  $ RelationshipSatisfaction: int  3 1 3 3 3 3 1 3 4 2 ...
##  $ StandardHours           : int  80 80 80 80 80 80 80 80 80 80 ...
##  $ StockOptionLevel        : int  1 0 0 2 0 2 0 3 1 1 ...
##  $ TotalWorkingYears       : int  8 21 10 14 6 9 7 8 1 8 ...
##  $ TrainingTimesLastYear   : int  3 2 2 3 2 4 5 5 2 3 ...
##  $ WorkLifeBalance         : int  2 4 3 3 3 2 2 3 3 2 ...
##  $ YearsAtCompany          : int  5 20 2 14 6 9 4 1 1 8 ...
##  $ YearsInCurrentRole      : int  2 7 2 10 3 7 2 0 1 2 ...
##  $ YearsSinceLastPromotion : int  0 4 2 5 1 1 0 0 0 7 ...
##  $ YearsWithCurrManager    : int  3 9 2 7 3 7 3 0 0 7 ...

See the number of Yes and No for attrition respectively in a table

#see the number of Yes and No for attrition respectively 
table(case_study$Attrition)
## 
##  No Yes 
## 730 140
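The counts can be turned into the class shares that the pie chart displays. This is a small sketch using only the two counts printed above; the variable names are illustrative.

```r
# Counts copied from the table() output above
attrition_counts <- c(No = 730, Yes = 140)

# prop.table() divides each count by the total (870)
attrition_share <- prop.table(attrition_counts)

print(round(100 * attrition_share, 1))
# No 83.9, Yes 16.1
```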

See the percentage for Attrition via a pie chart

## 
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
## 
##     discard
## The following object is masked from 'package:readr':
## 
##     col_factor

Observation 1

  • The data contains only integer and character (string) columns: 27 features are numerical and 9 are categorical.
  • Attrition is the target variable, and its classes are imbalanced. The pie chart shows that of the 870 employees, 140 (16%) left their job for some reason, while 730 (84%) are still working at the company.
  • There are no missing values, so we have a complete data set, which is ideal.
## 'data.frame':    870 obs. of  27 variables:
##  $ ID                      : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Age                     : int  32 40 35 32 24 27 41 37 34 34 ...
##  $ DailyRate               : int  117 1308 200 801 567 294 1283 309 1333 653 ...
##  $ DistanceFromHome        : int  13 14 18 1 2 10 5 10 10 10 ...
##  $ Education               : int  4 3 2 4 1 2 5 4 4 4 ...
##  $ EmployeeCount           : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ EmployeeNumber          : int  859 1128 1412 2016 1646 733 1448 1105 1055 1597 ...
##  $ EnvironmentSatisfaction : int  2 3 3 3 1 4 2 4 3 4 ...
##  $ HourlyRate              : int  73 44 60 48 32 32 90 88 87 92 ...
##  $ JobInvolvement          : int  3 2 3 3 3 3 4 2 3 2 ...
##  $ JobLevel                : int  2 5 3 3 1 3 1 2 1 2 ...
##  $ JobSatisfaction         : int  4 3 4 4 4 1 3 4 3 3 ...
##  $ MonthlyIncome           : int  4403 19626 9362 10422 3760 8793 2127 6694 2220 5063 ...
##  $ MonthlyRate             : int  9250 17544 19944 24032 17218 4809 5561 24223 18410 15332 ...
##  $ NumCompaniesWorked      : int  2 1 2 1 1 1 2 2 1 1 ...
##  $ PercentSalaryHike       : int  11 14 11 19 13 21 12 14 19 14 ...
##  $ PerformanceRating       : int  3 3 3 3 3 4 3 3 3 3 ...
##  $ RelationshipSatisfaction: int  3 1 3 3 3 3 1 3 4 2 ...
##  $ StandardHours           : int  80 80 80 80 80 80 80 80 80 80 ...
##  $ StockOptionLevel        : int  1 0 0 2 0 2 0 3 1 1 ...
##  $ TotalWorkingYears       : int  8 21 10 14 6 9 7 8 1 8 ...
##  $ TrainingTimesLastYear   : int  3 2 2 3 2 4 5 5 2 3 ...
##  $ WorkLifeBalance         : int  2 4 3 3 3 2 2 3 3 2 ...
##  $ YearsAtCompany          : int  5 20 2 14 6 9 4 1 1 8 ...
##  $ YearsInCurrentRole      : int  2 7 2 10 3 7 2 0 1 2 ...
##  $ YearsSinceLastPromotion : int  0 4 2 5 1 1 0 0 0 7 ...
##  $ YearsWithCurrManager    : int  3 9 2 7 3 7 3 0 0 7 ...
## [1] 870  27
## 
## Attaching package: 'reshape2'
## The following object is masked from 'package:tidyr':
## 
##     smiths
## Warning in cor(case_study_numeric): the standard deviation is zero
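The warning above is what `cor()` emits when a column is constant; in the `str()` output, `EmployeeCount` (always 1) and `StandardHours` (always 80) look like the likely causes. One way to avoid it is to drop zero-variance columns before computing the correlation matrix. The sketch below uses the first six rows printed earlier for a handful of columns; `toy_numeric` and `has_variance` are illustrative names, not the project's own.

```r
# First six rows of selected numeric columns, copied from the head() output
toy_numeric <- data.frame(
  Age = c(32, 40, 35, 32, 24, 27),
  MonthlyIncome = c(4403, 19626, 9362, 10422, 3760, 8793),
  TotalWorkingYears = c(8, 21, 10, 14, 6, 9),
  EmployeeCount = rep(1, 6),    # constant column
  StandardHours = rep(80, 6)    # constant column
)

# Keep only the columns whose standard deviation is non-zero
has_variance <- vapply(toy_numeric, function(col) sd(col) > 0, logical(1))

# With the constant columns removed, cor() returns no NA entries and no warning
cor_matrix <- cor(toy_numeric[, has_variance])
print(round(cor_matrix, 2))
```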

Observation 2

  • JobLevel appears to be a crucial feature, and you better believe we’ll be delving deeper into it with some exploratory analysis.
  • Brace yourself, because we discovered some strong positive correlations: TotalWorkingYears is positively correlated with both JobLevel and MonthlyIncome, as indicated by the dark shade of red.
  • And that’s not all, folks: YearsAtCompany is also positively correlated with both YearsInCurrentRole and YearsWithCurrManager.

EDA

Analysis of Categorical Features

##      BusinessTravel   n
## 1        Non-Travel  94
## 2 Travel_Frequently 158
## 3     Travel_Rarely 618
## `summarise()` has grouped output by 'BusinessTravel'. You can override using
## the `.groups` argument.

##               Department   n
## 1        Human Resources  35
## 2 Research & Development 562
## 3                  Sales 273
## `summarise()` has grouped output by 'Department'. You can override using the
## `.groups` argument.

##     EducationField   n
## 1  Human Resources  15
## 2    Life Sciences 358
## 3        Marketing 100
## 4          Medical 270
## 5            Other  52
## 6 Technical Degree  75
## `summarise()` has grouped output by 'EducationField'. You can override using
## the `.groups` argument.

##   Gender   n
## 1 Female 354
## 2   Male 516
## `summarise()` has grouped output by 'Gender'. You can override using the
## `.groups` argument.

##   MaritalStatus   n
## 1      Divorced 191
## 2       Married 410
## 3        Single 269
## `summarise()` has grouped output by 'MaritalStatus'. You can override using the
## `.groups` argument.

##                     JobRole   n
## 1 Healthcare Representative  76
## 2           Human Resources  27
## 3     Laboratory Technician 153
## 4                   Manager  51
## 5    Manufacturing Director  87
## 6         Research Director  51
## 7        Research Scientist 172
## 8           Sales Executive 200
## 9      Sales Representative  53
## `summarise()` has grouped output by 'JobRole'. You can override using the
## `.groups` argument.

##   OverTime   n
## 1       No 618
## 2      Yes 252
## `summarise()` has grouped output by 'OverTime'. You can override using the
## `.groups` argument.

##      
##       Female Male
##   No     252  366
##   Yes    102  150

Observation 3

  • Most employees travel rarely (618 of 870), and this group also accounts for the largest number of departures. Employees who get few chances to travel could be surveyed to find out whether more travel would reduce the attrition rate here.
  • Employees in the R&D department are the largest group, but employees in the Sales department, or in roles such as Sales Executive or Sales Representative, tend to leave the job early.
  • Males have higher attrition than females.
  • More men than women work overtime in absolute terms (150 vs. 102), but the overtime rate is nearly the same for both groups: about 29% of men (150 of 516) and about 29% of women (102 of 354).
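Raw counts can mislead when the groups differ in size, so it helps to compare column-wise proportions. The sketch below assumes the unlabeled two-way table above is OverTime (rows) by Gender (columns), which is consistent with its row totals (618 and 252) and column totals (354 and 516); the object names are illustrative.

```r
# Counts copied from the two-way table above (assumed: rows = OverTime, cols = Gender)
overtime_by_gender <- matrix(
  c(252, 102,   # Female: No, Yes
    366, 150),  # Male:   No, Yes
  nrow = 2,
  dimnames = list(OverTime = c("No", "Yes"), Gender = c("Female", "Male"))
)

# margin = 2 normalizes within each gender, giving the overtime rate per group
rates <- prop.table(overtime_by_gender, margin = 2)
print(round(rates, 3))
```

Reading the `Yes` row of `rates` gives the share of each gender that works overtime, which is the fairer comparison between groups of different sizes.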

Analysis of Numerical Features

Distribution of Age with a mean age vertical line

Ordinal Features

##   Education   n
## 1         1  98
## 2         2 182
## 3         3 324
## 4         4 240
## 5         5  26

Observation 4

  • More employees hold a Bachelor degree (324 of 870) than any other education level. Employees with a Bachelor degree might have higher expectations of the company, and we will explore whether this dataset supports that as a reason for attrition.
  • EnvironmentSatisfaction is mostly rated High or Very High (520 of 870 employees). Only 172 employees rate the work environment Low, so it is unlikely to be a major cause of changing jobs.
  • JobInvolvement is generally high: 514 of 870 employees rate it 3 ‘High’.
##   EnvironmentSatisfaction   n
## 1                       1 172
## 2                       2 178
## 3                       3 258
## 4                       4 262
##   JobInvolvement   n
## 1              1  47
## 2              2 228
## 3              3 514
## 4              4  81

## `geom_smooth()` using formula = 'y ~ x'

Observation 5

  • There seems to be a positive linear relationship: as age and experience increase, so does income.
##   JobLevel   n
## 1        1 329
## 2        2 312
## 3        3 132
## 4        4  60
## 5        5  37
## `summarise()` has grouped output by 'JobLevel'. You can override using the
## `.groups` argument.

##    NumCompaniesWorked   n
## 1                   0 111
## 2                   1 320
## 3                   2  74
## 4                   3  91
## 5                   4  85
## 6                   5  43
## 7                   6  39
## 8                   7  46
## 9                   8  28
## 10                  9  33
## `summarise()` has grouped output by 'NumCompaniesWorked'. You can override
## using the `.groups` argument.

##   StockOptionLevel   n
## 1                0 379
## 2                1 355
## 3                2  81
## 4                3  55
## `summarise()` has grouped output by 'StockOptionLevel'. You can override using
## the `.groups` argument.

## Warning in cor(case_study_numeric): the standard deviation is zero

Observations: Factors Responsible for Employee Attrition

  • OverTime has the strongest relationship with Attrition. Employees who work overtime are more likely to change or leave their job early. We observed this during the categorical variable analysis as well.
  • Age is the second strongest factor. The Welch t-test earlier in this report shows that employees who left are younger on average (33.8 vs. 37.4 years). Possible explanations are that senior employees are retiring, or that employees with a Bachelor degree have higher expectations of the organization and may feel burned out.
  • MonthlyIncome is the third strongest factor in employee attrition.

Modeling

##   Attrition   n
## 1        No 730
## 2       Yes 140
##                Attrition
## classifications  No Yes
##             No  709  93
##             Yes  21  47
## Confusion Matrix and Statistics
## 
##                Attrition
## classifications  No Yes
##             No  709  93
##             Yes  21  47
##                                           
##                Accuracy : 0.869           
##                  95% CI : (0.8447, 0.8907)
##     No Information Rate : 0.8391          
##     P-Value [Acc > NIR] : 0.00809         
##                                           
##                   Kappa : 0.3875          
##                                           
##  Mcnemar's Test P-Value : 2.936e-11       
##                                           
##             Sensitivity : 0.9712          
##             Specificity : 0.3357          
##          Pos Pred Value : 0.8840          
##          Neg Pred Value : 0.6912          
##              Prevalence : 0.8391          
##          Detection Rate : 0.8149          
##    Detection Prevalence : 0.9218          
##       Balanced Accuracy : 0.6535          
##                                           
##        'Positive' Class : No              
## 
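The headline statistics in the output above can be reproduced by hand from the four cell counts, which makes it clearer what each one measures. This sketch uses the counts from the first confusion matrix, with ‘No’ as the positive class as reported; the object names are illustrative.

```r
# Counts copied from the first confusion matrix (rows = predicted, cols = actual)
cm <- matrix(
  c(709, 21,   # actual No:  predicted No, predicted Yes
     93, 47),  # actual Yes: predicted No, predicted Yes
  nrow = 2,
  dimnames = list(Predicted = c("No", "Yes"), Actual = c("No", "Yes"))
)

accuracy    <- sum(diag(cm)) / sum(cm)             # correct predictions / all
sensitivity <- cm["No", "No"] / sum(cm[, "No"])    # actual No classified as No
specificity <- cm["Yes", "Yes"] / sum(cm[, "Yes"]) # actual Yes classified as Yes
balanced    <- (sensitivity + specificity) / 2

print(round(c(accuracy = accuracy, sensitivity = sensitivity,
              specificity = specificity, balanced = balanced), 4))
# 0.869, 0.9712, 0.3357, 0.6535: the same values as in the output above
```

The low specificity shows that this model identifies only about a third of the employees who actually left, even though its overall accuracy is high.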
## [1] 870  36
## [1] 730  36
## [1] 140  36
## [1] 140  36
## [1] 280  36
##                
## classifications  No Yes
##             No  103  34
##             Yes  37 106
## Confusion Matrix and Statistics
## 
##                
## classifications  No Yes
##             No  103  34
##             Yes  37 106
##                                           
##                Accuracy : 0.7464          
##                  95% CI : (0.6912, 0.7963)
##     No Information Rate : 0.5             
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.4929          
##                                           
##  Mcnemar's Test P-Value : 0.8124          
##                                           
##             Sensitivity : 0.7357          
##             Specificity : 0.7571          
##          Pos Pred Value : 0.7518          
##          Neg Pred Value : 0.7413          
##              Prevalence : 0.5000          
##          Detection Rate : 0.3679          
##    Detection Prevalence : 0.4893          
##       Balanced Accuracy : 0.7464          
##                                           
##        'Positive' Class : No              
## 
## [1] 730  36
## [1] 1460   36
##                
## classifications  No Yes
##             No  528   1
##             Yes 202 729
## Confusion Matrix and Statistics
## 
##                
## classifications  No Yes
##             No  528   1
##             Yes 202 729
##                                           
##                Accuracy : 0.861           
##                  95% CI : (0.8421, 0.8783)
##     No Information Rate : 0.5             
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.7219          
##                                           
##  Mcnemar's Test P-Value : < 2.2e-16       
##                                           
##             Sensitivity : 0.7233          
##             Specificity : 0.9986          
##          Pos Pred Value : 0.9981          
##          Neg Pred Value : 0.7830          
##              Prevalence : 0.5000          
##          Detection Rate : 0.3616          
##    Detection Prevalence : 0.3623          
##       Balanced Accuracy : 0.8610          
##                                           
##        'Positive' Class : No              
## 

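Observation from the KNN prediction model with Age and MonthlyIncome as predictors of Attrition

  • With oversampling and k = 3, the model reaches an accuracy of around 86%, with a sensitivity of around 72% and a specificity of around 99%. Because the positive class is ‘No’, sensitivity here is the share of retained employees classified correctly, and specificity is the share of departing employees classified correctly. These figures are computed on the 1,460-row oversampled set.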
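The oversampling and KNN steps can be sketched as follows. This is not the project's code: it runs on synthetic data, uses `knn()` from the `class` package, and adds two steps that do not appear in the output above, namely a hold-out test set taken before oversampling and scaling of the predictors. All object names and the synthetic group means are assumptions for illustration.

```r
library(class)  # provides knn()

set.seed(42)

# Synthetic, imbalanced stand-in for the real data (80 "No", 20 "Yes")
n_no <- 80
n_yes <- 20
toy <- data.frame(
  Age = c(rnorm(n_no, 38, 8), rnorm(n_yes, 32, 8)),
  MonthlyIncome = c(rnorm(n_no, 7000, 2500), rnorm(n_yes, 4500, 2000)),
  Attrition = factor(rep(c("No", "Yes"), times = c(n_no, n_yes)))
)

# Hold out a test set before oversampling, so duplicated rows cannot leak into it
test_idx <- sample(nrow(toy), size = 30)
train <- toy[-test_idx, ]
test <- toy[test_idx, ]

# Oversample the minority class (with replacement) to match the majority class
no_rows <- train[train$Attrition == "No", ]
yes_rows <- train[train$Attrition == "Yes", ]
yes_over <- yes_rows[sample(nrow(yes_rows), nrow(no_rows), replace = TRUE), ]
train_bal <- rbind(no_rows, yes_over)

# Scale predictors using training statistics; KNN is distance-based,
# so an unscaled MonthlyIncome would dominate Age
predictors <- c("Age", "MonthlyIncome")
train_x <- scale(train_bal[, predictors])
test_x <- scale(test[, predictors],
                center = attr(train_x, "scaled:center"),
                scale = attr(train_x, "scaled:scale"))

# Classify each test row by majority vote among its 3 nearest training rows
pred <- knn(train = train_x, test = test_x, cl = train_bal$Attrition, k = 3)

print(table(Predicted = pred, Actual = test$Attrition))
```

Evaluating on rows that were held out before oversampling generally gives a more conservative estimate than scoring the oversampled set itself, because with small k a duplicated row can be matched to its own copy.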