Easy 2-Factor ANOVA in R

The CGPfunctions provides a super-fast method to visualize the complete results of a two-factor ANOVA.

Libraries


require(tidyverse)
require(CGPfunctions)
require(skimr)
require(Hmisc) # %nin%
require(cowplot)

Data

You can download the data using this link

data <- read.csv(file = path) 

Data Wrangling

data %>% head()
data %>% skimr::skim()

data <- janitor::clean_names(data)

data <- 
   data %>%
   filter(gender %nin% c("","Other"),
          education_level %nin% c("","Other")) %>% 
   mutate(gender = factor(gender,levels = c("Male","Female")),
                education_level = factor(education_level, 
                                         levels  = data$education_level[data$education_level %nin%c("","Other")] %>% unique()))

data %>% count(education_level) # There's a PhD coded as phD
#data[which(data$education_level == "phD"),"education_level"] = "PhD"
#data[which(data$education_level == "Bachelor's Degree"),"education_level"] = "Bachelor's"
#data[which(data$education_level == "Master's Degree"),"education_level"] = "Master's"

# education_level should be ordered
data$education_level <- factor(data$education_level,levels = c("High School","Bachelor's","Master's","PhD"))

&ldquo;head data&rdquo;

Descriptive plots

Let’s graph the variables we are interested in

a1 <- 
   ggpubr::ggviolin(data %>% filter(gender != "Other"), 
               y = "salary", x = "gender",
               add = "boxplot",
               fill = "gender")+
   coord_cartesian(ylim = c(0,250000)) +
   xlab("")+
   theme(legend.position = "none")+
   ggtitle("Salary by Gender")

a2 <- 
   ggpubr::ggviolin(data %>% filter(gender != "Other"), 
               y = "salary", x = "education_level",
               add = "boxplot",
               fill = "education_level")+
   coord_cartesian(ylim = c(0,250000)) +
   xlab("")+
   theme(legend.position = "none")+
   ggtitle("Salary by Educational level")

cowplot::plot_grid(a1,a2)

&ldquo;factor plot&rdquo;

ANOVA 2WAY

We are going to proceed with the two way ANOVA. I want to remember that a good practice would be to check the assumptions before directly jumping into the analysis. However, the objective of this post is to provide a preview of the function and its output.

CGPfunctions::Plot2WayANOVA(dataframe = data,
                            formula = salary ~ education_level*gender,
                            errorbar.display = "SEM",
                            mean.label = T)

&ldquo;main&rdquo;
The function returns four main outputs:

  • Main result of the ANOVA

  • Descriptive table of disaggregated means with confidence intervals and errors

  • Posthoc analysis

  • Interaction plot

Anova Results

The object can be saved for easier access to its elements. For example, to better visualize the overall ANOVA table, you can access it through the command: o1$ANOVATable.

print(o1$ANOVATable[c("term","df","statistic","p.value","etasq","cohens.f")])

&ldquo;anova table&rdquo;
According to the results, we can see that both the effects of the two variables and the interaction are significant in the model.

#Descriptive table of disaggregated means with confidence intervals and errors
o1$MeansTable

&ldquo;means table&rdquo;

post-hoc analyses

An important output is the post-hoc analysis. We can observe the main effects within each factor and the comparisons between the interaction of the levels. We can change the approach of these analyses within the parameter of the function posthoc.method

o1$PosthocTable


# Posthoc multiple comparisons of means: Scheffe Test 
#    95% family-wise confidence level
#
#$education_level
#                            diff    lwr.ci    upr.ci   pval    
#Bachelor's-High School  60667.30  53067.99  68266.60 <2e-16 ***
#Master's-High School    95662.77  87769.20 103556.34 <2e-16 ***
#PhD-High School        131235.85 123078.73 139392.96 <2e-16 ***
#Master's-Bachelor's     34995.48  30622.17  39368.78 <2e-16 ***
#PhD-Bachelor's          70568.55  65735.76  75401.34 <2e-16 ***
#PhD-Master's            35573.07  30289.62  40856.52 <2e-16 ***
#
#$gender
#                 diff    lwr.ci    upr.ci   pval    
#Female-Male -11218.79 -14865.24 -7572.353 <2e-16 ***
#
#$`education_level:gender`
#                                            diff     lwr.ci       upr.ci    pval    
#Bachelor's:Male-High School:Male       59591.017   48145.24   71036.7985 < 2e-16 ***
#Master's:Male-High School:Male        100679.841   88564.22  112795.4623 < 2e-16 ***
#PhD:Male-High School:Male             129330.320  117324.48  141336.1620 < 2e-16 ***
#High School:Female-High School:Male    -8624.704  -22998.23    5748.8221  0.6516    
#Bachelor's:Female-High School:Male     49784.468   38066.84   61502.0935 < 2e-16 ***
#Master's:Female-High School:Male       83313.791   71501.15   95126.4271 < 2e-16 ***
#PhD:Female-High School:Male           120885.541  108106.75  133664.3283 < 2e-16 ***
#Master's:Male-Bachelor's:Male          41088.824   34770.45   47407.2020 < 2e-16 ***
#PhD:Male-Bachelor's:Male               69739.303   63634.07   75844.5360 < 2e-16 ***
#High School:Female-Bachelor's:Male    -68215.721  -78202.30  -58229.1365 < 2e-16 ***
#Bachelor's:Female-Bachelor's:Male      -9806.549  -15323.47   -4289.6325 1.8e-07 ***
#Master's:Female-Bachelor's:Male        23722.774   18006.83   29438.7164 < 2e-16 ***
#PhD:Female-Bachelor's:Male             61294.524   53782.46   68806.5828 < 2e-16 ***
#PhD:Male-Master's:Male                 28650.479   21366.51   35934.4503 < 2e-16 ***
#High School:Female-Master's:Male     -109304.545 -120052.30  -98556.7899 < 2e-16 ***
#Bachelor's:Female-Master's:Male       -50895.373  -57693.81  -44096.9367 < 2e-16 ***
#Master's:Female-Master's:Male         -17366.050  -24326.97  -10405.1330 4.9e-16 ***
#PhD:Female-Master's:Male               20205.700   11707.85   28703.5466 2.1e-14 ***
#High School:Female-PhD:Male          -137955.023 -148578.87 -127331.1724 < 2e-16 ***
#Bachelor's:Female-PhD:Male            -79545.851  -86146.66  -72945.0403 < 2e-16 ***
#Master's:Female-PhD:Male              -46016.529  -52784.57  -39248.4906 < 2e-16 ***
#PhD:Female-PhD:Male                    -8444.779  -16785.36    -104.1929  0.0442 *  
#Bachelor's:Female-High School:Female   58409.172   48112.15   68706.1954 < 2e-16 ***
#Master's:Female-High School:Female     91938.494   81533.48  102343.5091 < 2e-16 ***
#PhD:Female-High School:Female         129510.245  118020.10  141000.3864 < 2e-16 ***
#Master's:Female-Bachelor's:Female      33529.323   27286.82   39771.8248 < 2e-16 ***
#PhD:Female-Bachelor's:Female           71101.073   63180.98   79021.1651 < 2e-16 ***
#PhD:Female-Master's:Female             37571.750   29511.76   45631.7438 < 2e-16 ***
#
#---
#Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#

ANOVA plot

Lastly, and perhaps more illustrative, the graph of simple effects. To change the simple effect, we just need to alter the direction of the interaction within the parameter of the function.


CGPfunctions::Plot2WayANOVA(dataframe = data,
                            formula = salary ~ education_level*gender, # switch for different graph
                            errorbar.display = "SEM",
                            mean.label = T)

&ldquo;second plot&rdquo;
Additionally, the function easily provides us with the Bayesian analysis of the model and the assessment of the assumption of homoscedasticity.

o1$Bayesian_models
# 
#   model                                                    bf support                    margin_of_error
#   <chr>                                                 <dbl> <chr>                                <dbl>
# 1 education_level                                   Inf       " data support is extreme"        1.47e- 7
# 2 education_level + gender                          Inf       " data support is extreme"        1.28e- 2
# 3 education_level + gender + education_level:gender Inf       " data support is extreme"        4.45e- 2
# 4 gender                                              1.23e22 " data support is extreme"        2.05e-26
Adrián Muñoz García
Adrián Muñoz García
Data Scientist and Statistics Professor

My research interests are Behavioral Economics, Cognition and Methods