📖 Background

The principal of a large school wants to know whether the test preparation courses are helpful and how parental education level affects test scores.

💪Objectives

  1. What are the average reading scores for students with/without the test preparation course?
  2. What are the average scores for the different parental education levels?
  3. Create plots to visualize findings for questions 1 and 2.
  4. Look at the effects within subgroups. Compare the average scores for students with/without the test preparation course for different parental education levels (e.g., faceted plots).
  5. The principal wants to know if kids who perform well on one subject also score well on the others. Look at the correlations between scores.
  6. Summarize the findings.

💾 The data

The data has the following fields:

  • “gender” - male / female
  • “race/ethnicity” - one of 5 combinations of race/ethnicity
  • “parent_education_level” - highest education level of either parent
  • “lunch” - whether the student receives free/reduced or standard lunch
  • “test_prep_course” - whether the student took the test preparation course
  • “math” - exam score in math
  • “reading” - exam score in reading
  • “writing” - exam score in writing
library(tidyverse)

data <- read_csv("C:/Users/Adejumo/Downloads/exams.csv")

head(data)
## # A tibble: 6 x 8
##   gender `race/ethnicity` parent_education~ lunch test_prep_course  math reading
##   <chr>  <chr>            <chr>             <chr> <chr>            <dbl>   <dbl>
## 1 female group B          bachelor's degree stan~ none                72      72
## 2 female group C          some college      stan~ completed           69      90
## 3 female group B          master's degree   stan~ none                90      95
## 4 male   group A          associate's degr~ free~ none                47      57
## 5 male   group C          some college      stan~ none                76      78
## 6 female group B          associate's degr~ stan~ none                71      83
## # ... with 1 more variable: writing <dbl>
skimr::skim(data)
Table 1: Data summary

Name                     data
Number of rows           1000
Number of columns        8
Column type frequency:
  character              5
  numeric                3
Group variables          None

Variable type: character

skim_variable           n_missing  complete_rate  min  max  empty  n_unique  whitespace
gender                          0              1    4    6      0         2           0
race/ethnicity                  0              1    7    7      0         5           0
parent_education_level          0              1   11   18      0         6           0
lunch                           0              1    8   12      0         2           0
test_prep_course                0              1    4    9      0         2           0

Variable type: numeric

skim_variable  n_missing  complete_rate   mean     sd  p0    p25  p50  p75  p100  hist
math                   0              1  66.09  15.16   0  57.00   66   77   100  ▁▁▅▇▃
reading                0              1  69.17  14.60  17  59.00   70   79   100  ▁▂▆▇▃
writing                0              1  68.05  15.20  10  57.75   69   79   100  ▁▂▅▇▃

Exploratory Data Analysis

Average reading scores for students with/without the test preparation course

Students who took the test preparation course

# Mean reading score of students who completed the course
mean_reading_tpc <- data %>% 
  filter(test_prep_course == "completed") %>% 
  summarise(mean_reading = mean(reading)) %>% 
  pull(mean_reading)

# Density of reading scores, with the group mean marked in red
data %>% 
  filter(test_prep_course == "completed") %>% 
  ggplot(aes(x = reading)) +
  geom_density(fill = "skyblue",
               alpha = 0.5) +
  geom_vline(xintercept = mean_reading_tpc, size = 0.5, color = "red") +
  annotate(x = mean_reading_tpc, y = +Inf, label = round(mean_reading_tpc, 2), vjust = 2, geom = "label") +
  xlab("Reading score") +
  ggtitle("Reading scores of students who took the test preparation course")

Students who did not take the test preparation course

# Mean reading score of students who did not take the course
mean_reading <- data %>% 
  filter(test_prep_course == "none") %>% 
  summarise(mean_reading = mean(reading)) %>% 
  pull(mean_reading)

# Density of reading scores, with the group mean marked in red
data %>% 
  filter(test_prep_course == "none") %>% 
  ggplot(aes(x = reading)) +
  geom_density(fill = "pink",
               alpha = 0.5) +
  geom_vline(xintercept = mean_reading, size = 0.5, color = "red") +
  annotate(x = mean_reading, y = +Inf, label = round(mean_reading, 2), vjust = 2, geom = "label") +
  xlab("Reading score") +
  ggtitle("Reading scores of students who did not take the test preparation course")

Students who took the test preparation course have the higher average reading score, 73.89, and their density plot suggests that the majority of them scored above the overall reading average (69.17 in the data summary).
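Both group means can also be computed side by side in one grouped summary, and a two-sample test gives a rough sense of whether the gap is larger than sampling noise. The sketch below is illustrative only: the `toy` tibble is made-up data, not rows from exams.csv, and the Welch t-test is one reasonable choice of test rather than something the analysis above ran.

```r
library(dplyr)

# Toy data for illustration only; these are NOT rows from exams.csv
toy <- tibble(
  test_prep_course = c("completed", "completed", "completed", "none", "none", "none"),
  reading = c(80, 74, 68, 70, 62, 66)
)

# One row per group: group size, mean and standard deviation of reading
reading_by_prep <- toy %>%
  group_by(test_prep_course) %>%
  summarise(n = n(),
            mean_reading = mean(reading),
            sd_reading = sd(reading),
            .groups = "drop")

reading_by_prep

# Welch two-sample t-test (base R): compares the two group means
# without assuming equal variances
welch <- t.test(reading ~ test_prep_course, data = toy)
welch$p.value
```

Replacing `toy` with `data` would apply the same steps to the exam scores.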

Average scores on the different parental educational levels

# Mean score per subject for each parental education level
data %>% 
  group_by(parent_education_level) %>% 
  summarize(Mathematics = round(mean(math),1),
            Reading = round(mean(reading),1),
            Writing = round(mean(writing),1)) %>% 
  # Long format: one row per (education level, subject) pair
  pivot_longer(cols = c("Mathematics", "Reading", "Writing"), 
               names_to = "subject",
               values_to = "scores") %>% 
  # Heat map: darker tiles mean higher average scores
  ggplot(aes(subject, parent_education_level)) +
  geom_tile(aes(fill = scores), colour = "white") +
  scale_fill_gradient(low = "white", high = "steelblue")+
  geom_text(aes(label = scores)) +
  theme(legend.position = "none") +
  xlab("Average scores in each Subject") +
  ylab("Parent Level of Education")

On average, children whose parents attained a higher level of education recorded higher test scores.
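Because `parent_education_level` is a character column, ggplot2 orders the heat map rows alphabetically rather than by level of education, which makes the trend harder to read. A minimal sketch of one fix is to convert the column to a factor with explicit levels. The level names in `education_order` are an assumption (only some of them are visible in the output above), so check them against `unique(data$parent_education_level)` first; `parent_education` is a toy vector standing in for the real column.

```r
# Assumed level names, lowest to highest; verify against
# unique(data$parent_education_level) before relying on them
education_order <- c("some high school", "high school", "some college",
                     "associate's degree", "bachelor's degree", "master's degree")

# Toy vector standing in for data$parent_education_level
parent_education <- c("master's degree", "some college", "high school")

# A factor keeps the given level order instead of alphabetical order
parent_education <- factor(parent_education, levels = education_order)

levels(parent_education)
sort(parent_education)
```

In the pipeline above, the same conversion could be applied with `mutate(parent_education_level = factor(parent_education_level, levels = education_order))` before `group_by()`.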

Average scores for students with/without the test preparation course for different parental education levels

Mathematics

# Mean scores per (test preparation, parental education) group
data %>% 
  group_by(test_prep_course, parent_education_level) %>% 
  summarize(Mathematics = round(mean(math),1),
            Reading = round(mean(reading),1),
            Writing = round(mean(writing),1),
            .groups = "drop") %>% 
  # Each facet holds one mean per group, so every box collapses
  # to a single line at that mean
  ggplot(aes(test_prep_course, Mathematics, colour = test_prep_course)) +
  geom_boxplot() +
  facet_wrap(vars(parent_education_level)) +
  xlab("Test Preparation Course") +
  ylab("Mathematics Test Score") +
  ggtitle("Mathematics") +
  labs(colour = "Test Preparation Course")

Reading

# Same grouped means, plotted for reading
data %>% 
  group_by(test_prep_course, parent_education_level) %>% 
  summarize(Mathematics = round(mean(math),1),
            Reading = round(mean(reading),1),
            Writing = round(mean(writing),1),
            .groups = "drop") %>% 
  ggplot(aes(test_prep_course, Reading, colour = test_prep_course)) +
  geom_boxplot() +
  facet_wrap(vars(parent_education_level)) +
  xlab("Test Preparation Course") +
  ylab("Reading Test Score") +
  ggtitle("Reading") +
  labs(colour = "Test Preparation Course")

Writing

# Same grouped means, plotted for writing
data %>% 
  group_by(test_prep_course, parent_education_level) %>% 
  summarize(Mathematics = round(mean(math),1),
            Reading = round(mean(reading),1),
            Writing = round(mean(writing),1),
            .groups = "drop") %>% 
  ggplot(aes(test_prep_course, Writing, colour = test_prep_course)) +
  geom_boxplot() +
  facet_wrap(vars(parent_education_level)) +
  xlab("Test Preparation Course") +
  ylab("Writing Test Score") +
  ggtitle("Writing") +
  labs(colour = "Test Preparation Course")

Students whose parents attained a higher education level performed better than those whose parents attained a lower one, regardless of whether they took the test preparation course.
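The comparison within subgroups can also be read off a table that puts the two course groups next to each other and reports the gap for each education level. The sketch below shows one way to build such a table with `pivot_wider()`; the `toy` tibble and the `gap` column name are made up for illustration and do not come from exams.csv.

```r
library(dplyr)
library(tidyr)

# Toy data for illustration only; these are NOT rows from exams.csv
toy <- tibble(
  parent_education_level = rep(c("high school", "master's degree"), each = 4),
  test_prep_course = rep(c("completed", "completed", "none", "none"), times = 2),
  math = c(66, 70, 60, 64, 78, 82, 72, 76)
)

prep_gap <- toy %>%
  # Mean math score for every (education level, course) combination
  group_by(parent_education_level, test_prep_course) %>%
  summarise(mean_math = mean(math), .groups = "drop") %>%
  # One column per course group, one row per education level
  pivot_wider(names_from = test_prep_course, values_from = mean_math) %>%
  # Positive gap: students who completed the course scored higher
  mutate(gap = completed - none)

prep_gap
```

A gap that stays roughly constant across rows would suggest the course helps similarly at every education level, while a varying gap would point to an interaction worth a closer look.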

Relationship between students' test scores

data %>% 
  select(math, reading, writing) %>% 
  cor() %>% 
  corrplot::corrplot(method = "number")

There is a strong positive correlation between the test scores, especially between reading and writing. Students who perform well in one subject are likely to perform well in the others.
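For readers who want the numbers behind the plot, the matrix that `cor()` returns can be inspected directly, and `cor.test()` adds a confidence interval for a single pair of subjects. This is a minimal base R sketch on made-up scores in `toy`, not values from exams.csv.

```r
# Toy scores for illustration only; these are NOT rows from exams.csv
toy <- data.frame(
  math    = c(50, 60, 70, 80, 90),
  reading = c(55, 62, 75, 79, 92),
  writing = c(53, 63, 74, 80, 91)
)

# Pearson correlation matrix: symmetric, with 1 on the diagonal
cor_matrix <- cor(toy)
round(cor_matrix, 2)

# 95% confidence interval for the reading-writing correlation
reading_writing <- cor.test(toy$reading, toy$writing)
reading_writing$conf.int
```

With the real data, `cor(data[, c("math", "reading", "writing")])` would produce the matrix that the correlation plot displays.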

Summary

From the above analysis, we can conclude that the test preparation course is associated with higher student performance, and that children of parents with higher educational qualifications scored higher than those whose parents have lower qualifications, whether or not they took the course.