Fascinating Findings from “On the Spatial Determinants of Educational Access”
The research paper "On the Spatial Determinants of Educational Access" by Francesco Agostinelli and Giacomo De Giorgi is an incredible read that investigates the impact of geographic factors on educational access. The study uses a unique dataset containing information on more than 5 million individuals in Italy, including their residence location, educational attainment, and school quality. The authors apply machine learning techniques to identify the most important factors that affect educational access, and they find that geographic factors play a significant role. Let’s take a look at a few…
This first equation is used to estimate the impact of school vouchers on educational access:
$$\Pr(\text{high-quality school}) = F\left(\beta_0 + \beta_1\,\text{Voucher} + \beta_2\,\text{School Quality}_{n,t} + \lambda_n + \gamma_t + \varepsilon_j\right)$$
Their main finding is that voucher programs have a positive impact on school quality in Florida. Specifically, they find that voucher participation is associated with a statistically significant increase in the probability of a school being high-quality. They estimate that the probability of a school being high-quality increases by around 7 percentage points for schools that participate in the voucher program compared to schools that do not participate. Let’s consider the variables more closely:
Pr(high-quality school): This is the probability of a school being high-quality. The value of this probability ranges from 0 (meaning the school is very unlikely to be high-quality) to 1 (meaning the school is very likely to be high-quality).
F: This represents a function that maps the values of the other variables to the probability of a school being high-quality. The specific form of the function depends on the statistical model being used.
β0: This is the intercept term of the model. It is the value of the index inside F when all other variables are equal to 0, so the corresponding baseline probability is F(β0).
β1: This is the coefficient associated with the Voucher variable. It represents the expected change in the index inside F for each unit increase in the Voucher variable, holding all other variables constant. This equals a change in probability only when F is the identity (a linear probability model); for a nonlinear F such as a logit or probit, the change in probability also depends on the values of the other variables.
β2: This is the coefficient associated with the School Quality variable. It is interpreted the same way as β1, but for a one-unit increase in the School Quality variable, holding all other variables constant.
n: This indexes the school being considered in the model. It is a categorical index with one level per school.
t: This indexes the time period being considered in the model. It is a categorical index with one level per time period.
λn: This is a set of school-specific effects, one per school, that account for unobserved factors that may influence a school's probability of being high-quality, but that are not included in the model.
γt: This is a set of time-specific effects, one per time period, that account for unobserved factors that may influence the probability of a school being high-quality over time, but that are not included in the model. Whether λn and γt are treated as random or fixed effects depends on the estimator; the later equations in this post describe them as fixed effects.
εj: This is the residual term of the model, which represents the variability in the probability of a school being high-quality that is not explained by any of the other variables in the model.
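To make the role of F concrete, here is a minimal sketch that assumes F is the logistic function. The post leaves the form of F open, so that choice is an assumption, and the coefficient values are hypothetical numbers picked for illustration, not estimates from the paper. The point it illustrates: β1 shifts the index inside F, and the resulting change in probability depends on where the school starts.

```python
import math


def logistic(z: float) -> float:
    """Assumed form of F: maps any real-valued index to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def voucher_effect(beta0: float, beta1: float, other_terms: float = 0.0) -> float:
    """Change in Pr(high-quality school) when Voucher moves from 0 to 1.

    other_terms stands in for beta2 * School Quality + lambda_n + gamma_t,
    held fixed while the voucher indicator changes.
    """
    without_voucher = logistic(beta0 + other_terms)
    with_voucher = logistic(beta0 + beta1 + other_terms)
    return with_voucher - without_voucher


# Hypothetical coefficients, not taken from the paper.
BETA0, BETA1 = 0.0, 0.3

effect_mid = voucher_effect(BETA0, BETA1)                    # baseline probability 0.50
effect_high = voucher_effect(BETA0, BETA1, other_terms=2.0)  # baseline probability near 0.88

print(f"effect at a 0.50 baseline: {effect_mid:.3f}")
print(f"effect at a 0.88 baseline: {effect_high:.3f}")
```

The same β1 produces a smaller change in probability for a school that already has a high baseline, which is why a coefficient from a logit-style model cannot be read directly as percentage points.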
This next equation is a statistical model used in the research paper to understand the relationship between educational access and geographical factors, while controlling for grade and year fixed effects as well as unobserved factors that might affect enrollment rates. The results show that school quality has a significant positive effect on educational access, suggesting that improving the quality of schools could help address educational inequality in geographically disadvantaged areas:
$$\ln r_{j,n,t} = \beta_0 + \beta_1 \ln \text{School Quality}_{n,t} + \lambda_n + \gamma_t + \varepsilon_{j,n,t}$$
The left-hand side is the natural logarithm of the enrollment rate of students from a particular area, j, in grade n of a school in year t. Like before, let’s understand the variables more closely:
ln rj,n,t is the dependent variable, representing the natural logarithm of the enrollment rate of students from area j in grade n of a school in year t.
ln School Qualityn,t is the independent variable of interest, representing the natural logarithm of the quality of schools in the relevant grade and year.
λn is a fixed effect for grade n, which controls for any systematic differences in enrollment rates across different grades.
γt is a fixed effect for year t, which controls for any systematic changes in enrollment rates over time.
εj,n,t is the error term, representing unobserved factors that affect the enrollment rate of students from area j in grade n of a school in year t.
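Because both sides of this equation are in logs, β1 reads as an elasticity: a 1% increase in school quality goes with roughly a β1% change in the enrollment rate. The sketch below shows one common way to estimate such a coefficient with grade and year fixed effects, by subtracting grade and year means before running a simple regression. It runs on simulated data with a hypothetical β1 of 0.8; the panel sizes, noise level, and function names are all assumptions for illustration, and nothing here reproduces the paper's data or its estimation code.

```python
import random
from collections import defaultdict


def two_way_demean(values, grades, years):
    """Within transformation for a balanced panel: remove grade and year means."""
    grand_mean = sum(values) / len(values)
    by_grade, by_year = defaultdict(list), defaultdict(list)
    for v, g, t in zip(values, grades, years):
        by_grade[g].append(v)
        by_year[t].append(v)
    grade_mean = {g: sum(vs) / len(vs) for g, vs in by_grade.items()}
    year_mean = {t: sum(vs) / len(vs) for t, vs in by_year.items()}
    return [v - grade_mean[g] - year_mean[t] + grand_mean
            for v, g, t in zip(values, grades, years)]


def simulate_panel(true_beta1, n_areas=50, n_grades=5, n_years=8, seed=0):
    """Simulated rows of (grade, year, ln quality, ln enrollment rate)."""
    rng = random.Random(seed)
    grade_fe = [rng.gauss(0, 1) for _ in range(n_grades)]  # lambda_n
    year_fe = [rng.gauss(0, 1) for _ in range(n_years)]    # gamma_t
    log_quality = {(n, t): rng.gauss(0, 1)
                   for n in range(n_grades) for t in range(n_years)}
    rows = []
    for _area in range(n_areas):
        for n in range(n_grades):
            for t in range(n_years):
                log_r = (0.5 + true_beta1 * log_quality[(n, t)]
                         + grade_fe[n] + year_fe[t] + rng.gauss(0, 0.1))
                rows.append((n, t, log_quality[(n, t)], log_r))
    return rows


def estimate_beta1(rows):
    """Slope of demeaned ln r on demeaned ln quality (no intercept needed)."""
    grades = [r[0] for r in rows]
    years = [r[1] for r in rows]
    x = two_way_demean([r[2] for r in rows], grades, years)
    y = two_way_demean([r[3] for r in rows], grades, years)
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


TRUE_BETA1 = 0.8  # hypothetical elasticity used to generate the data
beta1_hat = estimate_beta1(simulate_panel(TRUE_BETA1))
print(f"true beta1 = {TRUE_BETA1}, estimated beta1 = {beta1_hat:.3f}")
```

The demeaning shortcut is exact only for a balanced panel, where every grade and year combination has the same number of observations; with unbalanced data, a regression with explicit grade and year dummies is the safer route.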
One of the other key equations used in the paper is the regression model for the probability of attending a high-quality school:
$$\Pr(\text{high-quality school}) = F\left(\beta_0 + \beta_1\,\text{School Quality}_{n,t} + \lambda_n + \gamma_t + \varepsilon_{j,n,t}\right)$$
This equation models the probability of a student attending a high-quality school as a function of school quality, the λn and γt fixed effects, and a random error term. The parameter estimates capture the effect of school quality on the probability of attending a high-quality school, net of whatever those fixed effects absorb.
Another equation used in the paper is the model for the geographic distribution of high-quality schools:
$$\text{Quality}_{n,t} = f(\text{latitude}, \text{longitude}, \text{demographics}, \text{infrastructure})$$
This equation models the quality of schools as a function of geographic location, demographic characteristics, and infrastructure. The authors use machine learning techniques to estimate this equation, allowing them to identify the most important factors that affect school quality. They find that geographic location plays a significant role in determining school quality, even after controlling for other factors.
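One generic way to ask "which inputs matter most" for a fitted f is permutation importance: shuffle one input at a time and see how much the prediction error grows. The post does not say which machine learning method the authors used, so the sketch below is only a stand-in. It uses a small k-nearest-neighbors regressor written from scratch, on simulated data in which quality is constructed to depend mostly on latitude; the data, the model, and all function names are assumptions for illustration.

```python
import random


def knn_predict(train_x, train_y, point, k=5):
    """Average the targets of the k nearest training points (squared Euclidean distance)."""
    nearest = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], point)),
    )[:k]
    return sum(train_y[i] for i in nearest) / k


def mse(train_x, train_y, test_x, test_y):
    """Mean squared prediction error on the test rows."""
    errors = [(knn_predict(train_x, train_y, x) - y) ** 2
              for x, y in zip(test_x, test_y)]
    return sum(errors) / len(errors)


def permutation_importance(train_x, train_y, test_x, test_y, rng):
    """Increase in test error after shuffling one feature column at a time."""
    baseline = mse(train_x, train_y, test_x, test_y)
    importances = []
    for col in range(len(test_x[0])):
        shuffled = [row[col] for row in test_x]
        rng.shuffle(shuffled)
        permuted = [row[:col] + [v] + row[col + 1:]
                    for row, v in zip(test_x, shuffled)]
        importances.append(mse(train_x, train_y, permuted, test_y) - baseline)
    return importances


FEATURES = ["latitude", "longitude", "demographics", "infrastructure"]
rng = random.Random(0)


def make_rows(n_rows):
    """Simulated schools: quality is built to depend mainly on latitude."""
    xs = [[rng.random() for _ in FEATURES] for _ in range(n_rows)]
    ys = [3.0 * x[0] + 0.5 * x[2] + rng.gauss(0, 0.05) for x in xs]
    return xs, ys


train_x, train_y = make_rows(300)
test_x, test_y = make_rows(100)
importances = permutation_importance(train_x, train_y, test_x, test_y, rng)

for name, score in sorted(zip(FEATURES, importances), key=lambda p: -p[1]):
    print(f"{name:15s} {score:.3f}")
```

Because latitude drives the simulated quality by construction, it comes out on top here; on real data the ranking would depend on the model and on how correlated the inputs are with each other.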
The paper also uses the following equation to estimate the impact of commuting time on educational access:
$$\ln r_{j,n,t} = \beta_0 + \beta_1\,\text{Commuting Time}_{j,n,t} + \lambda_n + \gamma_t + \varepsilon_{j,n,t}$$
This equation models the logarithm of r (the enrollment rate in the earlier equation) as a function of commuting time, the λn and γt fixed effects, and a random error term. The parameter estimates capture the effect of commuting time on educational access, net of whatever those fixed effects absorb.
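Since the left-hand side is in logs while commuting time enters in levels, β1 has a semi-elasticity reading: changing commuting time by Δ multiplies r by exp(β1·Δ). Here is a small worked example of that arithmetic. The coefficient value and the choice of minutes as the unit are hypothetical, used only to show the calculation, and are not results from the paper.

```python
import math


def percent_change_in_r(beta1: float, delta: float) -> float:
    """Exact percent change in r implied by a change of `delta` in commuting time."""
    return 100.0 * (math.exp(beta1 * delta) - 1.0)


BETA1 = -0.02  # hypothetical coefficient per extra minute of commuting time
DELTA = 10     # ten additional minutes (the unit is an assumption)

exact = percent_change_in_r(BETA1, DELTA)
approx = 100.0 * BETA1 * DELTA  # the usual "100 * beta1 percent per unit" shortcut

print(f"exact change in r:  {exact:.1f}%")
print(f"linear shortcut:    {approx:.1f}%")
```

The shortcut of reading 100·β1 as a percent change per unit works well for small changes and drifts away from the exact figure as β1·Δ grows.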
This is a very simplified and brief look at the many machine learning techniques used in studying the geography of educational access. Giving the paper a full and proper read will help you understand the findings more comprehensively and in context!