The Promise of Infant Health Investments
Exploring the relationship between healthcare, infant mortality, and academic achievement
Disclaimer
Nothing in this article should be misconstrued as medical advice. Any health decisions about an infant related to you/someone you know should be made in consultation with a licensed medical provider.
Introduction
“Those nights were dark and long.
Those days were grey and mean,
And yet you were so strong.
We watched your heart on screen,
And feared the lights and beeps;
The reds, the blues and greens.
We prayed for God to keep;
You safe as you were small;
To watch you grow and sleep.
Good things would soon befall.
Against all odds, it’s true,
That we would leave those halls.
The worst of days are through.
The darkest nights are done,
I thank resilient you.
Your life has just begun,
O happy birthday, Son.”
- Robert Maziar
In 2022, 1.4 percent of infants born in the United States were designated as having “very-low birthweight” or VLBW (National Center for Health Statistics 2024). Though being born low birthweight does not always indicate an underlying health issue, VLBW infants are, on average, at a greater risk of a number of serious conditions, including but not limited to severe breathing problems, brain bleeds, and gastrointestinal issues (March of Dimes 2021). Because VLBW infants exhibit this elevated risk of health problems, hospitals often employ an array of interventions to ensure their survival through infancy:
Cranial ultrasounds allow the physician to identify intraventricular hemorrhages - a type of brain bleed that’s disproportionately present in low birthweight infants (Hand et al 2020).
Surfactant therapy reduces the incidence of respiratory distress syndrome by helping the infant breathe more easily (Polin et al 2014).
Kangaroo mother care consists of maintaining skin-to-skin contact with the infant and breastfeeding exclusively (Darmstadt 2023).
Increased familial support consists of providing greater educational and counseling support to the family members of the VLBW infant. Hospitals often schedule follow-up appointments to ensure that the child is meeting developmental milestones (Darmstadt 2023).
Admission to the Neonatal Intensive Care Unit (NICU) places the infant under round-the-clock supervision from neonatologists, nurses, and other health professionals. Any one of the aforementioned interventions may be administered in the NICU, depending on the infant’s condition.
Importantly, these interventions come at a substantial cost to the healthcare system: despite constituting roughly 8 percent of newborns, preterm and low birthweight infants constitute a whopping 47 percent of costs for infant hospitalizations in the United States (Almond et al 2010). Thus, one might reasonably ask: how do we know that these interventions work?
One way to answer this question is by leveraging an econometric method known as regression discontinuity (RD). Researchers employ the RD method in settings where an arbitrary threshold is used to determine the probability of administering an intervention. In the context of this essay, we may observe that being below the VLBW threshold arbitrarily increases the amount of healthcare that an infant receives.
Now here’s where the magic happens. Because the threshold is arbitrary, the infants just above and just below the threshold are highly comparable to one another. Thus, we can estimate the impact of infant health investments by comparing the outcomes of these two groups of infants to one another. If the group of infants just below the threshold fares better than the group of infants just above the threshold, we can be reasonably confident that infant health investments are an effective intervention.
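To make the comparison concrete, here is a minimal sketch of the idea on simulated data. Everything in it is an assumption of mine for illustration: the 1500g cutoff is treated as a sharp rule, the outcome is a made-up score that rises gently with birthweight, and the names (`fit_line`, `rd_estimate`, `TRUE_EFFECT`) and the 100g bandwidth are hypothetical. It fits a separate straight line on each side of the cutoff and reads off the gap between the two lines at the cutoff.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def rd_estimate(weights, outcomes, cutoff=1500.0, bandwidth=100.0):
    """Below-minus-above jump at the cutoff, from separate linear fits on each side."""
    pairs = list(zip(weights, outcomes))
    below = [(w - cutoff, y) for w, y in pairs if cutoff - bandwidth <= w < cutoff]
    above = [(w - cutoff, y) for w, y in pairs if cutoff <= w <= cutoff + bandwidth]
    a_below, _ = fit_line(*zip(*below))  # intercept = predicted outcome at the cutoff
    a_above, _ = fit_line(*zip(*above))
    return a_below - a_above

rng = random.Random(0)
TRUE_EFFECT = 0.30  # hypothetical benefit of the extra care given below the cutoff
weights = [rng.uniform(1300, 1700) for _ in range(20_000)]
outcomes = [
    0.001 * (w - 1500) + (TRUE_EFFECT if w < 1500 else 0.0) + rng.gauss(0, 0.5)
    for w in weights
]

estimate = rd_estimate(weights, outcomes)
print(round(estimate, 2), "vs simulated effect", TRUE_EFFECT)
```

Because the simulated outcome depends on birthweight, simply comparing raw averages on each side would mix the treatment effect with the underlying trend. Fitting a line on each side is one common way to separate the two.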
At the time of writing, four studies have employed the regression discontinuity method to estimate the impact of infant health investments on infant mortality and/or academic achievement. My aim in writing this article, then, is quite simple: review these studies in detail and determine to what degree increased infant health investments are capable of reducing mortality and producing lasting gains in academic achievement.
Methodological Challenges
Before we can take a deep dive into the studies, we need to take a moment to understand several issues that may threaten the validity of the regression discontinuity method.
Statistical Power
Statistical power refers to the ability of a study to register an effect as statistically significant if there is, in fact, a real effect to be detected. In the context of this essay, I’m primarily concerned with whether or not a study has sufficient power to detect the effects of infant health investments, assuming such effects actually exist.
We ought to be concerned with power because the regression discontinuity method can only be applied to the small sample of births which take place around the VLBW threshold. This issue is compounded by the fact that very preterm infants are typically recommended additional health interventions irrespective of birthweight and, as a result, should be discarded from the sample to avoid downwardly biasing the treatment effect. Thus, many studies might find “no impact” of the VLBW threshold on mortality or academic achievement only because they lacked the sample size, and therefore the power, to detect the effects of infant health investments.
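A rough back-of-the-envelope calculation shows how quickly power evaporates in small samples. The sketch below is not a full regression discontinuity power analysis: it assumes a simple two-sided z-test comparing mortality rates between two equally sized groups, and the inputs (a 5.5 percent mortality rate falling to 4.5 percent, and the sample sizes) are hypothetical values chosen for illustration. The function name `approx_power` is my own.

```python
from statistics import NormalDist

def approx_power(p_untreated, p_treated, n_per_side, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two mortality rates."""
    se = (p_untreated * (1 - p_untreated) / n_per_side
          + p_treated * (1 - p_treated) / n_per_side) ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    shift = abs(p_untreated - p_treated) / se
    # Probability that the test statistic lands in either rejection region
    return NormalDist().cdf(shift - z_crit) + NormalDist().cdf(-shift - z_crit)

# Hypothetical scenario: mortality falls from 5.5% to 4.5% below the threshold
for n in (1_000, 10_000, 100_000):
    print(n, round(approx_power(0.055, 0.045, n), 2))
```

Under these assumptions, a study with a thousand infants on each side of the threshold would miss a one-percentage-point mortality reduction most of the time, which is why a null result from a small sample tells us very little.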
Manipulation Around The Threshold
The regression discontinuity method assumes that the VLBW threshold is arbitrarily imposed. Put another way, it assumes that infants just above and just below the VLBW threshold are highly comparable to one another. But this assumption is violated if birthweights are systematically misreported around the threshold:
Opportunistic hospitals might intentionally misreport births above the VLBW threshold as having occurred below the VLBW threshold to increase NICU utilization, thereby increasing hospital revenue (Shigeoka & Fushimi 2014). If such hospitals are disproportionately located in high-income communities, the method may estimate a positive impact of infant health investments where no such impact exists (i.e. the impact is just an artifact of wealthier babies being disproportionately reported just below the VLBW threshold).
Highly-educated parents might request physicians to misreport births above the VLBW threshold as having occurred below the VLBW threshold to provide their children with the benefits of additional infant health investments. The regression discontinuity method may then estimate a positive impact of such investments where no such impact exists (i.e. the impact is just an artifact of advantaged babies being disproportionately reported just below the VLBW threshold).
To address the issue of manipulation around the threshold, the regression discontinuity method should be accompanied by additional statistical tests which provide assurance that the infants just below and just above the VLBW threshold are truly comparable. Two of these robustness tests are described below:
Balance Tests
Balance tests assess whether the average value of various traits remains similar between individuals just below and just above the threshold in question. In this context, researchers might assess whether parental socioeconomic status (e.g. household income, educational attainment) and parental behaviors (e.g. smoking and alcohol use) differ between infants born on different sides of the VLBW threshold.
Though balance tests have an intuitive appeal, it is important to note that they can only rule out confounding caused by observable traits (i.e. traits which are contained in the dataset being analyzed by the researchers).
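A balance test can be as simple as comparing the mean of a covariate on each side of the threshold. The sketch below uses simulated data: the covariate (maternal years of schooling), its distribution, and the function name `balance_z` are all assumptions for illustration, and a real analysis would typically run the full regression discontinuity model with the covariate as the outcome.

```python
import random
from statistics import mean, variance

def balance_z(below, above):
    """z-statistic for the difference in a covariate's mean across the threshold."""
    se = (variance(below) / len(below) + variance(above) / len(above)) ** 0.5
    return (mean(below) - mean(above)) / se

rng = random.Random(1)
# Hypothetical maternal years of schooling on each side of the threshold
balanced_below = [rng.gauss(13, 2) for _ in range(2000)]
balanced_above = [rng.gauss(13, 2) for _ in range(2000)]
# Manipulated case: more-educated mothers are sorted to just below the threshold
sorted_below = [rng.gauss(14, 2) for _ in range(2000)]

z_balanced = balance_z(balanced_below, balanced_above)
z_sorted = balance_z(sorted_below, balanced_above)
print(round(z_balanced, 2), round(z_sorted, 2))
```

A z-statistic near zero is what we hope to see. A large one, as in the manipulated case, signals that the two groups differ in ways that have nothing to do with medical care.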
Twin Control
Researchers may assess manipulation around the threshold by restricting comparisons to twins born to the same mother, one of whom was born below the VLBW threshold and one of whom was born above the VLBW threshold. The robustness of this strategy lies in its ability to hold unobservable traits constant:
The genetics of the twins will be similar due to sharing the same biological parents.
The behaviors of the mother should be similar when giving birth to both children (i.e. it is implausible to believe that the mother would request the physician to manipulate the listed birthweight of one child but not the other).
The obvious limitation of this method, however, is that it drastically reduces statistical power. Because it is rare to observe twins born on both sides of the VLBW threshold, the twin control method is unlikely to have the necessary statistical power to detect a positive impact of infant health investments.
Survivorship Bias
Survivorship bias occurs when individuals in the population of interest must satisfy a particular criterion to be included in the analysis, biasing the estimated impact of the intervention in question. In the context of this essay, we may note that:
Some infants born just below the VLBW threshold will live long enough to begin schooling as a direct result of infant health investments.
Some infants born just above the VLBW threshold will tragically die before beginning schooling as a direct result of not being provided additional infant health investments.
Because we only observe the academic achievement of those children who lived long enough to begin schooling, the estimated impact of infant health investments on academic achievement may be downwardly biased. Put another way, the regression discontinuity approach may:
Include the less-achievement-oriented infants below the VLBW threshold who survived due to additional infant health investments1
Exclude the less-achievement-oriented infants above the VLBW threshold who passed away due to not being provided with additional infant health investments
Thus, the regression discontinuity approach may underestimate the impact of infant health investments on academic achievement due to survivorship bias.
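A small simulation can illustrate the direction of this bias. The setup is entirely hypothetical: each infant has a latent "frailty" that lowers both survival odds and later test scores, extra care is assumed to let frailer infants survive, and the true effect on achievement is fixed at 0.20 standard deviations. The names `surviving_scores` and `TRUE_EFFECT` and every numeric value are my own assumptions, and the comparison is a plain difference in means rather than a full regression discontinuity model.

```python
import random
from statistics import mean

rng = random.Random(2)
TRUE_EFFECT = 0.20  # hypothetical direct gain in achievement (in SD units)

def surviving_scores(treated, n=50_000):
    """Test scores of the children who survive to school age."""
    scores = []
    for _ in range(n):
        frailty = rng.gauss(0, 1)
        # Assumption: extra care raises the frailty level an infant can survive
        if frailty > (2.5 if treated else 1.5):
            continue  # infant dies, so no test score is ever observed
        score = -0.5 * frailty + rng.gauss(0, 0.5)
        scores.append(score + (TRUE_EFFECT if treated else 0.0))
    return scores

naive_estimate = mean(surviving_scores(True)) - mean(surviving_scores(False))
print(round(naive_estimate, 2), "vs true effect", TRUE_EFFECT)
```

In this toy world the comparison among survivors comes out smaller than the true effect, because the treated group contains frail children who would not have survived without the additional care.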
Non-Random Heaping
Heaping occurs when the distribution of a variable is highly concentrated at particular values in an unnatural manner. In this context, heaping occurs because hospitals often round birthweights to the nearest 10g, 50g, and/or 100g interval. If such heaping is correlated with demographic characteristics, we then have a case of non-random heaping.
A simple example can help illuminate the dangers that non-random heaping poses to the regression discontinuity method. Barreca et al (2015) note that:
Birthweights are commonly - and unnaturally - heaped at the 1500g value.
Birthweights at the 1500g value are disproportionately likely to belong to babies born to Black mothers.
Under these conditions, the regression discontinuity method may estimate a positive impact of infant health investments not because these investments truly improved infant health but instead because infants just at the VLBW threshold disproportionately belong to a marginalized group.
Barreca et al (2015) consider several methods to address the challenge of non-random heaping:
Remove Heaped Data
This strategy invites researchers to simply remove all observations which correspond to heaps (i.e. babies born at 10g/50g intervals). Though this strategy will recover an unbiased estimate of the impact of infant health investments, it has two limitations:
The results will only be generalizable to the types of babies born at non-heaped values. In the example discussed earlier, we would be less confident that the intervention generalizes to infants born to Black mothers.
Removing all observations at heaped values lowers the sample size which, in turn, reduces statistical power.
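To see how non-random heaping can manufacture an effect, and how dropping heaped observations removes it, consider the simulation below. All of the numbers are hypothetical: there is no true treatment effect, a disadvantaged group has higher mortality, and half of that group's birthweights are rounded to the nearest 100g. The comparison is a simple difference in mortality within 50g of the threshold rather than a full regression discontinuity model, and the name `mortality_gap` is my own.

```python
import random

rng = random.Random(3)
records = []  # (recorded_weight, died), simulated with NO true treatment effect
for _ in range(400_000):
    weight = rng.uniform(1400, 1600)
    disadvantaged = rng.random() < 0.3
    died = rng.random() < (0.08 if disadvantaged else 0.04)
    # Assumption: half of disadvantaged births are rounded to the nearest 100g
    if disadvantaged and rng.random() < 0.5:
        weight = round(weight / 100) * 100
    records.append((weight, died))

def mortality_gap(rows, cutoff=1500, window=50):
    """Mortality just below the cutoff minus mortality at or just above it."""
    below = [d for w, d in rows if cutoff - window <= w < cutoff]
    above = [d for w, d in rows if cutoff <= w < cutoff + window]
    return sum(below) / len(below) - sum(above) / len(above)

naive_gap = mortality_gap(records)
unheaped = [(w, d) for w, d in records if w % 100 != 0]  # drop the heaps
clean_gap = mortality_gap(unheaped)
print(round(naive_gap, 4), round(clean_gap, 4))
```

The heap at exactly 1500g sits on the "untreated" side of the threshold and is made up entirely of higher-mortality infants, so the naive comparison shows lower mortality below the threshold even though care has no effect in this simulation. Once the heaped records are dropped, the gap shrinks toward zero.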
Donut Regression Discontinuity
This strategy invites researchers to exclude observations within a set radius of the VLBW threshold. By excluding the small set of observations which lie immediately at/next to the threshold, the donut regression discontinuity design will be uncontaminated by heaping bias which occurs exactly at the 1500g threshold.
This methodology does, however, have an important weakness: namely, it undercuts the principal motivation behind employing a regression discontinuity design. The elegance of the regression discontinuity design lies in its plausible assumption that individuals just below and just above a given threshold are comparable to one another. By excluding the observations which ought to be most comparable to one another, the donut method eliminates bias induced by non-random heaping only to introduce the very bias which the original regression discontinuity method aimed to eliminate at the outset (Noack and Rothe 2023). Thus, it is not readily obvious how to interpret the output of the donut method. A null result might indicate problematic non-random heaping or it could indicate bias created by the method itself.
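Mechanically, the donut is just a filter applied before the model is estimated. The sketch below assumes a 3g radius and treats "within the radius" as inclusive. Both choices, along with the function name `donut_sample`, are illustrative assumptions rather than a recommendation.

```python
def donut_sample(weights, cutoff=1500, radius=3):
    """Keep only birthweights more than `radius` grams away from the cutoff."""
    return [w for w in weights if abs(w - cutoff) > radius]

recorded = [1490, 1497, 1500, 1503, 1504, 1510]
kept = donut_sample(recorded)
print(kept)  # [1490, 1504, 1510]
```

Whatever model is fitted to the remaining observations must then extrapolate across the hole to the threshold itself, which is the weakness described above.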
Placebo Tests
This strategy invites researchers to conduct placebo tests: statistical tests which are conducted with the explicit intention to not detect an effect. In this context, such tests may take a variety of forms:
Researchers can re-estimate the regression discontinuity model at 100g intervals other than the VLBW threshold. Because there are no differences in health investments at non-VLBW thresholds, we should not expect the regression discontinuity model to estimate a positive impact at these thresholds. If the regression discontinuity model does estimate a positive impact at non-VLBW thresholds, we would have reason to suspect that something is contaminating the treatment impacts being estimated by the model (e.g. non-random heaping).
Researchers can re-estimate the regression discontinuity model among babies who were born very preterm (i.e. < 32 weeks). Because very preterm births are recommended additional health interventions irrespective of birthweight, we should not expect the regression discontinuity model to estimate a positive impact at the VLBW threshold among this sample of infants. If the model does estimate a positive impact at the threshold, we would have reason to suspect that something is contaminating the treatment impacts being estimated by the model (e.g. non-random heaping).
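The first type of placebo test amounts to looping the same estimator over cutoffs where nothing should happen. The sketch below runs on simulated data in which the only real jump is at 1500g. The outcome model, the 50g bandwidth, and the helper names `fit_intercept` and `rd_jump` are all assumptions for illustration.

```python
import random

def fit_intercept(points):
    """OLS intercept of y on x for a list of (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    slope = (sum((x - mx) * (y - my) for x, y in points)
             / sum((x - mx) ** 2 for x, _ in points))
    return my - slope * mx

def rd_jump(data, cutoff, bandwidth=50):
    """Below-minus-above jump at `cutoff` from separate linear fits."""
    below = [(w - cutoff, y) for w, y in data if cutoff - bandwidth <= w < cutoff]
    above = [(w - cutoff, y) for w, y in data if cutoff <= w < cutoff + bandwidth]
    return fit_intercept(below) - fit_intercept(above)

rng = random.Random(4)
data = []
for _ in range(100_000):
    w = rng.uniform(1000, 2000)
    # Hypothetical outcome: smooth in birthweight, with a jump only at 1500g
    y = 0.001 * w + (0.3 if w < 1500 else 0.0) + rng.gauss(0, 0.5)
    data.append((w, y))

jumps = {c: rd_jump(data, c) for c in range(1100, 2000, 100)}
for cutoff, jump in jumps.items():
    print(cutoff, round(jump, 2))
```

In a clean design the estimate at 1500g stands out while the placebo estimates scatter around zero. Large or one-sided placebo estimates would suggest that something other than medical care is moving the outcome.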
Indicator Variables
This strategy invites researchers to include variables which indicate whether a given birthweight falls directly on a 10g/50g interval. Including these variables allows the regression discontinuity method to recover an unbiased estimate of the impact of infant health investments under the assumption (key word: “assumption”) that the average outcomes between infants born at heaped vs non-heaped birthweights only differ by a constant amount.
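One way to write this down, using notation that is my own rather than anything taken from Barreca et al (2015), is as a regression with a single heap indicator:

```latex
Y_i = \alpha + \tau \,\mathbf{1}[w_i < 1500] + f(w_i - 1500) + \gamma H_i + \varepsilon_i,
\qquad H_i = \mathbf{1}[\, w_i \text{ is a heaped value} \,]
```

Here $Y_i$ is the outcome for infant $i$, $w_i$ is the recorded birthweight, $f$ is a smooth function of distance from the threshold, and $\tau$ is the effect of interest. The constant-difference assumption is captured by $\gamma$: infants at heaped birthweights are allowed to differ from everyone else, but only by the same fixed amount regardless of where the heap sits.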
Hospital Control
This strategy invites researchers to include variables that control for the hospital that each infant was born at. The underlying assumption (key word: “assumption”) is that, while heaping practices might differ between hospitals (perhaps due to hospitals using birthweight scales with differing degrees of precision), heaping practices are employed consistently within each hospital. Thus, comparisons of infants born at the same hospital should be purged of bias due to non-random heaping.
In practice, researchers might employ multiple of the above strategies to provide greater confidence that their estimated treatment impacts are not compromised by non-random heaping.
External Validity
The regression discontinuity method estimates a local average treatment effect. In plain language, the method only estimates the impact of infant health investments at the VLBW threshold. Thus, we cannot assume that additional infant health investments will universally improve the outcomes of all infants merely because they do so among VLBW infants who constitute an at-risk patient population.
Environmental Amplification
The regression discontinuity method estimates the impact of being just below the VLBW threshold relative to being just above the VLBW threshold. Though this method will capture the impact of infant health investments (because such investments are recommended below the threshold), it might also capture the ways in which the impact of these investments is amplified by one’s post-natal environment. We can consider a hypothetical example:
Arial is the mother of twin girls, Cambria and Calibri:
Cambria was born at 1450g (i.e. a VLBW infant) and received additional health interventions.
Calibri was born at 1550g (i.e. a non-VLBW infant) and was not eligible to receive additional health interventions.
Because only Cambria received additional health interventions, she experienced more rapid development than Calibri. This divergence in development led Arial to unwittingly provide greater attention to Cambria than Calibri. She became more likely to read books to Cambria and provide her with educational toys. Thus, by the time both girls began school, Cambria was well ahead of Calibri in academic achievement.
The hypothetical demonstrates that infant health investments may have both direct and indirect impacts on academic achievement. To assess the possibility of indirect impacts, one can estimate the effect of being just below the VLBW threshold on the post-natal environment experienced by the child (e.g. school quality, parental time investments). A large effect on the post-natal environment would be suggestive of an indirect impact on academic achievement.
Key Takeaway
There are a number of limitations to the regression discontinuity method that must be considered before taking its results at face value.
Infant Mortality & Academic Achievement
Almond et al (2010)
This study applies the regression discontinuity method to two samples:
A nation-wide sample of ~200k infants born in the United States during 1983-2002.
A five-state sample of ~30k infants born in California, New Jersey, Arizona, Maryland, and New York during 1991-2006.
The authors begin by assessing how being born below the VLBW threshold impacts infant health investments.2 This assessment is limited to the five-state sample because the authors were only able to link these states to hospital administrative data containing information on specific treatments and hospital discharge costs.

Using the five-state sample, the authors estimate that being just below the VLBW threshold:
Increases the length of stay in the hospital by 2 (± 0.45) days relative to a mean of ~25 days
Increases the hospital discharge cost - a measure of the cost of the care provided to the infant - by $9.5k (± 2.7k) relative to a mean of ~$82k.
These estimates provide suggestive evidence that hospitals increase the level of care provided to infants as they fall just below the VLBW threshold.
The authors then estimate the impact of being below the VLBW threshold on infant mortality using the nation-wide sample.

These results indicate that being born below the VLBW threshold reduces infant mortality by around 0.67 to 1.21 percentage points relative to a mean mortality rate of 5.5 percent. That’s quite a large reduction in mortality! To understand what specific health interventions drive this reduction, the authors examine how the probability of administering specific treatments changes at the VLBW threshold.
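To put the mortality estimate in proportional terms, the reported percentage-point reductions can be divided by the mean mortality rate. The calculation below uses only the figures quoted above; the variable names are my own.

```python
mean_mortality = 5.5          # percent, as reported above
reductions = (0.67, 1.21)     # percentage points, as reported above

relative = [100 * r / mean_mortality for r in reductions]
print([round(x, 1) for x in relative])  # [12.2, 22.0]
```

In other words, the estimates correspond to a relative reduction in mortality of roughly 12 to 22 percent.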

Fascinatingly, they discover no difference in ventilation usage and noisily estimated differences in additional treatments - NICU admission, cranial ultrasound, heart operations - between infants who lie on different sides of the VLBW threshold. Because the change in specific treatments is noisily estimated, the authors remain agnostic about what types of interventions are primarily responsible for the reduction in mortality:
In the end, differences in our summary measures are consistent with medical care driving the mortality results, but we likely lack the statistical power to detect differences in particular procedures in our five-state sample.
To alleviate concern about the methodological challenges described earlier, the authors conduct a number of robustness tests:
Balance Tests
The authors estimate the impact of being just below the VLBW threshold on a number of control variables (e.g. maternal educational attainment) which should be similar across the threshold. They claim (key word: “claim”) to find no evidence that control variables spike at the threshold, providing greater confidence that infants just above and just below the threshold are born into similar environments.

Cause-of-Death Analysis
The authors re-estimate the impact of being just below the VLBW threshold on 1-year mortality separately by cause-of-death. Consistent with expectations, no effects are observed on deaths due to external causes (e.g. accidents) whereas effects are largest (though noisily estimated) on mortality due to perinatal conditions.

Though the robustness tests discussed above provide some assurance that the reduction in mortality is not entirely spurious, a skeptical reader might issue the following critique:
Plotting the distribution of birthweight reveals two heaps: one at 1500g and one at 1503g. The 1500g heap arises due to rounding at the 100g level while the 1503g heap arises due to rounding at the 1 oz level (i.e. 1503g is equal to 3 lbs and 5 oz).
Importantly, Barreca et al (2015) show that this heaping is non-random: Black infants are disproportionately recorded as being born at 100g intervals.
When Barreca et al (2015) remove these heaped data points, the impact of infant health investments on mortality reduces to effectively zero.
To add insult to injury, the study does not pass basic placebo tests. When Barreca et al (2015) reapply the regression discontinuity method to placebo birthweight thresholds (i.e. thresholds other than the VLBW threshold), one trend immediately stands out.
The placebo estimates are consistently negative. This is very odd because, if infants just above and just below the placebo thresholds are truly comparable and there is no difference in treatment around the placebo threshold, the placebo estimates should be randomly distributed around zero (i.e. there should be both positive and negative placebo estimates, not just negative estimates).
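The intuition behind the last point can be quantified with a simple sign argument. If placebo estimates really were centred on zero and independent of one another (a strong assumption, since neighbouring thresholds may share data), the chance that every one of them comes out negative shrinks quickly. The numbers of placebo thresholds below are hypothetical, and the function name is my own.

```python
def prob_all_negative(n_placebos):
    """Chance that every one of n independent, zero-centred estimates is negative."""
    return 0.5 ** n_placebos

for n in (5, 10, 20):  # hypothetical numbers of placebo thresholds
    print(n, prob_all_negative(n))
```

Even with only a handful of placebo thresholds, a run of uniformly negative estimates would be unlikely to arise by chance alone under these assumptions.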
The critique made by the skeptic is quite sound: non-random heaping clearly undermines the results presented in the study. Interestingly, however, Almond et al (2011) provide a counter-argument. They point out that, while the estimated mortality reduction loses statistical significance in the full sample of hospitals upon adjusting for non-random heaping, the same is not true in the sample of lower-quality hospitals.3

Normally, I would immediately regard such a counter-argument as suspect. I would probably write something like:
This argument is a quintessential example of a questionable research practice. A researcher sees that their study turned out to be a house of cards, so they fish around for a subgroup where the main effect retains significance. When they finally find one, they return to the public and triumphantly declare, “Aha! But the results are real in this group. My result is completely robust.” This type of counter-argument is, of course, nonsense on stilts.
The reason I am not writing such a blistering critique, however, is that the subgroup analysis highlighted in Almond et al (2011) was already touted prominently in Almond et al (2010). For example, Almond et al (2010) states:
Perhaps unsurprisingly, regression estimates … do not give statistically significant estimates for our one-year mortality outcome, with the exception of Level 0/1/2 NICU hospitals—for which we estimate a negative, statistically significant coefficient.
Thus, the allegation that the study authors merely fished around for a desirable subgroup in response to criticism is flatly untrue.4
The better response to this counter-argument, in my opinion, is to point out that the result may still be statistical noise even if it does pass the threshold of statistical significance. For example, Barreca et al (2015) point out that some of the regression discontinuity estimates at placebo thresholds (i.e. thresholds where there should be no differences in medical care) yield large, statistically significant reductions in mortality.

Thus, one should not necessarily assume that a result which achieves statistical significance translates to a real reduction in infant mortality.
One final issue worth highlighting with the study is that it may be under-estimating the impact of additional infant health investments. Importantly, the authors do not mention removing very preterm infants from the sample.5 If they in fact did not exclude these infants from the sample, the effect size would be downwardly biased because the study would include comparisons between very preterm infants born around the VLBW threshold, both of whom received additional medical care.
To wrap things up then, the present study does not provide sufficient evidence that additional infant health investments below the VLBW threshold reduce 1-year infant mortality because it is compromised by non-random heaping.
Chyn, Gold, and Hastings (2021)
This study applies the regression discontinuity method to a sample of ~2k infants born in Rhode Island during 2002-2015. To deal with the issue of non-random heaping, the authors exclusively utilize the donut regression discontinuity method, excluding all infants born within 3g of the VLBW threshold.
The authors begin by assessing how being born below the VLBW threshold impacts infant health investments. They estimate that being born just below the VLBW threshold increases length of stay in the NICU by 3.4 (± 1.5) days relative to a mean of 9.9 days.

Having presented evidence that being below the VLBW threshold plausibly increases infant health investments, they then estimate the impact of the VLBW threshold on academic achievement as measured by standardized tests taken during third, fifth, and eighth grade. The results are shown below:

These results indicate that being born below the VLBW threshold improves academic achievement in third-grade by 0.44 (± 0.16) standard deviations. To put this number into context, we should note that this effect is just under half the size of the Black-White achievement gap in Rhode Island during the same timeframe.

The authors argue that the improvements to third-grade achievement are not susceptible to the methodological challenges described earlier by conducting a number of tests:
Balance Tests
The authors estimate the impact of being just below the VLBW threshold on a number of control variables (e.g. maternal educational attainment) which should be similar across the threshold. They find no evidence that control variables spike at the threshold, providing greater confidence that infants just above and just below the threshold are born into similar environments.

Placebo Tests
The authors apply the regression discontinuity method to birthweight thresholds other than the VLBW threshold. Consistent with expectations, the model only estimates a large significant impact in the expected direction at the 1500g VLBW threshold.

Though the robustness tests discussed above provide some assurance that the improvements to third-grade achievement are not entirely spurious, a skeptical reader might issue the following critique:
The impact of the intervention attenuates to 0.31 (± 0.17) standard deviations by eighth-grade, a result which is not significant at the conventional α=0.05 level. To add insult to injury, a visual representation of the results tells a much less rosy story.
Any sane person looking at this visualization would not leave with the takeaway that academic achievement “spikes” at the VLBW threshold. This result is most likely statistical noise, and it’s indefensible to argue that infant health investments increase achievement on this basis.
There are four reasons why the skeptic’s critique is over-stated:
Fertility: The authors estimate that parents with a child born just below the VLBW threshold are 6.4 (± 2.8) percentage points more likely to have an additional child relative to parents with a child born just above the VLBW threshold. This difference would naturally attenuate the impact of infant health investments below the VLBW threshold. Why? Because these infants would be more likely to have a sibling born in subsequent years, dividing the parental attention that would have otherwise been concentrated solely on them. Thus, the positive impacts of infant health investments would be attenuated by the negative impacts of less parental attention.
No Covariate Adjustment: The authors do not explicitly adjust for any covariates in their regression discontinuity model. In layman’s terms, the authors do not include control variables like maternal educational attainment in their regression discontinuity models. I imagine that an astute reader will be puzzled by this statement. They might say:
But why would you need to include these variables? We know based on the balance tests that these variables are roughly equal across the cutoff.
The primary reason is to increase the precision of the estimates. By partitioning out the variation in academic achievement driven by covariates, we can increase the statistical power of the study (Cattaneo, Keele, and Titiunik 2022).
Very Preterm Births: Many medical interventions that are recommended below the VLBW threshold are also recommended to very preterm infants irrespective of birthweight. Importantly, the authors do not mention removing very preterm infants from the sample.6 If they in fact did not exclude these infants from the sample, the effect size would be downwardly biased because the study would include comparisons between very preterm infants born around the VLBW threshold, both of whom received additional medical care.
College Enrollment: The authors estimate that being born just below the VLBW threshold increases the probability of college enrollment by 20 (± 7.7) percentage points. Unlike the eighth-grade achievement results, these results are statistically significant at the α=0.05 level and yield more compelling visual inferences.
This magnitude of improvement in college enrollment makes it more plausible to think that additional infant health investments below the VLBW threshold truly do improve academic achievement in the long-run.
Nonetheless, I agree with the skeptic’s overarching point that we cannot solely rely on this study to conclude that infant health investments produce large lasting gains in academic achievement. The eighth-grade achievement results in particular are too noisy to yield a definitive conclusion.
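The significance statements in this section can be checked with a quick calculation. The sketch below assumes that each “±” value quoted above is a standard error (an assumption on my part, though it is the reading consistent with the eighth-grade estimate being described as not significant) and uses a normal approximation. The function name `two_sided_p` is hypothetical.

```python
from statistics import NormalDist

def two_sided_p(estimate, std_error):
    """Two-sided p-value from a normal approximation."""
    z = abs(estimate / std_error)
    return 2 * (1 - NormalDist().cdf(z))

# Assumption: each "±" value reported above is a standard error
results = {
    "third-grade achievement": (0.44, 0.16),
    "eighth-grade achievement": (0.31, 0.17),
    "college enrollment": (20, 7.7),
}
p_values = {name: two_sided_p(est, se) for name, (est, se) in results.items()}
for name, p in p_values.items():
    print(name, round(p, 3))
```

Under that assumption, the third-grade and college enrollment estimates clear the α=0.05 bar while the eighth-grade estimate falls short of it, which matches the pattern described above.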
Daysal et al (2022)
This study applies the regression discontinuity method to a sample of ~2k infants born in Denmark during 1982-1993. The authors are unfortunately unable to empirically assess whether the magnitude of health interventions increases just at the VLBW threshold because they lack data on specific treatments administered to infants. They do, however, cite a number of sources describing the importance that Danish health authorities place on the VLBW threshold:
Danish neonatal medicine textbooks pay particular attention to VLBW children (those weighing less than 1,500 grams, regardless of gestational age) and very premature newborns (those with a gestational age less than 32 weeks, regardless of birth weight) … papers [also] indicate that children below 1,500 grams or born before 32 weeks of gestation are more likely to receive additional treatments such as cranial ultrasound (Greisen et al., 1986), antibiotics (Topp, Uldall, & Greisen, 2001), prophylactic treatment with nasal continuous positive airway pressure, prophylactic surfactant treatment and high priority of breast feeding, and use of the kangaroo method (Jacobsen et al., 1993; Verder et al., 1994; Verder, 2007; Mathiasen et al., 2008).
Armed with this knowledge, the authors estimate the impact of being below the VLBW threshold on indices of infant mortality and ninth-grade academic achievement. The results are shown below:

These results indicate that additional infant health investments reduce mortality in the short-run by 1.01 (± 0.408) standard deviations and increase academic achievement by 0.564 (± 0.234) standard deviations.7 To put the achievement result in context, the authors note:
Among all children born during the period covered by our sibling sample, the difference in language (math) scores between the children of nonimmigrants and immigrants is 0.264 (0.404) standard deviation … We also calculate that the difference in language (math) test scores among those born in households above the 90th income percentile and those born in households below the 10th income percentile is 0.557 (0.769) standard deviation.
The authors then extend this analysis by estimating the impact of being below the VLBW threshold on the siblings of the child in question. In particular, they compare ninth-grade academic achievement between the sibling of a child born just below the VLBW threshold and the sibling of a child born just above the VLBW threshold.
There are at least two reasons why we might expect additional health investment below the VLBW threshold to impact siblings:
Parental Investment: If infants just below the VLBW threshold become healthier over time as a result of additional health investments, parents will be able to devote more attention to the siblings of these infants.
Infection Transmission: If infants just below the VLBW threshold become healthier over time as a result of additional health investments, they will be less likely to transmit infectious diseases to their siblings, which could, in turn, have positive impacts on academic achievement.
The results of this extended analysis indicate that additional infant health investments improve the academic achievement of siblings by 0.524 (± 0.193) standard deviations.
The authors conduct a number of tests to argue that the improvements to ninth-grade achievement are not susceptible to the methodological challenges described earlier:
Balance Tests
The authors estimate the impact of being just below the VLBW threshold on a number of control variables (e.g. maternal educational attainment) which should be similar across the threshold. They find no evidence that control variables spike at the threshold, providing greater confidence that infants just above and just below the threshold are born into similar environments.

Placebo Tests
The authors apply the regression discontinuity method to birthweight thresholds other than the VLBW threshold. Consistent with expectations, the model only estimates a large significant impact in the expected direction at the 1500g VLBW threshold.

They also apply the regression discontinuity method to the sample of very preterm births. Because very preterm infants are offered additional healthcare irrespective of birthweight, we should not expect to see a significant effect size in this sample. Consistent with expectations, the model does not estimate a significant impact of the VLBW threshold on mortality and academic achievement.


Additional Heaping Tests
The authors employ two strategies to deal with the issue of non-random heaping:
They use a donut regression discontinuity, excluding infants born exactly at 1500g.
They include indicator variables at 50g heaps.
The results are robust to both of these strategies.
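To make the donut idea concrete, here is a minimal sketch on synthetic data. It is not the authors' code: the uniform kernel, the fixed 150g bandwidth, the function name `rd_jump`, and every number below are assumptions chosen purely for illustration. The synthetic data contains a true jump of -0.05 below 1500g, plus a heap of infants recorded at exactly 1500g whose outcomes differ for reasons unrelated to treatment.

```python
import numpy as np

def rd_jump(weight, outcome, cutoff=1500.0, bandwidth=150.0, drop_weights=()):
    """Local linear RD estimate of the jump for infants below `cutoff`.

    Uniform kernel and a fixed bandwidth are simplifying assumptions.
    `drop_weights` lists heaped birthweights to exclude (the "donut").
    """
    weight = np.asarray(weight, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    keep = np.abs(weight - cutoff) <= bandwidth
    keep &= ~np.isin(weight, np.asarray(list(drop_weights), dtype=float))
    x = weight[keep] - cutoff            # running variable centered at cutoff
    y = outcome[keep]
    below = (x < 0).astype(float)        # 1 if born below the threshold
    # Separate intercepts and slopes on each side of the cutoff.
    design = np.column_stack([np.ones_like(x), below, x, below * x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]                       # jump at the cutoff for the below group

# Synthetic, noise-free outcomes with a true jump of -0.05 below 1500g.
true_jump = -0.05
grid = np.concatenate([np.arange(1350, 1500), np.arange(1501, 1651)]).astype(float)
y_grid = 0.10 - 0.0001 * (grid - 1500.0) + true_jump * (grid < 1500.0)

# A hypothetical heap: 200 infants recorded at exactly 1500g with worse outcomes.
weight = np.concatenate([grid, np.full(200, 1500.0)])
outcome = np.concatenate([y_grid, np.full(200, 0.30)])

naive = rd_jump(weight, outcome)
donut = rd_jump(weight, outcome, drop_weights=[1500.0])
print(f"naive estimate: {naive:.3f}, donut estimate: {donut:.3f}")
```

In this toy setup the heap at 1500g drags the naive estimate well away from the true jump, while the donut estimate recovers it. Real applications would add noise, standard errors, and a data-driven bandwidth.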


Though the robustness tests discussed above provide some assurance that the improvements to mortality and achievement are not entirely spurious, a skeptical reader might issue the following critique:
The study estimates the impact of the VLBW threshold on 22 outcomes: 6 child outcomes + 4 sibling outcomes + 4 maternal outcomes + 4 paternal outcomes. We should thus expect 1 statistically significant finding purely by chance. Though the authors employ multiple testing corrections to deal with this issue, these corrections render every single improvement in mortality and achievement statistically insignificant at the α=0.05 level.
Additionally, a visual representation of the results once again tells a much less rosy story than the individual statistics.
Note that one cannot excuse these disappointing results by proclaiming “but what about college enrollment!” because the study explicitly finds that additional health investments below the VLBW threshold do not significantly increase enrollment in the Danish higher education system.
Lastly, there is a bigger picture question about whether infant health investments actually increase below the VLBW threshold. The authors claim that such investments increase by citing evidence from medical authorities, but they never actually show this to be the case. That seems like a glaring problem if you want to know which specific infant health investments are driving the observed effects!
Thus, this study cannot be used to prove that infant health investments improve the outcomes of infants born below the VLBW threshold.
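The multiple-testing arithmetic behind this critique can be sketched as follows. The article does not say which correction the authors used, so Bonferroni and Holm appear here only as two common examples, and the p-values are made up for illustration rather than taken from the study.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of significant results if every null hypothesis is true."""
    return n_tests * alpha

def bonferroni(p_values, alpha=0.05):
    """Reject only p-values at or below alpha divided by the number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def holm(p_values, alpha=0.05):
    """Holm step-down procedure: compare sorted p-values to growing thresholds."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject

# Hypothetical p-values for 22 outcomes (NOT the study's actual values).
p_values = [0.001, 0.0023, 0.021] + [0.5] * 19

print(expected_false_positives(22))   # about 1.1 chance findings at alpha = 0.05
print(sum(bonferroni(p_values)))      # number of rejections under Bonferroni
print(sum(holm(p_values)))            # number of rejections under Holm
```

With 22 tests, an unadjusted p-value of 0.021 is comfortably below 0.05 yet nowhere near either adjusted threshold, which is the pattern the skeptic is pointing to.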
There are three reasons why the skeptic’s critique is over-stated:
Observations Near The Threshold: In the visual representation of the results, the differences between the below-threshold and above-threshold groups are largest right around the threshold, consistent with these two groups being the most comparable to one another.
Sibling Spillover Visualization: The visual representation of the sibling achievement results looks quite compelling.
Covariate Adjustment: When the authors include a basic set of covariates in the model (e.g. maternal educational attainment), the estimates for the achievement outcome become highly significant, consistent with covariate adjustment increasing statistical power (Cattaneo, Keele, and Titiunik 2021). The corresponding p-values would likely survive multiple testing corrections.
p-values associated with regression discontinuity estimates on mortality and ninth-grade achievement. Unadjusted p-values were computed manually based on the z-statistics derived from bias-corrected estimates and robust standard errors. Attribution: Table 2, Table 3, and Table A4 of Daysal et al (2022).
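For readers who want to reproduce this kind of calculation, here is a minimal sketch of turning an estimate and its standard error into a two-sided p-value under a normal approximation. The function name is an assumption of this sketch, and the inputs are placeholders rather than the study's numbers.

```python
import math

def two_sided_p_value(estimate, standard_error):
    """Two-sided p-value for z = estimate / standard_error under a standard normal."""
    z = estimate / standard_error
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))

# Placeholder inputs: an estimate 1.96 standard errors away from zero.
print(round(two_sided_p_value(1.96, 1.0), 3))
```

Feeding in a bias-corrected estimate and its robust standard error, as the caption describes, yields the unadjusted p-value for that outcome.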
Nonetheless, I agree with the skeptic that it would have been nice to know which specific treatments actually increased below the VLBW threshold. Acquiring this knowledge would help clarify which specific treatments mediate the relationship between infant health investments and reduced mortality/increased achievement.
Bharadwaj, Løken, and Neilson (2013)
This study applies the regression discontinuity method to two samples of infants:
~5k infants born in Chile during 1992-2007
~1.5k infants born in Norway during 1986-1993
To deal with the issue of non-random heaping, they include indicator variables at 100g birthweight heaps. The authors begin by estimating the impact of being below the VLBW threshold on the length of stay in the hospital and the probability of admission to the NICU in Chile and Norway, respectively.8 The results are shown below:

Now those are some beautiful regression discontinuity plots! These results indicate that being born just below the VLBW threshold increases length of stay in the hospital by 4 (± 1.6) days and the probability of NICU admission by 14.3 (± 5.2) percentage points in Chile and Norway, respectively.
Armed with this knowledge, the authors estimate the impact of being below the VLBW threshold on infant mortality and academic achievement. Importantly, academic achievement is measured differently between the two countries:
In Chile, academic achievement is measured in one of two ways: (1) grades in math and language courses, averaged across first through eighth-grade (2) fourth-grade national standardized math and language tests.
In Norway, academic achievement is measured in one of two ways: (1) tenth-grade grades (2) tenth-grade national standardized tests which may take place in one of three subjects: math, Norwegian, or English. The test subject is chosen by each school.
The impacts on infant mortality are shown below:

These results indicate that additional health investments below the VLBW threshold reduce the probability of infant mortality by 4.5 (± 1.8) percentage points and 3.1 (± 1.3) percentage points in Chile and Norway, respectively. These reductions are quite large when one considers that average infant mortality was 10.9 percent and 3.6 percent in the Chilean and Norwegian samples, respectively.
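To see why these reductions count as large, one can express each estimate as a share of the sample's average mortality. This is a rough back-of-the-envelope comparison: the sample average is not necessarily the mortality rate that infants just above the threshold would have experienced, so the ratios below are only indicative.

```python
def relative_reduction(effect_pp, baseline_pct):
    """Estimated reduction (percentage points) as a share of the baseline rate (percent)."""
    return effect_pp / baseline_pct

chile = relative_reduction(4.5, 10.9)    # estimate and sample mean quoted above
norway = relative_reduction(3.1, 3.6)

print(f"Chile:  {chile:.0%} of average mortality")
print(f"Norway: {norway:.0%} of average mortality")
```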
The impacts on academic achievement are shown below:

These results indicate that additional health investments below the VLBW threshold improve math grades by 0.152 (± 0.583) standard deviations and national math standardized test scores by 0.228 (± 0.087) standard deviations in Chile and Norway, respectively.
The authors conduct a number of tests to argue that the improvements to math achievement are not susceptible to the methodological challenges described earlier:
Balance Tests
The authors estimate the impact of being just below the VLBW threshold on a number of control variables (e.g. maternal educational attainment) which should be similar across the threshold. They find no evidence that control variables spike at the threshold, providing greater confidence that infants just above and just below the threshold are born into similar environments.

Placebo Tests
The authors apply the regression discontinuity method to birthweight thresholds other than the VLBW threshold. Consistent with expectations, the model only estimates a large significant impact in the expected direction at the 1500g VLBW threshold.


They also apply the regression discontinuity method to the sample of very preterm births. Because very preterm infants are offered additional healthcare irrespective of birthweight, we should not expect to see a significant effect size in this sample. Consistent with expectations, the model does not estimate a significant impact of the VLBW threshold on mortality and academic achievement.


Twin Control
The authors restrict comparisons to twins who are born on different sides of the VLBW threshold. Because this restriction substantially reduces the size of the sample, the authors are unfortunately only able to look at the impact of health investments on the mortality outcome.

The authors ultimately find that restricting comparisons to twins does not substantially attenuate the effect of infant health investments, providing strong evidence that the reduction in mortality is not an artifact of manipulating reported birthweight.
Additional Heaping Tests
The authors employ three strategies to deal with the issue of non-random heaping:
The authors employ a donut regression discontinuity, excluding infants born exactly at 1500g.
They exclude data at 10g, 50g, and 100g heaps, yielding an unbiased estimate of the impact of infant health investments among infants born at non-heaped birthweights.
They control for the hospital at which each infant was born.
The results are generally robust to these strategies.


Though the robustness tests discussed above provide some assurance that the improvements to mortality and achievement are not entirely spurious, a skeptical reader might issue the following critique:
Though the achievement improvements in Norway seem genuine, the achievement “improvement” in Chile is quite underwhelming. This becomes especially clear when the authors examine the impact of being below the VLBW threshold on national standardized test scores as opposed to classroom-level grades in Chile:
The increase in math achievement just below the VLBW threshold is not statistically significant, and average language achievement is lower just below the VLBW threshold relative to just above the threshold.
I agree that the impact on Chilean national standardized tests looks to be minimal. The authors do, however, note the following about the Chilean test score data:
While providing rich data on student characteristics, the amount of observations with SIMCE scores in the VLBW range is limited both because it was administered in years that cover about half the births between 1992 and 2002 and because of overall lower match rates due to missing or corrupted IDs in the SIMCE data.
Although I suspect that this issue would not downwardly bias the impact of being below the VLBW threshold on academic achievement, it’s worth keeping it in mind when interpreting the test score results.
Mechanisms
Thus far, this essay has taken the approach of examining the impact of infant health investments purely through a statistical lens:
“Do the outcomes statistically significantly differ between infants just above and just below the VLBW threshold?”
“Is the result contaminated by non-random heaping?”
“Do the regression discontinuity plots look visually compelling?”
While this approach has its merits, rigorous scientific analysis shouldn’t be limited to just bickering over statistical significance and effect sizes. It should also involve a discussion of the plausibility of the mechanisms underlying the phenomenon in question … so let’s have that discussion!
Improved Respiratory Functioning
One way that infant health investments could reduce mortality and increase academic achievement is by improving respiratory functioning. There are two independent lines of evidence that support this view: (1) the efficacy of surfactant therapy (2) the negative effects of air pollution.
Surfactant Therapy
Surfactant therapy is a health intervention that facilitates the ability of the infant to breathe. In Polin et al (2014), the American Academy of Pediatrics conducted a narrative review of RCTs which administered surfactant therapy to at-risk infants. They concluded:
Surfactant replacement, given as prophylaxis or rescue treatment, reduces the incidence of RDS, air leaks, and mortality in preterm infants with RDS [respiratory distress syndrome].
Bharadwaj, Løken, and Neilson (2013) provide suggestive evidence that the benefits of surfactant therapy extend to academic achievement as well. They conduct a subgroup analysis in which they separately analyze the impact of infant health investments prior to and following the introduction of surfactant therapy. The results of this subgroup analysis are shown below:

As the table shows, the impact on academic achievement is particularly large in the period following the introduction of surfactant therapy. Importantly, however, this subgroup analysis does not definitively prove that surfactant therapy mediates the relationship between infant health investments and academic achievement because (1) it’s entirely possible that other medical advances were made during this same time period (2) an interaction term is needed to formally support this conclusion. Harrell (2024) explains this latter issue in more detail.
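The difference between a subgroup analysis and an interaction term can be sketched as follows. This is a deliberately simplified illustration with made-up cell means: it omits the running variable and standard errors that a real regression discontinuity model would need, and the variable names (`below`, `post`) are assumptions of this sketch.

```python
import numpy as np

def fit_interaction(below, post, outcome):
    """OLS of outcome on below, post, and below*post; returns named coefficients."""
    below = np.asarray(below, dtype=float)
    post = np.asarray(post, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    design = np.column_stack([np.ones_like(below), below, post, below * post])
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return dict(zip(["intercept", "below", "post", "below_x_post"], coef))

# Made-up mean achievement (in sd) for each cell: (below threshold?, post-surfactant era?)
below = [0, 1, 0, 1]
post = [0, 0, 1, 1]
outcome = [0.00, 0.05, 0.10, 0.30]

coef = fit_interaction(below, post, outcome)
# Subgroup effects: 0.05 before surfactant, 0.20 after.
# The interaction coefficient is the difference between them.
print(round(coef["below_x_post"], 3))
```

A subgroup analysis reports the two effects separately, whereas the interaction coefficient (with its own standard error) is what formally tests whether they differ.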
Air Pollution
There is a large body of literature demonstrating that air pollution increases mortality and reduces academic achievement. I want to note at the outset, however, that this literature exhibits publication bias, leading to implausibly large estimates of the pollution → mortality relationship (Bagilet 2024).9 Thus, in each of the studies that follow, I only describe the sign of the estimated relationship as opposed to the magnitude of the estimated relationship (which may be upwardly biased).
Deryugina et al (2019) leverage day-to-day changes in wind direction to measure the impact of air pollution on mortality among the elderly. To understand their methodology, note that changes in wind direction impact the amount of air pollution that is transported from one area of the country to another. For example, Boston experiences greater air pollution on days when the wind blows from the southwest because New York City is located southwest of Boston.

This example can be applied more generally: namely, one can estimate the causal impact of air pollution by analyzing whether mortality in a given city increases on days when wind direction leads to excessive amounts of pollution, controlling for changes in other weather-related variables (e.g. changes in temperature and precipitation). Using this approach, the study concludes that increases in air pollution meaningfully increase mortality among the elderly.
Persico and Johnson (2021) analyze the impact of an Environmental Protection Agency (EPA) order which temporarily halted pollution compliance measures during the COVID-19 pandemic. They begin by constructing a set of control and treatment counties:
Control Counties: Counties that had 1-5 Toxic Release Inventory (TRI) sites at the time of the EPA order.
Treatment Counties: Counties that had 6+ Toxic Release Inventory (TRI) sites at the time of the EPA order.
They then estimate, at 1-week intervals, how the difference in weekly COVID-19 death rates between the treatment and control counties changes following the introduction of the EPA order.

They ultimately find that the difference in weekly COVID-19 deaths between treatment and control counties increases following the EPA’s rollback of compliance measures. Thus, the study provides evidence that the EPA order led to increased COVID-19 deaths most likely via increased levels of air pollution.
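The logic of this comparison can be sketched with a two-period difference-in-differences. This is a simplification of the study's week-by-week design, and the death rates below are invented for illustration only.

```python
def mean(values):
    return sum(values) / len(values)

def difference_in_differences(treat_pre, treat_post, control_pre, control_post):
    """Change in the treatment group minus the change in the control group."""
    treat_change = mean(treat_post) - mean(treat_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treat_change - control_change

# Invented weekly death rates per 100,000 residents (NOT the study's data).
estimate = difference_in_differences(
    treat_pre=[2.0, 2.2], treat_post=[3.0, 3.4],
    control_pre=[1.8, 2.0], control_post=[2.2, 2.4],
)
print(round(estimate, 2))
```

A positive estimate means death rates rose more in the treatment counties than in the control counties after the order, which is the pattern the study reports.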
Ebenstein, Lavy, and Roth (2016) analyze how exposure to PM2.5 (a type of air pollutant) impacts performance on multi-day, high-stakes examinations in Israel. They observe that the same student’s performance is markedly worse on exam days with high levels of PM2.5 relative to exam days with low PM2.5.

Heissel, Persico, and Simon (2022) estimate how academic achievement changes between two groups of students living in the same zip code:
Students who move to a middle/high school that is downwind of a highway (i.e. a school that is more exposed to air pollution).
Students who move to a middle/high school that is not downwind of a highway (i.e. a school that is less exposed to air pollution).

They estimate that students attending downwind-schools experience less improvement in academic achievement relative to their peers attending non-downwind schools.

Lastly, Biasi, Lafortune, and Schönholzer (2024) use a highly sophisticated application of the regression discontinuity method to examine how spending on school HVAC systems impacts student achievement. In essence, they compare academic achievement between school districts with the same history of capital spending, one of which just barely approves a bond to increase HVAC spending and one of which just barely rejects a bond to increase HVAC spending. They estimate that the approval of a bond to increase HVAC spending improves district-level academic achievement in the short run.

Though the above lines of evidence are compelling, there also exists evidence which is inconsistent with the respiratory functioning hypothesis.
Daysal et al (2024) note that younger siblings are at an elevated risk of respiratory disease during their first year of life because they are exposed to infectious diseases that are brought home by older siblings.

This elevated respiratory disease risk provides a natural way to assess the degree to which respiratory functioning impacts academic achievement. Namely, one can estimate how the younger-older sibling achievement gap varies between areas with high respiratory disease risk and areas with low respiratory disease risk. If early-life respiratory functioning truly impacts achievement, we should expect to see larger younger-older sibling achievement gaps in areas with high respiratory disease risk.

Employing this strategy yields little evidence that respiratory disease risk impacts achievement. The authors state:
We find that an additional respiratory hospitalization in the municipality per 100 children aged 13–71 months reduces the 9th grade Danish and math test scores by about 0.008 and 0.003 of a standard deviation more for younger siblings than older siblings, respectively. But while the effect on the Danish test score is marginally significant at the 10% level, the coefficient for the math test score is not statistically significant at conventional levels.
Thus, it remains unclear whether improved respiratory functioning mediates the relationship between infant health investments and later-life academic achievement.
Admission to the NICU
Both Chyn, Gold, and Hastings (2021) and Bharadwaj, Løken, and Neilson (2013) presented evidence that being born below the VLBW threshold increases the amount of time that infants spend in the NICU. This additional time spent in the NICU could reduce mortality by placing the infant under round-the-clock supervision from a rotating panel of medical professionals.
Hajdu et al (2024) assess this possibility by leveraging the expansion of NICUs and Newborn Emergency Transport Systems (NETS) in Hungary during 1990-2015. In particular, they compare the infant mortality of children born to mothers who lived the same distance away from a given city, one of whom gave birth prior to the establishment of a NICU in the city’s hospital and the other of whom gave birth following the establishment of a NICU in the city’s hospital.

The study estimates that being born in a city that contains a NICU reduces mortality among VLBW infants by 144 (± 42) deaths per 1,000 live births. Thus, the study provides evidence that NICU utilization at least partly mediates the relationship between infant health investments and reduced mortality. Future research could use a similar strategy to assess whether the same is true of the relationship between infant health investments and academic achievement.
Identifying Brain Bleeds
Another way infant health investments could reduce mortality and increase academic achievement is by rapidly identifying intraventricular hemorrhages (IVH) via cranial ultrasound. Before proceeding to studies that examine IVH, three facts about IVH are worth noting:
Roughly 20-25% of VLBW infants are born with IVH (Zhou et al 2024).
IVH is graded on a scale of 1-4 with 1 being considered a mild case - bleeding is limited to a small fraction of the ventricles - and 4 being considered lethal - bleeding has spread to brain tissues surrounding the ventricles (Boston Children’s Hospital).
There is no known treatment for IVH. Though doctors may be able to mitigate additional brain injury by preventing IVH from worsening, they cannot undo the damage already caused by the condition upon initial detection (El-Atawi et al 2016). This is why early detection via cranial ultrasound is critical.
The available evidence indicates that infants with IVH have elevated mortality risk and lower IQ in childhood:
Treluyer et al (2023) leverage a French nationally representative prospective sample of infants to estimate mortality rates by IVH severity. Consistent with expectations, mortality rises with the level of IVH severity.
Zhou et al (2024) conduct a meta-analysis of studies which estimate IQ differences between infants with varying IVH severity. They estimate that (1) infants with grade 1-2 IVH score 0.35 sd lower on IQ tests relative to infants without IVH (2) infants with grade 3-4 IVH score 0.57 sd lower on IQ tests relative to infants with grade 1-2 IVH. Note, however, that only the latter result is statistically significant.
Thus, the above studies provide evidence that rapid identification of brain bleeds may mediate the relationship between infant health investments and mortality/achievement.
Environmental Amplification
Infant health investments might increase academic achievement indirectly via the environmental amplification mechanism discussed earlier. To recap, environmental amplification occurs when infant health investments place parents in a better position to raise their child, creating a virtuous cycle that extends well into adolescence.
Three of the four studies discussed assess the possibility of environmental amplification mediating the impact of infant health investments on academic achievement. Each study uses roughly the same methodology: re-estimate the regression discontinuity model using various measures of the post-natal environment as the outcome variables (e.g. school quality, parental mental health). The results of these analyses are shown below:

Interestingly, none of the studies estimate statistically significant impacts on measures of the post-natal environment. This absence of significance is consistent with either a lack of statistical power or a lack of environmental amplification. Thus, the existing body of literature remains agnostic about the truth of the environmental amplification hypothesis.
Conclusion
Having reviewed both the regression discontinuity literature and the mechanistic literature, I hold the following conclusions about additional infant health investments below the VLBW threshold:
Additional infant health investments likely reduce infant mortality among VLBW infants. Bharadwaj, Løken, and Neilson (2013) provide the strongest evidence here. The estimated reductions in mortality remain directionally consistent even when limited to within-twin comparisons. Additionally, the mechanistic literature provides multiple pathways through which health investments could plausibly reduce mortality: use of surfactant therapy, admission to the NICU, and early detection of brain bleeds via cranial ultrasound. Importantly, however, this evidence is not conclusive as the estimated reductions in mortality shown in Almond et al (2010) become close to zero upon adjusting for non-random heaping.10
Additional infant health investments might produce large and lasting gains in academic achievement among VLBW infants. Daysal et al (2022) provide the strongest evidence here. The estimated improvements to achievement would likely survive multiple testing correction when one includes covariates in the regression discontinuity model. Additionally, the visual representation of the sibling results reveals a clear discontinuity at the VLBW threshold. Importantly, however, this evidence is not conclusive as the estimated improvements to achievement (as measured by standardized tests) do not replicate in Chile.
Based on these conclusions, I would like to see the literature on infant health investments move in the following directions:
We need more mechanistic evidence to understand exactly how infant health investments reduce mortality. Future studies should aim to gain greater clarity on (1) exactly which medical treatments are more likely to be administered below the VLBW threshold in practice (2) exactly which causes of mortality are reduced by additional infant health investments below the VLBW threshold. Although Almond et al (2010) come closest to providing answers to these questions, it is unclear whether the answers they provide are robust to non-random heaping.
We need more replication studies to verify that the achievement improvement is robust. It is not clear whether the lack of improvement to Chilean academic achievement (as measured by standardized test scores) is due to factors that are unique to the Chilean context or due to issues with the other studies that do estimate improvements to academic achievement. To identify which of these explanations is correct, I would like to see additional studies apply the regression discontinuity method to other states/countries. These results could then be meta-analyzed to yield a highly powered estimate of the relationship between infant health investments and academic achievement.
Overall, the existing body of literature looks quite promising, and I am eager to see whether the next generation of literature on infant health investments confirms or contradicts these findings.
Addendum: What About Cognitive Ability?
In light of the impacts on academic achievement discussed in this essay, one might wonder: do infant health investments have a positive long-term impact on cognitive ability as well?
It is unfortunately not possible to answer this question using the present set of studies because none include cognitive test scores as an outcome. The good news, however, is that such a study is well within the realm of possibility. Both Norway and Denmark aggregate nationally representative birthweight and cognitive test score data spanning multiple decades (Black, Devereux, and Salvanes 2007; Husby, Wohlfahrt, and Melbye 2022). By linking these two sources of data, one could assess whether infant health investments improve the cognitive test scores of VLBW infants.
Perhaps more importantly, one might be able to gain deeper insights into the underlying structure of cognitive ability through such an analysis. To elaborate, there exists considerable debate in the psychological community about the structure of cognitive ability. Is cognitive ability comprised of a single underlying latent factor? Or is cognitive ability constituted by a network of latent factors that interact with one another? Or perhaps does it have a different kind of structure altogether?
These questions have proven difficult to answer definitively in the absence of experimental data. To put it another way, it is difficult to know with certainty which of these structures is the “correct” structure of cognitive ability without conducting experimental interventions on people’s brains to see which structure is most consistent with human neurological activity. Because it is unethical to conduct such experiments, our ability to understand which structure is most plausible is limited.
The good news, however, is that the existence of the VLBW threshold may (key word: “may”) offer one way around this predicament. If the additional health investments administered below the VLBW threshold improve cognitive test scores by facilitating neurological development (key word: “if”), this threshold could be used to ascertain which theoretical structure of cognitive ability is most consistent with empirical data. Thus, the regression discontinuity method may provide an opportunity to resolve a long-standing debate about the nature of cognitive ability.
Works Cited
Almond, D., Doyle, J. J., Kowalski, A. E., & Williams, H. (2010). Estimating Marginal Returns to Medical Care: Evidence from At-Risk Newborns. The Quarterly Journal of Economics 125(2), 591-634. https://doi.org/10.1162/qjec.2010.125.2.591
Almond, D., Doyle, J. J., Kowalski, A. E., & Williams, H. (2011). The Role of Hospital Heterogeneity in Measuring Marginal Returns to Medical Care: A Reply to Barreca, Guldi, Lindo, and Waddell. The Quarterly Journal of Economics, 126(4), 2125–2131. https://doi.org/10.1093/qje/qjr037
Bagilet, V. (2024). Accurate Estimation of Small Effects: Illustration Through Air Pollution and Health. https://vincentbagilet.github.io/inference_pollution/inference_pollution_paper.pdf
Barreca, A. I., Lindo, J. M., & Waddell, G. R. (2016). Heaping-Induced Bias in Regression-Discontinuity Design. Economic Inquiry, 54(1), 268–293. https://doi.org/10.1111/ecin.12225
Bharadwaj, P., Løken, K. V., & Neilson, C. (2013). Early Life Health Interventions and Academic Achievement. American Economic Review, 103(5), 1862–1891. https://doi.org/10.1257/aer.103.5.1862
Biasi, B., Lafortune, J., & Schönholzer, D. (2024). What Works and For Whom? Effectiveness and Efficiency of School Capital Investments Across the U.S. EdWorking Papers, 24-898. https://doi.org/10.26300/RRCV-M178
Black, S. E., Devereux, P. J., & Salvanes, K. G. (2007). From the Cradle to the Labor Market? The Effect of Birth Weight on Adult Outcomes. The Quarterly Journal of Economics 122(1), 409-439. https://doi.org/10.1162/qjec.122.1.409
Boston Children’s Hospital. Intraventricular Hemorrhage. https://www.childrenshospital.org/conditions/intraventricular-hemorrhage
Cattaneo, M. D., Keele, L., & Titiunik, R. (2021). Covariate Adjustment in Regression Discontinuity Designs. arXiv. https://doi.org/10.48550/arXiv.2110.08410
Chyn, E., Gold, S., & Hastings, J. (2021). The returns to early-life interventions for very low birth weight children. Journal of Health Economics, 75, 102400. https://doi.org/10.1016/j.jhealeco.2020.102400
Darmstadt, G. L., Al Jaifi, N. H., Arif, S., Bahl, R., Blennow, M., Cavallera, V., Chou, D., Chou, R., Comrie-Thomson, L., Edmond, K., Feng, Q., Riera, P. F., Grummer-Strawn, L., Gupta, S., Hill, Z., Idowu, A. A., Kenner, C., Kirabira, V. N., Klinkott, R., … Yunis, K. (2023). New World Health Organization recommendations for care of preterm or low birth weight infants: Health policy. eClinicalMedicine, 63, 102155. https://doi.org/10.1016/j.eclinm.2023.102155
Daysal, N. M., Simonsen, M., Trandafir, M., & Breining, S. (2022). Spillover Effects of Early-Life Medical Interventions. The Review of Economics and Statistics, 104(1), 1–16. https://doi.org/10.1162/rest_a_00982
Daysal, N. M., Hui, D., Rossin-Slater, M., & Schwandt, H. (2024). NBER Working Paper 29524. https://doi.org/10.3386/w29524
Deryugina, T., Heutel, G., Miller, N. H., Molitor, D., & Reif, J. (2019). The Mortality and Medical Costs of Air Pollution: Evidence from Changes in Wind Direction. American Economic Review, 109(12), 4178–4219. https://doi.org/10.1257/aer.20180279
Ebenstein, A., Lavy, V., & Roth, S. (2016). The Long-Run Economic Consequences of High-Stakes Examinations: Evidence from Transitory Variation in Pollution. American Economic Journal: Applied Economics, 8(4), 36–65. https://doi.org/10.1257/app.20150213
El-Atawi, K., Elhalik, M., Kulkarni, T., Abdelsamed, A., Alexander, L., & Satyan, A. D. (2016). Risk Factors, Diagnosis, and Current Practices in the Management of Intraventricular Hemorrhage in Preterm Infants: A Review. Academic Journal of Pediatrics & Neonatology. https://doi.org/10.19080/AJPN.2016.01.555561
Hajdu, T., Kertesi, G., Kézdi, G., & Szabó-Morvai, Á. (2024). The Effects of Neonatal Intensive Care on Infant Mortality and Long-Term Health Impairments. American Journal of Health Economics, 10(1), 1–29. https://doi.org/10.1086/724219
Hand, L. I., Shellhaas, R. A., Milla, S. S., Cummings, J. J., Adams-Chapman, I. S., Aucott, S. W., Goldsmith, J. P., Kaufman, D. A., Martin, C. R., Puopolo, K. M., Hartman, A. L., Bonkowsky, J. L., Capal, J. K., Lotze, T. E., Urion, D. K., Alazraki, A. L., Annam, A., Benya, E., Brown, B. P., Otero, H. J., & Richer, E. (2020). Routine Neuroimaging of the Preterm Brain. Pediatrics, 146(5). https://doi.org/10.1542/peds.2020-029082
Harrell, F. (2024 March). Interaction term vs subgroup analysis. https://stats.stackexchange.com/questions/643147/interaction-term-vs-subgroup-analysis
Heissel, J. A., Persico, C., & Simon, D. (2022). Does Pollution Drive Achievement? The Effect of Traffic Pollution on Academic Performance. Journal of Human Resources, 57(3), 747–776. https://muse.jhu.edu/article/853087
Husby, A., Wohlfahrt, J., & Melbye, M. (2022). Gestational age at birth and cognitive outcomes in adolescence: population based full sibling cohort study. BMJ. https://doi.org/10.1136/bmj-2022-072779
March of Dimes (2021 June). Low birthweight. https://www.marchofdimes.org/find-support/topics/birth/low-birthweight
Maziar, R. [WeeklyTax_3601]. (2024 March). Happy birthday son - a poem for our son, from dad. https://www.reddit.com/r/NICUParents/comments/1br6ezv/happy_birthday_son_a_poem_for_our_son_from_dad/
National Center for Health Statistics. (2024 April). Birthweight and Gestation. https://www.cdc.gov/nchs/fastats/birthweight.htm
Noack, C., & Rothe, C. (2023). Donut Regression Discontinuity Designs. arXiv. https://doi.org/10.48550/arXiv.2308.14464
Persico, C. L., & Johnson, K. R. (2021). The effects of increased pollution on COVID-19 cases and deaths. Journal of Environmental Economics and Management, 107, 102431. https://doi.org/10.1016/j.jeem.2021.102431
Polin, R. A., Carlo, W. A., Papile, L., Tan, R., Kumar, P., Benitz, W., Eichenwald, E., Cummings, J., & Baley, J. (2014). Surfactant Replacement Therapy for Preterm and Term Neonates With Respiratory Distress. Pediatrics, 133(1), 156–163. https://doi.org/10.1542/peds.2013-3443
Shigeoka, H., & Fushimi, K. (2014). Supplier-induced demand for newborn treatment: Evidence from Japan. Journal of Health Economics, 35, 162–178. https://doi.org/10.1016/j.jhealeco.2014.03.003
Stanford Center for Education Policy Analysis. Racial and Ethnic Achievement Gaps. https://cepa.stanford.edu/educational-opportunity-monitoring-project/achievement-gaps/race/
Treluyer, L., Chevallier, M., Jarreau, P., Baud, O., Benhammou, V., Gire, C., Marchand-Martin, L., Marret, S., Pierrat, V., Ancel, P., & Torchin, H. (2023). Intraventricular Hemorrhage in Very Preterm Children: Mortality and Neurodevelopment at Age 5. Pediatrics, 151(4). https://doi.org/10.1542/peds.2022-059138
UNICEF Ethiopia. (2018). A pre-term baby is kept warm in an incubator [Photograph]. Flickr. https://www.flickr.com/photos/unicefethiopia/42542598385
Zhou, M., Wang, S., Zhang, T., Duan, S., & Wang, H. (2024). Neurodevelopmental outcomes in preterm or low birth weight infants with germinal matrix-intraventricular hemorrhage: a meta-analysis. Pediatric Research, 95, 625–633. https://doi.org/10.1038/s41390-023-02877-8
Note that my use of the term “less-achievement-oriented” should not be interpreted in a strictly genetic sense. Black, Devereux, and Salvanes (2007) demonstrate that birthweight has a causal impact on IQ scores even within monozygotic twins.
The authors explicitly adjust for whether the mother was born in a different state than the newborn, the mother’s age and educational attainment, the father’s age, the newborn’s sex, gestational age, race, and plurality (i.e. singleton birth vs twin birth vs multiple birth) in each of their regression discontinuity models.
Hospital quality is defined by whether the hospital has a high-level NICU, low-level NICU, or no NICU at all.
While one could argue that the initial subgroup analysis in Almond et al (2010) is itself a questionable research practice, I have a hard time believing such an accusation. The subgroup analysis provided a natural segue into the welfare analysis which forms the central result of the paper: namely, that spending on infant healthcare has a large return on investment. Thus, it seems reasonable to believe that the authors had planned to do the subgroup analysis from the start (as opposed to only doing it to gin up nice p-values).
The authors produce both conventional and bias-corrected estimates of the impact of infant health investments. I report the bias-corrected estimates because these are the estimates for which robust standard errors and confidence intervals are computed. The authors describe an example in Footnote 17:
As an example, the 95% robust confidence interval for the mortality effect (−0.508) is constructed using the bias-corrected estimate and the robust standard error as: −1.011 ± 0.408 × 1.96 = [−1.811, −0.211].
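As a quick check of the arithmetic in that footnote, the interval can be reproduced with a minimal Python sketch. The numbers are the ones quoted above; the variable and function names are my own and are not taken from the authors' code.

```python
# Values quoted from the authors' Footnote 17; names are illustrative only.
bias_corrected_estimate = -1.011
robust_se = 0.408
z_95 = 1.96  # normal critical value for a two-sided 95% interval


def robust_ci(estimate, se, z=z_95):
    """Return (lower, upper) bounds: estimate +/- z * se."""
    half_width = z * se
    return estimate - half_width, estimate + half_width


lower, upper = robust_ci(bias_corrected_estimate, robust_se)
print(f"[{lower:.3f}, {upper:.3f}]")  # prints [-1.811, -0.211]
```

Note that the interval is centered on the bias-corrected estimate (−1.011) rather than on the conventional estimate (−0.508), which is why I report the former.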
The authors explicitly adjust for mother’s age, educational attainment, and marital status, newborn’s year of birth and region/municipality of birth, the type of birth service (i.e. doctor vs midwife), and indicators at 100g heaps in each of their regression discontinuity models.
The pollution → achievement relationship may also be inflated due to publication bias, but Bagilet (2024) does not examine this possibility explicitly.
One hypothesis that could explain this discrepancy is differential admission to the NICU. Bharadwaj, Løken, and Neilson (2013) demonstrate that being below the VLBW threshold sharply increases the probability of admission to the NICU in Norway. Almond et al (2010), on the other hand, argue that the VLBW threshold has a weak relationship to NICU utilization in the United States. Thus, it may be that the mortality reduction is only robust in Norway because the increase in infant health investments at the VLBW threshold is greater in Norway than in the United States.