A Family Affair: Rigorous Causal Inference Comes to Statistical Sociogenomics

In this post, Professor Daniel Adkins, Director of Biostatistics and instructor for the Introduction to Statistical Genetics course at Statistical Horizons, delves into recent debates surrounding causal inference in human complex trait genetics. For those interested in exploring the intricacies of genetic causality further, we invite you to register for the upcoming Statistical Sociogenomics seminar, taught by Professor Robbee Wedow, February 22-24, 2024.

The rise of polygenic scores (PGS) in human genetics research has been hailed as a paradigm-shifting development for the study of complex traits and as holding great promise for advancing clinical prediction algorithms. A polygenic score aggregates the effects of numerous genetic variants into a single number that predicts some aspect of an individual’s phenotype, often the risk of a particular disease.
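
To make the idea of aggregation concrete: a PGS is commonly computed as a weighted sum of an individual’s effect-allele counts. The sketch below is illustrative only. The function name `polygenic_score`, the three variants, and their weights are hypothetical and are not drawn from any study cited here.

```python
from typing import Sequence


def polygenic_score(dosages: Sequence[int], weights: Sequence[float]) -> float:
    """Weighted sum of effect-allele counts (0, 1, or 2 per variant)."""
    if len(dosages) != len(weights):
        raise ValueError("need exactly one weight per variant")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical per-allele weights for three variants (illustrative values only).
weights = [0.5, -0.25, 1.0]
# One individual's effect-allele counts at the same three variants.
dosages = [2, 1, 0]

score = polygenic_score(dosages, weights)
print(score)  # 0.75
```

In practice the weights would typically come from GWAS effect estimates and the sum would run over many thousands of variants, but the arithmetic is the same.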

PGS have become ubiquitous in the scientific literature (Plomin and von Stumm 2022) and are now slowly but steadily making their way into the clinic (Patel et al. 2023). However, regardless of how beneficial for clinical prediction the application of PGS turns out to be, prediction alone will never be sufficient to address more fundamental questions regarding the causal mechanisms through which genetic variation influences human complex traits. This distinction is particularly crucial in the domains of sociogenomics and behavioral genetics, where understanding the causal influence of genetics on human behavior is of primary importance.

Getting Serious About Causality

Fortunately, quantitative methodologists across a range of disciplines, from population genetics to economics, are now actively developing robust causal inference methods for human complex trait genetics. These researchers are applying advanced econometric methods, including within-family estimators, to exploit unique features of human heredity, such as Mendelian segregation, with impressive results.

Two recent high-impact articles exemplify this trend, giving us a telling glimpse into the current state of causal inference in complex trait genetics: Veller et al. (2023) “Causal interpretations of family Genome-Wide Association Studies (GWAS) in the presence of heterogeneous effects” and Nivard et al. (2024) “More than nature and nurture, indirect genetic effects on children’s academic achievement are consequences of dynastic social processes.”

The objectives of these articles are quite different. Veller and colleagues use formal theory to identify exactly what is being measured by the within-family estimator in genetic studies, while Nivard’s group applies an “extended pedigree” PGS design to disentangle direct genetic effects from various types of indirect ones. Both articles reinforce the emerging consensus that genetic causality is substantially indirect, much of it functioning via amorphous extended family “dynastic effects.” They argue that, even with advanced econometric methods, existing estimators of direct polygenic effects have notable limitations and are susceptible to bias under plausible scenarios.

The Happy Marriage of Within-Family Estimators and Mendelian Segregation

Before engaging with these gloomy findings, let’s first acknowledge the progress that econometric methods like the within-family estimator have made in human complex trait genetics. The efforts toward rigorous identification of direct causal genetic effects have rapidly led to the primacy of family-based designs in human complex trait genetic research. These designs exploit the natural experiment afforded by Mendelian segregation to isolate the direct genetic signal from indirect genetic signals and other environmental confounds (Young et al. 2019; Howe et al. 2022).

This research design allows a nuanced exploration of genetic causality, distinguishing between direct genetic influences on traits and various types of indirect ones. These include:

Genetic nurture, where parental genotypes affect offspring outcomes via the parental environment.
Dynastic effects, which reflect the broad transmission of familial genetic influence via multigenerational social environments.

This progress owes largely to the melding of econometric principles with statistical genetics (e.g., Young et al. 2019, 2022) in ways that have elevated the standard of causal inference within sociogenomics and behavioral genetics. By incorporating econometric views on quasi-experimental randomization (via Mendelian segregation), within-family estimators, and local average treatment effects (LATE), this literature has helped formalize the problem of assessing genetic causality. As a result, within-family estimators for GWAS (genome-wide association studies) and PGS have become widely acknowledged as the benchmark for causal analysis in complex trait genetics.
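
The logic of the within-family estimator can be illustrated with a small simulation. This is a minimal sketch, not the procedure of any study cited here. It assumes a single biallelic locus, random mating, an additive direct effect, and a genetic nurture effect that depends on the sum of the parental genotypes; all function names and parameter values are hypothetical. Under these assumptions, a population regression of phenotype on genotype absorbs the parental pathway, whereas a regression of sibling phenotype differences on sibling genotype differences cancels the shared family environment.

```python
import random


def ols_slope(x, y):
    """OLS slope of y on x (with an intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_sib_pairs(n_families=20000, p=0.5, direct=0.3, nurture=0.2, seed=1):
    """Return one [(genotype, phenotype), (genotype, phenotype)] pair per family."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_families):
        # Each parent carries two alleles, coded 1 (effect allele) or 0.
        parents = [(int(rng.random() < p), int(rng.random() < p)) for _ in range(2)]
        # Genetic nurture (assumed form): the shared family environment
        # depends on the parents' genotypes.
        family_env = nurture * sum(sum(parent) for parent in parents)
        sibs = []
        for _ in range(2):
            # Mendelian segregation: each parent transmits one allele at random.
            g = sum(rng.choice(parent) for parent in parents)
            y = direct * g + family_env + rng.gauss(0, 1)
            sibs.append((g, y))
        pairs.append(sibs)
    return pairs


pairs = simulate_sib_pairs()

# Population estimate: pool all siblings and regress phenotype on genotype.
genotypes = [g for sibs in pairs for g, _ in sibs]
phenotypes = [y for sibs in pairs for _, y in sibs]
population_estimate = ols_slope(genotypes, phenotypes)

# Within-family estimate: regress sibling differences on sibling differences.
dg = [sibs[0][0] - sibs[1][0] for sibs in pairs]
dy = [sibs[0][1] - sibs[1][1] for sibs in pairs]
within_family_estimate = ols_slope(dg, dy)

print(f"population estimate:    {population_estimate:.3f}")
print(f"within-family estimate: {within_family_estimate:.3f}")
```

With these default values, the population estimate should land near the sum of the direct and nurture effects (0.5) and the within-family estimate near the direct effect (0.3), up to sampling noise.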

Probing the Limits of Within-Family Estimators in Genomics

While within-family designs have substantially improved inferential rigor in complex trait genetics, it now seems that optimism regarding the approach should be tempered. Recent studies, particularly those by population geneticists (e.g., Veller et al. 2023; Veller and Coop 2023), have provided a thorough critique of family-based estimates of direct causal genetic effects. This research argues for a much more qualified interpretation of these effects and challenges the validity of causal claims that rely on family-based PGS.

The critique advanced in Veller et al. (2023) is particularly trenchant, as it challenges the widely accepted view that Mendelian segregation provides the equivalent of a randomized controlled trial (RCT) in which alleles are randomized within families, allowing family-based GWAS to provide unbiased estimates of the average treatment effect (ATE) of alleles. Instead, they argue that the analogy between family-based GWAS and RCTs is not as straightforward as we thought.

Here is one significant limitation: Mendelian segregation only randomizes alleles among the offspring of heterozygous parents, leaving the effects in children of homozygous parents unobserved. This limitation is crucial because, in the presence of specific types of gene-environment interaction (G×E), gene-gene interaction (G×G), and linkage disequilibrium (LD) patterns, the effects of alleles might differ between the offspring of homozygous and heterozygous parents. As a result, family-based GWAS may offer a biased estimate of the allele’s average effect in the sample, at best serving as an unbiased estimate of the LATE in the children of heterozygotes. Veller et al. further argue that even this more qualified interpretation does not apply to PGS, because heterozygosity varies across families for different sets of single-nucleotide polymorphisms (SNPs).
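
A deliberately stylized simulation shows why this matters. It is a sketch under strong assumptions, not a reproduction of the analysis in Veller et al. (2023): it assumes one biallelic locus, and it simply stipulates that the allele’s effect is 0.5 in families with at least one heterozygous parent and 0.0 in families where both parents are homozygous. The effect sizes, function names, and sample size are hypothetical. Because siblings with two homozygous parents always share the same genotype, those families contribute no genotype differences and therefore carry no weight in the sibling-difference estimate.

```python
import random


def ols_slope(x, y):
    """OLS slope of y on x (with an intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate(n_families=20000, p=0.5, effect_het=0.5, effect_hom=0.0, seed=7):
    """Return sibling genotype differences, phenotype differences, and
    the true allelic effect that applies in each family."""
    rng = random.Random(seed)
    dg, dy, effects = [], [], []
    for _ in range(n_families):
        # Each parent carries two alleles, coded 1 (effect allele) or 0.
        parents = [(int(rng.random() < p), int(rng.random() < p)) for _ in range(2)]
        any_het = any(a != b for a, b in parents)
        # Stipulated heterogeneity (an assumption of this sketch): the effect
        # differs by whether any parent is heterozygous at the locus.
        beta = effect_het if any_het else effect_hom
        # Mendelian segregation: each parent transmits one allele at random.
        sib_g = [sum(rng.choice(parent) for parent in parents) for _ in range(2)]
        sib_y = [beta * g + rng.gauss(0, 1) for g in sib_g]
        dg.append(sib_g[0] - sib_g[1])
        dy.append(sib_y[0] - sib_y[1])
        effects.append(beta)
    return dg, dy, effects


dg, dy, effects = simulate()

# Average effect across all families in the sample (known here by construction).
average_effect = sum(effects) / len(effects)
# Sibling-difference estimate, which only "sees" families with a heterozygous parent.
within_family_estimate = ols_slope(dg, dy)

print(f"average effect in sample: {average_effect:.3f}")
print(f"within-family estimate:   {within_family_estimate:.3f}")
```

Under these assumptions the within-family estimate should sit near 0.5, the effect among children of heterozygotes, while the sample-wide average effect is close to 0.375. How large such a gap is in real data depends on whether effects actually vary with parental heterozygosity.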

Nivard et al. (2024) focus on indirect genetic effects on education, using data from the Norwegian Mother, Father and Child Cohort Study (MoBa) that include pairs of genetically related families (parents were siblings, children were cousins; N=10,913). By analyzing extended pedigrees, they distinguish two types of indirect genetic effects: “genetic nurture” within nuclear families and broader “dynastic effects” of extended family environments.

Their analysis reveals that indirect genetic effects on educational achievement are more a function of broader multi-generational dynastic effects than immediate nuclear family environments. This challenges the prevailing view that education is primarily transmitted within nuclear families, pointing instead to the significant role of broader familial socioeconomic environments shaped by genetics across generations.

Nivard et al.’s work underscores the complexity of genetic and environmental contributions to educational outcomes. It indicates that genetic causality of educational attainment involves both direct and circuitously indirect extended family influences, thus complicating our understanding of both educational inheritance and the substantive character of indirect genetic effects.

Critiquing the Critics

Although these two articles both offer rigorous analyses that expand our understanding of genetic causality, they also have notable limitations. For instance, while Veller et al. point out important limitations to the use of family-based GWAS as a stand-in for RCTs, their conclusions demand further exploration regarding the magnitude of the bias caused by the absence of signal from homozygous parents in within-family estimates.

Their concerns about biases introduced by G×E and G×G interactions may not be universally applicable and may depend on unwarranted assumptions about the association of locus heterozygosity with specific environments or genetic variants. While they demonstrate that bias can occur in particular G×E and G×G scenarios in model organisms, they do not definitively demonstrate that these G×E and G×G scenarios are common in humans. Thus, the degree of bias caused by the absence of signal from homozygous parents in within-family estimators remains to be seen and may well be small in magnitude.

The article by Nivard and colleagues also leaves ample room for future elaboration, particularly regarding the weakly theorized concept of “dynastic effects”, which warrants more precise characterization in future research. This limitation speaks to the methodological and conceptual challenges in isolating the effects of extended versus nuclear family environments on education and related complex traits. Their findings suggest a layered genetic and socioeconomic interplay that extends well beyond the immediate family.

Conclusion

Despite a seemingly pessimistic narrative regarding the magnitude and detectability of direct genetic effects, recent research gives considerable cause for optimism regarding the prospects of the statistical genetics of complex traits. Crucially, these developments bring us nearer to achieving the methodological rigor necessary for the definitive mapping of the causal pathways of genetic influences on complex traits. Moreover, the innovative work of Nivard et al. (2024) with the extended pedigree MoBa dataset opens exciting new avenues for exploration. While the potential of trio family designs has been extensively explored, the full capabilities of extended pedigree designs, especially when genotypes for a broad array of genetic and social relatives are available, remain largely untapped.

The strategic use of extended pedigree designs holds great promise for strengthening causal inference in complex trait genetics, particularly for PGS. That is, by broadening the pedigree, we observe more genetic variation and reduce the incidence of homozygous loci within the family, thereby improving the accuracy of causal estimates using PGS within extended families.

In conclusion, the critical examinations by Veller et al. and Nivard et al. challenge us to reassess the efficacy of established causal inference methodologies within statistical sociogenomics and related domains. These studies not only highlight the limitations of current approaches, including within-family estimators and PGS, but also chart a course for future research, emphasizing the need for refined methodologies that can more accurately model the interplay between genetic and environmental factors. As we move forward, it is paramount that we address these methodological challenges and conceptual nuances, exploring new avenues such as extended pedigree designs to deepen our understanding of genetic causality.

References

Howe, L.J., Nivard, M.G., Morris, T.T., Hansen, A.F., Rasheed, H., Cho, Y., Chittoor, G., Ahlskog, R., Lind, P.A., Palviainen, T., et al. (2022). “Within-sibship genome-wide association analyses decrease bias in estimates of direct genetic effects.” Nature Genetics, 54(5):581–592. https://doi.org/10.1038/s41588-022-01062-7

Nivard, M.G., Belsky, D.W., Harden, K.P., et al. (2024). “More than nature and nurture, indirect genetic effects on children’s academic achievement are consequences of dynastic social processes.” Nature Human Behaviour. https://doi.org/10.1038/s41562-023-01796-2

Patel, A.P., Wang, M., Ruan, Y., et al. (2023). “A multi-ancestry polygenic risk score improves risk prediction for coronary artery disease.” Nature Medicine, 29:1793–1803. https://doi.org/10.1038/s41591-023-02429-x

Plomin, R., von Stumm, S. (2022). “Polygenic scores: prediction versus explanation.” Molecular Psychiatry, 27:49–52. https://doi.org/10.1038/s41380-021-01348-y

Veller, C., Coop, G. (2023). “Interpreting population and family-based genome-wide association studies in the presence of confounding.” bioRxiv. https://doi.org/10.1101/2023.02.26.530052

Veller, C., Przeworski, M., Coop, G. (2023). “Causal interpretations of family GWAS in the presence of heterogeneous effects.” bioRxiv, 2023.11.13.566950. https://doi.org/10.1101/2023.11.13.566950

Young, A.I., Benonisdottir, S., Przeworski, M., Kong, A. (2019). “Deconstructing the sources of genotype–phenotype associations in humans.” Science, 365(6460):1396–1400. https://doi.org/10.1126/science.aax3710

Young, A.I., Nehzati, S.M., Benonisdottir, S., Okbay, A., Jayashankar, H., Lee, C., Cesarini, D., Benjamin, D.J., Turley, P., Kong, A. (2022). “Mendelian imputation of parental genotypes improves estimates of direct genetic effects.” Nature Genetics, 54:897–905. https://doi.org/10.1038/s41588-022-01085-0
