Variant scoring tools for deep mutational scanning.
Çubuk H; Jin X; Phipson B; Marsh JA; Rubin AF
Molecular systems biology 2025
Deep mutational scanning (DMS) can systematically assess the effects of thousands of genetic variants in a single assay, providing insights into protein function, evolution, host-pathogen interactions, and clinical impacts. Accurate scoring of variant effects is crucial, yet the diversity of tools and experimental designs contributes considerable heterogeneity that complicates data analysis. Here, we review and compare 12 computational tools for processing DMS sequencing data and scoring variant effects. We systematically outline each tool's statistical approaches, supported experimental designs, input/output requirements, software implementation, visualisation capabilities, and key assumptions. By highlighting the strengths and limitations of these tools, we hope to guide researchers in selecting methods appropriate for their specific experiments. Furthermore, we discuss current challenges, including the need for standardised analysis protocols and sustainable software maintenance, as well as opportunities for future methods development. Ultimately, this review seeks to advance the application and adoption of DMS, facilitating deeper biological understanding and improved clinical translation.
Landscapes of missense variant impact for human superoxide dismutase 1.
Axakova A; Ding M; Cote AG; Subramaniam R; Senguttuvan V; Zhang H; Weile J; Douville SV; Gebbia M; Al-Chalabi A et al
bioRxiv 2025
Amyotrophic lateral sclerosis (ALS) is a progressive motor neuron disease for which important subtypes are caused by variation in the Superoxide Dismutase 1 gene SOD1. Diagnosis based on SOD1 sequencing can not only be definitive but also indicate specific therapies available for SOD1-associated ALS (SOD1-ALS). Unfortunately, SOD1-ALS diagnosis is limited by the fact that a substantial fraction (currently 26%) of ClinVar SOD1 missense variants are classified as "variants of uncertain significance" (VUS). Although functional assays can provide strong evidence for clinical variant interpretation, SOD1 assay validation is challenging, given the current incomplete and controversial understanding of SOD1-ALS disease mechanism. Using saturation mutagenesis and multiplexed cell-based assays, we measured the functional impact of over two thousand SOD1 amino acid substitutions on both enzymatic function and protein abundance. The resulting 'missense variant effect maps' not only reflect prior biochemical knowledge of SOD1 but also provide sequence-structure-function insights. Importantly, our variant abundance assay can discriminate pathogenic missense variation and provides new evidence for 41% of missense variants that had been previously reported as VUS, offering the potential to identify additional patients who would benefit from therapy approved for SOD1-ALS.
Multiplexed assays of variant effect for clinical variant interpretation
McEwen, Abbye E.; Tejura, Malvika; Fayer, Shawn; Starita, Lea M.; Fowler, Douglas M.
Nature Reviews Genetics 2025;1-18
The rapid expansion of clinical genetic testing has markedly improved the detection of genetic variants. However, most variants lack the evidence needed to classify them as pathogenic or benign, resulting in the accumulation of variants of uncertain significance that cannot be used to diagnose or guide treatment of disease. Moreover, targeted therapy for cancer treatment increasingly depends on correctly identifying oncogenic driver mutations, but the oncogenicity of many variants identified in tumours remains unclear. To address these challenges, efforts to classify variants are increasingly using multiplexed assays of variant effect (MAVEs), which are massively scaled experiments that can generate functional data for thousands of variants simultaneously. The rise of MAVEs is accompanied by better guidance on the use of MAVE data for classifying germline variants to aid their clinical implementation. Here, we overview MAVE technologies from their inception to their increased use in the clinic, including their roles in uncovering mechanisms for variant pathogenicity and guiding targeted therapy and drug development. Multiplexed assays of variant effect (MAVEs) are highly scalable experimental approaches used to generate functional data for genetic variants. In this Review, McEwen et al. discuss the advances in MAVE technologies and guidance on how to use MAVE data in the clinic, which is helping to reveal variant pathogenicity, develop personalized drugs and inform targeted therapies.
Mapping MAVE data for use in human genomics applications.
Arbesfeld JA; Da EY; Stevenson JS; Kuzma K; Paul A; Farris T; Capodanno BJ; Grindstaff SB; Riehle K; Saraiva-Agostinho N et al
Genome biology 2025;26;1;179
Experimental data from functional assays have a critical role in interpreting the impact of genetic variants. Assay data must be unambiguously mapped to a reference genome to make it accessible, but it is often reported relative to assay-specific sequences, complicating downstream use and integration of variant data across resources. To make multiplexed assays of variant effect (MAVE) data more broadly available to the research and clinical communities, the Atlas of Variant Effects Alliance mapped MAVE data from the MaveDB community database to human reference sequences, creating an extensive set of machine-readable homology mappings that are incorporated into widely used human genomics applications.
American journal of human genetics 2025;112;6;1489-1495
When investigating whether a variant identified by diagnostic genetic testing is causal for disease, applied genetics professionals evaluate all available evidence to assign a clinical classification. Functional assays of higher and higher throughput are increasingly being generated and, when appropriate, can provide strong functional evidence for or against pathogenicity in variant classification. Despite functional assay data representing unprecedented value for genomic diagnostics, challenges remain around the application of functional evidence in variant curation. To investigate a growing gap articulated in recent international studies, we surveyed genetic diagnostic professionals in Australasia to assess their application of functional evidence in clinical practice. The survey results echo the universal difficulty in evaluating functional evidence but expand on this by indicating that even self-proclaimed expert respondents are not confident in applying functional evidence, mainly due to uncertainty around practice recommendations. Respondents also identified the need for support resources and educational opportunities, and in particular requested expert recommendations and updated practice guidelines to improve translation of experimental data to curation evidence. We then collated a list of 226 functional assays and the evidence strength recommended by 19 ClinGen Variant Curation Expert Panels. Specific assays for more than 45,000 variants were evaluated, but evidence recommendations were generally limited to lower throughput and strength. As an initial step, we provide our collated list of assay evidence as a source of international expert opinion on the evaluation of functional evidence and conclude that these results highlight an opportunity to develop additional support resources to fully utilize functional evidence in clinical practice.
Validating data from multiplex assays of variant effect: A CanVIG-UK national survey of NHS clinical scientists.
Allen S; Garrett A; Rowlands CF; Durkie M; Burghel GJ; Robinson R; Callaway A; Field J; Frugtniet B; Palmer-Smith S et al
American journal of human genetics 2025;112;6;1479-1488
Advances in technology have made it possible for multiplex assays of variant effect (MAVEs) to systematically generate functional data for thousands of genetic variants. Robust clinical validation and accessible online resources for MAVE data have previously been identified as barriers to the clinical adoption of new MAVEs. We delivered a survey during the November 2024 Cancer Variant Interpretation Group UK (CanVIG-UK) meeting comprising National Health Service (NHS) clinical scientists and clinical geneticists and received 46 responses from individuals regularly performing variant classification for diagnostic reporting. Only 35% reported they would accept clinical validation of the MAVE provided by the authors who conducted the assay; 20% reported they would attempt clinical validation themselves, and 61% would await clinical validation by a trusted central body. 72% reported they would use MAVE data ahead of a formal peer-reviewed publication if reviewed and clinically validated by a trusted central body. When scoring central bodies on a scale of 1-5 for confidence in their review and validation of MAVEs, CanVIG-UK (median = 5), variant curation expert panels (VCEPs; median = 5), and ClinGen SVI Functional Working Group (median = 4) all scored highly. Participants supported making variant-level data accessible via a relevant web resource (although the majority of participants expressed that additional assay-level or variant-level information would have a low likelihood of altering validation scores provided by a trusted central body). These findings, from a comparatively homogeneous clinical diagnostic group operating in a resource-constrained healthcare setting, indicate that clinical application of new MAVEs for variant classification will be delayed unless robust clinical validations are performed by a trusted central body and made readily accessible.
American journal of human genetics 2025;112;6;1468-1478
Variant-level functional data are a core component of clinical variant classification and can aid in reinterpreting variants of uncertain significance (VUSs). However, the usage of functional data by genetics professionals is currently unknown. An online survey was developed and distributed in the spring of 2024 to individuals actively engaged in variant interpretation. Quantitative and qualitative methods were used to assess responses. 190 eligible individuals responded, with 93% reporting interpreting 26 or more variants per year. The median respondent reported 11-20 years of experience. The most common professional roles were laboratory medical geneticists (23%) and variant review scientists (23%). 77% reported using functional data for variant interpretation in a clinical setting, and overall, respondents felt confident assessing functional data. However, 67% indicated that functional data for variants of interest were rarely or never available, and 91% considered insufficient quality metrics or confidence in the accuracy of data as barriers to their use. 94% of respondents noted that better access to primary functional data and standardized interpretation of functional data would improve usage. Respondents also indicated that handling conflicting functional data is a common challenge in variant interpretation that is not performed in a systematic manner across institutions. The results from this survey showed a demand for a comprehensive database with reliable quality metrics to support the use of functional evidence in clinical variant interpretation. The results also highlight a need for guidelines regarding how putatively conflicting functional data should be used for variant classification.
At the Clinical Atlas of Variant Effects meeting (CLAVE meeting, July 2024, Pittsburgh USA), we developed recommendations for a draft atlas that can be realized by 2030, with a focus on empowering genomic medicine by resolving VUS. This document crystallizes infrastructure, technology development, data production and clinical translation efforts that CLAVE meeting attendees concluded are needed to produce a maximally useful draft 2030 atlas.
Guidelines for releasing a variant effect predictor.
Livesey BJ; Badonyi M; Dias M; Frazer J; Kumar S; Lindorff-Larsen K; McCandlish DM; Orenbuch R; Shearer CA; Muffley L et al
Genome biology 2025;26;1;97
Computational methods for assessing the likely impacts of mutations, known as variant effect predictors (VEPs), are widely used in the assessment and interpretation of human genetic variation, as well as in other applications like protein engineering. Many different VEPs have been released, and there is tremendous variability in their underlying algorithms, outputs, and the ways in which the methodologies and predictions are shared. This leads to considerable difficulties for users trying to navigate the selection and application of VEPs. Here, to address these issues, we provide guidelines and recommendations for the release of novel VEPs.
MaveDB 2024: a curated community database with over seven million variant effects from multiplexed functional assays.
Rubin AF; Stone J; Bianchi AH; Capodanno BJ; Da EY; Dias M; Esposito D; Frazer J; Fu Y; Grindstaff SB et al
Genome biology 2025;26;1;13
Multiplexed assays of variant effect (MAVEs) are a critical tool for researchers and clinicians to understand genetic variants. Here we describe the 2024 update to MaveDB (https://www.mavedb.org/) with four key improvements to the MAVE community's database of record: more available data including over 7 million variant effect measurements, an improved data model supporting assays such as saturation genome editing, new built-in exploration and visualization tools, and powerful APIs for data federation and streamlined submission and access. Together these changes support MaveDB's role as a hub for the analysis and dissemination of MAVEs now and into the future.
Site-saturation mutagenesis of 500 human protein domains.
Beltran A; Jiang X; Shen Y; Lehner B
Nature 2025;637;8047;885-894
Missense variants that change the amino acid sequences of proteins cause one-third of human genetic diseases. Tens of millions of missense variants exist in the current human population, and the vast majority of these have unknown functional consequences. Here we present a large-scale experimental analysis of human missense variants across many different proteins. Using DNA synthesis and cellular selection experiments we quantify the effect of more than 500,000 variants on the abundance of more than 500 human protein domains. This dataset reveals that 60% of pathogenic missense variants reduce protein stability. The contribution of stability to protein fitness varies across proteins and diseases and is particularly important in recessive disorders. We combine stability measurements with protein language models to annotate functional sites across proteins. Mutational effects on stability are largely conserved in homologous domains, enabling accurate stability prediction across entire protein families using energy models. Our data demonstrate the feasibility of assaying human protein variants at scale and provide a large consistent reference dataset for clinical variant interpretation and training and benchmarking of computational methods.
Epitope mapping via in vitro deep mutational scanning methods and its applications.
Keen MM; Keith AD; Ortlund EA
The Journal of biological chemistry 2025;301;1;108072
Epitope mapping is a technique employed to define the region of an antigen that elicits an immune response, providing crucial insight into the structural architecture of the antigen as well as epitope-paratope interactions. With this breadth of knowledge, immunotherapies, diagnostics, and vaccines are being developed with a rational and data-supported design. Traditional epitope mapping methods are laborious, time-intensive, and often lack the ability to screen proteins in a high-throughput manner or provide high resolution. Deep mutational scanning (DMS), however, is revolutionizing the field as it can screen all possible single amino acid mutations and provide an efficient and high-throughput way to infer the structures of both linear and three-dimensional epitopes with high resolution. Currently, more than 50 publications take this approach to efficiently identify enhancing or escaping mutations, with many then employing this information to rapidly develop broadly neutralizing antibodies, T-cell immunotherapies, vaccine platforms, or diagnostics. We provide a comprehensive review of the approaches to accomplish epitope mapping while also providing a summation of the development of DMS technology and its impactful applications.
SSEmb: A joint embedding of protein sequence and structure enables robust variant effect predictions.
Blaabjerg LM; Jonsson N; Boomsma W; Stein A; Lindorff-Larsen K
Nature Communications 2024;15;1;9646
The ability to predict how amino acid changes affect proteins has a wide range of applications including in disease variant classification and protein engineering. Many existing methods focus on learning from patterns found in either protein sequences or protein structures. Here, we present a method for integrating information from sequence and structure in a single model that we term SSEmb (Sequence Structure Embedding). SSEmb combines a graph representation for the protein structure with a transformer model for processing multiple sequence alignments. We show that by integrating both types of information we obtain a variant effect prediction model that is robust when sequence information is scarce. We also show that SSEmb learns embeddings of the sequence and structure that are useful for other downstream tasks such as to predict protein-protein binding sites. We envisage that SSEmb may be useful both for variant effect predictions and as a representation for learning to predict protein properties that depend on sequence and structure.
Using multiplexed functional data to reduce variant classification inequities in underrepresented populations.
Dawood M; Fayer S; Pendyala S; Post M; Kalra D; Patterson K; Venner E; Muffley LA; Fowler DM; Rubin AF et al
Genome medicine 2024;16;1;143
Multiplexed Assays of Variant Effects (MAVEs) can test all possible single variants in a gene of interest. The resulting saturation-style functional data may help resolve variant classification disparities between populations, especially for Variants of Uncertain Significance (VUS).
2024 Clinical Atlas of Variant Effects meeting summary
Fowler, Douglas M; CLAVE Meeting Attendees
Zenodo
Executive summary from the Clinical Atlas of Variant Effects meeting held in Pittsburgh, PA USA on July 23rd, 2024. This meeting was made possible in part by a grant to the University of Pittsburgh from the Chan Zuckerberg Initiative DAF, an advised fund of Silicon Valley Community Foundation.
Variation to biology: optimizing functional analysis of cancer risk variants.
Nelson S; Carrick D; Daee D; Fingerman I; Gillanders E
Journal of the National Cancer Institute 2024;116;12;1882-1889
Research conducted over the past 15+ years has identified hundreds of common germline genetic variants associated with cancer risk, but understanding the biological impact of these primarily non-protein coding variants has been challenging. The National Cancer Institute sought to better understand and address those challenges by requesting input from the scientific community via a survey and a 2-day virtual meeting, which focused on discussions among participants. Here, we discuss challenges identified through the survey as important to advancing functional analysis of common cancer risk variants: 1) When is a variant truly characterized; 2) Developing and standardizing databases and computational tools; 3) Optimization and implementation of high-throughput assays; 4) Use of model organisms for understanding variant function; 5) Diversity in data and assays; and 6) Creating and improving large multidisciplinary collaborations. We define these 6 challenges, describe how success in addressing them may look, propose potential solutions, and note issues that span all the challenges. Implementation of these ideas could help develop a framework for methodically analyzing common cancer risk variants to understand their function and make effective and efficient use of the wealth of existing genomic association data.
High-throughput assays to assess variant effects on disease.
Ma K; Gauthier LO; Cheung F; Huang S; Lek M
Disease models & mechanisms 2024;17;6
Interpreting the wealth of rare genetic variants discovered in population-scale sequencing efforts and deciphering their associations with human health and disease present a critical challenge due to the lack of sufficient clinical case reports. One promising avenue to overcome this problem is deep mutational scanning (DMS), a method of introducing and evaluating large-scale genetic variants in model cell lines. DMS allows unbiased investigation of variants, including those that are not found in clinical reports, thus improving rare disease diagnostics. Currently, the main obstacle limiting the full potential of DMS is the availability of functional assays that are specific to disease mechanisms. Thus, we explore high-throughput functional methodologies suitable to examine broad disease mechanisms. We specifically focus on methods that do not require robotics or automation but instead use well-designed molecular tools to transform biological mechanisms into easily detectable signals, such as cell survival rate, fluorescence or drug resistance. Here, we aim to bridge the gap between disease-relevant assays and their integration into the DMS framework.
PRKN-linked familial Parkinson's disease: cellular and molecular mechanisms of disease-linked variants.
Clausen L; Okarmus J; Voutsinos V; Meyer M; Lindorff-Larsen K; Hartmann-Petersen R
Cellular and molecular life sciences : CMLS 2024;81;1;223
Parkinson's disease (PD) is a common and incurable neurodegenerative disorder that arises from the loss of dopaminergic neurons in the substantia nigra and is mainly characterized by progressive loss of motor function. Monogenic familial PD is associated with highly penetrant variants in specific genes, notably the PRKN gene, where homozygous or compound heterozygous loss-of-function variants predominate. PRKN encodes Parkin, an E3 ubiquitin-protein ligase important for protein ubiquitination and mitophagy of damaged mitochondria. Accordingly, Parkin plays a central role in mitochondrial quality control but is itself also subject to a strict protein quality control system that rapidly eliminates certain disease-linked Parkin variants. Here, we summarize the cellular and molecular functions of Parkin, highlighting the various mechanisms by which PRKN gene variants result in loss-of-function. We emphasize the importance of high-throughput assays and computational tools for the clinical classification of PRKN gene variants and how detailed insights into the pathogenic mechanisms of PRKN gene variants may impact the development of personalized therapeutics.
Analyzing the functional effects of DNA variants with gene editing.
Cooper S; Obolenski S; Waters AJ; Bassett AR; Coelho MA
Cell reports methods 2024;4;5;100776
Continual advancements in genomics have led to an ever-widening disparity between the rate of discovery of genetic variants and our current understanding of their functions and potential roles in disease. Systematic methods for phenotyping DNA variants are required to effectively translate genomics data into improved outcomes for patients with genetic diseases. To make the biggest impact, these approaches must be scalable and accurate, faithfully reflect disease biology, and define complex disease mechanisms. We compare current methods to analyze the function of variants in their endogenous DNA context using genome editing strategies, such as saturation genome editing, base editing and prime editing. We discuss how these technologies can be linked to high-content readouts to gain deep mechanistic insights into variant effects. Finally, we highlight key challenges that need to be addressed to bridge the genotype to phenotype gap, and ultimately improve the diagnosis and treatment of genetic diseases.
Guidelines for releasing a variant effect predictor.
Livesey BJ; Badonyi M; Dias M; Frazer J; Kumar S; Lindorff-Larsen K; McCandlish DM; Orenbuch R; Shearer CA; Muffley L et al
ArXiv 2024
Computational methods for assessing the likely impacts of mutations, known as variant effect predictors (VEPs), are widely used in the assessment and interpretation of human genetic variation, as well as in other applications like protein engineering. Many different VEPs have been released to date, and there is tremendous variability in their underlying algorithms and outputs, and in the ways in which the methodologies and predictions are shared. This leads to considerable challenges for end users in knowing which VEPs to use and how to use them. Here, to address these issues, we provide guidelines and recommendations for the release of novel VEPs. Emphasising open-source availability, transparent methodologies, clear variant effect score interpretations, standardised scales, accessible predictions, and rigorous training data disclosure, we aim to improve the usability and interpretability of VEPs, and promote their integration into analysis and evaluation pipelines. We also provide a large, categorised list of currently available VEPs, aiming to facilitate the discovery and encourage the usage of novel methods within the scientific community.
Minimum information and guidelines for reporting a multiplexed assay of variant effect.
Claussnitzer M; Parikh VN; Wagner AH; Arbesfeld JA; Bult CJ; Firth HV; Muffley LA; Nguyen Ba AN; Riehle K; Roth FP et al
Genome biology 2024;25;1;100
Multiplexed assays of variant effect (MAVEs) have emerged as a powerful approach for interrogating thousands of genetic variants in a single experiment. The flexibility and widespread adoption of these techniques across diverse disciplines have led to a heterogeneous mix of data formats and descriptions, which complicates the downstream use of the resulting datasets. To address these issues and promote reproducibility and reuse of MAVE data, we define a set of minimum information standards for MAVE data and metadata and outline a controlled vocabulary aligned with established biomedical ontologies for describing these experimental designs.
Workshop report: the clinical application of data from multiplex assays of variant effect (MAVEs), 12 July 2023.
Allen S; Garrett A; Muffley L; Fayer S; Foreman J; Adams DJ; Hurles M; Rubin AF; Roth FP; Starita LM et al
European journal of human genetics : EJHG 2024;32;5;593-600
Clinical classification of genomic variants identified on sequencing is often challenging, with many variants classified as Variants of Uncertain Significance (VUS) on account of insufficient evidence. Advances in sequencing and gene synthesis have made feasible multiplexed assays of variant effect (MAVEs), which quantify the functional impact of many thousands of genomic variants in a single experiment. These assays and the functional evidence they generate have the potential to empower more accurate clinical variant classification. However, there are many outstanding challenges and opportunities that require joint resolution and specification, thus necessitating communication between the research scientists who have designed and performed MAVEs and the clinicians and diagnostic scientists who will apply their data to clinical variant classification. In the ‘Clinical Application of MAVE Data’ workshop, held on 12th July 2023 at the Wellcome Connecting Science Conference Centre in between two relevant research meetings, ‘Curating the Clinical Genome 2023’ and the ‘Mutational Scanning Symposium 2023’, 44 key scientific and/or clinical stakeholders were brought together to consider important questions relating to clinical application of MAVE data, such as quantitative validation, variant truth-sets, platforms and standards for dissemination of MAVE data. The outcomes and possible next steps that were discussed encompassed development of focused workshops to develop consensus recommendations, creating a MAVE evaluation working group, and collaboration of ClinVar and MaveDB to enact software changes that support enhanced functional data submission.
Harrison PW; Amode MR; Austine-Orimoloye O; Azov AG; Barba M; Barnes I; Becker A; Bennett R; Berry A; Bhai J et al
Nucleic acids research 2024;52;D1;D891-D899
Ensembl (https://www.ensembl.org) is a freely available genomic resource that has produced high-quality annotations, tools, and services for vertebrates and model organisms for more than two decades. In recent years, there has been a dramatic shift in the genomic landscape, with a large increase in the number and phylogenetic breadth of high-quality reference genomes, alongside major advances in the pan-genome representations of higher species. In order to support these efforts and accelerate downstream research, Ensembl continues to focus on scaling for the rapid annotation of new genome assemblies, developing new methods for comparative analysis, and expanding the depth and quality of our genome annotations. This year we have continued our expansion to support global biodiversity research, doubling the number of annotated genomes we support on our Rapid Release site to over 1700, driven by our close collaboration with biodiversity projects such as Darwin Tree of Life. We have also strengthened support for key agricultural species, including the first regulatory builds for farmed animals, and have updated key tools and resources that support the global scientific community, notably the Ensembl Variant Effect Predictor. Ensembl data, software, and tools are freely available.
Will variants of uncertain significance still exist in 2030?
Fowler DM; Rehm HL
American journal of human genetics 2024;111;1;5-10
In 2020, the National Human Genome Research Institute (NHGRI) made ten "bold predictions," including that "the clinical relevance of all encountered genomic variants will be readily predictable, rendering the diagnostic designation 'variant of uncertain significance (VUS)' obsolete." We discuss the prospects for this prediction, arguing that many, if not most, VUS in coding regions will be resolved by 2030. We outline a confluence of recent changes making this possible, especially advances in the standards for variant classification that better leverage diverse types of evidence, improvements in computational variant effect predictor performance, scalable multiplexed assays of variant effect capable of saturating the genome, and data-sharing efforts that will maximize the information gained from each new individual sequenced and variant interpreted. We suggest that clinicians and researchers can realize a future where VUSs have largely been eliminated, in line with the NHGRI's bold prediction. The length of time taken to reach this future, and thus whether we are able to achieve the goal of largely eliminating VUSs by 2030, is largely a consequence of the choices made now and in the next few years. We believe that investing in eliminating VUSs is worthwhile, since their predominance remains one of the biggest challenges to precision genomic medicine.
Overview of modern genomic tools for diagnosis and precision therapy of childhood solid cancers.
Mardis ER
Current opinion in pediatrics2024;36;1;71-77
The application of technology and computational analyses to generate new data types from pediatric solid cancers is transforming diagnostic accuracy. This review provides an overview of such new capabilities in the pursuit of improved treatment for essentially rare and underserved diseases that are the highest cause of mortality in children over one year of age. Sophisticated ways of identifying therapeutic vulnerabilities for highly personalized treatment are presented alongside cutting-edge disease response monitoring by liquid biopsy.
Sequencing has revealed hundreds of millions of human genetic variants, and continued efforts will only add to this variant avalanche. Insufficient information exists to interpret the effects of most variants, limiting opportunities for precision medicine and comprehension of genome function. A solution lies in experimental assessment of the functional effect of variants, which can reveal their biological and clinical impact. However, variant effect assays have generally been undertaken reactively for individual variants only after and, in most cases long after, their first observation. Now, multiplexed assays of variant effect can characterise massive numbers of variants simultaneously, yielding variant effect maps that reveal the function of every possible single nucleotide change in a gene or regulatory element. Generating maps for every protein encoding gene and regulatory element in the human genome would create an 'Atlas' of variant effect maps and transform our understanding of genetics and usher in a new era of nucleotide-resolution functional knowledge of the genome. An Atlas would reveal the fundamental biology of the human genome, inform human evolution, empower the development and use of therapeutics and maximize the utility of genomics for diagnosing and treating disease. The Atlas of Variant Effects Alliance is an international collaborative group comprising hundreds of researchers, technologists and clinicians dedicated to realising an Atlas of Variant Effects to help deliver on the promise of genomics.
SUNi mutagenesis: Scalable and uniform nicking for efficient generation of variant libraries.
Mighell TL; Toledano I; Lehner B
PloS one2023;18;7;e0288158
Multiplexed assays of variant effects (MAVEs) have made possible the functional assessment of all possible mutations to genes and regulatory sequences. A core pillar of the approach is generation of variant libraries, but current methods are either difficult to scale or not uniform enough to enable MAVEs at the scale of gene families or beyond. We present an improved method called Scalable and Uniform Nicking (SUNi) mutagenesis that combines massive scalability with high uniformity to enable cost-effective MAVEs of gene families and eventually genomes.
Mapping MAVE data for use in human genomics applications.
Arbesfeld JA; Da EY; Stevenson JS; Kuzma K; Paul A; Farris T; Capodanno BJ; Grindstaff SB; Riehle K; Saraiva-Agostinho N et al
bioRxiv2024
The large-scale experimental measures of variant functional assays submitted to MaveDB have the potential to provide key information for resolving variants of uncertain significance, but the reporting of results relative to assayed sequence hinders their downstream utility. The Atlas of Variant Effects Alliance mapped multiplexed assays of variant effect data to human reference sequences, creating a robust set of machine-readable homology mappings. This method processed approximately 2.5 million protein and genomic variants in MaveDB, successfully mapping 98.61% of examined variants and disseminating data to resources such as the UCSC Genome Browser and Ensembl Variant Effect Predictor.
Correspondence between functional scores from deep mutational scans and predicted effects on protein stability.
Gerasimavicius L; Livesey BJ; Marsh JA
Protein science : a publication of the Protein Society2023;32;7;e4688
Many methodologically diverse computational methods have been applied to the growing challenge of predicting and interpreting the effects of protein variants. As many pathogenic mutations have a perturbing effect on protein stability or intermolecular interactions, one highly interpretable approach is to use protein structural information to model the physical impacts of variants and predict their likely effects on protein stability and interactions. Previous efforts have assessed the accuracy of stability predictors in reproducing thermodynamically accurate values and evaluated their ability to distinguish between known pathogenic and benign mutations. Here, we take an alternate approach, and explore how well stability predictor scores correlate with functional impacts derived from deep mutational scanning (DMS) experiments. In this work, we compare the predictions of 9 protein stability-based tools against mutant protein fitness values from 49 independent DMS datasets, covering 170,940 unique single amino acid variants. We find that FoldX and Rosetta show the strongest correlations with DMS-based functional scores, similar to their previous top performance in distinguishing between pathogenic and benign variants. For both methods, performance is considerably improved when considering intermolecular interactions from protein complex structures, when available. Furthermore, using these two predictors, we derive a "Foldetta" consensus score, which improves upon the performance of both, and manages to match dedicated variant effect predictors in reflecting variant functional impacts. Finally, we also highlight that predicted stability effects show consistently higher correlations with certain DMS experimental phenotypes, particularly those based upon protein abundance, and, in certain cases, can significantly outcompete sequence-based variant effect prediction methodologies for predicting functional scores from DMS experiments.
First-Tier Next Generation Sequencing for Newborn Screening: An Important Role for Biochemical Second-Tier Testing.
Stenton SL; Campagna M; Philippakis A; O'Donnell-Luria A; Gelb MH
Genetics in medicine open2023;1;1
There is discussion of expanding newborn screening (NBS) through the use of genomic sequence data; yet, challenges remain in the interpretation of DNA variants. Population-level DNA variant databases are available, and it is possible to estimate the number of newborns who would be flagged as next-generation sequencing (NGS) positive, that is, as having a risk for a genetic disease (including rare variants of unknown significance, VUS). Estimates of the number of newborns screened as NGS positive for monogenic recessive diseases were obtained by analysis of the Genome Aggregation Database (gnomAD). For a collection of diseases for which there is interest in NBS, we provided 2 estimates for the expected number of newborns screened as NGS positive. For a set of lysosomal storage diseases, we estimated that 100 to approximately 600 NGS screen positives would be found per disease per year in a large NBS laboratory (California), and this figure may be expected to rise to a limit of about 1000 if we account for the fact that gnomAD does not contain all worldwide variants. The number of positives would drop 2.5- to 10-fold if the 10 VUS with highest allele frequency were biochemically annotated as benign. It is proposed that a second-tier biochemical assay using the same dried blood spot could be carried out as a filter and as part of NBS to reduce the number of high-risk NGS positive newborns to a manageable number.
Insertions and deletions (indels) enable evolution and cause disease. Due to technical challenges, indels are left out of most mutational scans, limiting our understanding of them in disease, biology, and evolution. We develop a low-cost, low-bias method, DIMPLE, for systematically generating deletions, insertions, and missense mutations in genes, which we test on a range of targets, including Kir2.1. We use DIMPLE to study how indels impact potassium channel structure, disease, and evolution. We find deletions are most disruptive overall, beta sheets are most sensitive to indels, and flexible loops are sensitive to deletions yet tolerate insertions.
Unified views on variant impact across many diseases.
Kumar S; Gerstein M
Trends in genetics2023;39;6;442-450
Genomic studies of human disorders are often performed by distinct research communities (i.e., focused on rare diseases, common diseases, or cancer). Despite underlying differences in the mechanistic origin of different disease categories, these studies share the goal of identifying causal genomic events that are critical for the clinical manifestation of the disease phenotype. Moreover, these studies face common challenges, including understanding the complex genetic architecture of the disease, deciphering the impact of variants on multiple scales, and interpreting noncoding mutations. Here, we highlight these challenges in depth and argue that properly addressing them will require a more unified vocabulary and approach across disease communities. Toward this goal, we present a unified perspective on relating variant impact to various genomic disorders.
Deep mutational scanning: A versatile tool in systematically mapping genotypes to phenotypes.
Wei H; Li X
Frontiers in genetics2023;14;1087267
Unveiling how genetic variations lead to phenotypic variations is one of the key questions in evolutionary biology, genetics, and biomedical research. Deep mutational scanning (DMS) technology has allowed the mapping of tens of thousands of genetic variations to phenotypic variations efficiently and economically. Since its first systematic introduction about a decade ago, we have witnessed the use of deep mutational scanning in many research areas leading to scientific breakthroughs. Also, the methods in each step of deep mutational scanning have become much more versatile thanks to the oligo-synthesizing technology, high-throughput phenotyping methods and deep sequencing technology. However, each specific possible step of deep mutational scanning has its pros and cons, and some limitations still await further technological development. Here, we discuss recent scientific accomplishments achieved through the deep mutational scanning and describe widely used methods in each step of deep mutational scanning. We also compare these different methods and analyze their advantages and disadvantages, providing insight into how to design a deep mutational scanning study that best suits the aims of the readers' projects.
Scalable Functional Assays for the Interpretation of Human Genetic Variation.
Tabet D; Parikh V; Mali P; Roth FP; Claussnitzer M
Annual review of genetics2022;56;441-465
Scalable sequence-function studies have enabled the systematic analysis and cataloging of hundreds of thousands of coding and noncoding genetic variants in the human genome. This has improved clinical variant interpretation and provided insights into the molecular, biophysical, and cellular effects of genetic variants at an astonishing scale and resolution across the spectrum of allele frequencies. In this review, we explore current applications and prospects for the field and outline the principles underlying scalable functional assay design, with a focus on the study of single-nucleotide coding and noncoding variants.
Leveraging massively parallel reporter assays for evolutionary questions
Romero, Irene Gallego; Lea, Amanda J.
2022;
A long-standing goal of evolutionary biology is to decode how gene regulatory processes contribute to organismal diversity, both within and between species. This question has remained challenging to answer, due both to the difficulties of predicting function from non-coding sequence, and to the technological constraints of laboratory research with non-model taxa. However, a recent methodological development in functional genomics, the massively parallel reporter assay (MPRA), makes it possible to test thousands to millions of sequences for regulatory activity in a single in vitro experiment. It does so by combining traditional, single-locus episomal reporter assays (e.g., luciferase reporter assays) with the scalability of high-throughput sequencing. In this perspective, we discuss the execution, advantages, and limitations of MPRAs for research in evolutionary biology. We review recent studies that have made use of this approach to address explicitly evolutionary questions, highlighting study designs that we believe are particularly well-positioned to gain from MPRA approaches. Additionally, we propose solutions for extending these powerful assays to rare taxa and those with limited genomic resources. In doing so, we underscore the broad potential of MPRAs to drive genome-scale functional evolutionary genetics studies in non-traditional model organisms.
Redefining the hypotheses driving Parkinson's diseases research.
Farrow SL; Cooper AA; O'Sullivan JM
NPJ Parkinson's disease2022;8;1;45
Parkinson's disease (PD) research has largely focused on the disease as a single entity centred on the development of neuronal pathology within the central nervous system. However, there is growing recognition that PD is not a single entity but instead reflects multiple diseases, in which different combinations of environmental, genetic and potential comorbid factors interact to direct individual disease trajectories. Moreover, an increasing body of recent research implicates peripheral tissues and non-neuronal cell types in the development of PD. These observations are consistent with the hypothesis that the initial causative changes for PD development need not occur in the central nervous system. Here, we discuss how the use of neuronal pathology as a shared, qualitative phenotype minimises insights into the possibility of multiple origins and aetiologies of PD. Furthermore, we discuss how considering PD as a single entity potentially impairs our understanding of the causative molecular mechanisms, approaches for patient stratification, identification of biomarkers, and the development of therapeutic approaches to PD. The clear consequence of there being distinct diseases that collectively form PD, is that there is no single biomarker or treatment for PD development or progression. We propose that diagnosis should shift away from the clinical definitions, towards biologically defined diseases that collectively form PD, to enable informative patient stratification. N-of-one type, clinical designs offer an unbiased, and agnostic approach to re-defining PD in terms of a group of many individual diseases.
Democratizing the mapping of gene mutations to protein biophysics
Marks, Debora S.; Michnick, Stephen W.
Nature2022;604;7904;47-48
A general method that quantifies and disentangles the effects of a gene’s mutations on the traits of its protein enables assessments of mutational effects on protein biophysics for many of the proteins of a living organism. An integrated technique that quantifies allosteric effects.
Challenge accepted: uncovering the role of rare genetic variants in Alzheimer's disease.
Khani M; Gibbons E; Bras J; Guerreiro R
Molecular neurodegeneration2022;17;1;3
The search for rare variants in Alzheimer's disease (AD) is usually deemed a high-risk - high-reward situation. The challenges associated with this endeavor are real. Still, the application of genome-wide technologies to large numbers of cases and controls or to small, well-characterized families has started to be fruitful. Rare variants associated with AD have been shown to increase risk or cause disease, but also to protect against the development of AD. All of these can potentially be targeted for the development of new drugs. Multiple independent studies have now shown associations of rare variants in NOTCH3, TREM2, SORL1, ABCA7, BIN1, CLU, NCK2, AKAP9, UNC5C, PLCG2, and ABI3 with AD and suggested that they may influence disease via multiple mechanisms. These genes have reported functions in the immune system, lipid metabolism, synaptic plasticity, and apoptosis. However, the main pathway emerging from the collective of genes harboring rare variants associated with AD is the Aβ pathway. Associations of rare variants in dozens of other genes have also been proposed, but have not yet been replicated in independent studies. Replication of this type of findings is one of the challenges associated with studying rare variants in complex diseases, such as AD. In this review, we discuss some of these primary challenges as well as possible solutions. Integrative approaches, the availability of large datasets and databases, and the development of new analytical methodologies will continue to produce new genes harboring rare variability impacting AD. In the future, more extensive and more diverse genetic studies, as well as studies of deeply characterized families, will enhance our understanding of disease pathogenesis and put us on the correct path for the development of successful drugs.
MaveDB v2: a curated community database with over three million variant effects from multiplexed functional assays
Alan F Rubin; Joseph K Min; Nathan J Rollins; Estelle Y Da; Daniel Esposito; Matthew Harrington; Jeremy Stone; Aisha Haley Bianchi; Mafalda Dias; Jonathan Frazer et al
bioRxiv2022;2021.11.29.470445
Multiplexed assays of variant effect (MAVEs) are capable of experimentally testing all possible single nucleotide or amino acid variants in selected genomic regions, generating 'variant effect maps', which provide biochemical insight and functional evidence to enable more rapid and accurate clinical interpretation of human variation. Because the international community applying MAVE approaches is growing rapidly, we developed the online MaveRegistry platform to catalyze collaboration, reduce redundant efforts, allow stakeholders to nominate targets and enable tracking and sharing of progress on ongoing MAVE projects.
Embeddings from protein language models predict conservation and variant effects.
Marquet C; Heinzinger M; Olenyi T; Dallago C; Erckert K; Bernhofer M; Nechaev D; Rost B
Human genetics2022;141;10;1629-1647
The emergence of SARS-CoV-2 variants stressed the demand for tools allowing to interpret the effect of single amino acid variants (SAVs) on protein function. While Deep Mutational Scanning (DMS) sets continue to expand our understanding of the mutational landscape of single proteins, the results continue to challenge analyses. Protein Language Models (pLMs) use the latest deep learning (DL) algorithms to leverage growing databases of protein sequences. These methods learn to predict missing or masked amino acids from the context of entire sequence regions. Here, we used pLM representations (embeddings) to predict sequence conservation and SAV effects without multiple sequence alignments (MSAs). Embeddings alone predicted residue conservation almost as accurately from single sequences as ConSeq using MSAs (two-state Matthews Correlation Coefficient-MCC-for ProtT5 embeddings of 0.596 ± 0.006 vs. 0.608 ± 0.006 for ConSeq). Inputting the conservation prediction along with BLOSUM62 substitution scores and pLM mask reconstruction probabilities into a simplistic logistic regression (LR) ensemble for Variant Effect Score Prediction without Alignments (VESPA) predicted SAV effect magnitude without any optimization on DMS data. Comparing predictions for a standard set of 39 DMS experiments to other methods (incl. ESM-1v, DeepSequence, and GEMME) revealed our approach as competitive with the state-of-the-art (SOTA) methods using MSA input. No method outperformed all others, neither consistently nor statistically significantly, independently of the performance measure applied (Spearman and Pearson correlation). Finally, we investigated binary effect predictions on DMS experiments for four human proteins. Overall, embedding-based methods have become competitive with methods relying on MSAs for SAV effect prediction at a fraction of the costs in computing/energy. 
Our method predicted SAV effects for the entire human proteome (~20k proteins) within 40 min on one Nvidia Quadro RTX 8000. All methods and data sets are freely available for local and online execution through bioembeddings.com, https://github.com/Rostlab/VESPA, and PredictProtein.
Determinants of trafficking, conduction, and disease within a K+ channel revealed through multiparametric deep mutational scanning.
Coyote-Maestas W; Nedrud D; He Y; Schmidt D
eLife2022;11
A long-standing goal in protein science and clinical genetics is to develop quantitative models of sequence, structure, and function relationships to understand how mutations cause disease. Deep mutational scanning (DMS) is a promising strategy to map how amino acids contribute to protein structure and function and to advance clinical variant interpretation. Here, we introduce 7429 single-residue missense mutations into the inward rectifier K+ channel Kir2.1 and determine how this affects folding, assembly, and trafficking, as well as regulation by allosteric ligands and ion conduction. Our data provide high-resolution information on a cotranslationally folded biogenic unit, trafficking and quality control signals, and segregated roles of different structural elements in fold stability and function. We show that Kir2.1 surface trafficking mutants are underrepresented in variant effect databases, which has implications for clinical practice. By comparing fitness scores with expert-reviewed variant effects, we can predict the pathogenicity of 'variants of unknown significance' and disease mechanisms of known pathogenic mutations. Our study in Kir2.1 provides a blueprint for how multiparametric DMS can help us understand the mechanistic basis of genetic disorders and the structure-function relationships of proteins.
From variant to function in human disease genetics.
Lappalainen T; MacArthur DG
Science2021;373;6562;1464-1468
Over the next decade, the primary challenge in human genetics will be to understand the biological mechanisms by which genetic variants influence phenotypes, including disease risk. Although the scale of this challenge is daunting, better methods for functional variant interpretation will have transformative consequences for disease diagnosis, risk prediction, and the development of new therapies. An array of new methods for characterizing variant impact at scale, using patient tissue samples as well as in vitro models, are already being applied to dissect variant mechanisms across a range of human cell types and environments. These approaches are also increasingly being deployed in clinical settings. We discuss the rationale, approaches, applications, and future outlook for characterizing the molecular and cellular effects of genetic variants.
Centers for Mendelian Genomics: A decade of facilitating gene discovery
Samantha M. Baxter; Jennifer E. Posey; Nicole J. Lake; Nara Sobreira; Jessica X. Chong; Steven Buyske; Elizabeth E. Blue; Lisa H. Chadwick; Zeynep H. Coban-Akdemir; Kimberly F. Doheny et al
Strategies to Uplift Novel Mendelian Gene Discovery for Improved Clinical Outcomes.
Seaby EG; Rehm HL; O'Donnell-Luria A
Frontiers in genetics2021;12;674295
Rare genetic disorders, while individually rare, are collectively common. They represent some of the most severe disorders affecting patients worldwide with significant morbidity and mortality. Over the last decade, advances in genomic methods have significantly uplifted diagnostic rates for patients and facilitated novel and targeted therapies. However, many patients with rare genetic disorders still remain undiagnosed as the genetic etiology of only a proportion of Mendelian conditions has been discovered to date. This article explores existing strategies to identify novel Mendelian genes and how these discoveries impact clinical care and therapeutics. We discuss the importance of data sharing, phenotype-driven approaches, patient-led approaches, utilization of large-scale genomic sequencing projects, constraint-based methods, integration of multi-omics data, and gene-to-patient methods. We further consider the health economic advantages of novel gene discovery and speculate on potential future methods for improved clinical outcomes.
American journal of human genetics2021;108;12;2248-2258
Clinical interpretation of missense variants is challenging because the majority identified by genetic testing are rare and their functional effects are unknown. Consequently, most variants are of uncertain significance and cannot be used for clinical diagnosis or management. Although not much can be done to ameliorate variant rarity, multiplexed assays of variant effect (MAVEs), where thousands of single-nucleotide variant effects are simultaneously measured experimentally, provide functional evidence that can help resolve variants of unknown significance (VUSs). However, a rigorous assessment of the clinical value of multiplexed functional data for variant interpretation is lacking. Thus, we systematically combined previously published BRCA1, TP53, and PTEN multiplexed functional data with phenotype and family history data for 324 VUSs identified by a single diagnostic testing laboratory. We curated 49,281 variant functional scores from MAVEs for these three genes and integrated four different TP53 multiplexed functional datasets into a single functional prediction for each variant by using machine learning. We then determined the strength of evidence provided by each multiplexed functional dataset and reevaluated 324 VUSs. Multiplexed functional data were effective in driving variant reclassification when combined with clinical data, eliminating 49% of VUSs for BRCA1, 69% for TP53, and 15% for PTEN. Thus, multiplexed functional data, which are being generated for numerous genes, are poised to have a major impact on clinical variant interpretation.