Publications
Scientific publications that mention or cite AVE
Variant effect predictor correlation with functional assays is reflective of clinical classification performance.
Genome biology 2025;26;1;104
Understanding the relationship between protein sequence and function is crucial for accurate classification of missense variants. Variant effect predictors (VEPs) play a vital role in deciphering this complex relationship, yet evaluating their performance remains challenging for several reasons, including data circularity, where the same or related data is used for training and assessment. High-throughput experimental strategies like deep mutational scanning (DMS) offer a promising solution.
Site-saturation mutagenesis of 500 human protein domains.
Nature 2025;637;8047;885-894
Missense variants that change the amino acid sequences of proteins cause one-third of human genetic diseases. Tens of millions of missense variants exist in the current human population, and the vast majority of these have unknown functional consequences. Here we present a large-scale experimental analysis of human missense variants across many different proteins. Using DNA synthesis and cellular selection experiments we quantify the effect of more than 500,000 variants on the abundance of more than 500 human protein domains. This dataset reveals that 60% of pathogenic missense variants reduce protein stability. The contribution of stability to protein fitness varies across proteins and diseases and is particularly important in recessive disorders. We combine stability measurements with protein language models to annotate functional sites across proteins. Mutational effects on stability are largely conserved in homologous domains, enabling accurate stability prediction across entire protein families using energy models. Our data demonstrate the feasibility of assaying human protein variants at scale and provides a large consistent reference dataset for clinical variant interpretation and training and benchmarking of computational methods.
PUBMED: 39779847 PMC: PMC11754108 DOI: 10.1038/s41586-024-08370-4
Epitope mapping via in vitro deep mutational scanning methods and its applications.
The Journal of biological chemistry 2025;301;1;108072
Epitope mapping is a technique employed to define the region of an antigen that elicits an immune response, providing crucial insight into the structural architecture of the antigen as well as epitope-paratope interactions. With this breadth of knowledge, immunotherapies, diagnostics, and vaccines are being developed with a rational and data-supported design. Traditional epitope mapping methods are laborious, time-intensive, and often lack the ability to screen proteins in a high-throughput manner or provide high resolution. Deep mutational scanning (DMS), however, is revolutionizing the field as it can screen all possible single amino acid mutations and provide an efficient and high-throughput way to infer the structures of both linear and three-dimensional epitopes with high resolution. Currently, more than 50 publications take this approach to efficiently identify enhancing or escaping mutations, with many then employing this information to rapidly develop broadly neutralizing antibodies, T-cell immunotherapies, vaccine platforms, or diagnostics. We provide a comprehensive review of the approaches to accomplish epitope mapping while also providing a summation of the development of DMS technology and its impactful applications.
PUBMED: 39674321 PMC: PMC11783119 DOI: 10.1016/j.jbc.2024.108072
SSEmb: A joint embedding of protein sequence and structure enables robust variant effect predictions.
Nature Communications 2024;15;1;9646
The ability to predict how amino acid changes affect proteins has a wide range of applications including in disease variant classification and protein engineering. Many existing methods focus on learning from patterns found in either protein sequences or protein structures. Here, we present a method for integrating information from sequence and structure in a single model that we term SSEmb (Sequence Structure Embedding). SSEmb combines a graph representation for the protein structure with a transformer model for processing multiple sequence alignments. We show that by integrating both types of information we obtain a variant effect prediction model that is robust when sequence information is scarce. We also show that SSEmb learns embeddings of the sequence and structure that are useful for other downstream tasks such as to predict protein-protein binding sites. We envisage that SSEmb may be useful both for variant effect predictions and as a representation for learning to predict protein properties that depend on sequence and structure.
PUBMED: 39511177 PMC: PMC11544099 DOI: 10.1038/s41467-024-53982-z
Using multiplexed functional data to reduce variant classification inequities in underrepresented populations.
Genome medicine 2024;16;1;143
Multiplexed Assays of Variant Effects (MAVEs) can test all possible single variants in a gene of interest. The resulting saturation-style functional data may help resolve variant classification disparities between populations, especially for Variants of Uncertain Significance (VUS).
PUBMED: 39627863 PMC: PMC11616159 DOI: 10.1186/s13073-024-01392-7
2024 Clinical Atlas of Variant Effects meeting summary
Zenodo
Executive summary from the Clinical Atlas of Variant Effects meeting held in Pittsburgh, PA USA on July 23rd, 2024. This meeting was made possible in part by a grant to the University of Pittsburgh from the Chan Zuckerberg Initiative DAF, an advised fund of Silicon Valley Community Foundation.
Variation to biology: optimizing functional analysis of cancer risk variants.
Journal of the National Cancer Institute 2024;116;12;1882-1889
Research conducted over the past 15+ years has identified hundreds of common germline genetic variants associated with cancer risk, but understanding the biological impact of these primarily non-protein coding variants has been challenging. The National Cancer Institute sought to better understand and address those challenges by requesting input from the scientific community via a survey and a 2-day virtual meeting, which focused on discussions among participants. Here, we discuss challenges identified through the survey as important to advancing functional analysis of common cancer risk variants: 1) When is a variant truly characterized; 2) Developing and standardizing databases and computational tools; 3) Optimization and implementation of high-throughput assays; 4) Use of model organisms for understanding variant function; 5) Diversity in data and assays; and 6) Creating and improving large multidisciplinary collaborations. We define these 6 challenges, describe how success in addressing them may look, propose potential solutions, and note issues that span all the challenges. Implementation of these ideas could help develop a framework for methodically analyzing common cancer risk variants to understand their function and make effective and efficient use of the wealth of existing genomic association data.
High-throughput assays to assess variant effects on disease.
Disease models & mechanisms 2024;17;6
Interpreting the wealth of rare genetic variants discovered in population-scale sequencing efforts and deciphering their associations with human health and disease present a critical challenge due to the lack of sufficient clinical case reports. One promising avenue to overcome this problem is deep mutational scanning (DMS), a method of introducing and evaluating large-scale genetic variants in model cell lines. DMS allows unbiased investigation of variants, including those that are not found in clinical reports, thus improving rare disease diagnostics. Currently, the main obstacle limiting the full potential of DMS is the availability of functional assays that are specific to disease mechanisms. Thus, we explore high-throughput functional methodologies suitable to examine broad disease mechanisms. We specifically focus on methods that do not require robotics or automation but instead use well-designed molecular tools to transform biological mechanisms into easily detectable signals, such as cell survival rate, fluorescence or drug resistance. Here, we aim to bridge the gap between disease-relevant assays and their integration into the DMS framework.
PRKN-linked familial Parkinson's disease: cellular and molecular mechanisms of disease-linked variants.
Cellular and molecular life sciences : CMLS 2024;81;1;223
Parkinson's disease (PD) is a common and incurable neurodegenerative disorder that arises from the loss of dopaminergic neurons in the substantia nigra and is mainly characterized by progressive loss of motor function. Monogenic familial PD is associated with highly penetrant variants in specific genes, notably the PRKN gene, where homozygous or compound heterozygous loss-of-function variants predominate. PRKN encodes Parkin, an E3 ubiquitin-protein ligase important for protein ubiquitination and mitophagy of damaged mitochondria. Accordingly, Parkin plays a central role in mitochondrial quality control but is itself also subject to a strict protein quality control system that rapidly eliminates certain disease-linked Parkin variants. Here, we summarize the cellular and molecular functions of Parkin, highlighting the various mechanisms by which PRKN gene variants result in loss-of-function. We emphasize the importance of high-throughput assays and computational tools for the clinical classification of PRKN gene variants and how detailed insights into the pathogenic mechanisms of PRKN gene variants may impact the development of personalized therapeutics.
PUBMED: 38767677 PMC: PMC11106057 DOI: 10.1007/s00018-024-05262-8
Analyzing the functional effects of DNA variants with gene editing.
Cell reports methods 2024;4;5;100776
Continual advancements in genomics have led to an ever-widening disparity between the rate of discovery of genetic variants and our current understanding of their functions and potential roles in disease. Systematic methods for phenotyping DNA variants are required to effectively translate genomics data into improved outcomes for patients with genetic diseases. To make the biggest impact, these approaches must be scalable and accurate, faithfully reflect disease biology, and define complex disease mechanisms. We compare current methods to analyze the function of variants in their endogenous DNA context using genome editing strategies, such as saturation genome editing, base editing and prime editing. We discuss how these technologies can be linked to high-content readouts to gain deep mechanistic insights into variant effects. Finally, we highlight key challenges that need to be addressed to bridge the genotype to phenotype gap, and ultimately improve the diagnosis and treatment of genetic diseases.
PUBMED: 38744287 PMC: PMC11133854 DOI: 10.1016/j.crmeth.2024.100776
Variant effect predictor correlation with functional assays is reflective of clinical classification performance
bioRxiv 2024;2024.05.12.593741
bioRxiv - the preprint server for biology, operated by Cold Spring Harbor Laboratory, a research and educational institution
Guidelines for releasing a variant effect predictor.
ArXiv 2024
Computational methods for assessing the likely impacts of mutations, known as variant effect predictors (VEPs), are widely used in the assessment and interpretation of human genetic variation, as well as in other applications like protein engineering. Many different VEPs have been released to date, and there is tremendous variability in their underlying algorithms and outputs, and in the ways in which the methodologies and predictions are shared. This leads to considerable challenges for end users in knowing which VEPs to use and how to use them. Here, to address these issues, we provide guidelines and recommendations for the release of novel VEPs. Emphasising open-source availability, transparent methodologies, clear variant effect score interpretations, standardised scales, accessible predictions, and rigorous training data disclosure, we aim to improve the usability and interpretability of VEPs, and promote their integration into analysis and evaluation pipelines. We also provide a large, categorised list of currently available VEPs, aiming to facilitate the discovery and encourage the usage of novel methods within the scientific community.
Minimum information and guidelines for reporting a multiplexed assay of variant effect.
Genome biology 2024;25;1;100
Multiplexed assays of variant effect (MAVEs) have emerged as a powerful approach for interrogating thousands of genetic variants in a single experiment. The flexibility and widespread adoption of these techniques across diverse disciplines have led to a heterogeneous mix of data formats and descriptions, which complicates the downstream use of the resulting datasets. To address these issues and promote reproducibility and reuse of MAVE data, we define a set of minimum information standards for MAVE data and metadata and outline a controlled vocabulary aligned with established biomedical ontologies for describing these experimental designs.
PUBMED: 38641812 PMC: PMC11027375 DOI: 10.1186/s13059-024-03223-9
Workshop report: the clinical application of data from multiplex assays of variant effect (MAVEs), 12 July 2023.
European journal of human genetics : EJHG 2024;32;5;593-600
Clinical classification of genomic variants identified on sequencing is often challenging, with many variants classified as Variants of Uncertain Significance (VUS) on account of insufficient evidence. Advances in sequencing and gene synthesis has made feasible multiplexed assays of variant effect (MAVEs), which quantify the functional impact of many thousands of genomic variants in a single experiment. These assays and the functional evidence they generate have the potential to empower more accurate clinical variant classification. However, there are many outstanding challenges and opportunities that require joint resolution and specification, thus necessitating communication between the research scientists who have designed and performed MAVEs and the clinicians and diagnostic scientists who will apply their data to clinical variant classification. In the ‘Clinical Application of MAVE Data’ workshop, held on 12th July 2023 at the Wellcome Connecting Science Conference Centre in between two relevant research meetings, ‘Curating the Clinical Genome 2023’ and the ‘Mutational Scanning Symposium 2023’, 44 key scientific and/or clinical stakeholders were brought together to consider important questions relating to clinical application of MAVE data, such as quantitative validation, variant truth-sets, platforms and standards for dissemination of MAVE data. The outcomes and possible next steps that were discussed encompassed development of focused workshops to develop consensus recommendations, creating a MAVE evaluation working group, and collaboration of ClinVar and MaveDB to enact software changes that support enhanced functional data submission.
PUBMED: 38433264 PMC: PMC11061192 DOI: 10.1038/s41431-024-01566-2
Ensembl 2024.
Nucleic acids research 2024;52;D1;D891-D899
Ensembl (https://www.ensembl.org) is a freely available genomic resource that has produced high-quality annotations, tools, and services for vertebrates and model organisms for more than two decades. In recent years, there has been a dramatic shift in the genomic landscape, with a large increase in the number and phylogenetic breadth of high-quality reference genomes, alongside major advances in the pan-genome representations of higher species. In order to support these efforts and accelerate downstream research, Ensembl continues to focus on scaling for the rapid annotation of new genome assemblies, developing new methods for comparative analysis, and expanding the depth and quality of our genome annotations. This year we have continued our expansion to support global biodiversity research, doubling the number of annotated genomes we support on our Rapid Release site to over 1700, driven by our close collaboration with biodiversity projects such as Darwin Tree of Life. We have also strengthened support for key agricultural species, including the first regulatory builds for farmed animals, and have updated key tools and resources that support the global scientific community, notably the Ensembl Variant Effect Predictor. Ensembl data, software, and tools are freely available.
Will variants of uncertain significance still exist in 2030?
American journal of human genetics 2024;111;1;5-10
In 2020, the National Human Genome Research Institute (NHGRI) made ten "bold predictions," including that "the clinical relevance of all encountered genomic variants will be readily predictable, rendering the diagnostic designation 'variant of uncertain significance (VUS)' obsolete." We discuss the prospects for this prediction, arguing that many, if not most, VUS in coding regions will be resolved by 2030. We outline a confluence of recent changes making this possible, especially advances in the standards for variant classification that better leverage diverse types of evidence, improvements in computational variant effect predictor performance, scalable multiplexed assays of variant effect capable of saturating the genome, and data-sharing efforts that will maximize the information gained from each new individual sequenced and variant interpreted. We suggest that clinicians and researchers can realize a future where VUSs have largely been eliminated, in line with the NHGRI's bold prediction. The length of time taken to reach this future, and thus whether we are able to achieve the goal of largely eliminating VUSs by 2030, is largely a consequence of the choices made now and in the next few years. We believe that investing in eliminating VUSs is worthwhile, since their predominance remains one of the biggest challenges to precision genomic medicine.
PUBMED: 38086381 PMC: PMC10806733 DOI: 10.1016/j.ajhg.2023.11.005
Overview of modern genomic tools for diagnosis and precision therapy of childhood solid cancers.
Current opinion in pediatrics 2024;36;1;71-77
The application of technology and computational analyses to generate new data types from pediatric solid cancers is transforming diagnostic accuracy. This review provides an overview of such new capabilities in the pursuit of improved treatment for essentially rare and underserved diseases that are the highest cause of mortality in children over one year of age. Sophisticated ways of identifying therapeutic vulnerabilities for highly personalized treatment are presented alongside cutting-edge disease response monitoring by liquid biopsy.
PUBMED: 37972971 PMC: PMC10763706 DOI: 10.1097/MOP.0000000000001311
Predicting pathogenic protein variants.
Science 2023;381;6664;1284-1285
Machine-learning algorithm uses structure prediction to spot disease-causing mutations.
An Atlas of Variant Effects to understand the genome at nucleotide resolution.
Genome biology 2023;24;1;147
Sequencing has revealed hundreds of millions of human genetic variants, and continued efforts will only add to this variant avalanche. Insufficient information exists to interpret the effects of most variants, limiting opportunities for precision medicine and comprehension of genome function. A solution lies in experimental assessment of the functional effect of variants, which can reveal their biological and clinical impact. However, variant effect assays have generally been undertaken reactively for individual variants only after and, in most cases long after, their first observation. Now, multiplexed assays of variant effect can characterise massive numbers of variants simultaneously, yielding variant effect maps that reveal the function of every possible single nucleotide change in a gene or regulatory element. Generating maps for every protein encoding gene and regulatory element in the human genome would create an 'Atlas' of variant effect maps and transform our understanding of genetics and usher in a new era of nucleotide-resolution functional knowledge of the genome. An Atlas would reveal the fundamental biology of the human genome, inform human evolution, empower the development and use of therapeutics and maximize the utility of genomics for diagnosing and treating disease. The Atlas of Variant Effects Alliance is an international collaborative group comprising hundreds of researchers, technologists and clinicians dedicated to realising an Atlas of Variant Effects to help deliver on the promise of genomics.
PUBMED: 37394429 PMC: PMC10316620 DOI: 10.1186/s13059-023-02986-x
SUNi mutagenesis: Scalable and uniform nicking for efficient generation of variant libraries.
PloS one 2023;18;7;e0288158
Multiplexed assays of variant effects (MAVEs) have made possible the functional assessment of all possible mutations to genes and regulatory sequences. A core pillar of the approach is generation of variant libraries, but current methods are either difficult to scale or not uniform enough to enable MAVEs at the scale of gene families or beyond. We present an improved method called Scalable and Uniform Nicking (SUNi) mutagenesis that combines massive scalability with high uniformity to enable cost-effective MAVEs of gene families and eventually genomes.
PUBMED: 37418460 PMC: PMC10328370 DOI: 10.1371/journal.pone.0288158
Mapping MAVE data for use in human genomics applications.
bioRxiv 2024
The large-scale experimental measures of variant functional assays submitted to MaveDB have the potential to provide key information for resolving variants of uncertain significance, but the reporting of results relative to assayed sequence hinders their downstream utility. The Atlas of Variant Effects Alliance mapped multiplexed assays of variant effect data to human reference sequences, creating a robust set of machine-readable homology mappings. This method processed approximately 2.5 million protein and genomic variants in MaveDB, successfully mapping 98.61% of examined variants and disseminating data to resources such as the UCSC Genome Browser and Ensembl Variant Effect Predictor.
PUBMED: 38979347 PMC: PMC11230167 DOI: 10.1101/2023.06.20.545702
Correspondence between functional scores from deep mutational scans and predicted effects on protein stability.
Protein science : a publication of the Protein Society 2023;32;7;e4688
Many methodologically diverse computational methods have been applied to the growing challenge of predicting and interpreting the effects of protein variants. As many pathogenic mutations have a perturbing effect on protein stability or intermolecular interactions, one highly interpretable approach is to use protein structural information to model the physical impacts of variants and predict their likely effects on protein stability and interactions. Previous efforts have assessed the accuracy of stability predictors in reproducing thermodynamically accurate values and evaluated their ability to distinguish between known pathogenic and benign mutations. Here, we take an alternate approach, and explore how well stability predictor scores correlate with functional impacts derived from deep mutational scanning (DMS) experiments. In this work, we compare the predictions of 9 protein stability-based tools against mutant protein fitness values from 49 independent DMS datasets, covering 170,940 unique single amino acid variants. We find that FoldX and Rosetta show the strongest correlations with DMS-based functional scores, similar to their previous top performance in distinguishing between pathogenic and benign variants. For both methods, performance is considerably improved when considering intermolecular interactions from protein complex structures, when available. Furthermore, using these two predictors, we derive a "Foldetta" consensus score, which improves upon the performance of both, and manages to match dedicated variant effect predictors in reflecting variant functional impacts. Finally, we also highlight that predicted stability effects show consistently higher correlations with certain DMS experimental phenotypes, particularly those based upon protein abundance, and, in certain cases, can significantly outcompete sequence-based variant effect prediction methodologies for predicting functional scores from DMS experiments.
First-Tier Next Generation Sequencing for Newborn Screening: An Important Role for Biochemical Second-Tier Testing.
Genetics in medicine open 2023;1;1
There is discussion of expanding newborn screening (NBS) through the use of genomic sequence data; yet, challenges remain in the interpretation of DNA variants. Population-level DNA variant databases are available, and it is possible to estimate the number of newborns who would be flagged as having a risk for a genetic disease (including rare variants of unknown significance, VUS) via next-generation sequencing (NGS) positive. Estimates of the number of newborns screened as NGS positive for monogenic recessive diseases were obtained by analysis of the Genome Aggregation Database (gnomAD). For a collection of diseases for which there is interest in NBS, we provided 2 estimates for the expected number of newborns screened as NGS positive. For a set of lysosomal storage diseases, we estimated that 100 to approximately 600 NGS screen positives would be found per disease per year in a large NBS laboratory (California), and this figure may be expected to rise to a limit of about 1000 if we account for the fact that gnomAD does not contain all worldwide variants. The number of positives would drop 2.5- to 10-fold if the 10 VUS with highest allele frequency were biochemically annotated as benign. It is proposed that a second-tier biochemical assay using the same dried blood spot could be carried out as a filter and as part of NBS to reduce the number of high-risk NGS positive newborns to a manageable number.
PUBMED: 39238532 PMC: PMC11377026 DOI: 10.1016/j.gimo.2023.100821
DIMPLE: deep insertion, deletion, and missense mutation libraries for exploring protein variation in evolution, disease, and biology.
Genome biology 2023;24;1;36
Insertions and deletions (indels) enable evolution and cause disease. Due to technical challenges, indels are left out of most mutational scans, limiting our understanding of them in disease, biology, and evolution. We develop a low cost and bias method, DIMPLE, for systematically generating deletions, insertions, and missense mutations in genes, which we test on a range of targets, including Kir2.1. We use DIMPLE to study how indels impact potassium channel structure, disease, and evolution. We find deletions are most disruptive overall, beta sheets are most sensitive to indels, and flexible loops are sensitive to deletions yet tolerate insertions.
PUBMED: 36829241 PMC: PMC9951526 DOI: 10.1186/s13059-023-02880-6
Unified views on variant impact across many diseases.
Trends in genetics 2023;39;6;442-450
Genomic studies of human disorders are often performed by distinct research communities (i.e., focused on rare diseases, common diseases, or cancer). Despite underlying differences in the mechanistic origin of different disease categories, these studies share the goal of identifying causal genomic events that are critical for the clinical manifestation of the disease phenotype. Moreover, these studies face common challenges, including understanding the complex genetic architecture of the disease, deciphering the impact of variants on multiple scales, and interpreting noncoding mutations. Here, we highlight these challenges in depth and argue that properly addressing them will require a more unified vocabulary and approach across disease communities. Toward this goal, we present a unified perspective on relating variant impact to various genomic disorders.
PUBMED: 36858880 PMC: PMC10192142 DOI: 10.1016/j.tig.2023.02.002
Deep mutational scanning: A versatile tool in systematically mapping genotypes to phenotypes.
Frontiers in genetics 2023;14;1087267
Unveiling how genetic variations lead to phenotypic variations is one of the key questions in evolutionary biology, genetics, and biomedical research. Deep mutational scanning (DMS) technology has allowed the mapping of tens of thousands of genetic variations to phenotypic variations efficiently and economically. Since its first systematic introduction about a decade ago, we have witnessed the use of deep mutational scanning in many research areas leading to scientific breakthroughs. Also, the methods in each step of deep mutational scanning have become much more versatile thanks to the oligo-synthesizing technology, high-throughput phenotyping methods and deep sequencing technology. However, each specific possible step of deep mutational scanning has its pros and cons, and some limitations still await further technological development. Here, we discuss recent scientific accomplishments achieved through the deep mutational scanning and describe widely used methods in each step of deep mutational scanning. We also compare these different methods and analyze their advantages and disadvantages, providing insight into how to design a deep mutational scanning study that best suits the aims of the readers' projects.
PUBMED: 36713072 PMC: PMC9878224 DOI: 10.3389/fgene.2023.1087267
Updated benchmarking of variant effect predictors using deep mutational scanning
bioRxiv 2022;2022.11.19.517196
Scalable Functional Assays for the Interpretation of Human Genetic Variation.
Annual review of genetics 2022;56;441-465
Scalable sequence-function studies have enabled the systematic analysis and cataloging of hundreds of thousands of coding and noncoding genetic variants in the human genome. This has improved clinical variant interpretation and provided insights into the molecular, biophysical, and cellular effects of genetic variants at an astonishing scale and resolution across the spectrum of allele frequencies. In this review, we explore current applications and prospects for the field and outline the principles underlying scalable functional assay design, with a focus on the study of single-nucleotide coding and noncoding variants.
Leveraging massively parallel reporter assays for evolutionary questions
2022;
A long-standing goal of evolutionary biology is to decode how gene regulatory processes contribute to organismal diversity, both within and between species. This question has remained challenging to answer, due both to the difficulties of predicting function from non-coding sequence, and to the technological constraints of laboratory research with non-model taxa. However, a recent methodological development in functional genomics, the massively parallel reporter assay (MPRA), makes it possible to test thousands to millions of sequences for regulatory activity in a single in vitro experiment. It does so by combining traditional, single-locus episomal reporter assays (e.g., luciferase reporter assays) with the scalability of high-throughput sequencing. In this perspective, we discuss the execution, advantages, and limitations of MPRAs for research in evolutionary biology. We review recent studies that have made use of this approach to address explicitly evolutionary questions, highlighting study designs that we believe are particularly well-positioned to gain from MPRA approaches. Additionally, we propose solutions for extending these powerful assays to rare taxa and those with limited genomic resources. In doing so, we underscore the broad potential of MPRAs to drive genome-scale functional evolutionary genetics studies in non-traditional model organisms.
Redefining the hypotheses driving Parkinson's diseases research.
NPJ Parkinson's disease 2022;8;1;45
Parkinson's disease (PD) research has largely focused on the disease as a single entity centred on the development of neuronal pathology within the central nervous system. However, there is growing recognition that PD is not a single entity but instead reflects multiple diseases, in which different combinations of environmental, genetic and potential comorbid factors interact to direct individual disease trajectories. Moreover, an increasing body of recent research implicates peripheral tissues and non-neuronal cell types in the development of PD. These observations are consistent with the hypothesis that the initial causative changes for PD development need not occur in the central nervous system. Here, we discuss how the use of neuronal pathology as a shared, qualitative phenotype minimises insights into the possibility of multiple origins and aetiologies of PD. Furthermore, we discuss how considering PD as a single entity potentially impairs our understanding of the causative molecular mechanisms, approaches for patient stratification, identification of biomarkers, and the development of therapeutic approaches to PD. The clear consequence of there being distinct diseases that collectively form PD, is that there is no single biomarker or treatment for PD development or progression. We propose that diagnosis should shift away from the clinical definitions, towards biologically defined diseases that collectively form PD, to enable informative patient stratification. N-of-one type, clinical designs offer an unbiased, and agnostic approach to re-defining PD in terms of a group of many individual diseases.
PUBMED: 35440633 PMC: PMC9018840 DOI: 10.1038/s41531-022-00307-w
Democratizing the mapping of gene mutations to protein biophysics
Nature 2022;604;7904;47-48
A general method that quantifies and disentangles the effects of a gene’s mutations on the traits of its protein enables assessments of mutational effects on protein biophysics for many of the proteins of a living organism. An integrated technique that quantifies allosteric effects.
Challenge accepted: uncovering the role of rare genetic variants in Alzheimer's disease.
Molecular neurodegeneration 2022;17;1;3
The search for rare variants in Alzheimer's disease (AD) is usually deemed a high-risk - high-reward situation. The challenges associated with this endeavor are real. Still, the application of genome-wide technologies to large numbers of cases and controls or to small, well-characterized families has started to be fruitful. Rare variants associated with AD have been shown to increase risk or cause disease, but also to protect against the development of AD. All of these can potentially be targeted for the development of new drugs. Multiple independent studies have now shown associations of rare variants in NOTCH3, TREM2, SORL1, ABCA7, BIN1, CLU, NCK2, AKAP9, UNC5C, PLCG2, and ABI3 with AD and suggested that they may influence disease via multiple mechanisms. These genes have reported functions in the immune system, lipid metabolism, synaptic plasticity, and apoptosis. However, the main pathway emerging from the collective of genes harboring rare variants associated with AD is the Aβ pathway. Associations of rare variants in dozens of other genes have also been proposed, but have not yet been replicated in independent studies. Replication of this type of findings is one of the challenges associated with studying rare variants in complex diseases, such as AD. In this review, we discuss some of these primary challenges as well as possible solutions. Integrative approaches, the availability of large datasets and databases, and the development of new analytical methodologies will continue to produce new genes harboring rare variability impacting AD. In the future, more extensive and more diverse genetic studies, as well as studies of deeply characterized families, will enhance our understanding of disease pathogenesis and put us on the correct path for the development of successful drugs.
PUBMED: 35000612 PMC: PMC8744312 DOI: 10.1186/s13024-021-00505-9
MaveDB v2: a curated community database with over three million variant effects from multiplexed functional assays
bioRxiv 2022;2021.11.29.470445
MaveRegistry: a collaboration platform for multiplexed assays of variant effect.
Bioinformatics (Oxford, England) 2021;37;19;3382-3383
Multiplexed assays of variant effect (MAVEs) are capable of experimentally testing all possible single nucleotide or amino acid variants in selected genomic regions, generating 'variant effect maps', which provide biochemical insight and functional evidence to enable more rapid and accurate clinical interpretation of human variation. Because the international community applying MAVE approaches is growing rapidly, we developed the online MaveRegistry platform to catalyze collaboration, reduce redundant efforts, allow stakeholders to nominate targets and enable tracking and sharing of progress on ongoing MAVE projects.
PUBMED: 33774657 PMC: PMC8504617 DOI: 10.1093/bioinformatics/btab215
Embeddings from protein language models predict conservation and variant effects.
Human genetics 2022;141;10;1629-1647
The emergence of SARS-CoV-2 variants stressed the demand for tools allowing to interpret the effect of single amino acid variants (SAVs) on protein function. While Deep Mutational Scanning (DMS) sets continue to expand our understanding of the mutational landscape of single proteins, the results continue to challenge analyses. Protein Language Models (pLMs) use the latest deep learning (DL) algorithms to leverage growing databases of protein sequences. These methods learn to predict missing or masked amino acids from the context of entire sequence regions. Here, we used pLM representations (embeddings) to predict sequence conservation and SAV effects without multiple sequence alignments (MSAs). Embeddings alone predicted residue conservation almost as accurately from single sequences as ConSeq using MSAs (two-state Matthews Correlation Coefficient-MCC-for ProtT5 embeddings of 0.596 ± 0.006 vs. 0.608 ± 0.006 for ConSeq). Inputting the conservation prediction along with BLOSUM62 substitution scores and pLM mask reconstruction probabilities into a simplistic logistic regression (LR) ensemble for Variant Effect Score Prediction without Alignments (VESPA) predicted SAV effect magnitude without any optimization on DMS data. Comparing predictions for a standard set of 39 DMS experiments to other methods (incl. ESM-1v, DeepSequence, and GEMME) revealed our approach as competitive with the state-of-the-art (SOTA) methods using MSA input. No method outperformed all others, neither consistently nor statistically significantly, independently of the performance measure applied (Spearman and Pearson correlation). Finally, we investigated binary effect predictions on DMS experiments for four human proteins. Overall, embedding-based methods have become competitive with methods relying on MSAs for SAV effect prediction at a fraction of the costs in computing/energy. Our method predicted SAV effects for the entire human proteome (~ 20 k proteins) within 40 min on one Nvidia Quadro RTX 8000. All methods and data sets are freely available for local and online execution through bioembeddings.com, https://github.com/Rostlab/VESPA, and PredictProtein.
PUBMED: 34967936 PMC: PMC8716573 DOI: 10.1007/s00439-021-02411-y
From variant to function in human disease genetics.
Science 2021;373;6562;1464-1468
Over the next decade, the primary challenge in human genetics will be to understand the biological mechanisms by which genetic variants influence phenotypes, including disease risk. Although the scale of this challenge is daunting, better methods for functional variant interpretation will have transformative consequences for disease diagnosis, risk prediction, and the development of new therapies. An array of new methods for characterizing variant impact at scale, using patient tissue samples as well as in vitro models, are already being applied to dissect variant mechanisms across a range of human cell types and environments. These approaches are also increasingly being deployed in clinical settings. We discuss the rationale, approaches, applications, and future outlook for characterizing the molecular and cellular effects of genetic variants.
Centers for Mendelian Genomics: A decade of facilitating gene discovery
medRxiv 2021;2021.08.24.21261656
Strategies to Uplift Novel Mendelian Gene Discovery for Improved Clinical Outcomes.
Frontiers in genetics 2021;12;674295
Rare genetic disorders, while individually rare, are collectively common. They represent some of the most severe disorders affecting patients worldwide with significant morbidity and mortality. Over the last decade, advances in genomic methods have significantly uplifted diagnostic rates for patients and facilitated novel and targeted therapies. However, many patients with rare genetic disorders still remain undiagnosed as the genetic etiology of only a proportion of Mendelian conditions has been discovered to date. This article explores existing strategies to identify novel Mendelian genes and how these discoveries impact clinical care and therapeutics. We discuss the importance of data sharing, phenotype-driven approaches, patient-led approaches, utilization of large-scale genomic sequencing projects, constraint-based methods, integration of multi-omics data, and gene-to-patient methods. We further consider the health economic advantages of novel gene discovery and speculate on potential future methods for improved clinical outcomes.
PUBMED: 34220947 PMC: PMC8248347 DOI: 10.3389/fgene.2021.674295
Closing the gap: Systematic integration of multiplexed functional data resolves variants of uncertain significance in BRCA1, TP53, and PTEN.
American journal of human genetics 2021;108;12;2248-2258
Clinical interpretation of missense variants is challenging because the majority identified by genetic testing are rare and their functional effects are unknown. Consequently, most variants are of uncertain significance and cannot be used for clinical diagnosis or management. Although not much can be done to ameliorate variant rarity, multiplexed assays of variant effect (MAVEs), where thousands of single-nucleotide variant effects are simultaneously measured experimentally, provide functional evidence that can help resolve variants of unknown significance (VUSs). However, a rigorous assessment of the clinical value of multiplexed functional data for variant interpretation is lacking. Thus, we systematically combined previously published BRCA1, TP53, and PTEN multiplexed functional data with phenotype and family history data for 324 VUSs identified by a single diagnostic testing laboratory. We curated 49,281 variant functional scores from MAVEs for these three genes and integrated four different TP53 multiplexed functional datasets into a single functional prediction for each variant by using machine learning. We then determined the strength of evidence provided by each multiplexed functional dataset and reevaluated 324 VUSs. Multiplexed functional data were effective in driving variant reclassification when combined with clinical data, eliminating 49% of VUSs for BRCA1, 69% for TP53, and 15% for PTEN. Thus, multiplexed functional data, which are being generated for numerous genes, are poised to have a major impact on clinical variant interpretation.
PUBMED: 34793697 PMC: PMC8715144 DOI: 10.1016/j.ajhg.2021.11.001