F-CAP: Functionalization of Variants in Clinically Actionable Pharmacogenes
Patient-to-patient variability in response to drugs creates a significant challenge for the safe and effective treatment of many human diseases. Pharmacogenomics seeks to address this challenge by linking drug response to patient genotypes at important loci, termed pharmacogenes, in order to better customize patient treatments. Genetic variation in pharmacogenes is extensive. For example, among 12 CYP genes, 10% of people carry at least one rare, potentially deleterious variant. Unfortunately, only a small number of variants have been unambiguously linked to alterations in drug response. New approaches are needed to annotate the consequences of the huge pool of variants of unknown significance, including those already identified by existing large-scale sequencing programs and those that will be discovered as clinical sequencing becomes routine. In this proposal, we seek to address this problem directly, and at a scale never before possible, by taking advantage of new technologies in sequencing and functional analysis. Our resource, termed F-CAP (Functionalization of Variants in Clinically Actionable Pharmacogenes), will test all possible substitutions at all amino acid residues in some of the most clinically important pharmacogenes and disseminate these data to the medical and research communities. To accomplish this, we will use deep mutational scanning, a method we have developed that allows parallelized, quantitative measurements to be performed on libraries of genetic variants. In Aim 1, we will create these libraries, starting with five of the most important CPIC level A or B priority genes (CYP2C9, CYP2C19, CYP2D6, TPMT and VKORC1), and test the stability and enzymatic activity of each variant en masse using a pooled selection strategy. In Aim 2, we will integrate these data to create an impact score.
This impact score provides a numerical value for a variant’s functional effects that is amenable to easy integration into prescribing guidelines being developed by the pharmacogenomics community. Aim 3 will validate this score for a subset of variants that span the impact score spectrum using therapeutically relevant substrates for each pharmacogene. Finally, Aim 4 describes a key component of this resource: the dissemination of our findings to the entire pharmacogenomics community through partnership with CPIC and PharmGKB. In addition, we will make available our raw and processed data via a custom web resource that will also be developed in Aim 4. This resource will provide a series of fully annotated datasets describing the functional consequences of every possible single mutation in a series of key pharmacogenes, thereby greatly advancing the field of personalized medicine.
PROJECT NARRATIVE
Pharmacogenomics seeks to identify genetic sources of inter-individual variability in drug response, with the goal of personalizing drug selection and dose to improve patient outcomes. A key barrier to using pharmacogenomic information is lack of clarity about the functional impact of variants, which hampers the provision of clear, unambiguous guidance to health care providers. We propose to connect pharmacogenomic variant discovery with novel high-throughput experimental approaches to deliver a resource to guide the use of individual pharmacogenomic information in personalizing drug treatment.
Pharmacogenomics seeks to identify genetic sources of inter-individual variability in drug response, with the goal of personalizing drug treatment to improve patient outcomes. A key barrier to using pharmacogenomic information is lack of clarity about the functional impact of variants, which hampers the provision of clear, unambiguous guidance to health care providers. The overarching goal of this application is to connect pharmacogenomic variant discovery with high-throughput experimental approaches to deliver a resource that we have termed F-CAP (Functionalization of Variants in Clinically Actionable Pharmacogenes), which will catalog the functional impact of every genetic variant at all amino acid positions in a set of prioritized pharmacogenes. The need for such a resource is highlighted by our recent study of variation in CYP genes. In that study, 730 novel nonsynonymous variants in 12 CYPs were discovered in the exomes of ~6,500 individuals. These variants were individually rare but collectively common: ~10% of individuals carried at least one potentially deleterious novel variant at one of these 12 loci. These genes also contained previously known rare (MAF < 0.5-1%) variants whose functional consequences remain unclear. These results, obtained from a limited number of individuals relative to the number of patients who will ultimately be genotyped, serve as a stark illustration of the scale of the problem. Traditional methods for determining the impact of newly identified coding variants are inadequate. Biochemical assays can reveal the functional consequences of variants, but they are limited in scale to, at most, hundreds of variants. Algorithms like PolyPhen2 can predict the consequences of any variant in a pharmacogene of interest, but they often produce incorrect results.
We propose experimental large-scale determination of the relationship between pharmacogene sequence and function, thereby enabling accurate functional predictions for every possible genetic diagnosis.
Several groups, including PharmGKB and CPIC, have already embraced the task of providing guidelines to the medical community about the likely impact of variants. To this end, CPIC has identified 84 gene-drug pairs that are assigned a level A or level B priority, which reflects the group’s view that genetic information could be used to guide prescribing of the affected drug. In the 3-year period of funding available through this R24 grant application, we will concentrate on disseminating a comprehensive functional catalog for all nonsynonymous variants in five of these genes: CYP2C9, CYP2C19, CYP2D6, TPMT and VKORC1, which feature, in the aggregate, in more than half of the 84 highest priority CPIC gene-drug pairs.
Aim 1: Large-scale measurement of variant effect in an important set of pharmacogenes. The goal of this aim is to deploy a set of large-scale data generation tools that we developed for experimentally measuring variation in CYP2C9, CYP2C19, CYP2D6, TPMT and VKORC1. Commonly used assays for pharmacogene activity will be adapted such that variants can be pooled and scored in parallel for their activity and stability using high-throughput DNA sequencing. The output from each parallelized assay will be data sets describing the consequences of all possible singly mutated variants for a given pharmacogene.
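As an illustration of how pooled sequencing counts can be turned into per-variant scores, the sketch below computes a log2 enrichment ratio for each variant relative to wild type. This is a common convention in deep mutational scanning, not necessarily the scoring scheme used in this project. The function name, the pseudocount, the variant labels and the read counts are all hypothetical.

```python
import math


def enrichment_scores(pre_counts, post_counts, wild_type="WT", pseudocount=0.5):
    """Return the log2 enrichment of each variant relative to wild type.

    pre_counts / post_counts map variant labels to read counts before and
    after selection. The pseudocount (an assumption of this sketch) avoids
    taking the log of zero when a variant drops out after selection.
    """
    def log_ratio(variant):
        post = post_counts.get(variant, 0) + pseudocount
        pre = pre_counts[variant] + pseudocount
        return math.log2(post / pre)

    wt_ratio = log_ratio(wild_type)
    # A score below zero means the variant was depleted relative to wild type.
    return {v: log_ratio(v) - wt_ratio for v in pre_counts if v != wild_type}


# Made-up read counts for two hypothetical variants.
pre = {"WT": 1000, "A10V": 500, "G25D": 400}
post = {"WT": 2000, "A10V": 500, "G25D": 100}

for variant, score in enrichment_scores(pre, post).items():
    print(f"{variant}\t{score:.2f}")
```

In this toy example both variants score below zero, and the more strongly depleted one (G25D) receives the lower score.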
Aim 2: Analysis of large-scale variant effect data to produce an impact score. In Aim 1, the consequences of every possible variant on pharmacogene activity and stability will be measured. The goal of Aim 2 is to integrate these data to produce a single, easily interpretable impact score with an associated statistical confidence. We will also combine our experimental data with available evolutionary and structural data and use a machine learning approach to classify the activity level of each variant (e.g. low/absent, intermediate, normal). Our classifications will differ from existing computational predictors of variant effect in that they will be rooted in experimental measurements.
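One simple way to picture a combined score with an attached uncertainty is sketched below. It is only an assumption-laden illustration, not the integration method of this proposal: it weights activity and stability equally, treats the two measurements as independent when propagating error, and uses hypothetical function names and made-up replicate values.

```python
from math import sqrt
from statistics import mean, stdev


def summarize_replicates(replicate_scores):
    """Return (mean, standard error of the mean) for one variant's replicates."""
    n = len(replicate_scores)
    if n < 2:
        raise ValueError("need at least two replicates to estimate a standard error")
    return mean(replicate_scores), stdev(replicate_scores) / sqrt(n)


def impact_score(activity_reps, stability_reps):
    """Combine activity and stability into one score with a standard error.

    Equal weighting and independence of the two assays are assumptions of
    this sketch, chosen only to keep the arithmetic transparent.
    """
    a_mean, a_se = summarize_replicates(activity_reps)
    s_mean, s_se = summarize_replicates(stability_reps)
    score = (a_mean + s_mean) / 2
    # Standard error of an equally weighted average of two independent means.
    se = sqrt(a_se ** 2 + s_se ** 2) / 2
    return score, se


# Made-up replicate scores for a single hypothetical variant.
score, se = impact_score([0.2, 0.4], [0.6, 0.8])
print(f"impact score = {score:.2f} +/- {se:.2f}")
```

A machine learning classifier of the kind described in this aim would replace the fixed equal weighting with weights learned from the data.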
Aim 3: Testing impact score predictions for selected genes. The goal of this aim is to validate our impact score data by characterizing a subset of variants in each pharmacogene using widely accepted in vitro and cellular assays. A subset of 20 variants from each gene/drug pair that span the activity spectrum will be assayed, enabling us to derive empirical false discovery rates for each impact score data set. To assess our ability to classify neutral variants, we will include previously unevaluated variants found in the Exome Variant Server and in the 1000 Genomes data sets.
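The empirical false discovery rate mentioned above can be sketched as the fraction of variants flagged as functionally impaired by the impact score that the validation assay does not confirm. The function name, the boolean encoding and the example calls below are hypothetical; the actual analysis may define discoveries differently.

```python
def empirical_fdr(predicted_impaired, validated_impaired):
    """Fraction of predicted-impaired variants not confirmed by validation.

    Both arguments map variant labels to booleans. Returns None when no
    variant was predicted to be impaired, since the rate is then undefined.
    """
    calls = [v for v, flagged in predicted_impaired.items() if flagged]
    if not calls:
        return None
    false_calls = sum(1 for v in calls if not validated_impaired[v])
    return false_calls / len(calls)


# Made-up calls for four hypothetical variants.
predicted = {"v1": True, "v2": True, "v3": True, "v4": False}
validated = {"v1": True, "v2": True, "v3": False, "v4": False}

fdr = empirical_fdr(predicted, validated)
print(f"empirical FDR = {fdr:.2f}")
```

Here one of the three flagged variants fails validation, so the estimate is one in three. With 20 variants per gene/drug pair, an estimate of this kind would carry wide uncertainty.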
Aim 4: Data dissemination and translation of resources. The goal of this aim is to make the data we generate available in a user-friendly, highly accessible format. To accomplish this goal, we have engaged in extended discussions with the pharmacogenomics community to develop a dissemination strategy. This strategy relies on three major elements: rich annotation of our findings; distribution of impact score data through PharmGKB and other public databases; and distribution of all data through a custom-built web resource to provide the entire dataset, scores and annotations.