A map of constrained coding regions in the human genome

James M. Havrilla, Brent S. Pedersen, Ryan M. Layer, Aaron Quinlan

Research output: Contribution to journal (Article)

Abstract

Deep catalogs of genetic variation from thousands of humans enable the detection of intraspecies constraint by identifying coding regions with a scarcity of variation. While existing techniques summarize constraint for entire genes, single gene-wide metrics conceal regional constraint variability within each gene. Therefore, we have created a detailed map of constrained coding regions (CCRs) by leveraging variation observed among 123,136 humans from the Genome Aggregation Database. The most constrained CCRs are enriched for pathogenic variants in ClinVar and mutations underlying developmental disorders. CCRs highlight protein domain families under high constraint and suggest unannotated or incomplete protein domains. The highest-percentile CCRs complement existing variant prioritization methods when evaluating de novo mutations in studies of autosomal dominant disease. Finally, we identify highly constrained CCRs within genes lacking known disease associations. This observation suggests that CCRs may identify regions under strong purifying selection that, when mutated, cause severe developmental phenotypes or embryonic lethality.

Language: English (US)
Pages: 88-95
Number of pages: 8
Journal: Nature Genetics
Volume: 51
Issue number: 1
DOI: 10.1038/s41588-018-0294-6
State: Published - Jan 1 2019

Fingerprint

Human Genome
Genes
Mutation
Licensure
Databases
Phenotype
Protein Domains

ASJC Scopus subject areas

  • Genetics

Cite this

A map of constrained coding regions in the human genome. / Havrilla, James M.; Pedersen, Brent S.; Layer, Ryan M.; Quinlan, Aaron.

In: Nature Genetics, Vol. 51, No. 1, 01.01.2019, p. 88-95.

@article{d7ab7c5faf6d49e1b677e3f8a18dadd7,
title = "A map of constrained coding regions in the human genome",
author = "Havrilla, {James M.} and Pedersen, {Brent S.} and Layer, {Ryan M.} and Aaron Quinlan",
year = "2019",
month = "1",
day = "1",
doi = "10.1038/s41588-018-0294-6",
language = "English (US)",
volume = "51",
pages = "88--95",
journal = "Nature Genetics",
issn = "1061-4036",
publisher = "Nature Publishing Group",
number = "1",

}

TY - JOUR

T1 - A map of constrained coding regions in the human genome

AU - Havrilla, James M.

AU - Pedersen, Brent S.

AU - Layer, Ryan M.

AU - Quinlan, Aaron

PY - 2019/1/1

Y1 - 2019/1/1

UR - http://www.scopus.com/inward/record.url?scp=85058142396&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85058142396&partnerID=8YFLogxK

U2 - 10.1038/s41588-018-0294-6

DO - 10.1038/s41588-018-0294-6

M3 - Article

VL - 51

SP - 88

EP - 95

JO - Nature Genetics

T2 - Nature Genetics

JF - Nature Genetics

SN - 1061-4036

IS - 1

ER -