The goal of the NHLBI GO Exome Sequencing Project (ESP) is to discover novel genes and mechanisms contributing to heart, lung and blood disorders by pioneering the application of next-generation sequencing of the protein-coding regions of the human genome across diverse, richly phenotyped populations. The project also aims to share these datasets and findings with the scientific community to extend and enrich the diagnosis, management and treatment of heart, lung and blood disorders.
The groups participating and collaborating in the NHLBI GO ESP include:
The group includes some of the largest well-phenotyped populations in the United States, representing more than 200,000 individuals altogether from the:
We request that any use of data obtained from the NHLBI GO ESP Exome Variant Server be cited in publications.
Exome Variant Server, NHLBI GO Exome Sequencing Project (ESP), Seattle, WA (URL: http://evs.gs.washington.edu/EVS/) [date (month, yr) accessed].
Acknowledgment for Publication
The authors would like to thank the NHLBI GO Exome Sequencing Project and its ongoing studies which produced and provided exome variant calls for comparison: the Lung GO Sequencing Project (HL-102923), the WHI Sequencing Project (HL-102924), the Broad GO Sequencing Project (HL-102925), the Seattle GO Sequencing Project (HL-102926) and the Heart GO Sequencing Project (HL-103010).
Public Data Release
The current EVS data release (ESP6500SI-V2) is taken from 6503 samples drawn from multiple ESP cohorts and represents all of the ESP exome variant data.
Sequences were aligned to the NCBI build 37 human genome reference using BWA. PCR duplicates were removed using Picard. Alignments were recalibrated using GATK. Lane-level indel realignments and base alignment quality (BAQ) adjustments were applied.
All data were simultaneously analyzed for exome SNP variants at the University of Michigan (by the Abecasis Laboratory). SNPs were called using a two-step approach. First, genotype likelihood files (GLFs) were generated using samtools pileup on individual BAM files. Next, we used glfMultiples, a multi-sample variant caller, to generate initial SNP calls. Details of the likelihood model implemented in glfMultiples are given in Li et al. (Genome Research (2011) 21:940-951; section entitled "Identifying Potential Polymorphic Sites"). The Michigan SNP calling pipeline is available at: http://genome.sph.umich.edu/wiki/UMAKE. On the X chromosome, this pipeline makes diploid calls for the pseudo-autosomal regions of male samples and haploid calls for the rest of the chromosome. Female samples have diploid calls for all regions on the X chromosome. SNPs were filtered by a machine-learning technique called support vector machine (SVM) classification (for a detailed description, see Filter Status).
Small INDEL variants were analyzed at the Broad Institute (by the Genome Sequencing and Analysis group) using the GATK variation discovery pipeline following the guidelines in the GATK best practices v4. More specifically, each BAM was reduced to create a Reduced BAM, and then INDELs were discovered by analyzing all samples simultaneously with the GATK UnifiedGenotyper, and subsequently filtered by the GATK Variant Quality Score Recalibration (VQSR) filtering model, again following the V4 best practices. The INDEL genotypes for the X and Y chromosomes were adjusted to be consistent with the samples' genders. Female samples have diploid calls for all regions on the X chromosome. Male samples have diploid calls for the pseudo-autosomal regions on the X chromosome and haploid calls for the rest of the X chromosome and on the Y chromosome as well.
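The sex-dependent calling rules described for the SNP and INDEL pipelines can be summarized as a small lookup. The Python sketch below is illustrative only and is not part of either pipeline. The function name `expected_ploidy` is hypothetical, the pseudo-autosomal boundaries are the commonly cited GRCh37 coordinates for the X chromosome (an assumption, since the text above does not list them), and returning 0 for the Y chromosome of female samples is likewise an assumption.

```python
# Commonly cited GRCh37 pseudo-autosomal regions on chromosome X
# (1-based, inclusive). Assumption: not stated in the EVS documentation.
X_PAR_GRCH37 = [(60001, 2699520), (154931044, 155260560)]


def expected_ploidy(chrom, pos, sex):
    """Return the number of alleles expected in a genotype call.

    chrom: chromosome name such as "1", "X", "chrY"
    pos:   1-based position on build 37
    sex:   "male" or "female"
    """
    chrom = chrom.upper()
    if chrom.startswith("CHR"):
        chrom = chrom[3:]
    sex = sex.lower()
    if sex not in ("male", "female"):
        raise ValueError("sex must be 'male' or 'female'")

    if chrom == "X":
        if sex == "female":
            return 2  # diploid across the whole X chromosome
        in_par = any(start <= pos <= end for start, end in X_PAR_GRCH37)
        return 2 if in_par else 1  # diploid in PAR, haploid elsewhere
    if chrom == "Y":
        # Haploid in males; 0 for females is an assumption (no call expected).
        return 1 if sex == "male" else 0
    return 2  # autosomes are diploid in all samples


if __name__ == "__main__":
    print(expected_ploidy("X", 1000000, "male"))     # 2 (inside PAR1)
    print(expected_ploidy("X", 50000000, "male"))    # 1 (outside the PARs)
    print(expected_ploidy("X", 50000000, "female"))  # 2
```

A check like this can be useful when validating downloaded genotypes, for example to flag a heterozygous call reported for a male sample in a non-pseudo-autosomal part of the X chromosome.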
A subset of these data (ESP2500), filtered with more stringent criteria, is available in the latest release of dbSNP (build 134) and has been published by the ESP Population Genetics and Statistical Analysis Working Group (publication).
Terms of Service
This web site is designed to disseminate exome sequencing data generated through federal funding from the NHLBI. These data are provided free of charge, on the condition that the following permission statement is followed. There may be other information on the site, such as links to other sites, references to other project groups and federal grants. The University of Washington has no responsibility for these links and information.
The contents of the NHLBI ESP Exome Variant Server web site are intended for educational or research purposes. As stated above, a subset of the exome variants on this website has been deposited in dbSNP, and the full dataset will be deposited in dbGaP as part of the ESP cohort data. We place no restrictions on the use of the data available from the EVS. You may download or copy the content and other downloadable items displayed on the Exome Variant Server portion of the web site, provided that in using the data, you follow the citation format given above.
How to Use the Data Browser
The gene model is that of NCBI, June 2012. Chromosome positions are those of NCBI build 37 (UCSC hg19).
1. Search Type
There are three ways to query variations:
A. gene name (HUGO, upper or lower case)
B. gene ID (from NCBI Entrez Gene)
C. chromosomal location
For A and B, you have the option to extend the chromosome region. The choice "upstream" is on the 5' end, and "downstream" is on the 3' end of the gene.
When a search by gene name or gene ID is made, there are sometimes alternative transcripts. A region large enough to cover all transcripts is chosen.
2. Select Data
Two types of data, variant information and sequencing coverage information, are output under separate tabs.
The variant data are summarized by European American and African American populations. Select the populations you want to display.
3. Display Results
Once the data sets are chosen, you can query variants using the "display snp summary" button to acquire calculated values and annotations for the variants. The page "variant Summary Columns" details the quantities displayed.
4. Download Results
Both variant and coverage data can be downloaded using the download option listed at the top of the display pages. Variant data can be downloaded in either text or VCF format. Both summary coverage data and detailed locus coverage data can be downloaded. The downloaded data are compressed in either gzip or zip format.
If you import the text-formatted file into Excel, choose "Data/Get External Data/Import Text File" and select "Delimited" and "Space". Make sure to import all columns as text.
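The same import rules (space-delimited, every column kept as text) can also be applied outside a spreadsheet. The following Python sketch uses only the standard library. It assumes that comment lines in the text download begin with "##" and that the column header line begins with a single "#"; that layout, the function names `open_evs_text` and `read_evs_text`, and the column names and values in the sample are all hypothetical and should be checked against an actual download.

```python
import gzip
import io


def open_evs_text(path):
    """Open a downloaded text file, transparently handling gzip compression."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def read_evs_text(lines):
    """Parse space-delimited lines into (header, rows).

    Every value is kept as a string, which mirrors importing all columns
    as text. Assumption: "##" marks comment lines and the header line may
    carry a leading "#".
    """
    header = None
    rows = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("##"):
            continue  # skip blank and comment lines
        fields = line.split(" ")
        if header is None:
            fields[0] = fields[0].lstrip("#")
            header = fields
            continue
        # zip() pairs values with column names; extra fields are ignored
        rows.append(dict(zip(header, fields)))
    return header, rows


if __name__ == "__main__":
    # Hypothetical sample, not real EVS output.
    sample = io.StringIO(
        "## hypothetical comment line\n"
        "#position alleles gene\n"
        "1:1000 A>G GENE1\n"
        "2:2000 C>T SEPT2\n"
    )
    header, rows = read_evs_text(sample)
    print(header)
    print(rows[1]["gene"])
```

Keeping every field as a string avoids the silent conversions that motivate the "import as text" advice, such as a gene symbol being reinterpreted as a date. For a real file, pass the handle returned by `open_evs_text` to `read_evs_text`.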
How to Run Batch Query
1. Download Batch Query Program
Please download the command-line client program here.
2. Run Batch Query Program
Java 6 is required to run the command-line client program. To see how to use the program, run the following command:
java -jar YOUR_DOWNLOADED_EVS_CLIENT_JAR_FILE -h
Current EVS Release Version: v.0.0.20 (June 7, 2013)
Changes made in EVS-v.0.0.20 (June 7, 2013)
Release Version: v.0.0.19 (March 22, 2013)
Changes made in EVS-v.0.0.19 (March 22, 2013)
Release Version: v.0.0.18 (February 8, 2013)
Changes made in EVS-v.0.0.18 (February 8, 2013)
Release Version: v.0.0.17 (November 30, 2012)
Changes made in EVS-v.0.0.17 (November 30, 2012)
Release Version: v.0.0.16 (November 9, 2012)
Changes made in EVS-v.0.0.16 (November 9, 2012)
Release Version: v.0.0.15 (October 31, 2012)
Changes made in EVS-v.0.0.15 (October 31, 2012)
Release Version: v.0.0.14 (June 20, 2012)
Changes made in EVS-v.0.0.14 (June 20, 2012)
Release Version: v.0.0.13 (April 22, 2012)
Changes made in EVS-v.0.0.13 (April 22, 2012)
Release Version: v.0.0.12 (Feb. 15, 2012)
Changes made in EVS-v.0.0.12 (Feb. 15, 2012)
Release Version: v.0.0.11 (Jan. 13, 2012)
Changes made in EVS-v.0.0.11 (Jan. 13, 2012)
Release Version: v.0.0.10 (Dec. 10, 2011)
Changes made in EVS-v.0.0.10 (Dec. 10, 2011)
Release of batch-query eveClient-v.0.0.5 (Dec. 10, 2011)
The new version (v.0.0.5) of the EVS batch-mode query client program can be downloaded here.
Release Version: v.0.0.9 (Nov. 22, 2011)
ESP 5400 exome data initial release (Nov. 22, 2011)
Changes made in v.0.0.9 (Nov. 22, 2011)
New Batch-mode query client package release (Oct. 11, 2011)
A new version (v.0.0.3) of the EVS batch-mode query client program can be downloaded here. This version calls the EVS web service through http://evs.gs.washington.edu/wsEVS/EVSDataQueryService?wsdl.
Batch-mode query release (August 11, 2011)
To accommodate many requests for batch-mode queries to the EVS database, a web service has been deployed on one of our servers, and a corresponding command-line client program has been released for batch queries to the EVS server. Please download the command-line client program here.
If you have any questions regarding EVS and ESP data, please check the FAQ list first before sending an email.
FAQ
What is the difference between the current ESP6500 data release and the previous ESP5400 data release?
The following downloadable files contain SNPs, INDELs and coverage data for the ESP 6500 exomes (chromosomes 1-22, X, and Y).
ESP6500SI-V2-SSA137.snps_indels.txt.tar.gz (variants and annotations in space-delimited text format)