NHLBI Exome Sequencing Project (ESP)

Exome Variant Server

The variant callset on this site has not been updated for a long time, and the ESP data set has been incorporated into larger variant callsets such as those served by the BRAVO and gnomAD variant browsers. This site will be permanently shut down at the end of October 2023.

The goal of the NHLBI GO Exome Sequencing Project (ESP) is to discover novel genes and mechanisms contributing to heart, lung and blood disorders by pioneering the application of next-generation sequencing of the protein coding regions of the human genome across diverse, richly-phenotyped populations and to share these datasets and findings with the scientific community to extend and enrich the diagnosis, management and treatment of heart, lung and blood disorders.

The groups participating and collaborating in the NHLBI GO ESP include:

The group includes some of the largest well-phenotyped populations in the United States, representing more than 200,000 individuals altogether from the:

Target: 
examples of valid input for targets (one target per query):
Gene HUGO: ACTB
Gene ID: 60
Chr. Region: 1:1000000-1100000
Single Chr. Location: 7:5567417
rsID: rs71531321
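
The five input styles listed above can be told apart with a few regular expressions. The following is a minimal sketch, not the server's own parser: the function name `classify_target` and the exact patterns are assumptions inferred from the example inputs.

```python
import re

# Patterns are illustrative assumptions based on the example targets above.
_REGION = re.compile(r"^([0-9]{1,2}|[XY]):(\d+)-(\d+)$", re.IGNORECASE)
_LOCATION = re.compile(r"^([0-9]{1,2}|[XY]):(\d+)$", re.IGNORECASE)
_RSID = re.compile(r"^rs\d+$", re.IGNORECASE)
_GENE_ID = re.compile(r"^\d+$")
_GENE_NAME = re.compile(r"^[A-Za-z][A-Za-z0-9_.-]*$")


def classify_target(target: str) -> str:
    """Return a label for the kind of query target (hypothetical helper)."""
    text = target.strip()
    if _REGION.match(text):
        return "region"
    if _LOCATION.match(text):
        return "location"
    # rsIDs are checked before gene names because "rs123" also looks like a name.
    if _RSID.match(text):
        return "rsid"
    if _GENE_ID.match(text):
        return "gene_id"
    if _GENE_NAME.match(text):
        return "gene_name"
    raise ValueError(f"unrecognized target: {target!r}")
```

Checking the more specific patterns first keeps an rsID from being mistaken for a gene name.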

Data Usage

We request that any use of data obtained from the NHLBI GO ESP Exome Variant Server be cited in publications.

Citation

Exome Variant Server, NHLBI GO Exome Sequencing Project (ESP), Seattle, WA (URL: http://evs.gs.washington.edu/EVS/) [date (month, yr) accessed].

Acknowledgment for Publication

The authors would like to thank the NHLBI GO Exome Sequencing Project and its ongoing studies which produced and provided exome variant calls for comparison: the Lung GO Sequencing Project (HL-102923), the WHI Sequencing Project (HL-102924), the Broad GO Sequencing Project (HL-102925), the Seattle GO Sequencing Project (HL-102926) and the Heart GO Sequencing Project (HL-103010).

Public Data Release

The current EVS data release (ESP6500SI-V2) is taken from 6503 samples drawn from multiple ESP cohorts and represents all of the ESP exome variant data.

Sequences were aligned to NCBI build 37 human genome reference using BWA. PCR Duplicates were removed using Picard. Alignments were recalibrated using GATK. Lane-level indel realignments and base alignment quality (BAQ) adjustments were applied.

All data were simultaneously analyzed for exome SNP variants at the University of Michigan (by the Abecasis Laboratory). SNPs were called using a two-step approach. First, genotype likelihood files (GLFs) were generated using samtools pileup on individual BAM files. Next, we used glfMultiples, a multi-sample variant caller, to generate initial SNP calls. Details of the likelihood model implemented in glfMultiples are given in Li et al (Genome Research (2011) 21:940-951; section entitled "Identifying Potential Polymorphic Sites"). The Michigan SNP calling pipeline is available at: http://genome.sph.umich.edu/wiki/UMAKE. This pipeline makes diploid calls for pseudo-autosomal regions of male samples and haploid calls for the rest of the chromosome. Female samples have diploid calls for all regions on the X chromosome. SNPs were filtered by a machine-learning technique called support vector machine (SVM) classification (for a detailed description, see Filter Status).

Small INDEL variants were analyzed at the Broad Institute (by the Genome Sequencing and Analysis group) using the GATK variation discovery pipeline following the guidelines in the GATK best practices v4. More specifically, each BAM was reduced to create a Reduced BAM, and then INDELs were discovered by analyzing all samples simultaneously with the GATK UnifiedGenotyper, and subsequently filtered by the GATK Variant Quality Score Recalibration (VQSR) filtering model, again following the v4 best practices. The INDEL genotypes for the X and Y chromosomes were adjusted to be consistent with the sex of each sample. Female samples have diploid calls for all regions on the X chromosome. Male samples have diploid calls for the pseudo-autosomal regions on the X chromosome and haploid calls for the rest of the X chromosome and on the Y chromosome as well. However, the INDEL calls for the ESP data are preliminary and, at this point, not as robust as the SNP calls. Users are advised to keep this difference in mind when applying the ESP data to research studies.
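
The sex-chromosome rules for the INDEL calls can be written down as a small lookup. This is a sketch under stated assumptions: the pseudo-autosomal coordinates are the commonly cited GRCh37 boundaries for chromosome X rather than values taken from this site, the function names are hypothetical, and returning 0 for female samples on chromosome Y is an assumption because the text does not cover that case.

```python
# GRCh37 pseudo-autosomal regions on chromosome X (1-based, inclusive).
# Assumption: commonly cited GRCh37 boundaries, not values published by the EVS.
PAR_X_GRCH37 = [(60001, 2699520), (154931044, 155260560)]


def in_par_x(pos: int) -> bool:
    """True if a chromosome X position falls inside a pseudo-autosomal region."""
    return any(start <= pos <= end for start, end in PAR_X_GRCH37)


def expected_ploidy(sex: str, chrom: str, pos: int) -> int:
    """Number of allele copies expected in a call (hypothetical helper)."""
    if chrom not in ("X", "Y"):
        return 2  # autosomes are always diploid
    if sex == "female":
        # Diploid everywhere on X; no calls expected on Y (assumption).
        return 2 if chrom == "X" else 0
    if sex == "male":
        if chrom == "X":
            return 2 if in_par_x(pos) else 1
        return 1  # haploid on Y
    raise ValueError(f"unknown sex: {sex!r}")
```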

All SNPs and INDELs were further annotated by SeattleSeqAnnotation137, and the variant annotations at the coding-DNA and protein levels follow mostly the HGVS conventions.

A subset of this data (ESP2500) that has more stringent filtering criteria is available in the latest release of dbSNP (build 134), and published by the ESP Population Genetics and Statistical Analysis Working Group. (publication)

Terms of Service

This web site is designed to disseminate exome sequencing data generated through federal funding from the NHLBI. This data is provided free-of-charge, provided the following permission statement is followed. There may be other information on the site, such as links to other sites, references to other project groups and federal grants. The University of Washington has no responsibility for these links and information.

Permission

The contents of the NHLBI ESP Exome Variant Server web site are intended for educational or research purposes. As stated above, a subset of exome variants on this website have been deposited in dbSNP, and the full dataset will be deposited in dbGaP as part of the ESP cohort data. We place no restrictions on the use of the data available from the EVS. You may download or copy the content and other downloadable items displayed on the Exome Variant Server portion of the web site, provided that in using the data, you follow the citation format given above.

How to Use the Data Browser

The current release has been tested successfully with Firefox (v.20.0), Chrome (v.26.0), Safari (v.5.0) and IE (v.10.0 and v.9.0). To use this site, your browser must have cookies and JavaScript enabled.
The gene model is that of NCBI, June 2012. Chromosome positions are those of NCBI build 37 (UCSC hg19).
1. Search Type
There are four ways to query variations:
A. gene name (HUGO, upper or lower case)
B. gene ID (from NCBI Entrez Gene)
C. chromosomal location
D. dbSNP rs ID
When a search by gene name or gene ID is made, there are sometimes alternative transcripts. A region large enough to cover all transcripts is chosen.
2. Select Data
Two types of data, variant information and sequencing coverage information, are output under separate tabs.
The variant data are summarized by European American and African American populations. Select the populations you are interested in to display.
3. Display Results
Once the data sets are chosen, you can query variants using the "display snp summary" button to acquire calculated values and annotations for the variants. The page "Variant Summary Columns" details the quantities displayed.
4. Download Results
Both variant and coverage data can be downloaded using the download option listed at the top of the display pages. Variant data can be downloaded in either text or VCF format. Both summary coverage data and detailed locus coverage data can be downloaded. The downloaded data are compressed in either gzip or zip format.

If you import the text-formatted file into Excel, choose "Data/Get External Data/Import Text File" and select "Delimited" and "Space". Make sure to import all columns as text in Excel.
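
The same import can be done in a script by splitting each line on single spaces and keeping every value as a string. This is a minimal sketch: the function name `read_variant_rows` is hypothetical, and it assumes that header lines begin with "#" and that the last such line holds the column names, which should be checked against the downloaded file.

```python
from typing import Dict, Iterator, List, Optional, TextIO


def read_variant_rows(handle: TextIO) -> Iterator[Dict[str, str]]:
    """Yield one dict per data row, keeping every value as text."""
    header: Optional[List[str]] = None
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            # Assumption: the last "#" line before the data names the columns.
            header = line.lstrip("#").split(" ")
            continue
        if header is None:
            raise ValueError("no header line found before the first data row")
        yield dict(zip(header, line.split(" ")))
```

Keeping values as strings mirrors the advice to import all columns as text, so identifiers and allele strings are not reinterpreted as numbers or dates.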

How to Run Batch Query

1. Download Batch Query Program
Please download the command-line client program here.
2. Run Batch Query Program
Java 6 is required to run the command-line client program. You can learn how to use the command-line client program by running the following,
    java   -jar   YOUR_DOWNLOADED_EVS_CLIENT_JAR_FILE   -h

Changes made in evsClient-v.0.0.16 (April 23, 2019)
   1) Due to a data security requirement, the <soap:address> in the wsEVS WSDL document has changed (please see the updated WSDL). The evsClient.jar is also changed accordingly. Please download evsClient-v.0.0.16.


Changes made in evsClient-v.0.0.15 (May 14, 2015)
   1) Due to our server re-organization, the <soap:address> in the wsEVS WSDL document has changed (please see the updated WSDL). The evsClient.jar is also changed accordingly. Please download evsClient-v.0.0.15.


Current EVS Release Version: v.0.0.30. (Nov. 3, 2014)

Changes made in EVS-v.0.0.30 (Nov. 3, 2014)
   1) Correct a typo from "NUMBER" to "Number" in a header line of the variant VCF output files. The bulk downloadable variant VCF file is also updated.

Changes made in evsClient-v.0.0.14 (Nov. 3, 2014)
   1) Correct a typo from "NUMBER" to "Number" in a header line of the variant VCF output files. You can download the evsClient-v.0.0.14.


Current EVS Release Version: v.0.0.29. (Aug. 18, 2014)

Changes made in EVS-v.0.0.29 (Aug. 18, 2014)
   1) The ESP project was primarily based on the Hg19 (GRCh37) human genome reference. A liftover of chromosomal coordinates from GRCh37 to GRCh38 is performed in order to provide the corresponding GRCh38 chromosomal locations for the variants and the target sites. Not all sites in GRCh37 can be mapped through the liftover; the unmapped sites are listed with "-1" for their GRCh38 locations.
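
When reading the lifted-over locations, the "-1" placeholder needs to be handled before the value is used as a coordinate. The sketch below assumes that mapped locations are written as "chromosome:position"; that layout and the function name `parse_grch38_position` are assumptions, not something stated in the release note.

```python
from typing import Optional, Tuple


def parse_grch38_position(value: str) -> Optional[Tuple[str, int]]:
    """Return (chromosome, position), or None for an unmapped site."""
    text = value.strip()
    if text == "-1":
        return None  # site could not be lifted over to GRCh38
    chrom, _, pos = text.partition(":")
    if not chrom or not pos.isdigit():
        raise ValueError(f"unexpected GRCh38 location: {value!r}")
    return chrom, int(pos)
```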

Changes made in wsEVS-v.0.0.15 (Aug. 18, 2014)
   1) Add a string element, "grcH38Position" for both the "snpData" and the "siteCoverageInfo" complexTypes, to include the liftover GRCh38 chromosomal locations.

Changes made in evsClient-v.0.0.13 (Aug. 18, 2014)
   1) The variant files and the site coverage file output by the evsClient program include the liftover GRCh38 chromosomal locations. You can download the evsClient-v.0.0.13.


Current EVS Release Version: v.0.0.28. (May. 9, 2014)

Changes made in EVS-v.0.0.28 and evsClient-v.0.0.12 (May. 9, 2014)
   1) Add "~" signs in front of the dbSNP rsIDs of INDELs that SeattleSeqAnnotation137 maps only approximately to the dbSNP records with those rsIDs. An approximate mapping indicates that the chromosomal location of the INDEL in the ESP dataset does not match the location listed in dbSNP for that rsID. These mappings should be considered suggestions rather than accurate mappings.
   2) To be consistent, the bulk-downloadable files in both text and VCF formats are updated with the change described in 1) for all approximately mapped INDELs.
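
A script that consumes these files can strip the "~" prefix and keep the approximate-mapping flag separately. A minimal sketch, with the hypothetical helper name `parse_rsid`:

```python
from typing import Tuple


def parse_rsid(field: str) -> Tuple[str, bool]:
    """Split an rsID field into (rsID, is_approximate_mapping)."""
    text = field.strip()
    approximate = text.startswith("~")
    if approximate:
        text = text[1:]  # drop the "~" marker, keep the rsID itself
    return text, approximate
```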

Changes made in wsEVS-v.0.0.14 (May. 9, 2014)
   1) Add a boolean element, "approxMapped2RsId", for the "snpFunction" complexType, to identify which INDELs are approximately mapped to the dbSNP records.

Current EVS Release Version: v.0.0.27. (Apr. 18, 2014)

Changes made in EVS-v.0.0.27, wsEVS-v.0.0.13 (Apr. 18, 2014)
   1) For querying by rsID, our internal GVS database is updated with the dbSNP-138 build which contains all the SNPs from the ESP project.
   2) Add a query option of "rsOrChrom" mainly for the UCSC genome browser to link back directly to the EVS.

Current EVS Release Version: v.0.0.26. (Apr. 2, 2014)

Changes made in EVS-v.0.0.26, wsEVS-v.0.0.12 and evsClient-v.0.0.11 (Apr. 2, 2014)
   1) When querying an rsID, the mapped chromosomal start position for the rsID is assigned to one base prior to the variant event, since that is how an INDEL's chromosomal start position is listed in the EVS database.
   2) Add a little more description of MAF for multi-allelic variants in VCF files. The MAF is defined as the allele frequency in percent for all the minor alleles in the case of multi-allelic variants.
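
One reading of this MAF definition is that the counts of every allele except the most frequent one are summed and expressed as a percentage of all alleles. The sketch below implements that reading; it is an interpretation of the sentence above, not the EVS code, and the function name `maf_percent` is hypothetical.

```python
from typing import Dict


def maf_percent(allele_counts: Dict[str, int]) -> float:
    """Combined frequency, in percent, of all alleles except the most common one."""
    total = sum(allele_counts.values())
    if total == 0:
        raise ValueError("no alleles observed")
    major = max(allele_counts.values())
    return 100.0 * (total - major) / total
```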

Current EVS Release Version: v.0.0.25. (Feb. 7, 2014)

Changes made in EVS-v.0.0.25, (Feb. 7, 2014)
   1) Update some protein HGVS notations for some frameshift INDELs due to a previous bug in the code.

Current EVS Release Version: v.0.0.24. (Jan. 10, 2014)

Changes made in EVS-v.0.0.24, (Jan 10, 2014)
   1) Backend code is reworked.

Current EVS Release Version: v.0.0.23. (December 19, 2013)

Changes made in EVS-v.0.0.23, (December 19, 2013)
   1) Add a pulldown menu for population selection above variant summary table.

Current EVS Release Version: v.0.0.22. (Oct. 17, 2013)

Changes made in EVS-v.0.0.22, (Oct. 17, 2013)
   1) Add OMIM link for gene(s) above the variant table.
   2) Add a pulldown menu for mRNA transcripts. Users can select a specific transcript to display the variant annotations.
   3) Fix a bug in sorting comparison.

Current EVS Release Version: v.0.0.21. (August 30, 2013)

Changes made in EVS-v.0.0.21, (August 30, 2013)
   1) Change the variant table interface for the EVS website, including paginating the variant table, adding filtering for the variant table, and consolidating the representations for a variant involved with multiple transcripts based on function annotations, i.e., only one representative transcript is listed in the variant table if the function annotations for the variant are the same for multiple transcripts.
   2) Update rsIDs for the ESP SNPs since the ESP-6500 SNPs are now included in the recently-released dbSNP-138 build. As a result, the bulk-downloadable variants files are updated with the rsIDs in the dbSNP-138 build.


Current EVS Release Version: v.0.0.20. (June 7, 2013)

Changes made in EVS-v.0.0.20, (June 7, 2013)
   1) The annotation for the ESP data set has been updated with the SeattleSeqAnnotation137 results. The annotations are based on the HG19 human genome reference sequence and the NCBI gene model, and the annotations always refer to a change from a reference allele to an alternate allele.
   2) The genomic allele changes, the cDNA allele changes and the potential protein changes are published in the HGVS-recommended formats. Therefore, the previous columns of "Amino Acid", "Protein Pos.", and "cDNA Pos." have been replaced with the new columns of "Protein Change", "cDNA Change", and "cDNA Size". These changes also affect the corresponding columns in the text-formatted downloaded files and the corresponding attributes of the INFO field in the vcf-formatted downloaded files.
   3) Allow querying directly by an rsID, and querying a single chromosome location with an input such as 7:5567417 for position 5567417 on chromosome 7.

Changes made in the web service of EVS (wsEVS-v.0.0.10) (June 7, 2013)
   1) The data model is changed accordingly due to change 2) listed above for the annotations in HGVS-recommended formats.

Changes made in evsClient-v.0.0.10 (June 8, 2013)
   1) All changes made in EVS-v.0.0.20 listed above are implemented in evsClient-v.0.0.10.
   2) The new version (v.0.0.10) of EVS batch-mode query client program can be downloaded here.

Changes made in bulk-downloadable files
   1) The bulk-downloadable variant files are ESP6500SI-V2-SSA137.snps_indels.vcf.tar.gz and ESP6500SI-V2-SSA137.snps_indels.txt.tar.gz, which are updated with the SeattleSeqAnnotation137 results. Some columns in the text-formatted file and some attributes in the INFO field of the vcf-formatted file are changed accordingly due to change 2) listed above for the annotations in HGVS-recommended formats.


Current EVS Release Version: v.0.0.19. (March 22, 2013)

Changes made in EVS-v.0.0.19, (March 22, 2013)
   1) The INDEL calls have been updated with the latest GATK INDEL calls performed by the Genome Sequencing and Analysis group at the Broad Institute. Please contact the Genome Sequencing and Analysis group at the Broad Institute directly regarding any concerns about the INDEL calls.

Changes made in the web service of EVS (wsEVS-v.0.0.9) (March 22, 2013)
   1) The INDEL calls have been updated with the latest GATK INDEL calls performed by the Genome Sequencing and Analysis group at the Broad Institute.


Current EVS Release Version: v.0.0.18. (February 8, 2013)

Changes made in EVS-v.0.0.18, (February 8, 2013)
   1) Add two columns, the EA Estimated Age (kyrs) and the AA Estimated Age (kyrs), to the variant summary table, based on the study published in Nature 493: 216-220, 2013 by Fu W et al.

Changes made in the web service of EVS (wsEVS-v.0.0.8) (February 8, 2013)
   1) Include four new attributes, eaMutAge, eaMutAgeSd, aaMutAge, aaMutAgeSd, for the estimated mutation ages and the standard deviations in the European- and African- American populations respectively.

Changes made in evsClient-v.0.0.9 (February 8, 2013)
   1) Add two columns, the EA-EstimatedAge(kyrs) and the AA-EstimatedAge(kyrs), in the variant output.
   2) The new version (v.0.0.9) of EVS batch-mode query client program can be downloaded here.

Current EVS Release Version: v.0.0.17. (November 30, 2012)

Changes made in EVS-v.0.0.17, (November 30, 2012)
   1) Replace the JavaScript Dojo framework with the jQuery framework for the EVS user interface because Dojo doesn't work well in IE9.
   2) Change how the user query input is entered.

Current EVS Release Version: v.0.0.16. (November 9, 2012)

Changes made in EVS-v.0.0.16, (November 9, 2012)
   1) Fix the issue with the drop-down display for INDEL alleles in the variant summary table with IE browsers. It is tested and works in IE8. No other IE version is tested. The preferable browser for the EVS interface is Firefox.
   2) Change the "NA", "none" and "unknown" values to "." in the VCF output file to be consistent with the VCF specifications. The bulk-download VCF files are also updated with this change.

Changes made in evsClient-v.0.0.8 (November 9, 2012)
   1) Change the "NA", "none" and "unknown" values to "." in the VCF output file to be consistent with the VCF specifications.
   2) The new version (v.0.0.8) of EVS batch-mode query client program can be downloaded here.
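
Scripts that still hold older output can apply the same substitution themselves. A minimal sketch, where the helper name `to_vcf_value` is hypothetical and the set of labels is taken from the release note above:

```python
# Labels that the release note says were replaced by the VCF missing value ".".
MISSING_LABELS = {"NA", "none", "unknown"}


def to_vcf_value(value: str) -> str:
    """Map legacy missing-value labels to "." and leave other values unchanged."""
    return "." if value in MISSING_LABELS else value
```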

Current EVS Release Version: v.0.0.15. (October 31, 2012)

Changes made in EVS-v.0.0.15, (October 31, 2012)
   1) The INDEL calls for the ESP6500 are included. Please check the description for the INDEL calls in the "Public Data Release" section under the "Data Usage and Release" tab.
   2) The SNP calls for chromosome Y are added.
   3) The bulk download-able files under the "Downloads" tab include both the SNPs and the INDELs.

Changes made in the web service of EVS (wsEVS-v.0.0.6) (October 31, 2012)
   1) Integrate the INDEL calls with the SNP calls

Changes made in evsClient-v.0.0.7 (October 31, 2012)
   1) The signature for the output filename is changed from "EVS_SNP_download_" to "EVS_variant_download_".
   2) The new version (v.0.0.7) of EVS batch-mode query client program can be downloaded here.

Current EVS Release Version: v.0.0.14. (June 20, 2012)

Changes made in EVS-v.0.0.14, (June 20, 2012)
   1) The ESP6500 dataset comprises 2203 African-American and 4300 European-American unrelated individuals, totaling 6503 samples (13,006 chromosomes).

Current EVS Release Version: v.0.0.13. (April 22, 2012)

Changes made in EVS-v.0.0.13, (April 22, 2012)
   1) Make coverage data available under the "Coverage Results" tab even when there are no SNPs in the queried regions.

Current EVS Release Version: v.0.0.12. (Feb. 15, 2012)

Changes made in EVS-v.0.0.12, (Feb. 15, 2012)
   1) Add links for gene-level information to external resources such as NHGRI Catalog of Published Genome-Wide Association Studies (GWAS), gene pathways from Kyoto Encyclopedia of Genes and Genomes (KEGG), and Sanger Institute Catalogue Of Somatic Mutations In Cancer (COSMIC).
   2) Add GWAS hit information to variant annotations if variants are included in specific Genome-Wide Association Studies.

Changes made in the web service of EVS (wsEVS-v.0.0.5) (Feb. 15, 2012)
   1) Add "gwasPubmedIds" element to "snpData" complexType for specific GWAS hits.
   2) Add "locus" complexType to "evsData" complexType. The "locus" contains an element, "keggPathwayIds", for KEGG gene pathway IDs.

Changes made in evsClient-v.0.0.6 (Feb. 15, 2012)
   1) SNP Output files contain PubMed IDs of known GWAS variant hits and known KEGG gene pathway IDs.
   2) The new version (v.0.0.6) of EVS batch-mode query client program can be downloaded here.


Current EVS Release Version: v.0.0.11. (Jan. 13, 2012)

Changes made in EVS-v.0.0.11, (Jan. 13, 2012)
   1) Add another track for EVS SNPs in the UCSC browser view. When the track is viewed in the pack or full mode, the rsID is displayed right next to each known SNP, while the SVM-model-based filtering status is displayed for each novel SNP from the ESP project.

Release Version: v.0.0.10. (Dec. 10, 2011)

Changes made in EVS-v.0.0.10, (Dec. 10, 2011)
   1) Bulk download of all SNPs and coverage data for the ESP 5400 exomes (chromosomes 1-22, and X) is available under the "Downloads" tab on this page.
   2) The URLs with embedded ";" for Clinical Association (CA) in VCF INFO fields are modified with "%3B", the hex representation of ";", since ";" is a VCF reserved character.

Release of batch-query evsClient-v.0.0.5 (Dec. 10, 2011)

   The new version (v.0.0.5) of EVS batch-mode query client program can be downloaded here.

Changes made in evsClient-v.0.0.5
   1) The URLs with embedded ";" for Clinical Association (CA) in VCF INFO fields are modified with "%3B", the hex representation of ";".
   2) The URLs listed under the "ClinicalInfo" column in the wsEVS_SNP_download_*.txt files are separated by "|" instead of "," since comma is sometimes embedded in some URLs too.
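
The two conventions described above can be sketched as follows. The helper names and the example URLs in the usage note are hypothetical; only the ";" to "%3B" substitution and the "|" separator come from the release notes.

```python
from typing import Iterable


def escape_for_vcf_info(url: str) -> str:
    """Replace ";" with "%3B" because ";" separates entries in a VCF INFO field."""
    return url.replace(";", "%3B")


def join_clinical_urls(urls: Iterable[str]) -> str:
    """Join URLs with "|", since commas can occur inside the URLs themselves."""
    return "|".join(urls)
```

For example, `escape_for_vcf_info("http://example.org/a;b")` gives `"http://example.org/a%3Bb"`.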

Change made in the EVS web service v.0.0.4 (Nov. 29, 2011)
   1) Add an element of "evsWebServiceVersion" in "evsData" data type.
   2) Add an element of "evsDataSourceVersion" in "evsData" data type.

Release Version: v.0.0.9.   (Nov. 22, 2011)

ESP 5400 exome data initial release (Nov. 22, 2011)

Changes made in v.0.0.9, (Nov. 22, 2011)
   1) SNPs from ESP 5400 exome data.
   2) Genotype counts are included in SNP summary tables
   3) Illumina HumanExome Chip status for each SNP is included in the SNP summary table.
   4) About 90% of missense SNPs are annotated with PolyPhen.
   5) Add an "All site coverages" button at the top of the coverage stats to make it easier for users to download the detailed coverage data for every site in the region (or gene) they are interested in.
   6) Add sorting by mRNA accession for SNP summary table.
   7) For batch-query users, a new version (v.0.0.4) of EVS batch-mode query client program can be downloaded here.

New Batch-mode query client package release (Oct. 11, 2011)

    A new version (v.0.0.3) of EVS batch-mode query client program can be downloaded here. This version calls the EVS web service through http://evs.gs.washington.edu/wsEVS/EVSDataQueryService?wsdl.

Changes made in v.0.0.8, (Sept. 30, 2011)
   1) Update the dbSNP rs_ids in the EVS database with the dbSNP version 131. Previously, there was only a subset of dbSNP rs_ids from dbSNP 131 in our EVS database.
   2) The average sample sequence read depth can now be viewed graphically through the UCSC genome browser.

Changes made in v.0.0.7, (Sept. 21, 2011)
   1) Change the allele count format for the "EuropeanAmerican Allele #", the "AfricanAmerican Allele #" and the "All Allele #" columns from a list of allele counts delimited by "/", e.g., 1/2647, to a list of allele=count pairs delimited by "/", e.g., C=1/T=2647. This change will affect the display on the website, the text-format file-downloading and the output text-format files from EVS batch query.
   2) A new version (v.0.0.2) of EVS batch-mode query client program can be downloaded here. This version incorporates the change of the allele count format described in 1) above.
   3) For batch-mode query users, if you wish to be notified of our scheduled outages, please visit this site:

      https://mailman1.u.washington.edu/mailman/listinfo/gvsnotify

      This is a moderated site, so approval of subscriptions will only be made on weekdays. The mailing list is set up one-way: no posting by subscribers.
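
The allele=count format introduced in v.0.0.7 (for example, C=1/T=2647) can be parsed along these lines. This is a sketch only: the function name `parse_allele_counts` is hypothetical, and it assumes allele labels never contain "/" or "=".

```python
from typing import Dict


def parse_allele_counts(field: str) -> Dict[str, int]:
    """Turn "C=1/T=2647" into {"C": 1, "T": 2647}."""
    counts: Dict[str, int] = {}
    for part in field.strip().split("/"):
        allele, sep, count = part.partition("=")
        if not sep or not allele or not count.isdigit():
            raise ValueError(f"unexpected allele count entry: {part!r}")
        counts[allele] = int(count)
    return counts
```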

Changes made in v.0.0.6, (Sept. 9, 2011)
   1) Add sorting ability in SNP summary page. The SNPs can be sorted by chromosome position, minor allele count, GVS function annotation, or conservation scores.

Batch-mode query release (August 11, 2011)

To accommodate many requests for batch-mode query to the EVS database, a web service is deployed on one of our servers, and a corresponding command-line client program is released for batch query to the EVS server. Please download the command-line client program here.

Java 6 is required to run the command-line client program. You can learn how to use the command-line client program by running the following,
    java   -jar   YOUR_DOWNLOADED_EVS_CLIENT_JAR_FILE  -h

Changes made in v.0.0.5, (August 11, 2011)
   1) Fix a bug in outputting summary coverage blocks: the very last block was missing, and the sequencing span reported for each block should be 1 bp longer.

Changes made in v.0.0.4, (July 29, 2011)
   1) Release coverage data.
   2) Add downloading options for variation output and coverage output.

If you have any questions regarding EVS and ESP data, please check the FAQ list first before sending an email.

FAQ

What is the difference between the current ESP6500 data release and the previous ESP5400 data release?
The current and final data release (ESP6500) of the ESP exome sequencing project includes the majority of the exomes in the previous ESP5400 release and an additional ~1100 new exomes. When new exomes were added, we examined the kinship among the samples and removed the samples showing first-degree to third-degree relatedness. As a result, a limited number of exomes in the ESP5400 release were excluded from the ESP6500 release. The final set of exomes is from unrelated samples only. The SNP calls were re-generated by analyzing all the unrelated exomes simultaneously.

Is the ESP5400 data release still accessible?
There is no plan to host two different versions of the EVS web interface for both the ESP6500 and the ESP5400 because it can cause confusion for users. If the ESP5400 dataset is preferred by your ongoing projects for any reason, you still can get the bulk download of the ESP5400 release at the following URLs,
ESP5400.snps.txt.tar.gz
ESP5400.snps.vcf.tar.gz
ESP5400.coverage.all_sites.txt.tar.gz
ESP5400.coverage.seq_blocks.txt.tar.gz

How do I obtain a user account to access ESP data?
The NHLBI ESP Exome Variant Server (EVS) is publicly available and no account is needed. Simply click on the "Data Browser" tab to begin your search. You can also download the allele counts and frequencies for the entire dataset by selecting the "Downloads" tab.

What data is currently being served on the EVS?
The current dataset comprises 2203 African-American and 4300 European-American unrelated individuals, totaling 6503 samples (13,006 chromosomes).

Are the samples in the ESP6500 clinical patients or healthy controls?
The samples included in the ESP6500 were selected from the populations listed on the "Home" tab. Information about these populations can be found through dbGaP. In general, ESP samples were selected to contain controls, the extremes of specific traits (LDL and blood pressure), and specific diseases (early onset myocardial infarction and early onset stroke), and lung diseases. Cohort or phenotype information about any particular individual CAN NOT BE RELEASED. The goal of the ESP dataset is to release the frequency counts of specific variants without regard to phenotype.

How can I obtain the phenotype and genotype data associated with the data on the EVS?
Sample level data including genotypes and phenotypes are not available on the EVS and may only be obtained from dbGaP. All of the phenotype and genotype data are moving into dbGaP. The ESP samples and genotypes listed here are available in dbGaP.

Is there a centralized mechanism through which I can apply for access to all of the data included in the ESP6500?
Due to the organization of dbGaP, data are grouped by cohort. Therefore, at the present there is no centralized way to access phenotype data for the entire data set. The ESP is working with dbGaP and other groups to consider more global release formats in the future.

Have any of the ESP variants been validated by Sanger sequencing?
Large scale validation of the variants was not performed. However, sequencing validation of a small number of singleton (~200) and high frequency SNP calls (~800) was performed and reported in Tennessen et al., published in Science online May 21, 2012 (PMID: 22604720). None of the INDEL calls was validated. In general, the INDEL calls are less robust than the SNP calls and have a higher false positive rate. When applying the ESP data to research studies, users are advised to keep this difference in mind.

What capture technology was used during exome sequencing?
Samples were sequenced at two centers - the University of Washington and the Broad Institute. At the University of Washington exome capture was performed using Roche/Nimblegen capture, and at the Broad using Agilent reagents.

What calling algorithm was used to generate the ESP6500 data?
All SNP data were called simultaneously at the University of Michigan (Abecasis Laboratory). The Michigan SNP calling pipeline is available here.
All INDEL data were analyzed at the Broad Institute (by the Genome Sequencing and Analysis group) using the GATK variation discovery pipeline following the guidelines in the GATK best practices v4.

Which version of the dbSNP build contains the variants from the NHLBI ESP project?
The dbSNP build-134 contains a subset of the SNPs from the NHLBI-ESP project. The complete set of the SNP calls from the NHLBI ESP project is included in the dbSNP build-138. Some of the earlier submission entries in the build 134 are updated or removed in the build 138. The INDEL calls from the NHLBI ESP project are experimental, and are not submitted to the dbSNP yet.

Contact us:

evsserver@uw.edu

The following downloadable files contain SNPs, INDELs and coverage data for the ESP 6500 exomes (chromosomes 1-22, X, and Y).

The bulk files of the ESP 6500 exome data below are still primarily in GRCh37 (or HG19); the GRCh38 lifted-over positions are added in an extra column in the text file, or in an extra attribute in the INFO field in the VCF file.
ESP6500SI-V2-SSA137.GRCh38-liftover.snps_indels.txt.tar.gz (variants and annotations in space-delimited text format)
ESP6500SI-V2-SSA137.GRCh38-liftover.snps_indels.vcf.tar.gz (variants and annotations in VCF format - same data as the *.snps_indels.txt.tar.gz listed above)
ESP6500SI-V2.GRCh38-liftover.coverage.all_sites.txt.tar.gz (sequencing coverage data for every site in our targets)
ESP6500SI-V2.coverage.seq_blocks.txt.tar.gz (sequencing coverage summary data for every sequencing block in our targets)

agilent_nimblegen_exome_targets_esp_project.tar.gz (the original exome target files used in the ESP project)