NHLBI Exome Sequencing Project (ESP)

Exome Variant Server

The goal of the NHLBI GO Exome Sequencing Project (ESP) is to discover novel genes and mechanisms contributing to heart, lung and blood disorders by pioneering the application of next-generation sequencing of the protein coding regions of the human genome across diverse, richly-phenotyped populations and to share these datasets and findings with the scientific community to extend and enrich the diagnosis, management and treatment of heart, lung and blood disorders.

The groups participating and collaborating in the NHLBI GO ESP include:

The group includes some of the largest well-phenotyped populations in the United States, representing more than 200,000 individuals altogether from the:


Data Usage

We request that any use of data obtained from the NHLBI ESP Exome Variant Server be cited in publications.

Citation

Exome Variant Server, NHLBI Exome Sequencing Project (ESP), Seattle, WA (URL: http://evs.gs.washington.edu/EVS/) [date (month, yr) accessed].

Acknowledgment for Publication

The authors would like to thank the NHLBI GO Exome Sequencing Project and its ongoing studies which produced and provided exome variant calls for comparison: the Lung GO Sequencing Project (HL-102923), the WHI Sequencing Project (HL-102924), the Broad GO Sequencing Project (HL-102925), the Seattle GO Sequencing Project (HL-102926) and the Heart GO Sequencing Project (HL-103010).

Public Data Release

The current EVS data release (ESP5400) is taken from 5379 samples drawn from multiple ESP cohorts and represents the first data freeze of the ESP exome variant data. All data were analyzed simultaneously for exome variants at the University of Michigan (Abecasis Laboratory).

Sequences were aligned to the NCBI build 37 human genome reference using BWA. PCR duplicates were removed using Picard, and alignments were recalibrated using GATK. Lane-level indel realignment and base alignment quality (BAQ) adjustment were applied. SNPs were called using a two-step approach. First, genotype likelihood files (GLFs) were generated using samtools pileup on individual BAM files. Next, we used glfMultiples, a multi-sample variant caller, to generate initial SNP calls. Details of the likelihood model implemented in glfMultiples are given in Li et al. (Genome Research (2011) 21:940-951; section entitled “Identifying Potential Polymorphic Sites”). The Michigan SNP calling pipeline is available at: http://genome.sph.umich.edu/wiki/UMAKE. On chrX, this pipeline makes diploid calls for male samples in the pseudo-autosomal regions and haploid calls for the rest of the chromosome; female samples receive diploid calls across all chrX regions. SNPs were filtered by support vector machine (SVM) classification, a machine-learning technique (for a detailed description, see Filter Status). All SNPs were further annotated by SeattleSeqAnnotation134.

A subset of these data (ESP2500) with more stringent filtering criteria is available in the latest release of dbSNP (build 134). Analysis and publication of ESP2500 are pending with the ESP Population Genetics and Statistical Analysis Working Group. Future data releases will incorporate up to ~7,000 exomes from the ESP.
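The chrX ploidy rule described above can be sketched as a small lookup. This is a minimal illustration, not code from the UMAKE pipeline: `in_par` and `expected_chrx_ploidy` are hypothetical names, and the GRCh37 pseudo-autosomal boundaries used here (PAR1 chrX:60,001-2,699,520; PAR2 chrX:154,931,044-155,260,560) are assumptions taken from the GRCh37 assembly, not from this page.

```python
# Assumed GRCh37 (hg19) pseudo-autosomal region (PAR) boundaries on chrX,
# 1-based and inclusive. These are NOT stated on this page; check them
# against the GRCh37 assembly documentation before relying on them.
PAR_REGIONS_GRCH37 = [
    (60_001, 2_699_520),         # PAR1
    (154_931_044, 155_260_560),  # PAR2
]


def in_par(pos: int) -> bool:
    """Return True if a 1-based chrX position lies in an assumed PAR."""
    return any(start <= pos <= end for start, end in PAR_REGIONS_GRCH37)


def expected_chrx_ploidy(sex: str, pos: int) -> int:
    """Ploidy the chrX calling rule would assign (hypothetical helper).

    Females are diploid across chrX; males are diploid inside the PARs
    and haploid elsewhere on chrX.
    """
    if sex == "female":
        return 2
    if sex == "male":
        return 2 if in_par(pos) else 1
    raise ValueError(f"unknown sex: {sex!r}")


print(expected_chrx_ploidy("male", 1_000_000))     # inside PAR1 -> 2
print(expected_chrx_ploidy("male", 50_000_000))    # outside the PARs -> 1
print(expected_chrx_ploidy("female", 50_000_000))  # -> 2
```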

Terms of Service

This web site is designed to disseminate exome sequencing data generated through federal funding from the NHLBI. These data are provided free of charge, provided that the permission statement below is followed. There may be other information on the site, such as links to other sites and references to other project groups and federal grants. The University of Washington has no responsibility for these links and information.

Permission

The contents of the NHLBI ESP Exome Variant Server web site are intended for educational or research purposes. As stated above, a subset of the exome variants on this website has been deposited in dbSNP, and the full dataset will be deposited in dbGaP as part of the ESP cohort data. We place no restrictions on the use of the data available from the EVS. You may download or copy the content and other downloadable items displayed on the Exome Variant Server portion of the web site, provided that in using the data, you follow the citation format given above.

How to Use the Data Browser

The current release has been tested successfully with Firefox v.3.0 and IE v.7.0. To use this site, your browser must have cookies and JavaScript enabled.
The gene model is that of NCBI, 2011. Chromosome positions are those of NCBI build 37 (UCSC hg19).
1. Search Type
There are three ways to query variations:
A. gene name (HUGO, upper or lower case)
B. gene ID (from NCBI Entrez Gene)
C. chromosomal location
For A and B, you have the option to extend the chromosome region. The "upstream" option extends the region at the 5' end of the gene, and "downstream" extends it at the 3' end.
When a gene searched by name or ID has alternative transcripts, the region chosen is large enough to cover all of them.
2. Select Data
Two types of data, SNP information and sequencing coverage information, are output under separate tabs.
The SNP data are summarized for the European American and African American populations. Select the populations you want to display.
3. Display Results
Once the data sets are chosen, you can query SNPs using the "display snp summary" button to acquire calculated values and annotations for the SNPs. The page "SNP Summary Columns" details the quantities displayed.
4. Download Results
Both SNP and coverage data can be downloaded using the download options listed at the top of the results pages. SNP data can be downloaded in either text or VCF format. Both summary coverage data and detailed locus coverage data can be downloaded. The downloaded data are compressed in either gzip or zip format.
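The region-extension rules in step 1 can be sketched as follows. This is a hypothetical helper, not the EVS implementation: the `Transcript` record and `query_region` function are illustrative names, coordinates are assumed to be 1-based and inclusive, and the minus-strand handling follows only from "upstream" meaning the 5' end of the gene.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Transcript:
    """Hypothetical transcript record: 1-based inclusive genomic span and strand."""
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def query_region(transcripts: Iterable[Transcript], upstream: int = 0,
                 downstream: int = 0) -> Tuple[str, int, int]:
    """Return (chrom, begin, end) covering all transcripts plus the requested flanks."""
    txs = list(transcripts)
    if not txs:
        raise ValueError("no transcripts given")
    chroms = {t.chrom for t in txs}
    strands = {t.strand for t in txs}
    if len(chroms) != 1 or len(strands) != 1:
        raise ValueError("transcripts must share one chromosome and strand")
    chrom, strand = chroms.pop(), strands.pop()
    # Cover every alternative transcript: smallest start to largest end.
    begin = min(t.start for t in txs)
    end = max(t.end for t in txs)
    if strand == "+":
        begin -= upstream    # plus strand: 5' end is the lower coordinate
        end += downstream
    else:
        begin -= downstream  # minus strand: 3' end is the lower coordinate
        end += upstream
    return chrom, max(begin, 1), end  # clamp at the chromosome start


txs = [Transcript("17", 1000, 5000, "-"), Transcript("17", 1200, 6000, "-")]
print(query_region(txs, upstream=500, downstream=200))  # ('17', 800, 6500)
```

On the minus strand the flanks swap genomic sides, which is why the same "upstream" value moves the end coordinate rather than the begin coordinate.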

If you import the text-formatted file into Excel, choose "Data/Get External Data/Import Text File" and select "Delimited" and "Space". Make sure to import all columns as text.
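Outside Excel, the same advice applies: keep every field as text and convert columns explicitly. A minimal Python sketch, assuming each record sits on one line with fields separated by one or more spaces and no empty fields; `read_space_delimited` is a hypothetical name, and no column names or header layout are assumed.

```python
from typing import Iterable, List


def read_space_delimited(lines: Iterable[str]) -> List[List[str]]:
    """Split each non-blank line on runs of whitespace, keeping every field a string.

    Nothing is converted to a number, so values such as "C=1/T=2647"
    are preserved exactly as written.
    """
    rows = []
    for line in lines:
        fields = line.split()  # collapses runs of spaces; assumes no empty fields
        if fields:             # skip blank lines
            rows.append(fields)
    return rows


# Usage with a downloaded, decompressed file (hypothetical path):
# with open("snps.txt", encoding="utf-8") as fh:
#     rows = read_space_delimited(fh)
```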

How to Run Batch Query

1. Download Batch Query Program
Please download the command-line client program here.
2. Run Batch Query Program
Java 6 is required to run the command-line client program. You can learn how to use the command-line client program by running the following:
    java -jar YOUR_DOWNLOADED_EVS_CLIENT_JAR_FILE -h

Current EVS Release Version: v.0.0.13. (April 22, 2012)

Changes made in EVS-v.0.0.13, (April 22, 2012)
   1) Make coverage data available under the "Coverage Results" tab even when no SNPs exist in the queried region.

Release Version: v.0.0.12. (Feb. 15, 2012)

Changes made in EVS-v.0.0.12, (Feb. 15, 2012)
   1) Add links for gene-level information to external resources such as NHGRI Catalog of Published Genome-Wide Association Studies (GWAS), gene pathways from Kyoto Encyclopedia of Genes and Genomes (KEGG), and Sanger Institute Catalogue Of Somatic Mutations In Cancer (COSMIC).
   2) Add GWAS hit information to variant annotations if variants are included in specific Genome-Wide Association Studies.

Changes made in the web service of EVS (wsEVS-v.0.0.5) (Feb. 15, 2012)
   1) Add "gwasPubmedIds" element to "snpData" complexType for specific GWAS hits.
   2) Add "locus" complexType to "evsData" complexType. The "locus" contains an element, "keggPathwayIds", for KEGG gene pathway IDs.

Changes made in evsClient-v.0.0.6 (Feb. 15, 2012)
   1) SNP output files contain PubMed IDs of known GWAS variant hits and known KEGG gene pathway IDs.
   2) The new version (v.0.0.6) of EVS batch-mode query client program can be downloaded here.


Release Version: v.0.0.11. (Jan. 13, 2012)

Changes made in EVS-v.0.0.11, (Jan. 13, 2012)
   1) Add another track for EVS SNPs in the UCSC browser view. When the track is viewed in pack or full mode, the rsID is displayed next to each known SNP, and the SVM-model-based filtering status is displayed for each novel SNP from the ESP project.

Release Version: v.0.0.10. (Dec. 10, 2011)

Changes made in EVS-v.0.0.10, (Dec. 10, 2011)
   1) Bulk download of all SNPs and coverage data for the ESP 5400 exomes (chromosomes 1-22, and X) is available under the "Downloads" tab on this page.
   2) Embedded ";" characters in Clinical Association (CA) URLs in VCF INFO fields are encoded as "%3B", the hex (percent) encoding of ";", because ";" is a reserved character in VCF.
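Because ";" separates INFO entries, a reader splits on ";" first and only then restores the escaped characters inside each value. A minimal sketch of that order of operations; `parse_info` is a hypothetical helper, the example key "CA" is illustrative, and reversing only "%3B" assumes the original URLs contained no literal "%3B".

```python
from typing import Dict


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO column into key/value pairs, then restore escaped ";"."""
    if info in ("", "."):  # "." marks a missing INFO column in VCF
        return {}
    out = {}
    for entry in info.split(";"):           # safe: values carry "%3B", not ";"
        key, sep, value = entry.partition("=")
        # Flag entries have no "="; store them with an empty value.
        out[key] = value.replace("%3B", ";") if sep else ""
    return out


print(parse_info("DP=20;CA=http://example.org/q?a=1%3Bb=2;DB"))
```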

Release of batch-query evsClient-v.0.0.5 (Dec. 10, 2011)

   The new version (v.0.0.5) of EVS batch-mode query client program can be downloaded here.

Changes made in evsClient-v.0.0.5
   1) Embedded ";" characters in Clinical Association (CA) URLs in VCF INFO fields are encoded as "%3B", the hex (percent) encoding of ";".
   2) The URLs listed under the "ClinicalInfo" column in the wsEVS_SNP_download_*.txt files are separated by "|" instead of ",", because some URLs themselves contain commas.
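Splitting the "ClinicalInfo" column on "|" keeps commas inside URLs intact, which a comma split would not. A minimal sketch; `split_clinical_info` is a hypothetical name, and how an empty or missing ClinicalInfo value is written in the files is not stated here, so the helper simply drops empty pieces.

```python
from typing import List


def split_clinical_info(field: str) -> List[str]:
    """Split a ClinicalInfo value (evsClient v.0.0.5 and later) into its URLs."""
    # "|" is the separator; commas may legitimately appear inside a URL.
    return [url for url in field.split("|") if url]


print(split_clinical_info("http://a.example/x?ids=1,2|http://b.example/"))
```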

Changes made in the EVS web service v.0.0.4 (Nov. 29, 2011)
   1) Add an "evsWebServiceVersion" element to the "evsData" data type.
   2) Add an "evsDataSourceVersion" element to the "evsData" data type.

Release Version: v.0.0.9.   (Nov. 22, 2011)

ESP 5400 exome data initial release (Nov. 22, 2011)

Changes made in v.0.0.9, (Nov. 22, 2011)
   1) SNPs from ESP 5400 exome data.
   2) Genotype counts are included in SNP summary tables.
   3) The Illumina HumanExome Chip status for each SNP is included in the SNP summary table.
   4) About 90% of missense SNPs are annotated with PolyPhen.
   5) Add an "All site coverages" button at the top of the coverage stats to make it easier for users to download the detailed coverage data for every site in the region (or gene) they are interested in.
   6) Add sorting by mRNA accession for SNP summary table.
   7) For batch-query users, a new version (v.0.0.4) of EVS batch-mode query client program can be downloaded here.

New Batch-mode query client package release (Oct. 11, 2011)

    A new version (v.0.0.3) of EVS batch-mode query client program can be downloaded here. This version calls the EVS web service through http://evs.gs.washington.edu/wsEVS/EVSDataQueryService?wsdl.

Changes made in v.0.0.8, (Sept. 30, 2011)
   1) Update the dbSNP rs_ids in the EVS database with the dbSNP version 131. Previously, there was only a subset of dbSNP rs_ids from dbSNP 131 in our EVS database.
   2) The average sample sequence read depth can now be viewed graphically through the UCSC genome browser.

Changes made in v.0.0.7, (Sept. 21, 2011)
   1) Change the allele count format for the "EuropeanAmerican Allele #", the "AfricanAmerican Allele #" and the "All Allele #" columns from a list of allele counts delimited by "/", e.g., 1/2647, to a list of allele=count pairs delimited by "/", e.g., C=1/T=2647. This change will affect the display on the website, the text-format file-downloading and the output text-format files from EVS batch query.
   2) A new version (v.0.0.2) of EVS batch-mode query client program can be downloaded here. This version incorporates the change of the allele count format described in 1) above.
   3) For batch-mode query users, if you wish to be notified of our scheduled outages, please visit this site:

      https://mailman1.u.washington.edu/mailman/listinfo/gvsnotify

      This is a moderated site, so approval of subscriptions will only be made on weekdays. The mailing list is set up one-way: no posting by subscribers.
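The allele=count format introduced in v.0.0.7 can be parsed directly, and because each count carries its allele, frequencies no longer depend on column order as they did with the older "1/2647" form. A minimal sketch; `parse_allele_counts` and `minor_allele_frequency` are hypothetical helpers, and "minor" here simply means the least common allele listed.

```python
from typing import Dict


def parse_allele_counts(field: str) -> Dict[str, int]:
    """Parse "C=1/T=2647" into {"C": 1, "T": 2647}."""
    counts = {}
    for pair in field.split("/"):
        allele, sep, count = pair.partition("=")
        if not sep:
            raise ValueError(f"expected allele=count, got {pair!r}")
        counts[allele] = int(count)
    return counts


def minor_allele_frequency(counts: Dict[str, int]) -> float:
    """Frequency of the least common allele among those listed."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no alleles counted")
    return min(counts.values()) / total


counts = parse_allele_counts("C=1/T=2647")
print(counts)                          # {'C': 1, 'T': 2647}
print(minor_allele_frequency(counts))  # 1/2648
```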

Changes made in v.0.0.6, (Sept. 9, 2011)
   1) Add sorting ability in SNP summary page. The SNPs can be sorted by chromosome position, minor allele count, GVS function annotation, or conservation scores.

Batch-mode query release (August 11, 2011)

To accommodate many requests for batch-mode query to the EVS database, a web service is deployed on one of our servers, and a corresponding command-line client program is released for batch query to the EVS server. Please download the command-line client program here.

Java 6 is required to run the command-line client program. You can learn how to use the command-line client program by running the following:
    java -jar YOUR_DOWNLOADED_EVS_CLIENT_JAR_FILE -h

Changes made in v.0.0.5, (August 11, 2011)
   1) Fix a bug in the summary coverage block output that omitted the very last block and reported each block's sequencing span 1 bp too short.
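Both symptoms in this fix are classic run-grouping mistakes: forgetting to flush the final open block, and computing a span as end - begin when both ends are inclusive. The sketch below avoids both. It assumes, as a hypothesis not stated on this page, that a block is a run of consecutive covered positions given as sorted, unique 1-based coordinates; `coverage_blocks` is a hypothetical name.

```python
from typing import Iterable, List, Tuple


def coverage_blocks(positions: Iterable[int]) -> List[Tuple[int, int, int]]:
    """Group sorted, unique 1-based positions into (begin, end, span) blocks."""
    blocks = []
    begin = prev = None
    for pos in positions:
        if prev is not None and pos == prev + 1:
            prev = pos  # still inside the current run
            continue
        if begin is not None:
            # Inclusive coordinates: span is end - begin + 1.
            blocks.append((begin, prev, prev - begin + 1))
        begin = prev = pos
    # Flush the final open block; omitting this step drops the last block.
    if begin is not None:
        blocks.append((begin, prev, prev - begin + 1))
    return blocks


print(coverage_blocks([10, 11, 12, 20, 21]))  # [(10, 12, 3), (20, 21, 2)]
```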

Changes made in v.0.0.4, (July 29, 2011)
   1) Release coverage data.
   2) Add download options for variation output and coverage output.

Contact us:

evsserver@uw.edu

The following downloadable files contain SNPs and coverage data for the ESP 5400 exomes (chromosomes 1-22, and X).

ESP5400.snps.txt.tar.gz (SNPs and annotations in space-delimited text format)
ESP5400.snps.vcf.tar.gz (SNPs and annotations in VCF format; same data as ESP5400.snps.txt.tar.gz)
ESP5400.coverage.all_sites.txt.tar.gz (sequencing coverage data for every site in our targets)
ESP5400.coverage.seq_blocks.txt.tar.gz (sequencing coverage summary data for every sequencing block in our targets)