Column Description for Variant Summary Table
Variant Pos: The SNV location on the chromosome (NCBI 37 or hg19) is 1-based. The INDEL location is also 1-based, but it is reported as one base before the actual insertion/deletion event.
rs ID: dbSNP reference SNP identifier (if available).
Alleles: The alleles are listed in HGVS variant notation; for the ESP project, this always refers to a change from a reference allele to an alternate allele. For INDELs, the alleles are listed with aliases such as A1, A2, or An, referring to the N-th alternate allele, while R refers to the reference allele.
EA Allele Count: The observed allele counts for the listed alleles in the European American population (delimited by /). For INDELs, the alleles are listed with aliases such as A1, A2, or An, referring to the N-th alternate allele, while R refers to the reference allele.
AA Allele Count: The observed allele counts for the listed alleles in the African American population (delimited by /). For INDELs, the alleles are listed with aliases such as A1, A2, or An, referring to the N-th alternate allele, while R refers to the reference allele.
Allele Count: The observed allele counts for the listed alleles in all populations (delimited by /). For INDELs, the alleles are listed with aliases such as A1, A2, or An, referring to the N-th alternate allele, while R refers to the reference allele.
MAF (%) (EA/AA/All): The minor-allele frequency in percent, listed in the order European American (EA), African American (AA), and all populations (All) (delimited by /). For multi-allelic variants, the MAF is defined as the allele frequency in percent for all the minor alleles.
EA Genotype Count: The observed genotype counts for the listed genotypes in the European American population (delimited by /). For INDELs, the alleles are listed with aliases such as A1, A2, or An, referring to the N-th alternate allele, while R refers to the reference allele.
AA Genotype Count: The observed genotype counts for the listed genotypes in the African American population (delimited by /). For INDELs, the alleles are listed with aliases such as A1, A2, or An, referring to the N-th alternate allele, while R refers to the reference allele.
Genotype Count: The observed genotype counts for the listed genotypes in all populations (delimited by /). For INDELs, the alleles are listed with aliases such as A1, A2, or An, referring to the N-th alternate allele, while R refers to the reference allele.
Avg. Sample Read Depth: The average sample read depth.
Genes: One or more genes for which the SNP is in the coding region, based on NCBI Gene.
mRNA Accession #: NCBI mRNA transcript accession number.
GVS Function: The GVS functions are calculated locally and stored in our local database; they are based on the alleles for all populations and individuals; the bases in the coding region are divided into codons (if a multiple of 3), and the resulting amino acids are examined:
cDNA Change: The variant represented in HGVS notation at the coding DNA level for a transcript.
cDNA Size: The size of the coding DNA for a transcript.
Protein Change: A protein change represented in HGVS notation, translated based on the specific transcript listed in the "mRNA Accession" column.
NCBI 37 Allele: The allele of the NCBI human reference sequence (also hg19).
Chimp Allele: Chimp alleles are acquired from UCSC human/chimp alignment files. If the variation does not fall within an alignment block, or if it is an indel, the chimp allele is listed as "unknown". If the variation falls within a gap in the alignment, it is listed as "-".
Conservation (phastCons): A number between 0 and 1 that describes the degree of sequence conservation among 17 vertebrate species; these numbers are downloaded from the UCSC Genome site and are defined as the "posterior probability that the corresponding alignment column was generated by the conserved state of the phylo-HMM, given the model parameters and the multiple alignment" (see UCSC description).
Conservation (GERP): The Genomic Evolutionary Rate Profiling (GERP) score was obtained from the GERP website in September of 2011. It ranges from -12.3 to 6.17, with 6.17 being the most conserved. The detailed description can be found in this publication.
Grantham Score: Grantham scores categorize codon replacements into classes of increasing chemical dissimilarity, based on Grantham R. (1974), Amino acid difference formula to help explain protein evolution. Science 185:862-864.
PolyPhen2 (Class:Score): Prediction of the possible impact of an amino acid substitution on protein structure and function, based on the Polymorphism Phenotyping (PolyPhen2) program. It lists both the PolyPhen2 prediction class and the PolyPhen2 score, separated by a ":".
Clinical Link: The potential clinical implications associated with a SNP (limited).
On Exome Chip: Whether a SNP is on the Illumina HumanExome chip.
Filter Status: A machine-learning technique called support vector machine (SVM) classification was applied for SNP variant filtering. After the initial SNP calls were generated, we re-examined the BAM files to collect additional information about each variant site. Based on this information, variants were initially filtered by individual thresholds. For example, variants were marked as problematic SNPs if they had a posterior probability <99% (glfMultiples SNP quality <20), were <5 bp away from an indel detected in the 1000 Genomes Pilot Project, had a total depth across samples of <5,379 or >5,379,000 reads (~1-1000 reads per sample), had >65% of reads in heterozygotes carrying the variant allele, or had an absolute squared correlation between allele (variant or reference) and strand (forward or reverse) of >0.15. Sites that failed 3 or more criteria were used as negative examples to train the SVM classifier. HapMap3 and OMNI polymorphic sites were used as positive examples. The SVM classifier produces a score for each site, and we marked ~8.5% of sites at threshold 0.3 as SVM filter-failed. The unfiltered set had Ti/Tv = 2.63, and the filtered set had Ti/Tv = 2.78.
GWAS Hits: Links to known PubMed records of GWAS studies associated with a SNP, based on the NHGRI gwascatalog.txt.
EA Est. Age (kyrs) and AA Est. Age (kyrs): The estimated variant age in the European American and African American populations, in thousands of years, from Fu W, O'Connor TD, Jun G, Kang HM, Abecasis G, et al. Analysis of 6,515 exomes reveals a very recent origin of most human protein-coding variants. Nature 493: 216-220, 2013.
GRCh38 Position: A GRCh38 chromosomal position lifted over from the variant's original GRCh37 chromosomal position.
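The MAF definition in the variant summary columns can be illustrated with a small sketch. It assumes that each item in a "/"-delimited allele count field looks like `allele=count` (for example `T=12/C=8588`), and it reads "all the minor alleles" as every allele other than the most frequent one; both points are assumptions for illustration, and the function names are hypothetical.

```python
def parse_allele_counts(field):
    """Parse a '/'-delimited allele count field such as 'T=12/C=8588'.

    The 'allele=count' layout of each item is an assumption for illustration.
    """
    counts = {}
    for item in field.split("/"):
        allele, count = item.split("=")
        counts[allele.strip()] = int(count)
    return counts


def maf_percent(counts):
    """Return the combined frequency, in percent, of all non-major alleles."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no observed alleles")
    major = max(counts.values())  # count of the most frequent allele
    return 100.0 * (total - major) / total


# Bi-allelic SNV, then a multi-allelic INDEL using the A1/A2/R aliases.
print(maf_percent(parse_allele_counts("T=12/C=8588")))
print(maf_percent(parse_allele_counts("A1=3/A2=2/R=95")))
```

For the multi-allelic case, the counts of A1 and A2 are summed before dividing by the total, which matches the stated definition under the assumption above.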
Description of Sequence Coverage
Several important variables are used when identifying sequence variants from second-generation sequencing data. Most importantly, a minimum read depth (coverage) is needed at each genomic position for a given individual. For one individual, a position is considered covered if its read depth is 8 or higher. For a read (of about 76 bp) to be counted, it must have a mapping quality of 20 or higher. For a single read base-call to be counted, the sequence base quality must be 20 or higher.
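As a minimal sketch of these criteria, the function below decides whether one position is covered for one individual. The input layout (a list of mapping-quality and base-quality pairs, one per overlapping read) and the function names are hypothetical; only the three thresholds come from the text.

```python
MIN_DEPTH = 8   # reads needed for a position to count as covered
MIN_MAPQ = 20   # minimum mapping quality for a read to be counted
MIN_BASEQ = 20  # minimum base quality for a base call to be counted


def effective_depth(observations):
    """Count the base calls that pass both quality filters.

    `observations` is a hypothetical list of (mapping_quality, base_quality)
    pairs, one per read overlapping the position.
    """
    return sum(1 for mapq, baseq in observations
               if mapq >= MIN_MAPQ and baseq >= MIN_BASEQ)


def is_covered(observations):
    """A position is covered when at least MIN_DEPTH base calls are counted."""
    return effective_depth(observations) >= MIN_DEPTH


# Nine overlapping reads, but one has low mapping quality and one has a
# low-quality base call, so only seven are counted.
reads = [(60, 35)] * 7 + [(10, 35), (60, 12)]
print(effective_depth(reads), is_covered(reads))
```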
To consolidate the results for all individuals, blocks of regions (contiguous chromosome locations) were identified, for which at least one individual was covered for each location in the block. The number of individuals covered was averaged over the block locations, and a standard deviation was calculated. In addition, the read depths were averaged over the number of individuals and over the block locations, and again a standard deviation was calculated.
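The consolidation step can be sketched as follows. The input layout (a mapping from position to per-individual read depths) is hypothetical, and two details that the text leaves open are filled with assumptions: the standard deviations are population standard deviations, and read depth is averaged over every individual at every block position, covered or not.

```python
from statistics import mean, pstdev

MIN_DEPTH = 8  # per-individual depth needed for a position to count as covered


def summarize_blocks(depths):
    """Group covered positions into contiguous blocks and summarize each.

    `depths` maps a chromosome position to a list of per-individual read
    depths (a hypothetical input layout).
    """
    blocks, current = [], []
    for pos in sorted(depths):
        if not any(d >= MIN_DEPTH for d in depths[pos]):
            continue  # nobody is covered here, so the position joins no block
        if current and pos != current[-1] + 1:
            blocks.append(current)  # a gap ends the running block
            current = []
        current.append(pos)
    if current:
        blocks.append(current)

    summaries = []
    for block in blocks:
        n_covered = [sum(d >= MIN_DEPTH for d in depths[p]) for p in block]
        all_depths = [d for p in block for d in depths[p]]
        summaries.append({
            "start": block[0],
            "stop": block[-1],
            "span": block[-1] - block[0] + 1,
            "avg_samples_covered": mean(n_covered),
            "sd_samples_covered": pstdev(n_covered),  # assumption: population SD
            "avg_read_depth": mean(all_depths),
            "sd_read_depth": pstdev(all_depths),      # assumption: population SD
        })
    return summaries


# Two individuals over four positions; position 102 is covered by nobody,
# so it splits the region into two blocks.
example = {100: [10, 3], 101: [9, 8], 102: [0, 2], 103: [12, 12]}
for row in summarize_blocks(example):
    print(row)
```

The keys of each summary loosely mirror the columns of the coverage summary table (start, stop, sequencing span, samples covered, and average sample read depth).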
Column Description for Summary of Coverage Table
Chromosome, Chr. Start Pos., and Chr. Stop Pos.: A contiguous region on a chromosome for a block, as described above.
Sequencing Span: The length of a sequencing block in base pairs (bp).
# of Samples Covered: The number of samples considered to be covered, averaged over the block locations.
Avg. Sample Read Depth: The read depths averaged over individuals and block locations.
# of EA Samples Covered: The number of European American samples considered to be covered, averaged over the block locations.
Avg. EA Sample Read Depth: The read depths averaged over European American individuals and block locations.
# of AA Samples Covered: The number of African American samples considered to be covered, averaged over the block locations.
Avg. AA Sample Read Depth: The read depths averaged over African American individuals and block locations.