Download Preprocessed KPGP Data

The KPGP data contain a monozygotic twin pair (KPGP88/KPGP89), a dizygotic twin pair (KPGP90/KPGP91) and a multicultural family (KPGP1-KPGP12) with a Caucasian female from the US (KPGP10), a Korean father (KPGP9) and two children (KPGP11 and KPGP12). The relations are given in the following pedigree charts.

Data Preprocessing

The Korean Personal Genome Project (KPGP) is part of the international Personal Genome Project (PGP) established by the Genome Research Foundation (GRF). Thirty-nine human genomes were sequenced on an Illumina HiSeq 2000 platform at 30x to 40x coverage.

KPGP data (vcf format)

All Chromosomes vcf.gz (1008MB), tbi (3MB)
Chromosome 1 vcf.gz (78MB), tbi (1MB)
Chromosome 2 vcf.gz (80MB), tbi (1MB)
Chromosome 3 vcf.gz (68MB), tbi (1MB)
Chromosome 4 vcf.gz (72MB), tbi (1MB)
Chromosome 5 vcf.gz (59MB), tbi (1MB)
Chromosome 6 vcf.gz (64MB), tbi (1MB)
Chromosome 7 vcf.gz (59MB), tbi (1MB)
Chromosome 8 vcf.gz (52MB), tbi (1MB)
Chromosome 9 vcf.gz (44MB), tbi (1MB)
Chromosome 10 vcf.gz (50MB), tbi (1MB)
Chromosome 11 vcf.gz (49MB), tbi (1MB)
Chromosome 12 vcf.gz (47MB), tbi (1MB)

Chromosome 13 vcf.gz (36MB), tbi (1MB)
Chromosome 14 vcf.gz (33MB), tbi (1MB)
Chromosome 15 vcf.gz (30MB), tbi (1MB)
Chromosome 16 vcf.gz (33MB), tbi (1MB)
Chromosome 17 vcf.gz (28MB), tbi (1MB)
Chromosome 18 vcf.gz (29MB), tbi (1MB)
Chromosome 19 vcf.gz (24MB), tbi (1MB)
Chromosome 20 vcf.gz (22MB), tbi (1MB)
Chromosome 21 vcf.gz (16MB), tbi (1MB)
Chromosome 22 vcf.gz (15MB), tbi (1MB)
Chromosome X vcf.gz (28MB), tbi (1MB)
Chromosome Y vcf.gz (4MB), tbi (1MB)

KPGP / 1000 Genomes merged
The genotypes of 38 Koreans and one Caucasian female are merged with the genotype data of the 1000 Genomes Project (for additional information see the pipeline). Due to the Fort Lauderdale agreement for pre-publication data, we only provide data for the merged chromosome 1 (3.1 million SNVs of 1,134 individuals).
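
After downloading a merged file, a quick sanity check is to count its sample columns and variant records and compare them with the figures quoted above. The following is a minimal sketch and not part of the original pipeline: the helper names `read_vcf`, `count_samples` and `count_records` are hypothetical, and the sketch assumes a standard VCF layout with nine fixed columns before the first sample.

```shell
#!/bin/sh
# Hypothetical helpers (not from the CAMDA pipeline) for sanity-checking a VCF.

# Print a VCF to stdout, whether or not it is gzip/bgzip-compressed.
read_vcf() {
    case "$1" in
        *.gz) gzip -dc "$1" ;;
        *)    cat "$1" ;;
    esac
}

# Samples = columns of the #CHROM header line minus the 9 fixed VCF columns
# (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT).
count_samples() {
    read_vcf "$1" | awk -F'\t' '/^#CHROM/ { print NF - 9; exit }'
}

# Variant records = all lines that do not start with '#'.
count_records() {
    read_vcf "$1" | grep -vc '^#'
}
```

Called on the merged chromosome 1 file, `count_samples` should report 1,134 if the file matches the description above. Counting records decompresses the whole file, so it can take a while on an archive of this size.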

KPGP / 1000 Genomes merged Chromosome 1 vcf.gz (11.7GB), tbi (1MB)

Participants who want to analyze other chromosomes need to download the corresponding vcf data from the 1000 Genomes Project and the KPGP data (vcf format) from the list above. The example below shows how chromosome 2 can be merged using these data sets.

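# Merge the KPGP and 1000 Genomes calls (both inputs must be bgzip-compressed and tabix-indexed)
vcf-merge kpgpB_chr2.vcf.gz ALL.chr2.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz | bgzip -c > kpgp1000_chr2.vcf.gz
# Decompress, then replace missing entries (".") with a homozygous-reference genotype
gunzip kpgp1000_chr2.vcf.gz
sed -e 's/\t\./\t0|0:0:0,0,0/g' kpgp1000_chr2.vcf > KPGP1000_chr2.vcf
# Recompress and index the result
bgzip KPGP1000_chr2.vcf
tabix -p vcf KPGP1000_chr2.vcf.gz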
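
To see what the `sed` step in the commands above does, it can help to run it on a single toy record. The sketch below is illustrative only: the record, its two-sample layout and the `GT:DS:GL` format string are invented for the example and are assumptions, not taken from the KPGP files. The wrapper name `fill_missing` is hypothetical. The example assumes GNU sed, where `\t` in a pattern matches a tab.

```shell
#!/bin/sh
# Toy illustration of the missing-genotype substitution (the record is invented).
# Assumes GNU sed, where \t matches a tab character.

# Hypothetical wrapper around the sed expression used in the merge example.
fill_missing() {
    sed -e 's/\t\./\t0|0:0:0,0,0/g'
}

# One record with two samples; the second sample has no call (".").
printf '2\t10133\trs1\tA\tG\t100\tPASS\tAC=1\tGT:DS:GL\t0|1:1:0,0,0\t.\n' | fill_missing
```

In the printed record the last column changes from `.` to `0|0:0:0,0,0`, while the first sample is left as it was. Because the pattern matches any tab followed by a dot, a `.` at the start of the ID, QUAL, FILTER or INFO column would be rewritten as well, so it is worth inspecting a few output lines before indexing.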

The command line tools bgzip and tabix are included in samtools (in current releases they ship with HTSlib), and vcf-merge is part of VCFtools.
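
Before starting a merge, it may be worth confirming that all three tools are on the `PATH`. The following is a small sketch under that assumption. The function name `missing_tools` is hypothetical and the check only tests for presence, not for a particular version.

```shell
#!/bin/sh
# Print the name of every requested tool that is not found on PATH.
# missing_tools is a hypothetical helper, not part of samtools or VCFtools.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Check the three tools used in the merge example and warn if any are absent.
missing=$(missing_tools bgzip tabix vcf-merge)
if [ -n "$missing" ]; then
    printf 'Missing tools:\n%s\n' "$missing" >&2
fi
```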


Atul Butte, MD, PhD
Stanford University School of Medicine

Nikolaus Rajewsky, PhD
Max-Delbrück-Center for Molecular Medicine

Terry Speed, PhD
The Walter and Eliza Hall Institute of Medical Research

Sandrine Dudoit, PhD
University of California, Berkeley

John Quackenbush, PhD
Harvard School of Public Health

Eran Segal, PhD
Weizmann Institute of Science

John Storey, PhD
Princeton University

Chris Sander, PhD
Memorial Sloan Kettering Cancer Center

Temple F. Smith, PhD
Boston University

Curtis Huttenhower, PhD
Harvard School of Public Health

Christopher E. Mason, PhD
Weill Cornell Medicine

Mick Watson, PhD
The Roslin Institute

Extended Abstract Proposals Due: 19 May 2017
Notification of Accepted Contributions: 26 May 2017
Early Registration Closes: 15 Jun 2017
CAMDA2017 Conference: 22-23 Jul 2017
ISMB/ECCB 2017 Conference: 21–25 Jul 2017
Full Paper Submission: 24 Sep 2017
