The CAMDA Contest Challenges

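CAMDA 2017 presents three challenges: the MetaSUB Inter-City Challenge, the Neuroblastoma Data Integration Challenge, and the Oxford Nanopore ‘Wiggle Space’ Challenge.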

CAMDA encourages an open contest, where all analyses of the contest data sets are of interest, not limited to the questions suggested here. There is an online forum for the free discussion of the contest data sets and their analysis, in which you are encouraged to participate.

We look forward to a lively contest!

MetaSUB Inter-City Challenge

Founded in 2015, the MetaSUB International Consortium aims to create the world's only longitudinal metagenomic map of mass-transit systems and other public spaces across the globe. Global City Sampling Day covered over 40 locations; for the first-ever multi-city analysis, the consortium has partnered with CAMDA, with an early release of data from New York City, Boston, and Sacramento that is exclusive to MetaSUB and CAMDA.

Analysis suggestions:

Biological: Compare organism fingerprints from public places across cities. Investigate organism sequences and biodiversity vs location. How diverse is each city – in terms of the numbers of bacteria, eukaryotes, viruses, plasmids, and antimicrobial resistance markers (AMRs)? Can we determine if each city has a distinct AMR profile?
Technical: Which computational tools have the highest sensitivity and specificity for species detection?
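
As a minimal sketch of how the two suggestions above could be quantified, the following Python example computes a Shannon diversity index from per-taxon read counts and the sensitivity and specificity of a species detection tool against a known composition. The function names, the taxon labels, and the idea of a fixed candidate list (needed to define true negatives) are assumptions made for illustration; they are not part of the challenge data or of any particular tool.

```python
import math


def shannon_diversity(counts):
    """Shannon index H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def sensitivity_specificity(detected, truth, candidates):
    """Compare detected species with the true composition.

    `candidates` is the (assumed) list of all species the tool could report;
    it is required to count true negatives.
    """
    detected, truth, candidates = set(detected), set(truth), set(candidates)
    tp = len(detected & truth)               # present and reported
    fn = len(truth - detected)               # present but missed
    fp = len(detected - truth)               # reported but absent
    tn = len(candidates - detected - truth)  # absent and not reported
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical example: placeholder taxon labels, not real results.
truth = ["taxon_A", "taxon_B", "taxon_C"]
detected = ["taxon_A", "taxon_B", "taxon_D"]
candidates = ["taxon_A", "taxon_B", "taxon_C", "taxon_D", "taxon_E", "taxon_F"]
sens, spec = sensitivity_specificity(detected, truth, candidates)
```

Specificity depends strongly on how the candidate list is chosen, so comparisons between tools are only meaningful when the same list is used for all of them.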

Data download: For this challenge, raw data are provided together with a sample description file. Participants who want to use this dataset must read and accept the data download agreement for access.

Neuroblastoma Data Integration Challenge

Examine the power of data integration in a real-world clinical setting. Neuroblastoma is the most common extracranial solid tumor in children. The base study compared RNA-seq and Agilent microarray gene expression profiles for clinical endpoint prediction in 498 pediatric patients (FDA SEQC - Zhang et al, Genome Biology 2015). The published summary data are complemented by raw signal-level data for gene expression arrays, RNA-Seq expression profiles, and extended clinical meta-data (event-free & overall survival times, multiple prognostic markers, therapy data). For this challenge, we newly provide matched aCGH data for 145 of these patients for CNV/CNA analysis (Fischer lab, Köln - Stigliani et al, Neoplasia 2012, Coco et al, IJC 2012, Kocak et al, Cell Death Dis 2013, Theissen et al, Genes Chromosomes Cancer 2014).

Analysis suggestions:

Technical: Efficient data integration, both inter-type – gene expression, CNV/CNA – and intra-type – combining the expression profiles from complementary high-throughput technologies (RNA-seq and microarrays).
Biological: Better survival time prediction by effective data integration or improved models including alternative gene transcripts. Advance our understanding of the mechanisms behind cancer progression or therapy response by effective data integration, a first comprehensive transcript level analysis, or novel functional (network/pathway) analysis.
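
One very simple starting point for inter-type integration is sketched below: restrict both data types to the patients they share, standardize each feature within its own data type, and concatenate the resulting vectors per patient. This is only an illustrative baseline under stated assumptions, not a method prescribed by the challenge. The patient identifiers, the table layout (a dictionary from patient ID to a list of feature values), and the function names are hypothetical.

```python
from statistics import mean, pstdev


def zscore_columns(table):
    """Standardize each feature (column) to mean 0 and unit variance.

    `table` maps sample ID -> list of feature values (same length for all).
    Constant features are set to 0.0 to avoid division by zero.
    """
    ids = list(table)
    if not ids:
        return {}
    columns = list(zip(*(table[s] for s in ids)))
    scaled = []
    for col in columns:
        mu, sd = mean(col), pstdev(col)
        scaled.append([(v - mu) / sd if sd > 0 else 0.0 for v in col])
    return {s: [col[i] for col in scaled] for i, s in enumerate(ids)}


def integrate(expression, copy_number):
    """Keep patients present in both tables and concatenate scaled features."""
    shared = sorted(expression.keys() & copy_number.keys())
    expr = zscore_columns({s: expression[s] for s in shared})
    cnv = zscore_columns({s: copy_number[s] for s in shared})
    return {s: expr[s] + cnv[s] for s in shared}


# Hypothetical toy input: three patients with expression, two with aCGH.
expression = {"P1": [1.0, 2.0], "P2": [3.0, 2.0], "P3": [5.0, 5.0]}
copy_number = {"P1": [0.0], "P2": [2.0]}
combined = integrate(expression, copy_number)
```

Because only 145 of the 498 patients have matched aCGH data, an inner join like this discards most of the expression cohort; approaches that can use the unmatched patients as well may be worth exploring.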

Data download: For this challenge, raw microarray data (expression and CGH arrays) as well as RNA-Seq expression profiles are provided together with a sample description file. Participants who want to use this dataset must read and accept the data download agreement for access. In addition, raw RNA-Seq reads can optionally be made available on completion of an ethical use agreement with the University of Cologne (Köln).

Oxford Nanopore ‘Wiggle Space’ Challenge

Several gut microbiota samples had their DNA sequenced both by Nanopore long-read next-next-generation sequencing and by the more established Illumina sequencing technology (Mason lab, New York, original unpublished data). Additional ‘mystery’ samples provide an independent blind test.

Questions of interest include, but are not limited to:

Technical: Improve base-calling, assembly, and signal-level models of the Nanopore data, with the reference sequences and/or Illumina sequencing serving as a benchmark. Samples with biological replicates and samples with technical replicates are both available.
Biological: Meta-genomics: Detection, discrimination, and abundance quantification of species. For some training samples, relative abundances are known (synthetic mixes). Sequence / functional predictive analysis of pathogenicity. And: Analysis and identification of the ‘mystery’ sample!
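
For the training samples with known relative abundances, one way to score an abundance estimate is sketched below: convert per-species read counts to relative abundances and compare them with the known mix using the Bray-Curtis dissimilarity. The species labels, the counts, and the function names are hypothetical, and using raw read counts as an abundance proxy is a simplifying assumption (it ignores genome size and read-length effects, for instance).

```python
import math


def relative_abundance(read_counts):
    """Convert per-species read counts to fractions that sum to 1."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no reads assigned to any species")
    return {species: n / total for species, n in read_counts.items()}


def bray_curtis(estimated, known):
    """Bray-Curtis dissimilarity: sum|a - b| / sum(a + b); 0 means identical."""
    species = set(estimated) | set(known)
    diff = sum(abs(estimated.get(s, 0.0) - known.get(s, 0.0)) for s in species)
    total = sum(estimated.get(s, 0.0) + known.get(s, 0.0) for s in species)
    return diff / total if total else 0.0


# Hypothetical synthetic mix and read counts, for illustration only.
known_mix = {"species_A": 0.4, "species_B": 0.4, "species_C": 0.2}
counts = {"species_A": 50, "species_B": 30, "species_C": 20}
estimated = relative_abundance(counts)
score = bray_curtis(estimated, known_mix)
```

The same score can be computed separately for Nanopore-based and Illumina-based estimates of a sample to compare the two technologies against the known mix.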