Comparative Analysis for Oral Cleft Trio Data
The goal of this study is to compare data generated from two sequencing studies aimed at determining genetic causes of oral-facial clefts (OFCs), to assess whether the data are of similar quality and information content, and therefore whether they could reasonably be combined to increase statistical power for tests of genetic association in further studies. The first data set is from a previously published targeted sequencing (TS) study, which focused on 13 candidate genetic regions previously linked to or associated with risk of OFCs and included 1409 case-parent trios of different population backgrounds, including the 374 European trios on which we focus here. The recently generated whole-genome sequencing (WGS) data were collected as part of the Gabriella Miller Kids First initiative and contain 1136 individuals (in approximately 378 case-parent trios) of European ancestry. We started by cleaning the WGS data using the same quality control (QC) steps as the TS study, producing a clean data set of 981 individuals (in 327 trios). We then compared variant sets and assessed concordance of genotype calls in individuals who were present in both the TS and WGS data sets (n=402 in 134 trios). Next, we generated results from the genotypic transmission-disequilibrium test (gTDT) at common variants (defined using a minor allele frequency (MAF) threshold of 0.01), along with visualizations of patterns of linkage disequilibrium (LD) in the two data sets, with a focus on the 8q24 region, which has previously been strongly associated with OFCs among Europeans. Overall, good concordance and high similarity were observed in the sequence variants found in both data sets. We found that combining the TS data and the WGS data provides increased power to detect association in the 8q24 region.
Future work will use this combined data set to perform more detailed comparative analyses across populations in this region.
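
To illustrate what assessing concordance of genotype calls across two data sets can look like in practice, the following is a minimal sketch and not the pipeline used in this study. It assumes genotypes are coded as alternate-allele counts (0, 1, 2) with missing calls stored as None, and that calls are keyed by hypothetical (sample ID, variant ID) pairs; the function name and the toy data are illustrative only.

```python
from typing import Dict, Optional, Tuple

Genotype = Optional[int]   # 0, 1, or 2 alternate alleles; None marks a missing call
Key = Tuple[str, str]      # (sample_id, variant_id); an assumed keying scheme


def genotype_concordance(
    ts_calls: Dict[Key, Genotype],
    wgs_calls: Dict[Key, Genotype],
) -> Tuple[int, int, float]:
    """Return (n_compared, n_matching, concordance_rate).

    Only sample/variant pairs present in both data sets with a
    non-missing call in each are compared.
    """
    compared = 0
    matching = 0
    for key in ts_calls.keys() & wgs_calls.keys():
        ts_gt, wgs_gt = ts_calls[key], wgs_calls[key]
        if ts_gt is None or wgs_gt is None:
            continue  # skip pairs where either call is missing
        compared += 1
        if ts_gt == wgs_gt:
            matching += 1
    rate = matching / compared if compared else float("nan")
    return compared, matching, rate


# Toy data (hypothetical sample and variant identifiers)
ts = {
    ("child1", "var_a"): 1,
    ("child1", "var_b"): 0,
    ("mother1", "var_a"): 2,
    ("mother1", "var_b"): None,   # missing in TS
    ("father1", "var_a"): 0,      # sample absent from WGS
}
wgs = {
    ("child1", "var_a"): 1,
    ("child1", "var_b"): 1,       # discordant call
    ("mother1", "var_a"): 2,
    ("mother1", "var_b"): 0,
}

n_compared, n_matching, rate = genotype_concordance(ts, wgs)
print(f"{n_matching}/{n_compared} calls concordant ({rate:.3f})")
```

Restricting the comparison to calls that are non-missing in both data sets is one design choice among several; a fuller analysis might also report missingness separately and break concordance down by genotype class or by variant.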