Design Considerations in Large-Scale Genetic Association Studies

Michael Boehnke, Andrew Skol, Laura Scott, Cristen Willer, Gonçalo Abecasis, Anne Jackson, and the FUSION Study Investigators
Department of Biostatistics
Center for Statistical Genetics
University of Michigan

Outline
• Assess the utility of HapMap samples for tag SNP selection in a study of type 2 diabetes in Finnish subjects
• Discuss the impact of several design factors on cost and efficiency of genome-wide association (GWA) studies

FUSION Study: Finland-United States Investigation of NIDDM Genetics
National Public Health Institute, Helsinki
USC Keck School of Medicine, Los Angeles
National Human Genome Research Institute, Bethesda
University of Michigan School of Public Health, Ann Arbor
University of North Carolina School of Medicine, Chapel Hill

Chromosome 14 SNP Selection
• Used early HapMap (May 2004) to select tag SNPs in 18 Mb linkage interval on chr 14
• MAF > .05, Illumina design score > .40
• Unselected SNPs had r² > .8 with 1 tag SNP
• Added annotation-based SNPs
• Double tagged large bins, filled large gaps

Chromosome 14 SNP Selection
HapMap SNPs in region (MAF > .05)      2276
HapMap tag SNPs (r² > .8)              1132
Annotation-based SNPs                    28
Double-tag SNPs (large bins)             11
Gap-filling SNPs                        211
Total SNPs attempted                   1382
Total SNPs genotyped                   1230
Total SNPs polymorphic and in HWE      1192

Utility of HapMap for tag SNP Selection for Finnish Subjects
• Question: How comparable were allele, haplotype frequency and r² in HapMap, Finnish data?
• Compared HapMap data and 1448 Finnish samples from FUSION and Finrisk 2002 studies
• Poster 1621, Willer et al., Friday 1:30-3:30 pm

Allele Frequencies: FUSION vs. HapMap
[Figure: allele frequencies in FUSION plotted against HapMap samples: CEU, YRI, CHB, JPT]

Allele Frequencies: FUSION vs. CEU
[Figure: 7.5% SNP frequencies differ at p < .01; r = .98]

LD r²: FUSION vs. CEU
[Figure: r = .91]

Haplotype Frequencies: FUSION vs. CEU
[Figure: r = .97]

Summary: Chromosome 14 SNP Selection
• CEU excellent basis for tag SNP selection in Finns
• Strong correlation between allele frequencies, haplotype frequencies, LD in two samples
• Excess of significant allele and haplotype frequency differences (7% at .01 level), but mostly small
• Nearly all common haplotypes (frequency > .05) in one sample present in both samples
  – 579/583 from CEU in FUSION
  – 557/563 from FUSION in CEU

Design of Genome-wide Association Studies
• GWA provides unprecedented opportunity to identify genetic variants predisposing to disease
• Enabled by HapMap, genotyping costs
• Since we may type 100s-1000s of samples on 100Ks of SNPs, efficient study design critical
• Examine two-stage designs for large-scale genetic studies (see Satagopan, Elston, Thomas)

One-Stage Design and Two-Stage Design
[Figure: grids of SNPs (1, 2, 3, ..., M) by Samples (1, 2, 3, ..., N). Panels: One-Stage Design; Two-Stage Design, with Stage 1 and Stage 2 regions labeled by samples and markers]

One- and Two-Stage GWA Designs
[Figure: SNPs by Samples grids. Panels: One-Stage Design; Two-Stage Design with replication-based analysis (Stage 1, Stage 2); Two-Stage Design with joint analysis (Stage 1, Stage 2)]

Joint Analysis is More Powerful than Replication-Based Analysis
Skol et al., Friday 8:45, 180, Hall 3
300,000 markers genotyped on 1000 cases, 1000 controls
Multiplicative model, prevalence 10%, GRR = 1.4
[Figure: power curves, with one-stage power shown for reference]

Factors that Influence Cost and Efficiency of GWAs
• Fraction samples typed in Stage 1 (samples)
• Fraction SNPs typed in Stage 2 (markers)
• Stage 2 to Stage 1 per genotype cost ratio (R)

For a two-stage GWA study, what is the optimal fraction of samples genotyped in Stage 1 (samples)?
R = (Stage 2 per genotype cost) / (Stage 1 per genotype cost)
Case 1: R = 1
Case 2: R = 1, 2, 5, 10

Cost as a Function of Samples Typed in Stage 1
Per Genotype Cost Ratio R = 1
Fraction of Markers Followed-up Varies to Ensure Constant Power
[Figure: cost curve for R = 1]

Cost as a Function of Samples Typed in Stage 1
Per Genotype Cost Ratio R = 1, 2, 5, 10
Fraction of Markers Followed-up Varies to Ensure Constant Power
[Figure: cost curves labeled R = 1, R = 2, R = 5, R = 10]

Summary: Two-Stage GWA Designs
• Two-stage GWA designs efficient, cost-effective; joint analysis more powerful than replication
• For equal Stage 1, 2 per genotype costs (R = 1), 250K SNPs, genomewide significance = .05, genotype 20-30% of samples in Stage 1
• For R > 1, less stringent significance, fewer SNPs, genotype 30-40% SNPs in Stage 1

Acknowledgements
• Chromosome 14: Cristen Willer, Anne Jackson; FUSION, CIDR, and HapMap investigators
• Two-stage designs: Andrew Skol, Laura Scott, Gonçalo Abecasis
• Thanks!

Excluded slides follow

FUSION Chromosome 14 T2D Linkage
[Figure: MLS (0 to 3) against Position (cM, 0 to 120). Curves: FUSION 1: 495 ASP families; FUSION 2: 242 ASP families; FUSION 1+2]

Power of One- and Two-Stage Designs

How does a change in significance level change the optimal proportion of samples in Stage 1 (samples)?
Case 1: α = .05/250,000 genome wide significance
Case 2: α = 10/250,000 less stringent significance
Case 2': α = .05/1,250 candidate gene significance

Impact of Significance Level on Optimal Proportion of Samples in Stage 1
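
The significance cases and cost factors above can be made concrete with a short Python sketch. It is an illustration, not the calculation behind the slides. The thresholds are taken directly from the three cases listed. The cost formula is an assumption: it supposes Stage 1 types every SNP on a fraction of the samples, and Stage 2 types a fraction of the SNPs on the remaining samples at R times the per genotype cost. The names `per_test_alpha`, `relative_cost`, `frac_samples`, `frac_markers`, and `cost_ratio` are hypothetical, and the example fractions in the final loop are made-up inputs. In the slides the fraction of markers followed up varies to hold power constant; here it is simply a parameter.

```python
import math


def per_test_alpha(numerator, n_tests):
    """Per-test significance level written as numerator / n_tests."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return numerator / n_tests


def relative_cost(frac_samples, frac_markers, cost_ratio=1.0):
    """Genotyping cost of a two-stage design relative to a one-stage design.

    Assumed model (not stated in the slides):
      Stage 1 types all SNPs on frac_samples of the samples.
      Stage 2 types frac_markers of the SNPs on the remaining samples,
      at cost_ratio (R) times the Stage 1 per genotype cost.
    """
    if not (0.0 <= frac_samples <= 1.0 and 0.0 <= frac_markers <= 1.0):
        raise ValueError("fractions must lie in [0, 1]")
    if cost_ratio <= 0:
        raise ValueError("cost_ratio must be positive")
    return frac_samples + cost_ratio * frac_markers * (1.0 - frac_samples)


# The three significance cases listed on the slide.
ALPHA_CASES = {
    "Case 1 (genome wide)": per_test_alpha(0.05, 250_000),
    "Case 2 (less stringent)": per_test_alpha(10, 250_000),
    "Case 2' (candidate gene)": per_test_alpha(0.05, 1_250),
}

# Cases 2 and 2' use the same per-test threshold; they differ in SNP count.
SAME_THRESHOLD = math.isclose(
    ALPHA_CASES["Case 2 (less stringent)"],
    ALPHA_CASES["Case 2' (candidate gene)"],
)

for label, alpha in ALPHA_CASES.items():
    print(f"{label}: alpha = {alpha:.1e}")

# Hypothetical inputs: 30% of samples in Stage 1, 1% of SNPs followed up.
for r in (1, 2, 5, 10):
    print(f"R = {r}: relative cost = {relative_cost(0.30, 0.01, r):.3f}")
```

Under this assumed cost model, a design that puts all samples in Stage 1 costs the same as a one-stage study whatever the value of R, and raising R makes any fixed two-stage design more expensive.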