Chapter 5 Statistical Methods
By Jinn-Yi Yeh, Ph.D.
4/7/2009

Outline
5.1 STATISTICAL INFERENCE
5.2 ASSESSING DIFFERENCES IN DATA SETS
5.3 BAYESIAN INFERENCE
5.4 PREDICTIVE REGRESSION
5.5 ANALYSIS OF VARIANCE
5.6 LOGISTIC REGRESSION
5.7 LOG-LINEAR MODELS
5.8 LINEAR DISCRIMINANT ANALYSIS

5.1 STATISTICAL INFERENCE
Descriptive statistics vs. statistical inference
Population, sample, data set
Parameter vs. statistic
Inference methods: estimation and tests of hypotheses
Estimation: the goal is to gain information from a data set T in order to estimate one or more parameters w belonging to the model of the real-world system f(X, w).
Statistical testing: to decide whether a hypothesis concerning the value of a population characteristic should be accepted or rejected.
Null hypothesis vs. alternative hypothesis

5.2 ASSESSING DIFFERENCES IN DATA SETS
Central tendency: [formulas 1-3 not reproduced]
Data dispersion: [formulas 1-2 not reproduced]
Boxplot: a visualization, available in many statistical software tools, of descriptive statistical measures of central tendency and dispersion.

5.3 BAYESIAN INFERENCE
Naïve Bayesian Classification Process (Simple Bayesian Classifier)
Based on Bayes' theorem; used to decide which class a sample of unknown class is closest to.
A supervised learning method (requires training data).
P(H/X): posterior probability
P(H): prior probability
Given an additional data sample X (its class is unknown), it is possible to predict the class for X using the highest conditional probability P(Ci/X).
By Bayes' theorem, P(Ci/X) = [P(X/Ci) · P(Ci)] / P(X).
P(X): constant for all classes, so only the product P(X/Ci) · P(Ci) needs to be maximized.
P(Ci) = (number of training samples of class Ci) / m, where m is the total number of training samples.

5.3 BAYESIAN INFERENCE: example
Table 5.1: Training data set for a classification using Naïve Bayesian Classifier

Sample   Attribute1 (A1)   Attribute2 (A2)   Attribute3 (A3)   Class (C)
1        1                 2                 1                 1
2        0                 0                 1                 1
3        2                 1                 2                 2
4        1                 2                 1                 2
5        0                 1                 2                 1
6        2                 2                 2                 2
7        1                 0                 1                 1

Goal: to predict the classification of the new sample X = {1, 2, 2, class = ?} by maximizing the product P(X/Ci) · P(Ci) for i = 1, 2.
Step 1: compute the prior probabilities P(Ci).
Step 2: compute the conditional probabilities P(xt/Ci) for every attribute value given in the new sample X = {1, 2, 2, C = ?}.
Step 3: under the assumption of conditional independence of attributes, compute the conditional probabilities P(X/Ci).
Finally: multiply these conditional probabilities by the corresponding prior probabilities to obtain values proportional (≅) to P(Ci/X), and find the maximum.

5.4 PREDICTIVE REGRESSION
The prediction of continuous values can be modeled by a statistical technique called regression.
Regression analysis is the process of determining how a variable Y is related to one or more other variables X1, X2, …, Xn. Modeling this type of relationship is often called linear regression.
The relationship that fits a set of data is characterized by a prediction model called a regression equation.
The most widely used form of the regression model is the general linear model, formally written as
Y = α + β1·X1 + β2·X2 + β3·X3 + … + βn·Xn

Simple regression: Y = α + β·X
SSE: [formula not reproduced]
α and β: [formulas not reproduced]

Multiple regression: Y = α + β1·X1 + β2·X2 + β3·X3 + … + βn·Xn
SSE = (Y − X·β)' · (Y − X·β)
∂(SSE)/∂β = 0 → β = (X'·X)^(-1) · (X'·Y)

Correlation coefficient: [formula not reproduced]
A correlation coefficient r = 0.85 indicates a good linear relationship between two variables. Additional interpretation is possible: because r^2 = 0.72, we can say that approximately 72% of the variation in the values of Y is accounted for by a linear relationship with X.

5.5...
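
Looking back at the Naïve Bayesian example of Section 5.3, the four steps can be checked with a short script. This is only a sketch, not the slides' own computation: it assumes plain frequency counts from Table 5.1 with no smoothing, and the function name naive_bayes_scores is hypothetical. Exact fractions are used so that no rounding enters the comparison.

```python
from fractions import Fraction

# Table 5.1 rows as (A1, A2, A3, class), samples 1 to 7.
TRAINING = [
    (1, 2, 1, 1),
    (0, 0, 1, 1),
    (2, 1, 2, 2),
    (1, 2, 1, 2),
    (0, 1, 2, 1),
    (2, 2, 2, 2),
    (1, 0, 1, 1),
]


def naive_bayes_scores(training, x):
    """Return {class: P(X/Ci) * P(Ci)} for the new sample x.

    Hypothetical helper: uses raw relative frequencies, no smoothing.
    """
    m = len(training)  # total number of training samples
    scores = {}
    for c in sorted({row[-1] for row in training}):
        rows = [row for row in training if row[-1] == c]
        # Step 1: prior probability P(Ci) = (samples of class Ci) / m
        prior = Fraction(len(rows), m)
        # Steps 2 and 3: multiply P(xt/Ci) over all attributes,
        # relying on the conditional independence assumption.
        likelihood = Fraction(1)
        for t, value in enumerate(x):
            matches = sum(1 for row in rows if row[t] == value)
            likelihood *= Fraction(matches, len(rows))
        # Final step: value proportional to P(Ci/X)
        scores[c] = prior * likelihood
    return scores


scores = naive_bayes_scores(TRAINING, (1, 2, 2))
predicted = max(scores, key=scores.get)

for c, s in scores.items():
    print(f"class {c}: P(X/C) * P(C) = {s} (about {float(s):.4f})")
print("predicted class:", predicted)
```

Under these assumptions the products are 1/56 (about 0.018) for class 1 and 4/63 (about 0.063) for class 2, so the new sample X = {1, 2, 2} is assigned to class 2. With a zero count for some attribute value the product would collapse to zero, which is why practical implementations usually add smoothing; that refinement is outside what the slides describe.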