February 15, 2025
Data Mining: Concepts and Techniques

Data Mining: Concepts and Techniques
Concept Description: Characterization and Comparison

Chapter 5: Concept Description: Characterization and Comparison
- What is concept description?
- Data generalization and summarization-based characterization
- Analytical characterization: Analysis of attribute relevance
- Mining class comparisons: Discriminating between different classes
- Mining descriptive statistical measures in large databases
- Discussion
- Summary

What is Concept Description?
- Descriptive vs. predictive data mining
  - Descriptive mining: describes data sets in concise, summarized, informative, discriminative forms
  - Predictive mining: analyzes known data, constructs models for the database, and predicts the trend and properties of unknown data
- Concept/class description: a concept refers to a collection of data, e.g. frequent_buyers, graduate_students
  - Characterization: provides a concise and succinct summarization of the given collection of data
  - Comparison/discrimination: provides descriptions comparing two or more collections of data

Concept Description vs. OLAP
- Concept description:
  - can handle complex data types of the attributes and their aggregations
  - a more automated process
- OLAP:
  - restricted to nonnumeric dimensions and numeric measures
  - user-controlled process
- Future integration is expected

Data Generalization and Summarization-based Characterization
- Data generalization
  - A process which abstracts a large set of task-relevant data in a database from relatively low conceptual levels to higher ones.
- Approaches:
  - Data cube approach (OLAP approach)
  - Attribute-oriented induction approach
[Figure: conceptual levels 1 through 5]

Characterization: Data Cube Approach (without using AO-Induction)
- Data warehouse-based, precomputation-oriented, materialized-view approach
- Strengths
  - An efficient implementation of data generalization
  - Computation of various kinds of measures, e.g., count(), sum(), average(), max()
  - Generalization and specialization can be performed on a data cube by roll-up and drill-down
- Limitations
  - Handles only dimensions of simple nonnumeric data and measures of simple aggregated numeric values
  - Lacks intelligent analysis: it cannot tell which dimensions should be used or what level the generalization should reach

Attribute-Oriented Induction
- Proposed in 1989 (KDD'89 workshop)
- Not confined to categorical data nor particular measures
- How is it done?
  - Collect the task-relevant data using a relational database query
  - Perform generalization by attribute removal or attribute generalization
  - Apply aggregation by merging identical generalized tuples and accumulating their respective counts
  - Map the resulting generalized relation into different forms for presentation to users

Basic Principles of Attribute-Oriented Induction
- Data focusing: task-relevant data, including dimensions, and the result is the initial working relation.
- Att...