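I. Consider the training examples in the table for a binary classification problem.
1. What is the entropy of the whole collection of training examples with respect to the class attribute?
2. What are the information gains of a1 and a2 relative to these training examples?
3. For the continuous attribute a3, compute the information gain for every possible split.
4. According to information gain, which of a1, a2, a3 is the best split?
5. According to the classification error rate, which of a1 and a2 is the best split?
6. According to the Gini index, which of a1 and a2 is the best split?

Answer 1: P(+) = 4/9 and P(−) = 5/9, so the entropy is
−(4/9) log2(4/9) − (5/9) log2(5/9) = 0.9911.
Answer 2: (probably not on the exam)
Answer 3:
Answer 4: According to information gain, a1 produces the best split.
Answer 5: For attribute a1, error rate = 2/9. For attribute a2, error rate = 4/9. Therefore, according to error rate, a1 produces the best split.
Answer 6:

II. Consider the following data set for a binary classification problem.
1. Compute the information gain of attributes a and b. Which attribute would the decision tree induction algorithm choose?
2. Compute the Gini index gain of a and b. Which attribute would decision tree induction choose? (This answer is fine.)
3. Figure 4-13 shows that entropy and the Gini index are both monotonically increasing on [0, 0.5] and monotonically decreasing on [0.5, 1]. Is it possible for information gain and the gain in the Gini index to favor different attributes? Explain your reasoning.
Answer: Yes. Even though these measures have similar ranges and monotonic behavior, their respective gains, Δ, which are weighted differences of the measures (the parent's impurity minus the size-weighted impurity of the child nodes), do not necessarily behave in the same way, as illustrated by the results in parts (a) and (b).

Bayesian classification
1. P(A = 1|−) = 2/5 = 0.4, P(B = 1|−) = 2/5 = 0.4, P(C = 1|−) = 1,
P(A = 0|−) = 3/5 = 0.6, P(B = 0|−) = 3/5 = 0.6, P(C = 0|−) = 0;
P(A = 1|+) = 3/5 = 0.6, P(B = 1|+) = 1/5 = 0.2, P(C = 1|+) = 2/5 = 0.4,
P(A = 0|+) = 2/5 = 0.4, P(B = 0|+) = 4/5 = 0.8, P(C = 0|+) = 3/5 = 0.6.
2.
3. P(A = 0|+) = (2 + 2)/(5 + 4) = 4/9, P(A = 0|−) = (3 + 2)/(5 + 4) = 5/9,
P(B = 1|+) = (1 + 2)/(5 + 4) = 3/9, P(B = 1|−) = (2 + 2)/(5 ...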
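The impurity measures used in Part I can be sketched in Python as below. This is a minimal illustration, not the textbook's own code: the helper names (`entropy`, `gini`, `split_error`, `gain`) are hypothetical, and each class distribution is assumed to be passed as a list of raw class counts. Only the 4 positive / 5 negative class counts come from Answer 1; the training table itself is not reproduced here, so the per-attribute answers are not recomputed.

```python
from math import log2


def entropy(counts):
    """Base-2 entropy of a class distribution given as raw counts."""
    n = sum(counts)
    return -sum((c / n) * log2(c / n) for c in counts if c > 0)


def gini(counts):
    """Gini index of a class distribution given as raw counts."""
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)


def split_error(children):
    """Classification error of a split: each child node predicts its
    majority class, and the misclassified records are divided by the total."""
    n = sum(sum(child) for child in children)
    return sum(sum(child) - max(child) for child in children) / n


def gain(impurity, parent, children):
    """Parent impurity minus the size-weighted impurity of the children."""
    n = sum(parent)
    weighted = sum(sum(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted


# Answer 1: the whole training set has 4 positive and 5 negative examples.
print(round(entropy([4, 5]), 4))
```

Running it prints the 0.9911 from Answer 1. Given the child class counts of a1 or a2 from the table, `gain(entropy, ...)`, `gain(gini, ...)` and `split_error(...)` would give the quantities asked for in questions 2, 5 and 6.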
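To make the "Yes" in Part II, question 3 concrete, the sketch below compares information gain and Gini gain on a small split. The counts are hypothetical and chosen only for illustration; they are not the data set from Part II. There are 10 records (4 positive, 6 negative). Attribute A splits them into (4+, 3−) and (0+, 3−), and attribute B splits them into (3+, 1−) and (1+, 5−).

```python
from math import log2


def entropy(counts):
    """Base-2 entropy of a class distribution given as raw counts."""
    n = sum(counts)
    return -sum((c / n) * log2(c / n) for c in counts if c > 0)


def gini(counts):
    """Gini index of a class distribution given as raw counts."""
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)


def gain(impurity, parent, children):
    """Parent impurity minus the size-weighted impurity of the children."""
    n = sum(parent)
    return impurity(parent) - sum(sum(ch) / n * impurity(ch) for ch in children)


# Hypothetical counts [positives, negatives]; not taken from the exercise's table.
parent = [4, 6]
splits = {"A": [[4, 3], [0, 3]], "B": [[3, 1], [1, 5]]}

for name, children in splits.items():
    print(name,
          "info gain:", round(gain(entropy, parent, children), 4),
          "gini gain:", round(gain(gini, parent, children), 4))
```

With these counts, information gain prefers A (about 0.281 vs. 0.256), but Gini gain prefers B (about 0.137 vs. 0.163). The two impurity curves have different shapes, so their weighted differences need not rank the attributes the same way.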
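The smoothed estimates in answer 3 of the Bayesian part all have the form (n_c + 2)/(n + 4) with n = 5. This matches the m-estimate (n_c + m·p)/(n + m) with m = 4 and p = 1/2. Those parameter values are inferred from the arithmetic and are not stated in the source. Below is a minimal sketch using exact fractions. The helper name `m_estimate` is hypothetical.

```python
from fractions import Fraction


def m_estimate(n_c, n, m, p):
    """m-estimate of a conditional probability: (n_c + m*p) / (n + m).

    n_c: training records of the class that have the attribute value
    n:   training records of the class
    m:   equivalent sample size; p: prior estimate of the probability
    """
    return (Fraction(n_c) + m * p) / (n + m)


# Assumption: m = 4 and p = 1/2, so m*p = 2 as in the answers above.
m, p = 4, Fraction(1, 2)

print(m_estimate(2, 5, m, p))  # P(A = 0|+): 2 of 5 positive records have A = 0
print(m_estimate(3, 5, m, p))  # P(A = 0|−): 3 of 5 negative records have A = 0
print(m_estimate(1, 5, m, p))  # P(B = 1|+): 1 of 5 positive records has B = 1
```

This prints 4/9, 5/9 and 1/3 (that is, 3/9). `Fraction` keeps the results exact, so they can be compared directly with the hand-computed values.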