Contents
Abstract (Chinese)
ABSTRACT
I. Introduction
(1) Research Background
(2) Research Objectives and Significance
(3) Research Approach and Thesis Structure
II. Literature Review
(1) Related Research in China
(2) Related Research Abroad
(3) Review of Existing Research
III. TF-IDF-Based Feature Extraction
(1) Theoretical Basis of TF-IDF
(2) Data Crawling and Preprocessing
(3) TF-IDF-Based Extraction of Job Features
IV. Decision-Tree-Based Data Mining
(1) The Original C4.5 Algorithm
(2) Job Matching Based on the Decision Tree Algorithm
(3) Generating the Job-Matching Decision Tree
V. Analysis of Experimental Results
(1) Evaluation Metrics
(2) Description of the Tests
(3) Test Results
VI. Conclusion and Outlook
References
Appendix: Translations of Two Foreign-Language Papers
Acknowledgements

The Application of Data Mining Technology in the Job-Seeking Market: Taking the IT and Finance Industries as an Example

Abstract
This thesis examines the current supply and demand situation in the human resources market and the problems it faces. It also explores how data mining technology can be applied in this market and why it is needed, with the aim of helping job seekers match themselves effectively to positions. First, about 10,000 job postings were crawled from Zhipin.com using the Scrapy crawler framework. Data in three categories (Java, front-end, and PHP) were then analyzed along several dimensions, including salary, educational background, and years of work experience. Second, the TF-IDF algorithm is used to mine the skill features of each position, and these features serve as the matching criteria in later steps. Third, a job-matching model is built with the C4.5 decision tree algorithm. Finally, the reliability of the model is verified and its matching rules are analyzed, revealing latent relationships among the dimensions of the job data. On this basis, the thesis offers job seekers reference rules for choosing suitable positions given their own background, together with the core skill features of each position.

Keywords: TF-IDF; C4.5 algorithm; job matching

The Application of Data Mining Technology in the HR Market: Taking the IT and Financial Markets as an Example

ABSTRACT
This paper studies the current state of supply and demand in the human resources market and its existing problems. It also explores the application of data mining technology in this market and the need for it, with the aim of helping job seekers find the positions that best suit them. Firstly, this paper crawls about 10,000 job postings from Zhipin.com with the Scrapy framework and analyzes three categories of data (Java, front-end, and PHP) along dimensions such as salary, educational background, and years of work experience. Secondly, it uses the TF-IDF algorithm to find the skill characteristics of jobs and uses them as the subsequent matching criterion. Thirdly, it constructs a job-matching model based on the C4.5 decision tree algorithm. Finally, it analyzes the reliability and matching rules of the model and identifies potential relationships among the dimensions of the job data. From these results it provides job seekers with reference rules on how to select suitable positions based on their own background, together with the core skills of each position.

Keywords: TF-IDF C4.5 algorit...
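The Python sketch below illustrates the two techniques the abstract names. It computes TF-IDF weights over a few tokenized job postings and the C4.5 gain ratio of one categorical attribute. It makes several assumptions and is not the thesis's implementation. The toy postings, the attribute names (`education`, `years`), the class labels and the unsmoothed idf = log(N/df) variant are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    tf  = term count / document length
    idf = log(N / df), df = number of documents containing the term
    (one common unsmoothed variant; other weightings exist).
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def gain_ratio(rows, attr, labels):
    """C4.5 split criterion for one categorical attribute:
    information gain divided by split information."""
    total = len(rows)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    weights = [len(g) / total for g in groups.values()]
    conditional = sum(w * entropy(g) for w, g in zip(weights, groups.values()))
    gain = entropy(labels) - conditional
    split_info = -sum(w * math.log2(w) for w in weights)
    return gain / split_info if split_info > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical tokenized postings, one per job category.
    postings = [
        ["java", "spring", "mysql", "java"],
        ["javascript", "vue", "css"],
        ["php", "mysql", "linux"],
    ]
    w = tf_idf(postings)
    print(max(w[0], key=w[0].get))  # java: frequent here, absent elsewhere

    # Hypothetical applicant attributes and matched job categories.
    rows = [
        {"education": "bachelor", "years": "1-3"},
        {"education": "bachelor", "years": "3-5"},
        {"education": "master", "years": "1-3"},
        {"education": "master", "years": "3-5"},
    ]
    labels = ["Java", "Java", "PHP", "PHP"]
    for attr in ("education", "years"):
        print(attr, gain_ratio(rows, attr, labels))  # education 1.0, years 0.0
```

A term such as `mysql` appears in several postings, so it gets a lower idf and therefore a lower weight. This is why TF-IDF tends to surface category-specific skills. A full C4.5 tree would apply the gain ratio recursively at each node. It would also handle continuous attributes, missing values and pruning, which this sketch omits.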