三七数据 Big Data Technical Solution

Contents

1 Overview
2 Challenges
  2.1 Data Collection
  2.2 Data Cleaning
  2.3 Data Storage
  2.4 Parallel Data Processing
  2.5 Data Analysis
  2.6 Visualization
  2.7 Analysis of Traditional Solutions
3 Study of Related Technologies
  3.1 Reference Model Framework
  3.2 Data Collection
    3.2.1 Collecting Structured Data
    3.2.2 Collecting Semi-Structured Data
    3.2.3 Extracting Information from Unstructured Text Data
  3.3 Data Cleaning and Data Quality Assurance
    3.3.1 Concepts and Classification of Data Quality
    3.3.2 Principles of Data Cleaning
    3.3.3 Data Cleaning in a Single Data Source
  3.4 Data Integration and Fusion
    3.4.1 Classification of Multi-Source Integration Problems
    3.4.2 Study of Data Standardization
    3.4.3 The Data Integration Process
    3.4.4 Cleaning Duplicate Entities Across Multiple Data Sources
    3.4.5 Study of Data Inconsistency Problems
  3.5 Data Storage and Processing