Design and Implementation of a Search Engine Web Crawler

Abstract

The Internet holds a wealth of resources, but finding information in it efficiently is difficult. Building a search engine is the best way to address this problem. This thesis first describes the system architecture of Internet-based search engines in detail, and then explains how to design and implement the searcher component of a search engine: the web crawler (also called a network spider).

The multithreaded web crawler is a background program that runs automatically. Starting from specified Web pages, it parses and searches them in breadth-first order, fetches and saves every URL it finds, and then uses each of those URLs as a new entry point to keep crawling the Internet.

The crawler relies mainly on socket programming, regular expressions, the HTTP protocol, and Windows network programming. It is implemented in C++ and was debugged under VC6.0. The chapter on the crawler's design and implementation explains the core techniques in detail and illustrates them with the implementation code of the multithreaded crawler, which makes them easier to follow.

The resulting crawler is a network program that can run in the background. It reads its initial URLs from a configuration file, crawls downward in breadth-first order, and saves the target URLs, so it can carry out the Web search tasks of an ordinary user.

Keywords: search engine; web crawler; URL searcher; multithreading

Contents

Abstract
Chapter 1 Introduction
1.1 Project Background
1.2 History and Classification of Search Engines
1.2.1 History of Search Engines
1.2.2 Classification of Search Engines
1.3 Development Trends of Search Engines
1.4 Components of a Search Engine
1.5 Main Research Content of This Project
Chapter 2 Analysis of Key Web Crawler Techniques
2.1 How the Web Crawler (Spider) Works