CLC number: TP39
On-line Access: 2023-07-03
Received: 2022-07-22
Revision Accepted: 2023-01-06
Crosschecked: 2023-07-03
https://orcid.org/0000-0002-0407-1522
https://orcid.org/0000-0003-4940-2812
Jingfa LIU, Zhen WANG, Guo ZHONG, Zhihe YANG. A new focused crawler using an improved tabu search algorithm incorporating ontology and host information[J]. Frontiers of Information Technology & Electronic Engineering, in press. https://doi.org/10.1631/FITEE.2200315
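A new focused crawler using an improved tabu search algorithm incorporating ontology and host information
1School of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou 510006, China
2China Unicom Central South Research Institute, Changsha 410000, China
Abstract: Traditional focused crawlers describe their topics incompletely and repeatedly crawl hyperlinks they have already visited. To address both problems, this paper proposes a new focused crawler using an improved tabu search algorithm incorporating ontology and host information (FCITS_OH). The method builds a domain ontology based on formal concept analysis (FCA) to describe the topic at the semantic and knowledge levels. To avoid re-crawling visited hyperlinks and to widen the search range, it introduces an improved tabu search (ITS) algorithm together with a strategy that memorizes host information. In addition, to improve how the topic relevance of unvisited hyperlinks is evaluated, it proposes a comprehensive priority evaluation method based on Web text and link structure. Experiments on the topics of tourism and rainstorm disasters show that the proposed crawler outperforms other focused crawling strategies from the literature on the different performance metrics considered.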
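The interplay of a tabu list, host memory, and link priority described in the abstract can be illustrated with a minimal sketch. This is not the authors' FCITS_OH implementation: the function name `tabu_crawl`, the `host_penalty` weight, the toy URLs, and the relevance scores are all hypothetical, and the `RELEVANCE` dictionary merely stands in for the comprehensive priority computed from Web text and link structure. The tabu list here has a bounded tenure, as in generic tabu search, which may differ from the paper's ITS.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical toy Web graph: URL -> out-links (no network access needed).
LINKS = {
    "http://a.example/1": ["http://a.example/2", "http://b.example/1"],
    "http://a.example/2": ["http://a.example/1", "http://a.example/3"],
    "http://b.example/1": ["http://a.example/1", "http://c.example/1"],
}
# Hypothetical topic-relevance scores standing in for a real priority measure.
RELEVANCE = {
    "http://a.example/1": 0.9,
    "http://a.example/2": 0.8,
    "http://a.example/3": 0.7,
    "http://b.example/1": 0.75,
    "http://c.example/1": 0.5,
}


def tabu_crawl(seeds, links, relevance, tabu_size=5, host_penalty=0.2, max_pages=10):
    """Best-first crawl with a tabu list and per-host visit memory (illustrative)."""
    tabu = deque(maxlen=tabu_size)  # recently crawled URLs; oldest entry expires first
    host_visits = {}                # host -> number of pages already crawled there
    frontier = set(seeds)
    order = []

    def priority(url):
        # Assumed rule: relevance minus a penalty for hosts visited often.
        host = urlparse(url).netloc
        return relevance.get(url, 0.0) - host_penalty * host_visits.get(host, 0)

    while frontier and len(order) < max_pages:
        # Pick the highest-priority link; ties are broken by URL for determinism.
        best = max(frontier, key=lambda u: (priority(u), u))
        frontier.remove(best)
        order.append(best)
        tabu.append(best)
        host = urlparse(best).netloc
        host_visits[host] = host_visits.get(host, 0) + 1
        # Out-links that are currently tabu are not re-enqueued.
        for nxt in links.get(best, []):
            if nxt not in tabu:
                frontier.add(nxt)
    return order


if __name__ == "__main__":
    for url in tabu_crawl(["http://a.example/1"], LINKS, RELEVANCE):
        print(url)
```

With the host penalty enabled, the sketch leaves host `a.example` after its first page and returns to it later, whereas setting `host_penalty=0` makes it exhaust the most relevant pages of one host first. That contrast is the sense in which remembering host information can widen the search range.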