Research Article: Profile-Based Focused Crawling for Social Media-Sharing Websites

Hindawi Publishing Corporation
EURASIP Journal on Image and Video Processing
Volume 2009, Article ID 856037, 13 pages
doi 2009 856037

Zhiyong Zhang and Olfa Nasraoui
Department of Computer Engineering and Computer Sciences, University of Louisville, Louisville, KY 40292, USA

Correspondence should be addressed to Olfa Nasraoui

Received 31 May 2008; Accepted 6 January 2009

Recommended by Timothy Shih

We present a novel profile-based focused crawling system for the increasingly popular social media-sharing websites. In this system, we treat user profiles as ranking criteria for guiding the crawling process. Furthermore, we divide a user's profile into two parts: an internal part, which comes from the user's own contributions, and an external part, which comes from the user's social contacts. To expand the crawling topic, we adopt a cotagging topic-discovery scheme for social media-sharing websites. To extract data for the focused crawling efficiently and effectively, we first develop a path string-based page classification method for identifying list pages, detail pages, and profile pages. Identifying the correct page type is essential for our crawler: we must distinguish between list, profile, and detail pages in order to extract the correct information from each type of page and, subsequently, to estimate a reasonable ranking for each link encountered while crawling. Our experiments demonstrate the robustness of our profile-based focused crawler, as well as a significant improvement in harvest ratio compared to breadth-first and online page importance computation (OPIC) crawlers, when crawling the Flickr website for two different topics.

Copyright © 2009 Z. Zhang and O. Nasraoui. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and …
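The abstract only outlines how profiles guide the crawl, so the following is a minimal Python sketch under stated assumptions, not the authors' algorithm. It assumes that topic tags are expanded by plain co-occurrence counting (a stand-in for the cotagging scheme, whose exact rules the abstract does not give), that a profile is scored as a weighted mix of the relevance of the user's own tags (the internal part) and the average relevance of the contacts' tags (the external part), and that discovered links are queued by that score. The names `expand_topic`, `tag_relevance`, `profile_score`, and `Frontier`, the weight `alpha`, and the threshold `min_cooccur` are all hypothetical.

```python
import heapq
from collections import Counter
from typing import Iterable, List, Sequence, Set, Tuple


def expand_topic(seed_tags: Iterable[str],
                 item_tags: Iterable[Sequence[str]],
                 min_cooccur: int = 2) -> Set[str]:
    """Grow a topic's tag set with tags that co-occur with any seed tag.

    Hypothetical co-occurrence counting, used here as a stand-in for the
    paper's cotagging topic-discovery scheme.
    """
    seeds = set(seed_tags)
    counts: Counter = Counter()
    for tags in item_tags:
        tagset = set(tags)
        if tagset & seeds:                 # item carries at least one seed tag
            counts.update(tagset - seeds)  # count each co-tag once per item
    return seeds | {t for t, c in counts.items() if c >= min_cooccur}


def tag_relevance(tags: Sequence[str], topic: Set[str]) -> float:
    """Fraction of a tag list that belongs to the topic (hypothetical measure)."""
    if not tags:
        return 0.0
    return sum(1 for t in tags if t in topic) / len(tags)


def profile_score(own_tags: Sequence[str],
                  contacts_tags: List[Sequence[str]],
                  topic: Set[str],
                  alpha: float = 0.7) -> float:
    """Weighted mix of internal and external profile relevance.

    alpha is a hypothetical weight, not a value reported by the authors.
    """
    internal = tag_relevance(own_tags, topic)  # user's own contributions
    external = 0.0                             # social contacts' contributions
    if contacts_tags:
        external = sum(tag_relevance(t, topic) for t in contacts_tags) / len(contacts_tags)
    return alpha * internal + (1 - alpha) * external


class Frontier:
    """Crawl frontier that always pops the highest-scoring queued URL."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []
        self._seen: Set[str] = set()
        self._order = 0  # tie-breaker: equal scores pop in insertion order

    def push(self, url: str, score: float) -> None:
        if url in self._seen:  # never queue the same link twice
            return
        self._seen.add(url)
        heapq.heappush(self._heap, (-score, self._order, url))  # negate for max-heap
        self._order += 1

    def pop(self) -> Tuple[str, float]:
        neg_score, _, url = heapq.heappop(self._heap)
        return url, -neg_score

    def __len__(self) -> int:
        return len(self._heap)
```

In this sketch, `alpha` makes the internal/external split explicit, so a crawler could lean more on a user's own uploads or more on the social neighborhood. The heap stores negated scores because `heapq` is a min-heap, and the insertion counter keeps ties in a stable order without comparing URLs.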
