Your browser does not support JavaScript!

Home    Set cover-based results caching for best match retrieval models  

Results - Details

Add to Basket
[Add to Basket]
Identifier 000356978
http://elocus.lib.uoc.gr
Title Set cover-based results caching for best match retrieval models
Alternative Title Προσωρινή αποθήκευση (catching) αποτελεσμάτων για μοντέλα ανάκτησης βέλτιστου ταιριάσματος βασισμένη στην κάλυψη
Author Παπαδάκης, Μύρων Εμμ.
Thesis advisor Τζίτζικας, Ιωάννης
Abstract Modern Web Search Engines employ various techniques in order to reduce response times, processing load and hardware requirements. Query evaluation in WSEs requires processing (retrieving and ranking) a large amount of data of its underlying index, especially when query terms have long posting lists. One of the most effective techniques for enhancing the performance of a WSE is the use of caching, which can be divided into two main categories: Results Caching (for short RC) and Posting List Caching (for short, PLC). Results caching allows answering queries that have been previously submitted at almost no cost, without accessing the main index of the WSE, since future requests for the same queries are served immediately by the cache. On the other hand, posting lists caching achieves higher cache hit ratios but it does not avoid the query evaluation costs. In this thesis, we propose and experimentally evaluate a novel results caching technique, called SCRC (Set Cover Results Cache), which bridges the gap between results caching and posting lists caching: identical queries are captured as in plain results caching while combinations of cached sub-queries are exploited as in posting lists caching and therefore SCRC can be considered as a generalization of posting lists caching. We reduce the problem of finding the appropriate cached sub-queries, to the exact set cover problem, and we adopt a greedy algorithm for solving it approximately. The correctness of this query evaluation approach is guaranteed if the scoring function of the retrieval model is additively decomposable and we prove that several best-match retrieval models which are traditionally used in Information Retrieval and WSEs (e.g VSM, Okapi BM25 as well as hybrid retrieval models) rely on such scoring functions. To estimate the practical impact of our approach, we analyzed streams of queries submitted to real world WSEs (Excite, AllTheWeb, Altavista) with metrics that we defined and our findings indicate that approximately 35% of these queries can be formulated as a disjoint union of the rest submitted queries. Subsequently, we proposed and evaluated several cache filling strategies for maximizing the speed up obtained by the SCRC. Furthermore and since most users examine only the first pages of results, we proposed a variation of SCRC called Top-K SCRC, which stores only the top-K results of each cached query, and we defined metrics for characterizing the quality of the composed top-K answer. The above techniques were comparatively evaluated over the Mitos WSE and over two versions of its index: one based on an index represented in an Object-Relational DBMS and another based on an Inverted File. The results over these indices showed that in best-match retrieval models the SCRC achieves a speedup which is two times higher than the one obtained by the RC and three times higher than the one obtained by the PLC.
Language English
Issue date 2010-02-01
Collection   School/Department--School of Sciences and Engineering--Department of Computer Science--Post-graduate theses
  Type of Work--Post-graduate theses
Views 495

Digital Documents
No preview available

Download document
View document
Views : 16