By Gerard Salton
Presents a theory of indexing that is able to rank index terms, or subject identifiers, in decreasing order of importance. This leads to the choice of good document representations, and also accounts for the role of phrases and of thesaurus classes within the indexing process.
This study is typical of theoretical work in automatic information organization and retrieval, in that concepts are used from mathematics, computer science, and linguistics. A complete theory of information retrieval may emerge from an appropriate combination of these three disciplines.
Similar probability books
This classic text provides a rigorous introduction to basic probability theory and statistical inference, with a unique balance of theory and methodology. Interesting, relevant applications use real data from actual studies, showing how the concepts and methods can be used to solve problems in the field.
Developed from celebrated Harvard statistics lectures, Introduction to Probability provides essential language and tools for understanding statistics, randomness, and uncertainty. The book explores a wide variety of applications and examples, ranging from coincidences and paradoxes to Google PageRank and Markov chain Monte Carlo (MCMC).
This book begins with a historical essay entitled "Will the Sun Rise Again?" and ends with a general address entitled "Mathematics and Applications". The articles cover a fascinating range of topics: combinatorial probabilities, classical limit theorems, Markov chains and processes, potential theory, Brownian motion, Schrödinger–Feynman problems, and so forth.
- Classic Problems of Probability
- An introduction to probability theory and its applications
- The Statistical Analysis of Recurrent Events
- Grundlagen der Wahrscheinlichkeitsrechnung und Statistik: Ein Skript für Studierende der Informatik, der Ingenieur- und Wirtschaftswissenschaften
- Probability: With Applications and R
Extra resources for A Theory of Indexing
For t terms, this produces (2K' + 1)t additions and (K' + 2)t multiplications. The last term represents the increment over and above the simple frequency counts of expressions (4) and (5). The signal-noise calculations are more expensive to perform than the EK values. Consider first the noise N_k (formula (6)): the requirements are K' additions for F_k, 2K' divisions, K' logarithms, K' multiplications, and K' additions to compute the final sum. In addition, the computation of the signal S_k (formula (7)) adds K' logarithms and 1 subtraction.
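The noise and signal computations can be sketched in a few lines of Python. The excerpt does not reproduce formulas (6) and (7), so this sketch assumes the usual signal-noise definitions: F_k = Σ_i f_ik, N_k = Σ_i (f_ik / F_k) log2(F_k / f_ik), and S_k = log2 F_k - N_k, with sums taken over the documents that contain term k. The function names `noise` and `signal` are hypothetical.

```python
import math
from typing import Sequence


def noise(freqs: Sequence[int]) -> float:
    """Noise N_k of a term, given its frequency f_ik in each document (assumed form of formula (6))."""
    nonzero = [f for f in freqs if f > 0]  # only documents that contain the term
    total = sum(nonzero)  # F_k: total occurrences of the term
    # Each summand needs two divisions, one logarithm, and one multiplication.
    return sum((f / total) * math.log2(total / f) for f in nonzero)


def signal(freqs: Sequence[int]) -> float:
    """Signal S_k = log2(F_k) - N_k (assumed form of formula (7))."""
    total = sum(f for f in freqs if f > 0)
    if total == 0:
        raise ValueError("term does not occur in any document")
    return math.log2(total) - noise(freqs)
```

Under these assumptions, a term concentrated in one document has zero noise and maximal signal, while a term spread evenly over four documents (two occurrences each) gets N_k = 2 and S_k = 1.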
This may be ascertained by consulting column 1 of Table 10, which contains statistical significance test results for certain pairs of weighting methods.
TABLE 10. Statistical significance output for the results of Table 9 (pairwise A vs. B comparisons among term frequency weights, binary weights, binary weights with IDF, and term frequency weights with IDF)
Table 10 contains t-test and Wilcoxon signed rank test values, giving in each case the probability that the output results for the two test runs could have been generated from the same distribution of values.
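Both tests in Table 10 compare two runs on paired per-query results. A minimal stdlib-only sketch follows, under simplifying assumptions: the paired t statistic is returned without a p-value, and the Wilcoxon p-value uses the large-sample normal approximation without tie correction. The excerpt does not say which variants produced Table 10; exact p-values would typically come from a library such as SciPy (`scipy.stats.ttest_rel`, `scipy.stats.wilcoxon`). All function names here are hypothetical.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def paired_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Paired t statistic for per-query scores of run A versus run B."""
    d = [x - y for x, y in zip(a, b)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_signed_rank(
    a: Sequence[float], b: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (W+, W-, two-sided p) via the normal approximation."""
    d = [x - y for x, y in zip(a, b) if x != y]  # zero differences are dropped
    n = len(d)
    if n == 0:
        raise ValueError("all paired differences are zero")
    ranks = average_ranks([abs(x) for x in d])
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    w_minus = sum(r for r, x in zip(ranks, d) if x < 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)  # no tie correction
    z = (w_plus - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w_plus, w_minus, p
```

A small p-value suggests the two runs are unlikely to come from the same distribution; the normal approximation is rough for only a handful of queries.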
The output of Table 12 shows that no unified policy appears to be derivable from the test results. Indeed, for the CRAN collection, the best policy consists of not deleting any terms at all, whereas the best results for MED and Time are obtained by deleting terms with document frequencies B_k ≥ 16 and B_k ≥ 104, respectively, corresponding to the elimination of about ten percent of total term occurrences. Since such a relatively small deletion percentage does not lead to substantial losses in performance for any collection, and may in fact produce considerable improvements, a ten percent deletion rate may be productive in all environments.
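The deletion policy ties a document-frequency cutoff B_k to a share of total term occurrences. A minimal sketch of choosing such a cutoff, assuming per-term statistics are available as (document frequency B_k, total occurrences) pairs, is shown below; `deleted_fraction` and `deletion_threshold` are hypothetical names, not part of the source.

```python
from typing import Dict, Optional, Tuple

# term -> (document frequency B_k, total occurrences of the term in the collection)
TermStats = Dict[str, Tuple[int, int]]


def deleted_fraction(stats: TermStats, threshold: int) -> float:
    """Share of all term occurrences removed by deleting terms with B_k >= threshold."""
    total = sum(occ for _, occ in stats.values())
    removed = sum(occ for df, occ in stats.values() if df >= threshold)
    return removed / total


def deletion_threshold(stats: TermStats, target: float = 0.10) -> Optional[int]:
    """Largest cutoff whose deletions remove at least `target` of all occurrences."""
    # Lowering the cutoff only adds terms to the deletion set, so scan downward.
    for t in sorted({df for df, _ in stats.values()}, reverse=True):
        if deleted_fraction(stats, t) >= target:
            return t
    return None  # reached only when stats is empty or target exceeds 1.0
```

Since the excerpt reports that the best cutoff differs by collection (CRAN does best with no deletion at all), a fixed ten percent target is best treated as a heuristic starting point rather than a rule.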
A Theory of Indexing by Gerard Salton