By Gerard Salton
Provides a theory of indexing able to rank index terms, or subject identifiers, in decreasing order of importance. This leads to the choice of good document representations, and also accounts for the role of phrases and of thesaurus classes in the indexing process.
This study is typical of theoretical work in automatic information organization and retrieval, in that concepts from mathematics, computer science, and linguistics are used. A complete theory of information retrieval may emerge from a suitable combination of these three disciplines.
Consider first the terms in inverse document frequency (1/B or IDF) order, characterized by the frequency distributions of Table 3. The best terms are those with total frequency F_k = B_k = 1. Although these terms occupy the top ranks, they are unlikely to provide optimal retrieval results because their occurrence frequencies are excessively low. Indeed, the virtue of the IDF significance measure for retrieval purposes appears to stem from its use in combination with the standard term frequency values.
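A minimal sketch of the two orderings discussed above, assuming documents are given as lists of terms. F_k is the total number of occurrences of term k in the collection and B_k the number of documents containing it. The IDF formula log(N / B_k) is an assumed variant; any decreasing function of B_k yields the same ordering. The function names are hypothetical.

```python
import math
from collections import Counter


def term_stats(docs):
    """Return (F, B): total frequency F_k and document frequency B_k of each term."""
    F, B = Counter(), Counter()
    for doc in docs:
        counts = Counter(doc)
        F.update(counts)         # add this document's occurrences to F_k
        B.update(counts.keys())  # each term counts once per document for B_k
    return F, B


def idf(B, n_docs):
    """Assumed IDF variant log(N / B_k)."""
    return {t: math.log(n_docs / b) for t, b in B.items()}


def idf_order(B, n_docs):
    """Terms in decreasing IDF order, ties broken alphabetically."""
    weights = idf(B, n_docs)
    return sorted(weights, key=lambda t: (-weights[t], t))


def tf_idf(doc, idf_weights):
    """Combined weight: frequency of each term in `doc` times its IDF."""
    tf = Counter(doc)
    return {t: tf[t] * idf_weights[t] for t in tf}
```

For the toy collection `[["index", "term", "retrieval"], ["index", "term", "query"], ["index", "weight", "weight"]]`, `idf_order` returns `["query", "retrieval", "weight", "term", "index"]`. The terms with F_k = B_k = 1 head the list. "weight" ties with them under pure IDF even though it occurs twice. Only the combined tf x idf weight (2 log 3 in the third document) separates it, which is the kind of combination the paragraph credits for IDF's usefulness.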
It is clear from the results of Table 16 that the information value process does not lead to satisfactory output; in each case, the frequency-based weighting process is considerably superior. A final answer concerning the merits of the information values must await a larger test in a more realistic user environment.

6. A theory of indexing

A. The construction of effective indexing vocabularies

The material presented up to now does not immediately lead to the generation of optimal indexing strategies valid in all environments.
… before some of the remaining terms may be chosen for content identification. The number of common function words included in a standard stop list may range from 50 to about 200, depending on the system in use. Since the significance measures described previously can assign each term a value reflecting its importance for content analysis, one may ask whether savings are possible by reducing the indexing vocabulary to some optimum size. In particular, after the common words on the stop list have been eliminated, the remaining terms might be arranged in decreasing order of their term weights (for example, in decreasing discrimination order), and terms whose value falls below a given threshold might be eliminated.
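The pruning procedure above can be sketched as follows, under several assumptions that are not taken from the text. Documents are represented by raw term-frequency vectors. Space density is measured as the average cosine similarity between each document and the collection centroid. A term's discrimination value is the density with the term removed minus the density with it present, so good discriminators, whose removal makes the space denser, score positive. The stop list and the names `density` and `prune_vocabulary` are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def density(vectors, exclude=None):
    """Average document-to-centroid similarity, optionally ignoring one term."""
    vecs = [{t: w for t, w in v.items() if t != exclude} for v in vectors]
    total = Counter()
    for v in vecs:
        total.update(v)  # sum the term weights over all documents
    centroid = {t: w / len(vecs) for t, w in total.items()}
    return sum(cosine(v, centroid) for v in vecs) / len(vecs)


def prune_vocabulary(docs, stop_list, threshold=0.0):
    """Return (term, discrimination value) pairs at or above `threshold`,
    in decreasing discrimination order."""
    # 1. Eliminate the common function words on the stop list.
    filtered = [[t for t in doc if t not in stop_list] for doc in docs]
    # 2. Raw term-frequency vectors (an assumed weighting).
    vectors = [dict(Counter(doc)) for doc in filtered]
    base = density(vectors)
    vocab = {t for v in vectors for t in v}
    # 3. Discrimination value: density without the term minus density with it.
    dv = {t: density(vectors, exclude=t) - base for t in vocab}
    # 4. Rank in decreasing order and drop terms below the threshold.
    ranked = sorted(vocab, key=lambda t: (-dv[t], t))
    return [(t, dv[t]) for t in ranked if dv[t] >= threshold]
```

In a collection where every document contains "the" and "a" plus one distinct term, `prune_vocabulary(docs, {"the"}, 0.0)` first drops "the" via the stop list. "a" appears uniformly everywhere and gets a negative discrimination value, so the threshold removes it and only the distinct terms survive. Computing density against a centroid rather than over all document pairs keeps each evaluation linear in the collection size, at the cost of being only an approximation of pairwise density.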
A Theory of Indexing by Gerard Salton