A Theory of Indexing

By Gerard Salton

ISBN-10: 0898710154

ISBN-13: 9780898710151

Provides a theory of indexing able to rank index terms, or subject identifiers, in decreasing order of importance. This leads to the choice of good document representations, and also accounts for the role of phrases and of thesaurus classes in the indexing process.

This study is typical of theoretical work in automatic information organization and retrieval, in that concepts are drawn from mathematics, computer science, and linguistics. A complete theory of information retrieval may emerge from a proper combination of these three disciplines.

Best probability books

Counterexamples in Probability (3rd Edition)

Publication year note: first published January 1st 1988
------------------------

While most mathematical examples illustrate the truth of a statement, counterexamples demonstrate a statement's falsity. Enjoyable topics of study in their own right, counterexamples are valuable tools for teaching and learning. The definitive book on the subject as it relates to probability, this third edition features the author's revisions and corrections plus a substantial new appendix.

Theory of Probability and Random Processes

A one-year course in probability theory and the theory of random processes, taught at Princeton University to undergraduate and graduate students, forms the core of this book. It provides a comprehensive and self-contained exposition of classical probability theory and the theory of random processes.

Additional resources for A Theory of Indexing

Sample text

Consider first the terms in inverse document frequency (IDF) order, characterized by the frequency distributions of Table 3. The best terms in this ordering are those with total frequency $F_k = B_k = 1$, that is, terms that occur exactly once in the entire collection. Although these terms receive the top (lowest-numbered) ranks, they are unlikely to provide optimal retrieval results because their occurrence frequencies are excessively low. Indeed, the value of the IDF significance measure for retrieval purposes appears to stem from its use in combination with the standard term frequency values.
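The IDF ordering and the combined weighting described above can be sketched in Python. This is a minimal illustration under stated assumptions: the three-document collection is hypothetical, and IDF is taken as log(N / B_k), one common variant that is not necessarily the exact formula used in the book. The sketch shows that singleton terms ($F_k = B_k = 1$) head the IDF order, while multiplying within-document term frequency by IDF rewards terms that are frequent in a document yet rare in the collection.

```python
import math
from collections import Counter


def document_frequencies(docs):
    """B_k: number of documents in which term k occurs at least once."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    return df


def total_frequencies(docs):
    """F_k: total number of occurrences of term k across the collection."""
    freq = Counter()
    for doc in docs:
        freq.update(doc)
    return freq


def inverse_document_frequencies(docs):
    """IDF_k = log(N / B_k); an assumed, commonly used variant."""
    n = len(docs)
    return {term: math.log(n / b) for term, b in document_frequencies(docs).items()}


def tf_idf(doc, idf_values):
    """Combined weight: within-document term frequency times IDF."""
    return {term: f * idf_values[term] for term, f in Counter(doc).items()}


# Hypothetical toy collection, each document already reduced to index terms.
docs = [
    ["index", "term", "retrieval", "term"],
    ["index", "vocabulary", "retrieval"],
    ["index", "thesaurus", "term"],
]
idf_values = inverse_document_frequencies(docs)

# IDF order: decreasing IDF, ties broken alphabetically for a stable listing.
idf_order = sorted(idf_values, key=lambda t: (-idf_values[t], t))
# → ['thesaurus', 'vocabulary', 'retrieval', 'term', 'index']

B, F = document_frequencies(docs), total_frequencies(docs)
singletons = sorted(t for t in B if B[t] == F[t] == 1)
# → ['thesaurus', 'vocabulary']

weights = tf_idf(docs[0], idf_values)
# "term" occurs twice in docs[0], so it outweighs "retrieval";
# "index" occurs in every document, so its IDF (and weight) is 0.0.
```

The singletons top the IDF list, yet each can match at most one document; the tf × idf weights instead favor "term" in the first document, which illustrates why the combined measure is the useful one.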

It is clear from the results of Table 16 that the information value process does not lead to satisfactory output; in each case, the frequency-based weighting process is considerably superior. A final answer concerning the merits of the information values must await a larger test in a more realistic user environment.

6. A theory of indexing

A. The construction of effective indexing vocabularies

The material presented up to now does not immediately lead to the generation of optimal indexing strategies valid in all environments.

… before some of the remaining terms may be chosen for content identification. The number of common function words included in a standard stop list may range from about 50 to about 200, depending on the system in use. Since the significance measures described previously can be used to assign each term a value reflecting its importance for content analysis, one may ask whether savings are possible by reducing the indexing vocabulary to some optimum size. In particular, after the common words on the stop list have been eliminated, the remaining terms might be arranged in decreasing order of their term weights (for example, in decreasing discrimination order), and terms whose value falls below a given threshold might be eliminated.
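The vocabulary-reduction procedure just described (stop-list removal, ranking by discrimination value, thresholding) can be sketched as follows. Several details are assumptions rather than the book's exact method: the tiny stop list and the sample documents are hypothetical, and the discrimination value is computed in one common formulation, DV_k = Q_k - Q, where Q is the space density (average cosine similarity between each document vector and the collection centroid) and Q_k is the same density with term k removed. A positive DV_k means the term spreads documents apart and is kept.

```python
import math

# Hypothetical, deliberately tiny stop list; real lists hold roughly 50 to 200 words.
STOP_LIST = {"a", "and", "for", "in", "of", "the", "to"}


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def space_density(vectors):
    """Average cosine similarity between each document vector and the centroid."""
    n = len(vectors)
    centroid = [sum(column) / n for column in zip(*vectors)]
    return sum(cosine(v, centroid) for v in vectors) / n


def discrimination_values(docs, vocab):
    """DV_k = Q_k - Q: density without term k minus density with all terms."""
    vectors = [[doc.count(term) for term in vocab] for doc in docs]
    base = space_density(vectors)
    values = {}
    for i, term in enumerate(vocab):
        reduced = [v[:i] + v[i + 1:] for v in vectors]  # drop column i (term k)
        values[term] = space_density(reduced) - base
    return values


def reduce_vocabulary(docs, threshold=0.0):
    """Remove stop words, rank terms by decreasing DV, keep those above threshold."""
    content_docs = [[t for t in doc if t not in STOP_LIST] for doc in docs]
    vocab = sorted({t for doc in content_docs for t in doc})
    dv = discrimination_values(content_docs, vocab)
    ranked = sorted(vocab, key=lambda t: dv[t], reverse=True)
    return [t for t in ranked if dv[t] > threshold], dv


docs = [
    "the index of the collection".split(),
    "the index and the thesaurus".split(),
    "a index for retrieval".split(),
]
kept, dv = reduce_vocabulary(docs)
# "index" occurs in every document, so removing it lowers the density:
# its DV is negative and it falls below the threshold.
```

In this toy run "collection", "retrieval", and "thesaurus" survive with small positive values, while "index" is dropped. The threshold of 0.0 is an illustrative choice; a real system would tune it to reach the desired vocabulary size.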
