TF*IDF and Random Walk for Term Candidate Selection on Automatic Subject Indexing

Nurseno Bayu Aji, Musta'inul Abdi, Ardon Rakhmadi


Subject indexing is the act of describing or classifying a document by index terms or other symbols in order to indicate what the document is about, to summarize its content, or to increase its findability. Term candidate selection is an important step in automatic subject indexing because it influences the result of topic extraction from a document. Current automatic subject indexing methods consider only terms that occur in the document collection when selecting term candidates. In contrast, human indexers performing manual subject indexing prefer to choose more general terms as candidates. In this paper, we propose a new strategy for selecting term candidates in automatic subject indexing in order to extract the main topic of a document. The proposed method combines Term Frequency Inverse Document Frequency (TF*IDF) with a random walk on the structure of a thesaurus. Experimental results show that the proposed method can select term candidates that are relevant to the topic of the document, with an F-measure of 0.24.
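The combination described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the weighting formula or the walk parameters, so the code below assumes a length-normalized TF with a natural-log IDF, and it substitutes a random walk with restart (computed by power iteration) for whatever walk the paper actually uses. The function names, the toy corpus, and the toy thesaurus are all hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed weighting: (count / doc length) * ln(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in Counter(doc).items()}
        for doc in docs
    ]


def random_walk(graph, seed, restart=0.15, iterations=50):
    """Random walk with restart over a thesaurus graph.

    graph: {term: [neighbor terms]}; every neighbor must also be a key.
    seed:  {term: non-negative weight}, e.g. TF*IDF weights of one document.
    """
    total = sum(seed.get(t, 0.0) for t in graph)
    if total == 0:
        raise ValueError("no seed term occurs in the thesaurus")
    start = {t: seed.get(t, 0.0) / total for t in graph}
    score = dict(start)
    for _ in range(iterations):
        nxt = {t: restart * start[t] for t in graph}
        for t, neighbors in graph.items():
            walk_mass = (1 - restart) * score[t]
            if neighbors:
                # Spread the remaining mass evenly over thesaurus relations.
                for n in neighbors:
                    nxt[n] += walk_mass / len(neighbors)
            else:
                # Dangling term: send its mass back to the seed distribution.
                for u in graph:
                    nxt[u] += walk_mass * start[u]
        score = nxt
    return score


# Hypothetical toy data: three tokenized documents and a tiny thesaurus.
docs = [
    ["rice", "rice", "irrigation", "soil"],
    ["wheat", "soil", "fertilizer"],
    ["rice", "harvest", "yield"],
]
thesaurus = {
    "cereals": ["rice", "wheat"],
    "rice": ["cereals"],
    "wheat": ["cereals"],
    "irrigation": ["water management"],
    "water management": ["irrigation"],
}

weights = tf_idf(docs)
scores = random_walk(thesaurus, weights[0])
candidates = sorted(scores, key=scores.get, reverse=True)
```

In this toy run the first document never mentions "cereals", yet the walk gives that term a positive score because it is linked to "rice" in the thesaurus. This mirrors the motivation stated in the abstract: a more general term, of the kind a human indexer might prefer, can become a candidate even though it does not occur in the document.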


Keywords: TF*IDF, Random Walk, Thesaurus, Automatic Subject Indexing, Term Candidate








ISSN: 2541-6340
Online ISSN: 2541-6359



This work is licensed under a Creative Commons Attribution 4.0 International License.