TF*IDF and Random Walk for Term Candidate Selection on Automatic Subject Indexing
DOI:
https://doi.org/10.32497/jaict.v6i1.2436
Keywords:
TF*IDF, Random Walk, Thesaurus, Automatic Subject Indexing, Term Candidate
Abstract
Subject indexing is the act of describing or classifying a document by index terms or other symbols in order to indicate what the document is about, to summarize its content, or to increase its findability. The selection of term candidates in automatic subject indexing is important because it influences the result of topic extraction from a document. Recent automatic subject indexing methods, particularly in the term candidate selection step, consider only terms that occur in the document collection. In contrast, human indexers performing manual subject indexing tend to choose more general terms as candidates. In this paper, we propose a new strategy for selecting term candidates in automatic subject indexing in order to extract the main topic of a document. The proposed method combines Term Frequency-Inverse Document Frequency (TF*IDF) with a random walk on the structure of a thesaurus. Experimental results show that the proposed method can select term candidates that are relevant to the topic of the document, with an F-measure of 0.24.
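To make the idea of combining TF*IDF with a random walk on a thesaurus more concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify how the two scores are combined, so this sketch assumes one plausible reading: TF*IDF scores of terms found in the document seed a random walk with restart over thesaurus links, so that related or more general thesaurus terms that never occur in the document can still receive a score. The function names (`tf_idf`, `random_walk`), the `restart` parameter, the toy corpus, and the toy thesaurus are all hypothetical.

```python
import math
from collections import Counter


def tf_idf(doc_tokens, corpus):
    """Score each term of one document by relative term frequency
    times log inverse document frequency over the corpus."""
    counts = Counter(doc_tokens)
    n_docs = len(corpus)
    scores = {}
    for term, count in counts.items():
        df = sum(1 for d in corpus if term in d)
        # max(df, 1) guards against a document that is not in the corpus
        scores[term] = (count / len(doc_tokens)) * math.log(n_docs / max(df, 1))
    return scores


def random_walk(graph, seed_scores, restart=0.15, iterations=50):
    """Random walk with restart over an adjacency-list graph.
    Assumption: every neighbour also appears as a key of `graph`.
    The restart distribution is the normalised seed scores."""
    nodes = list(graph)
    total = sum(seed_scores.get(n, 0.0) for n in nodes)
    if total <= 0:
        raise ValueError("no seed term with a positive score is in the graph")
    restart_dist = {n: seed_scores.get(n, 0.0) / total for n in nodes}
    rank = dict(restart_dist)
    for _ in range(iterations):
        nxt = {n: restart * restart_dist[n] for n in nodes}
        for n in nodes:
            neighbours = graph[n]
            if not neighbours:
                # A node without links hands its mass back to the seeds
                for m in nodes:
                    nxt[m] += (1 - restart) * rank[n] * restart_dist[m]
                continue
            share = (1 - restart) * rank[n] / len(neighbours)
            for m in neighbours:
                nxt[m] += share
        rank = nxt
    return rank


# Hypothetical toy data: three tokenised documents and a tiny thesaurus
# whose links are stored in both directions.
corpus = [
    ["indexing", "thesaurus", "indexing"],
    ["thesaurus", "retrieval"],
    ["ranking", "retrieval"],
]
thesaurus = {
    "indexing": ["information retrieval"],
    "thesaurus": ["controlled vocabulary"],
    "controlled vocabulary": ["thesaurus", "information retrieval"],
    "information retrieval": ["indexing", "controlled vocabulary"],
}

scores = tf_idf(corpus[0], corpus)
ranks = random_walk(thesaurus, scores)
for term, value in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{term}: {value:.3f}")
```

In this toy run, "information retrieval" and "controlled vocabulary" do not occur in the document, yet they receive non-zero scores because they are linked to seeded terms. This mirrors the abstract's motivation that human indexers prefer more general terms, although the actual weighting scheme used in the paper may differ.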