Using Ontology-Based Approach to Improved Information Retrieval Semantically for Historical Domain

Fatihah Ramli, Shahrul Azman Mohd Noah, Tri Basuki Kurniawan

Abstract


Searching and retrieving documents from large historical archives is challenging for the information retrieval (IR) field because historians typically rely on their own knowledge, experience, and intuition. Several works have applied IR to historical documents. However, the conventional IR model mostly uses a simple bag-of-words (BOW) approach and is usually unable to support precise document retrieval in the history domain. We propose an ontology-based approach to semantically index and rank rich historical documents. Historical documents relating to the Vietnam War were chosen for this study. Several existing ontologies were reviewed to identify the most suitable concepts and properties, which contain rich information about relevant entities such as events, time, and people. The domain ontology was developed from the existing Simple News and Press (SNaP) ontology and extended with concepts related to the Vietnam War. The ontology was then semantically mapped to concepts found in a collection of 133 documents relating to the Vietnam War. In this paper, we also propose a simple ontology-based weighting mechanism derived from the classic tf-idf scoring scheme. Finally, 20 SPARQL queries were implemented for the evaluation. The evaluation shows that the proposed ontology-based approach achieved better precision and recall than the baseline BM25 probabilistic retrieval model. The ontology-based approach to document retrieval can therefore compete with the keyword-based approach.
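As a rough illustration of the ideas named above, the following sketch computes classic tf-idf weights over ontology concept annotations rather than raw words, and computes precision and recall for a retrieved set. It is not the paper's weighting mechanism, which the abstract does not specify. The function names, the concept labels, and the three-document collection are hypothetical and chosen only for the example.

```python
import math
from collections import Counter


def tf_idf_weights(docs):
    """Classic tf-idf computed over concept labels instead of words.

    docs: dict mapping a document id to a list of concept labels
          (hypothetical annotations, not data from the paper).
    Returns a dict mapping each document id to {concept: weight}.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each concept appears.
    df = Counter()
    for concepts in docs.values():
        df.update(set(concepts))

    weights = {}
    for doc_id, concepts in docs.items():
        tf = Counter(concepts)  # raw term frequency per concept
        weights[doc_id] = {
            concept: count * math.log(n_docs / df[concept])
            for concept, count in tf.items()
        }
    return weights


def precision_recall(retrieved, relevant):
    """Set-based precision and recall for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical concept annotations for three documents.
docs = {
    "d1": ["TetOffensive", "Saigon", "TetOffensive"],
    "d2": ["Saigon", "ParisPeaceAccords"],
    "d3": ["Saigon"],
}
weights = tf_idf_weights(docs)
```

In this toy collection, a concept that occurs in every document (here "Saigon") receives a weight of zero, while a concept confined to one document is weighted by its frequency. This is the usual behaviour of the unsmoothed tf-idf scheme.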


Keywords


ontology; information retrieval; semantic search; historical documents; bag-of-words.



DOI: http://dx.doi.org/10.18517/ijaseit.10.3.10180




Published by INSIGHT - Indonesian Society for Knowledge and Human Development