Building Compact Entity Embeddings Using Wikidata

Mohamed Lubani, Shahrul Azman Mohd Noah


Representing natural language sentences has always been a challenge in statistical language modelling. Atomic discrete representations of words make it difficult to represent semantically related sentences. Other sentence components, such as phrases and named entities, should be recognized and represented as units rather than as individual words. Different senses of an entity should be assigned different representations even when they share identical words. In this paper, we focus on building the vector representations (embeddings) of named entities from their contexts to facilitate the task of ontology population, where named entities need to be recognized and disambiguated in natural language text. Given a list of target named entities, Wikidata is used to compensate for the lack of a labelled corpus by supplying the contexts of all target named entities and all of their senses. Description text and semantic relations with other named entities are considered when building the contexts from Wikidata. To avoid noisy and uninformative features in embeddings generated from artificially built contexts, we propose a method for building compact entity representations that sharpens entity embeddings by removing irrelevant features and emphasizing the most descriptive ones. An extended version of the Continuous Bag-of-Words (CBOW) model is used to build joint vector representations of words and named entities from the Wikidata contexts. Each entity context is then represented by a subset of elements chosen to maximize the chances of keeping the most descriptive features of the target entity. The final entity representations are built by compressing the embeddings of the chosen subset using a deep stacked autoencoder model. Cosine similarity and the t-SNE visualisation technique are used to evaluate the final entity vectors. Results show that semantically related entities are clustered near each other in the vector space. Entities that appear in similar contexts are assigned similar compact vector representations.
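The cosine-similarity evaluation mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the entity names, the three-dimensional vectors, and the helper `nearest_entities` are all hypothetical and were invented for this illustration. It only shows how related entities end up ranked closest once each entity has a vector.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_u * norm_v)


def nearest_entities(target, vectors, k=2):
    """Rank all other entities by cosine similarity to `target` (hypothetical helper)."""
    scores = [
        (name, cosine_similarity(vectors[target], vec))
        for name, vec in vectors.items()
        if name != target
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Toy vectors invented for illustration; real compact embeddings would come
# from the trained model and have many more dimensions.
entity_vectors = {
    "Paris (city)": [0.9, 0.1, 0.0],
    "London (city)": [0.8, 0.2, 0.1],
    "Paris (mythology)": [0.1, 0.9, 0.3],
}

for name, score in nearest_entities("Paris (city)", entity_vectors):
    print(f"{name}: {score:.3f}")
```

In this toy setting the two city senses rank closest to each other, while the mythological sense of "Paris" is far away despite sharing the same surface word, which is the behaviour the abstract describes for sense-specific entity vectors.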


Entity Embeddings; Entity Vector Representations; Named Entity Disambiguation.








Published by INSIGHT - Indonesian Society for Knowledge and Human Development