LIM Laboratory, Computer Science Department, Mohamed Cherif Messaadia University, Souk Ahras, Algeria

E-mail Address: s.amrouche@univ-soukahras.dz

Corresponding author.

Ryma Guefrouchi

https://orcid.org/0000-0002-1756-1256

MISC Laboratory, Abdelhamid Mehri Constantine2 University, Constantine, Algeria

E-mail Address: ryma.guefrouchi@univ-constantine2.dz

Nawel Zemmal

https://orcid.org/0000-0003-2827-4946

Computer Science Department, Mohamed Cherif Messaadia University, Souk Ahras, Algeria

LabGED Laboratory, Computer Science Department, Badji Mokhtar University, Annaba, Algeria

E-mail Address: n.zemmal@univ-soukahras.dz

Sadok Ben Yahia

https://orcid.org/0000-0001-8939-8948

The Maersk Mc-Kinney Moller Institute, Centre for Industrial Software (CIS), University of Southern Denmark, Denmark

E-mail Address: say@mmmi.sdu.dk

https://doi.org/10.1142/S0218194025500111

Abstract

The advancement of the semantic web and the Linked Open Data (LOD) cloud has led to the creation and integration of various knowledge bases defined by ontologies. A significant challenge within the LOD paradigm is identifying resources that refer to the same real-world object, which is a prerequisite for large-scale data integration and sharing. In this context, instance matching has emerged as a key solution: it links co-referent instances from heterogeneous data sources using owl:sameAs links. Traditional approaches focus on schema-level matching but often fail to address property-level heterogeneity. Moreover, given the large number of instances, examining all possible instance pairs is impractical. This paper proposes a scalable and efficient instance-matching approach using MongoDB (Humongous database) and Lucene. MongoDB stores instances at any scale, and Lucene uses inverted indexes to identify matching candidates. Experiments on the instance matching track of the Ontology Alignment Evaluation Initiative (OAEI’2022) show that our approach matches the F-measure score of RE-Miner, the top performer in OAEI’2020, while surpassing all other participants in OAEI’2020, 2021 and 2022. Additionally, it runs 17 times faster than RE-Miner, four times faster than Lily and 15 times faster than LogMap, the fastest systems in OAEI’2020, 2021 and 2022, respectively. We also evaluate our approach on other knowledge bases from OAEI’2010, where it again achieves highly competitive results compared to state-of-the-art approaches.
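The candidate-selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a plain Python dictionary stands in for the Lucene inverted index, in-memory dictionaries stand in for the MongoDB collections, and the function names, the `min_shared` threshold and the toy instances are all hypothetical. The sketch shows how an inverted index restricts comparison to pairs that share tokens, and how an F-measure could then be computed against a reference alignment.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase a literal value and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_inverted_index(instances):
    """Map each token to the set of instance IDs whose literals contain it."""
    index = defaultdict(set)
    for instance_id, literals in instances.items():
        for literal in literals:
            for token in tokenize(literal):
                index[token].add(instance_id)
    return index


def candidate_pairs(source, target_index, min_shared=2):
    """Return (source_id, target_id) pairs sharing at least min_shared tokens.

    min_shared is an illustrative threshold, not a value from the paper.
    """
    pairs = set()
    for source_id, literals in source.items():
        tokens = set().union(*(tokenize(literal) for literal in literals))
        shared = defaultdict(int)
        for token in tokens:
            # Only instances listed under this token are ever touched.
            for target_id in target_index.get(token, ()):
                shared[target_id] += 1
        pairs.update((source_id, t) for t, n in shared.items() if n >= min_shared)
    return pairs


def f_measure(found, reference):
    """Harmonic mean of precision and recall over two sets of links."""
    true_positives = len(found & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(found)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


# Toy instances (hypothetical): each ID maps to its literal property values.
source = {
    "s1": ["Ada Lovelace", "London"],
    "s2": ["Alan Turing", "Wilmslow"],
}
target = {
    "t1": ["Lovelace, Ada", "born in London"],
    "t2": ["Turing, Alan"],
    "t3": ["Grace Hopper", "New York"],
}

target_index = build_inverted_index(target)
candidates = candidate_pairs(source, target_index)
reference = {("s1", "t1"), ("s2", "t2")}
print(sorted(candidates), f_measure(candidates, reference))
```

On this toy data only two of the six possible source-target pairs survive as candidates, which is the effect the abstract attributes to index-based candidate identification. A real system would follow this step with a finer similarity comparison on the surviving pairs.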
