Digitalization has made digital documents commonplace, and classifying document images has become as important a need as classifying conventional text documents. Document images, in which text documents are stored as images, contain both textual and visual features, unlike ordinary images. Both feature types can therefore be used when classifying such data. This study aims to classify document images using textual and visual features and to determine which feature type yields better classification performance. In the text-based approach, each document class is labeled with the keywords associated with it, and classification depends on whether a document contains the related keywords. For visual classification, we use four deep learning models: CNN, NASNet-Large, InceptionV3, and EfficientNetB3. The experimental study is carried out on document images obtained from applicants to Kocaeli University. The results show that EfficientNetB3 performs best among all approaches, with an F-score of 0.8987.
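
The text-based approach described above can be illustrated with a minimal sketch. It assumes the document text has already been extracted (for example by OCR) and that a document is assigned to the class whose keywords it matches most often. The class names, the keywords, and the tie-breaking rule are hypothetical and are not taken from the study.

```python
from typing import Dict, List, Optional

# Hypothetical class-to-keyword mapping; the study's actual classes and
# keywords are not given in the abstract.
CLASS_KEYWORDS: Dict[str, List[str]] = {
    "transcript": ["transcript", "grade", "credit"],
    "diploma": ["diploma", "degree", "graduated"],
    "id_card": ["identity", "date of birth"],
}


def classify_by_keywords(
    text: str, class_keywords: Dict[str, List[str]] = CLASS_KEYWORDS
) -> Optional[str]:
    """Return the class whose keywords occur most often in the text.

    Assumption: matching is a case-insensitive substring check, and a
    document that matches no keyword at all is left unclassified (None).
    Ties are resolved in favor of the class listed first.
    """
    lowered = text.lower()
    best_label: Optional[str] = None
    best_hits = 0
    for label, keywords in class_keywords.items():
        # Count how many of this class's keywords appear in the document.
        hits = sum(1 for keyword in keywords if keyword in lowered)
        if hits > best_hits:
            best_label, best_hits = label, hits
    return best_label


if __name__ == "__main__":
    print(classify_by_keywords("Official Transcript: grade and credit summary"))
```

Substring matching is the simplest possible rule; a real pipeline would likely need tokenization and tolerance for OCR errors, which this sketch does not attempt.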

