Metrics for Domain Shift Characterization: Comparisons and New Directions
Domain adaptation is an important area of research because it aims to mitigate the effects of domain shift, i.e., differences between the distribution of the source domain used for training and that of the target domain where prediction takes place. However, methods for characterizing the domain shift across datasets are lacking. In this work, we propose SpOT, a domain shift metric based on spherical optimal transport that operates on the spherical manifold. We realize our approach with a spherical network, which produces the features, and an orthogonal projection loss, which imposes orthogonality in the feature space. The resulting spherical features exhibit better inter-class separation and lower intra-class variation than features in Euclidean space. This kind of feature clustering makes each domain representation more compact and better suited to further analysis. The domain shift between datasets is then computed as the optimal transport distance between the spherical features, which has a sound theoretical basis. Our results are further supported by experiments showing that SpOT correlates with a new gain-of-transfer measure across domain adaptation datasets.
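To make this pipeline concrete, the following is a minimal sketch of how such a metric could be computed, assuming L2 normalization onto the unit hypersphere, arc-length (geodesic) costs, and entropic optimal transport via Sinkhorn iterations between uniform empirical measures. The function names, regularization value, and toy data are illustrative assumptions, not the authors' implementation.

import numpy as np

def to_sphere(X):
    # L2-normalize rows so each feature vector lies on the unit sphere.
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def geodesic_cost(Xs, Xt):
    # Pairwise great-circle (arc-length) distances between sphere points.
    cos = np.clip(to_sphere(Xs) @ to_sphere(Xt).T, -1.0, 1.0)
    return np.arccos(cos)

def sinkhorn_ot(C, reg=0.1, n_iter=200):
    # Entropic-regularized OT cost (Sinkhorn, as in Cuturi 2013) between
    # uniform empirical measures over the source and target samples.
    n, m = C.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    K = np.exp(-C / reg)
    u = np.ones(n)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]   # transport plan
    return float(np.sum(P * C))       # OT cost = domain-shift score

# Toy usage: score between two random "domain" feature clouds.
rng = np.random.default_rng(0)
Xs = rng.normal(size=(100, 64))             # source-domain features
Xt = rng.normal(loc=0.5, size=(100, 64))    # shifted target features
print(sinkhorn_ot(geodesic_cost(Xs, Xt)))

In this toy example the score grows as the target cloud drifts away from the source; in the method described above, the inputs would be features from the trained spherical network (shaped by the orthogonal projection loss) rather than random draws.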