Special Issue: Best Papers from SEKE 2020
Guest Editor: Shi-Kuo Chang

Plagiarism Detection of Multi-threaded Programs Using Frequent Behavioral Pattern Mining

    https://doi.org/10.1142/S0218194020400252

    Software dynamic birthmark techniques construct birthmarks from execution traces captured while running programs, and are among the most promising methods for obfuscation-resilient software plagiarism detection. However, non-deterministic thread scheduling perturbs the execution traces of multi-threaded programs, so dynamic approaches optimized for sequential programs may suffer from this randomness when applied to multi-threaded program plagiarism detection. In this paper, we propose FPBirth, a new dynamic thread-aware birthmark, to facilitate multi-threaded program plagiarism detection. We first use dynamic monitoring to capture multiple system call execution traces for each multi-threaded program under a specified input, and then apply the Apriori algorithm to mine frequent patterns that form our dynamic birthmark. The resulting birthmark not only depicts the program's behavioral semantics but also resists the changes and perturbations in execution traces caused by thread scheduling. Using FPBirth, we design a multi-threaded program plagiarism detection system. Experimental results on a public software plagiarism sample set demonstrate that the system integrating FPBirth copes better with multi-threaded plagiarism detection than alternative approaches. Compared with the dynamic System Call Short Sequence Birthmark (SCSSB), FPBirth achieves performance improvements of 12.4%, 4.1% and 7.9% with respect to the union of resilience and credibility (URC), F-Measure and Matthews correlation coefficient (MCC) metrics, respectively.
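
    The idea of mining frequent patterns from several traces of the same program can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that a pattern is a contiguous run of system call names, that support is the fraction of traces containing the pattern, and that two birthmarks are compared with Jaccard similarity. The names `mine_frequent_patterns`, `jaccard`, `min_support` and `max_len` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set, Tuple

Pattern = Tuple[str, ...]


def mine_frequent_patterns(
    traces: List[List[str]], min_support: float, max_len: int = 4
) -> Dict[Pattern, float]:
    """Apriori-style, level-wise mining of contiguous system call patterns.

    Assumption: support = fraction of traces that contain the pattern.
    Returns a mapping from each frequent pattern to its support.
    """
    if not traces:
        return {}
    n = len(traces)
    frequent: Dict[Pattern, float] = {}
    prev_level: Set[Pattern] = set()
    for k in range(1, max_len + 1):
        counts: Dict[Pattern, int] = {}
        for trace in traces:
            seen: Set[Pattern] = set()  # count each pattern once per trace
            for i in range(len(trace) - k + 1):
                gram = tuple(trace[i:i + k])
                # Apriori pruning: a length-k candidate is kept only if both
                # of its length-(k-1) contiguous sub-patterns were frequent.
                if k > 1 and (
                    gram[:-1] not in prev_level or gram[1:] not in prev_level
                ):
                    continue
                seen.add(gram)
            for gram in seen:
                counts[gram] = counts.get(gram, 0) + 1
        level = {p for p, c in counts.items() if c / n >= min_support}
        if not level:
            break  # no frequent pattern of length k, so none longer either
        for p in level:
            frequent[p] = counts[p] / n
        prev_level = level
    return frequent


def jaccard(a: Set[Pattern], b: Set[Pattern]) -> float:
    """Jaccard similarity of two pattern sets (assumed comparison measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Three hypothetical traces of one program under the same input; the thread
# scheduler reorders some calls between runs.
traces = [
    ["open", "read", "write", "close"],
    ["open", "read", "close", "write"],
    ["read", "open", "read", "write"],
]
birthmark = mine_frequent_patterns(traces, min_support=1.0)
```

    Patterns that survive a high support threshold are those that appear regardless of how the threads were interleaved, which is the intuition behind using frequent patterns to absorb scheduling noise. The threshold and maximum pattern length here are arbitrary choices for the example.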
