Plagiarism Detection of Multi-threaded Programs Using Frequent Behavioral Pattern Mining
Abstract
Dynamic software birthmark techniques construct birthmarks from execution traces captured while the programs run, and they are among the most promising methods for obfuscation-resilient software plagiarism detection. However, non-deterministic thread scheduling perturbs the execution traces of multi-threaded programs, so dynamic approaches optimized for sequential programs can be undermined by this randomness when applied to multi-threaded program plagiarism detection. In this paper, we propose FPBirth, a new dynamic thread-aware birthmark for multi-threaded program plagiarism detection. We first use dynamic monitoring to capture multiple system call execution traces of each multi-threaded program under a specified input, and then apply the Apriori algorithm to mine frequent patterns from these traces to form our dynamic birthmark. The resulting birthmark not only captures the program's behavioral semantics but also resists the changes and perturbations in execution traces caused by thread scheduling. Building on FPBirth, we design a multi-threaded program plagiarism detection system. Experimental results on a public software plagiarism sample set show that the system integrating FPBirth handles multi-threaded plagiarism detection better than alternative approaches. Compared with the dynamic birthmark System Call Short Sequence Birthmark (SCSSB), FPBirth improves the union of resilience and credibility (URC), F-measure, and Matthews correlation coefficient (MCC) metrics by 12.4%, 4.1%, and 7.9%, respectively.
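To make the mining step more concrete, the following is a minimal Python sketch of turning several system-call traces of one program into a frequent-pattern birthmark with Apriori and then comparing two birthmarks. It is not the authors' FPBirth implementation. The following are all illustrative assumptions, not details taken from the paper: treating each trace as one transaction whose items are system-call 2-grams, the 0.6 minimum support, the hypothetical function names (`kgrams`, `apriori`, `birthmark`, `similarity`), and Jaccard similarity as the comparison metric.

```python
from itertools import combinations


def kgrams(trace, k=2):
    """Assumed item encoding: the set of contiguous length-k system-call runs in one trace."""
    return {tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)}


def apriori(transactions, min_support):
    """Classic Apriori: return every itemset (as a frozenset) that is contained
    in at least min_support * len(transactions) transactions."""
    threshold = min_support * len(transactions)
    # Level 1: count single items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s for s, c in counts.items() if c >= threshold}
    frequent = set(current)
    size = 1
    while current:
        size += 1
        # Join step: merge pairs of frequent (size-1)-itemsets into size-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == size}
        # Prune step: every (size-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, size - 1))}
        # Support counting: scan the transactions for each surviving candidate.
        current = {c for c in candidates
                   if sum(1 for t in transactions if c <= t) >= threshold}
        frequent |= current
    return frequent


def birthmark(traces, k=2, min_support=0.6):
    """Hypothetical birthmark: frequent itemsets of k-grams across repeated runs."""
    return apriori([kgrams(t, k) for t in traces], min_support)


def similarity(bm_a, bm_b):
    """Assumed metric: Jaccard similarity between two sets of frequent patterns."""
    if not bm_a and not bm_b:
        return 1.0
    return len(bm_a & bm_b) / len(bm_a | bm_b)


if __name__ == "__main__":
    # Three runs of the same program under one input, differing only in interleaving.
    runs = [
        ["open", "read", "clone", "futex", "write", "close"],
        ["open", "clone", "read", "futex", "write", "close"],
        ["open", "read", "clone", "write", "futex", "close"],
    ]
    bm = birthmark(runs)
    print(len(bm), similarity(bm, bm))
```

Because support is counted across repeated runs, a 2-gram that appears under only one thread interleaving (such as `("clone", "futex")` above) falls below the threshold, while patterns that are stable across schedules survive. This mirrors the abstract's intuition that frequent patterns resist scheduling perturbations.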