Developing and maintaining reliable object-oriented software requires a precise understanding of how individual classes must be used. Unfortunately, for many systems, especially large ones, the available documentation is inadequate. Developers are left with incomplete information about which call sequences each class permits. Techniques for reverse engineering this information and presenting it to developers in an intellectually scalable manner are therefore critical.
In this paper, we present four contributions to address this challenge. First, we describe a runtime trace collection system for large C++ applications. Second, we present a methodology for reverse engineering interface protocols from the collected trace data. Third, we present a scalable, tunable algorithm for generating compact specifications of these protocols. Finally, we present a detailed case study of the Mozilla Necko library, examining popular, widely used applications built on it. The results are promising, both in the performance of the approach and in the utility of the identified protocols.