Randomized benchmarking (RB) is a popular procedure used to gauge the performance of a set of gates useful for quantum information processing (QIP). Recently, Proctor et al. [Phys. Rev. Lett. 119 (2017) 130502] demonstrated a practically relevant example where the RB measurements give a number r that differs markedly from the actual average gate-set infidelity ϵ, despite past theoretical assurances that the two should be equal. Here, we derive formulas for ϵ, and for r from the RB protocol, in a manner permitting easy comparison of the two. We show in general that, indeed, r≠ϵ, i.e., RB does not measure average infidelity, and, in fact, neither one bounds the other. We give several examples, all plausible in real experiments, to illustrate the differences between ϵ and r. Many recent papers on experimental implementations of QIP have claimed the ability to perform high-fidelity gates because they demonstrated small r values using RB. Our analysis shows that such claims, when based on RB alone, must be interpreted with caution.
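
For orientation, the following minimal sketch shows how an RB number r is conventionally extracted from survival-probability data. It assumes the standard single-exponential decay model P(m) = A p^m + B and the usual conversion r = (d-1)(1-p)/d, neither of which is spelled out in the text above. The function names, the synthetic data, and the assumption that the offset B is known in advance are illustrative choices, not the authors' method. The sketch only produces r; it says nothing about how r relates to ϵ.

```python
import numpy as np

def rb_number(p, d=2):
    """Convert an RB decay parameter p into r = (d - 1)(1 - p) / d.

    d is the Hilbert-space dimension (d = 2 for a single qubit).
    """
    return (d - 1) * (1 - p) / d

def fit_decay(lengths, survival, offset):
    """Estimate p from P(m) = A * p**m + B, assuming the offset B is known.

    Subtracting B and taking the logarithm makes the model linear in m,
    so the slope of a straight-line fit equals log(p).
    """
    slope, _intercept = np.polyfit(lengths, np.log(survival - offset), 1)
    return float(np.exp(slope))

# Hypothetical noise-free data: sequence lengths and survival probabilities.
lengths = np.arange(1, 101, 10)
true_p, A, B = 0.99, 0.5, 0.5
survival = A * true_p**lengths + B

p_hat = fit_decay(lengths, survival, B)
r_hat = rb_number(p_hat)
print(f"fitted p = {p_hat:.6f}, RB number r = {r_hat:.6f}")
```

With real data the offset B is generally not known and the points are noisy, so a nonlinear fit of all three parameters would be the more realistic choice; the log-linear fit is used here only to keep the example dependency-free.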