A number of different measures have been proposed for evaluating computational models of human syntactic category acquisition. They all rely on a gold-standard set of manually determined categories. However, children's syntactic categories change during language development, so evaluating against a fixed, final set of adult categories is not appropriate. In this paper, we propose a new measure, substitutable precision and recall, based on the idea that words that occur in similar syntactic environments share the same category. We use this measure to evaluate three standard category acquisition models (hierarchical clustering, frequent frames, and a Bayesian HMM) and show that the results correlate well with those obtained using two gold-standard-based measures.