Evidence suggests that sparse coding offers an efficient and effective way to distill structural information about the environment. Our simple recurrent network demonstrates that the same holds for learning musical structure. We present two experiments that examine the learning trajectory of a simple recurrent network exposed to musical input. Both experiments compare the network's internal representations to behavioral data: listeners rate the network's own novel musical output taken from different points along the learning trajectory. The first study focused on learning the tonal relationships inherent in five simple melodies. The network's developmental trajectory was studied by examining the sparseness of its hidden-layer activations and the sophistication of its compositions. The second study used more complex musical input and focused on both tonal and rhythmic relationships in music. We found that increasing sparseness of the hidden-layer activations correlated strongly with the increasing sophistication of the network's output. Interestingly, sparseness was not programmed into the network; the property simply arose from learning the musical input. We argue that sparseness underlies the network's success: it is the mechanism through which musical characteristics are learned and distilled, and it facilitates the network's ability to produce increasingly complex and stylistic novel compositions over time.
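The abstract does not specify implementation details, but the two quantities it pairs, a simple recurrent network's hidden-layer activations and their sparseness, can be sketched in code. The following is a minimal illustration, not the authors' method: an Elman-style SRN forward pass plus Hoyer's sparseness measure (which ranges from 0 for a uniformly dense vector to 1 for a one-hot vector). All class and variable names, the 12-dimensional pitch-class encoding, and the choice of Hoyer's metric are our assumptions.

```python
import numpy as np

def hoyer_sparseness(x):
    """Hoyer sparseness: 0 for a uniform vector, 1 for a one-hot vector."""
    x = np.abs(np.asarray(x, dtype=float))
    n = x.size
    l1, l2 = x.sum(), np.sqrt((x ** 2).sum())
    if l2 == 0.0:
        return 0.0
    return (np.sqrt(n) - l1 / l2) / (np.sqrt(n) - 1)

class ElmanSRN:
    """Minimal Elman-style simple recurrent network (forward pass only)."""
    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W_xh = rng.normal(0, 0.1, (n_hidden, n_in))
        self.W_hh = rng.normal(0, 0.1, (n_hidden, n_hidden))
        self.W_hy = rng.normal(0, 0.1, (n_out, n_hidden))
        self.h = np.zeros(n_hidden)  # context layer: copy of previous hidden state

    def step(self, x):
        # Hidden state combines the current input with the context layer
        self.h = np.tanh(self.W_xh @ x + self.W_hh @ self.h)
        return self.W_hy @ self.h, self.h

# Hypothetical example: feed one-hot pitch classes (C E G E C) through the
# network and track mean hidden-layer sparseness across the melody.
net = ElmanSRN(n_in=12, n_hidden=20, n_out=12)
melody = np.eye(12)[[0, 4, 7, 4, 0]]
sparseness = [hoyer_sparseness(net.step(x)[1]) for x in melody]
print(float(np.mean(sparseness)))
```

In the studies described above, such a per-timestep sparseness measure would be averaged over the training set and tracked across training epochs, alongside listener ratings of the network's compositions.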