The Million Song Dataset is a freely available collection of audio features and metadata for a million contemporary popular music tracks. Its purposes are to encourage research on algorithms that scale to commercial sizes; to provide a reference dataset for evaluating research; to offer a shortcut alternative to creating a large dataset with APIs (e.g. The Echo Nest’s); and to help new researchers get started in the MIR field. Full info: http://labrosa.ee…
More here:
Processing Million Songs Dataset with Pig scripts on Apache Hadoop on Windows Azure