AI Trains on Millions of Unlicensed Music Tracks
21 Jun
Summary
- Four large datasets of music used for AI training were found.
- Some AI developers use download tools that may violate platform terms of service.
- Artists from pop stars to experimental composers are included.

Four music datasets used to train AI models were recently uncovered and made searchable, revealing the scale of the audio data involved. Two of the datasets are exceptionally large, containing 12 million and 9 million tracks, while the other two each exceed 100,000 songs. The collections have been downloaded thousands of times, and companies including Google and Stability AI have confirmed using them in research.
Several of the music sources are available for personal streaming but require a license for commercial use, a distinction that AI training often ignores. A significant concern is that three of the datasets consist of links to songs on platforms like YouTube and Spotify rather than the audio itself. Developers use automated tools to download the actual audio, which potentially violates the platforms' terms of service and bypasses the mechanisms that generate revenue for the platforms and pay creators.
The included artists span a wide spectrum, from global pop figures like Lady Gaga and Fred Again.. to rock bands like Radiohead, electronic artists such as Aphex Twin, and hip-hop legends Wu-Tang Clan. Experimental composer Hainbach also appears in the data. The public can now search these datasets on The Atlantic's AI Watchdog site to see which media is being used to train AI models.