AI research nonprofit EleutherAI releases the Common Pile v0.1, an 8TB dataset of licensed and open-domain text for AI models that it says is one of the largest (Kyle Wiggers/TechCrunch)

Kyle Wiggers / TechCrunch: EleutherAI, an AI research organization, has released what it claims is one of the largest collections of licensed and open-domain text for training AI models.

Jun 7, 2025 - 17:10