Harvard releases Institutional Books 1.0, a dataset for AI researchers with 242B tokens, from 394M scanned pages and 983K public domain books in 254 languages (Matt O'Brien/Associated Press)

Matt O'Brien / Associated Press: Everything ever said on the internet was just the start of teaching artificial intelligence about humanity.

Jun 14, 2025 - 05:30