I Built an Open-Source Framework to Make LLM Data Extraction Dead Simple

After getting tired of writing endless boilerplate to extract structured data from documents with LLMs, I built ContextGem - a free, open-source framework that makes this radically easier.
What makes it different?
✅ Automated dynamic prompts and data modeling
✅ Precise reference mapping to source content
✅ Built-in justifications for extractions
✅ Nested context extraction
✅ Works with any LLM provider
Plus more built-in abstractions that save developer time; justifications and references are sketched in the second snippet below.
Simple LLM extraction in just a few lines:
from contextgem import Aspect, Document, DocumentLLM

# Define what to extract
doc = Document(raw_text="Your document text here...")
doc.aspects = [
    Aspect(
        name="Intellectual property",
        description="Clauses on intellectual property rights",
    )
]

# Extract with any LLM (model string format: "<provider>/<model_name>")
llm = DocumentLLM(model="<llm_provider>/<model_name>", api_key="<your_api_key>")
doc = llm.extract_all(doc)

# Get results
print(doc.aspects[0].extracted_items)
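
Aspects can also return justifications and map each extracted item back to its source sentences. Continuing the example above, here is a minimal sketch; the keyword and attribute names are indicative, so check the README for the exact API:

from contextgem import Aspect

# Sketch only: keyword and attribute names are indicative, see the README for the exact API
doc.aspects = [
    Aspect(
        name="Intellectual property",
        description="Clauses on intellectual property rights",
        add_justifications=True,      # have the LLM explain each extraction
        reference_depth="sentences",  # map extracted items back to source sentences
    )
]
doc = llm.extract_all(doc)

# Each extracted item now carries its justification and source references
item = doc.aspects[0].extracted_items[0]
print(item.justification)
print(item.reference_sentences)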
ContextGem also features a native DOCX converter, support for multiple LLMs, and full serialization - all under the permissive Apache 2.0 license.
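
For DOCX input and persistence, the flow looks roughly like this (class and method names are indicative and the file path is just a placeholder; see the README for the exact API):

from contextgem import Document, DocxConverter

# Sketch only: names are indicative, see the README for the exact API
converter = DocxConverter()
doc = converter.convert("contract.docx")  # hypothetical path; DOCX -> Document

doc = llm.extract_all(doc)  # extract as in the examples above

# Serialize the processed document (including extracted items) and restore it later
saved = doc.to_json()
restored = Document.from_json(saved)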
View project on GitHub: https://github.com/shcherbak-ai/contextgem
Try it out and let me know your thoughts!