Open Scripture Intelligence
AI-ready Bible dataset and knowledge graph. 66 books, 31,102 verses, cross-reference graph, semantic embeddings. Open source.
Why This Exists
Hundreds of Bible apps exist. Nearly all of them treat Scripture as a text database rather than as a knowledge system.
Most apps give users verses. This project lets apps understand how Scripture connects.
There is no open-source dataset that combines a normalized Scripture schema, Markdown source, passage chunking, a cross-reference graph, theological metadata, and semantic embeddings. Open Scripture Intelligence fills that gap.
What This Is
A structured, multi-layer dataset built from public-domain Bible translations.
| Layer | Format | Purpose |
|---|---|---|
| Source | Markdown | Human-readable, version-controlled Scripture text |
| Canonical | JSONL | Normalized verse/chapter/book records |
| Chunks | JSONL | Verse, passage, and chapter chunks for retrieval |
| Graph | JSONL | Cross-reference edges and relationship types |
| Metadata | JSON | Topics, entities, people, places, themes |
| Embeddings | JSONL | Semantic vectors for AI search and reasoning |
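
As a rough illustration of how the Canonical and Chunks layers relate, the sketch below parses verse records from JSONL text and groups consecutive verses into one passage chunk. The field names follow the Verse Record and Passage Chunk examples under Schema Examples. The helper names `parse_jsonl` and `build_passage_chunk`, the inline sample data, and the passage id scheme are assumptions made for illustration; this is not the project's actual `build_chunks.py`.

```python
import json

# Two sample verse records shaped like the Verse Record example.
# The "text" values are shortened placeholders, not full quotations.
SAMPLE_JSONL = """\
{"id": "web-john-3-16", "translation": "WEB", "book": "John", "chapter": 3, "verse": 16, "reference": "John 3:16", "text": "For God so loved the world..."}
{"id": "web-john-3-17", "translation": "WEB", "book": "John", "chapter": 3, "verse": 17, "reference": "John 3:17", "text": "(text of John 3:17)"}
"""


def parse_jsonl(text):
    """Parse JSONL text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def build_passage_chunk(verses, label):
    """Combine consecutive verse records from one chapter into a passage chunk."""
    if not verses:
        raise ValueError("need at least one verse")
    first, last = verses[0], verses[-1]
    # Assumed id scheme: the first verse id plus the last verse number.
    chunk_id = f"{first['id']}-{last['verse']}"
    return {
        "id": chunk_id,
        "translation": first["translation"],
        "start_reference": first["reference"],
        "end_reference": last["reference"],
        "label": label,
        "verse_ids": [v["id"] for v in verses],
        "text": " ".join(v["text"] for v in verses),
    }


verses = parse_jsonl(SAMPLE_JSONL)
chunk = build_passage_chunk(verses, label="God's love and salvation")
print(chunk["id"], chunk["verse_ids"])
```

Storing `verse_ids` on the chunk keeps every passage traceable back to canonical verse records, so a retrieval hit on a chunk can still be cited verse by verse.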
Repository Structure
open-scripture-intelligence/
  source/
    raw-markdown/              # Bible text in Markdown (one chapter per file)
  canonical/
    books.json                 # Book metadata (66 books)
    verses.jsonl               # Every verse as a normalized record
    chapters.jsonl             # Chapter-level records
  chunks/
    by_verse/                  # Single-verse chunks
    by_passage/                # Multi-verse passage chunks
    by_chapter/                # Full chapter chunks
  graph/
    nodes.jsonl                # Scripture graph nodes
    edges.jsonl                # Cross-reference and relationship edges
  metadata/
    topics.json                # Theological topic taxonomy
    entities.json              # People, places, concepts
  embeddings/
    verse_embeddings.jsonl
    passage_embeddings.jsonl
  scripts/
    parse_markdown.py          # Ingest Markdown -> canonical JSONL
    build_chunks.py            # Generate chunk layers
    build_graph.py             # Build cross-reference graph
  exports/
    obsidian/                  # Obsidian vault export
    app/                       # App-ready export
    training/                  # ML training export
Schema Examples
Verse Record
{
  "id": "web-john-3-16",
  "translation": "WEB",
  "book": "John",
  "chapter": 3,
  "verse": 16,
  "reference": "John 3:16",
  "text": "For God so loved the world, that he gave his only begotten Son...",
  "testament": "NT",
  "book_number": 43
}
Passage Chunk
{
  "id": "web-john-3-16-21",
  "translation": "WEB",
  "start_reference": "John 3:16",
  "end_reference": "John 3:21",
  "label": "God's love and salvation",
  "verse_ids": ["web-john-3-16", "web-john-3-17", ...],
  "text": "For God so loved the world..."
}
Graph Edge
{
  "from": "web-isaiah-53-5",
  "to": "web-1peter-2-24",
  "type": "prophecy_fulfillment",
  "label": "suffering and healing",
  "source": "openbible_crossrefs"
}
Use Cases
- Semantic Bible Search: Find passages by concept, not just keywords
- Related Passage Discovery: Surface thematically connected verses
- Prophecy Mapping: Trace OT prophecy to NT fulfillment
- Sermon Preparation: Explore themes with AI-assisted context
- Theological Research: Map concepts across Scripture
- Bible Study Apps: Power "Explain This Passage" features
- Translation Studies: Compare translations semantically
- AI Assistants: Ground Scripture chatbots in structured data
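
Several of these use cases come down to walking the edge list. As a sketch of Prophecy Mapping, the code below indexes edges by their `from` node and follows only those typed `prophecy_fulfillment`. The first edge is the Graph Edge example from Schema Examples. The second edge, its `thematic` type, and the helper names `index_by_source` and `fulfillments` are hypothetical, included only to show the type filter at work.

```python
from collections import defaultdict

EDGES = [
    # Taken from the Graph Edge example.
    {"from": "web-isaiah-53-5", "to": "web-1peter-2-24",
     "type": "prophecy_fulfillment", "label": "suffering and healing",
     "source": "openbible_crossrefs"},
    # Hypothetical edge with a made-up type, present only to exercise the filter.
    {"from": "web-isaiah-53-5", "to": "web-romans-4-25",
     "type": "thematic", "label": "illustrative only", "source": "example"},
]


def index_by_source(edges):
    """Group edges by their 'from' node id for quick lookup."""
    index = defaultdict(list)
    for edge in edges:
        index[edge["from"]].append(edge)
    return index


def fulfillments(index, verse_id):
    """Return target ids of prophecy_fulfillment edges leaving verse_id."""
    return [
        edge["to"]
        for edge in index.get(verse_id, [])
        if edge["type"] == "prophecy_fulfillment"
    ]


index = index_by_source(EDGES)
print(fulfillments(index, "web-isaiah-53-5"))
```

The same index could serve Related Passage Discovery by dropping the type filter, assuming other relationship types are stored in the same `type` field.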
Contributing
This project is built for the community. Contributions welcome:
- Scripture text normalization
- Cross-reference data
- Topic and entity tagging
- Embedding generation
- Export format adapters
- Documentation and schema improvements
Dataset structure and tooling: MIT License. Scripture text: public-domain translations (WEB, KJV, ASV).
Explore the dataset
Open Scripture Intelligence is open source. Clone it, build on it, contribute to it.