
Open Scripture Intelligence

AI-ready Bible dataset and knowledge graph. 66 books, 31,102 verses, cross-reference graph, semantic embeddings. Open source.

Why This Exists

Hundreds of Bible apps exist. Most treat Scripture as a text database; few treat it as a knowledge system.

Most apps give users verses. This project lets apps understand how Scripture connects.

We know of no open-source dataset that combines a normalized Scripture schema, Markdown source, passage chunking, a cross-reference graph, theological metadata, and semantic embeddings. Open Scripture Intelligence fills that gap.

What This Is

A structured, multi-layer dataset built from public-domain Bible translations.

Layer       Format    Purpose
Source      Markdown  Human-readable, version-controlled Scripture text
Canonical   JSONL     Normalized verse/chapter/book records
Chunks      JSONL     Verse, passage, and chapter chunks for retrieval
Graph       JSONL     Cross-reference edges and relationship types
Metadata    JSON      Topics, entities, people, places, themes
Embeddings  JSONL     Semantic vectors for AI search and reasoning

Repository Structure

open-scripture-intelligence/
  source/
    raw-markdown/          # Bible text in Markdown (one chapter per file)
  canonical/
    books.json             # Book metadata (66 books)
    verses.jsonl           # Every verse as a normalized record
    chapters.jsonl         # Chapter-level records
  chunks/
    by_verse/              # Single-verse chunks
    by_passage/            # Multi-verse passage chunks
    by_chapter/            # Full chapter chunks
  graph/
    nodes.jsonl            # Scripture graph nodes
    edges.jsonl            # Cross-reference and relationship edges
  metadata/
    topics.json            # Theological topic taxonomy
    entities.json          # People, places, concepts
  embeddings/
    verse_embeddings.jsonl
    passage_embeddings.jsonl
  scripts/
    parse_markdown.py      # Ingest Markdown -> canonical JSONL
    build_chunks.py        # Generate chunk layers
    build_graph.py         # Build cross-reference graph
  exports/
    obsidian/              # Obsidian vault export
    app/                   # App-ready export
    training/              # ML training export
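
Most layers in this tree are JSONL files, one JSON object per line. As a minimal sketch of how a consumer might read one, the helper below streams records from any text stream. The name `iter_jsonl` and the sample rows are illustrative assumptions, not part of the repository's tooling; in practice the stream would be an opened file such as canonical/verses.jsonl.

```python
import io
import json
from typing import Iterator, TextIO


def iter_jsonl(stream: TextIO) -> Iterator[dict]:
    """Yield one dict per non-empty line of a JSON Lines stream."""
    for line in stream:
        line = line.strip()
        if line:  # tolerate blank lines between records
            yield json.loads(line)


# In-memory stand-in for canonical/verses.jsonl (sample rows are illustrative).
sample = io.StringIO(
    '{"id": "web-john-3-16", "book": "John", "chapter": 3, "verse": 16}\n'
    "\n"
    '{"id": "web-john-3-17", "book": "John", "chapter": 3, "verse": 17}\n'
)

records = list(iter_jsonl(sample))

# Index by verse ID so other layers (chunks, graph edges) can look verses up.
verses_by_id = {record["id"]: record for record in records}
```

Streaming line by line keeps memory use flat, which matters once the full set of 31,102 verses is loaded alongside embeddings.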

Schema Examples

Verse Record

{
  "id": "web-john-3-16",
  "translation": "WEB",
  "book": "John",
  "chapter": 3,
  "verse": 16,
  "reference": "John 3:16",
  "text": "For God so loved the world, that he gave his only begotten Son...",
  "testament": "NT",
  "book_number": 43
}
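
The IDs in these examples appear to follow a translation-book-chapter-verse pattern ("web-john-3-16", "web-1peter-2-24"). The sketch below shows one way such an ID could be derived from a record's fields. The function name `verse_id` and the slug rule (lowercase, strip everything but letters and digits) are assumptions inferred from the examples, not a documented specification.

```python
import re


def verse_id(translation: str, book: str, chapter: int, verse: int) -> str:
    """Build an ID like 'web-john-3-16' from a verse's fields.

    Assumed slug rule: lowercase the book name and drop every character
    that is not a letter or digit, so '1 Peter' becomes '1peter'.
    """
    book_slug = re.sub(r"[^a-z0-9]", "", book.lower())
    return f"{translation.lower()}-{book_slug}-{chapter}-{verse}"


def reference(book: str, chapter: int, verse: int) -> str:
    """Build the human-readable reference, e.g. 'John 3:16'."""
    return f"{book} {chapter}:{verse}"


john_id = verse_id("WEB", "John", 3, 16)
peter_id = verse_id("WEB", "1 Peter", 2, 24)
john_ref = reference("John", 3, 16)
```

Deriving the ID from the other fields means a validator can recompute it and flag records whose `id` and `reference` disagree.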

Passage Chunk

{
  "id": "web-john-3-16-21",
  "translation": "WEB",
  "start_reference": "John 3:16",
  "end_reference": "John 3:21",
  "label": "God's love and salvation",
  "verse_ids": ["web-john-3-16", "web-john-3-17", ...],
  "text": "For God so loved the world..."
}
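
A passage chunk can be assembled from consecutive verse records. The following is a sketch only: `build_passage_chunk` is a hypothetical helper, it assumes all verses come from one chapter, and the verse text is replaced by placeholders rather than real Scripture text. It is not the logic of scripts/build_chunks.py.

```python
def build_passage_chunk(verses: list[dict], label: str) -> dict:
    """Combine consecutive verse records from one chapter into a passage chunk."""
    if not verses:
        raise ValueError("a passage needs at least one verse")
    first, last = verses[0], verses[-1]
    return {
        # Assumed ID rule: first verse ID plus the final verse number.
        "id": f"{first['id']}-{last['verse']}",
        "translation": first["translation"],
        "start_reference": first["reference"],
        "end_reference": last["reference"],
        "label": label,
        "verse_ids": [v["id"] for v in verses],
        "text": " ".join(v["text"] for v in verses),
    }


# Placeholder verse records for John 3:16-21 (text is a stand-in, not Scripture).
verses = [
    {
        "id": f"web-john-3-{n}",
        "translation": "WEB",
        "reference": f"John 3:{n}",
        "verse": n,
        "text": f"<text of John 3:{n}>",
    }
    for n in range(16, 22)
]

chunk = build_passage_chunk(verses, "God's love and salvation")
```

Keeping `verse_ids` on the chunk lets a retrieval hit on the passage be traced back to the individual canonical records.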

Graph Edge

{
  "from": "web-isaiah-53-5",
  "to": "web-1peter-2-24",
  "type": "prophecy_fulfillment",
  "label": "suffering and healing",
  "source": "openbible_crossrefs"
}
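
Edges like this one can be indexed by their `from` field to answer "what does this verse point to?". The sketch below builds such an index in memory. The helper names are hypothetical, and the second sample edge (including its `cross_reference` type) is invented purely for illustration; only the first edge comes from the example above.

```python
from collections import defaultdict
from typing import Optional

edges = [
    # Edge taken from the schema example above.
    {
        "from": "web-isaiah-53-5",
        "to": "web-1peter-2-24",
        "type": "prophecy_fulfillment",
    },
    # Hypothetical edge and type, included only to exercise the filter.
    {
        "from": "web-isaiah-53-5",
        "to": "web-example-1-1",
        "type": "cross_reference",
    },
]


def build_index(edge_list: list[dict]) -> dict[str, list[dict]]:
    """Group edges by their source verse ID."""
    index: dict[str, list[dict]] = defaultdict(list)
    for edge in edge_list:
        index[edge["from"]].append(edge)
    return index


def related(index: dict[str, list[dict]], verse: str,
            edge_type: Optional[str] = None) -> list[str]:
    """Return target verse IDs for a verse, optionally filtered by edge type."""
    return [
        edge["to"]
        for edge in index.get(verse, [])
        if edge_type is None or edge["type"] == edge_type
    ]


index = build_index(edges)
fulfillments = related(index, "web-isaiah-53-5", "prophecy_fulfillment")
all_targets = related(index, "web-isaiah-53-5")
```

Filtering on `type` is what would let an app separate prophecy links from general cross-references when presenting results.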

Use Cases

Semantic Bible Search

Find passages by concept, not just keywords

Related Passage Discovery

Surface thematically connected verses

Prophecy Mapping

Trace OT prophecy to NT fulfillment

Sermon Preparation

Explore themes with AI-assisted context

Theological Research

Map concepts across Scripture

Bible Study Apps

Power "Explain This Passage" features

Translation Studies

Compare translations semantically

AI Assistants

Ground Scripture chatbots in structured data
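
For the semantic search use case, a common approach is to rank stored vectors by cosine similarity to a query vector. The sketch below assumes each embedding record has `id` and `vector` fields; that record shape, the function names, and the three-dimensional toy vectors are all assumptions for illustration, since the embedding schema is not shown here. Real embeddings would come from a model and have many more dimensions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], records: list[dict], k: int = 3) -> list[tuple[float, str]]:
    """Return the k records most similar to the query as (score, id) pairs."""
    scored = [(cosine(query, r["vector"]), r["id"]) for r in records]
    scored.sort(reverse=True)  # highest similarity first
    return scored[:k]


# Toy vectors; the 'vector' field name and values are hypothetical.
embedding_records = [
    {"id": "web-john-3-16", "vector": [0.9, 0.1, 0.0]},
    {"id": "web-1john-4-8", "vector": [0.8, 0.2, 0.1]},
    {"id": "web-genesis-1-1", "vector": [0.0, 0.1, 0.9]},
]

query_vector = [1.0, 0.0, 0.0]
results = top_k(query_vector, embedding_records, k=2)
result_ids = [verse for _, verse in results]
```

A linear scan like this is fine for a corpus of this size; a vector index would only be worth adding if query latency became a problem.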

Contributing

This project is built for the community. Contributions are welcome in the following areas:

  • Scripture text normalization
  • Cross-reference data
  • Topic and entity tagging
  • Embedding generation
  • Export format adapters
  • Documentation and schema improvements

Dataset structure and tooling: MIT License. Scripture text: public-domain translations (WEB, KJV, ASV).

Explore the dataset

Open Scripture Intelligence is open source. Clone it, build on it, contribute to it.
