An Open Source implementation of Notebook LM with more flexibility and features
Luis Novo d8006ff5cb
feat: content-type aware chunking and unified embedding (#444)
* feat: content-type aware chunking and unified embedding

- Add chunking.py with HTML, Markdown, and plain text detection
- Add embedding.py with mean pooling for large content
- Create dedicated commands: embed_note, embed_insight, embed_source
- Use fire-and-forget pattern for embedding via submit_command()
- Refactor rebuild_embeddings_command to delegate to individual commands
- Remove legacy commands and needs_embedding() methods
- Reduce chunk size to 1500 chars for Ollama compatibility
- Update CLAUDE.md documentation for new architecture

Fixes #350, #142
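
The chunk-then-pool idea in the bullets above can be sketched roughly as follows. This is not the project's chunking.py or embedding.py: the function names, the naive fixed-size splitter, and the toy embedder are all hypothetical, and only the 1500-character limit comes from the commit message.

```python
from typing import Callable, List, Sequence

CHUNK_SIZE = 1500  # character limit named in the commit message


def split_into_chunks(text: str, size: int = CHUNK_SIZE) -> List[str]:
    """Naive fixed-size splitter (hypothetical; the real chunker is content-type aware)."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def mean_pool(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average equal-length vectors element-wise into a single vector."""
    if not vectors:
        raise ValueError("no vectors to pool")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("vectors differ in dimension")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def embed_large_text(text: str, embed: Callable[[str], List[float]]) -> List[float]:
    """Embed each chunk separately, then mean-pool into one vector for the whole text."""
    chunks = split_into_chunks(text)
    return mean_pool([embed(chunk) for chunk in chunks])


def toy_embed(chunk: str) -> List[float]:
    """Stand-in embedder for illustration only: [length, count of 'a']."""
    return [float(len(chunk)), float(chunk.count("a"))]
```

In a real setup, `toy_embed` would be replaced by a call to an embedding model, and the pooled vector would be stored alongside the note or source.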

* fix: address code review issues

- Note.save() now returns command_id for tracking embedding jobs
- Add length check after generate_embeddings() to fail fast on mismatch
- Add numpy as explicit dependency (was transitive)
- Remove hardcoded chunk sizes from docstrings
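
A fail-fast length check of the kind described above might look like the sketch below. The helper name and error message are hypothetical and are not taken from the project's code; the only assumption carried over from the commit message is that one embedding is expected per input chunk.

```python
from typing import Sequence


def ensure_one_embedding_per_chunk(
    chunks: Sequence[str], embeddings: Sequence[Sequence[float]]
) -> None:
    """Raise immediately if the embedder returned a different number of vectors than chunks."""
    if len(embeddings) != len(chunks):
        raise ValueError(
            f"expected {len(chunks)} embeddings, got {len(embeddings)}"
        )
```

Checking right after the embedding call keeps a mismatch from silently pairing chunks with the wrong vectors further down the pipeline.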

* docs: address code review comments

- Rename "SYNC PATH" to "DOMAIN MODEL PATH" in embedding router
- Add test_chunking.py and test_embedding.py to Testing Strategy
- Clarify auto-embedding behavior for each domain model

* fix: clean thinking tags from prompt graph output

Adds clean_thinking_content() to prompt.py to handle extended thinking
models that return <think>...</think> tags. This fixes empty titles
when saving notes from chat.
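
As a minimal sketch of this kind of cleanup, assuming the model wraps its reasoning in literal <think>...</think> tags, a regular expression can drop those blocks before the text is used as a title. The function below is a hypothetical stand-in and not the project's clean_thinking_content().

```python
import re

# Non-greedy match so multiple separate <think> blocks are removed one by one;
# DOTALL lets a block span several lines.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL | re.IGNORECASE)


def strip_thinking(text: str) -> str:
    """Remove <think>...</think> blocks and trim surrounding whitespace."""
    return _THINK_BLOCK.sub("", text).strip()
```

For example, `strip_thinking("<think>plan</think>My Title")` leaves only the visible answer, so a note title derived from it is no longer empty or polluted by reasoning text.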

* chore: remove local docker-compose from git

* fix(frontend): handle null parent_id in search results

Add defensive check for null parent_id in search results to prevent
"Cannot read properties of null (reading 'split')" error. This can
happen with orphaned records in the database.

* fix: cascade delete embeddings and insights when source is deleted

When deleting a Source, now also deletes associated:
- source_embedding records
- source_insight records

This prevents orphaned records that cause null parent_id errors
in vector search results.

* fix: add cleanup for orphan embedding/insight records in migration 10

Deletes source_embedding and source_insight records where the
linked source no longer exists (source.id = NONE).

* chore: bump esperanto to 2.16

Increases ctx_num for Ollama models to accommodate larger notebook
context windows. See: https://github.com/lfnovo/esperanto/pull/69
2026-01-21 23:49:08 -03:00
.github Feat/localization tests docker (#371) 2026-01-15 13:51:05 -03:00
api feat: content-type aware chunking and unified embedding (#444) 2026-01-21 23:49:08 -03:00
commands feat: content-type aware chunking and unified embedding (#444) 2026-01-21 23:49:08 -03:00
docs docs: add conda installation instructions to README (#446) 2026-01-18 16:50:19 -03:00
frontend feat: content-type aware chunking and unified embedding (#444) 2026-01-21 23:49:08 -03:00
open_notebook feat: content-type aware chunking and unified embedding (#444) 2026-01-21 23:49:08 -03:00
prompts docs: generate comprehensive CLAUDE.md reference documentation across codebase 2026-01-03 16:27:52 -03:00
scripts Feat/localization tests docker (#371) 2026-01-15 13:51:05 -03:00
tests feat: content-type aware chunking and unified embedding (#444) 2026-01-21 23:49:08 -03:00
.dockerignore chore: post-i18n cleanup and version bump to 1.5.0 (#433) 2026-01-15 14:20:13 -03:00
.env.example docs: update all database examples for more clarity and better database names. 2026-01-04 09:23:15 -03:00
.gitignore feat: content-type aware chunking and unified embedding (#444) 2026-01-21 23:49:08 -03:00
.python-version forcing 3.12 as maximum python version to fix pydub issue 2025-04-26 06:23:31 -03:00
.worktreeinclude feat(ui): add command palette for quick navigation and search (#288) 2025-12-01 14:59:17 -03:00
CHANGELOG.md feat: content-type aware chunking and unified embedding (#444) 2026-01-21 23:49:08 -03:00
CLAUDE.md feat: content-type aware chunking and unified embedding (#444) 2026-01-21 23:49:08 -03:00
CONFIGURATION.md docs: restructure documentation with new organized layout 2026-01-03 20:10:24 -03:00
CONTRIBUTING.md docs: restructure documentation with new organized layout 2026-01-03 20:10:24 -03:00
docker-compose.dev.yml fix: specifiy docker compose and add port 2025-12-14 19:14:11 +02:00
docker-compose.full.yml Feat/localization tests docker (#371) 2026-01-15 13:51:05 -03:00
docker-compose.single.yml Feat/localization tests docker (#371) 2026-01-15 13:51:05 -03:00
Dockerfile Feat/localization tests docker (#371) 2026-01-15 13:51:05 -03:00
Dockerfile.single Feat/localization tests docker (#371) 2026-01-15 13:51:05 -03:00
LICENSE Initial commit with all features 2024-10-21 14:56:10 -03:00
logo.png fix discord link 2025-04-24 10:14:59 -03:00
MAINTAINER_GUIDE.md docs: restructure documentation with new organized layout 2026-01-03 20:10:24 -03:00
Makefile feat: improve dev commands, update all langchain dependencies to their latest major versions 2026-01-05 08:22:41 -03:00
mypy.ini Feat/localization tests docker (#371) 2026-01-15 13:51:05 -03:00
pyproject.toml feat: content-type aware chunking and unified embedding (#444) 2026-01-21 23:49:08 -03:00
README.dev.md chore: post-i18n cleanup and version bump to 1.5.0 (#433) 2026-01-15 14:20:13 -03:00
README.md docs: add conda installation instructions to README (#446) 2026-01-18 16:50:19 -03:00
run_api.py Api podcast migration (#93) 2025-07-17 08:36:11 -03:00
supervisord.conf fix: use standalone server for Next.js in Docker 2026-01-14 22:03:15 -03:00
supervisord.single.conf fix: use standalone server for Next.js in Docker 2026-01-14 22:03:15 -03:00
uv.lock feat: content-type aware chunking and unified embedding (#444) 2026-01-21 23:49:08 -03:00


Open Notebook

An open source, privacy-focused alternative to Google's Notebook LM!
Join our Discord server for help, to share workflow ideas, and suggest features!
Check out our website »

📚 Get Started · 📖 User Guide · Features · 🚀 Deploy


In a world dominated by Artificial Intelligence, the ability to think 🧠 and acquire new knowledge 💡 is a skill that should not be a privilege for a few, nor restricted to a single provider.

Open Notebook empowers you to:

  • 🔒 Control your data - Keep your research private and secure
  • 🤖 Choose your AI models - Support for 16+ providers including OpenAI, Anthropic, Ollama, LM Studio, and more
  • 📚 Organize multi-modal content - PDFs, videos, audio, web pages, and more
  • 🎙️ Generate professional podcasts - Advanced multi-speaker podcast generation
  • 🔍 Search intelligently - Full-text and vector search across all your content
  • 💬 Chat with context - AI conversations powered by your research
  • 🌐 Multi-language UI - English, Portuguese, and Chinese (Simplified & Traditional) support

Learn more about our project at https://www.open-notebook.ai


🆚 Open Notebook vs Google Notebook LM

Feature | Open Notebook | Google Notebook LM | Advantage
Privacy & Control | Self-hosted, your data | Google cloud only | Complete data sovereignty
AI Provider Choice | 16+ providers (OpenAI, Anthropic, Ollama, LM Studio, etc.) | Google models only | Flexibility and cost optimization
Podcast Speakers | 1-4 speakers with custom profiles | 2 speakers only | Extreme flexibility
Content Transformations | Custom and built-in | Limited options | Unlimited processing power
API Access | Full REST API | No API | Complete automation
Deployment | Docker, cloud, or local | Google hosted only | Deploy anywhere
Citations | Basic references (will improve) | Comprehensive with sources | Research integrity
Customization | Open source, fully customizable | Closed system | Unlimited extensibility
Cost | Pay only for AI usage | Free tier + Monthly subscription | Transparent and controllable

Why Choose Open Notebook?

  • 🔒 Privacy First: Your sensitive research stays completely private
  • 💰 Cost Control: Choose cheaper AI providers or run locally with Ollama
  • 🎙️ Better Podcasts: Full script control and multi-speaker flexibility vs limited 2-speaker deep-dive format
  • 🔧 Unlimited Customization: Modify, extend, and integrate as needed
  • 🌐 No Vendor Lock-in: Switch providers, deploy anywhere, own your data

Built With

Python Next.js React SurrealDB LangChain

🚀 Quick Start

Choose your installation method:

Best for most users - Fast setup with Docker Compose:

Docker Compose Installation Guide

  • Multi-container setup (recommended)
  • 5-10 minutes setup time
  • Requires Docker Desktop

Quick Start:

  • Get an API key (OpenAI, Anthropic, Google, etc.) or set up Ollama
  • Create docker-compose.yml (example in guide)
  • Run: docker compose up -d
  • Access: http://localhost:8502

💻 From Source (Developers)

For development and contributors:

From Source Installation Guide

  • Clone and run locally
  • 10-15 minutes setup time
  • Requires: Python 3.11+, Node.js 18+, Docker, uv

Quick Start:

git clone https://github.com/lfnovo/open-notebook.git
cd open-notebook
uv sync
make start-all

Access: http://localhost:3000 (dev) or http://localhost:8502 (production)


📖 Need Help?



Provider Support Matrix

Thanks to the Esperanto library, we support these providers out of the box!

Provider LLM Support Embedding Support Speech-to-Text Text-to-Speech
OpenAI
Anthropic
Groq
Google (GenAI)
Vertex AI
Ollama
Perplexity
ElevenLabs
Azure OpenAI
Mistral
DeepSeek
Voyage
xAI
OpenRouter
OpenAI Compatible*

*Supports LM Studio and any OpenAI-compatible endpoint

Key Features

Core Capabilities

  • 🔒 Privacy-First: Your data stays under your control - no cloud dependencies
  • 🎯 Multi-Notebook Organization: Manage multiple research projects seamlessly
  • 📚 Universal Content Support: PDFs, videos, audio, web pages, Office docs, and more
  • 🤖 Multi-Model AI Support: 16+ providers including OpenAI, Anthropic, Ollama, Google, LM Studio, and more
  • 🎙️ Professional Podcast Generation: Advanced multi-speaker podcasts with Episode Profiles
  • 🔍 Intelligent Search: Full-text and vector search across all your content
  • 💬 Context-Aware Chat: AI conversations powered by your research materials
  • 📝 AI-Assisted Notes: Generate insights or write notes manually

Advanced Features

  • Reasoning Model Support: Full support for thinking models like DeepSeek-R1 and Qwen3
  • 🔧 Content Transformations: Powerful customizable actions to summarize and extract insights
  • 🌐 Comprehensive REST API: Full programmatic access for custom integrations API Docs
  • 🔐 Optional Password Protection: Secure public deployments with authentication
  • 📊 Fine-Grained Context Control: Choose exactly what to share with AI models
  • 📎 Citations: Get answers with proper source citations

Podcast Feature

Check out our podcast sample

📚 Documentation

Getting Started

User Guide

Advanced Topics


🗺️ Roadmap

Upcoming Features

  • Live Front-End Updates: Real-time UI updates for smoother experience
  • Async Processing: Faster UI through asynchronous content processing
  • Cross-Notebook Sources: Reuse research materials across projects
  • Bookmark Integration: Connect with your favorite bookmarking apps

Recently Completed

  • Next.js Frontend: Modern React-based frontend with improved performance
  • Comprehensive REST API: Full programmatic access to all functionality
  • Multi-Model Support: 16+ AI providers including OpenAI, Anthropic, Ollama, LM Studio
  • Advanced Podcast Generator: Professional multi-speaker podcasts with Episode Profiles
  • Content Transformations: Powerful customizable actions for content processing
  • Enhanced Citations: Improved layout and finer control for source citations
  • Multiple Chat Sessions: Manage different conversations within notebooks

See the open issues for a full list of proposed features and known issues.



🤝 Community & Contributing

Join the Community

  • 💬 Discord Server - Get help, share ideas, and connect with other users
  • 🐛 GitHub Issues - Report bugs and request features
  • Star this repo - Show your support and help others discover Open Notebook

Contributing

We welcome contributions! We're especially looking for help with:

  • Frontend Development: Help improve our modern Next.js/React UI
  • Testing & Bug Fixes: Make Open Notebook more robust
  • Feature Development: Build the coolest research tool together
  • Documentation: Improve guides and tutorials

Current Tech Stack: Python, FastAPI, Next.js, React, SurrealDB

Future Roadmap: Real-time updates, enhanced async processing

See our Contributing Guide for detailed information on how to get started.


📄 License

Open Notebook is MIT licensed. See the LICENSE file for details.
