open-notebook

Author	SHA1	Message	Date
Luis Novo	12a3caf636	fix: fail fast when source content extraction returns empty Add empty-content validation in content_process() after extract_content() returns. Sources with no extractable text (e.g. YouTube videos without transcripts) now raise ValueError immediately instead of silently saving an empty source. ValueError is already configured as a permanent failure in the retry config, so no retries are wasted on unrecoverable situations. Closes #527	2026-02-16 15:25:58 -03:00
Luis Novo	26d5349750	fix: handle empty/whitespace source content without retry loop (#576) Source.vectorize() wrapped its own ValueError in DatabaseOperationError, bypassing the stop_on=[ValueError] retry guard in process_source_command. This caused up to 15 retries when processing files with no extractable text, blocking sync API requests indefinitely. - Re-raise ValueError directly in Source.vectorize() instead of wrapping - Add .strip() check to catch whitespace-only content - Skip vectorization gracefully in save_source() when content is empty - Add unit tests for vectorize error handling Fixes #560	2026-02-14 18:09:07 -03:00
Luis Novo	877c303b02	fix: update esperanto dep and increase transformation max_tokens (#568) * fix: increase transformation max_tokens from 5055 to 8192 Closes #565 * chore: update esperanto dep to fix api keys passing via config - fixes: #567	2026-02-12 07:33:27 -03:00
Luis Novo	d8006ff5cb	feat: content-type aware chunking and unified embedding (#444) * feat: content-type aware chunking and unified embedding - Add chunking.py with HTML, Markdown, and plain text detection - Add embedding.py with mean pooling for large content - Create dedicated commands: embed_note, embed_insight, embed_source - Use fire-and-forget pattern for embedding via submit_command() - Refactor rebuild_embeddings_command to delegate to individual commands - Remove legacy commands and needs_embedding() methods - Reduce chunk size to 1500 chars for Ollama compatibility - Update CLAUDE.md documentation for new architecture Fixes #350, #142 * fix: address code review issues - Note.save() now returns command_id for tracking embedding jobs - Add length check after generate_embeddings() to fail fast on mismatch - Add numpy as explicit dependency (was transitive) - Remove hardcoded chunk sizes from docstrings * docs: address code review comments - Rename "SYNC PATH" to "DOMAIN MODEL PATH" in embedding router - Add test_chunking.py and test_embedding.py to Testing Strategy - Clarify auto-embedding behavior for each domain model * fix: clean thinking tags from prompt graph output Adds clean_thinking_content() to prompt.py to handle extended thinking models that return <think>...</think> tags. This fixes empty titles when saving notes from chat. * chore: remove local docker-compose from git * fix(frontend): handle null parent_id in search results Add defensive check for null parent_id in search results to prevent "Cannot read properties of null (reading 'split')" error. This can happen with orphaned records in the database. * fix: cascade delete embeddings and insights when source is deleted When deleting a Source, now also deletes associated: - source_embedding records - source_insight records This prevents orphaned records that cause null parent_id errors in vector search results. * fix: add cleanup for orphan embedding/insight records in migration 10 Deletes source_embedding and source_insight records where the linked source no longer exists (source.id = NONE). * chore: bump esperanto to 2.16 Increases ctx_num for Ollama models to accommodate larger notebook context windows. See: https://github.com/lfnovo/esperanto/pull/69	2026-01-21 23:49:08 -03:00
MisonL	67dd85c928	Feat/localization tests docker (#371) * feat(i18n): complete 100% internationalization and fix Next.js 15 compatibility * feat(i18n): complete 100% internationalization coverage * chore(test): finalize component tests and project cleanup * test(logic): add unit tests for useModalManager hook * fix(test): resolve timeout in AppSidebar tests by mocking TooltipProvider * feat(i18n): comprehensive i18n audit, fixes for hardcoded strings, and complete zh-TW support * fix(i18n): resolve TypeScript warnings and improve translation hook stability - Remove unused useTranslation import from ConnectionGuard - Add ref-based checking state to prevent dependency cycles - Fix useTranslation hook to return empty string for undefined translations - Add comment for backward compatibility on ExtractedReference interface - Ensure .replace() string methods work safely with nested translation keys * feat(i18n): complete internationalization implementation with Docker deployment - Add LanguageLoadingOverlay component for smooth language transitions - Update all translation files (en-US, zh-CN, zh-TW) with improved terminology - Optimize Docker configuration for better performance - Update version check and config handling for i18n support - Fix route handling for language-specific content - Add comprehensive task documentation * fix(i18n): resolve localization errors, duplicates, and type issues * chore(i18n): finalize 100% internationalization coverage * chore(test): supplement i18n test cases and cleanup redundant files * fix(test): resolve lint type errors and finalize delivery documents * feat(i18n): finalize full internationalization and zh-TW localization * fix(frontend): add missing devDependency and fix build tsconfig * feat(ui): enhance sidebar hover effects with better visual feedback * fix(frontend): resolve accessibility, i18n, and lint issues - fix: add missing id, name, autocomplete attributes to dialog inputs - fix: add aria labels and DialogDescription for accessibility - fix: resolve uncontrolled component warning in SettingsForm - fix: correct duplicate 'Traditional Chinese' label in zh-TW locale - feat: add i18n support for podcast template names - chore: fix lint errors in Dialogs * fix: address all 21 PR feedback items from cubic-dev-ai bot Configuration: - Remove ignoreDuringBuilds flags from next.config.ts Testing: - Fix AppSidebar.test.tsx regex pattern and add missing assertion Logic: - Fix ConnectionGuard.tsx re-entry prevention logic Internationalization (I18n) - Translations: - Add missing keys: notebooks.archived, common.note/insight, accessibility keys - Add specific keys: sources.allSourcesDescShort, transformations.selectModel - Add singular/plural keys: podcasts.usedByCount_one/other, common.note/notes - Add common.created/updated with {time} placeholder Internationalization (I18n) - Usage: - SourcesPage: use allSourcesDescShort instead of string splitting - TransformationPlayground: use navigation.transformation and selectModel - CommandPalette: use dedicated keys instead of string concatenation - GeneratePodcastDialog: fix zh-TW date locale handling - NotebookHeader: correctly interpolate {time} placeholder - TransformationCard: use common.description instead of undefined key - ChatPanel/SpeakerProfilesPanel: implement proper pluralization - SystemInfo: correctly interpolate {version} placeholder - LanguageLoadingOverlay: use t.common.loading instead of hardcoded string - MessageActions: use specific error key cannotSaveNoteNoNotebook Other: - Fix SessionManager.tsx exhaustive-deps warning * fix: remove duplicate locale keys and add missing zh-CN translations - en-US: remove duplicate loading key (line 59) and addNew key (sources) - zh-CN: remove duplicate common keys (loading, note, insight, newSource, newNotebook, newPodcast) - zh-CN: remove duplicate accessibility.searchNotebooks key - zh-CN: remove duplicate sources.addNew key - zh-CN: remove duplicate navigation.transformation key - zh-CN: add missing usedByCount_one and usedByCount_other keys in podcasts - zh-TW: remove duplicate common keys (loading, note, insight, newSource, newNotebook, newPodcast) - zh-TW: remove duplicate accessibility.searchNotebooks key - zh-TW: remove duplicate sources.addNew key * docs: remove info.md * fix: remove duplicate notebook keys and unused ts-expect-error - zh-CN: remove duplicate notebooks keys (archived, archive, unarchive, deleteNotebook, deleteNotebookDesc) - zh-TW: remove duplicate notebooks keys (archived, archive, unarchive, deleteNotebook, deleteNotebookDesc) - GeneratePodcastDialog: remove unused @ts-expect-error directive * fix(a11y): fix unassociated labels in search page - Replace <Label> with role='group' + aria-labelledby for search type section - Replace <Label> with role='group' + aria-labelledby for search in section - Follows WAI-ARIA best practices for labeling form field groups * fix(a11y): fix unassociated labels across multiple components - search/page.tsx: use role='group' + aria-labelledby for search type and search in sections - RebuildEmbeddings.tsx: use role='group' + aria-labelledby for include checkboxes - TransformationPlayground.tsx: replace Label with span for non-form output label * chore: revert to npm stack and ensure i18n compatibility * chore: polish zh-TW translations for better idiomatic usage * fix: resolve linter errors (ruff import sort, mypy config duplicate) * style: apply ruff formatting * fix: finalize upstream compliance (Dockerfile.single, i18n hooks, docker-compose) * style: polish strings, fix timeout cleanup, and improve test mocks * fix: use relative imports in test setup to resolve IDE path errors * perf(docker): optimize build speed by removing apt-get upgrade and build tools - Remove apt-get upgrade from both builder and runtime stages (saves 10-15 min each) - Remove gcc/g++/make/git from builder (uv downloads pre-built wheels) - Add --no-install-recommends to minimize package footprint - Keep npm mirror (npmmirror.com) for faster frontend deps - Add npm registry config for reliable China network access Also includes: - fix(a11y): add missing labels and aria attributes to form fields - fix(i18n): add 2s safety timeout to LanguageLoadingOverlay - fix(i18n): add robustness checks to use-translation proxy Build time reduced from 2+ hours to ~34 minutes (~70% improvement) * fix(a11y): resolve 16 form field accessibility warnings in notebook and podcast pages * fix(a11y): resolve 4 button and 1 select field accessibility warnings in models page * fix(a11y): resolve redundant attributes and residual warnings in transformations and podcast forms * fix(i18n): deep fix for language switch hang using proxy protection and safer access * fix(a11y): add name attributes to ModelSelector, TransformationPlayground, and SourceDetailContent * fix: add missing Label import to SourceDetailContent * fix(i18n): use native react-i18next in LanguageLoadingOverlay to prevent hang during language switch * fix(i18n): rewrite use-translation Proxy with strict depth limit and expanded blocked props to prevent language switch hang * fix: add type assertion to fix TypeScript comparison error * fix(i18n): disable useSuspense to prevent thread hang during language resource loading * fix(i18n): add infinite loop detection circuit breaker to useTranslation hook * fix(i18n): update traditional chinese label to native script in en-US * feat: add new localization strings for notebook and note management. * fix: resolve config priority, docker build deps, and ui glitches * refactor: improve ui details and test coverage based on feedback * refactor: improve ui details (version check/lang toggle) and test coverage * fix: polish language matching and test cleanup * fix(test): update mocks to resolve timeouts and proxy errors * fix(frontend): restore tsconfig.json structure and enable IDE support for tests * fix: address PR review findings and resolve CI OIDC failure * fix: merge exception headers in custom handler * fix: comprehensive PR review remediations and async performance fixes * refactor: address all PR #371 review feedback - Docker: consolidate SURREAL_URL to docker.env, add single-container override - Security: restore apt-get upgrade in Dockerfile and Dockerfile.single - Create centralized getDateLocale helper (lib/utils/date-locale.ts) - Refactor 7 files to use getDateLocale helper - Revert config/route.ts to origin/main version - Move test files to co-located pattern (3 files) - Remove local useTranslation mock from ConfirmDialog.test.tsx - Simplify use-version-check to single useEffect pattern - Fix test import paths after moving to co-located pattern * fix: add jest-dom types for test files * fix: address remaining review issues - Add apt-get upgrade -y to Dockerfile.single backend-builder stage - Refactor ChatColumn.test.tsx: use 'as unknown as ReturnType<typeof hook>' instead of 'as any' - Use toBeInTheDocument() assertions instead of toBeDefined()	2026-01-15 13:51:05 -03:00
LUIS NOVO	71b8d13b24	docs: generate comprehensive CLAUDE.md reference documentation across codebase Create a hierarchical CLAUDE.md documentation system for the entire Open Notebook codebase with focus on concise, pattern-driven reference cards rather than comprehensive tutorials. ## Changes ### Core Documentation System - Updated `.claude/commands/build-claude-md.md` to distinguish between leaf and parent modules, with special handling for prompt/template modules - Established clear patterns: * Leaf modules (40-70 lines): Components, hooks, API clients * Parent modules (50-150 lines): Architecture, cross-layer patterns, data flows * Template modules: Pattern focus, not catalog listings ### Generated Documentation Created 15 CLAUDE.md reference files across the project: Frontend (React/Next.js) - frontend/src/CLAUDE.md: Architecture overview, data flow, three-tier design - frontend/src/lib/hooks/CLAUDE.md: React Query patterns, state management - frontend/src/lib/api/CLAUDE.md: Axios client, FormData handling, interceptors - frontend/src/lib/stores/CLAUDE.md: Zustand state persistence, auth patterns - frontend/src/components/ui/CLAUDE.md: Radix UI primitives, CVA styling Backend (Python/FastAPI) - open_notebook/CLAUDE.md: System architecture, layer interactions - open_notebook/ai/CLAUDE.md: Model provisioning, Esperanto integration - open_notebook/domain/CLAUDE.md: Data models, ObjectModel/RecordModel patterns - open_notebook/database/CLAUDE.md: Repository pattern, async migrations - open_notebook/graphs/CLAUDE.md: LangGraph workflows, async orchestration - open_notebook/utils/CLAUDE.md: Cross-cutting utilities, context building - open_notebook/podcasts/CLAUDE.md: Episode/speaker profiles, job tracking API & Other - api/CLAUDE.md: REST layer, service architecture - commands/CLAUDE.md: Async command handlers, job queue patterns - prompts/CLAUDE.md: Jinja2 templates, prompt engineering patterns (refactored) Project Root - CLAUDE.md: Project overview, three-tier architecture, tech stack, getting started ### Key Features - Zero duplication: Parent modules reference child CLAUDE.md files, don't repeat them - Pattern-focused: Emphasizes how components work together, not component catalogs - Scannable: Short bullets, code examples only when necessary (1-2 per file) - Practical: "How to extend" guides, quirks/gotchas for each module - Navigation: Root CLAUDE.md acts as hub pointing to specialized documentation ### Cleanup - Removed unused `batch_fix_services.py` - Removed deprecated `open_notebook/plugins/podcasts.py` - Updated .gitignore for documentation consistency ## Impact New contributors can now: 1. Read root CLAUDE.md for system architecture (5 min) 2. Jump to specific layer documentation (frontend, api, open_notebook) 3. Dive into module-specific patterns in child CLAUDE.md files (1 min per module) All documentation is lean, reference-focused, and avoids duplication.	2026-01-03 16:27:52 -03:00
LUIS NOVO	ab5560c9a2	refactor: reorganize folder structure for better maintainability Changes: - Move migrations/ under open_notebook/database/migrations/ - Extract AI models to open_notebook/ai/ (Model, ModelManager, provision) - Extract podcasts to open_notebook/podcasts/ (EpisodeProfile, SpeakerProfile, PodcastEpisode) - Reorganize prompts to mirror graphs structure (chat/, source_chat/) This improves code organization by: - Consolidating database concerns (migrations now with database code) - Separating AI infrastructure from domain entities - Isolating podcast feature into its own module - Creating consistent prompt/graph naming conventions All 52 tests pass.	2026-01-03 14:04:27 -03:00
Justin Florentine	855e730577	fix: preserve AIMessage metadata when cleaning thinking content Use model_copy() instead of creating new AIMessage to preserve response_metadata, id, usage_metadata, etc. Also adds test coverage for malformed thinking tags pattern. Addresses PR #333 feedback from lfnovo and cubic-dev-ai. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-19 20:08:12 -05:00
Justin Florentine	869664a10b	fix: strip <think> tags from chat responses Add thinking content cleaning to notebook and source chat graphs. Previously, models that output <think>...</think> tags (like DeepSeek) or malformed variants without opening tags (like Nemotron) would leak reasoning content into user-visible responses. Changes: - chat.py: Clean AI response content before returning messages - source_chat.py: Same fix for source-specific chat - text_utils.py: Handle malformed output where opening <think> tag is missing but </think> is present 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-18 16:31:23 -05:00
Luis Novo	f79a9040ae	Release 1.2 (#242) * chore: improve podcast transcripts * fix: remove date from insight - fixes #241 * fix: improve scrolling on source and insights - fixes #237 * chore: update esperanto to fix: #234 * chore: update esperanto to fix #226 * fix: process vectorization as subcommands to handle larger documents more gracefully - fix: #229 * feat: enable background job retry capabilities * feat: reenable content types that were disabled during alpha version * fix: remove unnecessary model caching causing many issues. * feat: support multiple azure endpoints and keys just like openai compatible. Fixes #215 * docs: update azure variables * chore: bump and update dependencies	2025-11-01 14:40:00 -03:00
LUIS NOVO	a51bb9d792	fix: missing parenthesis	2025-10-18 13:22:39 -03:00
LUIS NOVO	8b5daa86bc	fix: max tokens max is 8192 now	2025-10-18 13:21:53 -03:00
Luis Novo	b7e656a319	Version 1 (#160) New front-end Launch Chat API Manage Sources Enable re-embedding of all contents Sources can be added without a notebook now Improved settings Enable model selector on all chats Background processing for better experience Dark mode Improved Notes Improved Docs: - Remove all Streamlit references from documentation - Update deployment guides with React frontend setup - Fix Docker environment variables format (SURREAL_URL, SURREAL_PASSWORD) - Update docker image tag from :latest to :v1-latest - Change navigation references (Settings → Models to just Models) - Update development setup to include frontend npm commands - Add MIGRATION.md guide for users upgrading from Streamlit - Update quick-start guide with correct environment variables - Add port 5055 documentation for API access - Update project structure to reflect frontend/ directory - Remove outdated source-chat documentation files	2025-10-18 12:46:22 -03:00
Luis Novo	d7b0fff954	Api podcast migration (#93) Creates the API layer for Open Notebook Creates a services API gateway for the Streamlit front-end Migrates the SurrealDB SDK to the official one Change all database calls to async New podcast framework supporting multiple speaker configurations Implement the surreal-commands library for async processing Improve docker image and docker-compose configurations	2025-07-17 08:36:11 -03:00
LUIS NOVO	7eee271232	feat: extract think tags from reasoning models	2025-06-26 11:41:15 -03:00
LUIS NOVO	bea43f3ce7	feat: implement the new model management based on esperanto framework	2025-06-08 19:38:43 -03:00
LUIS NOVO	2afbd36cb4	refactor: implement ai_prompter library	2025-06-01 08:09:33 -03:00
LUIS NOVO	1afb5d81e8	feat: implement new content settings page and remove options from the source panel	2025-05-30 15:25:39 -03:00
LUIS NOVO	36e928eb75	feat: replace content processing engine with content-core	2025-05-30 13:35:46 -03:00
LUIS NOVO	c297dcb809	refactor objectmodel	2024-11-19 19:03:32 -03:00
LUIS NOVO	4a5d47d934	refactor transformation, add graph and admin	2024-11-18 22:01:11 -03:00
LUIS NOVO	066c7a06e2	improve search functions	2024-11-13 15:52:44 -03:00
LUIS NOVO	80353a97c9	make model rag work with vector only	2024-11-13 12:18:26 -03:00
LUIS NOVO	e4b8fa8cc7	cleanup logging	2024-11-13 12:17:57 -03:00
LUIS NOVO	281abdf01b	improve the accuracy of ids in the citations	2024-11-13 11:55:38 -03:00
LUIS NOVO	a33228de5a	split system and user message in patterns	2024-11-12 12:56:03 -03:00
LUIS NOVO	8cb6d835fe	add ui improvements to embed and transformation dialogs	2024-11-11 18:17:08 -03:00
LUIS NOVO	817b1bc7f9	add initial embedding to the content graph	2024-11-11 17:47:50 -03:00
LUIS NOVO	01cf15e7d1	add check	2024-11-11 17:33:28 -03:00
LUIS NOVO	00f070a644	add async content processing	2024-11-11 17:32:35 -03:00
LUIS NOVO	2e2a4947b3	separate source and content graph	2024-11-10 13:30:03 -03:00
LUIS NOVO	d5be2b0d5b	make rag async	2024-11-09 16:03:41 -03:00
LUIS NOVO	183149014e	change model provisioning parameters	2024-11-08 16:08:54 -03:00
LUIS NOVO	99b8ada280	new ask model strategy	2024-11-08 16:08:13 -03:00
LUIS NOVO	418c67f69f	add search and rag functions in beta	2024-11-04 09:53:49 -03:00
LUIS NOVO	b4ba3ef4c8	change model provisioning strategy	2024-11-04 09:49:11 -03:00
LUIS NOVO	d9c0c93deb	improved typing	2024-11-01 22:50:27 -03:00
LUIS NOVO	7dc37a3ac7	model fixes	2024-11-01 22:43:33 -03:00
LUIS NOVO	223f1bdaf5	improve default_models	2024-11-01 22:38:21 -03:00
LUIS NOVO	212d3a33b0	improve object typing	2024-11-01 22:29:59 -03:00
LUIS NOVO	15048b0839	simplify model provisioning	2024-11-01 21:32:40 -03:00
LUIS NOVO	3b262a63f4	better model mgmt	2024-11-01 21:11:23 -03:00
LUIS NOVO	a9ac4a6dc8	model manager	2024-11-01 20:37:23 -03:00
LUIS NOVO	feabfaed01	remove defaultmodel from config file	2024-11-01 19:56:27 -03:00
LUIS NOVO	edf839cd1b	unused graphs	2024-11-01 19:08:33 -03:00
LUIS NOVO	0d4d9473b2	add transformation playground with model selection	2024-11-01 17:17:19 -03:00
LUIS NOVO	b89250d3ca	temporary fix to config cache	2024-11-01 17:06:10 -03:00
LUIS NOVO	796012f716	rename to patterns	2024-11-01 13:08:18 -03:00
LUIS NOVO	0876e94658	transformation folder change	2024-11-01 12:36:59 -03:00
LUIS NOVO	c65bf8ba12	rename model_name to model_id	2024-11-01 12:07:00 -03:00

75 commits