feat: update models pages

2025-06-09 20:47:20 -03:00 · 2025-06-09 20:47:20 -03:00 · 6c571db8c4
commit 6c571db8c4
parent bbd2f7dbfc
2 changed files with 402 additions and 191 deletions
--- a/docs/models.md
+++ b/docs/models.md
@@ -0,0 +1,177 @@
+# AI Model Selection Guide
+
+This guide helps you choose the best AI models for your Open Notebook setup. We'll cover what makes each provider special and which models work best for different tasks, then give you ready-to-use combinations so you can get started quickly.
+
+## Understanding Model Types
+
+Open Notebook uses four types of AI models:
+
+- **Language Models**: For chat, text generation, summaries, and tool calling
+- **Embedding Models**: For semantic search and content similarity
+- **Text-to-Speech (TTS)**: For generating podcasts and audio content
+- **Speech-to-Text (STT)**: For transcribing audio files
+
+## What to Consider When Choosing Models
+
+- **💰 Cost**: Local models (Ollama) have no per-token fees, while hosted providers charge per token
+- **🎯 Quality**: Higher-quality models often cost more but produce better results
+- **⚡ Speed**: Smaller models are faster but may be less capable
+- **🔧 Features**: Some models excel at specific tasks, such as tool calling or processing large contexts
+
+---
+
+## Provider Breakdown
+
+### 🟦 Google (Gemini)
+**Best for**: Large context processing, affordable high-quality models
+
+**Language Models**
+- `gemini-2.0-flash` - Excellent balance of price and performance, with a 1M-token context window
+- `gemini-2.5-pro-preview-06-05` - Premium model for complex reasoning tasks
+
+**Text-to-Speech**
+- `gemini-2.5-flash-preview-tts` - Good quality at $10 per 1M tokens
+- `gemini-2.5-pro-preview-tts` - Higher quality at $20 per 1M tokens
+
+**Embedding**
+- `text-embedding-004` - Solid performance with generous free tier
+
+---
+
+### 🟢 OpenAI
+**Best for**: Reliable performance, excellent tool calling, wide ecosystem support
+
+**Language Models**
+- `gpt-4o-mini` - Great value for most tasks, perfect for everyday use
+- `gpt-4o` - Premium quality with excellent tool calling capabilities
+
+**Text-to-Speech**
+- `tts-1` - Good quality for personal use and podcasts
+
+**Speech-to-Text**
+- `whisper-1` - Industry-standard transcription quality
+
+**Embedding**
+- `text-embedding-3-small` - Affordable at $0.02 per 1M tokens with solid performance
+
+---
+
+### 🎤 ElevenLabs
+**Best for**: High-quality voice synthesis and transcription
+
+**Text-to-Speech**
+- `eleven_turbo_v2_5` - Excellent voice quality with reasonable pricing
+
+**Speech-to-Text**
+- `scribe_v1` - High-quality transcription service
+
+---
+
+### 🔵 DeepSeek
+**Best for**: Cost-effective language models with good performance
+
+**Language Models**
+- `deepseek-chat` - Excellent quality-to-price ratio, with a 64K-token context window
+
+---
+
+### 🟡 Mistral
+**Best for**: European-based alternative with competitive pricing
+
+**Language Models**
+- `mistral-medium-latest` - Good balance of quality and price
+- `ministral-8b-latest` - Perfect for simple tasks like transformations
+
+**Embedding**
+- `mistral-embed` - Good quality, though not the most cost-effective
+
+---
+
+### ⚡ Grok (xAI)
+**Best for**: Cutting-edge intelligence and reasoning
+
+**Language Models**
+- `grok-3` - Top-tier intelligence, premium pricing
+- `grok-3-mini` - Excellent performance at more accessible pricing
+
+---
+
+### 🚢 Voyage AI
+**Best for**: Specialized embedding models
+
+**Embedding**
+- `voyage-3.5-lite` - Competitive with OpenAI's `text-embedding-3-small` at similar pricing
+
+---
+
+### 🟣 Anthropic (Claude)
+**Best for**: High-quality reasoning and safety
+
+**Language Models**
+- `claude-3-5-sonnet-latest` - Exceptional quality for complex tasks
+
+---
+
+### 🦙 Ollama (Local/Free)
+**Best for**: Privacy, offline use, and zero ongoing costs
+
+**Language Models**
+- `qwen3` - Excellent free alternative for most language tasks
+- `gemma3` - Great for chat and simple transformations
+- `phi4` - Compact but capable model
+- `deepseek-r1` - Advanced reasoning capabilities
+- `llama4` - Well-rounded performance
+
+**Embedding**
+- `mxbai-embed-large` - Outstanding free embedding model
+
+---
+
+## Recommended Combinations
+
+### 🌟 Best Value (Mixed Providers)
+Perfect balance of cost and performance
+- **Chat**: `gpt-4o-mini` (OpenAI) - Reliable and affordable
+- **Tools**: `gpt-4o` (OpenAI) - Excellent tool calling
+- **Transformations**: `ministral-8b-latest` (Mistral) - Cost-effective
+- **Large Context**: `gemini-2.0-flash` (Google) - 1M context window
+- **Embedding**: `text-embedding-3-small` (OpenAI) - Good price/performance
+- **TTS**: `gemini-2.5-flash-preview-tts` (Google) - Affordable quality
+- **STT**: `whisper-1` (OpenAI) - Industry standard
+
+### 💰 Budget-Friendly (Mostly Free)
+Great for getting started or keeping costs low
+- **Language**: `qwen3` (Ollama) - Free and capable
+- **Tools**: `qwen3` (Ollama) - Handles basic tool calling
+- **Transformations**: `gemma3` (Ollama) - Free and fast
+- **Embedding**: `mxbai-embed-large` (Ollama) - Free, high quality
+- **TTS**: `tts-1` (OpenAI) - Reasonable cost
+- **STT**: `whisper-1` (OpenAI) - Best value
+
+### 🚀 High Performance (Premium)
+When quality is your top priority
+- **Chat**: `claude-3-5-sonnet-latest` (Anthropic) or `grok-3` (xAI) - Exceptional reasoning
+- **Tools**: `gpt-4o` (OpenAI) or `claude-3-5-sonnet-latest` (Anthropic) or `grok-3` (xAI) - Best tool calling
+- **Transformations**: `grok-3-mini` (xAI) - Smart and efficient
+- **Large Context**: `gemini-2.5-pro-preview-06-05` (Google) - Premium quality
+- **Embedding**: `voyage-3.5-lite` (Voyage) - Specialized performance
+- **TTS**: `eleven_turbo_v2_5` (ElevenLabs) - Premium voice quality
+- **STT**: `whisper-1` (OpenAI) - Proven reliability
+
+### 🏢 Single Provider (OpenAI)
+Simplify billing and setup with one provider
+- **Chat**: `gpt-4o-mini` - Everyday conversations
+- **Tools**: `gpt-4o` - Complex operations
+- **Transformations**: `gpt-4o-mini` - Cost-effective processing
+- **Embedding**: `text-embedding-3-small` - Solid performance
+- **TTS**: `tts-1` - Good enough quality
+- **STT**: `whisper-1` - Industry standard
+
+## Getting Started
+
+1. **New users**: Start with the "Budget-Friendly" combination
+2. **Want convenience**: Use the "Single Provider (OpenAI)" setup
+3. **Want a balance of cost and quality**: Go with "Best Value"
+4. **Budget isn't a concern**: Choose "High Performance"
+
+Remember: You can always start simple and upgrade specific models as your needs grow!
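
Every per-1M-token price in the guide converts to a budget the same way: cost = tokens / 1,000,000 × price per 1M tokens. Below is a minimal Python sketch of that arithmetic, using prices quoted in the guide as example inputs. `PRICE_PER_1M_TOKENS` and `estimate_cost` are illustrative names, not part of Open Notebook, and provider prices change, so recheck them before budgeting.

```python
# Example prices (USD per 1M tokens) copied from docs/models.md.
# Illustrative only: provider pricing changes over time.
PRICE_PER_1M_TOKENS = {
    "text-embedding-3-small": 0.02,
    "gemini-2.5-flash-preview-tts": 10.0,
    "gemini-2.5-pro-preview-tts": 20.0,
}


def estimate_cost(model: str, tokens: int) -> float:
    """Return the estimated USD cost of processing `tokens` with `model`."""
    if tokens < 0:
        raise ValueError("tokens must be non-negative")
    return tokens / 1_000_000 * PRICE_PER_1M_TOKENS[model]


if __name__ == "__main__":
    # Embedding a 5M-token library with text-embedding-3-small.
    print(f"${estimate_cost('text-embedding-3-small', 5_000_000):.2f}")  # $0.10
    # 200k tokens with the Flash TTS model.
    print(f"${estimate_cost('gemini-2.5-flash-preview-tts', 200_000):.2f}")  # $2.00
```
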
--- a/pages/7_🤖_Models.py
+++ b/pages/7_🤖_Models.py
@@ -3,7 +3,6 @@ import os
 import streamlit as st
 from esperanto import AIFactory

-from open_notebook.config import CONFIG
 from open_notebook.domain.models import DefaultModels, Model, model_manager
 from pages.components.model_selector import model_selector
 from pages.stream_app.utils import setup_page
@@ -34,11 +33,6 @@ def check_available_providers():
        and os.environ.get("VERTEX_LOCATION") is not None
        and os.environ.get("GOOGLE_APPLICATION_CREDENTIALS") is not None
    )
-    # provider_status["vertexai-anthropic"] = (
-    #     os.environ.get("VERTEX_PROJECT") is not None
-    #     and os.environ.get("VERTEX_LOCATION") is not None
-    #     and os.environ.get("GOOGLE_APPLICATION_CREDENTIALS") is not None
-    # )
    provider_status["gemini"] = os.environ.get("GOOGLE_API_KEY") is not None
    provider_status["openrouter"] = (
        os.environ.get("OPENROUTER_API_KEY") is not None
@@ -63,35 +57,6 @@ def check_available_providers():
    return available_providers, unavailable_providers


-def generate_new_models(models, suggested_models):
-    # Create a set of existing model keys for efficient lookup
-    existing_model_keys = {
-        f"{model.provider}-{model.name}-{model.type}" for model in models
-    }
-
-    new_models = []
-
-    # Iterate through suggested models by provider
-    for provider, types in suggested_models.items():
-        # Iterate through each type (language, embedding, etc.)
-        for type_, model_list in types.items():
-            for model_name in model_list:
-                model_key = f"{provider}-{model_name}-{type_}"
-
-                # Check if model already exists
-                if model_key not in existing_model_keys:
-                    if provider_status.get(provider):
-                        new_models.append(
-                            {
-                                "name": model_name,
-                                "type": type_,
-                                "provider": provider,
-                            }
-                        )
-
-    return new_models
-
-
 default_models = DefaultModels()
 all_models = Model.get_all()
 esperanto_available_providers = AIFactory.get_available_providers()
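
This diff shows only part of the rules in `check_available_providers()`. Gemini is available when `GOOGLE_API_KEY` is set, and Vertex AI needs `VERTEX_PROJECT`, `VERTEX_LOCATION`, and `GOOGLE_APPLICATION_CREDENTIALS` together. Below is a minimal sketch of that "all variables present" pattern, written as a pure function over an environment mapping so that the rules are easy to test. `REQUIRED_ENV_VARS`, `split_providers`, and the `"vertexai"` key are hypothetical, and the real function checks more providers than shown here.

```python
from typing import Mapping

# Hypothetical rule table: provider -> env vars that must all be set.
# Only the Gemini and Vertex AI rules are visible in this diff; the
# "vertexai" key name is an assumption.
REQUIRED_ENV_VARS = {
    "gemini": ["GOOGLE_API_KEY"],
    "vertexai": [
        "VERTEX_PROJECT",
        "VERTEX_LOCATION",
        "GOOGLE_APPLICATION_CREDENTIALS",
    ],
}


def split_providers(env: Mapping[str, str]) -> tuple[list[str], list[str]]:
    """Return (available, unavailable) providers for the given environment."""
    available, unavailable = [], []
    for provider, variables in REQUIRED_ENV_VARS.items():
        # As in the page code, a variable counts as set if it exists,
        # even when its value is an empty string.
        if all(env.get(name) is not None for name in variables):
            available.append(provider)
        else:
            unavailable.append(provider)
    return available, unavailable


if __name__ == "__main__":
    import os

    print(split_providers(os.environ))
```
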
@@ -108,180 +73,249 @@ with st.expander("Unavailable Providers"):
    st.write(unavailable_providers)

 st.divider()
-st.subheader("Add Model")
-st.markdown(
-    "Even though a lot of models can be supported, not all will perform optimally. Some are more fit for use in this tool than others. To help you decide which models to use, please refer to [Which model to choose?](https://github.com/lfnovo/open-notebook/blob/main/docs/SETUP.md#which-model-to-choose) for more information. You can also play with some models in the [Transformations](https://try-it-out.open-notebook.com) page to see if they match your needs."
-)

-available_model_types = esperanto_available_providers.keys()
-model_type = st.selectbox(
-    "Model Type",
-    available_model_types,
-    help="Use language for text generation models, text_to_speech for TTS models for generating podcasts, etc.",
-)
-provider = st.selectbox("Provider", esperanto_available_providers[model_type])

-if model_type == "text_to_speech" and provider == "gemini":
-    model_name = "gemini-default"
-    st.markdown("Gemini models are pre-configured. Using the default model.")
-else:
-    model_name = st.text_input(
-        "Model Name", "", help="gpt-4o-mini, claude, gemini, llama3, etc"
-    )
-if st.button("Save"):
-    model = Model(name=model_name, provider=provider, type=model_type)
-    model.save()
-    st.success("Saved")
+# Helper: render a form that adds a model of the given type and saves it on submit
+def add_model_form(model_type, container_key):
+    available_providers = esperanto_available_providers.get(model_type, [])
+    if not available_providers:
+        st.info(f"No providers available for {model_type}")
+        return

-st.divider()
-suggested_models = CONFIG.get("suggested_models", [])
-recommendations = generate_new_models(all_models, suggested_models)
-if len(recommendations) > 0:
-    with st.expander("💁‍♂️ Recommended models to get you started.."):
-        for recommendation in recommendations:
-            st.markdown(
-                f"**{recommendation['name']}** ({recommendation['provider']}, {recommendation['type']})"
+    st.markdown("**Add New Model**")
+
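+    # Keep the provider picker outside the form: widgets inside st.form do not
+    # rerun the script until submit, so the Gemini TTS branch below would not
+    # update when the provider changes.
+    provider = st.selectbox(
+        "Provider",
+        available_providers,
+        key=f"provider_{model_type}_{container_key}",
+    )
+
+    with st.form(key=f"add_{model_type}_{container_key}"):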
+        if model_type == "text_to_speech" and provider == "gemini":
+            model_name = "gemini-default"
+            st.markdown("Gemini models are pre-configured. Using the default model.")
+        else:
+            model_name = st.text_input(
+                "Model Name",
+                key=f"name_{model_type}_{container_key}",
+                help="gpt-4o-mini, claude, gemini, llama3, etc",
            )
-            if st.button("Add", key=f"add_{recommendation['name']}"):
-                new_model = Model(**recommendation)
-                new_model.save()
+
+        if st.form_submit_button("Add Model"):
+            if model_name:
+                model = Model(name=model_name, provider=provider, type=model_type)
+                model.save()
+                st.success("Model added!")
                st.rerun()
-st.divider()

-st.subheader("Configured Models")
-model_types_available = {
-    # "vision": False,
-    "language": False,
-    "embedding": False,
-    "text_to_speech": False,
-    "speech_to_text": False,
+
+# Helper function to handle default model selection with auto-save
+def handle_default_selection(
+    label, key, current_value, help_text, model_type, caption=None
+):
+    selected_model = model_selector(
+        label,
+        key,
+        selected_id=current_value,
+        help=help_text,
+        model_type=model_type,
+    )
+    # Auto-save when selection changes
+    if selected_model and (not current_value or selected_model.id != current_value):
+        setattr(default_models, key, selected_model.id)
+        default_models.update()
+        model_manager.refresh_defaults()
+    elif not selected_model and current_value:
+        setattr(default_models, key, None)
+        default_models.update()
+        model_manager.refresh_defaults()
+
+    if caption:
+        st.caption(caption)
+    return selected_model
+
+
+# Group models by type
+models_by_type = {
+    "language": [],
+    "embedding": [],
+    "text_to_speech": [],
+    "speech_to_text": [],
 }
+
 for model in all_models:
-    model_types_available[model.type] = True
-    with st.container(border=True):
-        st.markdown(f"{model.name} ({model.provider}, {model.type})")
-        if st.button("Delete", key=f"delete_{model.id}"):
-            model.delete()
-            st.rerun()
-
-for model_type, available in model_types_available.items():
-    if not available:
-        st.warning(f"No models available for {model_type}")
+    if model.type in models_by_type:
+        models_by_type[model.type].append(model)


-st.divider()
+st.markdown("""
+**Model Management Guide:** For optimal performance, refer to [Which model to choose?](https://github.com/lfnovo/open-notebook/blob/main/docs/models.md) 
+You can test models in the [Transformations](https://try-it-out.open-notebook.com) page.
+""")

-st.subheader("Select Default Models")
-text_generation_models = [model for model in all_models if model.type == "language"]
+# Language Models Section
+st.subheader("🗣️ Language Models")
+with st.container(border=True):
+    col1, col2 = st.columns([2, 1])

-text_to_speech_models = [
-    model for model in all_models if model.type == "text_to_speech"
-]
+    with col1:
+        st.markdown("**Configured Models**")
+        language_models = models_by_type["language"]
+        if language_models:
+            for model in language_models:
+                subcol1, subcol2 = st.columns([4, 1])
+                with subcol1:
+                    st.markdown(f"• {model.provider}/{model.name}")
+                with subcol2:
+                    if st.button(
+                        "🗑️", key=f"delete_lang_{model.id}", help="Delete model"
+                    ):
+                        model.delete()
+                        st.rerun()
+        else:
+            st.info("No language models configured")

-speech_to_text_models = [
-    model for model in all_models if model.type == "speech_to_text"
-]
-vision_models = [model for model in all_models if model.type == "vision"]
-embedding_models = [model for model in all_models if model.type == "embedding"]
-st.write(
-    "In this section, you can select the default models to be used on the various content operations done by Open Notebook. Some of these can be overriden in the different modules."
-)
-defs = {}
-# Handle chat model selection
-selected_model = model_selector(
-    "Default Chat Model",
-    "default_chat_model",
-    selected_id=default_models.default_chat_model,
-    help="This model will be used for chat.",
-    model_type="language",
-)
-if selected_model:
-    default_models.default_chat_model = selected_model.id
-st.divider()
-# Handle transformation model selection
-selected_model = model_selector(
-    "Default Transformation Model",
-    "default_transformation_model",
-    selected_id=default_models.default_transformation_model,
-    help="This model will be used for text transformations such as summaries, insights, etc.",
-    model_type="language",
-)
-if selected_model:
-    default_models.default_transformation_model = selected_model.id
-st.caption("You can use a cheap model here like gpt-4o-mini, llama3, etc.")
-st.divider()
+    with col2:
+        add_model_form("language", "main")

-# Handle tools model selection
-selected_model = model_selector(
-    "Default Tools Model",
-    "default_tools_model",
-    selected_id=default_models.default_tools_model,
-    help="This model will be used for calling tools. Currently, it's best to use Open AI and Anthropic for this.",
-    model_type="language",
-)
-if selected_model:
-    default_models.default_tools_model = selected_model.id
-st.caption("Recommended to use a capable model here, like gpt-4o, claude, etc.")
-st.divider()
+    st.markdown("**Default Model Assignments**")
+    col1, col2 = st.columns(2)

-# Handle large context model selection
-selected_model = model_selector(
-    "Large Context Model",
-    "large_context_model",
-    selected_id=default_models.large_context_model,
-    help="This model will be used for larger context generation -- recommended: Gemini",
-    model_type="language",
-)
-if selected_model:
-    default_models.large_context_model = selected_model.id
-st.caption("Recommended to use Gemini models for larger context processing")
-st.divider()
+    with col1:
+        handle_default_selection(
+            "Chat Model",
+            "default_chat_model",
+            default_models.default_chat_model,
+            "Used for chat conversations",
+            "language",
+            "Pick the one that vibes with you.",
+        )

-# Handle text-to-speech model selection
-selected_model = model_selector(
-    "Default Text to Speech Model",
-    "default_text_to_speech_model",
-    selected_id=default_models.default_text_to_speech_model,
-    help="This is the default model for converting text to speech (podcasts, etc)",
-    model_type="text_to_speech",
-)
-st.caption("You can override this model on different podcasts")
-if selected_model:
-    default_models.default_text_to_speech_model = selected_model.id
-st.divider()
+        handle_default_selection(
+            "Tools Model",
+            "default_tools_model",
+            default_models.default_tools_model,
+            "Used for calling tools - pick a model with reliable tool calling",
+            "language",
+            "Recommended: gpt-4o, claude, qwen3, etc.",
+        )

-# Handle speech-to-text model selection
-selected_model = model_selector(
-    "Default Speech to Text Model",
-    selected_id=default_models.default_speech_to_text_model,
-    help="This is the default model for converting speech to text (audio transcriptions, etc)",
-    model_type="speech_to_text",
-    key="default_speech_to_text_model",
-)
+    with col2:
+        handle_default_selection(
+            "Transformation Model",
+            "default_transformation_model",
+            default_models.default_transformation_model,
+            "Used for summaries, insights, etc.",
+            "language",
+            "Can use cheaper models: gpt-4o-mini, llama3, gemma3, etc.",
+        )

-if selected_model:
-    default_models.default_speech_to_text_model = selected_model.id
+        handle_default_selection(
+            "Large Context Model",
+            "large_context_model",
+            default_models.large_context_model,
+            "Used for large context processing",
+            "language",
+            "Recommended: Gemini models",
+        )

-st.divider()
-# Handle embedding model selection
-selected_model = model_selector(
-    "Default Embedding Model",
-    "default_embedding_model",
-    selected_id=default_models.default_embedding_model,
-    help="This is the default model for embeddings (semantic search, etc)",
-    model_type="embedding",
-)
-if selected_model:
-    default_models.default_embedding_model = selected_model.id
-st.warning(
-    "Caution: you cannot change the embedding model once there is embeddings or they will need to be regenerated"
-)
+# Embedding Models Section
+st.subheader("🔍 Embedding Models")
+with st.container(border=True):
+    col1, col2 = st.columns([2, 1])

-for k, v in defs.items():
-    if v:
-        defs[k] = v.id
+    with col1:
+        st.markdown("**Configured Models**")
+        embedding_models = models_by_type["embedding"]
+        if embedding_models:
+            for model in embedding_models:
+                subcol1, subcol2 = st.columns([4, 1])
+                with subcol1:
+                    st.markdown(f"• {model.provider}/{model.name}")
+                with subcol2:
+                    if st.button(
+                        "🗑️", key=f"delete_emb_{model.id}", help="Delete model"
+                    ):
+                        model.delete()
+                        st.rerun()
+        else:
+            st.info("No embedding models configured")

-if st.button("Save Defaults"):
-    default_models.patch(defs)
-    model_manager.refresh_defaults()
-    st.success("Saved")
+        handle_default_selection(
+            "Default Embedding Model",
+            "default_embedding_model",
+            default_models.default_embedding_model,
+            "Used for semantic search and embeddings",
+            "embedding",
+        )
+        st.warning("⚠️ Changing embedding models requires regenerating all embeddings")
+
+    with col2:
+        add_model_form("embedding", "main")
+
+# Text-to-Speech Models Section
+st.subheader("🎙️ Text-to-Speech Models")
+with st.container(border=True):
+    col1, col2 = st.columns([2, 1])
+
+    with col1:
+        st.markdown("**Configured Models**")
+        tts_models = models_by_type["text_to_speech"]
+        if tts_models:
+            for model in tts_models:
+                subcol1, subcol2 = st.columns([4, 1])
+                with subcol1:
+                    st.markdown(f"• {model.provider}/{model.name}")
+                with subcol2:
+                    if st.button(
+                        "🗑️", key=f"delete_tts_{model.id}", help="Delete model"
+                    ):
+                        model.delete()
+                        st.rerun()
+        else:
+            st.info("No text-to-speech models configured")
+
+        handle_default_selection(
+            "Default TTS Model",
+            "default_text_to_speech_model",
+            default_models.default_text_to_speech_model,
+            "Used for podcasts and audio generation",
+            "text_to_speech",
+            "Can be overridden per podcast",
+        )
+
+    with col2:
+        add_model_form("text_to_speech", "main")
+
+# Speech-to-Text Models Section
+st.subheader("🎤 Speech-to-Text Models")
+with st.container(border=True):
+    col1, col2 = st.columns([2, 1])
+
+    with col1:
+        st.markdown("**Configured Models**")
+        stt_models = models_by_type["speech_to_text"]
+        if stt_models:
+            for model in stt_models:
+                subcol1, subcol2 = st.columns([4, 1])
+                with subcol1:
+                    st.markdown(f"• {model.provider}/{model.name}")
+                with subcol2:
+                    if st.button(
+                        "🗑️", key=f"delete_stt_{model.id}", help="Delete model"
+                    ):
+                        model.delete()
+                        st.rerun()
+        else:
+            st.info("No speech-to-text models configured")
+
+        handle_default_selection(
+            "Default STT Model",
+            "default_speech_to_text_model",
+            default_models.default_speech_to_text_model,
+            "Used for audio transcriptions",
+            "speech_to_text",
+        )
+
+    with col2:
+        add_model_form("speech_to_text", "main")
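
`handle_default_selection` writes to `DefaultModels` only when the selected id differs from the stored one. It also clears the slot when the selector comes back empty. The same save-only-on-change rule can be sketched without Streamlit as a pure function that compares stored defaults against a desired assignment, such as one of the recommended combinations in `docs/models.md`. This is a sketch, not the page's implementation. The slot names are the `DefaultModels` fields used on this page. `pending_updates`, `OPENAI_ONLY`, and the `"provider/name"` strings are hypothetical placeholders for real model record ids.

```python
from typing import Optional

# DefaultModels fields that this page assigns.
DEFAULT_SLOTS = (
    "default_chat_model",
    "default_tools_model",
    "default_transformation_model",
    "large_context_model",
    "default_embedding_model",
    "default_text_to_speech_model",
    "default_speech_to_text_model",
)


def pending_updates(
    current: dict[str, Optional[str]], desired: dict[str, Optional[str]]
) -> dict[str, Optional[str]]:
    """Return only the slots whose stored value would change.

    A desired value of None means "clear this default", mirroring the
    page's branch that resets a slot when nothing is selected.
    """
    updates: dict[str, Optional[str]] = {}
    for slot in DEFAULT_SLOTS:
        if slot in desired and desired[slot] != current.get(slot):
            updates[slot] = desired[slot]
    return updates


# "Single Provider (OpenAI)" combination from docs/models.md, with
# hypothetical "provider/name" strings standing in for record ids.
OPENAI_ONLY = {
    "default_chat_model": "openai/gpt-4o-mini",
    "default_tools_model": "openai/gpt-4o",
    "default_transformation_model": "openai/gpt-4o-mini",
    "default_embedding_model": "openai/text-embedding-3-small",
    "default_text_to_speech_model": "openai/tts-1",
    "default_speech_to_text_model": "openai/whisper-1",
}

if __name__ == "__main__":
    stored = {"default_chat_model": "openai/gpt-4o-mini", "large_context_model": None}
    print(pending_updates(stored, OPENAI_ONLY))
```

To apply the result, you would set each returned field on `DefaultModels` and call its update method, as the page does after a selection changes.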