This pull request introduces the following changes:

1. **Exclude translated pages from search**: I explored ways to make the search plugin work with the i18n plugin, but doing so would require extensive custom JavaScript hacks. I'm holding off on that work for now and excluding the translated pages from search instead.
2. **Switch from GPT-4.1 to o3 for better translation quality**: GPT-4.1 performs well, but o3 produces higher-quality translations for this task, and there is no reason to avoid using it.

---
search:
  exclude: true
---
# Quickstart

## Prerequisites

First, follow the [quickstart instructions](../quickstart.md) to set up the Agents SDK and create a virtual environment. Then install the SDK's optional voice dependencies:

```bash
pip install 'openai-agents[voice]'
```

## Concepts

The main concept to know is [`VoicePipeline`][agents.voice.pipeline.VoicePipeline], which is a three-step process:

1. Run a speech-to-text model to convert audio into text.
2. Run your code, which is usually an agentic workflow, to produce a result.
3. Run a text-to-speech model to convert the resulting text back into audio.

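The three steps above can be sketched in plain Python. This only illustrates the control flow and is not how `VoicePipeline` is implemented: `fake_transcribe`, `run_workflow`, `fake_synthesize`, and `toy_voice_pipeline` are hypothetical stand-ins invented for this example, and no real model is called.

```python
import asyncio


async def fake_transcribe(audio: bytes) -> str:
    """Hypothetical stand-in for a speech-to-text model."""
    return audio.decode("utf-8")


async def run_workflow(text: str) -> str:
    """Hypothetical stand-in for your code (usually an agentic workflow)."""
    return f"You said: {text}"


async def fake_synthesize(text: str) -> bytes:
    """Hypothetical stand-in for a text-to-speech model."""
    return text.encode("utf-8")


async def toy_voice_pipeline(audio: bytes) -> bytes:
    text = await fake_transcribe(audio)   # step 1: audio -> text
    reply = await run_workflow(text)      # step 2: text -> result
    return await fake_synthesize(reply)   # step 3: result -> audio


if __name__ == "__main__":
    print(asyncio.run(toy_voice_pipeline(b"hello")))
```

Only the middle step is yours to write; the real pipeline supplies the two model calls around it.
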
```mermaid
graph LR
    %% Input
    A["🎤 Audio Input"]

    %% Voice Pipeline
    subgraph Voice_Pipeline [Voice Pipeline]
        direction TB
        B["Transcribe (speech-to-text)"]
        C["Your Code"]:::highlight
        D["Text-to-speech"]
        B --> C --> D
    end

    %% Output
    E["🎧 Audio Output"]

    %% Flow
    A --> Voice_Pipeline
    Voice_Pipeline --> E

    %% Custom styling
    classDef highlight fill:#ffcc66,stroke:#333,stroke-width:1px,font-weight:700;

```

## Agents

First, let's set up some agents. This should look familiar if you have built agents with this SDK before. Here we define a couple of agents, a handoff, and a tool.

```python
import asyncio
import random

from agents import (
    Agent,
    function_tool,
)
from agents.extensions.handoff_prompt import prompt_with_handoff_instructions


@function_tool
def get_weather(city: str) -> str:
    """Get the weather for a given city."""
    print(f"[debug] get_weather called with city: {city}")
    choices = ["sunny", "cloudy", "rainy", "snowy"]
    return f"The weather in {city} is {random.choice(choices)}."


spanish_agent = Agent(
    name="Spanish",
    handoff_description="A spanish speaking agent.",
    instructions=prompt_with_handoff_instructions(
        "You're speaking to a human, so be polite and concise. Speak in Spanish.",
    ),
    model="gpt-4o-mini",
)

agent = Agent(
    name="Assistant",
    instructions=prompt_with_handoff_instructions(
        "You're speaking to a human, so be polite and concise. If the user speaks in Spanish, handoff to the spanish agent.",
    ),
    model="gpt-4o-mini",
    handoffs=[spanish_agent],
    tools=[get_weather],
)
```

## Voice pipeline

We'll build a simple voice pipeline, using [`SingleAgentVoiceWorkflow`][agents.voice.workflow.SingleAgentVoiceWorkflow] as the workflow.

```python
from agents.voice import SingleAgentVoiceWorkflow, VoicePipeline
pipeline = VoicePipeline(workflow=SingleAgentVoiceWorkflow(agent))
```

## Run the pipeline

```python
import numpy as np
import sounddevice as sd
from agents.voice import AudioInput

# Note: the `await` and `async for` statements below must run inside an async function.

# For simplicity, we'll just create 3 seconds of silence
# In reality, you'd get microphone data
buffer = np.zeros(24000 * 3, dtype=np.int16)
audio_input = AudioInput(buffer=buffer)

result = await pipeline.run(audio_input)

# Create an audio player using `sounddevice`
player = sd.OutputStream(samplerate=24000, channels=1, dtype=np.int16)
player.start()

# Play the audio stream as it comes in
async for event in result.stream():
    if event.type == "voice_stream_event_audio":
        player.write(event.data)
```

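The silent buffer above holds `24000 * 3` signed 16-bit samples, which is 3 seconds of mono audio at 24 kHz. As a rough sketch of what a non-silent buffer of the same shape looks like, the helper below generates a sine tone using only the standard library. The name `sine_tone_pcm16`, the 440 Hz frequency, and the 0.3 amplitude are assumptions chosen for illustration. A tone is not speech, so this is only useful for checking buffer sizes and formats.

```python
import math
from array import array


def sine_tone_pcm16(
    seconds: float,
    sample_rate: int = 24000,   # matches the sample rate used above
    frequency: float = 440.0,   # assumed test frequency
    amplitude: float = 0.3,     # assumed fraction of full scale
) -> array:
    """Return a mono sine tone as signed 16-bit samples (typecode 'h')."""
    n_samples = int(sample_rate * seconds)
    peak = int(amplitude * 32767)  # scale into the int16 range
    return array(
        "h",
        (
            int(peak * math.sin(2 * math.pi * frequency * i / sample_rate))
            for i in range(n_samples)
        ),
    )


tone = sine_tone_pcm16(seconds=3)
print(len(tone), tone.itemsize)  # sample count and bytes per sample
```

To use such a buffer with `AudioInput`, you would presumably convert it to a NumPy array first, for example with `np.asarray(tone, dtype=np.int16)`.
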
## Put it all together

```python
import asyncio
import random

import numpy as np
import sounddevice as sd

from agents import (
    Agent,
    function_tool,
    set_tracing_disabled,
)
from agents.voice import (
    AudioInput,
    SingleAgentVoiceWorkflow,
    VoicePipeline,
)
from agents.extensions.handoff_prompt import prompt_with_handoff_instructions


@function_tool
def get_weather(city: str) -> str:
    """Get the weather for a given city."""
    print(f"[debug] get_weather called with city: {city}")
    choices = ["sunny", "cloudy", "rainy", "snowy"]
    return f"The weather in {city} is {random.choice(choices)}."


spanish_agent = Agent(
    name="Spanish",
    handoff_description="A spanish speaking agent.",
    instructions=prompt_with_handoff_instructions(
        "You're speaking to a human, so be polite and concise. Speak in Spanish.",
    ),
    model="gpt-4o-mini",
)

agent = Agent(
    name="Assistant",
    instructions=prompt_with_handoff_instructions(
        "You're speaking to a human, so be polite and concise. If the user speaks in Spanish, handoff to the spanish agent.",
    ),
    model="gpt-4o-mini",
    handoffs=[spanish_agent],
    tools=[get_weather],
)


async def main():
    pipeline = VoicePipeline(workflow=SingleAgentVoiceWorkflow(agent))
    buffer = np.zeros(24000 * 3, dtype=np.int16)
    audio_input = AudioInput(buffer=buffer)

    result = await pipeline.run(audio_input)

    # Create an audio player using `sounddevice`
    player = sd.OutputStream(samplerate=24000, channels=1, dtype=np.int16)
    player.start()

    # Play the audio stream as it comes in
    async for event in result.stream():
        if event.type == "voice_stream_event_audio":
            player.write(event.data)


if __name__ == "__main__":
    asyncio.run(main())
```

If you run this example, the agent will speak to you. For a demo where you can actually talk to the agent, see [examples/voice/static](https://github.com/openai/openai-agents-python/tree/main/examples/voice/static).