openai-agents-python/docs/ja/voice/quickstart.md
Kazuhiro Sera 5639606163
Docs: Switch to o3 model; exclude translated pages from search (#533)
This pull request introduces the following changes:
1. **Exclude translated pages from search**: I explored ways to make the
search plugin work with the i18n plugin, but it would require extensive
custom JavaScript hacks. So for now, I’m holding off on this work.
2. **Switch from GPT-4.1 to o3 for even better translation quality**:
While 4.1 performs well, o3 shows even greater quality for this task,
and there’s no reason to avoid using it.
2025-04-16 21:29:09 -04:00

198 lines
No EOL
5.6 KiB
Markdown
Raw Permalink Blame History

This file contains invisible Unicode characters

This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

---
search:
exclude: true
---
# クイックスタート
## 前提条件
まずは [クイックスタート手順](../quickstart.md) に従って Agents SDK をセットアップし、仮想環境を作成してください。その後、SDK の音声関連のオプション依存関係をインストールします:
```bash
pip install 'openai-agents[voice]'
```
## コンセプト
押さえておくべき主な概念は [`VoicePipeline`][agents.voice.pipeline.VoicePipeline] です。これは次の 3 ステップから成るプロセスです。
1. speech-to-text モデルを実行して音声をテキストに変換します。
2. 通常はエージェント的ワークフローであるあなたのコードを実行し、結果を生成します。
3. text-to-speech モデルを実行して結果のテキストを再び音声に変換します。
```mermaid
graph LR
%% Input
A["🎤 Audio Input"]
%% Voice Pipeline
subgraph Voice_Pipeline [Voice Pipeline]
direction TB
B["Transcribe (speech-to-text)"]
C["Your Code"]:::highlight
D["Text-to-speech"]
B --> C --> D
end
%% Output
E["🎧 Audio Output"]
%% Flow
A --> Voice_Pipeline
Voice_Pipeline --> E
%% Custom styling
classDef highlight fill:#ffcc66,stroke:#333,stroke-width:1px,font-weight:700;
```
## エージェント
まず、いくつかの エージェント をセットアップしましょう。この SDK でエージェントを構築したことがあれば、見覚えがあるはずです。ここでは複数の エージェント、ハンドオフ、そしてツールを用意します。
```python
import asyncio
import random
from agents import (
Agent,
function_tool,
)
from agents.extensions.handoff_prompt import prompt_with_handoff_instructions
@function_tool
def get_weather(city: str) -> str:
"""Get the weather for a given city."""
print(f"[debug] get_weather called with city: {city}")
choices = ["sunny", "cloudy", "rainy", "snowy"]
return f"The weather in {city} is {random.choice(choices)}."
spanish_agent = Agent(
name="Spanish",
handoff_description="A spanish speaking agent.",
instructions=prompt_with_handoff_instructions(
"You're speaking to a human, so be polite and concise. Speak in Spanish.",
),
model="gpt-4o-mini",
)
agent = Agent(
name="Assistant",
instructions=prompt_with_handoff_instructions(
"You're speaking to a human, so be polite and concise. If the user speaks in Spanish, handoff to the spanish agent.",
),
model="gpt-4o-mini",
handoffs=[spanish_agent],
tools=[get_weather],
)
```
## 音声パイプライン
[`SingleAgentVoiceWorkflow`][agents.voice.workflow.SingleAgentVoiceWorkflow] をワークフローとして、シンプルな音声パイプラインを構築します。
```python
from agents.voice import SingleAgentVoiceWorkflow, VoicePipeline
pipeline = VoicePipeline(workflow=SingleAgentVoiceWorkflow(agent))
```
## パイプラインの実行
```python
import numpy as np
import sounddevice as sd
from agents.voice import AudioInput
# For simplicity, we'll just create 3 seconds of silence
# In reality, you'd get microphone data
buffer = np.zeros(24000 * 3, dtype=np.int16)
audio_input = AudioInput(buffer=buffer)
result = await pipeline.run(audio_input)
# Create an audio player using `sounddevice`
player = sd.OutputStream(samplerate=24000, channels=1, dtype=np.int16)
player.start()
# Play the audio stream as it comes in
async for event in result.stream():
if event.type == "voice_stream_event_audio":
player.write(event.data)
```
## まとめて実行
```python
import asyncio
import random
import numpy as np
import sounddevice as sd
from agents import (
Agent,
function_tool,
set_tracing_disabled,
)
from agents.voice import (
AudioInput,
SingleAgentVoiceWorkflow,
VoicePipeline,
)
from agents.extensions.handoff_prompt import prompt_with_handoff_instructions
@function_tool
def get_weather(city: str) -> str:
"""Get the weather for a given city."""
print(f"[debug] get_weather called with city: {city}")
choices = ["sunny", "cloudy", "rainy", "snowy"]
return f"The weather in {city} is {random.choice(choices)}."
spanish_agent = Agent(
name="Spanish",
handoff_description="A spanish speaking agent.",
instructions=prompt_with_handoff_instructions(
"You're speaking to a human, so be polite and concise. Speak in Spanish.",
),
model="gpt-4o-mini",
)
agent = Agent(
name="Assistant",
instructions=prompt_with_handoff_instructions(
"You're speaking to a human, so be polite and concise. If the user speaks in Spanish, handoff to the spanish agent.",
),
model="gpt-4o-mini",
handoffs=[spanish_agent],
tools=[get_weather],
)
async def main():
pipeline = VoicePipeline(workflow=SingleAgentVoiceWorkflow(agent))
buffer = np.zeros(24000 * 3, dtype=np.int16)
audio_input = AudioInput(buffer=buffer)
result = await pipeline.run(audio_input)
# Create an audio player using `sounddevice`
player = sd.OutputStream(samplerate=24000, channels=1, dtype=np.int16)
player.start()
# Play the audio stream as it comes in
async for event in result.stream():
if event.type == "voice_stream_event_audio":
player.write(event.data)
if __name__ == "__main__":
asyncio.run(main())
```
この例を実行すると、エージェントがあなたに話しかけます。実際にエージェントと会話できるデモは、[examples/voice/static](https://github.com/openai/openai-agents-python/tree/main/examples/voice/static) をご覧ください。