openai-agents-python/docs/ja/voice/quickstart.md
Kazuhiro Sera 360f173b73
Evolve the doc translation workflow by using gpt-4.1 (#507)
This pull request enhances the document translation workflow by
switching to the new GPT-4.1 model. The generator script’s prompt now
includes a “workflow” section that guides the model to iterate
self-reviews on its outputs to autonomously achieve the highest quality.
This addition has noticeably improved the naturalness and consistency of
the wording in the translated outputs.
2025-04-14 22:04:07 -04:00

194 lines
No EOL
5.6 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# クイックスタート
## 前提条件
Agents SDK の[クイックスタート手順](../quickstart.md)に従い、仮想環境をセットアップしてください。その後、SDK からオプションの音声依存関係をインストールします。
```bash
pip install 'openai-agents[voice]'
```
## コンセプト
主なコンセプトは [`VoicePipeline`][agents.voice.pipeline.VoicePipeline] です。これは 3 ステップのプロセスです:
1. 音声をテキストに変換する音声認識モデルspeech-to-textを実行します。
2. 通常はエージェント的なワークフローであるあなたのコードを実行し、結果を生成します。
3. 結果のテキストを音声に戻す音声合成モデルtext-to-speechを実行します。
```mermaid
graph LR
%% Input
A["🎤 Audio Input"]
%% Voice Pipeline
subgraph Voice_Pipeline [Voice Pipeline]
direction TB
B["Transcribe (speech-to-text)"]
C["Your Code"]:::highlight
D["Text-to-speech"]
B --> C --> D
end
%% Output
E["🎧 Audio Output"]
%% Flow
A --> Voice_Pipeline
Voice_Pipeline --> E
%% Custom styling
classDef highlight fill:#ffcc66,stroke:#333,stroke-width:1px,font-weight:700;
```
## エージェント
まず、いくつかのエージェントをセットアップしましょう。この SDK でエージェントを作成したことがあれば、馴染みがあるはずです。ここでは、複数のエージェント、ハンドオフ、ツールを用意します。
```python
import asyncio
import random
from agents import (
Agent,
function_tool,
)
from agents.extensions.handoff_prompt import prompt_with_handoff_instructions
@function_tool
def get_weather(city: str) -> str:
"""Get the weather for a given city."""
print(f"[debug] get_weather called with city: {city}")
choices = ["sunny", "cloudy", "rainy", "snowy"]
return f"The weather in {city} is {random.choice(choices)}."
spanish_agent = Agent(
name="Spanish",
handoff_description="A spanish speaking agent.",
instructions=prompt_with_handoff_instructions(
"You're speaking to a human, so be polite and concise. Speak in Spanish.",
),
model="gpt-4o-mini",
)
agent = Agent(
name="Assistant",
instructions=prompt_with_handoff_instructions(
"You're speaking to a human, so be polite and concise. If the user speaks in Spanish, handoff to the spanish agent.",
),
model="gpt-4o-mini",
handoffs=[spanish_agent],
tools=[get_weather],
)
```
## 音声パイプライン
[`SingleAgentVoiceWorkflow`][agents.voice.workflow.SingleAgentVoiceWorkflow] をワークフローとして使い、シンプルな音声パイプラインをセットアップします。
```python
from agents.voice import SingleAgentVoiceWorkflow, VoicePipeline
pipeline = VoicePipeline(workflow=SingleAgentVoiceWorkflow(agent))
```
## パイプラインの実行
```python
import numpy as np
import sounddevice as sd
from agents.voice import AudioInput
# For simplicity, we'll just create 3 seconds of silence
# In reality, you'd get microphone data
buffer = np.zeros(24000 * 3, dtype=np.int16)
audio_input = AudioInput(buffer=buffer)
result = await pipeline.run(audio_input)
# Create an audio player using `sounddevice`
player = sd.OutputStream(samplerate=24000, channels=1, dtype=np.int16)
player.start()
# Play the audio stream as it comes in
async for event in result.stream():
if event.type == "voice_stream_event_audio":
player.write(event.data)
```
## すべてをまとめる
```python
import asyncio
import random
import numpy as np
import sounddevice as sd
from agents import (
Agent,
function_tool,
set_tracing_disabled,
)
from agents.voice import (
AudioInput,
SingleAgentVoiceWorkflow,
VoicePipeline,
)
from agents.extensions.handoff_prompt import prompt_with_handoff_instructions
@function_tool
def get_weather(city: str) -> str:
"""Get the weather for a given city."""
print(f"[debug] get_weather called with city: {city}")
choices = ["sunny", "cloudy", "rainy", "snowy"]
return f"The weather in {city} is {random.choice(choices)}."
spanish_agent = Agent(
name="Spanish",
handoff_description="A spanish speaking agent.",
instructions=prompt_with_handoff_instructions(
"You're speaking to a human, so be polite and concise. Speak in Spanish.",
),
model="gpt-4o-mini",
)
agent = Agent(
name="Assistant",
instructions=prompt_with_handoff_instructions(
"You're speaking to a human, so be polite and concise. If the user speaks in Spanish, handoff to the spanish agent.",
),
model="gpt-4o-mini",
handoffs=[spanish_agent],
tools=[get_weather],
)
async def main():
pipeline = VoicePipeline(workflow=SingleAgentVoiceWorkflow(agent))
buffer = np.zeros(24000 * 3, dtype=np.int16)
audio_input = AudioInput(buffer=buffer)
result = await pipeline.run(audio_input)
# Create an audio player using `sounddevice`
player = sd.OutputStream(samplerate=24000, channels=1, dtype=np.int16)
player.start()
# Play the audio stream as it comes in
async for event in result.stream():
if event.type == "voice_stream_event_audio":
player.write(event.data)
if __name__ == "__main__":
asyncio.run(main())
```
この例を実行すると、エージェントがあなたに話しかけます![examples/voice/static](https://github.com/openai/openai-agents-python/tree/main/examples/voice/static) のコード例をチェックして、実際にエージェントと会話できるデモをご覧ください。