This pull request improves the translation pipeline introduced in #460. Document generation now works well with the gpt-4o model.
# Quickstart

## Prerequisites

Make sure you've followed the base [quickstart instructions](../quickstart.md) for the Agents SDK and set up a virtual environment. Then install the optional voice dependencies from the SDK:

```bash
pip install 'openai-agents[voice]'
```

## Concepts

The main concept to know about is [`VoicePipeline`][agents.voice.pipeline.VoicePipeline], which is a three-step process:

1. Run a speech-to-text model to turn audio into text.
2. Run your code, which is usually an agentic workflow, to produce a result.
3. Run a text-to-speech model to turn the result text back into audio.

```mermaid
graph LR
    %% Input
    A["🎤 Audio Input"]

    %% Voice Pipeline
    subgraph Voice_Pipeline [Voice Pipeline]
        direction TB
        B["Transcribe (speech-to-text)"]
        C["Your Code"]:::highlight
        D["Text-to-speech"]
        B --> C --> D
    end

    %% Output
    E["🎧 Audio Output"]

    %% Flow
    A --> Voice_Pipeline
    Voice_Pipeline --> E

    %% Custom styling
    classDef highlight fill:#ffcc66,stroke:#333,stroke-width:1px,font-weight:700;
```

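To make the three steps concrete, here is a minimal sketch of the same flow in plain Python. It does not use the SDK: `run_voice_turn` and the three `fake_*` functions are hypothetical stand-ins invented for illustration, so the snippet runs without any models or audio hardware.

```python
from typing import Callable


def run_voice_turn(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    respond: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> bytes:
    """Run one turn through the three stages, in order."""
    text_in = transcribe(audio)    # 1. speech-to-text
    text_out = respond(text_in)    # 2. your code (usually an agentic workflow)
    return synthesize(text_out)    # 3. text-to-speech


# Hypothetical stand-ins: they treat the "audio" as UTF-8 text so the
# sketch can run anywhere. Real stages would call actual models.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_respond(text: str) -> str:
    return f"You said: {text}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


reply = run_voice_turn(b"hello", fake_transcribe, fake_respond, fake_synthesize)
print(reply)  # b'You said: hello'
```

Only the middle stage is yours to write, which is why the diagram highlights "Your Code".
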
## Agents

First, let's set up some agents. This should feel familiar if you've built any agents with this SDK. We'll have a couple of agents, a handoff, and a tool.

```python
import asyncio
import random

from agents import (
    Agent,
    function_tool,
)
from agents.extensions.handoff_prompt import prompt_with_handoff_instructions


@function_tool
def get_weather(city: str) -> str:
    """Get the weather for a given city."""
    print(f"[debug] get_weather called with city: {city}")
    choices = ["sunny", "cloudy", "rainy", "snowy"]
    return f"The weather in {city} is {random.choice(choices)}."


spanish_agent = Agent(
    name="Spanish",
    handoff_description="A spanish speaking agent.",
    instructions=prompt_with_handoff_instructions(
        "You're speaking to a human, so be polite and concise. Speak in Spanish.",
    ),
    model="gpt-4o-mini",
)

agent = Agent(
    name="Assistant",
    instructions=prompt_with_handoff_instructions(
        "You're speaking to a human, so be polite and concise. If the user speaks in Spanish, handoff to the spanish agent.",
    ),
    model="gpt-4o-mini",
    handoffs=[spanish_agent],
    tools=[get_weather],
)
```

## Voice pipeline

We'll set up a simple voice pipeline, using [`SingleAgentVoiceWorkflow`][agents.voice.workflow.SingleAgentVoiceWorkflow] as the workflow.

```python
from agents.voice import SingleAgentVoiceWorkflow, VoicePipeline

pipeline = VoicePipeline(workflow=SingleAgentVoiceWorkflow(agent))
```

## Run the pipeline

```python
import numpy as np
import sounddevice as sd
from agents.voice import AudioInput

# For simplicity, we'll just create 3 seconds of silence
# In reality, you'd get microphone data
buffer = np.zeros(24000 * 3, dtype=np.int16)
audio_input = AudioInput(buffer=buffer)

# `await` and `async for` must run inside an async function
# (see `main()` in the full example below)
result = await pipeline.run(audio_input)

# Create an audio player using `sounddevice`
player = sd.OutputStream(samplerate=24000, channels=1, dtype=np.int16)
player.start()

# Play the audio stream as it comes in
async for event in result.stream():
    if event.type == "voice_stream_event_audio":
        player.write(event.data)
```

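The playback loop above filters the stream by event type and forwards only audio chunks. The sketch below isolates that pattern so it can be run without the SDK or a sound card. `FakeEvent`, `fake_stream`, and the `"other_event"` type are hypothetical stand-ins; only the `"voice_stream_event_audio"` string comes from the example above.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator


@dataclass
class FakeEvent:
    """Hypothetical stand-in for a streamed pipeline event."""

    type: str
    data: bytes = b""


async def fake_stream() -> AsyncIterator[FakeEvent]:
    """Yield a mix of audio and non-audio events (made-up data)."""
    yield FakeEvent("other_event")
    yield FakeEvent("voice_stream_event_audio", b"\x01\x02")
    yield FakeEvent("voice_stream_event_audio", b"\x03\x04")
    yield FakeEvent("other_event")


async def collect_audio(events: AsyncIterator[FakeEvent]) -> bytes:
    """Keep only the audio chunks, in arrival order."""
    chunks = []
    async for event in events:
        if event.type == "voice_stream_event_audio":
            chunks.append(event.data)
    return b"".join(chunks)


audio = asyncio.run(collect_audio(fake_stream()))
print(audio)  # b'\x01\x02\x03\x04'
```

In the real loop, `player.write(event.data)` takes the place of `chunks.append(...)`, so audio is played as it arrives rather than collected.
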
## Put it all together

```python
import asyncio
import random

import numpy as np
import sounddevice as sd

from agents import (
    Agent,
    function_tool,
    set_tracing_disabled,
)
from agents.voice import (
    AudioInput,
    SingleAgentVoiceWorkflow,
    VoicePipeline,
)
from agents.extensions.handoff_prompt import prompt_with_handoff_instructions


@function_tool
def get_weather(city: str) -> str:
    """Get the weather for a given city."""
    print(f"[debug] get_weather called with city: {city}")
    choices = ["sunny", "cloudy", "rainy", "snowy"]
    return f"The weather in {city} is {random.choice(choices)}."


spanish_agent = Agent(
    name="Spanish",
    handoff_description="A spanish speaking agent.",
    instructions=prompt_with_handoff_instructions(
        "You're speaking to a human, so be polite and concise. Speak in Spanish.",
    ),
    model="gpt-4o-mini",
)

agent = Agent(
    name="Assistant",
    instructions=prompt_with_handoff_instructions(
        "You're speaking to a human, so be polite and concise. If the user speaks in Spanish, handoff to the spanish agent.",
    ),
    model="gpt-4o-mini",
    handoffs=[spanish_agent],
    tools=[get_weather],
)


async def main():
    pipeline = VoicePipeline(workflow=SingleAgentVoiceWorkflow(agent))
    buffer = np.zeros(24000 * 3, dtype=np.int16)
    audio_input = AudioInput(buffer=buffer)

    result = await pipeline.run(audio_input)

    # Create an audio player using `sounddevice`
    player = sd.OutputStream(samplerate=24000, channels=1, dtype=np.int16)
    player.start()

    # Play the audio stream as it comes in
    async for event in result.stream():
        if event.type == "voice_stream_event_audio":
            player.write(event.data)


if __name__ == "__main__":
    asyncio.run(main())
```

If you run this example, the agent will speak to you! Check out the example in [examples/voice/static](https://github.com/openai/openai-agents-python/tree/main/examples/voice/static) to see a demo where you can speak to the agent yourself.
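
The examples above feed the pipeline `24000 * 3` zero-valued `int16` samples, that is, three seconds of mono 16-bit audio at 24 kHz. As a rough sketch of that buffer layout, the snippet below builds a non-silent test tone of the same shape using only the standard library. `sine_pcm16` is a hypothetical helper, not part of the SDK, and a pure tone is not speech, so it is only useful for checking buffer sizes and playback.

```python
import math
from array import array

SAMPLE_RATE = 24_000  # samples per second, matching the buffer in the examples


def sine_pcm16(frequency_hz: float, seconds: float, amplitude: float = 0.3) -> array:
    """Return mono PCM samples for a sine tone as signed shorts."""
    n_samples = int(SAMPLE_RATE * seconds)
    peak = int(amplitude * 32767)  # scale into the int16 range
    return array(
        "h",  # signed short: 16 bits on common platforms
        (
            int(peak * math.sin(2 * math.pi * frequency_hz * i / SAMPLE_RATE))
            for i in range(n_samples)
        ),
    )


tone = sine_pcm16(440.0, 3.0)
print(len(tone))  # 72000
```

If NumPy is available, `np.frombuffer(tone, dtype=np.int16)` should give an array with the same dtype and length as the `np.zeros(24000 * 3, dtype=np.int16)` buffer used above.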