This pull request improves the document translation workflow by switching to the new GPT-4.1 model. The generator script’s prompt now includes a “workflow” section that instructs the model to review and revise its own output iteratively before returning it. With this addition, the wording of the translated documents reads more naturally and is more consistent.
# Quickstart

## Prerequisites

Follow the [quickstart instructions](../quickstart.md) for the Agents SDK and set up a virtual environment. Then install the optional voice dependencies from the SDK:

```bash
pip install 'openai-agents[voice]'
```

## Concepts

The main concept is the [`VoicePipeline`][agents.voice.pipeline.VoicePipeline], which is a three-step process:

1. Run a speech-to-text model to turn audio into text.
2. Run your code, which is usually an agentic workflow, to produce a result.
3. Run a text-to-speech model to turn the result text back into audio.

```mermaid
graph LR
    %% Input
    A["🎤 Audio Input"]

    %% Voice Pipeline
    subgraph Voice_Pipeline [Voice Pipeline]
        direction TB
        B["Transcribe (speech-to-text)"]
        C["Your Code"]:::highlight
        D["Text-to-speech"]
        B --> C --> D
    end

    %% Output
    E["🎧 Audio Output"]

    %% Flow
    A --> Voice_Pipeline
    Voice_Pipeline --> E

    %% Custom styling
    classDef highlight fill:#ffcc66,stroke:#333,stroke-width:1px,font-weight:700;
```
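
The three steps can be sketched as a chain of async callables. This is only a toy illustration: `ToyVoicePipeline` and the `fake_*` stages are hypothetical names invented here, not part of the SDK, and the stubs simply pass bytes and text through so the sketch runs without any model or audio hardware.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class ToyVoicePipeline:
    """Hypothetical stand-in that chains the three stages; not the SDK's VoicePipeline."""

    transcribe: Callable[[bytes], Awaitable[str]]  # step 1: speech-to-text
    workflow: Callable[[str], Awaitable[str]]      # step 2: your code
    synthesize: Callable[[str], Awaitable[bytes]]  # step 3: text-to-speech

    async def run(self, audio: bytes) -> bytes:
        text = await self.transcribe(audio)
        reply = await self.workflow(text)
        return await self.synthesize(reply)


# Stub stages (assumptions): they treat the "audio" as UTF-8 text.
async def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


async def fake_workflow(text: str) -> str:
    return f"You said: {text}"


async def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


toy_pipeline = ToyVoicePipeline(fake_transcribe, fake_workflow, fake_synthesize)
reply_audio = asyncio.run(toy_pipeline.run(b"hello"))
print(reply_audio)  # b'You said: hello'
```

Only the middle stage is the part you write yourself; in the real pipeline the first and last stages are handled by speech models.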

## Agents

First, let's set up some agents. This should feel familiar if you have built agents with this SDK before. Here we define two agents, a handoff, and a tool.

```python
import asyncio
import random

from agents import (
    Agent,
    function_tool,
)
from agents.extensions.handoff_prompt import prompt_with_handoff_instructions


@function_tool
def get_weather(city: str) -> str:
    """Get the weather for a given city."""
    print(f"[debug] get_weather called with city: {city}")
    choices = ["sunny", "cloudy", "rainy", "snowy"]
    return f"The weather in {city} is {random.choice(choices)}."


spanish_agent = Agent(
    name="Spanish",
    handoff_description="A spanish speaking agent.",
    instructions=prompt_with_handoff_instructions(
        "You're speaking to a human, so be polite and concise. Speak in Spanish.",
    ),
    model="gpt-4o-mini",
)

agent = Agent(
    name="Assistant",
    instructions=prompt_with_handoff_instructions(
        "You're speaking to a human, so be polite and concise. If the user speaks in Spanish, handoff to the spanish agent.",
    ),
    model="gpt-4o-mini",
    handoffs=[spanish_agent],
    tools=[get_weather],
)
```

## Voice pipeline

We'll set up a simple voice pipeline, using [`SingleAgentVoiceWorkflow`][agents.voice.workflow.SingleAgentVoiceWorkflow] as the workflow.

```python
from agents.voice import SingleAgentVoiceWorkflow, VoicePipeline

pipeline = VoicePipeline(workflow=SingleAgentVoiceWorkflow(agent))
```

## Run the pipeline

```python
import numpy as np
import sounddevice as sd
from agents.voice import AudioInput

# For simplicity, we'll just create 3 seconds of silence
# In reality, you'd get microphone data
buffer = np.zeros(24000 * 3, dtype=np.int16)
audio_input = AudioInput(buffer=buffer)

# `await` and `async for` need an async context, such as `async def main()`
result = await pipeline.run(audio_input)

# Create an audio player using `sounddevice`
player = sd.OutputStream(samplerate=24000, channels=1, dtype=np.int16)
player.start()

# Play the audio stream as it comes in
async for event in result.stream():
    if event.type == "voice_stream_event_audio":
        player.write(event.data)
```
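
To see the event-filtering loop in isolation, here is a minimal sketch that runs without the SDK or an audio device. `FakeEvent`, `fake_stream`, and `collect_audio` are hypothetical helpers written for this illustration, and the non-audio event type name is a placeholder. Only the `"voice_stream_event_audio"` check is taken from the snippet above.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator


@dataclass
class FakeEvent:
    """Hypothetical event exposing the two fields the loop reads: type and data."""

    type: str
    data: bytes = b""


async def fake_stream() -> AsyncIterator[FakeEvent]:
    # Interleave audio chunks with a non-audio event (placeholder type name).
    yield FakeEvent("voice_stream_event_audio", b"\x00\x00" * 4)
    yield FakeEvent("voice_stream_event_lifecycle")
    yield FakeEvent("voice_stream_event_audio", b"\x01\x00" * 2)


async def collect_audio(stream: AsyncIterator[FakeEvent]) -> bytes:
    """Keep only audio events and concatenate their payloads in arrival order."""
    chunks = []
    async for event in stream:
        if event.type == "voice_stream_event_audio":
            chunks.append(event.data)
    return b"".join(chunks)


pcm = asyncio.run(collect_audio(fake_stream()))
print(len(pcm))  # 12
```

Collecting the chunks instead of writing them to a player makes the loop easy to unit test. In a real run you would write each chunk to the output stream as it arrives, so that playback starts before the full response is ready.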

## Put it all together

```python
import asyncio
import random

import numpy as np
import sounddevice as sd

from agents import (
    Agent,
    function_tool,
    set_tracing_disabled,
)
from agents.voice import (
    AudioInput,
    SingleAgentVoiceWorkflow,
    VoicePipeline,
)
from agents.extensions.handoff_prompt import prompt_with_handoff_instructions


@function_tool
def get_weather(city: str) -> str:
    """Get the weather for a given city."""
    print(f"[debug] get_weather called with city: {city}")
    choices = ["sunny", "cloudy", "rainy", "snowy"]
    return f"The weather in {city} is {random.choice(choices)}."


spanish_agent = Agent(
    name="Spanish",
    handoff_description="A spanish speaking agent.",
    instructions=prompt_with_handoff_instructions(
        "You're speaking to a human, so be polite and concise. Speak in Spanish.",
    ),
    model="gpt-4o-mini",
)

agent = Agent(
    name="Assistant",
    instructions=prompt_with_handoff_instructions(
        "You're speaking to a human, so be polite and concise. If the user speaks in Spanish, handoff to the spanish agent.",
    ),
    model="gpt-4o-mini",
    handoffs=[spanish_agent],
    tools=[get_weather],
)


async def main():
    pipeline = VoicePipeline(workflow=SingleAgentVoiceWorkflow(agent))
    buffer = np.zeros(24000 * 3, dtype=np.int16)
    audio_input = AudioInput(buffer=buffer)

    result = await pipeline.run(audio_input)

    # Create an audio player using `sounddevice`
    player = sd.OutputStream(samplerate=24000, channels=1, dtype=np.int16)
    player.start()

    # Play the audio stream as it comes in
    async for event in result.stream():
        if event.type == "voice_stream_event_audio":
            player.write(event.data)


if __name__ == "__main__":
    asyncio.run(main())
```

If you run this example, the agent will speak to you. Check out the accompanying code example to see a demo in which you can talk to the agent yourself.
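
On a machine without an audio output device, one option is to save the streamed audio to a WAV file instead of playing it. The sketch below uses only the standard library and is not part of the SDK. The helper name `pcm16_to_wav` is hypothetical, and the code assumes each chunk is already raw little-endian 16-bit mono PCM at 24 kHz, matching the `samplerate=24000` and `dtype=np.int16` settings above. If the chunks are NumPy arrays, `chunk.tobytes()` would be one way to get such bytes on a little-endian machine.

```python
import io
import wave

SAMPLE_RATE = 24_000  # matches samplerate=24000 in the example above
SAMPLE_WIDTH = 2      # bytes per int16 sample
CHANNELS = 1          # mono


def pcm16_to_wav(chunks: list[bytes]) -> bytes:
    """Wrap raw PCM16 chunks in a WAV container and return the file bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        for chunk in chunks:
            wav.writeframes(chunk)
    return buf.getvalue()


# Three seconds of silence, the same duration as the input buffer in the example.
silence = bytes(SAMPLE_RATE * 3 * SAMPLE_WIDTH)
wav_bytes = pcm16_to_wav([silence])

# Read the result back to confirm the header describes the audio correctly.
with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
    n_frames = wav.getnframes()
    frame_rate = wav.getframerate()

print(n_frames, frame_rate)  # 72000 24000
```

To write to disk, pass a file path to `wave.open` instead of the in-memory buffer.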