# クイックスタート ## 前提条件 Agents SDK の基本的な[クイックスタート手順](../quickstart.md)に従い、仮想環境を設定してください。その後、SDK からオプションの音声依存関係をインストールします。 ```bash pip install 'openai-agents[voice]' ``` ## コンセプト 知っておくべき主なコンセプトは、[`VoicePipeline`][agents.voice.pipeline.VoicePipeline] です。これは3ステップのプロセスです。 1. 音声をテキストに変換する音声認識モデルを実行します。 2. 通常はエージェントワークフローであるコードを実行して、結果を生成します。 3. 結果のテキストを音声に戻すテキスト読み上げモデルを実行します。 ```mermaid graph LR %% Input A["🎤 Audio Input"] %% Voice Pipeline subgraph Voice_Pipeline [Voice Pipeline] direction TB B["Transcribe (speech-to-text)"] C["Your Code"]:::highlight D["Text-to-speech"] B --> C --> D end %% Output E["🎧 Audio Output"] %% Flow A --> Voice_Pipeline Voice_Pipeline --> E %% Custom styling classDef highlight fill:#ffcc66,stroke:#333,stroke-width:1px,font-weight:700; ``` ## エージェント まず、いくつかのエージェントを設定しましょう。この SDK でエージェントを構築したことがある場合、これは馴染みがあるはずです。エージェント、ハンドオフ、ツールを用意します。 ```python import asyncio import random from agents import ( Agent, function_tool, ) from agents.extensions.handoff_prompt import prompt_with_handoff_instructions @function_tool def get_weather(city: str) -> str: """Get the weather for a given city.""" print(f"[debug] get_weather called with city: {city}") choices = ["sunny", "cloudy", "rainy", "snowy"] return f"The weather in {city} is {random.choice(choices)}." spanish_agent = Agent( name="Spanish", handoff_description="A spanish speaking agent.", instructions=prompt_with_handoff_instructions( "You're speaking to a human, so be polite and concise. Speak in Spanish.", ), model="gpt-4o-mini", ) agent = Agent( name="Assistant", instructions=prompt_with_handoff_instructions( "You're speaking to a human, so be polite and concise. If the user speaks in Spanish, handoff to the spanish agent.", ), model="gpt-4o-mini", handoffs=[spanish_agent], tools=[get_weather], ) ``` ## 音声パイプライン [`SingleAgentVoiceWorkflow`][agents.voice.workflow.SingleAgentVoiceWorkflow] をワークフローとして使用して、シンプルな音声パイプラインを設定します。 ```python from agents.voice import SingleAgentVoiceWorkflow, VoicePipeline pipeline = VoicePipeline(workflow=SingleAgentVoiceWorkflow(agent)) ``` ## パイプラインを実行する ```python import numpy as np import sounddevice as sd from agents.voice import AudioInput # For simplicity, we'll just create 3 seconds of silence # In reality, you'd get microphone data buffer = np.zeros(24000 * 3, dtype=np.int16) audio_input = AudioInput(buffer=buffer) result = await pipeline.run(audio_input) # Create an audio player using `sounddevice` player = sd.OutputStream(samplerate=24000, channels=1, dtype=np.int16) player.start() # Play the audio stream as it comes in async for event in result.stream(): if event.type == "voice_stream_event_audio": player.write(event.data) ``` ## すべてをまとめる ```python import asyncio import random import numpy as np import sounddevice as sd from agents import ( Agent, function_tool, set_tracing_disabled, ) from agents.voice import ( AudioInput, SingleAgentVoiceWorkflow, VoicePipeline, ) from agents.extensions.handoff_prompt import prompt_with_handoff_instructions @function_tool def get_weather(city: str) -> str: """Get the weather for a given city.""" print(f"[debug] get_weather called with city: {city}") choices = ["sunny", "cloudy", "rainy", "snowy"] return f"The weather in {city} is {random.choice(choices)}." spanish_agent = Agent( name="Spanish", handoff_description="A spanish speaking agent.", instructions=prompt_with_handoff_instructions( "You're speaking to a human, so be polite and concise. Speak in Spanish.", ), model="gpt-4o-mini", ) agent = Agent( name="Assistant", instructions=prompt_with_handoff_instructions( "You're speaking to a human, so be polite and concise. If the user speaks in Spanish, handoff to the spanish agent.", ), model="gpt-4o-mini", handoffs=[spanish_agent], tools=[get_weather], ) async def main(): pipeline = VoicePipeline(workflow=SingleAgentVoiceWorkflow(agent)) buffer = np.zeros(24000 * 3, dtype=np.int16) audio_input = AudioInput(buffer=buffer) result = await pipeline.run(audio_input) # Create an audio player using `sounddevice` player = sd.OutputStream(samplerate=24000, channels=1, dtype=np.int16) player.start() # Play the audio stream as it comes in async for event in result.stream(): if event.type == "voice_stream_event_audio": player.write(event.data) if __name__ == "__main__": asyncio.run(main()) ``` この例を実行すると、エージェントがあなたに話しかけます! [examples/voice/static](https://github.com/openai/openai-agents-python/tree/main/examples/voice/static) の例をチェックして、自分でエージェントと話すデモを見てください。