概要

OpenAI APIのリクエスト制限（RPM/TPM）に達した際のエラーと対処法です。

エラーメッセージ

```json { “error”: { “message”: “Rate limit reached for gpt-4 in organization xxx on requests per min (RPM): Limit 200, Used 200, Requested 1.”, “type”: “rate_limit_error”, “code”: “rate_limit_exceeded” } } ```

原因

RPM超過: 1分あたりのリクエスト数制限
TPM超過: 1分あたりのトークン数制限
同時リクエスト過多: 並列処理が多すぎる
Tier制限: アカウントのティアによる制限

解決策

1. Exponential Backoffを実装

```python import openai import time from tenacity import retry, wait_exponential, stop_after_attempt

@retry(wait=wait_exponential(min=1, max=60), stop=stop_after_attempt(6)) def call_openai(prompt): return openai.ChatCompletion.create( model=“gpt-4”, messages=[{“role”: “user”, “content”: prompt}] ) ```

2. レートリミッターを使用

```python from ratelimit import limits, sleep_and_retry

1分あたり60リクエストに制限

@sleep_and_retry @limits(calls=60, period=60) def call_openai_limited(prompt): return openai.ChatCompletion.create(…) ```

3. バッチ処理でトークンを最適化

```python

悪い例: 1件ずつリクエスト

for item in items: response = openai.ChatCompletion.create( messages=[{“role”: “user”, “content”: f"Process: {item}"}] )

良い例: バッチでリクエスト

batch_prompt = “Process the following items:\n” + “\n”.join(items) response = openai.ChatCompletion.create( messages=[{“role”: “user”, “content”: batch_prompt}] ) ```

4. キャッシュを活用

```python import hashlib from functools import lru_cache

@lru_cache(maxsize=1000) def cached_openai_call(prompt_hash): return openai.ChatCompletion.create(…)

def call_with_cache(prompt): prompt_hash = hashlib.md5(prompt.encode()).hexdigest() return cached_openai_call(prompt_hash) ```

5. 使用量を監視

```python response = openai.ChatCompletion.create(…)

レスポンスヘッダーで残り制限を確認

print(f"Remaining requests: {response.headers.get(‘x-ratelimit-remaining-requests’)}") print(f"Remaining tokens: {response.headers.get(‘x-ratelimit-remaining-tokens’)}") ```

よくある間違い

リトライ間隔を固定値にする
全リクエストを同時に送信
開発中に本番キーを使用

OpenAI API: Rate limit reached

概要