AI API Errors: A Practical Debugging Guide for Developers
API failures in AI work differently. Here's how to debug them properly.
A 200 status code doesn't always mean your AI generation succeeded. A null content field isn't necessarily an error. And a prompt that worked perfectly yesterday might fail today — because a provider quietly updated their content policy.
This guide walks you through reading AI API errors, understanding what each failure mode actually means, and building error handling that tells you what broke — not just that something broke.
Note: Model names like
gpt-5.4andgpt-5.4-miniused here are CometAPI platform identifiers. They work throughhttps://api.cometapi.com/v1only — not directly through OpenAI or Anthropic APIs.
Why AI API Debugging Is Different
With a standard REST API, 200 means success and 4xx means you made a mistake. AI APIs introduce a third category: soft failures — responses that return 200 but contain nothing usable.
AI failures fall into three types:
| Failure Type | What Happens | Example |
|---|---|---|
| Hard failure | HTTP error (4xx, 5xx). Request didn't complete. | 401 Unauthorized |
| Soft failure | HTTP 200, but finish_reason is content_filter or length |
Blocked prompt |
| Silent failure | HTTP 200, everything looks fine — but output is wrong | Wrong classification |
Most error handling only covers the first type. The second and third types are where production bugs hide.
Understanding Error Responses
The text completions endpoint returns a consistent error structure:
{
"error": {
"message": "Human-readable description (includes request ID)",
"type": "comet_api_error",
"param": "the_problematic_parameter",
"code": "error_code"
}
}
What to log: Always log message and param. The message tells you what went wrong. The param tells you which parameter caused it.
Image & video endpoints return different error formats — always parse the raw response body.
HTTP Status Codes: What They Mean
| Status | Meaning | Common Cause | Fix |
|---|---|---|---|
| 400 | Bad request | Missing model or wrong parameter | Check error.param |
| 401 | Unauthorized | Invalid or missing API key | Verify Bearer <key> format |
| 429 | Rate limited | Too many requests | Exponential backoff |
| 500 | Server error | Provider-side issue | Retry with backoff |
| 504 | Gateway timeout | Provider took too long | Retry or use faster model |
Rule of thumb: Retry on 429, 500, and 504. Don't retry on 400 or 401 — the same request will fail again.
The Most Overlooked Field: finish_reason
A 200 response with finish_reason: "content_filter" means your generation was blocked. The content field will be null or empty. If you don't check this, your app will silently return nothing.
finish_reason |
Meaning | Action |
|---|---|---|
stop |
Normal completion | Success |
length |
Hit token limit | Increase max_tokens or shorten prompt |
content_filter |
Blocked by safety policy | Rephrase the prompt |
tool_calls |
Model called a tool | Handle the tool call (content will be null) |
A Robust Text Completion Example (Python)
Here's a production-ready function that handles all three failure types:
import os
import logging
from openai import OpenAI, APIStatusError, APIConnectionError
client = OpenAI(
base_url="https://api.cometapi.com/v1",
api_key=os.environ.get("COMETAPI_KEY"),
)
def safe_complete(messages, model="gpt-5.4-mini", **kwargs):
try:
response = client.chat.completions.create(
model=model, messages=messages, **kwargs
)
except APIStatusError as e:
error_body = e.response.json().get("error", {})
logging.error(f"API error {e.status_code}: {error_body.get('message')}")
raise
choice = response.choices[0]
finish_reason = choice.finish_reason
if finish_reason == "content_filter":
raise ValueError(f"Generation blocked on model {model}. Rephrase prompt.")
if finish_reason == "length":
logging.warning("Output truncated at token limit.")
return {
"content": choice.message.content or "",
"finish_reason": finish_reason,
"tool_calls": choice.message.tool_calls,
}
Key takeaway: Always check finish_reason. Don't assume 200 means success.
Detecting Silent Failures
Silent failures are the hardest to catch. The API returns 200, finish_reason is stop, but the output is semantically wrong. You can only catch these at the application level.
Example: Validation for classification tasks
def validate_completion(result, task):
content = result["content"].strip()
# Empty output check
if not content and result["finish_reason"] != "tool_calls":
raise ValueError(f"Empty output for task '{task}'")
# Task-specific validation
if task == "classify":
valid_labels = {"positive", "negative", "neutral"}
if content.lower() not in valid_labels:
logging.warning(f"Unexpected output: '{content}'")
# May need to re-prompt with stricter instructions
if task == "json_extract":
import json
try:
json.loads(content)
except json.JSONDecodeError:
raise ValueError("Expected JSON but got plain text")
return content
Common causes of silent failures:
- Ambiguous prompts
- Model ignored format instructions
- Input was too short or too long for the task
Exponential Backoff for Rate Limits
Rate limit errors (429) are temporary. Use exponential backoff with jitter:
import time
import random
def complete_with_retry(messages, model="gpt-5.4-mini", max_retries=3):
for attempt in range(max_retries):
try:
return safe_complete(messages, model=model)
except APIStatusError as e:
if e.status_code < 500:
raise # Don't retry 4xx errors
except RateLimitError:
pass # Retry
if attempt < max_retries - 1:
wait = (2 ** attempt) + random.random()
logging.warning(f"Retry in {wait:.1f}s")
time.sleep(wait)
raise RuntimeError(f"Failed after {max_retries} attempts")
Why jitter matters: Random delay prevents multiple clients from retrying in sync (thundering herd problem).
Image Generation Errors
Image generation has its own failure patterns:
| Symptom | Cause | Fix |
|---|---|---|
Empty data array |
Prompt filtered | Check revised_prompt; rephrase |
response_format error |
Wrong parameter for GPT Image 2 | Use output_format instead |
n > 1 error |
Qwen Image doesn't support multiple images | Loop single requests |
| URL returns 403 later | URL expired | Download immediately |
Simplified image generation check:
def generate_image_safe(prompt, model="dall-e-3"):
response = requests.post(
"https://api.cometapi.com/v1/images/generations",
json={"model": model, "prompt": prompt},
headers={"Authorization": f"Bearer {api_key}"}
)
data = response.json().get("data", [])
if not data:
return {"blocked": True} # Content filter triggered
return {"url": data[0].get("url"), "blocked": False}
Video Generation Errors
Video generation is asynchronous. Key patterns to watch:
| Symptom | Cause | Fix |
|---|---|---|
Stuck in queued 10+ min |
Server load | Try a different model |
failed with no detail |
Prompt filtered | Rephrase prompt |
| URL returns 403 | URL expired | Download immediately |
task_not_exist on first poll |
Task still initializing | Wait 5s and retry |
Kling returns "succeed" |
Non-standard status | Handle both "succeed" and "succeeded" |
Minimal polling pattern:
def poll_video(task_id, max_wait=600):
elapsed = 0
while elapsed < max_wait:
result = requests.get(f"https://api.cometapi.com/v1/videos/{task_id}").json()
status = result.get("status")
if status == "succeeded":
return result["output"][0]
if status in ("failed", "cancelled"):
raise RuntimeError(f"Video failed: {result.get('error')}")
time.sleep(10)
elapsed += 10
raise TimeoutError("Video generation timed out")
Debugging Checklist
For text generation:
- [ ] API key is correctly formatted (
Bearer <key>) - [ ]
finish_reasonisstop(notcontent_filterorlength) - [ ]
contentis notnull(ornullis expected due totool_calls) - [ ] Error is
4xx(fix request) or5xx(retry) - [ ] Output passes application-layer validation (no silent failure)
For image generation:
- [ ]
dataarray is not empty (content filter not triggered) - [ ] Correct parameters used (
output_formatfor GPT Image 2, notresponse_format) - [ ] Downloaded image before URL expired
For video generation:
- [ ] Task progresses beyond
queuedwithin reasonable time - [ ] Error field checked in failed task response
- [ ] Video downloaded before URL expired
- [ ] Handles both
"succeed"(Kling) and"succeeded"(others)
FAQ
Q: My request returns 200 but no content. What happened?
Check finish_reason. content_filter means the generation was blocked. tool_calls means the model wants to call a tool (content is null by design). If finish_reason is stop but content is still empty, that's a silent failure — log the full response and check your prompt.
Q: How do I know if my prompt was filtered?
Text: finish_reason === "content_filter". Images: data array is empty. Video: Task reaches failed status quickly with no error detail. Fix: Rephrase the prompt to be more neutral.
Q: When should I retry a failed request?
Retry on 429 and 5xx with exponential backoff. Don't retry on 4xx — a bad request won't fix itself.
Q: What's exponential backoff?
Instead of retrying immediately, wait progressively longer: 1s, 2s, 4s. Add random jitter to prevent multiple clients from retrying in sync. This is standard practice for any rate-limited API.
Q: How do I catch silent failures?
Silent failures require application-layer validation. The API won't tell you the output is semantically wrong. Check that the output matches the expected format (valid JSON, expected label, minimum length). Log the full output when validation fails.
All rights reserved