Posted on Jun 11, 5:24 AM · 3 min read

AI API Errors: A Practical Debugging Guide for Developers

API failures in AI work differently. Here's how to debug them properly.

A 200 status code doesn't always mean your AI generation succeeded. A null content field isn't necessarily an error. And a prompt that worked perfectly yesterday might fail today — because a provider quietly updated their content policy.

This guide walks you through reading AI API errors, understanding what each failure mode actually means, and building error handling that tells you what broke — not just that something broke.

Note: Model names like gpt-5.4 and gpt-5.4-mini used here are CometAPI platform identifiers. They work through https://api.cometapi.com/v1 only — not directly through OpenAI or Anthropic APIs.

Why AI API Debugging Is Different

With a standard REST API, 200 means success and 4xx means you made a mistake. AI APIs add failure modes that a status code cannot express: responses that return 200 but contain nothing usable, and responses that return 200 but contain the wrong thing.

AI failures fall into three types:

Failure Type	What Happens	Example
Hard failure	HTTP error (4xx, 5xx). Request didn't complete.	401 Unauthorized
Soft failure	HTTP 200, but `finish_reason` is `content_filter` or `length`	Blocked prompt
Silent failure	HTTP 200, everything looks fine — but output is wrong	Wrong classification

Most error handling only covers the first type. The second and third types are where production bugs hide.
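
As a rough illustration, the three types can be told apart in code once you have the HTTP status, the `finish_reason`, and the result of your own output checks. The function name `classify_failure` and its return labels are hypothetical, chosen for this sketch only; they are not part of any SDK.

```python
from typing import Optional


def classify_failure(status_code: int,
                     finish_reason: Optional[str],
                     passed_validation: bool) -> str:
    """Map a response onto the failure types in the table above."""
    # Hard failure: the HTTP layer already reports an error.
    if status_code >= 400:
        return "hard"
    # Soft failure: HTTP 200, but the generation was blocked or truncated.
    if finish_reason in ("content_filter", "length"):
        return "soft"
    # Silent failure: the API reports success, but your own checks disagree.
    if not passed_validation:
        return "silent"
    return "ok"
```

Tagging every logged response with one of these labels makes it easier to see how many failures your HTTP-level monitoring never notices.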

Understanding Error Responses

The text completions endpoint returns a consistent error structure:

{
  "error": {
    "message": "Human-readable description (includes request ID)",
    "type": "comet_api_error",
    "param": "the_problematic_parameter",
    "code": "error_code"
  }
}

What to log: Always log message and param. The message tells you what went wrong. The param tells you which parameter caused it.

Image & video endpoints return different error formats — always parse the raw response body.
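
Because the error format varies by endpoint, a defensive parser can help. The sketch below assumes the text-endpoint shape shown above and falls back to the raw body for anything else; `extract_error` is a hypothetical helper name, and the 500-character cap on the fallback message is an arbitrary choice.

```python
import json


def extract_error(raw_body: str) -> dict:
    """Pull message, param, and code out of an error body, if present."""
    fallback = {"message": raw_body[:500], "param": None, "code": None}
    try:
        parsed = json.loads(raw_body)
    except json.JSONDecodeError:
        return fallback  # Not JSON at all: keep the raw text for the log.

    error = parsed.get("error") if isinstance(parsed, dict) else None
    if isinstance(error, dict):
        return {
            "message": error.get("message"),
            "param": error.get("param"),
            "code": error.get("code"),
        }
    if isinstance(error, str):
        # Some endpoints may return the error as a plain string.
        return {"message": error, "param": None, "code": None}
    return fallback
```

Logging the returned dict gives you `message` and `param` when they exist, and the raw body when they do not.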

HTTP Status Codes: What They Mean

Status	Meaning	Common Cause	Fix
400	Bad request	Missing model or wrong parameter	Check `error.param`
401	Unauthorized	Invalid or missing API key	Verify `Bearer <key>` format
429	Rate limited	Too many requests	Exponential backoff
500	Server error	Provider-side issue	Retry with backoff
504	Gateway timeout	Provider took too long	Retry or use faster model

Rule of thumb: Retry on 429, 500, and 504. Don't retry on 400 or 401 — the same request will fail again.
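
The rule of thumb fits in a small predicate. This is a minimal sketch that uses only the three retryable codes from the table; `should_retry` and `RETRYABLE_STATUS` are hypothetical names. Many services also treat 502 and 503 as transient, so adding them is a judgment call.

```python
# Status codes the table above marks as worth retrying.
RETRYABLE_STATUS = frozenset({429, 500, 504})


def should_retry(status_code: int) -> bool:
    """Return True when retrying the same request could plausibly succeed."""
    return status_code in RETRYABLE_STATUS
```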

The Most Overlooked Field: `finish_reason`

A 200 response with finish_reason: "content_filter" means your generation was blocked. The content field will be null or empty. If you don't check this, your app will silently return nothing.

`finish_reason`	Meaning	Action
`stop`	Normal completion	Success
`length`	Hit token limit	Increase `max_tokens` or shorten prompt
`content_filter`	Blocked by safety policy	Rephrase the prompt
`tool_calls`	Model called a tool	Handle the tool call (content will be `null`)
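
One way to act on `length` automatically is to retry with a larger token budget. The sketch below assumes a hypothetical callable `call(max_tokens)` that returns a dict with a `finish_reason` key; the starting budget and ceiling are placeholder values, not recommendations.

```python
from typing import Callable


def complete_with_budget(call: Callable[[int], dict],
                         max_tokens: int = 256,
                         ceiling: int = 2048) -> dict:
    """Retry with a doubled max_tokens while the output is truncated.

    `call` is a hypothetical function that takes max_tokens and returns a
    dict with at least a "finish_reason" key.
    """
    while True:
        result = call(max_tokens)
        # Stop when the output is complete, or when the budget is exhausted.
        if result["finish_reason"] != "length" or max_tokens >= ceiling:
            return result
        max_tokens = min(max_tokens * 2, ceiling)
```

If the result still has `finish_reason` set to `length` at the ceiling, shortening the prompt or splitting the task is probably the better fix.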

A Robust Text Completion Example (Python)

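Here's a reusable function that handles the first two failure types: it logs and re-raises hard failures, and it checks finish_reason to catch soft failures. Silent failures need application-level validation, covered under Detecting Silent Failures.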

import os
import logging
from openai import OpenAI, APIStatusError, APIConnectionError

client = OpenAI(
    base_url="https://api.cometapi.com/v1",
    api_key=os.environ.get("COMETAPI_KEY"),
)

def safe_complete(messages, model="gpt-5.4-mini", **kwargs):
    try:
        response = client.chat.completions.create(
            model=model, messages=messages, **kwargs
        )
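    except APIStatusError as e:
        # The error body is usually JSON; fall back to raw text if it is not.
        try:
            payload = e.response.json()
        except ValueError:
            payload = {}
        error_body = payload.get("error") if isinstance(payload, dict) else None
        if not isinstance(error_body, dict):
            error_body = {"message": e.response.text}
        # Log both message and param, as recommended above.
        logging.error(
            "API error %s: %s (param=%s)",
            e.status_code, error_body.get("message"), error_body.get("param"),
        )
        raise
    except APIConnectionError as e:
        # No HTTP response at all (DNS, TLS, timeout).
        logging.error("Connection error: %s", e)
        raise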

    choice = response.choices[0]
    finish_reason = choice.finish_reason

    if finish_reason == "content_filter":
        raise ValueError(f"Generation blocked on model {model}. Rephrase prompt.")

    if finish_reason == "length":
        logging.warning("Output truncated at token limit.")

    return {
        "content": choice.message.content or "",
        "finish_reason": finish_reason,
        "tool_calls": choice.message.tool_calls,
    }

Key takeaway: Always check finish_reason. Don't assume 200 means success.

Detecting Silent Failures

Silent failures are the hardest to catch. The API returns 200, finish_reason is stop, but the output is semantically wrong. You can only catch these at the application level.

Example: Validation for classification tasks

import json
import logging

def validate_completion(result, task):
    content = result["content"].strip()

    # Empty output check (empty content is expected when a tool was called)
    if not content and result["finish_reason"] != "tool_calls":
        raise ValueError(f"Empty output for task '{task}'")

    # Task-specific validation
    if task == "classify":
        valid_labels = {"positive", "negative", "neutral"}
        if content.lower() not in valid_labels:
            logging.warning(f"Unexpected output: '{content}'")
            # May need to re-prompt with stricter instructions

    if task == "json_extract":
        try:
            json.loads(content)
        except json.JSONDecodeError:
            raise ValueError("Expected JSON but got plain text")

    return content

Common causes of silent failures:

Ambiguous prompts
Model ignored format instructions
Input was too short or too long for the task
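
When the model ignores format instructions only slightly, for example by answering "Positive." or "Sentiment: negative", light normalization can recover the label before you re-prompt. This is a sketch under the assumption that the task uses the three sentiment labels from the validation example; `normalize_label` is a hypothetical helper.

```python
import string
from typing import Optional

VALID_LABELS = {"positive", "negative", "neutral"}


def normalize_label(raw: str) -> Optional[str]:
    """Return a clean label, or None if the output is not a single label."""
    # Strip whitespace, quotes, and punctuation around the answer.
    cleaned = raw.strip().strip(string.punctuation + string.whitespace).lower()
    # Accept a "Label: positive"-style answer by keeping the last segment.
    if ":" in cleaned:
        cleaned = cleaned.rsplit(":", 1)[1].strip()
    return cleaned if cleaned in VALID_LABELS else None
```

A `None` result is the signal to log the full output and re-prompt with stricter instructions, rather than guess.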

Exponential Backoff for Rate Limits

Rate limit errors (429) are temporary. Use exponential backoff with jitter:

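import time
import random
import logging
from openai import APIStatusError, APIConnectionError

def complete_with_retry(messages, model="gpt-5.4-mini", max_retries=3):
    for attempt in range(max_retries):
        try:
            return safe_complete(messages, model=model)
        except APIStatusError as e:
            # RateLimitError (429) is a subclass of APIStatusError, so this
            # single handler covers it. Retry only 429 and 5xx.
            if e.status_code != 429 and e.status_code < 500:
                raise  # Other 4xx errors will fail again
        except APIConnectionError:
            pass  # Network-level failure: retry

        if attempt < max_retries - 1:
            wait = (2 ** attempt) + random.random()
            logging.warning(f"Retry in {wait:.1f}s")
            time.sleep(wait)

    raise RuntimeError(f"Failed after {max_retries} attempts")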

Why jitter matters: Random delay prevents multiple clients from retrying in sync (thundering herd problem).
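
A common variant is "full jitter" with a cap, where the whole delay is randomized instead of adding a small random offset to a fixed delay. The sketch below shows that variant, not the additive jitter used in the function above; `backoff_delay`, the 1-second base, and the 30-second cap are illustrative assumptions.

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    # Exponential growth, capped so late retries do not wait unboundedly.
    ceiling = min(cap, base * (2 ** attempt))
    # Full jitter: pick uniformly between 0 and the ceiling.
    return random.uniform(0, ceiling)
```

Full jitter spreads retries more evenly across clients, at the cost of occasionally retrying almost immediately.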

Image Generation Errors

Image generation has its own failure patterns:

Symptom	Cause	Fix
Empty `data` array	Prompt filtered	Check `revised_prompt`; rephrase
`response_format` error	Wrong parameter for GPT Image 2	Use `output_format` instead
`n > 1` error	Qwen Image doesn't support multiple images	Loop single requests
URL returns 403 later	URL expired	Download immediately

Simplified image generation check:

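import os
import requests

def generate_image_safe(prompt, model="dall-e-3"):
    response = requests.post(
        "https://api.cometapi.com/v1/images/generations",
        json={"model": model, "prompt": prompt},
        headers={"Authorization": f"Bearer {os.environ['COMETAPI_KEY']}"},
        timeout=120,
    )
    # Surface hard failures with the raw body instead of reporting "blocked".
    if not response.ok:
        raise RuntimeError(f"Image API error {response.status_code}: {response.text}")

    data = response.json().get("data") or []
    if not data:
        return {"blocked": True}  # Empty data on a 200: likely content filter

    return {"url": data[0].get("url"), "blocked": False}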
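
Since result URLs can expire, it may be worth saving the file as soon as the URL comes back. The helper below is a minimal sketch using only the standard library; `save_asset` and its injectable `fetch` parameter are hypothetical, and the 60-second timeout is an arbitrary choice.

```python
import urllib.request
from pathlib import Path
from typing import Callable, Optional


def _fetch(url: str) -> bytes:
    """Default downloader: read the whole response body into memory."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read()


def save_asset(url: str, dest: Path,
               fetch: Optional[Callable[[str], bytes]] = None) -> Path:
    """Download a generated asset right away and write it to dest."""
    data = (fetch or _fetch)(url)
    if not data:
        raise ValueError(f"Empty download from {url}")
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(data)
    return dest
```

Passing a custom `fetch` makes the helper easy to test without network access, and the same function works for video URLs.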

Video Generation Errors

Video generation is asynchronous. Key patterns to watch:

Symptom	Cause	Fix
Stuck in `queued` 10+ min	Server load	Try a different model
`failed` with no detail	Prompt filtered	Rephrase prompt
URL returns 403	URL expired	Download immediately
`task_not_exist` on first poll	Task still initializing	Wait 5s and retry
Kling returns `"succeed"`	Non-standard status	Handle both `"succeed"` and `"succeeded"`

Minimal polling pattern:

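import os
import time
import requests

def poll_video(task_id, max_wait=600):
    # Assumes the same Bearer auth as the other endpoints.
    headers = {"Authorization": f"Bearer {os.environ['COMETAPI_KEY']}"}
    elapsed = 0
    while elapsed < max_wait:
        result = requests.get(
            f"https://api.cometapi.com/v1/videos/{task_id}",
            headers=headers,
            timeout=30,
        ).json()
        status = result.get("status")

        # Kling reports "succeed"; other models report "succeeded".
        if status in ("succeed", "succeeded"):
            return result["output"][0]
        if status in ("failed", "cancelled"):
            raise RuntimeError(f"Video failed: {result.get('error')}")

        time.sleep(10)
        elapsed += 10

    raise TimeoutError("Video generation timed out")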
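
The `task_not_exist` case from the table can be handled with a short grace period before normal polling begins. This sketch assumes, without confirmation from the API documentation, that a not-yet-visible task is reported as `{"error": {"code": "task_not_exist"}}`; adjust the check to the payload you actually receive. `wait_for_task` and its parameters are hypothetical.

```python
import time
from typing import Callable


def wait_for_task(fetch: Callable[[], dict],
                  grace_polls: int = 3,
                  interval: float = 5.0,
                  sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the task is visible, tolerating early task_not_exist replies.

    Assumption: a not-yet-visible task is reported as
    {"error": {"code": "task_not_exist"}}; adjust to the real payload.
    """
    for _ in range(grace_polls):
        result = fetch()
        error = result.get("error") or {}
        if not (isinstance(error, dict) and error.get("code") == "task_not_exist"):
            return result  # Task is visible (or failed for another reason).
        sleep(interval)  # Still initializing: wait, then ask again.
    raise RuntimeError("Task still not found after the grace period")
```

Here `fetch` stands for whatever function performs one status request, so the grace-period logic stays independent of the HTTP client.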

Debugging Checklist

For text generation:

[ ] API key is correctly formatted (Bearer <key>)
[ ] finish_reason is stop (not content_filter or length)
[ ] content is not null (or null is expected due to tool_calls)
[ ] Error is classified: 400/401 (fix the request) or 429/5xx (retry with backoff)
[ ] Output passes application-layer validation (no silent failure)

For image generation:

[ ] data array is not empty (content filter not triggered)
[ ] Correct parameters used (output_format for GPT Image 2, not response_format)
[ ] Downloaded image before URL expired

For video generation:

[ ] Task progresses beyond queued within reasonable time
[ ] Error field checked in failed task response
[ ] Video downloaded before URL expired
[ ] Handles both "succeed" (Kling) and "succeeded" (others)

FAQ

Q: My request returns 200 but no content. What happened?
Check finish_reason. content_filter means the generation was blocked. tool_calls means the model wants to call a tool (content is null by design). If finish_reason is stop but content is still empty, that's a silent failure — log the full response and check your prompt.

Q: How do I know if my prompt was filtered?
Text: finish_reason == "content_filter". Images: data array is empty. Video: Task reaches failed status quickly with no error detail. Fix: Rephrase the prompt to be more neutral.

Q: When should I retry a failed request?
Retry on 429 and 5xx with exponential backoff. Don't retry on other 4xx errors such as 400 or 401: a bad request won't fix itself.

Q: What's exponential backoff?
Instead of retrying immediately, wait progressively longer: 1s, 2s, 4s. Add random jitter to prevent multiple clients from retrying in sync. This is standard practice for any rate-limited API.

Q: How do I catch silent failures?
Silent failures require application-layer validation. The API won't tell you the output is semantically wrong. Check that the output matches the expected format (valid JSON, expected label, minimum length). Log the full output when validation fails.
