Troubleshooting reasoning service runtime issues

This page describes common issues that occur while the reasoning service is running, with symptoms, causes, and resolutions for each.

Runtime issues

Token budget exceeded

The agent returns a message saying that the context window limit has been reached.

This occurs when a request exceeds the model's context window, usually because the conversation history is long, the document context is large, or many documents are processed at once.

Configure the CONTEXT_WINDOW_TOKEN_LIMIT environment variable for your model. Adjust CONTEXT_WINDOW_SAFETY_MARGIN (default: 0.85). Reduce the conversation history size via HISTORY_WINDOW_SIZE. Start a new conversation thread. Break large requests into smaller batches.
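
The following is a minimal sketch of how these settings might interact and how a large request could be split into batches. It assumes the safety margin is applied as a multiplier on the token limit, which this page does not confirm. The function names, the fallback limit of 128000, and the per-document token counts are hypothetical and used only for illustration.

```python
import os


def effective_token_budget(env=os.environ):
    """Return the usable token budget under the assumed margin rule."""
    # Hypothetical fallback: this page gives no default for the limit.
    limit = int(env.get("CONTEXT_WINDOW_TOKEN_LIMIT", "128000"))
    # Default of 0.85 is the documented default for the safety margin.
    margin = float(env.get("CONTEXT_WINDOW_SAFETY_MARGIN", "0.85"))
    # Assumption: the margin is the fraction of the limit that input may use.
    return int(limit * margin)  # rounds down


def batch_by_tokens(token_counts, budget):
    """Group document indices into batches whose token totals fit the budget."""
    batches, current, used = [], [], 0
    for index, count in enumerate(token_counts):
        if count > budget:
            raise ValueError(f"document {index} alone exceeds the budget")
        # Close the current batch when the next document would not fit.
        if current and used + count > budget:
            batches.append(current)
            current, used = [], 0
        current.append(index)
        used += count
    if current:
        batches.append(current)
    return batches
```

With a budget of 850 tokens, four documents of 400, 300, 500, and 100 tokens would be sent as two requests: the first two documents together, then the last two.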

Agent timeout (504 Gateway Timeout)

The request times out after 5 minutes, which is the default AGENT_INVOKE_TIMEOUT of 300 seconds.

This occurs when a complex query requires many iterations or when MCP server responses are slow.

Increase the AGENT_INVOKE_TIMEOUT environment variable (default: 300 seconds). Check the MAX_AGENT_ITERATIONS setting (default: 100). Simplify the query or break it into smaller requests. Check MCP server performance. Review the agent logs.
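
To illustrate how a timeout of this kind behaves, here is a minimal sketch that wraps an agent call in asyncio.wait_for and maps a timeout to a 504 status. It is not the service's actual implementation; invoke_with_timeout and slow_agent are hypothetical names, and only the variable name and its 300-second default come from this page.

```python
import asyncio
import os

# The 300-second default is the documented default for this variable.
AGENT_INVOKE_TIMEOUT = float(os.environ.get("AGENT_INVOKE_TIMEOUT", "300"))


async def invoke_with_timeout(agent_call, timeout=AGENT_INVOKE_TIMEOUT):
    """Await a hypothetical agent coroutine and return (status, body)."""
    try:
        result = await asyncio.wait_for(agent_call(), timeout=timeout)
        return 200, result
    except asyncio.TimeoutError:
        # The agent did not finish in time, so report a gateway timeout.
        return 504, "Agent invocation timed out"


async def slow_agent():
    """Stand-in for a query that needs many iterations or slow tool calls."""
    await asyncio.sleep(0.2)
    return "done"
```

Calling asyncio.run(invoke_with_timeout(slow_agent, timeout=0.01)) yields a 504 status, while a timeout longer than the call itself lets the result through. Raising the timeout only helps if the agent would eventually finish, so it is worth checking the iteration count and MCP server latency first.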

Max iterations reached

The agent returns "I've reached the maximum number of LLM calls for this query".

This occurs when the query requires more reasoning steps than the configured limit allows, or when the agent repeats tool calls because earlier results were unclear.

Increase the MAX_AGENT_ITERATIONS environment variable (default: 100). Provide a more specific query.
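
The behavior described above can be pictured as a capped loop. This is a sketch under assumptions, not the service's code: run_agent and the step callback are hypothetical, and only the variable name, its default of 100, and the returned message come from this page.

```python
import os

# The default of 100 is the documented default for this variable.
MAX_AGENT_ITERATIONS = int(os.environ.get("MAX_AGENT_ITERATIONS", "100"))
LIMIT_MESSAGE = "I've reached the maximum number of LLM calls for this query"


def run_agent(step, max_iterations=MAX_AGENT_ITERATIONS):
    """Call step(i) until it returns an answer or the cap is reached.

    step is a hypothetical callback standing in for one LLM call: it returns
    a string when the agent has a final answer and None to keep going.
    """
    for iteration in range(max_iterations):
        answer = step(iteration)
        if answer is not None:
            return answer
    # No final answer within the cap, so return the limit message.
    return LIMIT_MESSAGE
```

A step that never converges, such as one that keeps repeating the same tool call, reaches the cap regardless of how high it is set. In that case a more specific query is likely to help more than a larger limit.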