Tuning Content Cortex AI Services
Tune Content Cortex AI Services performance parameters to optimize response times, token usage, and cost while maintaining answer quality.
After you configure Content Cortex AI Services, you can tune the reasoning service for your specific use cases. Performance tuning is a trade-off between answer completeness, response efficiency, and cost.
Performance tuning overview
The reasoning service uses a LangGraph-based agentic workflow that processes each user query iteratively through multiple reasoning steps. Each iteration consumes tokens and adds latency, so tuning means balancing answer completeness against response time and API cost.
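To make the trade-off concrete, the following minimal sketch estimates how tokens, latency, and cost grow with the number of iterations. It assumes every iteration costs about the same, which is a simplification; `StepEstimate`, `estimate_query`, and all numbers are hypothetical and are not part of Content Cortex AI Services.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepEstimate:
    """Rough per-iteration averages. All values are illustrative assumptions."""
    prompt_tokens: int        # tokens sent to the model in one reasoning step
    completion_tokens: int    # tokens generated in one reasoning step
    latency_seconds: float    # wall-clock time of one reasoning step


def estimate_query(iterations: int, step: StepEstimate,
                   price_per_1k_tokens: float) -> dict:
    """Estimate totals for a query that runs `iterations` reasoning steps.

    Assumes a linear model: each step costs the same as every other step.
    """
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    total_tokens = iterations * (step.prompt_tokens + step.completion_tokens)
    return {
        "total_tokens": total_tokens,
        "latency_seconds": iterations * step.latency_seconds,
        "cost": total_tokens / 1000 * price_per_1k_tokens,
    }


# Hypothetical per-step averages and price, chosen only for illustration.
step = StepEstimate(prompt_tokens=1500, completion_tokens=300, latency_seconds=2.0)
for n in (2, 5, 10):
    print(n, estimate_query(n, step, price_per_1k_tokens=0.01))
```

In practice the prompt often grows as earlier steps are added to the context, so treat a linear estimate like this as a lower bound and replace the assumed averages with measured values.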
Key tunable parameters include agent iteration limits, context window settings, conversation history management, and completion token limits. Different use cases require different tuning approaches. For example, latency-sensitive applications benefit from reduced iteration limits, while quality-focused applications may need higher limits to ensure thorough answers.
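As an illustration of how these parameters might be grouped per use case, the sketch below defines two tuning profiles. The field names and values are hypothetical placeholders, not the actual setting names or recommended defaults of Content Cortex AI Services; map them to the settings your deployment exposes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TuningProfile:
    # Hypothetical field names that mirror the four parameter groups above.
    max_iterations: int          # agent iteration limit
    context_window_tokens: int   # context window budget
    history_turns: int           # conversation turns kept as history
    max_completion_tokens: int   # completion token limit per model call

    def worst_case_completion_tokens(self) -> int:
        """Upper bound on generated tokens if every iteration hits its limit."""
        return self.max_iterations * self.max_completion_tokens


# Illustrative values only, not recommended defaults.
LATENCY_SENSITIVE = TuningProfile(
    max_iterations=3,
    context_window_tokens=8000,
    history_turns=2,
    max_completion_tokens=512,
)
QUALITY_FOCUSED = TuningProfile(
    max_iterations=10,
    context_window_tokens=32000,
    history_turns=10,
    max_completion_tokens=2048,
)

for name, profile in (("latency", LATENCY_SENSITIVE), ("quality", QUALITY_FOCUSED)):
    print(name, profile.worst_case_completion_tokens())
```

Keeping each profile in one immutable object makes it easier to record exactly which settings were active when a set of metrics was collected.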
Tuning considerations
When tuning Content Cortex AI Services performance:
- Start with default values and collect baseline metrics before making changes
- Adjust one parameter at a time to isolate the impact of each change
- Test with representative queries from your actual use cases
- Monitor production metrics continuously and adjust as needed
- Consider the characteristics of your chosen LLM: high-capability models may need fewer iterations, while smaller models may need more
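The "one parameter at a time" guideline can be enforced with a small check before each test run. This is a minimal sketch that assumes settings are held in plain dictionaries; the keys shown are hypothetical and `check_single_change` is not a product function.

```python
def changed_parameters(baseline: dict, candidate: dict) -> list[str]:
    """Return the sorted names of settings whose values differ."""
    keys = sorted(set(baseline) | set(candidate))
    return [key for key in keys if baseline.get(key) != candidate.get(key)]


def check_single_change(baseline: dict, candidate: dict) -> str:
    """Return the one changed setting, or raise if zero or several changed."""
    changed = changed_parameters(baseline, candidate)
    if len(changed) != 1:
        raise ValueError(
            f"expected exactly one changed parameter, found {len(changed)}: {changed}"
        )
    return changed[0]


# Hypothetical setting names and values, for illustration only.
baseline = {"max_iterations": 5, "history_turns": 4, "max_completion_tokens": 1024}
candidate = {"max_iterations": 3, "history_turns": 4, "max_completion_tokens": 1024}

print("Testing change to:", check_single_change(baseline, candidate))
```

Running the same representative queries against `baseline` and `candidate` then attributes any difference in metrics to that single setting.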
Monitoring and optimization
Effective tuning requires ongoing monitoring of key metrics, including average iterations per query, token usage distribution, response latency percentiles, and API costs. Use these metrics to identify optimization opportunities and to validate that tuning changes achieve the desired results without degrading answer quality.
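The metrics above can be computed from per-query records, as in the following minimal sketch. It assumes you can export one record per query with these four fields; `QueryRecord`, `percentile`, and `summarize` are hypothetical helpers, and the percentile uses the nearest-rank method.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryRecord:
    # Hypothetical record shape; adapt it to whatever your logging provides.
    iterations: int
    total_tokens: int
    latency_seconds: float
    cost: float


def percentile(values, pct):
    """Nearest-rank percentile: the value at rank ceil(pct / 100 * n) in sorted order."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[rank - 1]


def summarize(records):
    """Aggregate per-query records into the key tuning metrics."""
    if not records:
        raise ValueError("records must not be empty")
    latencies = [r.latency_seconds for r in records]
    tokens = [r.total_tokens for r in records]
    return {
        "avg_iterations": sum(r.iterations for r in records) / len(records),
        "tokens_p50": percentile(tokens, 50),
        "tokens_p95": percentile(tokens, 95),
        "latency_p50": percentile(latencies, 50),
        "latency_p95": percentile(latencies, 95),
        "total_cost": sum(r.cost for r in records),
    }


# Made-up sample data, for illustration only.
records = [
    QueryRecord(2, 3000, 1.0, 0.03),
    QueryRecord(3, 4500, 2.0, 0.045),
    QueryRecord(4, 6000, 3.0, 0.06),
    QueryRecord(5, 7500, 4.0, 0.075),
    QueryRecord(6, 9000, 10.0, 0.09),
]
print(summarize(records))
```

Comparing the summary from a baseline run with the summary after a tuning change shows whether latency and cost improved; answer quality still needs a separate evaluation.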