Features and benefits
The Spyre Operator has the following key features and benefits:
Device management and allocation
- Device plug-in: Discovers, lists, and allocates Spyre PF (Physical Function) and VF (Virtual
Function) devices to pods
- Simple allocation: Provides arbitrary Spyre cards
- Per-device allocation: Allocates specific cards by PCI address
Note: The per-device allocation process is not recommended for general users. Only the cluster administrator can use this process for card debugging.
Custom scheduler
- Custom Kubernetes scheduler (spyre-scheduler) for intelligent Spyre device placement
- Topology-aware scheduling for optimal performance in tensor parallel workloads
- External device reservation support
Pod validator webhook
- Validates pod specifications requesting Spyre devices
- Ensures that the correct scheduler name (spyre-scheduler) is specified
- Prevents configuration errors before pod deployment
- Validates resource requests and allocation mode
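The scheduler-name check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the operator's actual webhook code: the resource name prefix `example.com/spyre` is a hypothetical placeholder (the real resource name is not given here), and the allocation-mode validation is left out because its rules are not described in this section.

```python
from typing import Any, Dict, List

# Hypothetical placeholder: the real Spyre resource name is not stated in this document.
SPYRE_RESOURCE_PREFIX = "example.com/spyre"
# Scheduler name taken from the feature list above.
REQUIRED_SCHEDULER = "spyre-scheduler"


def requested_spyre_count(pod: Dict[str, Any]) -> int:
    """Sum the Spyre resource limits across all containers of a pod manifest."""
    total = 0
    for container in pod.get("spec", {}).get("containers", []):
        limits = container.get("resources", {}).get("limits", {})
        for name, quantity in limits.items():
            if name.startswith(SPYRE_RESOURCE_PREFIX):
                total += int(quantity)
    return total


def validate_pod(pod: Dict[str, Any]) -> List[str]:
    """Return a list of validation errors; an empty list means the pod is accepted."""
    errors: List[str] = []
    if requested_spyre_count(pod) == 0:
        # The pod does not request Spyre devices, so there is nothing to validate.
        return errors
    scheduler = pod.get("spec", {}).get("schedulerName", "default-scheduler")
    if scheduler != REQUIRED_SCHEDULER:
        errors.append(
            f"pod requests Spyre devices but schedulerName is '{scheduler}', "
            f"expected '{REQUIRED_SCHEDULER}'"
        )
    return errors
```

Calling `validate_pod` on a manifest that requests the placeholder resource without setting `schedulerName` returns one error message, while a pod that requests no Spyre resource passes unchanged.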
Custom resource definitions (CRDs)
- SpyreClusterPolicy: Centralized configuration for the entire cluster setup
- SpyreNodeState: Per-node status tracking and resource availability
Configuration management
- Automatic generation of device configuration files (senlib_config.json or config.json)
- Topology file (topo.json) generation and injection
- Resource pool metadata injection
- ConfigMap-based topology management
Health checker
- The health checker gathers information about Spyre cards by using the lspci command.
- The health checker reports its findings to the device plug-in, which then updates the SpyreNodeState custom resource. The scheduler reads this custom resource.
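As a rough illustration of the first step, the following Python sketch runs lspci and filters its output for matching cards. It is an assumption-laden example, not the health checker's implementation: the match keyword and the sample output line are invented for demonstration, and the real checker may use different lspci options or match on vendor and device IDs instead.

```python
import subprocess
from typing import List, NamedTuple


class PciDevice(NamedTuple):
    address: str      # PCI address, for example 0000:1a:00.0
    description: str  # remainder of the lspci line (class and device name)


def parse_lspci(output: str, keyword: str) -> List[PciDevice]:
    """Keep the lspci lines that contain the keyword (case-insensitive)."""
    devices: List[PciDevice] = []
    for line in output.splitlines():
        line = line.strip()
        if not line or keyword.lower() not in line.lower():
            continue
        # The first whitespace-separated field of each lspci line is the PCI address.
        address, _, description = line.partition(" ")
        devices.append(PciDevice(address, description.strip()))
    return devices


def scan(keyword: str) -> List[PciDevice]:
    """Run lspci on the local node; -D makes it always print the PCI domain."""
    result = subprocess.run(
        ["lspci", "-D"], capture_output=True, text=True, check=True
    )
    return parse_lspci(result.stdout, keyword)


# Made-up sample output for demonstration only; not real Spyre card output.
SAMPLE = (
    "0000:00:1f.0 ISA bridge: Some Chipset LPC Controller\n"
    "0000:1a:00.0 Processing accelerators: Example Vendor Device 0001\n"
)
found = parse_lspci(SAMPLE, "example vendor")
```

Keeping the parsing separate from the `subprocess` call makes the filter testable on captured output without needing lspci or the hardware on the test machine.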