Drives higher utilization and ROI by dynamically sharing server resources among many data scientists running multiple models.
Distributed data ingest, transformation and training
Enables jobs to be processed in parallel across a cluster of servers, helping to accelerate time to results.
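As a rough illustration of the pattern, the sketch below fans a transformation step out across local worker processes using Python's multiprocessing; a cluster scheduler would distribute the same tasks across servers instead. The transform() function is a hypothetical stand-in for real per-record work.

```python
# A minimal sketch of parallel data transformation. transform() is a
# hypothetical placeholder; in a real deployment a scheduler would spread
# these tasks over many servers, not local processes.
from multiprocessing import Pool

def transform(record):
    # Stand-in per-record step (e.g., parsing or feature extraction).
    return record.strip().lower()

if __name__ == "__main__":
    records = ["  Alpha ", "BETA", " Gamma "]
    with Pool(processes=4) as pool:
        results = pool.map(transform, records)  # fan the work out in parallel
    print(results)
```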
A distributed training fabric
Allows most applications to run in parallel without code changes.
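The sketch below shows the general idea using PyTorch's DistributedDataParallel: an existing model is made data-parallel by a single wrapper line, with gradient synchronization handled automatically. This is an illustrative stand-in, not the fabric's own mechanism, which aims to require no wrapping in user code at all.

```python
# A hedged sketch of data-parallel training with PyTorch DDP. The env-var
# defaults allow a single-process run; a launcher such as torchrun normally
# sets them for multi-worker jobs.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    os.environ.setdefault("RANK", "0")
    os.environ.setdefault("WORLD_SIZE", "1")
    dist.init_process_group(backend="gloo")

    model = torch.nn.Linear(10, 1)        # existing, unchanged model
    ddp_model = DDP(model)                # the only added line: parallel wrapper
    opt = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

    x, y = torch.randn(8, 10), torch.randn(8, 1)
    loss = torch.nn.functional.mse_loss(ddp_model(x), y)
    loss.backward()                       # gradients all-reduce across workers
    opt.step()
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```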
Support for large models
Enables a single large model to draw on both CPU and GPU memory, so model size is not limited by GPU memory alone.
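A minimal sketch of the underlying idea, assuming PyTorch: layers are kept in host (CPU) memory and staged onto the GPU one at a time, so the model can exceed what GPU memory alone would hold. Real systems overlap transfers with compute; this sketch does not.

```python
# Layer-at-a-time offloading: the full model lives in CPU memory, and each
# layer visits the GPU only for its forward pass. Runs on CPU if no GPU exists.
import torch

layers = [torch.nn.Linear(1024, 1024) for _ in range(8)]  # held in CPU memory
device = "cuda" if torch.cuda.is_available() else "cpu"

x = torch.randn(4, 1024)
with torch.no_grad():
    for layer in layers:
        layer.to(device)        # stage one layer into GPU memory
        x = layer(x.to(device))
        layer.to("cpu")         # evict it to make room for the next
print(x.shape)
```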
Helps avoid interruptions during training
Dynamically increases or decreases the resources allocated to a model while it is training, without interrupting the job.
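A hypothetical sketch of the pattern: between epochs the trainer polls a scheduler for its current worker budget and resizes its pool accordingly, so resources can grow or shrink without stopping the job. get_allocated_workers() is an assumed placeholder, not a real API.

```python
# Elastic allocation sketch: the worker pool is rebuilt each epoch to match
# whatever the (hypothetical) scheduler currently grants.
from multiprocessing import Pool

def get_allocated_workers(epoch):
    # Assumed placeholder for a scheduler query; here the
    # allocation simply grows after epoch 1.
    return 2 if epoch < 2 else 4

def train_shard(shard):
    return sum(shard)  # stand-in for real per-shard training work

if __name__ == "__main__":
    data = [[1, 2], [3, 4], [5, 6], [7, 8]]
    for epoch in range(4):
        n = get_allocated_workers(epoch)
        with Pool(processes=n) as pool:   # pool resized to the new allocation
            losses = pool.map(train_shard, data)
        print(f"epoch {epoch}: {n} workers -> {losses}")
```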
Training visualization and tuning
Enables monitoring a model's accuracy while training is in progress, and making adjustments or stopping early if the model is not converging or its accuracy is low.
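A minimal sketch of this kind of in-flight monitoring: validation accuracy is checked each epoch, and training stops early once it fails to improve for a set number of epochs. evaluate() is a synthetic stand-in for a real validation pass.

```python
# Early stopping on a stalled metric. evaluate() fakes an accuracy curve that
# plateaus, standing in for a real validation step.
import random

def evaluate(epoch):
    return min(0.9, 0.5 + 0.08 * epoch) + random.uniform(-0.01, 0.01)

best, patience, stall = 0.0, 3, 0
for epoch in range(50):
    acc = evaluate(epoch)
    print(f"epoch {epoch}: accuracy {acc:.3f}")
    if acc > best + 1e-3:
        best, stall = acc, 0   # still improving; reset the stall counter
    else:
        stall += 1
    if stall >= patience:      # model has stopped converging
        print("stopping early: no improvement")
        break
```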
Hyper-parameter search and optimization
Improves model accuracy by suggesting better hyper-parameter values while training is running.
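One simple form of suggestion-based tuning is sketched below as a random search: each trial draws a candidate learning rate, observes the resulting score, and keeps the best. Production optimizers typically use smarter suggestion strategies, such as Bayesian methods; the score() objective here is a synthetic stand-in.

```python
# Suggest-and-observe loop as a plain random search over the learning rate.
# score() is a synthetic objective that peaks near lr = 0.01.
import math
import random

def score(lr):
    return -abs(math.log10(lr) - math.log10(0.01))

best_lr, best_score = None, float("-inf")
for trial in range(20):
    lr = 10 ** random.uniform(-5, 0)   # suggest a candidate
    s = score(lr)                      # observe its result
    if s > best_score:
        best_lr, best_score = lr, s
print(f"best learning rate: {best_lr:.5f}")
```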