Efficient Inference
Model Compression, Efficient Inference
Introduction
In existing research, efficient inference (often discussed under the umbrella of model compression) can be achieved through several families of methods:
- network pruning & sparse neural networks
- quantization
- neural architecture search
- knowledge distillation
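To make one of these methods concrete, below is a minimal sketch of quantization, assuming a simple symmetric (per-tensor) linear scheme mapping float weights to signed 8-bit integers; the function names are illustrative, not from any particular library:

```python
import numpy as np

def quantize_symmetric(x, num_bits=8):
    """Symmetric linear quantization: map floats to signed num_bits integers."""
    qmax = 2 ** (num_bits - 1) - 1          # 127 for int8
    scale = np.abs(x).max() / qmax          # width of one quantization step
    q = np.round(x / scale).clip(-qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximation of the original floats."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)   # toy weight matrix
q, s = quantize_symmetric(w)
w_hat = dequantize(q, s)
# rounding error is at most half a quantization step
assert np.abs(w - w_hat).max() <= s / 2 + 1e-7
```

Storing `q` instead of `w` cuts memory 4x (int8 vs float32), and integer matrix multiplies are typically faster on hardware with int8 support; real frameworks add per-channel scales and calibration on top of this idea.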
Methods for efficient training include:
- gradient compression
- on-device training
- federated learning
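As an illustration of gradient compression, here is a minimal sketch of top-k sparsification, one common approach in which each worker transmits only the k largest-magnitude gradient entries (indices plus values) instead of the full dense tensor; the function names are hypothetical:

```python
import numpy as np

def topk_compress(grad, k):
    """Keep only the k largest-magnitude entries of a gradient tensor."""
    flat = grad.ravel()
    idx = np.argpartition(np.abs(flat), -k)[-k:]   # indices of top-k magnitudes
    return idx, flat[idx]                          # transmit indices + values only

def topk_decompress(idx, values, shape):
    """Rebuild a dense gradient, zero everywhere except the kept entries."""
    flat = np.zeros(int(np.prod(shape)), dtype=values.dtype)
    flat[idx] = values
    return flat.reshape(shape)

g = np.random.default_rng(1).standard_normal((8, 8)).astype(np.float32)
idx, vals = topk_compress(g, k=6)
g_sparse = topk_decompress(idx, vals, g.shape)
# only k entries survive; all of them match the original gradient
assert np.count_nonzero(g_sparse) == 6
```

Practical systems pair this with error feedback (accumulating the dropped entries locally and adding them to the next step's gradient) so that the discarded information is not lost permanently.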