Shi Chen and Qi Zhao, "Shallowing Deep Networks: Layer-wise Pruning based on Feature Representations," TPAMI 2018. [pdf]
Identifying redundant layers in deep neural networks is not trivial, and previous pruning methods do not apply directly because the importance of a layer is hard to determine from its weights alone. To detect layers with minor contributions, we propose a new method that evaluates the feature representations of different layers. By estimating the discriminative power of these features, we can analyze the behavior of each layer and thus identify redundant ones. Layers whose improvement in feature representation falls below a predefined threshold (shown with transparent color in the figures) are removed from the network.
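A minimal sketch of this layer-wise diagnosis step is given below. It assumes a linear probe on globally pooled activations as a proxy for the discriminative power of a layer's features; the pooling, the probe, and the helper names (`layer_features`, `probe_accuracy`, `redundant_layers`) are illustrative assumptions rather than the paper's exact criterion.

```python
import torch
from sklearn.linear_model import LogisticRegression

def layer_features(model, layers, images):
    """Collect globally pooled activations of each listed layer via forward hooks.

    `layers` maps a layer name to its nn.Module; activations are assumed to be
    4-D feature maps (N, C, H, W).
    """
    feats = {name: [] for name in layers}
    hooks = []
    for name, module in layers.items():
        def hook(_, __, out, name=name):
            # Global average pool the spatial dims so every layer yields a flat vector.
            feats[name].append(out.mean(dim=(2, 3)).detach().cpu())
        hooks.append(module.register_forward_hook(hook))
    with torch.no_grad():
        model(images)
    for h in hooks:
        h.remove()
    return {name: torch.cat(v).numpy() for name, v in feats.items()}

def probe_accuracy(x_train, y_train, x_val, y_val):
    """Estimate the discriminative power of a feature set with a linear probe."""
    clf = LogisticRegression(max_iter=1000).fit(x_train, y_train)
    return clf.score(x_val, y_val)

def redundant_layers(layer_accs, threshold=0.01):
    """Flag layers whose accuracy gain over the preceding layer is below the threshold."""
    names = list(layer_accs)
    return [name for prev, name in zip(names, names[1:])
            if layer_accs[name] - layer_accs[prev] < threshold]
```

Layers returned by `redundant_layers` correspond to the transparent layers in the figures: their features add little discriminative power over the preceding layer, so they are candidates for removal.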
Feature Diagnosis on Single-Label Classification
Feature Diagnosis on MSCOCO Multi-Label Classification
To compensate for the performance loss caused by layer removal, we construct a teacher-student framework in which the original model serves as the teacher and the pruned model acts as the student. Knowledge distillation is used to transfer knowledge from the teacher to the student during fine-tuning.
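The sketch below shows one fine-tuning step under this teacher-student setup, assuming the standard softened-logit form of knowledge distillation; the temperature `T` and mixing weight `alpha` are illustrative hyperparameters, not values reported in the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Hard-label term: ordinary cross entropy against the ground truth.
    hard = F.cross_entropy(student_logits, labels)
    # Soft-label term: KL divergence between temperature-softened teacher and student outputs.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    return alpha * hard + (1.0 - alpha) * soft

def finetune_step(student, teacher, images, labels, optimizer):
    # The original (unpruned) model acts as a frozen teacher.
    teacher.eval()
    with torch.no_grad():
        teacher_logits = teacher(images)
    student_logits = student(images)
    loss = distillation_loss(student_logits, teacher_logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```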
Illustration of the proposed layer-wise pruning procedure.
The predefined threshold controls the trade-off between model performance and computational cost. For devices with limited computational resources, a larger threshold reduces the computational demand; for applications that require high accuracy, a lower threshold better preserves performance.
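A small usage sketch of this trade-off, reusing the `redundant_layers` helper from the diagnosis sketch above: sweeping the threshold shows how many layers would be removed at each setting. The per-layer accuracies below are placeholders, not results from the paper.

```python
# Hypothetical per-layer probe accuracies from the feature diagnosis step.
layer_accs = {"conv1": 0.42, "conv2": 0.55, "conv3": 0.56,
              "conv4": 0.63, "conv5": 0.635}

for threshold in (0.005, 0.01, 0.05):
    removed = redundant_layers(layer_accs, threshold)
    print(f"threshold={threshold}: remove {removed}")
```

A larger threshold marks more layers as redundant (cheaper, shallower model); a smaller one keeps more layers and better preserves accuracy.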