Knowledge distillation for spiking neural networks: aligning features and saliency
Main Authors: | , , |
---|---|
Format: | Article |
Language: | English |
Published: | IOP Publishing, 2025-01-01 |
Series: | Neuromorphic Computing and Engineering |
Subjects: | |
Online Access: | https://doi.org/10.1088/2634-4386/ade821 |
Summary: | Spiking neural networks (SNNs) are renowned for their energy efficiency and bio-fidelity, but their widespread adoption is hindered by challenges in training, primarily the non-differentiability of spiking activations and limited representational capacity. Existing approaches, such as artificial neural network (ANN)-to-SNN conversion and surrogate gradient learning, suffer from either prolonged simulation times or suboptimal performance. To address these challenges, we provide a novel perspective that frames knowledge distillation as a hybrid training strategy, effectively combining knowledge transfer from pretrained models with spike-based gradient learning. This approach leverages the complementary benefits of both paradigms, enabling the development of high-performance, low-latency SNNs. Our approach features a lightweight affine projector that facilitates flexible representation alignment across diverse network architectures and neuron types. We further demonstrate empirically that distillation remains effective irrespective of whether high-precision membrane potentials or binary spike trains are used as features. Through a quantitative measure of the consistency between model predictions and the saliency of relevant input pixels, we show that knowledge transfer is grounded in a shared understanding of salient features rather than the exact replication of numerical activations. This framework represents a significant step towards enabling SNNs to achieve accuracy levels competitive with those of their ANN counterparts while maintaining a minimal number of timesteps. For instance, applying our method to ResNet-18 on CIFAR-100 attains 80.48% accuracy with just four timesteps, surpassing the equivalent ANN (79.90%) and yielding a 3.49% improvement over non-distilled SNNs. |
---|---|
ISSN: | 2634-4386 |
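
The abstract describes a hybrid objective: spike-based gradient learning on the task loss, combined with knowledge transfer from a pretrained teacher through a lightweight affine projector that aligns student and teacher features. A minimal PyTorch sketch of that idea follows; the names (AffineProjector, distillation_loss) and the specific loss choices (MSE feature alignment, temperature-scaled KL divergence on logits) are illustrative assumptions, not the paper's exact formulation.

```python
# Illustrative sketch only; module names, loss terms, and weights are
# assumptions, not the paper's exact formulation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AffineProjector(nn.Module):
    """Lightweight affine map (Wx + b) that aligns student SNN features
    with the teacher's feature space, allowing the two networks to differ
    in architecture and neuron type."""
    def __init__(self, student_dim: int, teacher_dim: int):
        super().__init__()
        self.proj = nn.Linear(student_dim, teacher_dim)

    def forward(self, student_feat: torch.Tensor) -> torch.Tensor:
        return self.proj(student_feat)

def distillation_loss(student_logits, teacher_logits,
                      student_feat, teacher_feat, projector,
                      labels, tau=4.0, alpha=0.5, beta=1.0):
    """Hybrid objective: task cross-entropy (optimized with surrogate
    gradients inside the SNN) + soft-label distillation + feature
    alignment through the affine projector."""
    # Task loss: standard cross-entropy on the student's output.
    ce = F.cross_entropy(student_logits, labels)
    # Soft-label distillation on temperature-scaled logits.
    kd = F.kl_div(F.log_softmax(student_logits / tau, dim=1),
                  F.softmax(teacher_logits / tau, dim=1),
                  reduction="batchmean") * tau * tau
    # Feature alignment: per the abstract, student_feat may be either
    # high-precision membrane potentials or binary spike features.
    feat = F.mse_loss(projector(student_feat), teacher_feat)
    return ce + alpha * kd + beta * feat
```

Because the projector sits outside the student network, it can be discarded after training, leaving inference-time cost unchanged.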
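The abstract's saliency-consistency claim can likewise be pictured with a simple input-gradient saliency map and a similarity score between teacher and student maps. The metric below (cosine similarity of flattened gradient-magnitude maps) is one plausible instantiation, not necessarily the measure used in the paper.

```python
# One plausible saliency-consistency measure; the paper's exact metric
# may differ.
import torch.nn.functional as F

def saliency_map(model, x, target):
    """Absolute input gradient of the target-class logit, reduced over
    channels; highlights the pixels the prediction depends on. For an
    SNN student this relies on surrogate gradients for the spike
    function being defined."""
    x = x.clone().requires_grad_(True)
    score = model(x).gather(1, target.unsqueeze(1)).sum()
    score.backward()
    return x.grad.abs().amax(dim=1)          # (B, H, W)

def saliency_consistency(student, teacher, x, target):
    """Cosine similarity between student and teacher saliency maps;
    values near 1 mean both models rely on the same input pixels."""
    s = saliency_map(student, x, target).flatten(1)
    t = saliency_map(teacher, x, target).flatten(1)
    return F.cosine_similarity(s, t, dim=1).mean()
```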