Knowledge distillation for spiking neural networks: aligning features and saliency

Bibliographic Details
Main Authors: Yifan Hu, Guoqi Li, Lei Deng
Format: Article
Language: English
Published: IOP Publishing 2025-01-01
Series: Neuromorphic Computing and Engineering
Online Access: https://doi.org/10.1088/2634-4386/ade821
Description
Summary: Spiking neural networks (SNNs) are renowned for their energy efficiency and bio-fidelity, but their widespread adoption is hindered by challenges in training, primarily due to the non-differentiability of spiking activations and limited representational capacity. Existing approaches, such as artificial neural network (ANN)-to-SNN conversion and surrogate gradient learning, either suffer from prolonged simulation times or suboptimal performance. To address these challenges, we provide a novel perspective that frames knowledge distillation as a hybrid training strategy, effectively combining knowledge transfer from pretrained models with spike-based gradient learning. This approach leverages the complementary benefits of both paradigms, enabling the development of high-performance, low-latency SNNs. Our approach features a lightweight affine projector that facilitates flexible representation alignment across diverse network architectures and neuron types. We further empirically demonstrate that the effectiveness of distillation is robust, irrespective of whether high-precision membrane potentials or binary spike trains are used as features. Through a quantitative measure of the consistency between model predictions and the saliency of relevant input pixels, we show that knowledge transfer is grounded in a shared understanding of salient features, rather than the exact replication of numerical activations. This framework represents a significant step towards enabling SNNs to achieve accuracy levels that are competitive with those of their ANN counterparts, while maintaining a minimal number of timesteps. For instance, applying our method to ResNet-18 on CIFAR-100 attains 80.48% accuracy with just four timesteps, surpassing the equivalent ANN (79.90%) and yielding a 3.49% improvement over non-distilled SNNs.
ISSN: 2634-4386
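
The summary above describes feature-level distillation through a lightweight affine projector that aligns student SNN representations with those of a pretrained ANN teacher. The following PyTorch sketch is an illustration only, not the authors' published code: the class and function names (AffineProjector, distillation_loss) and all tensor shapes are assumptions. It shows how a 1x1 affine projection might map student features into the teacher's space and how the alignment term could be combined with the ordinary task loss; per the summary, the student features could equally be time-averaged membrane potentials or binary spike counts.

```python
# Minimal sketch (assumed names and shapes, not the authors' implementation):
# aligning SNN student features to ANN teacher features with an affine projector.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AffineProjector(nn.Module):
    """Hypothetical 1x1-conv affine map from student channels to teacher channels."""
    def __init__(self, student_channels: int, teacher_channels: int):
        super().__init__()
        self.proj = nn.Conv2d(student_channels, teacher_channels, kernel_size=1, bias=True)

    def forward(self, student_feat: torch.Tensor) -> torch.Tensor:
        return self.proj(student_feat)

def distillation_loss(student_feat, teacher_feat, student_logits, labels, projector, alpha=1.0):
    """Task (cross-entropy) loss plus an MSE feature-alignment term."""
    # Project student features into the teacher's representation space.
    aligned = projector(student_feat)
    # Teacher features are detached so gradients only flow through the student.
    feat_loss = F.mse_loss(aligned, teacher_feat.detach())
    task_loss = F.cross_entropy(student_logits, labels)
    return task_loss + alpha * feat_loss

if __name__ == "__main__":
    # Random tensors stand in for real features; shapes are illustrative.
    projector = AffineProjector(student_channels=128, teacher_channels=256)
    s_feat = torch.randn(8, 128, 8, 8)   # e.g. spike counts averaged over a few timesteps
    t_feat = torch.randn(8, 256, 8, 8)   # pretrained ANN teacher features
    logits = torch.randn(8, 100)         # CIFAR-100 has 100 classes
    labels = torch.randint(0, 100, (8,))
    loss = distillation_loss(s_feat, t_feat, logits, labels, projector)
    loss.backward()
    print(float(loss))
```

In practice the projector adds only a small number of parameters, which is consistent with the summary's claim that alignment remains flexible across architectures and neuron types without changing the student network itself.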