Enhancing Vision Transformer Performance with Rotation Based Augmentation for Classifying Images of Colon Cancer Pathology

Bibliographic Details
Main Authors: Rudy Eko Prasetya, M. Arief Soeleman, Farrikh Al Zami, Affandy Affandy, Aris Marjuni, Mohammad Iqbal Saryuddin Assaqty
Format: Article
Language: English
Published: Universitas Nusantara PGRI Kediri 2025-07-01
Series: Intensif: Jurnal Ilmiah Penelitian Teknologi dan Penerapan Sistem Informasi
Online Access: https://ojs.unpkediri.ac.id/index.php/intensif/article/view/24918
Description
Summary: Background: In medical imaging, classifying images of colon cancer pathology remains an important challenge, especially for enabling early diagnosis and successful intervention. Recently, Vision Transformer (ViT) models have shown great promise for a variety of computer vision tasks, including the classification of medical images. However, the scarcity of annotated medical datasets and the intrinsic variability of histopathology images often limit their performance. Objective: This study aims to enhance the performance of ViT models in colon cancer pathology classification by introducing a targeted data augmentation strategy, with a particular focus on rotation-based augmentation. Methods: We propose a data augmentation pipeline that applies controlled transformations to increase the number and diversity of training samples. Rotation, flip, and geometric transformations are emphasized to replicate the real-world variations in tissue orientation that are frequently seen in colon pathology slides. The models are trained on 10,000 JPEG images of colon cancer pathology, each with a resolution of 768 x 768 pixels. To assess the impact of augmentation, we compare ViT models trained with and without the proposed pipeline on accuracy, sensitivity, and specificity. Results: Rotation-based augmentation improves ViT performance, achieving up to 99.30% accuracy and 99.50% sensitivity without increasing training time. These improvements are especially relevant in real-world pathology settings, where slide orientation varies greatly and can affect classification consistency. Conclusion: The proposed rotation-centric data augmentation technique enhances the performance of the ViT model in classifying images of colon cancer pathology.
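The rotation and flip transformations and the evaluation metrics named in the summary can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not state rotation angles, library, or label encoding, so this sketch assumes right-angle rotations on a toy 2-D grid and binary labels (1 = positive, 0 = negative). The names `rotate90`, `flip_horizontal`, `augment`, and `binary_metrics` are hypothetical.

```python
from typing import Dict, List, Sequence

Image = List[List[int]]  # toy stand-in for a 2-D pixel grid (assumption)


def rotate90(img: Image) -> Image:
    """Rotate a grid 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def flip_horizontal(img: Image) -> Image:
    """Mirror a grid left to right."""
    return [list(row[::-1]) for row in img]


def augment(img: Image) -> List[Image]:
    """Return 8 variants: 4 right-angle rotations, each with and without a flip.

    Assumption: right-angle rotations only; the abstract gives no angles.
    """
    variants: List[Image] = []
    current = [list(row) for row in img]
    for _ in range(4):
        variants.append(current)
        variants.append(flip_horizontal(current))
        current = rotate90(current)
    return variants


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, and specificity for binary labels.

    Assumes both classes occur in y_true, so neither denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }
```

Calling `augment` on one training image yields eight orientation variants, and `binary_metrics` can then score a model trained with those variants against one trained without them.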
ISSN: 2580-409X, 2549-6824