A Harmonized Multi-Source Dataset with Baseline Deep Learning Validation for Staging Diabetic Retinopathy

Mukesh Delu; Priyanka Harjule; Rajesh Kumar; Kushal Gajjar

doi:https://doi.org/10.33889/IJMEMS.2026.11.1.007

Mukesh Delu
Department of Mathematics, Malaviya National Institute of Technology, Jaipur, 302017, Rajasthan, India.

Priyanka Harjule
Department of Mathematics, Malaviya National Institute of Technology, Jaipur, 302017, Rajasthan, India.

Rajesh Kumar
Department of Human Anatomy and Physiology, University of Johannesburg, Johannesburg, 2006, South Africa; and Department of Electrical Engineering, Malaviya National Institute of Technology, Jaipur, 302017, Rajasthan, India.

Kushal Gajjar
Department of Electronics and Communication Engineering, Malaviya National Institute of Technology, Jaipur, 302017, Rajasthan, India.

Received on August 31, 2025; Accepted on November 25, 2025

Abstract

Accurate automated grading of diabetic retinopathy (DR) depends heavily on the quality of retinal fundus images. Low-quality images, caused by inadequate lighting, motion blur, distortions, or incomplete retinal coverage, may obscure subtle lesions and reduce the accuracy of model predictions. This study constructs a harmonized multi-source dataset for multi-class DR staging using a multi-dimensional image quality assessment framework. Retinal images are collected from the IDRiD, Messidor-2, SUSTech-SYSU, APTOS 2019, DeepDRiD-v1.1, and Zenodo DR V03 datasets. The proposed pipeline comprises preprocessing, image quality assessment using technical quality and medical relevance indicators, computation of dataset-specific statistics, and adaptive thresholding using DR severity-aware percentiles derived from stratified samples, with weighting to match diagnostic needs. To validate the dataset, baseline deep learning models were trained for three hierarchical DR classification schemes. Experimental results show that quality-filtered merging of the datasets improves model generalization accuracy by 3-7% compared with merging the datasets without quality filtering. This work provides a benchmark dataset and baseline performance results to facilitate future research in DR staging and medical image classification.

Keywords- Diabetic retinopathy, Label harmonization, Image quality assessment, Retinal fundus images, Deep learning, Baseline validation, Hierarchical classification.
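
The severity-aware percentile thresholding described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the per-grade percentiles in `GRADE_PERCENTILE`, the record fields (`dataset`, `grade`, `score`), and the rule that more severe grades get a more lenient cut are all assumptions made for illustration, and the sketch omits the stratified sampling and weighting steps.

```python
from collections import defaultdict

# Hypothetical percentile cut per DR grade (0 = no DR ... 4 = proliferative).
# Assumption: rarer, more severe grades get a more lenient cut so that
# fewer of their images are discarded.
GRADE_PERCENTILE = {0: 30, 1: 25, 2: 20, 3: 10, 4: 10}


def percentile(values, p):
    """Linear-interpolation percentile of a non-empty list, with p in [0, 100]."""
    ordered = sorted(values)
    if len(ordered) == 1:
        return ordered[0]
    rank = (p / 100) * (len(ordered) - 1)
    lo = int(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (rank - lo) * (ordered[hi] - ordered[lo])


def quality_filter(records):
    """Keep records whose quality score meets the cutoff of their group.

    Each record is a dict with keys 'dataset', 'grade', and 'score'.
    Cutoffs are computed separately for every (dataset, grade) pair, so the
    threshold adapts to both the source dataset and the DR severity.
    """
    groups = defaultdict(list)
    for r in records:
        groups[(r["dataset"], r["grade"])].append(r["score"])

    cutoffs = {
        key: percentile(scores, GRADE_PERCENTILE[key[1]])
        for key, scores in groups.items()
    }
    return [r for r in records if r["score"] >= cutoffs[(r["dataset"], r["grade"])]]
```

Calling `quality_filter` on a merged list of scored records returns the subset that passes its own group's cutoff. Computing cutoffs per group rather than globally keeps a dataset with systematically lower scores from being filtered out wholesale.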

Citation

Delu, M., Harjule, P., Kumar, R., & Gajjar, K. (2026). A Harmonized Multi-Source Dataset with Baseline Deep Learning Validation for Staging Diabetic Retinopathy. International Journal of Mathematical, Engineering and Management Sciences, 11(1), 130-147. https://doi.org/10.33889/IJMEMS.2026.11.1.007.

Volume 11 (2026)

Number 1 (February)

Pages 130-147

International Journal of Mathematical, Engineering and Management Sciences

eISSN: 2455-7749. Open Access
