Research Article Open Access

Hybrid Attention-Based Stacked Bi-LSTM Model for Automated MultiImage Captioning

Paspula Ravinder1 and Saravanan Srinivasan1
  • 1 Department of Computer Science and Engineering, School of Computing, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai, Tamil Nadu, India

Abstract

Recently, medical image captioning has become a prominent research field. The distinct characteristics of medical imaging data pose a number of challenges for captioning, and the variability across image modalities makes it difficult to generate effective captions. This study therefore proposes a novel Multi-image Captioning Hybrid Attention Model to provide effective automated medical image captioning with minimal medical errors. In the initial image acquisition stage, input images are obtained from the specified dataset. Data augmentation is then applied to increase the size of the dataset. Next, preprocessing enhances the quality of the inputs through Improved Wiener Filtering (IWF), image resizing and color channel conversion. The necessary features are then extracted and bounding boxes are generated using a new Position Attentional YOLOv5 (PA-YOLOv5) approach. Subsequently, captioning is performed by the proposed Attention-based Stacked Bi-directional Long Short-Term capsule network (A-SBiLSTCN) model. To improve the efficiency of the proposed model, its hyperparameters are fine-tuned with the Chaotic Flamingo Search Optimization (CFSO) algorithm during the training stage. The experiments are implemented in Python and conducted on the PEIR dataset. The proposed model outperformed existing methods in terms of BLEU score (92.87%), METEOR score (88.20%), ROUGE-L score (73.20%), SPICE score (70.76%) and RIBES score (60.40%).
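
The BLEU score reported in the abstract measures n-gram overlap between a generated caption and one or more reference captions. As a minimal sketch of how such a score can be computed, the following Python example implements sentence-level BLEU with clipped n-gram precision and a brevity penalty. The function names `ngrams` and `bleu`, the whitespace tokenization, the uniform n-gram weights and the absence of smoothing are assumptions made for illustration; the abstract does not state which BLEU variant or tooling the authors used.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU with uniform n-gram weights and no smoothing.

    Illustrative only: tokenization is a plain lowercase whitespace split.
    """
    cand = candidate.lower().split()
    refs = [r.lower().split() for r in references]
    if not cand:
        return 0.0

    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref_counts = Counter()
        for ref in refs:
            for gram, count in ngrams(ref, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        clipped = sum(min(c, max_ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if clipped == 0 or total == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU = 0
        log_precision_sum += math.log(clipped / total)

    # Brevity penalty uses the reference length closest to the candidate length
    # (ties are broken in favour of the shorter reference).
    ref_len = min((len(r) for r in refs),
                  key=lambda length: (abs(length - len(cand)), length))
    bp = 1.0 if len(cand) > ref_len else math.exp(1 - ref_len / len(cand))
    return bp * math.exp(log_precision_sum / max_n)
```

Calling `bleu(generated_caption, [reference_caption])` returns a value between 0 and 1; multiplying by 100 gives a percentage of the kind quoted in the abstract. Because this sketch omits smoothing, a short caption with no matching 4-gram scores 0, so published results typically rely on a smoothed or corpus-level variant.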

Journal of Computer Science
Volume 21 No. 4, 2025, 883-904

DOI: https://doi.org/10.3844/jcssp.2025.883.904

Submitted On: 12 December 2024 Published On: 8 March 2025

How to Cite: Ravinder, P. & Srinivasan, S. (2025). Hybrid Attention-Based Stacked Bi-LSTM Model for Automated MultiImage Captioning. Journal of Computer Science, 21(4), 883-904. https://doi.org/10.3844/jcssp.2025.883.904

Keywords

  • Medical Image Captioning
  • Hybrid Attention
  • Color Channel
  • YOLOv5
  • Optimization
  • Hyperparameter Tuning
  • BLEU Score