Hybrid transformer-CNN model for precise fish segmentation, localization, and species classification in aquaculture
DOI:
https://doi.org/10.62110/sciencein.jist.2025.v13.1134
Keywords:
Fish Segmentation, Aquaculture Monitoring, Deep Learning, Fish Localization, Transformer-CNN Fusion
Abstract
Marine biodiversity research depends on computational tools that can identify fish species precisely. This work introduces a Vision Transformer (ViT) fusion model for fish segmentation, localization, and classification. The system integrates transformer models, such as Swin Transformer and Data-efficient Image Transformers (DeiT), with a hybrid CNN-Transformer architecture to address key problems in automated marine species recognition. The proposed method relies on adaptive attention mechanisms, hierarchical learning, and multi-scale feature extraction. In experiments on multiple marine datasets, the model outperforms state-of-the-art approaches, reaching 97.8% accuracy in classification, 98.2% accuracy in segmentation, and 98.3% accuracy in localization. The study offers a robust and adaptable method for automated fish identification that supports conservation efforts, ecological monitoring, and fisheries management. By applying artificial intelligence to the study of marine ecosystems, it enables more effective and efficient biodiversity studies.
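The abstract reports one accuracy figure per task but this page does not say how each is computed. As a rough illustration only, the sketch below shows one common way such figures are obtained: the fraction of correct labels for classification, the fraction of matching pixels for segmentation, and the fraction of predicted boxes whose intersection-over-union (IoU) with the ground truth meets a threshold for localization. All function names and the 0.5 IoU threshold are assumptions made for this example, not details taken from the paper.

```python
from typing import Sequence, Tuple

# Assumed box format for this sketch: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def classification_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted species label matches the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def pixel_accuracy(mask_true: Sequence[Sequence[int]],
                   mask_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of pixels whose predicted class matches the ground-truth mask."""
    if len(mask_true) != len(mask_pred):
        raise ValueError("masks must have the same number of rows")
    total = 0
    correct = 0
    for row_t, row_p in zip(mask_true, mask_pred):
        if len(row_t) != len(row_p):
            raise ValueError("mask rows must have the same length")
        total += len(row_t)
        correct += sum(t == p for t, p in zip(row_t, row_p))
    if total == 0:
        raise ValueError("masks must contain at least one pixel")
    return correct / total


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def localization_accuracy(true_boxes: Sequence[Box],
                          pred_boxes: Sequence[Box],
                          iou_threshold: float = 0.5) -> float:
    """Fraction of predicted boxes with IoU >= threshold (threshold is an assumption)."""
    if len(true_boxes) != len(pred_boxes) or len(true_boxes) == 0:
        raise ValueError("box sequences must be non-empty and of equal length")
    hits = sum(box_iou(t, p) >= iou_threshold for t, p in zip(true_boxes, pred_boxes))
    return hits / len(true_boxes)


if __name__ == "__main__":
    # Toy inputs, invented purely to exercise the functions.
    print(classification_accuracy([0, 1, 2, 1], [0, 1, 1, 1]))
    print(pixel_accuracy([[0, 1], [1, 1]], [[0, 1], [0, 1]]))
    print(localization_accuracy([(0, 0, 2, 2), (0, 0, 2, 2)],
                                [(0, 0, 2, 2), (1, 1, 3, 3)]))
```

Segmentation and localization are often also summarized with mean IoU or Dice scores, so the figures above should be read against the definitions in the full paper rather than this sketch.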
License
Copyright (c) 2025 Pradumn Kumar, Praveen Kumar Shukla

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.