Document Type: Original Research

Authors

Department of Computer, Engineering and Information Technology, Shiraz University of Technology, Shiraz, Iran

Abstract

Background: With the rapid growth of patient-generated textual reviews, sentiment analysis has become essential for extracting meaningful insights in the healthcare sector. However, embedding models, which are typically pre-trained on general-domain corpora, struggle to capture domain-specific terminology and contextual nuances in healthcare content accurately. This limitation has led to increasing interest in hybrid embedding strategies that combine the strengths of multiple representation methods.
Objective: This study aims to improve the performance of Persian healthcare sentiment analysis by proposing a hybrid embedding approach capable of more accurately identifying domain-specific expressions and linguistic ambiguities.
Material and Methods: In this analytical study, a hybrid embedding model integrating Bidirectional Encoder Representations from Transformers (BERT) and Term Frequency–Inverse Document Frequency (TF‑IDF) representations was developed and applied to a dataset of Persian healthcare-related comments. A Bidirectional Gated Recurrent Unit (Bi-GRU) classifier was then used for sentiment classification, and its performance was compared with that of other machine learning and deep learning models.
Results: The proposed hybrid method achieved an accuracy of 83.05%. In comparison, models using only BERT embeddings or only TF‑IDF embeddings achieved accuracies of 63.46% and 80%, respectively.
Conclusion: The results highlight the superiority of the proposed hybrid embedding approach in capturing domain-specific vocabulary and resolving ambiguous expressions within healthcare reviews. The strong performance of the Bi-GRU classifier further demonstrates the importance of modeling semantic and long-term dependencies to enhance sentiment analysis accuracy in the healthcare domain.
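
Illustrative sketch: The abstract does not state how the BERT and TF‑IDF representations are fused, so the following minimal Python example assumes the simplest option, concatenating an L2-normalized dense vector with an L2-normalized TF‑IDF vector. The dense vectors are hand-written placeholders standing in for BERT outputs, the comments are toy English tokens rather than the study's Persian data, and the function names (tfidf_vectors, hybrid_embedding) are hypothetical. The Bi-GRU classifier that would consume these vectors is omitted.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, one TF-IDF vector per tokenized document)."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF, so terms present in every document keep a nonzero weight.
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1.0 for tok in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1
        vectors.append([counts[tok] / total * idf[tok] for tok in vocab])
    return vocab, vectors


def l2_normalize(vec):
    """Scale a vector to unit length; leave an all-zero vector unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def hybrid_embedding(dense_vec, sparse_vec):
    """Assumed fusion: concatenate the normalized dense and sparse vectors."""
    return l2_normalize(dense_vec) + l2_normalize(sparse_vec)


# Toy tokenized comments (placeholders, not the study's data).
docs = [
    ["doctor", "was", "kind"],
    ["long", "wait", "doctor", "was", "late"],
]
# Placeholder dense vectors standing in for BERT sentence embeddings.
dense = [[0.2, -0.1, 0.4], [-0.3, 0.5, 0.1]]

vocab, sparse = tfidf_vectors(docs)
hybrid = [hybrid_embedding(d, s) for d, s in zip(dense, sparse)]
```

Normalizing each part before concatenation keeps either representation from dominating purely because of its scale. This is a design choice of the sketch, not something reported in the study.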

Keywords