Aspect-Level Sentiment Analysis Based on Lite Bidirectional Encoder Representations From Transformers and Graph Attention Networks

    https://doi.org/10.1142/S0218126625500628

    Abstract

    Aspect-level sentiment analysis is a critical component of sentiment analysis, aiming to determine the sentiment polarity associated with specific aspect words. However, existing methodologies have limitations in effectively managing aspect-level sentiment analysis. These limitations include insufficient utilization of syntactic information and an inability to precisely capture the contextual nuances surrounding aspect words. To address these issues, we propose an Aspect-Oriented Graph Attention Network (AOGAT) model. This model incorporates syntactic information to generate dynamic word vectors through the pre-trained model ALBERT and combines a graph attention network with BiGRU to capture both syntactic and semantic features. Additionally, the model introduces an aspect-focused attention mechanism to retrieve features related to aspect words and integrates the generated representations for sentiment classification. Our experiments on three datasets demonstrate that the AOGAT model outperforms traditional models.

    This paper was recommended by Regional Editor Takuro Sato.

    1. Introduction

    Aspect-level sentiment classification aims to identify sentiment tendencies in sentences concerning specific aspects. Early methods for this task combined manually labeled features with traditional machine learning techniques. However, the advent of deep learning,1,2 particularly neural networks,3,4 has led to a shift toward automated feature extraction, making deep learning-based approaches predominant in sentiment analysis.5 In a seminal work, Tang et al.6 integrated aspectual word information into Long Short-Term Memory (LSTM) networks for semantic encoding, demonstrating the critical importance of including aspectual word information. To further refine the integration of aspectual and contextual information, researchers have increasingly adopted attention mechanisms. For instance, Joshi et al.7 utilized Convolutional Neural Networks (CNN) within a textual attention-based neural network. This model aimed to extract features and model semantic relationships between sentences and aspectual words. However, this attention-based approach primarily focused on semantic aspects, potentially neglecting valuable syntactic information.

    In recent years, researchers have proposed various approaches using graph neural networks (GNNs)8 and have achieved significant performance improvements. GNNs effectively utilize syntactic information, leading to important advances in the field. Zhao et al.9 employed Graph Convolutional Networks (GCNs) to address emotional dependencies between multiple aspects, achieving good results. Similarly, Wang et al.10 used Graph Attention Networks (GATs) for Aspect-Based Sentiment Analysis (ABSA), demonstrating that converting syntactic information into a syntactic dependency tree during data preprocessing, with the designated aspect as the root node, improved model performance. These experiments highlighted that generating aspect-specific representations significantly enhanced model effectiveness.

    With the advancement of language models, pre-trained models like BERT and ALBERT have shown significant success across various natural language processing tasks. In sentiment analysis, these models transform static word vectors into dynamic counterparts, enabling more nuanced and dynamic semantic representations. This development effectively addresses challenges in sentiment analysis for lengthy sentences, establishing these models as standards in the field. To enhance the utilization of both semantic and syntactic information within sentences, this study employs ALBERT to extract dynamic word vectors, GAT to capture syntactic details, and BiGRU to capture semantic nuances. Aspect-focused attention is then applied to facilitate interaction between sentences and aspect words, thereby extracting sentiment information more comprehensively.

    Building on the preceding analysis, we introduce the Aspect-Oriented Graph Attention Network (AOGAT) model, which aims to fully leverage both semantic and syntactic information within sentences. This model synergizes GAT and BiGRU to concurrently extract syntactic and semantic details. Initially, ALBERT is employed to derive dynamic word vectors, which are then input into both GAT and BiGRU. GAT utilizes the adjacency matrix from the dependency parser to extract syntactic features, while BiGRU captures the hidden state expressions of the text. Aspect-focused attention is subsequently applied to retrieve crucial features related to the semantics of aspectual words from the hidden state vectors of both GAT and BiGRU. Ultimately, the sentiment classification layer predicts sentiment polarity. The model’s efficacy is validated through experiments on three benchmark datasets.

    In summary, this paper’s contributions lie in proposing the AOGAT model, which seamlessly integrates GAT and BiGRU to extract and combine semantic and syntactic information. Our main research findings are as follows:

    (1)

    This study generated dynamic word vectors using ALBERT, an optimized version of BERT, which reduces the parameter count and enhances operational speed compared to BERT.

    (2)

    We introduce the AOGAT model, leveraging ALBERT for input processing and employing a hybrid neural network comprising GAT, BiGRU and aspect-focused attention to effectively address the Aspect-Based Sentiment Classification (ABSC) task.

    (3)

    Comprehensive experimentation determined that the AOGAT model outperformed other baseline models, demonstrating superior performance on the SemEval 2014 and Twitter datasets.

    2. Related Work

    In the rapidly expanding Internet industry, online evaluations have become increasingly prevalent, often involving multiple aspects of the same entity. To provide comprehensive and detailed sentiment information to web users, targeted and finer-grained sentiment category analysis is required. Aspect-level sentiment categorization aims to ascertain the sentiment polarity (positive, neutral, negative) associated with specific aspects mentioned in a text. For example, in the statement “The service is good, but the location is remote,” the term “service” is evaluated positively, while “location” is evaluated negatively. This finer analysis allows for a nuanced understanding of sentiments expressed toward distinct aspects within a given text.

    Various research methodologies and techniques have emerged to address aspect-level sentiment analysis tasks. Machine learning and sentiment lexicon approaches necessitate manual annotation of textual features, which are then used to build classifiers that determine the overall sentiment polarity of the text. Keshavarz et al.11 employed an approach combining a corpus and a lexicon to create an adaptive sentiment lexicon, enhancing the judgment of sentiment polarity. Ding et al.12 addressed deficiencies in sentiment lexicons by extending them using synonyms and antonyms from WordNet and word co-occurrence information, achieving a more thorough assessment of sentiment polarity. Pang et al.13 simplified sentiment classification to binary classification (ignoring neutrality) and applied three machine learning algorithms to predict sentiments in movie review data.

    Deep learning approaches eliminate the need for intricate feature engineering by autonomously learning mappings from data to high-level features. In natural language processing, these methods consistently outperform traditional approaches. The rapid evolution of neural network technology has led to novel models yielding remarkable results in sentiment analysis tasks. The attention mechanism has emerged as a powerful tool for recognizing sentiment information by calculating the semantic correlation between aspects and text, effectively highlighting emotional nuances. Wang et al.14 adopted this approach, using the attention mechanism to weigh the encoding of the textual hidden layer in long and short-term memory networks, significantly improving model performance. Fan et al.15 introduced interactive attention and proposed a multi-granularity attention approach, integrating two types of attention to enhance the model’s judgment capabilities. These mechanisms refine the understanding of sentiment information, capturing different granularity levels of sentiment in the text and making the analysis more comprehensive and accurate.

    However, models relying solely on the attention mechanism focus exclusively on the semantic aspects of sentences, neglecting syntactic information. In sentences with multiple emotion words of opposing polarities, the attention mechanism may be swayed by words unrelated to the aspectual words. For example, in Fig. 1, the sentiment toward “location” might be influenced by the positive opinion word “good,” which is actually associated with “service.”

    Fig. 1.

    Fig. 1. An example sentence with a dependency tree.

    GNNs have significant advantages in processing unstructured information. In sentiment analysis tasks, syntactic information helps models identify sentiment information more accurately. Compared to the attention mechanism, GNNs process syntactic information more finely, resulting in more accurate access to sentiment information related to aspectual sentiment polarity. Leveraging GNN capabilities enhances our grasp of syntactic structure in textual data, elevating sentiment analysis model performance and adaptability to diverse unstructured information contexts. Sun et al.16 employed GCNs to derive node representations from a dependency tree, incorporating these representations into a sentiment classification task. Huang et al.13 introduced GATs, integrating the attention mechanism into GCNs, enhancing model flexibility and precision in establishing word dependencies, and improving sentiment classification performance.

    With advancements in language modeling, pre-trained models like BERT17 have demonstrated remarkable efficacy in handling sentiment analysis for lengthy sentences, becoming prevalent in sentiment analysis. Sun et al.18 enhanced BERT through fine-tuning, constructing auxiliary sentences incorporating aspectual words and transforming aspect-level sentiment analysis into a sentence pair classification problem, yielding excellent text classification performance. Xia et al.19 focused on different contexts to obtain deep contextual information through a BERT-based context perceptron. Google researchers and others improved BERT, introducing ALBERT,20 which reduces parameter count and enhances operational speed. Leveraging these models’ strengths, this paper introduces the AOGAT model, which extracts both semantic and syntactic information from pre-trained ALBERT using a combination of GAT and Bidirectional Gated Recurrent Units (BiGRU). It employs Aspect-Focused Attention to consider important features in both context and aspectual terminology, providing sufficient information for accurately identifying the affective polarity of given aspects.

    3. Proposed AOGAT-Based Aspect-Level Sentiment Analysis Approach

    The proposed AOGAT for ABSA consists of three layers: the sentence encoding layer, the aspect-focused hybrid neural network layer and the output layer. The network structure is depicted in Fig. 2.

    Fig. 2.

    Fig. 2. Structure of AOGAT model.

    3.1. Sentence encoding layer

    The text embedding layer maps the words in the comment text into word vectors that can be processed by the neural network model. Given a text of length $n$ represented as the word sequence $s=\{w_1,w_2,\ldots,w_n\}$, the text $s$ contains an aspect-word sequence of length $m$ ($1\le m<n$), $a=\{w_a,\ldots,w_{a+m-2},w_{a+m-1}\}$. The embedding of the text $s$ is expressed as $h^s=\{e^s_1,e^s_2,\ldots,e^s_{n-1},e^s_n\}$ and the embedding of the aspect words as $h^a=\{e^a_1,e^a_2,\ldots,e^a_{m-1},e^a_m\}$, where $e\in\mathbb{R}^{d_i\times n}$, $n$ represents the length of the input sentence, and $d_i$ denotes the word embedding dimension of ALBERT.

    The ALBERT model, through its transformer-based architecture and self-attention mechanism, is capable of generating dynamic word vectors that not only richly express the semantic and syntactic information of the text, but also capture the relationships between words and the subtleties of context. This capability allows the model’s GAT and BiGRU components to more effectively extract text features, thereby achieving more accurate identification of sentiment polarity associated with specific aspects of the text in sentiment classification tasks, enhancing the overall performance of the model.
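    To make the embedding step concrete, the following minimal sketch shows how dynamic word vectors of this kind could be obtained from a public ALBERT checkpoint with the Hugging Face transformers library; the checkpoint name and tokenization details are illustrative assumptions rather than the authors' exact setup.

```python
# Minimal sketch: dynamic (contextual) word vectors from ALBERT.
# The checkpoint name is an illustrative assumption, not the paper's exact model.
import torch
from transformers import AlbertTokenizer, AlbertModel

tokenizer = AlbertTokenizer.from_pretrained("albert-base-v2")
model = AlbertModel.from_pretrained("albert-base-v2")
model.eval()

sentence = "The service is good, but the location is remote."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# h^s: one contextual vector per sub-word token,
# shape (1, sequence_length, hidden_size).
h_s = outputs.last_hidden_state
print(h_s.shape)
```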

    3.2. Aspect-focused hybrid neural networks

    A GAT extends traditional CNNs to process graph-structured data. Like GCNs, GATs encode local information in unstructured data while incorporating an attention mechanism to flexibly capture relationships between nodes. This paper’s GAT model introduces a multi-head attention mechanism, allowing it to focus on different subsets of nodes simultaneously, thereby capturing complex patterns within the graph structure. Each attention head has its own set of parameters, enabling the network to dynamically determine which parts of the graph to focus on during learning. The specific formula is provided as follows:

    $e_{ij}=a\big(\big[Wh_i \,\|\, Wh_j\big]\big)$,  (1)
    $\alpha_{ij}=\dfrac{\exp\big(\mathrm{LeakyReLU}\big(a^{T}\big[Wh_i \,\|\, Wh_j\big]\big)\big)}{\sum_{k\in\mathcal{N}_i}\exp\big(\mathrm{LeakyReLU}\big(a^{T}\big[Wh_i \,\|\, Wh_k\big]\big)\big)}$,  (2)
    where $W$ is the shared weight matrix used for feature enhancement, $[\,\cdot\,\|\,\cdot\,]$ denotes concatenation of the linearly transformed feature vectors of vertices $i$ and $j$, and $a$ is the learnable attention vector. Specifically, we utilize the dynamic word embeddings generated by ALBERT as the initial node features and employ the GAT to compute the attention weights between nodes, capturing the syntactic and semantic relationships between them. The attention weights are calculated using a multi-head attention mechanism, ultimately producing an aggregated representation for each node.

    A single-layer attention mechanism may not effectively handle the neighboring nodes relative to the target.21 Therefore, this paper introduces a multi-head attention mechanism, where K independent attention mechanisms perform computations. Their resulting features are concatenated and averaged to produce the final output feature representation:

    $h'_i=\sigma\Big(\dfrac{1}{K}\sum_{k=1}^{K}\sum_{j\in\mathcal{N}_i}\alpha^{k}_{ij}W^{k}h_j\Big)$,  (3)
    where $\alpha^{k}_{ij}$ denotes the attention score of the $k$th head, $W^{k}$ is the corresponding input linear transformation matrix, and $h'_i$ denotes the feature representation of node $i$ after integrating information from its neighboring nodes.
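    As a rough illustration of Eqs. (1)–(3), the sketch below computes masked multi-head attention over a dependency adjacency matrix and averages the heads. The class name, parameter shapes, and head-averaging choice are assumptions made for clarity, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleGATLayer(nn.Module):
    """Illustrative multi-head GAT layer in the spirit of Eqs. (1)-(3)."""
    def __init__(self, in_dim, out_dim, num_heads=5):
        super().__init__()
        self.num_heads = num_heads
        self.W = nn.ModuleList([nn.Linear(in_dim, out_dim, bias=False)
                                for _ in range(num_heads)])
        # attention vectors a applied to the concatenation [W h_i || W h_j]
        self.a = nn.ParameterList([nn.Parameter(torch.randn(2 * out_dim) * 0.1)
                                   for _ in range(num_heads)])

    def forward(self, h, adj):
        # h:   (n, in_dim) node features (e.g. ALBERT hidden vectors)
        # adj: (n, n) dependency adjacency matrix with self-loops
        n = h.size(0)
        head_outputs = []
        for k in range(self.num_heads):
            Wh = self.W[k](h)                                    # (n, out_dim)
            # e_ij = a^T [W h_i || W h_j], Eq. (1)
            pairs = torch.cat([Wh.unsqueeze(1).expand(n, n, -1),
                               Wh.unsqueeze(0).expand(n, n, -1)], dim=-1)
            e = F.leaky_relu(pairs @ self.a[k])                  # (n, n)
            # softmax restricted to syntactic neighbours N_i, Eq. (2)
            e = e.masked_fill(adj == 0, float("-inf"))
            alpha = torch.softmax(e, dim=-1)
            head_outputs.append(alpha @ Wh)                      # (n, out_dim)
        # average the K heads and apply the nonlinearity sigma, Eq. (3)
        return torch.sigmoid(torch.stack(head_outputs).mean(dim=0))
```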

    The contextual hidden vectors $h^s$ produced by the ALBERT model, together with the adjacency matrix $A\in\mathbb{R}^{n\times n}$ obtained from the dependency parser (where $n$ denotes the number of nodes), are subsequently input into the GAT. The final output is derived from the $L$-layer graph attention network, $G^L=\{g^L_1,g^L_2,\ldots,g^L_{n-1},g^L_n\}$.
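    The adjacency matrix itself can be derived from an off-the-shelf dependency parser. The following sketch uses spaCy purely as an example; the paper does not specify which parser is used, and the alignment between word-level nodes and ALBERT sub-word tokens is omitted here.

```python
import numpy as np
import spacy

nlp = spacy.load("en_core_web_sm")  # parser choice is an assumption

def dependency_adjacency(sentence):
    """Symmetric word-level adjacency matrix A from a dependency parse,
    with self-loops on the diagonal."""
    doc = nlp(sentence)
    n = len(doc)
    adj = np.eye(n, dtype=np.float32)
    for token in doc:
        if token.i != token.head.i:        # the root points to itself
            adj[token.i, token.head.i] = 1.0
            adj[token.head.i, token.i] = 1.0
    return adj

print(dependency_adjacency("The service is good, but the location is remote."))
```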

    The Gated Recurrent Unit (GRU), illustrated in Fig. 3, is a variant of the Recurrent Neural Network (RNN).22 In the AOGAT model, the Bidirectional GRU (BiGRU) receives hidden state vectors produced by ALBERT as input, encoding inputs from both forward and backward directions. GRU addresses the vanishing gradient problem found in traditional RNNs by incorporating update gates and reset gates. The update gate zt determines the extent to which information from the previous hidden state is incorporated into the current hidden state, while the reset gate rt controls how much the current input influences the memory neurons. These gating mechanisms enable the model to better manage which information to retain or discard, enhancing the capture of sentence semantics and contextual relationships. The formula for GRU is provided as follows:

    $z_t=\sigma(W_{xz}x_t+W_{hz}h_{t-1})$,  (4)
    $r_t=\sigma(W_{xr}x_t+W_{hr}h_{t-1})$,  (5)
    $\tilde{h}_t=\sigma\big(Wx_t+U(r_t\odot h_{t-1})\big)$,  (6)
    $h_t=(1-z_t)\odot h_{t-1}+z_t\odot\tilde{h}_t$.  (7)

    Fig. 3.

    Fig. 3. GRU model diagram.

    In the given framework, let $t$ denote the time step. $x_t$ represents the input to the GRU cell at time $t$, while $W$ signifies the weight matrices that govern the connections within the GRU. $h_t$ indicates the hidden state at time $t$, and $\tilde{h}_t$ denotes the candidate hidden state of the GRU cell at the same time step. The sigmoid activation function is represented by $\sigma$, and $\odot$ denotes element-wise multiplication. The contextual hidden vector $h^s$, which is generated by the ALBERT model, is subsequently processed through the BiGRU to produce the resultant vector $U$.
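    A compact sketch of the BiGRU step is given below: the ALBERT hidden states $h^s$ are re-encoded in both directions, and the concatenated forward and backward states form $U$. The hidden size is an illustrative assumption.

```python
import torch
import torch.nn as nn

hidden_size = 300   # illustrative; the paper does not fix this value here
bigru = nn.GRU(input_size=2048, hidden_size=hidden_size,
               batch_first=True, bidirectional=True)

# h_s: ALBERT hidden states, shape (batch, seq_len, 2048) for ALBERT-xlarge
h_s = torch.randn(16, 40, 2048)
U, _ = bigru(h_s)
print(U.shape)      # (16, 40, 2 * hidden_size): forward/backward states concatenated
```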

    Aspect-focused attention involves extracting crucial features related to aspectual semantics from hidden state vectors. To achieve this, the hidden state vectors of nonaspect words are masked, while those of aspect words remain unchanged:

    $C^L_t=0,\quad 1\le t\le\tau,\ \tau+m<t\le n$.  (8)
    The output of the masking layer is the aspect-oriented feature $C^L_{\mathrm{mask}}=\{0,\ldots,c^L_{\tau+1},\ldots,c^L_{\tau+m},\ldots,0\}$. $C^L_{\mathrm{mask}}$ effectively captures contextual features by integrating syntactic dependencies and distant multi-word relationships. Furthermore, the dot product is used to quantify the semantic correlation between the aspect constituent words and the other words in the sentence, thereby realizing aspect-focused retrieval. This is detailed in the following equations:
    $\beta_t=\sum_{i=\tau+1}^{\tau+m}h^{c\top}_t c^L_i$,  (9)
    $\alpha_t=\dfrac{\exp(\beta_t)}{\sum_{i=1}^{n}\exp(\beta_i)}$,  (10)
    $r=\sum_{t=1}^{n}\alpha_t h^c_t$,  (11)
    where $h^c_t$ can be taken from the outputs $G^L$ and $U$ of the GAT and the BiGRU, respectively; applying the above equations to each yields the final representations $G^f$ and $U^f$.
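    A minimal sketch of the masking and retrieval steps in Eqs. (8)–(11) follows. The 0-based span convention, tensor shapes, and the choice of which representation is masked are assumptions for illustration only.

```python
import torch

def aspect_focused_attention(h_c, tau, m):
    """Eqs. (8)-(11): zero out non-aspect positions, score every context
    position against the aspect span, and pool into a single vector.
    h_c: (n, d) hidden states (e.g. G^L from the GAT or U from the BiGRU);
    the aspect occupies positions tau .. tau + m - 1 (0-based)."""
    n, d = h_c.shape
    mask = torch.zeros(n, 1)
    mask[tau:tau + m] = 1.0
    c_mask = h_c * mask                            # Eq. (8): C^L_mask
    # Eq. (9): beta_t = sum over aspect positions i of h^c_t . c^L_i
    beta = (h_c @ c_mask.T).sum(dim=1)             # (n,)
    alpha = torch.softmax(beta, dim=0)             # Eq. (10)
    r = (alpha.unsqueeze(1) * h_c).sum(dim=0)      # Eq. (11): (d,)
    return r

G_L = torch.randn(40, 600)                         # GAT output
U = torch.randn(40, 600)                           # BiGRU output
g_f = aspect_focused_attention(G_L, tau=3, m=2)    # final representation G^f
u_f = aspect_focused_attention(U, tau=3, m=2)      # final representation U^f
```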

    3.3. Output layer

    The aspect-focused attention layer produces the representations $G^f$ and $U^f$, which are concatenated into $H^{\mathrm{fin}}$. An averaging operation is then applied to obtain $H^{\mathrm{avg}}$, and the averaged vector is fed into a fully connected layer to yield the sentiment classification outcome:

    $H^{\mathrm{fin}}=[G^f:U^f]$,  (12)
    $x=W_a H^{\mathrm{avg}}+b_a$,  (13)
    $y=\mathrm{softmax}(x)$.  (14)
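    Eqs. (12)–(14) reduce to concatenation, a linear layer, and a softmax. The sketch below uses an illustrative feature dimension; the averaging step that yields $H^{\mathrm{avg}}$ is only indicated by a comment, since its exact form is not spelled out here.

```python
import torch
import torch.nn as nn

d = 600                                  # illustrative feature dimension
classifier = nn.Linear(2 * d, 3)         # C = 3 sentiment polarities

g_f = torch.randn(d)                     # aspect-focused GAT representation G^f
u_f = torch.randn(d)                     # aspect-focused BiGRU representation U^f

h_fin = torch.cat([g_f, u_f], dim=-1)    # Eq. (12): H^fin = [G^f : U^f]
h_avg = h_fin                            # placeholder for the paper's averaging step
x = classifier(h_avg)                    # Eq. (13): x = W_a H^avg + b_a
y = torch.softmax(x, dim=-1)             # Eq. (14): distribution over polarities
print(y)
```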

    3.4. Training

    A gradient descent algorithm is used to train the model using cross-entropy loss and L2 regularization as shown in the following equation:

    $\mathrm{Loss}=-\sum_{i=1}^{D}\sum_{j=1}^{C}\hat{p}^{\,j}_{i}\log p^{j}_{i}+\lambda\lVert\theta\rVert^{2}$.  (15)
    Here, $D$ denotes the size of the training dataset and $C$ the number of sentiment categories, which is fixed at 3 in this study. $p^{j}_{i}$ represents the predicted probability that the $i$th text instance belongs to sentiment category $j$, whereas $\hat{p}^{\,j}_{i}$ is the corresponding ground-truth label. To control model complexity and mitigate overfitting, the regularization term $\lambda\lVert\theta\rVert^{2}$ is included, where $\theta$ comprises all trainable parameters and $\lambda$ is the L2 regularization coefficient.
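    In PyTorch, Eq. (15) is commonly realized by pairing cross-entropy with weight decay in the optimizer, which plays the role of the $\lambda\lVert\theta\rVert^{2}$ term. The stand-in model and tensor shapes below are assumptions; the learning rate, batch size, and regularization coefficient follow the settings reported in Sec. 4.2.

```python
import torch
import torch.nn as nn

model = nn.Linear(1200, 3)                     # stand-in for the full AOGAT model
criterion = nn.CrossEntropyLoss()              # cross-entropy part of Eq. (15)
# weight_decay supplies the L2 penalty lambda * ||theta||^2
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5, weight_decay=1e-4)

features = torch.randn(16, 1200)               # a batch of pooled sentence vectors
labels = torch.randint(0, 3, (16,))            # C = 3 sentiment classes

optimizer.zero_grad()
loss = criterion(model(features), labels)
loss.backward()
optimizer.step()
print(float(loss))
```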

    4. Experiments

    4.1. Datasets

    In this study, we use three publicly available datasets: the Twitter comment dataset23 and the SemEval-2014 datasets,24 which comprise laptop reviews (Laptop) and restaurant reviews (Restaurant). Each review within these datasets contains one or more aspects. The statistical details of the three datasets are presented in Table 1.

    Table 1. Dataset information.

    Dataset      Positive (Train/Test)   Neutral (Train/Test)   Negative (Train/Test)
    Twitter      1561 / 173              3127 / 346             1560 / 173
    Restaurant   2164 / 728               637 / 196              807 / 196
    Laptop        994 / 341               464 / 169              870 / 128

    4.2. Experimental parameters

    In our experimental setup, we employed the ALBERT-xlarge version with an embedding dimension of 2,048, fine-tuned using a learning rate of $2\times10^{-5}$. The Adam optimizer was chosen for experimentation. For the GAT, we configured two layers, five heads for multi-head attention, a dropout coefficient of 0.2, a training batch size of 16, and an L2 regularization factor of $1\times10^{-4}$.
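    For reference, these settings can be collected in a single configuration object; the key names and the checkpoint identifier below are assumptions, not an excerpt from the authors' code.

```python
# Hyperparameters from Sec. 4.2 gathered in one place; key names are assumptions.
config = {
    "pretrained_model": "albert-xlarge-v2",   # assumed Hugging Face checkpoint name
    "embedding_dim": 2048,
    "optimizer": "adam",
    "learning_rate": 2e-5,
    "gat_layers": 2,
    "attention_heads": 5,
    "dropout": 0.2,
    "batch_size": 16,
    "l2_coefficient": 1e-4,
}
```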

    4.3. Baseline model

    TD-LSTM:6 This approach employs two LSTM networks, designed for the pre- and post-contexts of aspect terms. The extension of LSTM to the ABSA task involves connecting the final hidden states of these two LSTM networks to predict emotional polarity.

    ASGCN:25 It leverages GCN to represent the context, utilizing syntactic information to establish dependencies between aspects and words.

    TD-GAT:26 Aspect words are treated as individual words to capture their dependencies with other words. At the same time, the dependencies between words are processed using a multi-head map attention network, allowing the model to capture their associations in a more comprehensive way.

    KumaGCN:27 It combines information from dependency graphs and latent graphs to learn grammatical features.

    R-GAT:28 The dependency tree is reconstructed, and redundant information is removed, the original GNN is extended, and a relational attention mechanism is added.

    DGEDT:29 A dual-transform structure based on dependency graph enhancement is proposed that can simultaneously fuse sequence representation and graph representation.

    T-GCN:30 A method for explicitly exploiting the ABSA dependency types of type-aware GCNs is proposed.

    dotGCN:31 A discrete potential tree, specifically designed for aspects and independent of language, is introduced as an alternative structure to dependency trees.

    DualGCN:32 This model includes SynGCN modules with orthogonal regularizers and SemGCN modules with differential regularizers.

    SSEGCN:33 The model integrates an aspect-aware attention mechanism, through graph convolution operations on this attention score matrix, the model enhances the representation of nodes.

    4.4. Comparison with the state-of-the-art methods

    In this section, we assess the sentiment classification performance of the AOGAT model against the baseline model across the three datasets. The tabulated experimental results can be found in Table 2.

    Table 2. Comparison of Accuracy and F1 on the Three Datasets.

                       Restaurant              Laptop                  Twitter
    Models             Acc (%)   Macro-F1 (%)  Acc (%)   Macro-F1 (%)  Acc (%)   Macro-F1 (%)
    TD-LSTM            77.46     65.70         69.42     61.53         71.02     70.21
    ASGCN              81.32     72.85         74.98     71.54         73.12     69.23
    TD-GAT             80.23     72.76         73.86     71.43         72.21     70.34
    KumaGCN            82.54     72.98         76.76     71.63         72.86     71.64
    R-GAT              83.12     77.34         76.94     74.01         75.64     72.98
    DGEDT              84.54     74.78         77.26     73.32         73.87     72.54
    T-GCN              83.54     75.21         76.36     72.09         74.16     73.19
    DualGCN            83.89     79.45         79.43     75.34         76.32     74.87
    SSEGCN             85.67     76.96         80.26     76.91         77.46     74.67
    DGEDT+BERT         86.75     80.47         79.34     75.79         76.91     75.78
    T-GCN+BERT         85.67     79.34         80.76     77.27         76.76     76.12
    dotGCN+BERT        86.54     81.47         80.75     78.76         78.75     78.33
    DualGCN+BERT       87.76     80.65         81.58     77.95         76.93     75.56
    SSEGCN+BERT        87.85     80.38         80.84     77.56         77.26     75.34
    Our AOGAT          87.84     81.85         81.92     79.21         77.63     76.74

    Table 2 illustrates that the AOGAT model achieved the highest performance. In contrast, the TD-LSTM model exhibited shortcomings, primarily due to its simplified approach to processing aspect words, failing to fully utilize them for modeling context words. GNN-based methods, such as ASGCN and R-GAT, account for syntactic relations in sentences and perform better than traditional semantic-based methods. Models like SSEGCN+BERT and DualGCN+BERT leverage the powerful feature representation capabilities of the BERT pre-trained model, resulting in substantial improvements across all metrics. Our approach addresses these issues by incorporating both syntactic and traditional semantic information from sentences and leveraging the pre-trained ALBERT model to extract dynamic word vectors. The use of aspect-focused attention ensures precise focus on information relevant to aspect words, significantly enhancing the model’s classification accuracy.

    4.5. Ablation experiment

    To evaluate the significance of each module within the AOGAT model, we conducted a series of ablation experiments, the results of which are presented in Table 3.

    Table 3. Ablation experiments with the AOGAT model.

                       Restaurant              Laptop                  Twitter
    Models             Acc (%)   Macro-F1 (%)  Acc (%)   Macro-F1 (%)  Acc (%)   Macro-F1 (%)
    Our AOGAT          87.84     81.85         81.92     79.21         77.63     76.74
    w/o GAT            85.32     80.97         79.56     77.63         75.87     73.98
    w/o BiGRU          86.78     79.42         80.56     78.12         76.23     74.04
    w/o Attention      85.65     80.17         79.87     80.23         75.92     73.42

    The removal of any module led to a decrease in model performance. The degradation caused by removing the GAT module is attributed to its effective use of point-by-point computation of attention coefficients, which leverages the attention mechanism based on syntactic dependency to enhance performance. Similarly, removing the BiGRU module degrades performance because the bi-directional GRU further extracts features from the pre-trained model, capturing superior feature vectors. The removal of aspect-focused attention also results in performance degradation, as this mechanism enables the model to concentrate on aspect vocabulary, thereby improving its expressive power.

    4.6. Example analyses

    To explore the nuanced impact of various models on aspect-level sentiment classification, we conducted a detailed examination of three samples from the test set. We compared the performance of R-GAT, DualGCN+BERT, TD-LSTM, ASGCN and AOGAT. The results, presented in Table 4, use the symbols P, O and N to denote positive, neutral and negative emotions, respectively.

    Table 4. Comparison of AOGAT model and baseline model case studies.

    Example 1: The new team collaboration software significantly enhances productivity in the workplace, fostering a positive and efficient work environment.
    Example 2: The new fitness app seamlessly combines personalized workout plans with engaging challenges, promoting both individual wellness and a sense of community.
    Example 3: The latest software update introduces a more user-friendly interface but unfortunately brings about slower system performance.

    Example  Aspect word   R-GAT  DualGCN+BERT  TD-LSTM  ASGCN  AOGAT  True label
    1        Enhances      O ×    P             N ×      N ×    P      P
    2        Combines      P      P             O ×      P      P      P
    2        Promotes      N ×    O ×           N ×      P      P      P
    3        Introduces    O      O             P ×      O      O      O
    3        Performance   N      N             N        P ×    N      N

    Examining Table 4 reveals that the presence of two aspects with distinct sentiment polarities in the third sentence may introduce complexity in the model’s decision-making process. Consequently, GNN-based approaches like R-GAT outperform traditional deep learning methods such as TD-LSTM in these scenarios. Pre-trained model-based methods achieve better results compared to others due to their use of dynamic word vectors and large-scale text training. The proposed AOGAT model accurately predicts sentiment for all three samples, underscoring its capability to effectively leverage pre-trained models, combining semantic and syntactic information to yield superior classification outcomes.

    4.7. Impact of the number of GAT layers

    To investigate the impact of the number of GAT layers ($N$) in the AOGAT model on performance, we conducted comparative experiments on three benchmark datasets. The number of GAT layers was varied from 1 to 7, and the results are shown in Fig. 4. The findings indicate that the model achieves optimal performance with two GAT layers. Adding more GAT layers introduces noise, which negatively affects performance.

    Fig. 4.

    Fig. 4. Effect of the number of GAT layers.

    4.8. Effect of regularity coefficients on model performance

    The regularization penalty term enhances the model’s generalization ability. To assess this, experiments were conducted on the Laptop dataset using different L2 regularization coefficients and batch sizes. The performance metrics were recorded and are presented in Fig. 5. The results indicate that the model performs best with a batch size of 16 and an L2 coefficient of $1\times10^{-4}$.

    Fig. 5.

    Fig. 5. Impact of L2 regularity coefficient on the model.

    5. Conclusion

    To address the limitations of existing text sentiment analysis models, which often underutilize syntactic information, we propose an enhanced GAT model. This model constructs a syntactic GAT to augment node interactions, leveraging dynamic word vectors obtained by encoding context through the ALBERT pre-training model. Simultaneously, it employs BiGRU to preserve contextual semantic information, resulting in improved accuracy in text sentiment analysis tasks. Experimental results demonstrate the model’s superiority over both traditional and recent models across three benchmark datasets. Our study also identifies potential relationships such as superlatives and near-synonyms in the text.

    Future research will focus on enhancing the model’s comprehensive extraction of syntactic information by incorporating common-sense information such as sentiment lexicons, enabling a more nuanced understanding of word associations and improving the model’s sensitivity to textual context. Additionally, the research will extend the AOGAT model to other languages, using multilingual datasets to validate and enhance the model’s cross-linguistic effectiveness and robustness, thereby improving its generalization capabilities.

    Acknowledgment

    This work was supported by the Guangdong Provincial General Colleges and Universities Key Areas Special Project (No. 2022ZDZX1042).

    ORCID

    Longming Xu  https://orcid.org/0009-0001-8473-5455