Aspect-Level Sentiment Analysis Based on Lite Bidirectional Encoder Representations From Transformers and Graph Attention Networks

    https://doi.org/10.1142/S0218126625500628

    Abstract

    Aspect-level sentiment analysis is a critical component of sentiment analysis, aiming to determine the sentiment polarity associated with specific aspect words. However, existing methodologies have limitations in effectively managing aspect-level sentiment analysis. These limitations include insufficient utilization of syntactic information and an inability to precisely capture the contextual nuances surrounding aspect words. To address these issues, we propose an Aspect-Oriented Graph Attention Network (AOGAT) model. This model incorporates syntactic information to generate dynamic word vectors through the pre-trained model ALBERT and combines a graph attention network with BiGRU to capture both syntactic and semantic features. Additionally, the model introduces an aspect-focused attention mechanism to retrieve features related to aspect words and integrates the generated representations for sentiment classification. Our experiments on three datasets demonstrate that the AOGAT model outperforms traditional models.

    This paper was recommended by Regional Editor Takuro Sato.

    1. Introduction

    Aspect-level sentiment classification aims to identify sentiment tendencies in sentences concerning specific aspects. Early methods for this task combined manually labeled features with traditional machine learning techniques. However, the advent of deep learning,1,2 particularly neural networks,3,4 has led to a shift toward automated feature extraction, making deep learning-based approaches predominant in sentiment analysis.5 In a seminal work, Tang et al.6 integrated aspectual word information into Long Short-Term Memory (LSTM) networks for semantic encoding, demonstrating the critical importance of including aspectual word information. To further refine the integration of aspectual and contextual information, researchers have increasingly adopted attention mechanisms. For instance, Joshi et al.7 utilized Convolutional Neural Networks (CNN) within a textual attention-based neural network. This model aimed to extract features and model semantic relationships between sentences and aspectual words. However, this attention-based approach primarily focused on semantic aspects, potentially neglecting valuable syntactic information.

    In recent years, researchers have proposed various approaches using graph neural networks (GNNs)8 and have achieved significant performance improvements. GNNs effectively utilize syntactic information, leading to important advances in the field. Zhao et al.9 employed Graph Convolutional Networks (GCNs) to address emotional dependencies between multiple aspects, achieving good results. Similarly, Wang et al.10 used Graph Attention Networks (GATs) for Aspect-Based Sentiment Analysis (ABSA), demonstrating that converting syntactic information into a syntactic dependency tree during data preprocessing, with the designated aspect as the root node, improved model performance. These experiments highlighted that generating aspect-specific representations significantly enhanced model effectiveness.

    With the advancement of language models, pre-trained models like BERT and ALBERT have shown significant success across various natural language processing tasks. In sentiment analysis, these models transform static word vectors into dynamic counterparts, enabling more nuanced and dynamic semantic representations. This development effectively addresses challenges in sentiment analysis for lengthy sentences, establishing these models as standards in the field. To enhance the utilization of both semantic and syntactic information within sentences, this study employs ALBERT to extract dynamic word vectors, GAT to capture syntactic details, and BiGRU to capture semantic nuances. Aspect-focused attention is then applied to facilitate interaction between sentences and aspect words, thereby extracting sentiment information more comprehensively.

    Building on the preceding analysis, we introduce the Aspect-Oriented Graph Attention Network (AOGAT) model, which aims to fully leverage both semantic and syntactic information within sentences. This model synergizes GAT and BiGRU to concurrently extract syntactic and semantic details. Initially, ALBERT is employed to derive dynamic word vectors, which are then input into both GAT and BiGRU. GAT utilizes the adjacency matrix from the dependency parser to extract syntactic features, while BiGRU captures the hidden state expressions of the text. Aspect-focused attention is subsequently applied to retrieve crucial features related to the semantics of aspectual words from the hidden state vectors of both GAT and BiGRU. Ultimately, the sentiment classification layer predicts sentiment polarity. The model’s efficacy is validated through experiments on three benchmark datasets.

    In summary, this paper’s contributions lie in proposing the AOGAT model, which seamlessly integrates GAT and BiGRU to extract and combine semantic and syntactic information. Our main research findings are as follows:

    (1)

    This study generated dynamic word vectors using ALBERT, an optimized version of BERT, which reduces the parameter count and enhances operational speed compared to BERT.

    (2)

    We introduce the AOGAT model, leveraging ALBERT for input processing and employing a hybrid neural network comprising GAT, BiGRU and aspect-focused attention to effectively address the Aspect-Based Sentiment Classification (ABSC) task.

    (3)

    Comprehensive experimentation determined that the AOGAT model outperformed other baseline models, demonstrating superior performance on the SemEval 2014 and Twitter datasets.

    2. Related Work

    In the rapidly expanding Internet industry, online evaluations have become increasingly prevalent, often involving multiple aspects of the same entity. To provide comprehensive and detailed sentiment information to web users, targeted and finer-grained sentiment category analysis is required. Aspect-level sentiment categorization aims to ascertain the sentiment polarity (positive, neutral, negative) associated with specific aspects mentioned in a text. For example, in the statement “The service is good, but the location is remote,” the term “service” is evaluated positively, while “location” is evaluated negatively. This finer analysis allows for a nuanced understanding of sentiments expressed toward distinct aspects within a given text.

    Various research methodologies and techniques have emerged to address aspect-level sentiment analysis tasks. Machine learning and sentiment lexicon approaches necessitate manual annotation of textual features, which are then used to build classifiers that determine the overall sentiment polarity of the text. Keshavarz et al.11 employed an approach combining a corpus and a lexicon to create an adaptive sentiment lexicon, enhancing the judgment of sentiment polarity. Ding et al.12 addressed deficiencies in sentiment lexicons by extending them using synonyms and antonyms from WordNet and word co-occurrence information, achieving a more thorough assessment of sentiment polarity. Pang et al.13 simplified sentiment classification to binary classification (ignoring neutrality) and applied three machine learning algorithms to predict sentiments in movie review data.

    Deep learning approaches eliminate the need for intricate feature engineering by autonomously learning mappings from data to high-level features. In natural language processing, these methods consistently outperform traditional approaches. The rapid evolution of neural network technology has led to novel models yielding remarkable results in sentiment analysis tasks. The attention mechanism has emerged as a powerful tool for recognizing sentiment information by calculating the semantic correlation between aspects and text, effectively highlighting emotional nuances. Wang et al.14 adopted this approach, using the attention mechanism to weigh the encoding of the textual hidden layer in long and short-term memory networks, significantly improving model performance. Fan et al.15 introduced interactive attention and proposed a multi-granularity attention approach, integrating two types of attention to enhance the model’s judgment capabilities. These mechanisms refine the understanding of sentiment information, capturing different granularity levels of sentiment in the text and making the analysis more comprehensive and accurate.

    However, models relying solely on the attention mechanism focus exclusively on the semantic aspects of sentences, neglecting syntactic information. In sentences with multiple emotion words of opposing polarities, the attention mechanism may be swayed by words unrelated to the aspectual words. For example, in Fig. 1, the sentiment toward “location” might be influenced by the positive opinion word “good,” which is actually associated with “service.”

    Fig. 1.

    Fig. 1. An example sentence with a dependency tree.

    GNNs have significant advantages in processing unstructured information. In sentiment analysis tasks, syntactic information helps models identify sentiment information more accurately. Compared to the attention mechanism, GNNs process syntactic information more finely, resulting in more accurate access to sentiment information related to aspectual sentiment polarity. Leveraging GNN capabilities enhances our grasp of syntactic structure in textual data, elevating sentiment analysis model performance and adaptability to diverse unstructured information contexts. Sun et al.16 employed GCNs to derive node representations from a dependency tree, incorporating these representations into a sentiment classification task. Huang et al.13 introduced GATs, integrating the attention mechanism into GCNs, enhancing model flexibility and precision in establishing word dependencies, and improving sentiment classification performance.

    With advancements in language modeling, pre-trained models like BERT17 have demonstrated remarkable efficacy in handling sentiment analysis for lengthy sentences, becoming prevalent in sentiment analysis. Sun et al.18 enhanced BERT through fine-tuning, constructing auxiliary sentences incorporating aspectual words and transforming aspect-level sentiment analysis into a sentence pair classification problem, yielding excellent text classification performance. Xia et al.19 focused on different contexts to obtain deep contextual information through a BERT-based context perceptron. Google researchers and others improved BERT, introducing ALBERT,20 which reduces parameter count and enhances operational speed. Leveraging these models’ strengths, this paper introduces the AOGAT model, which extracts both semantic and syntactic information from pre-trained ALBERT using a combination of GAT and Bidirectional Gated Recurrent Units (BiGRU). It employs Aspect-Focused Attention to consider important features in both context and aspectual terminology, providing sufficient information for accurately identifying the affective polarity of given aspects.

    3. Proposed AOGAT-Based Aspect-Level Sentiment Analysis Approach

    The proposed AOGAT for ABSA consists of three layers: the sentence encoding layer, the aspect-focused hybrid neural network layer and the output layer. The network structure is depicted in Fig. 2.

    Fig. 2.

    Fig. 2. Structure of AOGAT model.

    3.1. Sentence encoding layer

    The text embedding layer maps the words in the comment text into word vectors that can be processed by the neural network model. Given a text of length $n$ represented as the word sequence $s=\{w_1,w_2,\ldots,w_n\}$, the text $s$ contains an aspect-word sequence of length $m$ ($1\le m<n$), $a=\{w_a,\ldots,w_{a+m-2},w_{a+m-1}\}$. The embedding of the text $s$ is expressed as $h^s=\{e^s_1,e^s_2,\ldots,e^s_{n-1},e^s_n\}$ and the embedding of the aspect words as $h^a=\{e^a_1,e^a_2,\ldots,e^a_{m-1},e^a_m\}$, where $e\in\mathbb{R}^{d_i\times n}$, $n$ represents the length of the input sentence, and $d_i$ denotes the word embedding dimension of ALBERT.

    The ALBERT model, through its transformer-based architecture and self-attention mechanism, is capable of generating dynamic word vectors that not only richly express the semantic and syntactic information of the text, but also capture the relationships between words and the subtleties of context. This capability allows the model’s GAT and BiGRU components to more effectively extract text features, thereby achieving more accurate identification of sentiment polarity associated with specific aspects of the text in sentiment classification tasks, enhancing the overall performance of the model.
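    To make the embedding step concrete, the following minimal sketch shows how dynamic word vectors of this kind could be obtained from a public ALBERT checkpoint with the Hugging Face transformers library; the checkpoint name and tokenization details are illustrative assumptions rather than the authors' exact setup.

```python
# Minimal sketch: dynamic (contextual) word vectors from ALBERT.
# The checkpoint name is an illustrative assumption, not the paper's exact model.
import torch
from transformers import AlbertTokenizer, AlbertModel

tokenizer = AlbertTokenizer.from_pretrained("albert-base-v2")
model = AlbertModel.from_pretrained("albert-base-v2")
model.eval()

sentence = "The service is good, but the location is remote."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# h^s: one contextual vector per sub-word token,
# shape (1, sequence_length, hidden_size).
h_s = outputs.last_hidden_state
print(h_s.shape)
```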

    3.2. Aspect-focused hybrid neural networks

    A GAT extends traditional CNNs to process graph-structured data. Like GCNs, GATs encode local information in unstructured data while incorporating an attention mechanism to flexibly capture relationships between nodes. This paper’s GAT model introduces a multi-head attention mechanism, allowing it to focus on different subsets of nodes simultaneously, thereby capturing complex patterns within the graph structure. Each attention head has its own set of parameters, enabling the network to dynamically determine which parts of the graph to focus on during learning. The specific formula is provided as follows:

    $e_{ij}=a\big(\big[Wh_i \,\|\, Wh_j\big]\big)$,  (1)
    $\alpha_{ij}=\dfrac{\exp\big(\mathrm{LeakyReLU}\big(a^{T}\big[Wh_i \,\|\, Wh_j\big]\big)\big)}{\sum_{k\in\mathcal{N}_i}\exp\big(\mathrm{LeakyReLU}\big(a^{T}\big[Wh_i \,\|\, Wh_k\big]\big)\big)}$,  (2)
    where $W$ is the shared weight matrix used for feature enhancement, $[\,\cdot\,\|\,\cdot\,]$ denotes concatenation of the linearly transformed feature vectors of vertices $i$ and $j$, and $a$ is the learnable attention vector. Specifically, we utilize the dynamic word embeddings generated by ALBERT as the initial node features and employ the GAT to compute the attention weights between nodes, capturing the syntactic and semantic relationships between them. The attention weights are calculated using a multi-head attention mechanism, ultimately producing an aggregated representation for each node.

    A single-layer attention mechanism may not effectively handle the neighboring nodes relative to the target.21 Therefore, this paper introduces a multi-head attention mechanism, where K independent attention mechanisms perform computations. Their resulting features are concatenated and averaged to produce the final output feature representation:

    $h'_i=\sigma\Big(\dfrac{1}{K}\sum_{k=1}^{K}\sum_{j\in\mathcal{N}_i}\alpha^{k}_{ij}W^{k}h_j\Big)$,  (3)
    where $\alpha^{k}_{ij}$ denotes the attention score of the $k$th head, $W^{k}$ is the corresponding input linear transformation matrix, and $h'_i$ denotes the feature representation of node $i$ after integrating information from its neighboring nodes.
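    As a rough illustration of Eqs. (1)–(3), the sketch below computes masked multi-head attention over a dependency adjacency matrix and averages the heads. The class name, parameter shapes, and head-averaging choice are assumptions made for clarity, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleGATLayer(nn.Module):
    """Illustrative multi-head GAT layer in the spirit of Eqs. (1)-(3)."""
    def __init__(self, in_dim, out_dim, num_heads=5):
        super().__init__()
        self.num_heads = num_heads
        self.W = nn.ModuleList([nn.Linear(in_dim, out_dim, bias=False)
                                for _ in range(num_heads)])
        # attention vectors a applied to the concatenation [W h_i || W h_j]
        self.a = nn.ParameterList([nn.Parameter(torch.randn(2 * out_dim) * 0.1)
                                   for _ in range(num_heads)])

    def forward(self, h, adj):
        # h:   (n, in_dim) node features (e.g. ALBERT hidden vectors)
        # adj: (n, n) dependency adjacency matrix with self-loops
        n = h.size(0)
        head_outputs = []
        for k in range(self.num_heads):
            Wh = self.W[k](h)                                    # (n, out_dim)
            # e_ij = a^T [W h_i || W h_j], Eq. (1)
            pairs = torch.cat([Wh.unsqueeze(1).expand(n, n, -1),
                               Wh.unsqueeze(0).expand(n, n, -1)], dim=-1)
            e = F.leaky_relu(pairs @ self.a[k])                  # (n, n)
            # softmax restricted to syntactic neighbours N_i, Eq. (2)
            e = e.masked_fill(adj == 0, float("-inf"))
            alpha = torch.softmax(e, dim=-1)
            head_outputs.append(alpha @ Wh)                      # (n, out_dim)
        # average the K heads and apply the nonlinearity sigma, Eq. (3)
        return torch.sigmoid(torch.stack(head_outputs).mean(dim=0))
```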

    The contextual hidden vectors $h^s$ produced by the ALBERT model, together with the adjacency matrix $A\in\mathbb{R}^{n\times n}$ obtained from the dependency parser (where $n$ denotes the number of nodes), are subsequently input into the GAT. The final output is derived from the $L$-layer graph attention network, $G^L=\{g^L_1,g^L_2,\ldots,g^L_{n-1},g^L_n\}$.
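    The adjacency matrix itself can be derived from an off-the-shelf dependency parser. The following sketch uses spaCy purely as an example; the paper does not specify which parser is used, and the alignment between word-level nodes and ALBERT sub-word tokens is omitted here.

```python
import numpy as np
import spacy

nlp = spacy.load("en_core_web_sm")  # parser choice is an assumption

def dependency_adjacency(sentence):
    """Symmetric word-level adjacency matrix A from a dependency parse,
    with self-loops on the diagonal."""
    doc = nlp(sentence)
    n = len(doc)
    adj = np.eye(n, dtype=np.float32)
    for token in doc:
        if token.i != token.head.i:        # the root points to itself
            adj[token.i, token.head.i] = 1.0
            adj[token.head.i, token.i] = 1.0
    return adj

print(dependency_adjacency("The service is good, but the location is remote."))
```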

    The Gated Recurrent Unit (GRU), illustrated in Fig. 3, is a variant of the Recurrent Neural Network (RNN).22 In the AOGAT model, the Bidirectional GRU (BiGRU) receives hidden state vectors produced by ALBERT as input, encoding inputs from both forward and backward directions. GRU addresses the vanishing gradient problem found in traditional RNNs by incorporating update gates and reset gates. The update gate zt determines the extent to which information from the previous hidden state is incorporated into the current hidden state, while the reset gate rt controls how much the current input influences the memory neurons. These gating mechanisms enable the model to better manage which information to retain or discard, enhancing the capture of sentence semantics and contextual relationships. The formula for GRU is provided as follows:

    $z_t=\sigma(W_{xz}x_t+W_{hz}h_{t-1})$,  (4)
    $r_t=\sigma(W_{xr}x_t+W_{hr}h_{t-1})$,  (5)
    $\tilde{h}_t=\sigma\big(Wx_t+U(r_t\odot h_{t-1})\big)$,  (6)
    $h_t=(1-z_t)\odot h_{t-1}+z_t\odot\tilde{h}_t$.  (7)

    Fig. 3.

    Fig. 3. GRU model diagram.

    In the given framework, let $t$ denote the time step. $x_t$ represents the input to the GRU cell at time $t$, while $W$ signifies the weight matrices that govern the connections within the GRU. $h_t$ indicates the hidden state at time $t$, and $\tilde{h}_t$ denotes the candidate hidden state of the GRU cell at the same time step. The sigmoid activation function is represented by $\sigma$, and $\odot$ denotes element-wise multiplication. The contextual hidden vector $h^s$, which is generated by the ALBERT model, is subsequently processed through the BiGRU to produce the resultant vector $U$.
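    A compact sketch of the BiGRU step is given below: the ALBERT hidden states $h^s$ are re-encoded in both directions, and the concatenated forward and backward states form $U$. The hidden size is an illustrative assumption.

```python
import torch
import torch.nn as nn

hidden_size = 300   # illustrative; the paper does not fix this value here
bigru = nn.GRU(input_size=2048, hidden_size=hidden_size,
               batch_first=True, bidirectional=True)

# h_s: ALBERT hidden states, shape (batch, seq_len, 2048) for ALBERT-xlarge
h_s = torch.randn(16, 40, 2048)
U, _ = bigru(h_s)
print(U.shape)      # (16, 40, 2 * hidden_size): forward/backward states concatenated
```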

    Aspect-focused attention involves extracting crucial features related to aspectual semantics from hidden state vectors. To achieve this, the hidden state vectors of nonaspect words are masked, while those of aspect words remain unchanged:

    $C^L_t=0,\quad 1\le t\le\tau,\ \tau+m<t\le n$.  (8)
    The output of the masking layer is the aspect-oriented feature $C^L_{\mathrm{mask}}=\{0,\ldots,c^L_{\tau+1},\ldots,c^L_{\tau+m},\ldots,0\}$. $C^L_{\mathrm{mask}}$ effectively captures contextual features by integrating syntactic dependencies and distant multi-word relationships. Furthermore, the dot product is used to quantify the semantic correlation between the aspect constituent words and the other words in the sentence, thereby realizing aspect-focused retrieval. This is detailed in the following equations:
    $\beta_t=\sum_{i=\tau+1}^{\tau+m}h^{c\top}_t c^L_i$,  (9)
    $\alpha_t=\dfrac{\exp(\beta_t)}{\sum_{i=1}^{n}\exp(\beta_i)}$,  (10)
    $r=\sum_{t=1}^{n}\alpha_t h^c_t$,  (11)
    where $h^c_t$ can be taken from the outputs $G^L$ and $U$ of the GAT and the BiGRU, respectively; applying the above equations to each yields the final representations $G^f$ and $U^f$.
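    A minimal sketch of the masking and retrieval steps in Eqs. (8)–(11) follows. The 0-based span convention, tensor shapes, and the choice of which representation is masked are assumptions for illustration only.

```python
import torch

def aspect_focused_attention(h_c, tau, m):
    """Eqs. (8)-(11): zero out non-aspect positions, score every context
    position against the aspect span, and pool into a single vector.
    h_c: (n, d) hidden states (e.g. G^L from the GAT or U from the BiGRU);
    the aspect occupies positions tau .. tau + m - 1 (0-based)."""
    n, d = h_c.shape
    mask = torch.zeros(n, 1)
    mask[tau:tau + m] = 1.0
    c_mask = h_c * mask                            # Eq. (8): C^L_mask
    # Eq. (9): beta_t = sum over aspect positions i of h^c_t . c^L_i
    beta = (h_c @ c_mask.T).sum(dim=1)             # (n,)
    alpha = torch.softmax(beta, dim=0)             # Eq. (10)
    r = (alpha.unsqueeze(1) * h_c).sum(dim=0)      # Eq. (11): (d,)
    return r

G_L = torch.randn(40, 600)                         # GAT output
U = torch.randn(40, 600)                           # BiGRU output
g_f = aspect_focused_attention(G_L, tau=3, m=2)    # final representation G^f
u_f = aspect_focused_attention(U, tau=3, m=2)      # final representation U^f
```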

    3.3. Output layer

    The aspect-focused attention layer produces the representations $G^f$ and $U^f$, which are concatenated into $H^{\mathrm{fin}}$. An averaging operation is then applied to obtain $H^{\mathrm{avg}}$, and the averaged vector is fed into a fully connected layer to yield the sentiment classification outcome:

    $H^{\mathrm{fin}}=[G^f:U^f]$,  (12)
    $x=W_a H^{\mathrm{avg}}+b_a$,  (13)
    $y=\mathrm{softmax}(x)$.  (14)
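    Eqs. (12)–(14) reduce to concatenation, a linear layer, and a softmax. The sketch below uses an illustrative feature dimension; the averaging step that yields $H^{\mathrm{avg}}$ is only indicated by a comment, since its exact form is not spelled out here.

```python
import torch
import torch.nn as nn

d = 600                                  # illustrative feature dimension
classifier = nn.Linear(2 * d, 3)         # C = 3 sentiment polarities

g_f = torch.randn(d)                     # aspect-focused GAT representation G^f
u_f = torch.randn(d)                     # aspect-focused BiGRU representation U^f

h_fin = torch.cat([g_f, u_f], dim=-1)    # Eq. (12): H^fin = [G^f : U^f]
h_avg = h_fin                            # placeholder for the paper's averaging step
x = classifier(h_avg)                    # Eq. (13): x = W_a H^avg + b_a
y = torch.softmax(x, dim=-1)             # Eq. (14): distribution over polarities
print(y)
```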

    3.4. Training

    A gradient descent algorithm is used to train the model using cross-entropy loss and L2 regularization as shown in the following equation:

    $\mathrm{Loss}=-\sum_{i=1}^{D}\sum_{j=1}^{C}\hat{p}^{\,j}_{i}\log p^{j}_{i}+\lambda\lVert\theta\rVert^{2}$.  (15)
    Here, $D$ denotes the size of the training dataset and $C$ the number of sentiment categories, which is fixed at 3 in this study. $p^{j}_{i}$ represents the predicted probability that the $i$th text instance belongs to sentiment category $j$, whereas $\hat{p}^{\,j}_{i}$ is the corresponding ground-truth label. To control model complexity and mitigate overfitting, the regularization term $\lambda\lVert\theta\rVert^{2}$ is included, where $\theta$ comprises all trainable parameters and $\lambda$ is the L2 regularization coefficient.
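    In PyTorch, Eq. (15) is commonly realized by pairing cross-entropy with weight decay in the optimizer, which plays the role of the $\lambda\lVert\theta\rVert^{2}$ term. The stand-in model and tensor shapes below are assumptions; the learning rate, batch size, and regularization coefficient follow the settings reported in Sec. 4.2.

```python
import torch
import torch.nn as nn

model = nn.Linear(1200, 3)                     # stand-in for the full AOGAT model
criterion = nn.CrossEntropyLoss()              # cross-entropy part of Eq. (15)
# weight_decay supplies the L2 penalty lambda * ||theta||^2
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5, weight_decay=1e-4)

features = torch.randn(16, 1200)               # a batch of pooled sentence vectors
labels = torch.randint(0, 3, (16,))            # C = 3 sentiment classes

optimizer.zero_grad()
loss = criterion(model(features), labels)
loss.backward()
optimizer.step()
print(float(loss))
```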

    4. Experiments

    4.1. Datasets

    In this study, we use three publicly available datasets: the Twitter comment dataset23 and the SemEval-2014 datasets,24 which comprise laptop reviews (Laptop) and restaurant reviews (Restaurant). Each review within these datasets contains one or more aspects. The statistical details of the three datasets are presented in Table 1.

    Table 1. Dataset information.

    Dataset      Positive (Train/Test)   Neutral (Train/Test)   Negative (Train/Test)
    Twitter      1561 / 173              3127 / 346             1560 / 173
    Restaurant   2164 / 728               637 / 196              807 / 196
    Laptop        994 / 341               464 / 169              870 / 128

    4.2. Experimental parameters

    In our experimental setup, we employed the ALBERT-xlarge version with an embedding dimension of 2,048, fine-tuned using a learning rate of $2\times10^{-5}$. The Adam optimizer was chosen for experimentation. For the GAT, we configured two layers, five heads for multi-head attention, a dropout coefficient of 0.2, a training batch size of 16, and an L2 regularization factor of $1\times10^{-4}$.
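    For reference, these settings can be collected in a single configuration object; the key names and the checkpoint identifier below are assumptions, not an excerpt from the authors' code.

```python
# Hyperparameters from Sec. 4.2 gathered in one place; key names are assumptions.
config = {
    "pretrained_model": "albert-xlarge-v2",   # assumed Hugging Face checkpoint name
    "embedding_dim": 2048,
    "optimizer": "adam",
    "learning_rate": 2e-5,
    "gat_layers": 2,
    "attention_heads": 5,
    "dropout": 0.2,
    "batch_size": 16,
    "l2_coefficient": 1e-4,
}
```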

    4.3. Baseline model

    TD-LSTM:6 This approach employs two LSTM networks, designed for the pre- and post-contexts of aspect terms. The extension of LSTM to the ABSA task involves connecting the final hidden states of these two LSTM networks to predict emotional polarity.

    ASGCN:25 It leverages GCN to represent the context, utilizing syntactic information to establish dependencies between aspects and words.

    TD-GAT:26 Aspect words are treated as individual words to capture their dependencies with other words. At the same time, the dependencies between words are processed using a multi-head map attention network, allowing the model to capture their associations in a more comprehensive way.

    KumaGCN:27 It combines information from dependency graphs and latent graphs to learn grammatical features.

    R-GAT:28 The dependency tree is reconstructed, and redundant information is removed, the original GNN is extended, and a relational attention mechanism is added.

    DGEDT:29 A dual-transform structure based on dependency graph enhancement is proposed that can simultaneously fuse sequence representation and graph representation.

    T-GCN:30 A method for explicitly exploiting the ABSA dependency types of type-aware GCNs is proposed.

    dotGCN:31 A discrete potential tree, specifically designed for aspects and independent of language, is introduced as an alternative structure to dependency trees.

    DualGCN:32 This model includes SynGCN modules with orthogonal regularizers and SemGCN modules with differential regularizers.

    SSEGCN:33 The model integrates an aspect-aware attention mechanism, through graph convolution operations on this attention score matrix, the model enhances the representation of nodes.

    4.4. Comparison with the state-of-the-art methods

    In this section, we assess the sentiment classification performance of the AOGAT model against the baseline model across the three datasets. The tabulated experimental results can be found in Table 2.

    Table 2. Comparison of Accuracy and F1 on the Three Datasets.

                       Restaurant              Laptop                  Twitter
    Models             Acc (%)   Macro-F1 (%)  Acc (%)   Macro-F1 (%)  Acc (%)   Macro-F1 (%)
    TD-LSTM            77.46     65.70         69.42     61.53         71.02     70.21
    ASGCN              81.32     72.85         74.98     71.54         73.12     69.23
    TD-GAT             80.23     72.76         73.86     71.43         72.21     70.34
    KumaGCN            82.54     72.98         76.76     71.63         72.86     71.64
    R-GAT              83.12     77.34         76.94     74.01         75.64     72.98
    DGEDT              84.54     74.78         77.26     73.32         73.87     72.54
    T-GCN              83.54     75.21         76.36     72.09         74.16     73.19
    DualGCN            83.89     79.45         79.43     75.34         76.32     74.87
    SSEGCN             85.67     76.96         80.26     76.91         77.46     74.67
    DGEDT+BERT         86.75     80.47         79.34     75.79         76.91     75.78
    T-GCN+BERT         85.67     79.34         80.76     77.27         76.76     76.12
    dotGCN+BERT        86.54     81.47         80.75     78.76         78.75     78.33
    DualGCN+BERT       87.76     80.65         81.58     77.95         76.93     75.56
    SSEGCN+BERT        87.85     80.38         80.84     77.56         77.26     75.34
    Our AOGAT          87.84     81.85         81.92     79.21         77.63     76.74

    Table 2 illustrates that the AOGAT model achieved the highest performance. In contrast, the TD-LSTM model exhibited shortcomings, primarily due to its simplified approach to processing aspect words, failing to fully utilize them for modeling context words. GNN-based methods, such as ASGCN and R-GAT, account for syntactic relations in sentences and perform better than traditional semantic-based methods. Models like SSEGCN+BERT and DualGCN+BERT leverage the powerful feature representation capabilities of the BERT pre-trained model, resulting in substantial improvements across all metrics. Our approach addresses these issues by incorporating both syntactic and traditional semantic information from sentences and leveraging the pre-trained ALBERT model to extract dynamic word vectors. The use of aspect-focused attention ensures precise focus on information relevant to aspect words, significantly enhancing the model’s classification accuracy.

    4.5. Ablation experiment

    To evaluate the significance of each module within the AOGAT model, we conducted a series of ablation experiments, the results of which are presented in Table 3.

    Table 3. Ablation experiments with the AOGAT model.

                       Restaurant              Laptop                  Twitter
    Models             Acc (%)   Macro-F1 (%)  Acc (%)   Macro-F1 (%)  Acc (%)   Macro-F1 (%)
    Our AOGAT          87.84     81.85         81.92     79.21         77.63     76.74
    w/o GAT            85.32     80.97         79.56     77.63         75.87     73.98
    w/o BiGRU          86.78     79.42         80.56     78.12         76.23     74.04
    w/o Attention      85.65     80.17         79.87     80.23         75.92     73.42

    The removal of any module led to a decrease in model performance. The degradation caused by removing the GAT module is attributed to its effective use of point-by-point computation of attention coefficients, which leverages the attention mechanism based on syntactic dependency to enhance performance. Similarly, removing the BiGRU module degrades performance because the bi-directional GRU further extracts features from the pre-trained model, capturing superior feature vectors. The removal of aspect-focused attention also results in performance degradation, as this mechanism enables the model to concentrate on aspect vocabulary, thereby improving its expressive power.

    4.6. Example analyses

    To explore the nuanced impact of various models on aspect-level sentiment classification, we conducted a detailed examination of three samples from the test set. We compared the performance of R-GAT, DualGCN+BERT, TD-LSTM, ASGCN and AOGAT. The results, presented in Table 4, use the symbols P, O and N to denote positive, neutral and negative emotions, respectively.

    Table 4. Comparison of AOGAT model and baseline model case studies.

    Example 1: The new team collaboration software significantly enhances productivity in the workplace, fostering a positive and efficient work environment.
    Example 2: The new fitness app seamlessly combines personalized workout plans with engaging challenges, promoting both individual wellness and a sense of community.
    Example 3: The latest software update introduces a more user-friendly interface but unfortunately brings about slower system performance.

    Example  Aspect word   R-GAT  DualGCN+BERT  TD-LSTM  ASGCN  AOGAT  True label
    1        Enhances      O ×    P             N ×      N ×    P      P
    2        Combines      P      P             O ×      P      P      P
    2        Promotes      N ×    O ×           N ×      P      P      P
    3        Introduces    O      O             P ×      O      O      O
    3        Performance   N      N             N        P ×    N      N

    Examining Table 4 reveals that the presence of two aspects with distinct sentiment polarities in the third sentence may introduce complexity in the model’s decision-making process. Consequently, GNN-based approaches like R-GAT outperform traditional deep learning methods such as TD-LSTM in these scenarios. Pre-trained model-based methods achieve better results compared to others due to their use of dynamic word vectors and large-scale text training. The proposed AOGAT model accurately predicts sentiment for all three samples, underscoring its capability to effectively leverage pre-trained models, combining semantic and syntactic information to yield superior classification outcomes.

    4.7. Impact of the number of GAT layers

    To investigate the impact of the number of GAT layers ($N$) in the AOGAT model on performance, we conducted comparative experiments on three benchmark datasets. The number of GAT layers was varied from 1 to 7, and the results are shown in Fig. 4. The findings indicate that the model achieves optimal performance with two GAT layers. Adding more GAT layers introduces noise, which negatively affects performance.

    Fig. 4.

    Fig. 4. Effect of the number of GAT layers.

    4.8. Effect of regularity coefficients on model performance

    The regularization penalty term enhances the model’s generalization ability. To assess this, experiments were conducted on the Laptop dataset using different L2 regularization coefficients and batch sizes. The performance metrics were recorded and are presented in Fig. 5. The results indicate that the model performs best with a batch size of 16 and an L2 coefficient of $1\times10^{-4}$.

    Fig. 5.

    Fig. 5. Impact of L2 regularity coefficient on the model.

    5. Conclusion

    To address the limitations of existing text sentiment analysis models, which often underutilize syntactic information, we propose an enhanced GAT model. This model constructs a syntactic GAT to augment node interactions, leveraging dynamic word vectors obtained by encoding context through the ALBERT pre-training model. Simultaneously, it employs BiGRU to preserve contextual semantic information, resulting in improved accuracy in text sentiment analysis tasks. Experimental results demonstrate the model’s superiority over both traditional and recent models across three benchmark datasets. Our study also identifies potential relationships such as superlatives and near-synonyms in the text.

    Future research will focus on enhancing the model’s comprehensive extraction of syntactic information by incorporating common-sense information such as sentiment lexicons, enabling a more nuanced understanding of word associations and improving the model’s sensitivity to textual context. Additionally, the research will extend the AOGAT model to other languages, using multilingual datasets to validate and enhance the model’s cross-linguistic effectiveness and robustness, thereby improving its generalization capabilities.

    Acknowledgment

    This work was supported by the Guangdong Provincial General Colleges and Universities Key Areas Special Project (No. 2022ZDZX1042).

    ORCID

    Longming Xu  https://orcid.org/0009-0001-8473-5455