Article, Open Access
Pre-Training Clustering Models to Summarize Vietnamese Texts
Ti-Hon Nguyen and Thanh-Nghi Do
Vietnam Journal of Computer Science, 27 Apr 2024
Abstract
We investigate pre-training clustering models to summarize Vietnamese texts. To this end, we build a large-scale dataset of 1,101,101 documents by collecting Vietnamese articles from newspaper websites and extracting their plain text. We propose a new single-document extractive text summarization model based on clustering models. Our approach clusters the documents with the hard clustering algorithm k-means and the soft clustering algorithm LDA (Latent Dirichlet Allocation). Then, based on the pre-trained clustering models, a summary model selects the salient sentences in the input text to construct the summary. Empirical results show that our summary model achieves 51.22% ROUGE-1, 17.62% ROUGE-2 and 29.16% ROUGE-L on the test set. Besides the traditional BoW (Bag-of-Words) representation, we also use meaning-based word representations such as FastText and BERT (Bidirectional Encoder Representations from Transformers) in our model. An additional benefit of the proposed extractive summary model is that its output summary is a long, readable text. Furthermore, the model's architecture is straightforward and easy to understand, and it runs on cost-efficient resources such as ARM CPUs, and on GPUs too.
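The abstract does not spell out how the pre-trained clusters drive sentence selection, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes a BoW representation, a plain k-means fitted on a toy English corpus (standing in for the Vietnamese news dataset), and a scoring rule that ranks each sentence by cosine similarity to the centroid nearest the whole document. The names `STOPWORDS`, `tokenize`, `bow`, `kmeans`, and `summarize` are hypothetical helpers introduced for illustration; LDA, FastText, and BERT are left out.

```python
import math
import re
from collections import Counter

# Hypothetical toy stopword list; a real system would use a Vietnamese one.
STOPWORDS = {"the", "a", "and", "to", "as", "of", "was", "all"}

def tokenize(text):
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"\w+", text.lower()) if w not in STOPWORDS]

def bow(text, vocab):
    """Bag-of-Words count vector over a fixed vocabulary."""
    counts = Counter(tokenize(text))
    return [float(counts[w]) for w in vocab]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def kmeans(vectors, k, iters=20):
    """Plain Lloyd's k-means; the first k vectors seed the centroids
    (an assumption made here so the example is deterministic)."""
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda j: sq_dist(v, centroids[j]))
            groups[nearest].append(v)
        for j, group in enumerate(groups):
            if group:  # keep the old centroid if a cluster goes empty
                centroids[j] = [sum(col) / len(group) for col in zip(*group)]
    return centroids

def summarize(text, vocab, centroids, n_sentences=2):
    """Keep the n sentences most similar to the document's nearest centroid,
    emitted in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    doc_vec = bow(text, vocab)
    topic = min(centroids, key=lambda c: sq_dist(doc_vec, c))
    ranked = sorted(range(len(sentences)),
                    key=lambda i: cosine(bow(sentences[i], vocab), topic),
                    reverse=True)
    return " ".join(sentences[i] for i in sorted(ranked[:n_sentences]))

# "Pre-training" step: fit the clusters once on a (toy) corpus.
corpus = [
    "The team won the football match after scoring two goals.",
    "The central bank raised interest rates to fight inflation.",
    "The striker scored a goal and the football team celebrated.",
    "Inflation slowed as the bank kept interest rates high.",
]
vocab = sorted({w for doc in corpus for w in tokenize(doc)})
centroids = kmeans([bow(doc, vocab) for doc in corpus], k=2)

# Summarization step: reuse the fitted centroids on an unseen text.
article = ("The football team scored a late goal. "
           "The weather was cloudy all day. "
           "Fans of the team celebrated the football win.")
summary = summarize(article, vocab, centroids, n_sentences=2)
print(summary)
```

In this sketch the clustering is fitted once and then reused for every new input, which mirrors the pre-train-then-summarize split described in the abstract; swapping `bow` for an embedding-based sentence vector would be the natural place to try FastText or BERT representations.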