A Delayed Spiking Neural Membrane System for Adaptive Nearest Neighbor-Based Density Peak Clustering
Abstract
Although the density peak clustering (DPC) algorithm can allocate samples effectively and quickly identify noise points, it lacks adaptability and does not take the local structure of the data into account. In addition, clustering algorithms generally suffer from high time complexity. Prior research suggests that clustering algorithms grounded in P systems can mitigate time complexity concerns. Within the realm of membrane systems (P systems), spiking neural P systems (SN P systems), inspired by biological nervous systems, are third-generation neural networks that possess intricate structures and offer substantial parallelism advantages. Thus, this study first improved the DPC algorithm by introducing the maximum nearest neighbor distance and K-nearest neighbors (KNN). Moreover, a method based on delayed spiking neural P systems (DSN P systems) was proposed to improve the performance of the algorithm, yielding the DSNP-ANDPC algorithm. The effectiveness of DSNP-ANDPC was evaluated through comprehensive experiments on four synthetic datasets and 10 real-world datasets, where it outperformed the comparison methods in most cases.
1. Introduction
As a core task in data mining, clustering plays a pivotal role in both computer vision and natural language processing.1 Density peak clustering (DPC) is a typical density-based algorithm.2 Distinguished from distance-based and distribution-based clustering algorithms, DPC exhibits adaptability across datasets of diverse shapes, with a broad spectrum of applications encompassing domains such as recognition3,4 and detection.5,6
In recent years, several studies have been conducted to improve the DPC algorithm, primarily through the construction of a similarity matrix, selection of parameters, calculation of the local density and determination of the number of clusters. Seyedi et al.7 harnessed the concept of K-nearest neighbors (KNN) to calculate truncation parameters. Zhang et al.8 proposed a method for selecting cluster centers based on density-decay graphs. Guo et al.9 proposed a DPC algorithm designed to optimize the natural nearest-neighbor computations. Fang et al.10 proposed a grid-based DPC algorithm wherein the local density of the DPC was replaced by the density of grid cells, thereby eliminating the need for a cutoff distance. Zhou et al.11 proposed a multi-exemplar affinity propagation clustering algorithm based on local density peaks.
Although the utilization of the DPC algorithm based on KNN can effectively leverage the local characteristics inherent in datasets, it cannot autonomously ascertain the optimal number of clusters. Fang et al.12 proposed an adaptive search method for identifying clustering core points and introduced a kernel fusion strategy to obtain the final clustering outcomes. Flores et al.13 proposed a method for voluntarily confirming the cluster center by gauging the gap between data in a one-dimensional decision diagram. Liang et al.14 defined a boundary separation density that enabled automatic cluster quantity determination. However, when the density distribution is uneven, the adherence to globally consistent parameters can result in catastrophic errors. To address these limitations, Rui et al.15 proposed a density-based adaptive clustering algorithm and designed an adaptive cutoff distance setting method based on the local density using a shared KNN and a conflict game. Furthermore, the challenge of achieving automatic cluster center selection in the DPC algorithm has also been explored in previous studies.16,17
Clustering is often plagued by high time complexity. Membrane systems, characterized by their substantial parallelism advantages, have garnered significant attention in the quest to combine membrane computing with clustering.18 Their central idea is to sacrifice space to reduce time, which can be realized through cell division. In recent years, neural-like P systems have received considerable attention, particularly spiking neural P systems (SN P systems), which are inspired by biological neural networks such as spiking neural networks.19 The incorporation of spiking neuron concepts into membrane computing20 has yielded several advantages, most notably distributed parallelism and highly adaptable structural configurations,21 particularly in ensemble learning.22 Owing to their strong structural plasticity, many SN P systems with different structures and communication mechanisms can easily be constructed. There have been many achievements in both theory and application, among which the theoretical results are the most complete.
The completeness of the theory encompasses the formal definition and the proof of computational power.23 Drawing inspiration from certain attributes of neuronal cells, many innovative neural P systems have been devised. For example, a dendritic P system was proposed based on the feedback mechanism of dendritic structures in neurons,24 and an SN P system with adaptive synaptic time delay was proposed based on the dynamic regulation mechanism of synaptic transmission delay.25 Certain studies have shown that calcium signaling in astrocytes is the basis of their function,26 and an SN P system involving astrocytes has accordingly been proposed.27 In contrast to the traditional firing rules in SN P systems, the firing conditions of this system depend on the spikes obtained in neurons and the calcium units received by astrocytes. As third-generation neural networks, SN P systems share structural and functional similarities with artificial neural networks, which has prompted the proposal of SN P systems rooted in neural network principles.28
As a computational model, SN P systems exhibit high distributed parallelism. However, traditional SN P systems function as discrete computational models that rely on spike encoding; therefore, they can only address discrete integer problems. To address this limitation, a numerical SN P system29 that incorporates the principles of numerical P systems into the SN P framework was proposed, shifting the core computational unit from spikes to numerical values. In addition, a hypergraph-based numerical spiking neural membrane system was developed.30 By introducing hypergraphs, traditional neural structures are extended to high-dimensional nonlinear spaces. These studies allowed SN P systems to realize continuous computation over real values to a certain extent; moreover, the signal is encoded by variable values rather than being converted into spike-train coding.
Currently, SN P systems are widely used in applications such as classification31 and image processing.32 Notably, research on the combination of SN P systems and clustering algorithms is lacking.33 Nevertheless, the flexible structural framework of SN P systems offers significant advantages for solving clustering problems. Consequently, this paper constructs a clustering model based on an SN P system to reduce the complexity of the clustering algorithm. However, it is imperative to adhere to the principles of nondeterminism and maximal parallelism when applying SN P rules, which can introduce limitations in algorithm implementation. By contrast, the rules within delayed spiking neural P (DSN P) systems are annotated with time intervals, thereby imposing a degree of orderliness and determinism on their application.34 Therefore, in this study, we designed a clustering model based on DSN P systems. Notably, clustering in the SN P framework is currently solved only in an idealized setting; no practical method for implementing membrane computing on electronic computers has yet been found.
DPC is a noniterative algorithm that rapidly searches for and finds density peaks. In contrast to algorithms that require iterative optimization procedures, the structure of a DPC algorithm based on SN P systems is simpler and easier to implement. Building on this foundation, this paper proposes a DPC algorithm based on a DSN P system. This study has two main contributions. (1) The DPC algorithm was improved by introducing the maximum nearest neighbor distance and KNN. This improvement allows the neighborhood radius to be calculated automatically and the number of neighbors to be selected adaptively according to the maximum nearest neighbor distance, instead of fixing the number of neighbors as in KNN. Furthermore, the method for calculating the local density was improved, thereby avoiding the artificial setting of a truncation distance. (2) An adaptive nearest neighbor-based DPC built on a DSN P system (DSNP-ANDPC) was designed, and the algorithm process was realized using the structure and rules of the DSN P system. Finally, the algorithm was verified on synthetic datasets and UCI datasets.
The remainder of this paper is organized as follows. Section 2 introduces the DPC algorithm and DSN P systems. Section 3 presents the proposed DSNP-ANDPC algorithm, which uses a DSN P system to realize the improved DPC algorithm. Section 4 presents the experimental results of the DSNP-ANDPC algorithm. Finally, Sec. 5 presents the conclusions and future work.
2. Related Work
2.1. An introduction to DPC
The DPC algorithm rests on two primary hypotheses. First, the local density of a cluster center is greater than that of the other points in its cluster. Second, the distance between cluster centers is relatively large. Accordingly, DPC has two primary steps: computing the local density $\rho_i$ and the relative distance $\delta_i$. The related definitions are as follows.
Definition 1. The local density of point $x_i$ is computed as
$$\rho_i = \sum_{j \ne i} \chi(d_{ij} - d_c), \qquad \chi(x) = \begin{cases} 1, & x < 0, \\ 0, & x \ge 0, \end{cases}$$
where $d_{ij}$ is the Euclidean distance between points $x_i$ and $x_j$, and $d_c$ is the cutoff distance.
Definition 2. The relative distance of point $x_i$ is computed as
$$\delta_i = \begin{cases} \min\limits_{j:\, \rho_j > \rho_i} d_{ij}, & \text{if } \exists j \text{ such that } \rho_j > \rho_i, \\ \max\limits_{j} d_{ij}, & \text{otherwise.} \end{cases}$$
The relative distance can be construed as the minimum separation between a given sample point $x_i$ and any point with a greater local density. In particular, the point with the greatest local density is assigned the maximum distance to any other point, so that density peaks obtain large $\delta_i$ values.
After computing the local density and relative distance, a decision diagram is constructed with the local density on the horizontal axis and the relative distance on the vertical axis. In this decision diagram, the initial cluster centers are determined by selecting the data points with high values of both $\rho_i$ and $\delta_i$. Finally, the remaining points are allocated to the cluster of the closest cluster center.
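To make the procedure concrete, the following is a minimal NumPy sketch of standard DPC under the cut-off kernel of Definition 1 and the nearest-center assignment described above; the function name and arguments are illustrative choices, not the reference implementation.

```python
import numpy as np

def dpc(X, dc, n_clusters):
    """Minimal DPC sketch: cut-off kernel density, nearest-center assignment."""
    n = len(X)
    # Pairwise Euclidean distance matrix.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Definition 1: local density = number of points closer than dc.
    rho = (d < dc).sum(axis=1) - 1              # subtract the point itself
    # Definition 2: relative distance.
    order = np.argsort(-rho)                    # indices by decreasing density
    delta = np.zeros(n)
    delta[order[0]] = d[order[0]].max()         # point with the highest density
    for rank, i in enumerate(order[1:], start=1):
        delta[i] = d[i, order[:rank]].min()     # nearest point of higher density
    # Decision values gamma = rho * delta pick the centers; each remaining
    # point is assigned to the cluster of its nearest center.
    centers = np.argsort(-rho * delta)[:n_clusters]
    return centers[np.argmin(d[:, centers], axis=1)]
```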
Based on the above analysis, the DPC algorithm has two primary deficiencies. First, its self-adaptive capability is poor, primarily because the crucial cutoff distance parameter must be set manually, and this parameter significantly affects the density calculation. Second, during the computation of the local density, the algorithm does not consider the local structure inherent in the data, which can lead to the inadvertent omission of clusters: a lower-density cluster is likely to be misclassified if its higher-density points are assigned to the incorrect cluster.
2.2. DSN P systems with scheduled rules
Definition 3. A DSN P system of degree $m \ge 1$ is formally defined as a tuple
$$\Pi = (O, \sigma_1, \sigma_2, \ldots, \sigma_m, \mathrm{ref}, \mathrm{syn}, \mathrm{in}, \mathrm{out}),$$
where:
(1) $O = \{a\}$ is the alphabet, and the symbol $a$ is used to represent a spike.
(2) $\sigma_1, \sigma_2, \ldots, \sigma_m$ are neurons of the form $\sigma_i = (n_i, R_i)$, where $n_i \in \mathbb{N}^+$ is the number of spikes initially contained in $\sigma_i$, and $R_i$ is the set of rules associated with neuron $\sigma_i$, represented in two distinctive forms: spiking rules $E/a^c \to a^p; [t_1, t_2, t_3)$, where $E$ is a regular expression over $\{a\}$ and the time interval prescribes when spikes may be received, consumed, and emitted (as in the example below); and forgetting rules $a^s \to \lambda; [t_1, t_2)$, which remove $s$ spikes during the given interval.
(3) $\mathrm{ref} \subseteq \{1, 2, \ldots, m\}$ represents a collection of reference neurons. Once a reference neuron is activated, its rule begins to be applied, indicating that the rules in the connected neurons have taken effect.
(4) $\mathrm{syn} \subseteq \{1, \ldots, m\} \times \{1, \ldots, m\}$ indicates the synapses, where every $(i, j) \in \mathrm{syn}$ satisfies $1 \le i, j \le m$ and $i \ne j$.
(5) $\mathrm{in}$ and $\mathrm{out}$ indicate the input neuron and the output neuron, respectively.
The schedules of DSN P systems depend on the interplay of spiking and forgetting rules, which collectively regulate the behavior of spikes within the neuron. The DSN P system functions persistently over the entire temporal span. When the activation criteria are satisfied, a rule constrains and governs the behavior of the neuron for a defined duration. For example, in Fig. 1, $a^3/a^2 \to a; [1,2,3)$ implies that the rule can only operate within the time interval $[1,2,3)$ when there are three spikes in the neuron: receiving spikes from other neurons at time 1, consuming two spikes to generate one spike at time 2, and sending it to the connected neuron before time 3. After the application of this rule, only rules with time intervals of the form $[3, X)$ can be used. Therefore, at time 3, rule $a \to a; [3,4)$ was used instead of rule $a \to a; [4,5)$. The use of time intervals enables the orderly execution of rules.

Fig. 1. Example of DSNP system operation.
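The interval-gated behavior of Fig. 1 can be emulated sequentially. The following is a minimal sketch, not the paper's implementation: it assumes a simplified semantics in which a rule fires only when its spike requirement is met and the global clock lies inside its active window, collapsing the three-point interval $[1,2,3)$ to its active span $[1,3)$; all names are illustrative.

```python
def step(neuron, rules, t):
    """Apply the first rule whose spike condition and time window both hold."""
    for need, consume, emit, (t_from, t_to) in rules:
        if neuron["count"] >= need and t_from <= t < t_to:
            neuron["count"] -= consume
            return emit
    return 0

neuron = {"count": 3}
rules = [
    (3, 2, 1, (1, 3)),  # a^3 / a^2 -> a ; active window [1, 3)
    (1, 1, 1, (3, 4)),  # a -> a ; [3, 4)
    (1, 1, 1, (4, 5)),  # a -> a ; [4, 5)
]
for t in range(1, 5):
    print(f"t={t}: emitted {step(neuron, rules, t)}, stored {neuron['count']}")
# t=1 fires the first rule; t=3 fires a->a;[3,4), so a->a;[4,5) is never used.
```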
3. Methods
3.1. Improved DPC algorithm
Several studies have shown that KNN offers advantages in calculating the density in the DPC algorithm. For a given point $x_i$ in dataset $X$, its KNN are the $k$ points in the dataset that are closest to $x_i$, defined as
$$\mathrm{KNN}(x_i) = \{x_j \in X \mid d(x_i, x_j) \le d(x_i, x_i^{(k)})\},$$
where $x_i^{(k)}$ denotes the $k$th nearest neighbor of $x_i$.
Definition 4.8 The nearest neighbor distance of a point $x_i$, denoted $\mathrm{NND}(x_i)$, represents the proximity between $x_i$ and its closest neighboring point, that is, $\mathrm{NND}(x_i) = \min_{j \ne i} d_{ij}$.

The maximum nearest neighbor distance is then denoted as $\mathrm{MNND} = \max_{1 \le i \le n} \mathrm{NND}(x_i)$.
Definition 5. In our algorithm, the neighborhood of a point $x_i$ is defined by
$$N(x_i) = \{x_j \mid 0 < d_{ij} \le \mathrm{MNND}/2\},$$
that is, the points whose distance to $x_i$ is at most half of the maximum nearest neighbor distance.

Definition 6. The local density in our algorithm is redefined over the adaptive neighborhood of Definition 5 (Eqs. (7) and (8)): $\rho_i$ is determined solely by the points in $N(x_i)$, so no cutoff distance $d_c$ needs to be specified.
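A minimal NumPy sketch of Definitions 4-6 follows; the counting form of the density in the last line is our simplification of Eqs. (7) and (8), whose exact form is not reproduced here, and the function name is illustrative.

```python
import numpy as np

def adaptive_neighbors(X):
    """Adaptive neighborhood density sketch (our reading of Sec. 3.1)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)        # exclude each point from its own search
    nnd = d.min(axis=1)                # Definition 4: nearest neighbor distance
    mnnd = nnd.max()                   # maximum nearest neighbor distance (MNND)
    neighbor_mask = d <= mnnd / 2      # Definition 5: adaptive neighborhood
    rho = neighbor_mask.sum(axis=1)    # Definition 6: neighborhood-based density
    return neighbor_mask, rho
```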
3.2. The framework of the improved DPC algorithm based on a DSN P system
Leveraging the inherent attributes of distributed parallelism and scalability within the framework of P systems offers the potential to substantially reduce time complexity from a theoretical perspective. Furthermore, the Turing universality of DSN P systems has been proven; hence, DSN P systems can, in theory, simulate the process of a clustering algorithm. A DSN P system was designed to realize the improved DPC algorithm, as shown in Fig. 2. The objects in each neuron are spikes with potential values, which may be scalars, vectors, or matrices, and the clustering process is realized by the rules.

Fig. 2. The process of a DSNP system for improved DPC algorithm.
The DSN P system for the improved DPC algorithm is described as follows.
3.3. Rules of the DSN P system
Each neuron has at least one rule, which can be described as follows:
(a) The rule in the input neurons $\sigma_{in_1}, \ldots, \sigma_{in_n}$ is of the form $R_1: a^{x_i} \to a^{x_i}$, and its purpose is to read the data into the system.
(b) The rule in neurons $\sigma_{D_1}, \ldots, \sigma_{D_n}$ is $R_2: a^{x_i} \to a^{l_i}$, which computes the distance vector $l_i = (d_{i1}, d_{i2}, \ldots, d_{in})$ from $x_i$ to every point, where $d_{ij} = \lVert x_i - x_j \rVert$, and sends it to neuron $\sigma_D$.
(c) The data exist in the neuron $\sigma_D$ as a distance matrix
$$D = \begin{pmatrix} d_{11} & d_{12} & \cdots & d_{1n} \\ d_{21} & d_{22} & \cdots & d_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ d_{n1} & d_{n2} & \cdots & d_{nn} \end{pmatrix}.$$
The rule in neuron $\sigma_D$ is $R_3: a^{l_i} \to a^{k_i}, [i-1, i)$, where $l_i = (d_{i1}, d_{i2}, \ldots, d_{in})$ is the $i$th row vector and $k_i$ is a vector computed using Eqs. (5) and (6). This rule retains the distances that are less than or equal to half of the maximum nearest neighbor distance.
(d) The rule in neurons $\sigma_{K_1}, \sigma_{K_2}, \ldots, \sigma_{K_n}$ is $R_4: a^{k_i} \to a^{\rho_i}$, where the relationship between $k_i$ and $\rho_i$ satisfies Eqs. (7) and (8). This rule calculates the local density $\rho_i$.
(e) The rule in $\sigma_{L_1}, \sigma_{L_2}, \ldots, \sigma_{L_n}$ is $R_5: a^{\rho_i} \to a^{\delta_i}$. This step calculates the relative distance using Eq. (3).
(f) The rule in $\sigma_{RD_1}, \sigma_{RD_2}, \ldots, \sigma_{RD_n}$ is $R_6: a^{\rho_i + \delta_i} \to a^{\gamma_i}$, where $\gamma_i = \rho_i \times \delta_i$. All the spikes produced by $\sigma_{RD_1}, \sigma_{RD_2}, \ldots, \sigma_{RD_n}$ are sent to the neuron $\sigma_P$ and stored as a set $\{\gamma_1, \gamma_2, \ldots, \gamma_n\}$. These values allow the next step to filter the points with both high local density and high relative distance.
(g) The rule in $\sigma_P$ is $R_7$, which selects the positions of the points corresponding to the top $k$ largest values in $\gamma = \{\gamma_1, \gamma_2, \ldots, \gamma_n\}$ and saves them as a vector $P$ with the positions unchanged. The purpose of this rule is to locate the points corresponding to the first $k$ maximum $\gamma$ values; the corresponding spikes are then sent to neurons $\sigma_{S_1}, \ldots, \sigma_{S_n}$ sequentially. Taking $\gamma = \{2, 3, 1, 4\}$ with $k = 2$ as an example, the two largest values are selected as targets, and neuron $\sigma_P$ produces $P = \{\lambda, a, \lambda, a\}$. Therefore, $\sigma_P$ sends no spikes to neurons $\sigma_{S_1}$ and $\sigma_{S_3}$, and sends one spike each to neurons $\sigma_{S_2}$ and $\sigma_{S_4}$ (see the code sketch after this list). This step determines the positions of the cluster centers.
(h) The rule in neurons $\sigma_{S_1}, \ldots, \sigma_{S_n}$ is $R_8: (a^{x_i})^+ / a^{u_i} \to a^{x_i}, [n, n+1)$. In addition to the spikes received from $\sigma_P$, $\sigma_{S_i}$ receives spikes from $\sigma_{x_i}$. If the number of spikes in neuron $\sigma_{S_i}$ exceeds $x_i$, then $x_i$ spikes are generated; if the number of spikes equals $x_i$, no spikes are generated. For example, when $\sigma_{S_2}$ received one spike from $\sigma_P$ and $x_2$ spikes from $\sigma_{x_2}$, there were $x_2 + 1 > x_2$ spikes in $\sigma_{S_2}$. Consequently, rule $R_8$ was applied to generate $x_2$ spikes, and $x_2$ could be regarded as a cluster center. This step selects the $k$ clustering centers.
(i) The rule in neurons $\sigma_{C_i}$ ($1 \le i \le t$) is $R_9: E/a^{x_j - x_i} \to a^{x_j}, [j-1, j)$, where $E$ is a regular expression of the form $a^{\varepsilon}$ with $\varepsilon = \min \lVert x_j - x_i \rVert$. The rule is triggered only when the distance between $x_j$ and the cluster center $x_i$ is the smallest, and point $x_j$ is then sent to the corresponding output neuron $\sigma_{out_i}$; that is, points $x_j$ and $x_i$ belong to the same cluster. This step assigns each remaining point to the cluster of its nearest center according to the Euclidean distance.
(j) Neuron $\sigma_{out_i}$ is an output neuron that represents cluster $i$; the points inside it are all points assigned to the cluster center $x_i$. The output neurons represent the clustering results.
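The position-selection behavior of rule $R_7$ in item (g) can be reproduced in a few lines of NumPy; the array names here are ours.

```python
import numpy as np

# Mark the positions of the k largest gamma values, as in the sigma_P example
# above (gamma = {2, 3, 1, 4}, k = 2): a spike ('a', encoded as 1) is sent for
# marked positions and nothing (lambda, encoded as 0) otherwise.
gamma = np.array([2, 3, 1, 4])
k = 2
P = np.zeros(len(gamma), dtype=int)
P[np.argsort(-gamma)[:k]] = 1
print(P)  # [0 1 0 1] -> spikes go to sigma_S2 and sigma_S4
```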
3.4. The DSNP-ANDPC algorithm
According to the above rules, the flow of the improved DPC algorithm based on the DSN P system is summarized as the DSNP-ANDPC algorithm.
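For reference, a sequential NumPy emulation of the whole pipeline is sketched below. In the actual DSN P system, rules $R_1$-$R_9$ fire in parallel across neurons under the time schedule; here they run one after another. The function name and the explicit n_clusters argument are our simplifications (the paper selects centers via the top-$k$ $\gamma$ values), and the counting density stands in for Eqs. (7) and (8).

```python
import numpy as np

def dsnp_andpc_sketch(X, n_clusters):
    """Sequential emulation of rules R1-R9; parallel in the real system."""
    n = len(X)
    # R1/R2: read data and build the pairwise distance matrix (neuron sigma_D).
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    # R3: keep only distances <= MNND / 2 (maximum nearest neighbor distance).
    mnnd = d.min(axis=1).max()
    mask = d <= mnnd / 2
    # R4: local density from the screened neighborhoods (simplified count).
    rho = mask.sum(axis=1)
    # R5: relative distance delta (distance to the nearest denser point).
    np.fill_diagonal(d, 0.0)
    order = np.argsort(-rho)
    delta = np.zeros(n)
    delta[order[0]] = d[order[0]].max()
    for rank, i in enumerate(order[1:], start=1):
        delta[i] = d[i, order[:rank]].min()
    # R6-R8: gamma = rho * delta; the top-k positions become cluster centers.
    centers = np.argsort(-(rho * delta))[:n_clusters]
    # R9: assign every point to its nearest cluster center.
    return np.argmin(d[:, centers], axis=1)

# Example on two well-separated Gaussian blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])
labels = dsnp_andpc_sketch(X, 2)
print(labels[:5], labels[-5:])  # one cluster id per blob
```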
3.5. Analysis of complexity
This analysis is limited to the DPC algorithm within the DSN P system framework. Assume that the dataset contains N points and M clusters. The time complexity of DSNP-ANDPC depends principally on the following steps: (a) the cost of calculating the distances is O(N), because rule R2 requires N iterations to complete the operation; (b) the cost of calculating the local densities is O(N), because rules R3 and R4 require N iterations to complete the operation; (c) the cost of calculating the relative distances through rules R5 and R6 is O(N), as each rule is applied once per point; (d) selecting the initial cluster centers costs O(N), because rule R7 requires N iterations; (e) assigning every point to the corresponding cluster costs O(N), because each neuron σCi must run rule R9 N times. Consequently, a time complexity of O(N) is obtained for DSNP-ANDPC.
4. Experiment and Analysis
The performance of the DSNP-ANDPC algorithm was analyzed through experimental results on synthetic and real datasets and compared with that of six clustering algorithms: K-means,35 spectral clustering (SC),36 DPC,2 DPC-KNN,37 DPC-CE,9 and MDPC+.38 Four synthetic datasets and 10 real datasets were used, including the Compound, Iris, and Zoo datasets. Except for DPC-CE and MDPC+, which used the code published by their authors, all algorithms were implemented in Python 3.7. The system configuration was Windows 10 with an Intel(R) Core(TM) i7-9750H CPU and 16.0 GB of RAM. Because the datasets and evaluation indices used in this study differ from those used in other studies, all comparison algorithms were rerun. K-means and SC were run directly from the scikit-learn package, with SC using the Gaussian kernel function; DPC-KNN and DSNP-ANDPC were implemented based on the DPC code. Each algorithm was executed 20 times, and the final result was obtained by averaging these 20 executions. Moreover, stochastic processes are absent in all DPC-type algorithms; hence, when the parameters remain constant, the outcomes remain invariant across runs.

4.1. Evaluation metrics
Four distinct cluster evaluation metrics were used to assess the DSNP-ANDPC algorithm: the adjusted Rand index (ARI), normalized mutual information (NMI), F-score (F1), and accuracy (ACC).
ARI is calculated as
$$\mathrm{ARI} = \frac{\sum_{ij} \binom{n_{ij}}{2} - \left[ \sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2} \right] \Big/ \binom{n}{2}}{\frac{1}{2}\left[ \sum_i \binom{a_i}{2} + \sum_j \binom{b_j}{2} \right] - \left[ \sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2} \right] \Big/ \binom{n}{2}},$$
where $n_{ij}$ is the number of samples shared by true class $i$ and predicted cluster $j$, and $a_i$ and $b_j$ are the corresponding marginal sums.

NMI is calculated as
$$\mathrm{NMI}(U, V) = \frac{2\, I(U; V)}{H(U) + H(V)},$$
where $I(U; V)$ is the mutual information between the true partition $U$ and the predicted partition $V$, and $H(\cdot)$ denotes entropy.

The F1 score is calculated as
$$F_1 = \frac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}.$$

ACC is calculated as
$$\mathrm{ACC} = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}\{ y_i = \mathrm{map}(c_i) \},$$
where $y_i$ is the true label of sample $i$, $c_i$ is its cluster label, and $\mathrm{map}(\cdot)$ is the optimal one-to-one mapping from cluster labels to class labels.
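These metrics can be computed as sketched below. ARI and NMI come directly from scikit-learn; ACC and F1 need a label matching because cluster indices are arbitrary, and we use the Hungarian algorithm for the optimal mapping. Macro-averaged F1 after matching is one common convention; the paper's exact evaluation code may differ.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import (adjusted_rand_score, f1_score,
                             normalized_mutual_info_score)

def clustering_scores(y_true, y_pred):
    """ARI/NMI from scikit-learn; ACC and F1 after Hungarian label matching."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    k = max(y_true.max(), y_pred.max()) + 1
    w = np.zeros((k, k), dtype=int)             # contingency matrix
    for t, p in zip(y_true, y_pred):
        w[p, t] += 1
    rows, cols = linear_sum_assignment(-w)      # maximize matched counts
    remap = dict(zip(rows, cols))
    y_mapped = np.array([remap[p] for p in y_pred])
    return {
        "ARI": adjusted_rand_score(y_true, y_pred),
        "NMI": normalized_mutual_info_score(y_true, y_pred),
        "F1": f1_score(y_true, y_mapped, average="macro"),
        "ACC": float((y_mapped == y_true).mean()),
    }

print(clustering_scores([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2]))
```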
4.2. Experiments on synthetic datasets
This study conducted a comparative analysis of five algorithms, namely K-means, SC, DPC-KNN, DPC-CE, and DSNP-ANDPC, on four synthetic datasets. Table 1 presents the details of the selected synthetic datasets. Clustering performance was evaluated using NMI and ACC. Table 2 lists the parameter configurations for each algorithm on the various experimental datasets and presents the corresponding experimental outcomes for all five algorithms.
| Datasets | Instances | Dimensions | Clusters |
|---|---|---|---|
| R15 | 600 | 2 | 15 |
| D31 | 3100 | 2 | 31 |
| Compound | 7266 | 2 | 6 |
| Aggregation | 788 | 2 | 7 |
| Datasets | Algorithm | Parameter | NMI | ACC |
|---|---|---|---|---|
| R15 | K-means | K=15 | 0.9942 | 0.9967 |
| | SC | σ=1, K=15 | 0.9942 | 0.9967 |
| | DPC-KNN | p=0.02 | 0.9860 | 0.9787 |
| | DPC-CE | dc=0.02, Tr=0.25, Pr=0.3 | 0.9892 | 0.9967 |
| | DSNP-ANDPC | — | 0.9942 | 0.9967 |
| D31 | K-means | K=31 | 0.9675 | 0.9771 |
| | SC | σ=2, K=31 | 0.9652 | 0.9751 |
| | DPC-KNN | p=0.02 | 0.9570 | 0.9680 |
| | DPC-CE | dc=0.02, Tr=0.25, Pr=0.3 | 0.9586 | 0.9690 |
| | DSNP-ANDPC | — | 0.9706 | 0.9800 |
| Compound | K-means | K=6 | 0.6541 | 0.7195 |
| | SC | σ=1, K=6 | 0.6616 | 0.5311 |
| | DPC-KNN | p=0.01 | 0.7042 | 0.7419 |
| | DPC-CE | dc=0.02, Tr=0.25, Pr=0.3 | 0.6751 | 0.7393 |
| | DSNP-ANDPC | — | 0.7469 | 0.8404 |
| Aggregation | K-means | K=7 | 0.7842 | 0.8780 |
| | SC | σ=1, K=7 | 0.9467 | 0.9635 |
| | DPC-KNN | p=0.01 | 0.9957 | 0.9987 |
| | DPC-CE | dc=0.02, Tr=0.25, Pr=0.3 | 0.9956 | 0.9987 |
| | DSNP-ANDPC | — | 0.9957 | 0.9987 |
The five algorithms under comparison entail distinct parameter configurations, which must be specified manually in advance. The specific parameter settings for each algorithm are listed in Table 2. For instance, K-means requires the number of clusters K, which was set to the actual number of clusters. SC requires manual input of the Gaussian kernel parameter σ and the number of clusters K. DPC requires a cutoff distance dc ranging from 0% to 20%, and DPC-KNN requires a percentage parameter p ranging from 0% to 10%. DPC-CE likewise requires the cutoff distance dc ranging from 0% to 20%; as in the original paper, the threshold and punishment ratios were set to Tr=0.25 and Pr=0.3. Remarkably, the DSNP-ANDPC algorithm stands apart because it obviates the need for manual parameter input, a distinctive attribute in this comparative analysis. It is worth noting that this study selects the cluster centers of the DPC, DPC-KNN, DPC-CE, MDPC+, and DSNP-ANDPC algorithms by calculating the product of the local density and the relative distance.
Figure 3 shows the clustering results of the five algorithms on the R15 dataset, which is characterized by 15 clusters and 600 data points. Notably, the visual disparities between these algorithms appear inconspicuous in the graphical representation. However, as shown in Table 2, the NMI and ACC of DSNP-ANDPC were the best, tying with those of K-means and SC. The D31 dataset contains 3100 points and 31 clusters, and only subtle visual distinctions are shown in Fig. 4. From the analysis in Table 2, it can be concluded that the NMI and ACC of DSNP-ANDPC were greater than those of the other four algorithms. Thus, the proposed DSNP-ANDPC algorithm performed best on the R15 and D31 datasets.

Fig. 3. Experimental results of four clustering algorithms on the R15 dataset.

Fig. 4. Experimental results of four clustering algorithms on D31 dataset.
The Compound dataset encompasses 7266 data points categorized into six distinct clusters. The clustering results depicted in Fig. 5 reveal the superiority of the DSNP-ANDPC algorithm over its counterparts. This observation is further substantiated by the values presented in Table 2, where both the ACC and NMI values of DSNP-ANDPC exhibit a substantial advantage over those of the other algorithms.

Fig. 5. Experimental results of four clustering algorithms on the Compound dataset.
Moreover, the Aggregation dataset comprises seven clusters, two of which are interconnected. The clustering results are visually represented in Fig. 6, where it is evident that several points were erroneously allocated. Suboptimal cluster center selection by the K-means algorithm resulted in the poorest clustering outcomes, and the SC algorithm also exhibited misclassification issues in the upper-left cluster. However, an analysis of the data in Table 2 shows that the DPC-KNN and DSNP-ANDPC algorithms shared the highest ACC and NMI, signifying their superior performance in this context.

Fig. 6. Experimental results of four clustering algorithms on aggregation dataset.
4.3. Experiments on real datasets
Ten real datasets were selected for testing (Table 3), all obtained from the UCI database.39 The parameter settings differ across datasets and are listed in Table 4. Except for the MDPC+ algorithm, the parameter settings of the other algorithms are the same as those described in Sec. 4.2. MDPC+ requires an attenuation coefficient λ and a number of neighbors k, set to λ=2 and k=√n, where n is the number of samples.
| Dataset | Instances | Dimensions | Clusters |
|---|---|---|---|
| Iris | 150 | 4 | 3 |
| Seeds | 210 | 7 | 3 |
| Zoo | 101 | 18 | 7 |
| PageBlock | 5473 | 10 | 5 |
| Banknote | 1372 | 4 | 2 |
| E.coli | 336 | 7 | 8 |
| WDBC | 569 | 30 | 2 |
| Breast Cancer | 569 | 9 | 2 |
| Spambase | 4601 | 57 | 2 |
| Raisin | 900 | 7 | 2 |
| Dataset | K-means | SC | DPC | DPC-KNN | DPC-CE | MDPC+ | DSNP-ANDPC |
|---|---|---|---|---|---|---|---|
| Iris | K=3 | σ=1, K=3 | dc=0.2 | p=0.01 | dc=0.02, Tr=0.25, Pr=0.3 | λ=2 | — |
| Seeds | K=3 | σ=0.1, K=3 | dc=0.2 | p=0.02 | dc=0.02, Tr=0.25, Pr=0.3 | λ=2 | — |
| Zoo | K=7 | σ=0.5, K=7 | dc=0.2 | p=0.02 | dc=0.02, Tr=0.25, Pr=0.3 | λ=2 | — |
| PageBlock | K=5 | σ=1, K=5 | dc=0.2 | p=0.02 | dc=0.02, Tr=0.25, Pr=0.3 | λ=2 | — |
| Banknote | K=2 | σ=2, K=2 | dc=0.2 | p=0.02 | dc=0.02, Tr=0.25, Pr=0.3 | λ=2 | — |
| E.coli | K=8 | σ=1, K=8 | dc=0.02 | p=0.02 | dc=0.02, Tr=0.25, Pr=0.3 | λ=2 | — |
| WDBC | K=2 | σ=0.8, K=2 | dc=0.02 | p=0.02 | dc=0.02, Tr=0.25, Pr=0.3 | λ=2 | — |
| Breast Cancer | K=2 | σ=200, K=2 | dc=0.02 | p=0.02 | dc=0.02, Tr=0.25, Pr=0.3 | λ=2 | — |
| Spambase | K=2 | σ=2, K=2 | dc=0.2 | p=0.02 | dc=0.02, Tr=0.25, Pr=0.3 | λ=2 | — |
| Raisin | K=2 | σ=100, K=2 | dc=0.02 | p=0.02 | dc=0.02, Tr=0.25, Pr=0.3 | λ=2 | — |
The clustering results were evaluated according to ARI, NMI, F1, and ACC. The scores of the seven algorithms are listed in Table 5; for ease of comparison, the best results are indicated in bold. DSNP-ANDPC achieved the best results on all four evaluation indices for five datasets (Iris, Seeds, Zoo, Banknote, and Raisin) and was superior to the other comparison algorithms on them.
| Dataset | Metric | K-means | SC | DPC | DPC-KNN | DPC-CE | MDPC+ | DSNP-ANDPC |
|---|---|---|---|---|---|---|---|---|
| Iris | ARI | 73.02 | 74.36 | 75.92 | 88.58 | 75.92 | 76.70 | **94.10** |
| | NMI | 75.82 | 76.61 | 80.57 | 87.05 | 79.60 | 81.08 | **91.92** |
| | F1 | 89.33 | 90.00 | 90.67 | 95.99 | 84.04 | 90.99 | **98.00** |
| | ACC | 89.33 | 90.00 | 90.67 | 96.00 | 90.67 | 91.16 | **98.00** |
| Seeds | ARI | 71.66 | 71.45 | 70.27 | 71.70 | 72.88 | 71.50 | **76.24** |
| | NMI | 69.49 | 68.89 | 69.83 | 67.44 | 68.51 | 66.16 | **71.44** |
| | F1 | 89.52 | 89.52 | 88.57 | 89.46 | 89.65 | 89.40 | **91.40** |
| | ACC | 89.52 | 89.52 | 88.57 | 89.52 | 90.00 | 89.52 | **91.43** |
| Zoo | ARI | 66.29 | 70.94 | 47.43 | 45.90 | 72.59 | 53.92 | **82.38** |
| | NMI | 75.33 | 72.70 | 60.54 | 60.22 | 79.10 | 70.65 | **81.69** |
| | F1 | 75.49 | 78.21 | 60.26 | 64.03 | 78.16 | 69.80 | **79.81** |
| | ACC | 75.49 | 78.21 | 70.30 | 71.29 | **81.19** | 72.88 | **81.19** |
| PageBlock | ARI | 12.34 | 31.73 | 25.86 | **41.65** | — | 35.93 | 41.27 |
| | NMI | 5.51 | 17.07 | 14.19 | 27.37 | 29.99 | — | **30.20** |
| | F1 | 81.17 | 87.63 | 82.83 | 88.74 | 89.51 | 89.62 | **89.97** |
| | ACC | 80.88 | 87.63 | 80.30 | 91.10 | 89.77 | 91.56 | **92.20** |
| Banknote | ARI | 4.85 | 13.82 | 23.08 | 7.05 | 20.32 | 0.14 | **23.22** |
| | NMI | 3.03 | 22.71 | 34.71 | 4.64 | 30.57 | 12.81 | **34.80** |
| | F1 | 61.22 | 64.09 | 73.11 | 62.84 | 63.88 | 64.39 | **73.20** |
| | ACC | 61.22 | 68.97 | 74.13 | 63.41 | 72.67 | 54.75 | **74.20** |
| E.coli | ARI | 40.54 | 37.23 | 46.78 | 65.44 | **73.26** | 62.95 | 73.11 |
| | NMI | 55.52 | 47.64 | 51.67 | 63.44 | 60.70 | — | **70.99** |
| | F1 | 63.87 | 64.70 | 64.95 | 69.97 | **81.35** | 74.70 | 72.78 |
| | ACC | 59.23 | 69.32 | 62.91 | 75.60 | 77.68 | 74.70 | **78.57** |
| WDBC | ARI | 49.14 | 0.13 | 20.88 | 42.56 | 13.35 | **54.98** | 33.83 |
| | NMI | 46.72 | 0.22 | 20.57 | 41.71 | 13.02 | **52.59** | 47.71 |
| | F1 | 84.43 | 48.38 | 71.84 | 81.71 | 70.06 | **86.63** | 85.71 |
| | ACC | 85.41 | 62.74 | 72.89 | 83.13 | 70.83 | **87.35** | 86.19 |
| Breast Cancer | ARI | 49.14 | 0.03 | 25.30 | −1.47 | 42.36 | 62.43 | **62.53** |
| | NMI | 46.72 | 0.05 | 34.51 | 1.70 | 34.11 | **53.56** | 50.96 |
| | F1 | 84.43 | 53.25 | 67.57 | 47.86 | 77.43 | 89.53 | **89.72** |
| | ACC | 85.41 | 53.25 | 58.42 | 60.46 | 83.30 | 89.53 | **89.60** |
| Spambase | ARI | 3.94 | 0.30 | 4.82 | 1.65 | **49.69** | 10.94 | 13.92 |
| | NMI | 4.72 | 1.51 | 6.35 | 4.91 | **37.12** | 6.87 | 8.55 |
| | F1 | 63.59 | 45.52 | 64.31 | 44.38 | **75.77** | 66.64 | 67.93 |
| | ACC | 63.59 | 60.16 | 64.31 | 57.77 | 64.42 | 66.67 | **69.14** |
| Raisin | ARI | 16.66 | 0.05 | 20.88 | 23.83 | 32.79 | 2.28 | **33.83** |
| | NMI | 26.10 | 0.13 | 20.57 | 26.18 | 26.47 | 3.28 | **34.18** |
| | F1 | 67.75 | 50.74 | 71.84 | 74.94 | 66.91 | 61.70 | **78.43** |
| | ACC | 70.44 | 51.61 | 72.89 | 75.11 | 78.66 | 57.67 | **79.11** |
For the PageBlock dataset, the NMI, F1, and ACC values obtained by DSNP-ANDPC were clearly higher than those of the other algorithms. For the E.coli dataset, the DPC-CE algorithm obtained the highest ARI and F1 values, whereas DSNP-ANDPC achieved the best NMI and ACC. For the WDBC dataset, the MDPC+ algorithm outperformed the DSNP-ANDPC algorithm, although DSNP-ANDPC remained second-best in terms of NMI, F1, and ACC. For the Breast Cancer dataset, except for NMI, the values of the DSNP-ANDPC algorithm were greater than those of MDPC+. For the Spambase dataset, DPC-CE was outperformed by the DSNP-ANDPC algorithm in terms of ACC but was the best on the other three evaluation indices. Therefore, DSNP-ANDPC was superior to the DPC, DPC-KNN, DPC-CE, MDPC+, K-means, and SC algorithms in most cases.
To present the results more intuitively, heatmaps are shown in Fig. 7. Darker blue shades correspond to higher values, signifying superior outcomes. The proposed DSNP-ANDPC algorithm performs best on most datasets, with the last row having the darkest shades of blue, particularly in Figs. 7(c) and 7(d).

Fig. 7. Heatmaps of ARI(a), NMI(b), F1(c), and ACC(d) values for seven algorithms across 10 datasets.
5. Conclusion
The DPC algorithm offers the advantage of automatically discovering cluster centers and the ability to process arbitrarily shaped data. However, the algorithm has several limitations, such as the high impact of the local density calculation on the clustering results and high time complexity. In this study, to overcome the shortcomings of the DPC algorithm, the maximum nearest neighbor distance and KNN were introduced. In addition, the improved DPC algorithm was integrated into a DSN P system, and the DSNP-ANDPC algorithm was proposed. This algorithm exploits the distributed parallel computing advantages of membrane computing to construct a clustering model based on a DSN P system. The clustering process of the improved DPC algorithm was implemented using the rules and objects of the DSN P system to reduce the time complexity of the algorithm. The use of KNN and the maximum nearest neighbor distance avoided the need to set a truncation distance, and the incorporation of the local structure of the data reduced the parameter settings of the DPC algorithm and enhanced its adaptability.
Subsequently, the DSNP-ANDPC algorithm was validated on four synthetic and 10 UCI datasets, using ARI, NMI, F1, and ACC as evaluation metrics. The experiments compared DSNP-ANDPC with six other clustering algorithms, and the results showed that it performed best on most of the datasets considered.
Future studies can focus on three research directions. First, a self-loop can be added to the DSN P system to realize other clustering algorithms via an iterative process. Second, DSN P systems can be combined with neural networks to expand the application areas of P systems, such as neural dynamic classification40 and dynamic ensemble learning41 algorithms. Third, supervised machine learning42 and classification algorithms will be an important research direction. These avenues of exploration are promising for further inquiry and expansion of the research framework.
Acknowledgment
This work was supported by the Program for Youth Innovative Research Team in University of Shandong Province (2022KJ179) and the Natural Science Foundation of Rizhao City (RZ2022ZR66).