A Delayed Spiking Neural Membrane System for Adaptive Nearest Neighbor-Based Density Peak Clustering
Abstract
Although the density peak clustering (DPC) algorithm can allocate samples effectively and quickly identify noise points, it lacks adaptability and does not take the local structure of the data into account. In addition, clustering algorithms generally suffer from high time complexity. Prior research suggests that clustering algorithms grounded in P systems can mitigate time complexity concerns. Within the realm of membrane systems (P systems), spiking neural P systems (SN P systems), inspired by biological nervous systems, are third-generation neural networks that possess intricate structures and offer substantial parallelism advantages. Thus, this study first improved the DPC algorithm by introducing the maximum nearest neighbor distance and K-nearest neighbors (KNN). Moreover, a method based on delayed spiking neural P systems (DSN P systems) was proposed to improve the performance of the algorithm, yielding the DSNP-ANDPC algorithm. The effectiveness of DSNP-ANDPC was evaluated through comprehensive experiments on four synthetic datasets and 10 real-world datasets, where it outperformed the comparison methods in most cases.
1. Introduction
As a core task in data mining, clustering plays a pivotal role in both computer vision and natural language processing.1 Density peak clustering (DPC) is a typical density-based algorithm.2 Distinguished from distance-based and distribution-based clustering algorithms, DPC exhibits adaptability across datasets of diverse shapes, with a broad spectrum of applications encompassing domains such as recognition3,4 and detection.5,6
In recent years, several studies have been conducted to improve the DPC algorithm, primarily through the construction of a similarity matrix, selection of parameters, calculation of the local density and determination of the number of clusters. Seyedi et al.7 harnessed the concept of K-nearest neighbors (KNN) to calculate truncation parameters. Zhang et al.8 proposed a method for selecting cluster centers based on density-decay graphs. Guo et al.9 proposed a DPC algorithm designed to optimize the natural nearest-neighbor computations. Fang et al.10 proposed a grid-based DPC algorithm wherein the local density of the DPC was replaced by the density of grid cells, thereby eliminating the need for a cutoff distance. Zhou et al.11 proposed a multi-exemplar affinity propagation clustering algorithm based on local density peaks.
Although the utilization of the DPC algorithm based on KNN can effectively leverage the local characteristics inherent in datasets, it cannot autonomously ascertain the optimal number of clusters. Fang et al.12 proposed an adaptive search method for identifying clustering core points and introduced a kernel fusion strategy to obtain the final clustering outcomes. Flores et al.13 proposed a method for voluntarily confirming the cluster center by gauging the gap between data in a one-dimensional decision diagram. Liang et al.14 defined a boundary separation density that enabled automatic cluster quantity determination. However, when the density distribution is uneven, the adherence to globally consistent parameters can result in catastrophic errors. To address these limitations, Rui et al.15 proposed a density-based adaptive clustering algorithm and designed an adaptive cutoff distance setting method based on the local density using a shared KNN and a conflict game. Furthermore, the challenge of achieving automatic cluster center selection in the DPC algorithm has also been explored in previous studies.16,17
Clustering is often plagued by high time complexity. Membrane systems, characterized by their substantial parallelism advantages, have garnered significant attention in the quest to combine membrane computing with clustering.18 Their central idea is to sacrifice space to reduce time, which can be realized through cell division. In recent years, neural-like P systems have received considerable attention, particularly spiking neural P systems (SN P systems), which are inspired by biological neural networks such as spiking neural networks.19 The incorporation of spiking neuron concepts into membrane computing20 has yielded several advantages, most notably distributed parallelism and highly adaptable structural configurations,21 particularly in ensemble learning.22 Owing to their strong structural plasticity, many SN P systems with different structures and communication mechanisms can easily be constructed. There have been many achievements in both theory and application, among which the theoretical results are the most complete.
The completeness of the theory encompasses the formal definition and the proof of computational power.23 Drawing inspiration from certain attributes of neuronal cells, many innovative neural P systems have been devised. For example, a dendritic P system was proposed based on the feedback mechanism of dendritic structures in neurons,24 and an SN P system with adaptive synaptic time delay was proposed based on the dynamic regulation mechanism of synaptic transmission delay.25 Certain studies have shown that calcium signaling in astrocytes is the basis of their function,26 and an SN P system involving astrocytes has accordingly been proposed.27 In contrast to the traditional firing rules in SN P systems, the firing conditions of this system depend on the spikes obtained in neurons and the calcium units received by astrocytes. As third-generation neural networks, SN P systems share structural and functional similarities with artificial neural networks, which has prompted the proposal of SN P systems rooted in neural network principles.28
As a computational model, SN P systems exhibit high distributed parallelism. However, traditional SN P systems function as discrete computational models that rely on spike encoding; therefore, they can only address discrete integer problems. To address this limitation, a numerical SN P system29 that incorporates the principles of numerical P systems into the SN P framework was proposed, shifting the core computational unit from spikes to numerical values. In addition, a hypergraph-based numerical spiking neural membrane system was developed.30 By introducing hypergraphs, traditional neural structures are extended to high-dimensional nonlinear spaces. These studies allowed SN P systems to realize continuous computation over real values to a certain extent; moreover, the signal is encoded by variable values rather than being converted into spike-train coding.
Currently, SN P systems are widely used in applications such as classification31 and image processing.32 Notably, research on the combination of SN P systems and clustering algorithms is lacking.33 Nevertheless, the flexible structural framework of SN P systems offers significant advantages for solving clustering problems. Consequently, this paper constructs a clustering model based on an SN P system to reduce the complexity of the clustering algorithm. However, it is imperative to adhere to the principles of nondeterminism and maximal parallelism when applying SN P rules, which can introduce limitations in algorithm implementation. By contrast, the rules within delayed spiking neural P (DSN P) systems are annotated with time intervals, thereby imposing a degree of orderliness and determinism on their application.34 Therefore, in this study, we designed a clustering model based on DSN P systems. Notably, clustering in the SN P framework is currently solved only in an idealized setting; no practical method for implementing membrane computing on electronic computers has yet been found.
DPC is a noniterative algorithm that rapidly searches for and finds density peaks. In contrast to algorithms that require iterative optimization procedures, the structure of a DPC algorithm based on SN P systems is simpler and easier to implement. Building on this foundation, this paper proposes a DPC algorithm based on a DSN P system. This study has two main contributions. (1) The DPC algorithm was improved by introducing the maximum nearest neighbor distance and KNN. This improvement allows the neighborhood radius to be calculated automatically and the number of neighbors to be selected adaptively according to the maximum nearest neighbor distance, instead of fixing the number of neighbors as in KNN. Furthermore, the method for calculating the local density was improved, thereby avoiding the artificial setting of a truncation distance. (2) An adaptive nearest neighbor-based DPC built on a DSN P system (DSNP-ANDPC) was designed, and the algorithm process was realized using the structure and rules of the DSN P system. Finally, the algorithm was verified on synthetic datasets and UCI datasets.
The remainder of this paper is organized as follows. Section 2 introduces the DPC algorithm and DSN P systems. Section 3 presents the proposed DSNP-ANDPC algorithm, which uses a DSN P system to realize the improved DPC algorithm. Section 4 presents the experimental results of the DSNP-ANDPC algorithm. Finally, Sec. 5 presents the conclusions and future work.
2. Related Work
2.1. An introduction to DPC
The DPC algorithm rests on two primary hypotheses. First, the local density of a cluster center is greater than that of the other points in its cluster. Second, the distance between cluster centers is relatively large. Accordingly, DPC has two primary steps: computing the local density $\rho_i$ and the relative distance $\delta_i$. The related definitions are as follows.
Definition 1. The local density of point $x_i$ is computed as
$$\rho_i = \sum_{j \ne i} \chi(d_{ij} - d_c), \qquad \chi(x) = \begin{cases} 1, & x < 0, \\ 0, & x \ge 0, \end{cases}$$
where $d_{ij}$ is the Euclidean distance between points $x_i$ and $x_j$, and $d_c$ is the cutoff distance.
Definition 2. The relative distance of point $x_i$ is computed as
$$\delta_i = \begin{cases} \min\limits_{j:\, \rho_j > \rho_i} d_{ij}, & \text{if } \exists j \text{ such that } \rho_j > \rho_i, \\ \max\limits_{j} d_{ij}, & \text{otherwise.} \end{cases}$$
The relative distance can be construed as the minimum separation between a given sample point $x_i$ and any point with a greater local density. In particular, the point with the greatest local density is assigned the maximum distance to any other point, so that density peaks obtain large $\delta_i$ values.
After computing the local density and relative distance, a decision diagram is constructed with the local density on the horizontal axis and the relative distance on the vertical axis. In this decision diagram, the initial cluster centers are determined by selecting the data points with high values of both $\rho_i$ and $\delta_i$. Finally, the remaining points are allocated to the cluster of the closest cluster center.
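To make the procedure concrete, the following is a minimal NumPy sketch of standard DPC under the cut-off kernel of Definition 1 and the nearest-center assignment described above; the function name and arguments are illustrative choices, not the reference implementation.

```python
import numpy as np

def dpc(X, dc, n_clusters):
    """Minimal DPC sketch: cut-off kernel density, nearest-center assignment."""
    n = len(X)
    # Pairwise Euclidean distance matrix.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Definition 1: local density = number of points closer than dc.
    rho = (d < dc).sum(axis=1) - 1              # subtract the point itself
    # Definition 2: relative distance.
    order = np.argsort(-rho)                    # indices by decreasing density
    delta = np.zeros(n)
    delta[order[0]] = d[order[0]].max()         # point with the highest density
    for rank, i in enumerate(order[1:], start=1):
        delta[i] = d[i, order[:rank]].min()     # nearest point of higher density
    # Decision values gamma = rho * delta pick the centers; each remaining
    # point is assigned to the cluster of its nearest center.
    centers = np.argsort(-rho * delta)[:n_clusters]
    return centers[np.argmin(d[:, centers], axis=1)]
```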
Based on the above analysis, the DPC algorithm has two primary deficiencies. First, its self-adaptive capability is poor, primarily because the crucial cutoff distance parameter must be set manually, and this parameter significantly affects the density calculation. Second, during the computation of the local density, the algorithm does not consider the local structure inherent in the data, which can lead to the inadvertent omission of clusters: a lower-density cluster is likely to be misclassified if its higher-density points are assigned to the incorrect cluster.
2.2. DSN P systems with scheduled rules
Definition 3. A DSN P system of degree $m \ge 1$ is formally defined as a tuple
$$\Pi = (O, \sigma_1, \sigma_2, \ldots, \sigma_m, \mathrm{ref}, \mathrm{syn}, \mathrm{in}, \mathrm{out}),$$
where:
(1) $O = \{a\}$ is the alphabet, and the symbol $a$ is used to represent a spike.
(2) $\sigma_1, \sigma_2, \ldots, \sigma_m$ are neurons of the form $\sigma_i = (n_i, R_i)$, where $n_i \in \mathbb{N}^+$ is the number of spikes initially contained in $\sigma_i$, and $R_i$ is the set of rules associated with neuron $\sigma_i$, represented in two distinctive forms: spiking rules $E/a^c \to a^p; [t_1, t_2, t_3)$, where $E$ is a regular expression over $\{a\}$ and the time interval prescribes when spikes may be received, consumed, and emitted (as in the example below); and forgetting rules $a^s \to \lambda; [t_1, t_2)$, which remove $s$ spikes during the given interval.
(3) $\mathrm{ref} \subseteq \{1, 2, \ldots, m\}$ represents a collection of reference neurons. Once a reference neuron is activated, its rule begins to be applied, indicating that the rules in the connected neurons have taken effect.
(4) $\mathrm{syn} \subseteq \{1, \ldots, m\} \times \{1, \ldots, m\}$ indicates the synapses, where every $(i, j) \in \mathrm{syn}$ satisfies $1 \le i, j \le m$ and $i \ne j$.
(5) $\mathrm{in}$ and $\mathrm{out}$ indicate the input neuron and the output neuron, respectively.
The schedules of DSN P systems depend on the interplay of spiking and forgetting rules, which collectively regulate the behavior of spikes within the neuron. The DSN P system functions persistently over the entire temporal span. When the activation criteria are satisfied, a rule constrains and governs the behavior of the neuron for a defined duration. For example, in Fig. 1, $a^3/a^2 \to a; [1,2,3)$ implies that the rule can only operate within the time interval $[1,2,3)$ when there are three spikes in the neuron: receiving spikes from other neurons at time 1, consuming two spikes to generate one spike at time 2, and sending it to the connected neuron before time 3. After the application of this rule, only rules with time intervals of the form $[3, X)$ can be used. Therefore, at time 3, rule $a \to a; [3,4)$ was used instead of rule $a \to a; [4,5)$. The use of time intervals enables the orderly execution of rules.

Fig. 1. Example of DSNP system operation.
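The interval-gated behavior of Fig. 1 can be emulated sequentially. The following is a minimal sketch, not the paper's implementation: it assumes a simplified semantics in which a rule fires only when its spike requirement is met and the global clock lies inside its active window, collapsing the three-point interval $[1,2,3)$ to its active span $[1,3)$; all names are illustrative.

```python
def step(neuron, rules, t):
    """Apply the first rule whose spike condition and time window both hold."""
    for need, consume, emit, (t_from, t_to) in rules:
        if neuron["count"] >= need and t_from <= t < t_to:
            neuron["count"] -= consume
            return emit
    return 0

neuron = {"count": 3}
rules = [
    (3, 2, 1, (1, 3)),  # a^3 / a^2 -> a ; active window [1, 3)
    (1, 1, 1, (3, 4)),  # a -> a ; [3, 4)
    (1, 1, 1, (4, 5)),  # a -> a ; [4, 5)
]
for t in range(1, 5):
    print(f"t={t}: emitted {step(neuron, rules, t)}, stored {neuron['count']}")
# t=1 fires the first rule; t=3 fires a->a;[3,4), so a->a;[4,5) is never used.
```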
3. Methods
3.1. Improved DPC algorithm
Several studies have shown that KNN offers advantages in calculating the density in the DPC algorithm. For a given point $x_i$ in dataset $X$, its KNN are the $k$ points in the dataset that are closest to $x_i$, defined as
$$\mathrm{KNN}(x_i) = \{x_j \in X \mid d(x_i, x_j) \le d(x_i, x_i^{(k)})\},$$
where $x_i^{(k)}$ denotes the $k$th nearest neighbor of $x_i$.
Definition 4.8 The nearest neighbor distance of a point $x_i$, denoted $\mathrm{NND}(x_i)$, represents the proximity between $x_i$ and its closest neighboring point, that is, $\mathrm{NND}(x_i) = \min_{j \ne i} d_{ij}$.

The maximum nearest neighbor distance is then denoted as $\mathrm{MNND} = \max_{1 \le i \le n} \mathrm{NND}(x_i)$.
Definition 5. In our algorithm, the neighborhood of a point $x_i$ is defined by
$$N(x_i) = \{x_j \mid 0 < d_{ij} \le \mathrm{MNND}/2\},$$
that is, the points whose distance to $x_i$ is at most half of the maximum nearest neighbor distance.

Definition 6. The local density in our algorithm is redefined over the adaptive neighborhood of Definition 5 (Eqs. (7) and (8)): $\rho_i$ is determined solely by the points in $N(x_i)$, so no cutoff distance $d_c$ needs to be specified.
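A minimal NumPy sketch of Definitions 4-6 follows; the counting form of the density in the last line is our simplification of Eqs. (7) and (8), whose exact form is not reproduced here, and the function name is illustrative.

```python
import numpy as np

def adaptive_neighbors(X):
    """Adaptive neighborhood density sketch (our reading of Sec. 3.1)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)        # exclude each point from its own search
    nnd = d.min(axis=1)                # Definition 4: nearest neighbor distance
    mnnd = nnd.max()                   # maximum nearest neighbor distance (MNND)
    neighbor_mask = d <= mnnd / 2      # Definition 5: adaptive neighborhood
    rho = neighbor_mask.sum(axis=1)    # Definition 6: neighborhood-based density
    return neighbor_mask, rho
```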
3.2. The framework of the improved DPC algorithm based on a DSN P system
Leveraging the inherent attributes of distributed parallelism and scalability within the framework of P systems offers the potential to substantially reduce time complexity from a theoretical perspective. Furthermore, the Turing universality of DSN P systems has been proven; hence, DSN P systems can, in theory, simulate the process of a clustering algorithm. A DSN P system was designed to realize the improved DPC algorithm, as shown in Fig. 2. The objects in each neuron are spikes with potential values, which may be scalars, vectors, or matrices, and the clustering process is realized by the rules.

Fig. 2. The process of a DSNP system for improved DPC algorithm.
The DSN P system for the improved DPC algorithm is described as follows.
3.3. Rules of the DSN P system
Each neuron has at least one rule, which can be described as follows:
(a) The rule in the input neurons $\sigma_{in_1}, \ldots, \sigma_{in_n}$ is of the form $R_1: a^{x_i} \to a^{x_i}$, and its purpose is to read the data into the system.
(b) The rule in neurons $\sigma_{D_1}, \ldots, \sigma_{D_n}$ is $R_2: a^{x_i} \to a^{l_i}$, which computes the distance vector $l_i = (d_{i1}, d_{i2}, \ldots, d_{in})$ from $x_i$ to every point, where $d_{ij} = \lVert x_i - x_j \rVert$, and sends it to neuron $\sigma_D$.
(c) The data exist in the neuron $\sigma_D$ as a distance matrix
$$D = \begin{pmatrix} d_{11} & d_{12} & \cdots & d_{1n} \\ d_{21} & d_{22} & \cdots & d_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ d_{n1} & d_{n2} & \cdots & d_{nn} \end{pmatrix}.$$
The rule in neuron $\sigma_D$ is $R_3: a^{l_i} \to a^{k_i}, [i-1, i)$, where $l_i = (d_{i1}, d_{i2}, \ldots, d_{in})$ is the $i$th row vector and $k_i$ is a vector computed using Eqs. (5) and (6). This rule retains the distances that are less than or equal to half of the maximum nearest neighbor distance.
(d) The rule in neurons $\sigma_{K_1}, \sigma_{K_2}, \ldots, \sigma_{K_n}$ is $R_4: a^{k_i} \to a^{\rho_i}$, where the relationship between $k_i$ and $\rho_i$ satisfies Eqs. (7) and (8). This rule calculates the local density $\rho_i$.
(e) The rule in $\sigma_{L_1}, \sigma_{L_2}, \ldots, \sigma_{L_n}$ is $R_5: a^{\rho_i} \to a^{\delta_i}$. This step calculates the relative distance using Eq. (3).
(f) The rule in $\sigma_{RD_1}, \sigma_{RD_2}, \ldots, \sigma_{RD_n}$ is $R_6: a^{\rho_i + \delta_i} \to a^{\gamma_i}$, where $\gamma_i = \rho_i \times \delta_i$. All the spikes produced by $\sigma_{RD_1}, \sigma_{RD_2}, \ldots, \sigma_{RD_n}$ are sent to the neuron $\sigma_P$ and stored as a set $\{\gamma_1, \gamma_2, \ldots, \gamma_n\}$. These values allow the next step to filter the points with both high local density and high relative distance.
(g) The rule in $\sigma_P$ is $R_7$, which selects the positions of the points corresponding to the top $k$ largest values in $\gamma = \{\gamma_1, \gamma_2, \ldots, \gamma_n\}$ and saves them as a vector $P$ with the positions unchanged. The purpose of this rule is to locate the points corresponding to the first $k$ maximum $\gamma$ values; the corresponding spikes are then sent to neurons $\sigma_{S_1}, \ldots, \sigma_{S_n}$ sequentially. Taking $\gamma = \{2, 3, 1, 4\}$ with $k = 2$ as an example, the two largest values are selected as targets, and neuron $\sigma_P$ produces $P = \{\lambda, a, \lambda, a\}$. Therefore, $\sigma_P$ sends no spikes to neurons $\sigma_{S_1}$ and $\sigma_{S_3}$, and sends one spike each to neurons $\sigma_{S_2}$ and $\sigma_{S_4}$ (see the code sketch after this list). This step determines the positions of the cluster centers.
(h) The rule in neurons $\sigma_{S_1}, \ldots, \sigma_{S_n}$ is $R_8: (a^{x_i})^+ / a^{u_i} \to a^{x_i}, [n, n+1)$. In addition to the spikes received from $\sigma_P$, $\sigma_{S_i}$ receives spikes from $\sigma_{x_i}$. If the number of spikes in neuron $\sigma_{S_i}$ exceeds $x_i$, then $x_i$ spikes are generated; if the number of spikes equals $x_i$, no spikes are generated. For example, when $\sigma_{S_2}$ received one spike from $\sigma_P$ and $x_2$ spikes from $\sigma_{x_2}$, there were $x_2 + 1 > x_2$ spikes in $\sigma_{S_2}$. Consequently, rule $R_8$ was applied to generate $x_2$ spikes, and $x_2$ could be regarded as a cluster center. This step selects the $k$ clustering centers.
(i) The rule in neurons $\sigma_{C_i}$ ($1 \le i \le t$) is $R_9: E/a^{x_j - x_i} \to a^{x_j}, [j-1, j)$, where $E$ is a regular expression of the form $a^{\varepsilon}$ with $\varepsilon = \min \lVert x_j - x_i \rVert$. The rule is triggered only when the distance between $x_j$ and the cluster center $x_i$ is the smallest, and point $x_j$ is then sent to the corresponding output neuron $\sigma_{out_i}$; that is, points $x_j$ and $x_i$ belong to the same cluster. This step assigns each remaining point to the cluster of its nearest center according to the Euclidean distance.
(j) Neuron $\sigma_{out_i}$ is an output neuron that represents cluster $i$; the points inside it are all points assigned to the cluster center $x_i$. The output neurons represent the clustering results.
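The position-selection behavior of rule $R_7$ in item (g) can be reproduced in a few lines of NumPy; the array names here are ours.

```python
import numpy as np

# Mark the positions of the k largest gamma values, as in the sigma_P example
# above (gamma = {2, 3, 1, 4}, k = 2): a spike ('a', encoded as 1) is sent for
# marked positions and nothing (lambda, encoded as 0) otherwise.
gamma = np.array([2, 3, 1, 4])
k = 2
P = np.zeros(len(gamma), dtype=int)
P[np.argsort(-gamma)[:k]] = 1
print(P)  # [0 1 0 1] -> spikes go to sigma_S2 and sigma_S4
```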
3.4. The DSNP-ANDPC algorithm
According to the above rules, the flow of the improved DPC algorithm based on the DSN P system is summarized as the DSNP-ANDPC algorithm.
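For reference, a sequential NumPy emulation of the whole pipeline is sketched below. In the actual DSN P system, rules $R_1$-$R_9$ fire in parallel across neurons under the time schedule; here they run one after another. The function name and the explicit n_clusters argument are our simplifications (the paper selects centers via the top-$k$ $\gamma$ values), and the counting density stands in for Eqs. (7) and (8).

```python
import numpy as np

def dsnp_andpc_sketch(X, n_clusters):
    """Sequential emulation of rules R1-R9; parallel in the real system."""
    n = len(X)
    # R1/R2: read data and build the pairwise distance matrix (neuron sigma_D).
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    # R3: keep only distances <= MNND / 2 (maximum nearest neighbor distance).
    mnnd = d.min(axis=1).max()
    mask = d <= mnnd / 2
    # R4: local density from the screened neighborhoods (simplified count).
    rho = mask.sum(axis=1)
    # R5: relative distance delta (distance to the nearest denser point).
    np.fill_diagonal(d, 0.0)
    order = np.argsort(-rho)
    delta = np.zeros(n)
    delta[order[0]] = d[order[0]].max()
    for rank, i in enumerate(order[1:], start=1):
        delta[i] = d[i, order[:rank]].min()
    # R6-R8: gamma = rho * delta; the top-k positions become cluster centers.
    centers = np.argsort(-(rho * delta))[:n_clusters]
    # R9: assign every point to its nearest cluster center.
    return np.argmin(d[:, centers], axis=1)

# Example on two well-separated Gaussian blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])
labels = dsnp_andpc_sketch(X, 2)
print(labels[:5], labels[-5:])  # one cluster id per blob
```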
3.5. Analysis of complexity
This analysis is limited to the DPC algorithm within the DSN P system framework. Assume that the dataset contains N points and M clusters. The time complexity of DSNP-ANDPC depends principally on the following steps: (a) the cost of calculating the distances is O(N), because rule R2 requires N iterations to complete the operation; (b) the cost of calculating the local densities is O(N), because rules R3 and R4 require N iterations to complete the operation; (c) the cost of calculating the relative distances through rules R5 and R6 is O(N), as each rule is applied once per point; (d) selecting the initial cluster centers costs O(N), because rule R7 requires N iterations; (e) assigning every point to the corresponding cluster costs O(N), because each neuron σCi must run rule R9 N times. Consequently, a time complexity of O(N) is obtained for DSNP-ANDPC.
4. Experiment and Analysis
The performance of the DSNP-ANDPC algorithm was analyzed through experimental results on synthetic and real datasets and compared with that of six clustering algorithms: K-means,35 spectral clustering (SC),36 DPC,2 DPC-KNN,37 DPC-CE,9 and MDPC+.38 Four synthetic datasets and 10 real datasets were used, including the Compound, Iris, and Zoo datasets. Except for DPC-CE and MDPC+, which used the code published by their authors, all algorithms were implemented in Python 3.7. The system configuration was Windows 10 with an Intel(R) Core(TM) i7-9750H CPU and 16.0 GB of RAM. Because the datasets and evaluation indices used in this study differ from those used in other studies, all comparison algorithms were rerun. K-means and SC were run directly from the scikit-learn package, with SC using the Gaussian kernel function; DPC-KNN and DSNP-ANDPC were implemented based on the DPC code. Each algorithm was executed 20 times, and the final result was obtained by averaging these 20 executions. Moreover, stochastic processes are absent in all DPC-type algorithms; hence, when the parameters remain constant, the outcomes remain invariant across runs.

4.1. Evaluation metrics
Four distinct cluster evaluation metrics were used to assess the DSNP-ANDPC algorithm: the adjusted Rand index (ARI), normalized mutual information (NMI), F-score (F1), and accuracy (ACC).
ARI is calculated as
$$\mathrm{ARI} = \frac{\sum_{ij} \binom{n_{ij}}{2} - \left[ \sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2} \right] \Big/ \binom{n}{2}}{\frac{1}{2}\left[ \sum_i \binom{a_i}{2} + \sum_j \binom{b_j}{2} \right] - \left[ \sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2} \right] \Big/ \binom{n}{2}},$$
where $n_{ij}$ is the number of samples shared by true class $i$ and predicted cluster $j$, and $a_i$ and $b_j$ are the corresponding marginal sums.

NMI is calculated as
$$\mathrm{NMI}(U, V) = \frac{2\, I(U; V)}{H(U) + H(V)},$$
where $I(U; V)$ is the mutual information between the true partition $U$ and the predicted partition $V$, and $H(\cdot)$ denotes entropy.

The F1 score is calculated as
$$F_1 = \frac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}.$$

ACC is calculated as
$$\mathrm{ACC} = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}\{ y_i = \mathrm{map}(c_i) \},$$
where $y_i$ is the true label of sample $i$, $c_i$ is its cluster label, and $\mathrm{map}(\cdot)$ is the optimal one-to-one mapping from cluster labels to class labels.
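These metrics can be computed as sketched below. ARI and NMI come directly from scikit-learn; ACC and F1 need a label matching because cluster indices are arbitrary, and we use the Hungarian algorithm for the optimal mapping. Macro-averaged F1 after matching is one common convention; the paper's exact evaluation code may differ.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import (adjusted_rand_score, f1_score,
                             normalized_mutual_info_score)

def clustering_scores(y_true, y_pred):
    """ARI/NMI from scikit-learn; ACC and F1 after Hungarian label matching."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    k = max(y_true.max(), y_pred.max()) + 1
    w = np.zeros((k, k), dtype=int)             # contingency matrix
    for t, p in zip(y_true, y_pred):
        w[p, t] += 1
    rows, cols = linear_sum_assignment(-w)      # maximize matched counts
    remap = dict(zip(rows, cols))
    y_mapped = np.array([remap[p] for p in y_pred])
    return {
        "ARI": adjusted_rand_score(y_true, y_pred),
        "NMI": normalized_mutual_info_score(y_true, y_pred),
        "F1": f1_score(y_true, y_mapped, average="macro"),
        "ACC": float((y_mapped == y_true).mean()),
    }

print(clustering_scores([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2]))
```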
4.2. Experiments on synthetic datasets
This study conducted a comparative analysis of five algorithms, namely K-means, SC, DPC-KNN, DPC-CE, and DSNP-ANDPC, on four synthetic datasets. Table 1 presents the details of the selected synthetic datasets. Clustering performance was evaluated using NMI and ACC. Table 2 lists the parameter configurations for each algorithm on the various experimental datasets and presents the corresponding experimental outcomes for all five algorithms.
| Datasets | Instances | Dimensions | Clusters |
|---|---|---|---|
| R15 | 600 | 2 | 15 |
| D31 | 3100 | 2 | 31 |
| Compound | 7266 | 2 | 6 |
| Aggregation | 788 | 2 | 7 |
| Datasets | Algorithm | Parameter | NMI | ACC |
|---|---|---|---|---|
| R15 | K-means | K=15 | 0.9942 | 0.9967 |
| | SC | σ=1, K=15 | 0.9942 | 0.9967 |
| | DPC-KNN | p=0.02 | 0.9860 | 0.9787 |
| | DPC-CE | dc=0.02, Tr=0.25, Pr=0.3 | 0.9892 | 0.9967 |
| | DSNP-ANDPC | — | 0.9942 | 0.9967 |
| D31 | K-means | K=31 | 0.9675 | 0.9771 |
| | SC | σ=2, K=31 | 0.9652 | 0.9751 |
| | DPC-KNN | p=0.02 | 0.9570 | 0.9680 |
| | DPC-CE | dc=0.02, Tr=0.25, Pr=0.3 | 0.9586 | 0.9690 |
| | DSNP-ANDPC | — | 0.9706 | 0.9800 |
| Compound | K-means | K=6 | 0.6541 | 0.7195 |
| | SC | σ=1, K=6 | 0.6616 | 0.5311 |
| | DPC-KNN | p=0.01 | 0.7042 | 0.7419 |
| | DPC-CE | dc=0.02, Tr=0.25, Pr=0.3 | 0.6751 | 0.7393 |
| | DSNP-ANDPC | — | 0.7469 | 0.8404 |
| Aggregation | K-means | K=7 | 0.7842 | 0.8780 |
| | SC | σ=1, K=7 | 0.9467 | 0.9635 |
| | DPC-KNN | p=0.01 | 0.9957 | 0.9987 |
| | DPC-CE | dc=0.02, Tr=0.25, Pr=0.3 | 0.9956 | 0.9987 |
| | DSNP-ANDPC | — | 0.9957 | 0.9987 |
The five algorithms under comparison entail distinct parameter configurations, which must be specified manually in advance. The specific parameter settings for each algorithm are listed in Table 2. For instance, K-means requires the number of clusters K, which was set to the actual number of clusters. SC requires manual input of the Gaussian kernel parameter σ and the number of clusters K. DPC requires a cutoff distance dc ranging from 0% to 20%, and DPC-KNN requires a percentage parameter p ranging from 0% to 10%. DPC-CE likewise requires the cutoff distance dc ranging from 0% to 20%; as in the original paper, the threshold and punishment ratios were set to Tr=0.25 and Pr=0.3. Remarkably, the DSNP-ANDPC algorithm stands apart because it obviates the need for manual parameter input, a distinctive attribute in this comparative analysis. It is worth noting that this study selects the cluster centers of the DPC, DPC-KNN, DPC-CE, MDPC+, and DSNP-ANDPC algorithms by calculating the product of the local density and the relative distance.
Figure 3 shows the clustering results of the five algorithms on the R15 dataset, which is characterized by 15 clusters and 600 data points. Notably, the visual disparities between these algorithms appear inconspicuous in the graphical representation. However, as shown in Table 2, the NMI and ACC of DSNP-ANDPC were the best, tying with those of K-means and SC. The D31 dataset contains 3100 points and 31 clusters, and only subtle visual distinctions are shown in Fig. 4. From the analysis in Table 2, it can be concluded that the NMI and ACC of DSNP-ANDPC were greater than those of the other four algorithms. Thus, the proposed DSNP-ANDPC algorithm performed best on the R15 and D31 datasets.

Fig. 3. Experimental results of four clustering algorithms on the R15 dataset.

Fig. 4. Experimental results of four clustering algorithms on D31 dataset.
The Compound dataset encompasses 7266 data points categorized into six distinct clusters. The clustering results depicted in Fig. 5 reveal the superiority of the DSNP-ANDPC algorithm over its counterparts. This observation is further substantiated by the values presented in Table 2, where both the ACC and NMI values of DSNP-ANDPC exhibit a substantial advantage over those of the other algorithms.

Fig. 5. Experimental results of four clustering algorithms on the Compound dataset.
Moreover, the Aggregation dataset comprises seven clusters, two of which are interconnected. The clustering results are visually represented in Fig. 6, where it is evident that several points were erroneously allocated. Suboptimal cluster center selection by the K-means algorithm resulted in the poorest clustering outcomes, and the SC algorithm also exhibited misclassification issues in the upper-left cluster. However, an analysis of the data in Table 2 shows that the DPC-KNN and DSNP-ANDPC algorithms shared the highest ACC and NMI, signifying their superior performance in this context.

Fig. 6. Experimental results of four clustering algorithms on aggregation dataset.
4.3. Experiments on real datasets
Ten real datasets were selected for testing (Table 3), all obtained from the UCI database.39 The parameter settings differ across datasets and are listed in Table 4. Except for the MDPC+ algorithm, the parameter settings of the other algorithms are the same as those described in Sec. 4.2. MDPC+ requires an attenuation coefficient λ and a number of neighbors k, set to λ=2 and k=√n, where n is the number of samples.
| Dataset | Instances | Dimensions | Clusters |
|---|---|---|---|
| Iris | 150 | 4 | 3 |
| Seeds | 210 | 7 | 3 |
| Zoo | 101 | 18 | 7 |
| PageBlock | 5473 | 10 | 5 |
| Banknote | 1372 | 4 | 2 |
| E.coli | 336 | 7 | 8 |
| WDBC | 569 | 30 | 2 |
| Breast Cancer | 569 | 9 | 2 |
| Spambase | 4601 | 57 | 2 |
| Raisin | 900 | 7 | 2 |
| Dataset | K-means | SC | DPC | DPC-KNN | DPC-CE | MDPC+ | DSNP-ANDPC |
|---|---|---|---|---|---|---|---|
| Iris | K=3 | σ=1, K=3 | dc=0.2 | p=0.01 | dc=0.02, Tr=0.25, Pr=0.3 | λ=2 | — |
| Seeds | K=3 | σ=0.1, K=3 | dc=0.2 | p=0.02 | dc=0.02, Tr=0.25, Pr=0.3 | λ=2 | — |
| Zoo | K=7 | σ=0.5, K=7 | dc=0.2 | p=0.02 | dc=0.02, Tr=0.25, Pr=0.3 | λ=2 | — |
| PageBlock | K=5 | σ=1, K=5 | dc=0.2 | p=0.02 | dc=0.02, Tr=0.25, Pr=0.3 | λ=2 | — |
| Banknote | K=2 | σ=2, K=2 | dc=0.2 | p=0.02 | dc=0.02, Tr=0.25, Pr=0.3 | λ=2 | — |
| E.coli | K=8 | σ=1, K=8 | dc=0.02 | p=0.02 | dc=0.02, Tr=0.25, Pr=0.3 | λ=2 | — |
| WDBC | K=2 | σ=0.8, K=2 | dc=0.02 | p=0.02 | dc=0.02, Tr=0.25, Pr=0.3 | λ=2 | — |
| Breast Cancer | K=2 | σ=200, K=2 | dc=0.02 | p=0.02 | dc=0.02, Tr=0.25, Pr=0.3 | λ=2 | — |
| Spambase | K=2 | σ=2, K=2 | dc=0.2 | p=0.02 | dc=0.02, Tr=0.25, Pr=0.3 | λ=2 | — |
| Raisin | K=2 | σ=100, K=2 | dc=0.02 | p=0.02 | dc=0.02, Tr=0.25, Pr=0.3 | λ=2 | — |
The clustering results were evaluated according to ARI, NMI, F1, and ACC. The scores of the seven algorithms are listed in Table 5; for ease of comparison, the best results are indicated in bold. DSNP-ANDPC achieved the best results on all four evaluation indices for five datasets (Iris, Seeds, Zoo, Banknote, and Raisin) and was superior to the other comparison algorithms on them.
| Dataset | Metric | K-means | SC | DPC | DPC-KNN | DPC-CE | MDPC+ | DSNP-ANDPC |
|---|---|---|---|---|---|---|---|---|
| Iris | ARI | 73.02 | 74.36 | 75.92 | 88.58 | 75.92 | 76.70 | **94.10** |
| | NMI | 75.82 | 76.61 | 80.57 | 87.05 | 79.60 | 81.08 | **91.92** |
| | F1 | 89.33 | 90.00 | 90.67 | 95.99 | 84.04 | 90.99 | **98.00** |
| | ACC | 89.33 | 90.00 | 90.67 | 96.00 | 90.67 | 91.16 | **98.00** |
| Seeds | ARI | 71.66 | 71.45 | 70.27 | 71.70 | 72.88 | 71.50 | **76.24** |
| | NMI | 69.49 | 68.89 | 69.83 | 67.44 | 68.51 | 66.16 | **71.44** |
| | F1 | 89.52 | 89.52 | 88.57 | 89.46 | 89.65 | 89.40 | **91.40** |
| | ACC | 89.52 | 89.52 | 88.57 | 89.52 | 90.00 | 89.52 | **91.43** |
| Zoo | ARI | 66.29 | 70.94 | 47.43 | 45.90 | 72.59 | 53.92 | **82.38** |
| | NMI | 75.33 | 72.70 | 60.54 | 60.22 | 79.10 | 70.65 | **81.69** |
| | F1 | 75.49 | 78.21 | 60.26 | 64.03 | 78.16 | 69.80 | **79.81** |
| | ACC | 75.49 | 78.21 | 70.30 | 71.29 | **81.19** | 72.88 | **81.19** |
| PageBlock | ARI | 12.34 | 31.73 | 25.86 | **41.65** | — | 35.93 | 41.27 |
| | NMI | 5.51 | 17.07 | 14.19 | 27.37 | 29.99 | — | **30.20** |
| | F1 | 81.17 | 87.63 | 82.83 | 88.74 | 89.51 | 89.62 | **89.97** |
| | ACC | 80.88 | 87.63 | 80.30 | 91.10 | 89.77 | 91.56 | **92.20** |
| Banknote | ARI | 4.85 | 13.82 | 23.08 | 7.05 | 20.32 | 0.14 | **23.22** |
| | NMI | 3.03 | 22.71 | 34.71 | 4.64 | 30.57 | 12.81 | **34.80** |
| | F1 | 61.22 | 64.09 | 73.11 | 62.84 | 63.88 | 64.39 | **73.20** |
| | ACC | 61.22 | 68.97 | 74.13 | 63.41 | 72.67 | 54.75 | **74.20** |
| E.coli | ARI | 40.54 | 37.23 | 46.78 | 65.44 | **73.26** | 62.95 | 73.11 |
| | NMI | 55.52 | 47.64 | 51.67 | 63.44 | 60.70 | — | **70.99** |
| | F1 | 63.87 | 64.70 | 64.95 | 69.97 | **81.35** | 74.70 | 72.78 |
| | ACC | 59.23 | 69.32 | 62.91 | 75.60 | 77.68 | 74.70 | **78.57** |
| WDBC | ARI | 49.14 | 0.13 | 20.88 | 42.56 | 13.35 | **54.98** | 33.83 |
| | NMI | 46.72 | 0.22 | 20.57 | 41.71 | 13.02 | **52.59** | 47.71 |
| | F1 | 84.43 | 48.38 | 71.84 | 81.71 | 70.06 | **86.63** | 85.71 |
| | ACC | 85.41 | 62.74 | 72.89 | 83.13 | 70.83 | **87.35** | 86.19 |
| Breast Cancer | ARI | 49.14 | 0.03 | 25.30 | −1.47 | 42.36 | 62.43 | **62.53** |
| | NMI | 46.72 | 0.05 | 34.51 | 1.70 | 34.11 | **53.56** | 50.96 |
| | F1 | 84.43 | 53.25 | 67.57 | 47.86 | 77.43 | 89.53 | **89.72** |
| | ACC | 85.41 | 53.25 | 58.42 | 60.46 | 83.30 | 89.53 | **89.60** |
| Spambase | ARI | 3.94 | 0.30 | 4.82 | 1.65 | **49.69** | 10.94 | 13.92 |
| | NMI | 4.72 | 1.51 | 6.35 | 4.91 | **37.12** | 6.87 | 8.55 |
| | F1 | 63.59 | 45.52 | 64.31 | 44.38 | **75.77** | 66.64 | 67.93 |
| | ACC | 63.59 | 60.16 | 64.31 | 57.77 | 64.42 | 66.67 | **69.14** |
| Raisin | ARI | 16.66 | 0.05 | 20.88 | 23.83 | 32.79 | 2.28 | **33.83** |
| | NMI | 26.10 | 0.13 | 20.57 | 26.18 | 26.47 | 3.28 | **34.18** |
| | F1 | 67.75 | 50.74 | 71.84 | 74.94 | 66.91 | 61.70 | **78.43** |
| | ACC | 70.44 | 51.61 | 72.89 | 75.11 | 78.66 | 57.67 | **79.11** |
For the PageBlock dataset, the NMI, F1, and ACC values obtained by DSNP-ANDPC were clearly higher than those of the other algorithms. For the E.coli dataset, the DPC-CE algorithm obtained the highest ARI and F1 values, whereas DSNP-ANDPC achieved the best NMI and ACC. For the WDBC dataset, the MDPC+ algorithm outperformed the DSNP-ANDPC algorithm, although DSNP-ANDPC remained second-best in terms of NMI, F1, and ACC. For the Breast Cancer dataset, except for NMI, the values of the DSNP-ANDPC algorithm were greater than those of MDPC+. For the Spambase dataset, DPC-CE was outperformed by the DSNP-ANDPC algorithm in terms of ACC but was the best on the other three evaluation indices. Therefore, DSNP-ANDPC was superior to the DPC, DPC-KNN, DPC-CE, MDPC+, K-means, and SC algorithms in most cases.
To present the results more intuitively, heatmaps are shown in Fig. 7. Darker blue shades correspond to higher values, signifying superior outcomes. The proposed DSNP-ANDPC algorithm performs best on most datasets, with the last row having the darkest shades of blue, particularly in Figs. 7(c) and 7(d).

Fig. 7. Heatmaps of ARI(a), NMI(b), F1(c), and ACC(d) values for seven algorithms across 10 datasets.
5. Conclusion
The DPC algorithm offers the advantage of automatically discovering cluster centers and the ability to process arbitrarily shaped data. However, the algorithm has several limitations, such as the high impact of the local density calculation on the clustering results and high time complexity. In this study, to overcome the shortcomings of the DPC algorithm, the maximum nearest neighbor distance and KNN were introduced. In addition, the improved DPC algorithm was integrated into a DSN P system, and the DSNP-ANDPC algorithm was proposed. This algorithm exploits the distributed parallel computing advantages of membrane computing to construct a clustering model based on a DSN P system. The clustering process of the improved DPC algorithm was implemented using the rules and objects of the DSN P system to reduce the time complexity of the algorithm. The use of KNN and the maximum nearest neighbor distance avoided the need to set a truncation distance, and the incorporation of the local structure of the data reduced the parameter settings of the DPC algorithm and enhanced its adaptability.
Subsequently, the DSNP-ANDPC algorithm was validated on four synthetic and 10 UCI datasets, using ARI, NMI, F1, and ACC as evaluation metrics. The experiments compared DSNP-ANDPC with six other clustering algorithms, and the results showed that it performed best on most of the datasets considered.
Future studies can focus on three research directions. First, a self-loop can be added to the DSN P system to realize other clustering algorithms via an iterative process. Second, DSN P systems can be combined with neural networks to expand the application areas of P systems, such as neural dynamic classification40 and dynamic ensemble learning41 algorithms. Third, supervised machine learning42 and classification algorithms will be an important research direction. These avenues of exploration are promising for further inquiry and expansion of the research framework.
Acknowledgment
This work was supported by the Program for Youth Innovative Research Team in University of Shandong Province (2022KJ179) and the Natural Science Foundation of Rizhao City (RZ2022ZR66).