At intersections with asymmetric traffic flow, a single neural network or other conventional control methods cannot react in time to ensure that approaches with heavy traffic or long queues receive more green time. To solve this problem, a signal-duration control method for asymmetric-traffic-flow intersections based on deep reinforcement learning is proposed. Using deep Q-learning, traffic signal control is cast as a reinforcement learning problem. The state of the intersection is defined by the traffic cycle time, the asymmetric traffic flow parameters, and the green ratio of the signal, and the control action is defined as changing the phase and duration of the signal. A neural network is trained with the deep Q-learning model to predict the long-term cumulative return (i.e., the Q value) of each action under different asymmetric traffic flow conditions, and the optimal control action is selected according to the Q value, thereby realizing signal duration control at asymmetric-traffic-flow intersections. Experimental verification shows that a discount factor of 0.5 yields the best learning speed and stability of the agent, effectively reducing traffic congestion and greatly improving vehicle safety, which is of great significance for improving urban traffic conditions.
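A minimal sketch of the deep Q-learning update described above, using a linear Q-function for brevity; the state features and the four-action set are assumptions, and the discount factor of 0.5 is the value the abstract reports as best:

    import numpy as np

    GAMMA = 0.5            # discount factor reported as best in the abstract
    N_ACTIONS = 4          # e.g. keep phase, switch phase, extend green, shorten green (assumed)

    def q_values(weights, state):
        # Hypothetical linear Q-function standing in for the neural network.
        return weights @ state                      # shape: (N_ACTIONS,)

    def td_target(target_weights, reward, next_state, done):
        # One-step deep Q-learning target.
        if done:
            return reward
        return reward + GAMMA * np.max(q_values(target_weights, next_state))

    def td_update(weights, target_weights, state, action, reward, next_state, done, lr=1e-3):
        # Gradient step on the squared TD error of the chosen action.
        error = td_target(target_weights, reward, next_state, done) - q_values(weights, state)[action]
        weights[action] += lr * error * state
        return weights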
To help Latin dancers feel the rhythm of the sports dance, posture must be standardized during basic training, so studying methods for correcting Latin dance posture is important. Traditional Latin dance posture correction methods suffer from problems such as errors in the included angle of head motion, errors in the spine transformation angle, and errors in the fit between foot and ground. In this paper, a Latin dance posture correction method using improved deep reinforcement learning in the Internet of Things (IoT) is proposed. First, a Latin dance posture image acquisition architecture is constructed using IoT and binocular stereo vision to acquire Latin dance posture images and extract posture features. Second, the channel attention module in the deep learning network is improved, and a Latin dance posture diagnosis model is built from the extracted action features using the improved deep reinforcement learning network. Finally, action correction coefficients are calculated from the diagnosis results to realize posture correction. The results show that, after applying the proposed method, the included angle error of head movement, the spine transformation angle error, and the foot-to-ground fit error of the participants' motions were all kept below 1°, and the frame position offset was 1.3 cm, indicating that the proposed method can effectively improve the degree to which Latin dance posture meets the specification.
The application of deep reinforcement learning techniques to adaptive traffic signal control is effective in reducing traffic congestion, owing to its advantages over more traditional experience-based or model-driven methods. Although numerous studies have been conducted on traffic signal control, many of them suffer from issues such as unreasonable phase settings and long model-building times in complex environments. By analyzing the spatial-temporal characteristics of intersection traffic volume, this paper puts forward an automatic phase-switching mechanism to alleviate the mismatch between the phase scheme and the actual traffic flow. To reduce model training time, this paper proposes adding noise to the parameters of the agent to stabilize the environment exploration process, so that the agent's decisions depend more on the observed environment state. Experiments with real-world traffic flow data and simulated data show that the optimized signal control model can effectively reduce the average travel time and increase network throughput.
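A minimal sketch of the parameter-noise idea mentioned above, assuming the agent's weights are held in a dict of NumPy arrays (names hypothetical); the perturbed copy is used for one rollout while the clean parameters are trained on the collected data:

    import numpy as np

    def perturb_parameters(params, sigma=0.05, rng=np.random.default_rng()):
        # Return a noisy copy of the agent's parameters (parameter-space noise).
        return {name: w + sigma * rng.standard_normal(w.shape) for name, w in params.items()}

    params = {"w1": np.zeros((8, 16)), "w2": np.zeros((16, 4))}   # placeholder weights
    rollout_params = perturb_parameters(params)                   # act with these for one episode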
The implementation of personalized recommendations of educational content contributes to enhancing the efficiency of resource utilization in education. However, during the recommendation process, it is challenging to extract feature vectors of educational content, which can lead to inaccurate recommendation results. To address this issue, a personalized recommendation algorithm for educational content using deep reinforcement learning is proposed. First, basic user data and educational content resource data are collected, user preference is calculated, multiple features of the educational content are extracted, and feature vectors are constructed. Second, within the deep reinforcement learning framework, user preference is defined as the state space and educational content feature vectors are defined as the action space. State value functions and action value functions are computed separately, followed by the calculation of state transition probabilities, the setting of a reward function, and iterative updates based on the reward values generated when users select educational content, which then output preliminary recommendation results. Finally, considering the proportion of personalized recommendations, a personalized adjustment function is designed to output accurate personalized recommendation results. Experiments conducted on the Coursera-Course dataset demonstrate that, compared to traditional methods, the proposed method yields an average recall of 96% and an average sparsity of 22.36%, effectively providing users with accurate personalized educational content recommendations.
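A minimal tabular sketch of the reinforcement-learning loop described above, with discretized preference states and content items as actions; the sizes, the click-based reward, and the top-k selection are illustrative assumptions:

    import numpy as np

    n_preference_states, n_items = 5, 20
    Q = np.zeros((n_preference_states, n_items))

    def update(state, item, clicked, next_state, alpha=0.1, gamma=0.9):
        # Reward generated by the user's selection of the recommended content.
        r = 1.0 if clicked else 0.0
        target = r + gamma * np.max(Q[next_state])
        Q[state, item] += alpha * (target - Q[state, item])

    def recommend(state, top_k=3):
        # Preliminary recommendation: the top-k items by action value.
        return np.argsort(Q[state])[::-1][:top_k]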
Rapid human motion pose tracking has extensive applications in fields such as motion capture, intelligent monitoring, sports training, and physical health management, where it provides accurate data support, enhances safety monitoring, optimizes training outcomes, and promotes physical health. Traditional human pose tracking methods predominantly rely on either sensors or images for tracking, which often results in low tracking accuracy and slow tracking speed. To address these problems, a rapid human motion pose tracking method based on improved deep reinforcement learning and multimodal fusion is proposed. First, this paper designs an overall architecture for rapid human motion pose tracking and uses a combination of monocular vision and sensors to extract and collect human motion data. Second, it constructs a complementary-filter-based multimodal data fusion method to merge the multimodal data and extract the fused features. Finally, a multi-level attention network is employed to enhance the deep reinforcement learning network, using the fused features as input for training to achieve rapid human motion pose tracking. The results show that the proposed method can achieve efficient and stable human motion pose tracking in complex scenes, with a tracking accuracy of up to 85% and a shortest tracking time of 72 ms, demonstrating practical application value.
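A minimal sketch of the complementary-filter fusion step described above, blending an integrated inertial rate with a visual angle estimate; the signal names, sampling period, and blend factor are assumptions:

    def complementary_filter(prev_angle, gyro_rate, vision_angle, dt=0.01, alpha=0.98):
        # High-pass the integrated gyro rate, low-pass the vision measurement.
        return alpha * (prev_angle + gyro_rate * dt) + (1.0 - alpha) * vision_angle

    angle = 0.0
    for gyro_rate, vision_angle in [(0.50, 0.012), (0.42, 0.017)]:   # placeholder samples
        angle = complementary_filter(angle, gyro_rate, vision_angle)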
The core value of cloud manufacturing is to enable optimal allocation of manufacturing resources across enterprises in a wide-area environment, and scheduling is the key technology for realizing this core value. At present, enterprises are integrated into the cloud platform at large scale, which poses great difficulties for cloud manufacturing scheduling, so how to perform collaborative scheduling between the platform and enterprises is a key question. Moreover, in the cloud manufacturing environment, distributed enterprises with autonomy on the edge side connect their own resources to the cloud manufacturing service platform and carry out collaborative scheduling with it. Platform-enterprise collaborative scheduling provides support for large-scale resources and services within the cloud. Given this, the paper provides a platform-enterprise collaborative model that is adopted to study the scheduling problem of large-scale resources and services in cloud manufacturing. The model considers platform-based service scheduling and enterprise-based resource scheduling, and the collaborative scheduling mechanisms of cloud services and enterprise resources are investigated: the former completes the scheduling of cloud services while collaborating on tasks with the latter, and the latter completes the scheduling of enterprise resources while delivering scheduling information to the former. Since deep reinforcement learning (DRL) has been widely applied to cloud manufacturing scheduling, a platform-enterprise collaborative scheduling algorithm based on a dueling deep Q-network with prioritized replay (CE-PDDQN) is proposed. To evaluate the effectiveness of the proposed algorithm, this paper selects DQN and dueling DQN for comparison experiments. The experimental results show that the CE-PDDQN algorithm can obtain a better scheduling scheme after training and learning, and that it is adaptive and scalable.
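A minimal sketch of the two components named in the CE-PDDQN acronym, independent of the scheduling model itself: the dueling aggregation of state value and advantages, and prioritized-replay sampling probabilities:

    import numpy as np

    def dueling_q(value, advantages):
        # Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
        return value + advantages - advantages.mean()

    def replay_probabilities(td_errors, alpha=0.6, eps=1e-5):
        # Sampling probability proportional to (|TD error| + eps) ** alpha.
        priorities = (np.abs(td_errors) + eps) ** alpha
        return priorities / priorities.sum()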
Target encirclement by multiple unmanned surface vessels (USVs) is a prevalent tactical strategy in maritime defense missions, significantly enhancing defense efficiency. However, challenges arise from the limited collaborative capacity caused by underutilized information within multi-USV systems during such tasks. In this context, a multi-USV target encirclement method based on an improved multi-head attention Q-value mixing network is proposed. Initially, a reinforcement learning model tailored to USVs is designed, taking into account the complexities inherent in encirclement tasks. Subsequently, by incorporating distinct action semantics into the value calculation, USVs are empowered to assess action values accurately, thereby improving decision-making. Leveraging the Qatten algorithm framework, a multi-USV target encirclement method with weighted action-semantic-assisted value function decomposition is introduced. Comparative and ablation experiments conducted in uniform-speed target encirclement scenarios validate the effectiveness and high success rate of the proposed method.
In this paper, we leverage a reinforcement learning approach to address the motion control problem of Four In-Wheel Motor Actuated Vehicles aimed at achieving precise control while optimizing energy efficiency. Our control architecture consists of four adaptive Proportional-Integral-Derivative controllers, each assigned to an independent vehicle wheel. We train these controllers using an actor-critic framework in two standard driving scenarios: acceleration and braking, as well as a double lane-change maneuver. This method eliminates the need for a detailed mathematical model of the complex vehicle dynamics. Moreover, the adaptive mechanism enables controllers to dynamically adapt to varying operating conditions. After training, the resulting controllers are tested in unseen scenarios to validate their robustness and adaptability beyond the training environment. The testing results show that our controllers achieve precise velocity and trajectory tracking while maintaining low energy consumption.
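A minimal sketch of one adaptive PID controller of the kind described above, with the gains supplied at each step by the (omitted) actor network; the interface is an assumption:

    class AdaptivePID:
        def __init__(self):
            self.integral = 0.0
            self.prev_error = 0.0

        def step(self, error, gains, dt):
            # gains = (kp, ki, kd) proposed by the actor for the current state.
            kp, ki, kd = gains
            self.integral += error * dt
            derivative = (error - self.prev_error) / dt
            self.prev_error = error
            return kp * error + ki * self.integral + kd * derivative

    controllers = [AdaptivePID() for _ in range(4)]   # one per in-wheel motor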
Aiming at the problem of multi-agent cooperative confrontation in seize-control scenarios, we design an efficient multi-agent policy self-play (EMAP-SP) learning method. First, a multi-agent centralized policy model is constructed to command the agents to perform tasks cooperatively. Considering that the policy being trained and its historical policies usually have poor exploration capability under incomplete information in self-play training, an intrinsic reward mechanism based on random network distillation (RND) is introduced into the self-play learning method. In addition, we propose a multi-step on-policy deep reinforcement learning (DRL) algorithm assisted by off-policy policy evaluation (MSOAO) to learn the best response policy in self-play. Compared with DRL algorithms commonly used in complex decision problems, MSOAO has more efficient policy evaluation capability, and efficient policy evaluation further improves the policy learning capability. The effectiveness of EMAP-SP is fully verified in the MiaoSuan wargame simulation system, and the evaluation results show that EMAP-SP can learn a cooperative policy that effectively defeats the Blue side's knowledge-based policy under incomplete information. Moreover, the evaluation results in DRL benchmark environments also show that the best-response policy learning algorithm MSOAO can help the agent learn approximately optimal policies.
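A minimal sketch of an RND-style intrinsic reward of the kind the method introduces, i.e. the prediction error of a trained network against a fixed random target network; the feature sizes and the single-layer networks are assumptions:

    import numpy as np

    rng = np.random.default_rng(0)
    W_target = rng.standard_normal((16, 8))       # fixed, randomly initialized target network
    W_pred = 0.1 * rng.standard_normal((16, 8))   # predictor network, trained online

    def rnd_intrinsic_reward(obs, lr=1e-2):
        global W_pred
        target_feat = np.tanh(W_target @ obs)
        pred_feat = np.tanh(W_pred @ obs)
        error = target_feat - pred_feat
        # Train the predictor toward the target features; novel states keep a large error.
        W_pred += lr * np.outer(error * (1.0 - pred_feat ** 2), obs)
        return float(np.mean(error ** 2))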
In this paper, we provide a flexible framework for optimal trading in an asset listed on different venues. We take into account the dependencies between the imbalance and spread of the venues, and allow for partial execution of limit orders at different limits as well as market orders. We present a Bayesian update of the model parameters to account for possibly changing market conditions and propose extensions to include short/long trading signals, market impact, or hidden liquidity. To solve the trader's stochastic control problem, we apply the finite difference method and also develop a deep reinforcement learning algorithm that allows us to consider more complex settings.
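A minimal illustration, under an assumption not stated in the abstract, of the kind of online Bayesian parameter update such a framework could perform: a conjugate Beta posterior over the fill probability of a limit order in a given imbalance/spread bucket:

    class BetaFillModel:
        def __init__(self, a=1.0, b=1.0):
            self.a, self.b = a, b            # Beta prior pseudo-counts

        def update(self, filled):
            # Conjugate update after observing whether the order was filled.
            if filled:
                self.a += 1.0
            else:
                self.b += 1.0

        @property
        def mean(self):
            return self.a / (self.a + self.b)    # posterior mean fill probability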
Visual neuroprostheses are a promising approach to restore basic sight in visually impaired people. A major challenge is to condense the sensory information contained in a complex environment into meaningful stimulation patterns at low spatial and temporal resolution. Previous approaches considered task-agnostic feature extractors such as edge detectors or semantic segmentation, which are likely suboptimal for specific tasks in complex dynamic environments. As an alternative approach, we propose to optimize stimulation patterns by end-to-end training of a feature extractor using deep reinforcement learning agents in virtual environments. We present a task-oriented evaluation framework to compare different stimulus generation mechanisms, such as static edge-based and adaptive end-to-end approaches like the one introduced here. Our experiments in Atari games show that stimulation patterns obtained via task-dependent end-to-end optimized reinforcement learning result in equivalent or improved performance compared to fixed feature extractors on high difficulty levels. These findings signify the relevance of adaptive reinforcement learning for neuroprosthetic vision in complex environments.
A weighted densely connected convolutional network (W-DenseNet) is proposed for deep reinforcement learning in this work. W-DenseNet maximizes the information flow between all layers in the network through cross-layer connections, which reduces gradient vanishing and degradation and greatly improves the speed of training convergence. With the weight coefficients introduced in W-DenseNet, the current layer receives the feature maps of all previous layers with different initial weights, so feature information from different layers can be extracted more effectively according to the task. According to the weights adjusted during learning, cross-layer connections with smaller weights are pruned, reducing the number of cross-layer connections. In this work, the GridWorld and FlappyBird games are used for simulation. The simulation results of deep reinforcement learning based on W-DenseNet are compared with the traditional deep reinforcement learning algorithm and a reinforcement learning algorithm based on DenseNet. The results show that the proposed W-DenseNet method converges better, reduces training time, and obtains more stable results.
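A minimal sketch of the weighted cross-layer connections and weight-based pruning described above, assuming all layer outputs share the same shape; the threshold and layer callables are placeholders:

    import numpy as np

    def weighted_dense_forward(x, layer_fns, conn_weights, prune_thresh=0.05):
        # conn_weights[i][j] weights the connection from layer j's output into layer i.
        outputs = [x]
        for i, fn in enumerate(layer_fns):
            layer_input = np.zeros_like(x)
            for w, out in zip(conn_weights[i], outputs):
                if abs(w) >= prune_thresh:         # prune small-weight cross-layer links
                    layer_input = layer_input + w * out
            outputs.append(fn(layer_input))
        return outputs[-1]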
Automated guided vehicles (AGVs) are important transportation equipment, widely used in warehouses and factories. In multi-AGV applications, an efficient task assignment strategy reduces transportation costs, balances workload, and increases distribution efficiency. The traditional method usually depends on a powerful scheduling system that solves the task assignment problem in a fixed, rule-based way. In this paper, we present a decentralized framework of multi-task allocation with attention (MTAA) in deep reinforcement learning, which combines balanced task assignment with cooperative path planning for distribution applications. For balanced task assignment, we adopt a DNN to achieve task assignment equilibrium. For multi-AGV path planning, A3C methods are embedded in the MTAA framework, which improves the stationarity and performance of the deep reinforcement learning application. In the experiments, we designed two scenarios with different obstacles to verify the performance of the MTAA-A3C and MTAA-DQN methods. The experiments show that the proposed approach is feasible and effective for multi-AGV applications.
There has been extensive research on edge-computing task offloading with deep reinforcement learning (DRL). DRL is one of the important algorithm families in the current AI field, but there is still room for improvement in its time cost and adaptive correction ability. This paper studies the application of DRL algorithms to edge-computing task offloading. Its key innovation is the MADRLCO algorithm, which is based on the Actor-Critic framework: a DNN model acts as the Actor and, through iterative training, locates the initial decision more accurately, while an LSTM model optimizes the Critic so that the optimal decision can be located in a short period of time. The main work of this paper is divided into three parts: (1) The Actor-Critic algorithm in DRL is applied to edge-computing task offloading. (2) To address the weak generalization ability of the basic Actor-Critic algorithm in multi-objective optimization, sequential quantitative correction and an adaptive correction parameter K are used to optimize the Critic, thereby improving the model's generalization in multi-objective decision-making and the rationality of its decisions. (3) To address the large time cost of the Critic, a search algorithm for resource-allocation parameters based on time-series prediction (a research branch of pattern recognition) is proposed, which reduces the time overhead of the algorithm and improves the model's adaptive correction capability. The algorithm can adapt both to time-varying network channel states and to a time-varying number of device connections. Finally, experiments show that, compared with a DRL computation offloading algorithm based on a DNN plus binary search, the MADRLCO algorithm reduces model training time by 66.27%, and in an environment with a time-varying number of devices its average standardized computation rate is 0.0403 higher than that of the current optimal algorithm.
Path planning is an important part of mobile robot research and is the premise for mobile robots to complete complex tasks. This paper proposes a reflective reward design method based on a potential energy function and combines it with ideas from multi-agent and multi-task learning to form a new training framework. The reflective reward represents the quality of the agent's current decision relative to its past decision sequence, using the second-order information of the historical reward sequence. The policy or value-function update of the master agent is then assisted by the reflective agent. The proposed method can easily extend existing deep reinforcement learning algorithms based on value functions and policy gradients into a new learning method, so that the agent acquires the reflective characteristic of human learning after making full use of the reward information and becomes better at distinguishing the optimal action in the corresponding state. Experiments in pathfinding scenarios verify the effectiveness of the algorithm under sparse rewards; compared with other algorithms, the proposed deep reinforcement learning algorithm has a higher exploration success rate and better stability. Experiments in survival scenarios verify the improvement brought by the reward-feature enhancement method based on the auxiliary-task learning mechanism over the original algorithm. Simulation experiments confirm the effectiveness of the proposed algorithm for path planning of mobile robots in dynamic environments and the superiority of deep reinforcement learning algorithms: the robot accurately avoids unknown obstacles and reaches the target point, the planned path is the shortest, and the energy consumed by the robot is the least, demonstrating the effectiveness of deep reinforcement learning for local path planning and real-time decision-making.
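A minimal sketch of one plausible reading of the reflective reward above: a shaping term derived from the second-order difference of the recent reward history; the window and scaling are assumptions:

    from collections import deque
    import numpy as np

    class ReflectiveReward:
        def __init__(self, window=10, beta=0.1):
            self.history = deque(maxlen=window)
            self.beta = beta

        def __call__(self, reward):
            self.history.append(reward)
            if len(self.history) < 3:
                return reward
            # Second-order information of the historical reward sequence.
            second_diff = np.diff(np.asarray(self.history), n=2)
            return reward + self.beta * float(second_diff.mean())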
Digital microfluidic biochips (DMFBs) are emerging as an alternative to cumbersome traditional laboratories for biochemical analysis. DMFBs fall under micro-electro-mechanical systems and are a class of lab-on-a-chip devices, providing automation, miniaturization, and software programmability. A droplet routing algorithm determines concurrent routes for a set of droplets from their source cells to individual target cells on a DMFB. In this paper, a double deep Q-network (DDQN)-based droplet routing algorithm is proposed. DDQN is a temporal-difference-based deep reinforcement learning algorithm that combines double Q-learning with a deep neural network. In the proposed work, routes for droplets are determined by the DDQN, and collisions are later resolved using stalling and/or detouring. The latest arrival time of the droplet reaching its target last and the cell utilization are taken as objectives for evaluating routing algorithm performance. The proposed method is evaluated on two standard benchmark suites. Simulation results show that the proposed DDQN-based droplet routing algorithm produces competitive results compared to state-of-the-art algorithms.
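A minimal sketch of the double DQN target used by such a routing agent, with the online network selecting the next action and the target network evaluating it; the state and Q-function callables are placeholders:

    import numpy as np

    def double_dqn_target(q_online, q_target, reward, next_state, gamma, done):
        # q_online / q_target: callables mapping a state to a vector of Q-values.
        if done:
            return reward
        best_action = int(np.argmax(q_online(next_state)))          # action selection: online net
        return reward + gamma * q_target(next_state)[best_action]   # evaluation: target net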
The hyperparameters of a metaheuristic algorithm are difficult to determine when solving optimization problems. Existing methods mainly adjust hyperparameters through preset rules or traditional reinforcement learning (RL), but their performance is unsatisfactory and their generalization is poor. This work proposes a deep Q-learning network (DQN)-based dynamic setting framework for combinatorial hyperparameters and applies it to a genetic algorithm (GA) to improve its performance. By defining in advance the four elements required for learning a strategy (environment, state, action, and reward), the parameterized strategy can be trained offline and different DQN models can be studied. Our method was compared with other algorithms and achieved the shortest path on 14 of 15 public TSP instances. Meanwhile, the test results on our simulated TSP validation dataset revealed that the Category DQN achieved the best performance. This means the proposed method can effectively solve the problem of setting combinatorial hyperparameters and bring further solving advantages to the GA.
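A minimal sketch of the hyperparameter-setting loop described above, reduced to an epsilon-greedy choice over a small action set for brevity; the hyperparameter combinations and the improvement-based reward are assumptions:

    import random

    ACTIONS = [                                      # combinatorial hyperparameter settings (assumed)
        {"mutation_rate": 0.01, "crossover_rate": 0.7},
        {"mutation_rate": 0.05, "crossover_rate": 0.8},
        {"mutation_rate": 0.10, "crossover_rate": 0.9},
    ]
    q_estimates = [0.0] * len(ACTIONS)               # stands in for the DQN's output

    def choose_action(epsilon=0.1):
        if random.random() < epsilon:
            return random.randrange(len(ACTIONS))
        return max(range(len(ACTIONS)), key=lambda a: q_estimates[a])

    def reward(prev_best_length, new_best_length):
        return prev_best_length - new_best_length    # positive when the best tour gets shorter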
Reasonable allocation of resources is an important guarantee for efficient support of power services by edge IoT agents. To address the problems facing the current power Internet of Things (pIoT), this paper proposes a resource allocation optimization method based on deep Q-learning. The method first considers communication performance and network security comprehensively, involving indicators such as latency and service satisfaction, and constructs a complete and reliable mathematical model of the edge IoT agent network to model the pIoT efficiently and reliably, aiming to better fit the practical interaction needs of efficient and secure communication. The Q-learning network model is then optimized, and a method combining reinforcement learning and deep learning is used to solve the model, so that the state, action, and other parameters of the network model can be solved in a timely manner and the reliable, efficient information interaction of the communication network is better supported. The test results show that the delay of the proposed method can be kept within 12 ms in more complex scenarios and the interaction success rate reaches 0.975, confirming that the proposed method can provide good information interaction guarantee services.
In this paper, the vehicle position points obtained by multi-sensor fusion are taken as the observed values, and a Kalman filter is combined with the vehicle kinematics equations to further refine the vehicle trajectory. To this end, the mathematical principles and theoretical basis of deep reinforcement learning are analyzed. It is shown that a controller based on the dynamic model outperforms a controller based on kinematics in deviation control, and the performance of the controller based on deep reinforcement learning is also verified. The simulation data show that the proportional-integral-derivative (PID) controller has a good tracking effect but lacks constraint ability, which leads to drastic acceleration changes and unstable acceleration and deceleration control. Therefore, the deep reinforcement learning controller is selected as the longitudinal velocity tracking controller. The effectiveness of the lateral and longitudinal motion decoupling strategy is verified by simulation experiments.
Although Deep Reinforcement Learning (DRL) is promising for solving Job Shop Scheduling Problems (JSPs), existing DRL-based methods still have large optimality gaps when learning job-to-machine solutions. This is mainly because: (i) existing state representations autonomously learned from graph-structured data cannot fully capture node information to support agents in making optimal decisions; and (ii) existing reward functions cannot accurately reflect some actions that will seriously worsen the current state. To address these issues, we propose a knowledge-based DRL method that selects nine well-known priority dispatching rules (PDRs) as state features, which enables effective model training. To avoid feature over-redundancy, we discard significantly correlated features based on Pearson correlation analysis, which helps identify the key factors affecting the agents' decision-making. Furthermore, since it is difficult to design a reward function that can accurately distinguish actions, we mask poor-performing actions based on problem-specific knowledge to prevent them from being selected at the current decision point. Comprehensive experimental results demonstrate the superiority of our approach over four PDRs and four state-of-the-art methods on various benchmarks.
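A minimal sketch of two steps named above, with placeholder feature names: dropping strongly correlated PDR features via the Pearson coefficient, and masking poor-performing actions before the argmax:

    import numpy as np

    def prune_correlated_features(feature_matrix, names, threshold=0.9):
        # feature_matrix: (samples, features); keep the first feature of each correlated pair.
        corr = np.corrcoef(feature_matrix, rowvar=False)
        keep = []
        for i in range(len(names)):
            if all(abs(corr[i, j]) < threshold for j in keep):
                keep.append(i)
        return [names[i] for i in keep]

    def masked_argmax(q_values, mask):
        # mask[i] is True if action i is allowed at this decision point.
        return int(np.argmax(np.where(mask, q_values, -np.inf)))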