This paper proposed a pedestrian evacuation model combined with reinforcement learning in order to study how to better guide pedestrians to complete evacuation in specific indoor scenes. The model adopted the cellular-automaton approach to scene construction and formulated reward rules according to the characteristics of the scene. It modeled the psychological activity of pedestrians during an actual evacuation and trained a pedestrian strategy at the overall level with the Q-learning algorithm from reinforcement learning. A speed control mechanism combined with real statistical data was introduced to simulate speed attenuation. A simulation platform was built to compare evacuation under different scenarios and different total numbers of pedestrians. The research showed that the model could automatically realize the exit-selection behavior of pedestrians and partially reproduce conformity behavior. In the same evacuation scenario, the model adapted to different total numbers of pedestrians.
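As a rough illustration of the kind of tabular Q-learning update such a cellular-automaton evacuation model could use, the sketch below trains exit-seeking moves on a small grid; the grid size, exit cells, rewards, and hyperparameters are assumptions for illustration, not values from the paper.

```python
# Minimal sketch: tabular Q-learning for exit choice on a cellular-automaton grid.
import numpy as np

H, W = 10, 10                                   # assumed grid dimensions
exits = {(0, 4), (9, 4)}                        # assumed exit cells
actions = [(-1, 0), (1, 0), (0, -1), (0, 1)]    # up, down, left, right
Q = np.zeros((H, W, len(actions)))
alpha, gamma, eps = 0.1, 0.95, 0.1

def step(cell, a):
    r, c = cell[0] + actions[a][0], cell[1] + actions[a][1]
    r, c = min(max(r, 0), H - 1), min(max(c, 0), W - 1)   # stay inside the room
    nxt = (r, c)
    reward = 10.0 if nxt in exits else -0.1               # exit bonus vs. time penalty
    return nxt, reward, nxt in exits

for episode in range(2000):
    cell = (np.random.randint(H), np.random.randint(W))
    done = cell in exits
    for t in range(500):                                  # cap episode length
        if done:
            break
        a = np.random.randint(len(actions)) if np.random.rand() < eps \
            else int(np.argmax(Q[cell[0], cell[1]]))
        nxt, reward, done = step(cell, a)
        target = reward + (0.0 if done else gamma * Q[nxt[0], nxt[1]].max())
        Q[cell[0], cell[1], a] += alpha * (target - Q[cell[0], cell[1], a])
        cell = nxt
```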
Evolutionary game theory provides a platform to investigate the emergence of cooperation in populations of selfish agents. In this work, we study evolutionary games on networks in which agents cooperate or defect according to Q-learning algorithms with an extended state space. The extended state space provides agents with two types of information: local environment information based on the cooperation level in an agent's neighborhood, and personal information based on the agent's last action. Through numerical simulations, we find that richer local environment information tends to improve cooperation in the population regardless of whether personal information is present. Moreover, we show that, for the same local environment information, introducing personal information may improve cooperation, except in situations with a low amount of local environment information, where personal information deteriorates cooperation in a bad-condition environment. For the same total information, the absence of personal information promotes cooperation in a bad-condition environment, while its presence promotes cooperation in a good-condition environment. By investigating the distributions and temporal behaviors of Q-values, we explain the above observations. This work suggests an effective way of extending the state space in evolutionary games incorporating the Q-learning algorithm to enhance cooperation.
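A minimal sketch of the kind of extended state an agent could use here: a discretized local cooperation level, optionally combined with the agent's own last action. The bin count and parameters below are illustrative assumptions, not the paper's settings.

```python
# Sketch: Q-learning state built from local environment and personal information.
import numpy as np

N_BINS = 5          # resolution of the local-environment information

def state_index(neighbor_actions, last_action=None):
    """neighbor_actions: list of 0 (defect) / 1 (cooperate) for the neighbors."""
    coop_level = sum(neighbor_actions) / max(len(neighbor_actions), 1)
    env_bin = min(int(coop_level * N_BINS), N_BINS - 1)
    if last_action is None:                 # local environment information only
        return env_bin
    return env_bin * 2 + last_action        # environment + personal information

n_states = N_BINS * 2
Q = np.zeros((n_states, 2))                 # actions: 0 = defect, 1 = cooperate

def update(s, a, r, s_next, alpha=0.1, gamma=0.9):
    # standard Q-learning update after receiving the game payoff r
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
```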
We propose a new method to categorize continuous numeric percepts for Q-learning, where percept vectors are classified into categories on the basis of fuzzy ART and Q-learning uses the categories as states to acquire rules for agent behavior. For efficient learning, we modify fuzzy ART to reduce the number of categories without deteriorating the efficiency of reinforcement learning. In our modification, a vigilance parameter is defined for each category in order to control the size of that category, and it is updated during learning. The vigilance update method is based on category integration, which contributes to reducing the number of categories. We define a similarity for every category pair to judge whether category integration should be performed. When two categories are integrated into a new category, a vigilance parameter for the new category is calculated and the categories used for integration are discarded, so that the number of categories is reduced without imposing an explicit limit on it. Experimental results show that Q-learning with the modified fuzzy ART acquires good rules for agent behavior more efficiently than Q-learning with ordinary fuzzy ART, even though the number of categories generated by the modified fuzzy ART is much smaller than that generated by the ordinary one.
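The sketch below illustrates one way per-category vigilance and similarity-based category integration could look for complement-coded fuzzy ART categories; the similarity measure and vigilance formula are illustrative assumptions, not the paper's exact definitions.

```python
# Sketch: per-category vigilance and category integration for fuzzy ART.
import numpy as np

def category_size(w, dim):
    # size of the hyper-rectangle encoded by a complement-coded weight vector
    # (w has length 2*dim; dim is the original percept dimension)
    return dim - float(np.sum(w))

def similarity(w_i, w_j):
    # overlap of the smallest box enclosing both categories, relative to the larger one
    merged = np.minimum(w_i, w_j)          # fuzzy AND = enclosing box
    return float(np.sum(merged)) / max(float(np.sum(w_i)), float(np.sum(w_j)))

def maybe_integrate(w_i, w_j, dim, sim_threshold=0.9):
    """Merge two categories if they are similar enough; return (weight, vigilance) or None."""
    if similarity(w_i, w_j) < sim_threshold:
        return None
    w_new = np.minimum(w_i, w_j)
    # per-category vigilance chosen so inputs inside the merged box still pass
    # the match test (|I| = dim for complement-coded inputs)
    rho_new = 1.0 - category_size(w_new, dim) / dim
    return w_new, rho_new
```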
A weighted densely connected convolutional network (W-DenseNet) is proposed for reinforcement learning in this work. W-DenseNet maximizes the information flow between all layers in the network through cross-layer connections, which reduces gradient vanishing and degradation and greatly improves the speed of training convergence. With the weight coefficients introduced in W-DenseNet, the current layer receives all previous layers' feature maps with different initial weights, so feature information from different layers can be extracted more effectively according to the task. Based on the weights adjusted during learning, cross-layer connections with smaller weights are pruned, reducing the number of cross-layer connections. In this work, the GridWorld and FlappyBird games are used for simulation. The simulation results of deep reinforcement learning based on W-DenseNet are compared with the traditional deep reinforcement learning algorithm and a reinforcement learning algorithm based on DenseNet. The results show that the proposed W-DenseNet method converges better, reduces training time, and obtains more stable results.
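A hedged PyTorch sketch of a dense block with learnable per-connection weights and small-weight pruning, in the spirit of the description above; the layer sizes, the weighted-sum aggregation, and the pruning threshold are assumptions and may differ from the paper's architecture.

```python
# Sketch: dense block with weighted cross-layer connections and pruning.
import torch
import torch.nn as nn

class WeightedDenseBlock(nn.Module):
    def __init__(self, channels, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=1) for _ in range(n_layers)
        )
        # one learnable scalar weight per cross-layer connection
        self.conn = nn.ParameterList(
            nn.Parameter(torch.ones(i + 1)) for i in range(n_layers)
        )

    def forward(self, x):
        feats = [x]
        for i, layer in enumerate(self.layers):
            w = self.conn[i]
            # weighted combination of all previous feature maps
            agg = sum(w[j] * feats[j] for j in range(len(feats)))
            feats.append(torch.relu(layer(agg)))
        return feats[-1]

    def prune(self, threshold=0.05):
        # zero out cross-layer connections whose learned weight is small
        with torch.no_grad():
            for w in self.conn:
                w[w.abs() < threshold] = 0.0
```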
In this paper, we study the path-planning problem of emergency fire control robots in a nuclear environment. Given the high risk of the nuclear environment, the irregularity of its spatial shape, and the complex distribution of obstacles, a robot path-planning method is proposed based on the combination of Q-learning and the BCD raster map decomposition method. It realizes automated decontamination of the nuclear-contaminated environment and reduces the exposure risk of manual intervention. First, Q-learning, a reinforcement learning model, is used to establish the optimal path between the start and end points of the operation area. Second, the BCD raster map decomposition method is used to realize the global division of the operation area. Then, an improved partition merging method based on the Q-learning optimal path is proposed to complete the merging of job sub-regions and coverage path planning. Finally, simulation experiments show that the technique can quickly and stably achieve global path coverage in the specialized operating environment of the nuclear domain.
Reinforcement learning usually requires a trial-and-error process called exploration, and a uniform pseudorandom number generator is commonly considered effective for that process. Chaotic sources can also serve as exploration generators, producing random-like sequences much as stochastic sources do. In this research, we investigate the efficiency of a deterministic chaotic generator for exploration in learning a nonstationary shortcut maze problem. We find that a deterministic chaotic generator based on the logistic map performs better for exploration than the stochastic random generator. This is made clear by analyzing the difference in performance between the two generators in terms of the patterns of exploration occurrence. We also examine the tent map, which is topologically conjugate to the logistic map, and compare it with the other generators.
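A minimal sketch of the idea: the logistic map x_{n+1} = 4 x_n (1 - x_n) supplies the numbers that drive epsilon-greedy exploration in place of a uniform pseudorandom generator. The epsilon value and action-selection details are illustrative assumptions.

```python
# Sketch: chaotic (logistic-map) generator as the exploration source in epsilon-greedy.
import numpy as np

class LogisticGenerator:
    """x_{n+1} = 4 x_n (1 - x_n); values in (0, 1) used like uniform numbers."""
    def __init__(self, seed=0.3):
        self.x = seed
    def next(self):
        self.x = 4.0 * self.x * (1.0 - self.x)
        return self.x

def epsilon_greedy(q_row, gen, eps=0.1):
    # gen.next() replaces np.random.rand(); the exploration pattern now follows
    # the deterministic chaotic sequence instead of a stochastic one
    if gen.next() < eps:
        return int(gen.next() * len(q_row)) % len(q_row)
    return int(np.argmax(q_row))
```

The tent map mentioned above can be swapped in simply by replacing the update rule inside `next()`.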
This paper addresses the problem of efficiency in reinforcement learning for Single Robot Hose Transport (SRHT) by training an Extreme Learning Machine (ELM) from the state-action value Q-table, obtaining a large reduction in data space requirements because the number of ELM parameters is much smaller than the size of the Q-table. Moreover, the ELM implements a continuous map that can produce compact representations of the Q-table, along with generalizations to increased space resolution and unknown situations. In this paper we empirically evaluate three strategies to formulate ELM learning as an approximation to the Q-table, namely as classification, as multivariate regression, and as several independent regression problems.
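As a hedged illustration of the multivariate-regression formulation, the sketch below fits an ELM (random fixed input weights, closed-form output weights) to rows of a Q-table; the hidden-layer size and state encoding are assumptions, not SRHT specifics.

```python
# Sketch: approximating a Q-table with an Extreme Learning Machine (regression).
import numpy as np

def train_elm(states, q_targets, n_hidden=100, seed=0):
    """states: (N, d) state features; q_targets: (N, n_actions) rows of the Q-table."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(states.shape[1], n_hidden))    # random input weights (fixed)
    b = rng.normal(size=n_hidden)
    H = np.tanh(states @ W + b)                          # hidden-layer activations
    beta = np.linalg.pinv(H) @ q_targets                 # closed-form output weights
    return W, b, beta

def elm_q(state, W, b, beta):
    return np.tanh(state @ W + b) @ beta                 # approximate Q-values for all actions
```

The classification variant would instead predict the argmax action per state, and the "several independent regressions" variant would fit one such model per action.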
Traditional image enhancement algorithms do not account for the subjective evaluation of human operators. Every observer has a different opinion of an ideally enhanced image. Automated techniques for obtaining a subjectively ideal image enhancement are desirable, but do not currently exist. In this paper, we demonstrate that reinforcement learning is a potential method for solving this problem. We have developed an agent that uses the Q-learning algorithm. The agent modifies the contrast of an image with a simple linear point transformation based on the histogram of the image and the feedback it receives from human observers. The results of several testing sessions indicate that the agent performs well within a limited number of iterations.
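A minimal sketch of such a loop, assuming a coarse histogram-spread state, multiplicative contrast gains as actions, and a human rating as the reward; these choices are illustrative, not the paper's implementation.

```python
# Sketch: Q-learning for contrast enhancement with human feedback as reward.
import numpy as np

GAINS = [0.8, 0.9, 1.0, 1.1, 1.2]        # actions: contrast gains for the point transform
Q = np.zeros((10, len(GAINS)))           # states: coarse bins of histogram spread
alpha, gamma = 0.2, 0.9

def state_of(img):
    spread = img.std() / 128.0                        # crude contrast measure
    return min(int(spread * 10), 9)

def apply_gain(img, g):
    mean = img.mean()
    return np.clip((img - mean) * g + mean, 0, 255)   # linear point transformation

def learn_step(img, human_rating):
    """human_rating in [-1, 1], supplied by the observer after viewing the result."""
    s = state_of(img)
    a = int(np.argmax(Q[s])) if np.random.rand() > 0.1 else np.random.randint(len(GAINS))
    out = apply_gain(img, GAINS[a])
    s2 = state_of(out)
    Q[s, a] += alpha * (human_rating + gamma * Q[s2].max() - Q[s, a])
    return out
```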
This paper presents a unified overview of a new family of distributed algorithms for routing and load balancing in dynamic communication networks. These new algorithms are described as an extension of classical routing algorithms: they combine the ideas of online asynchronous distance-vector routing with adaptive link-state routing. Estimates of the current traffic conditions and link costs are measured by sending routing agents into the network; these agents mix with the regular information packets and keep track of the costs (e.g. delay) encountered during their journey. The routing tables are then regularly updated based on that information, without any central control or complete knowledge of the network topology. Two new algorithms are proposed here. The first is based on round-trip routing agents that update the routing tables by backtracking their way after having reached the destination. The second relies on forward agents that update the routing tables directly as they move toward their destination. An efficient cooperative scheme is proposed to deal with asymmetric connections. All these methods are compared on a simulated network under various traffic loads, and the robustness of the new algorithms to network changes is demonstrated in various dynamic scenarios.
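The sketch below shows one plausible form of the round-trip (backward) agent's update: walking back along its recorded path and nudging each node's delay estimate toward the measured value. The data structures and the moving-average rule are assumptions in the spirit of the description, not the paper's exact algorithm.

```python
# Sketch: backward routing-agent update of per-node routing tables.
# table[node][destination][next_hop] = estimated delay (each table[node] is a dict)

def backward_update(table, path, delays, destination, eta=0.3):
    """path: nodes visited on the forward journey; delays[i]: measured delay from path[i]
    to the destination, as recorded by the agent."""
    for i in range(len(path) - 1):
        node, next_hop = path[i], path[i + 1]
        est = table[node].setdefault(destination, {})
        old = est.get(next_hop, delays[i])
        # exponential moving average toward the freshly measured delay
        est[next_hop] = (1 - eta) * old + eta * delays[i]
```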
This study examines the complex and evolutionary nature of supply network configuration. Taking a bottom-up approach, we examine how supply network configuration at the macro-level evolves as a result of individual retailers' dynamic choice of procurement strategies at the micro-level. Employing agent-based modeling, we focus on the effects of switch cost and distributors' ordering policy on the evolution of supply network configuration. Our results show that (1) supply networks tend to evolve into a set of separate supply chains when switch cost is high and into an integrated network when switch cost is low, (2) a responsive ordering policy adopted by distributors is more conducive to the integrated network configuration than a non-responsive policy, and (3) lack of coordination among retailers in their dynamic choice of procurement strategies hurts not only the overall system performance, but also retailers themselves. More importantly, our study demonstrates the capabilities of agent-based modeling as a methodology for researching complex supply network issues.
In-stream big data processing is an important part of big data processing. Proactive decision support systems can predict future system states and execute actions to avoid unwanted states. In this paper, we propose a proactive decision support system for online event streams. Based on Complex Event Processing (CEP) technology, the method uses a structure-varying dynamic Bayesian network to predict future events and system states. Different Bayesian network structures are learned and used according to different event contexts. A networked distributed Markov decision process model with predicted states is proposed as the sequential decision-making model. A Q-learning method is investigated for this model to find the optimal joint policy. Experimental evaluations show that the method works well for congestion control in a transportation system.
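A minimal sketch of the proactive idea for a single agent: the Q-learning state is augmented with the event predicted by the CEP/Bayesian-network component, so decisions anticipate the predicted future. The state encoding, action set, and parameters are illustrative assumptions.

```python
# Sketch: Q-learning over states augmented with a predicted future event.
from collections import defaultdict
import random

Q = defaultdict(float)
ACTIONS = ["keep", "reroute", "throttle"]     # assumed congestion-control actions
alpha, gamma, eps = 0.1, 0.9, 0.1

def choose(obs, predicted_event):
    state = (obs, predicted_event)            # proactive: decision uses the prediction
    if random.random() < eps:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(obs, predicted_event, action, reward, next_obs, next_pred):
    s, s2 = (obs, predicted_event), (next_obs, next_pred)
    best_next = max(Q[(s2, a)] for a in ACTIONS)
    Q[(s, action)] += alpha * (reward + gamma * best_next - Q[(s, action)])
```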
The aim of this paper is to reduce the energy consumption of a humanoid by analyzing electrical power as input to the robot and mechanical power as output. The analysis considers motor dynamics during standing-up and sitting-down tasks. The motion tasks of the humanoid are described in terms of joint position, joint velocity, joint acceleration, joint torque, center of mass (CoM), and center of pressure (CoP). To reduce the complexity of the analysis, the humanoid is modeled as a planar robot with four links and three joints. The humanoid robot learns to reduce the overall motion torque by applying Q-Learning in a simulated model. The resulting motions are evaluated on a physical NAO humanoid robot during standing-up and sitting-down tasks and then contrasted with a pre-programmed task on the NAO. The stand-up and sit-down motions are analyzed for individual joint current usage, power demand, torque, angular velocity, acceleration, and CoM and CoP locations. The overall result is an improvement in energy efficiency of 25–30% compared to the pre-programmed NAO stand-up and sit-down motion task.
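One plausible way to express "reduce the overall motion torque" as a learning signal is a per-step reward that penalizes joint effort and rewards reaching the goal posture; the weighting and bonus below are assumptions, not the paper's values.

```python
# Sketch: torque-penalizing reward for learning low-energy stand-up/sit-down motions.
import numpy as np

def energy_reward(joint_torques, reached_goal_posture, w_torque=0.01, bonus=10.0):
    """joint_torques: torques of the three planar joints at this time step."""
    cost = w_torque * float(np.sum(np.abs(joint_torques)))   # effort penalty
    return (bonus if reached_goal_posture else 0.0) - cost

# Example Q-learning target using this reward:
#   Q[s, a] += alpha * (energy_reward(tau, done) + gamma * Q[s_next].max() - Q[s, a])
```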
Value function approximation plays an important role in reinforcement learning (RL) with continuous state spaces, which is widely used to build decision models in practice. Many traditional approaches require experienced designers to manually specify the formulation of the approximating function, leading to a rigid, non-adaptive representation of the value function. To address this problem, a novel Q-value function approximation method named Hierarchical fuzzy Adaptive Resonance Theory (HiART) is proposed in this paper. HiART is based on the Fuzzy ART method and is an adaptive classification network that learns to segment the state space by classifying the training input automatically. HiART begins with a highly generalized structure in which the number of category nodes is limited, which helps speed up the learning process at the early stage. The network is then refined gradually by creating attached sub-networks, forming a layered network structure in the process. Based on this adaptive structure, HiART alleviates the dependence on expert experience for designing the network parameters. The effectiveness and adaptivity of HiART are demonstrated on the Mountain Car benchmark problem, with both fast learning and low computation time. Finally, a simulated application example, the one-versus-one air combat decision problem, illustrates the applicability of HiART.
American options are important financial products traded in enormous volumes across the world. Therefore, accurate and efficient valuation is of paramount importance for global financial markets. Due to the early exercise feature, the pricing of American options is significantly more complicated than that of European options, and an analytical closed-form solution is unavailable even for simple dynamic models. Practitioners employ various valuation methods to strike a balance: accurate valuation is usually inefficient, while fast valuation tends to be inaccurate. In this paper, we provide an innovative solution that addresses both the accuracy and the efficiency of pricing American options by applying quantum reinforcement learning. Moreover, the quantum part of the new approach could potentially speed up the calculation dramatically.
This chapter introduces a new reinforcement learning method for solving a train marshaling problem for assembling an outgoing train. In the problem, the arrangement of incoming freight cars is assumed to be random. The cars are then rearranged into the desired layout in order to assemble an outgoing train. In the proposed method, each set of freight cars with the same destination forms a group, and the desirable group layout constitutes the best outgoing train. A rearrangement operation is conducted using several sub-tracks, and the outgoing train is assembled on the main track. When a rearrangement operation is conducted in the proposed method, several cars located on different sub-tracks are collected by a locomotive. In order to rearrange the cars into the desired order, cars are moved from one sub-track to another. Each marshaling plan, consisting of a series of removal and rearrangement operations, is generated by a reinforcement learning system based on the transfer distance of the locomotive. The total transfer distance of the locomotive required to assemble an outgoing train can thus be minimized by the proposed method.
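The sketch below shows one plausible way the locomotive transfer distance could drive the learning signal for marshaling plans: each car movement is rewarded with the negative distance it incurs. The state/action encoding and reward shaping are assumptions, not the chapter's exact formulation.

```python
# Sketch: Q-learning update for marshaling moves, rewarded by negative transfer distance.
from collections import defaultdict

Q = defaultdict(float)
alpha, gamma = 0.1, 0.95

def update(layout, move, distance, next_layout, candidate_moves, finished):
    """layout: hashable sub-track arrangement; move: (car_group, from_track, to_track);
    distance: locomotive transfer distance incurred by this move."""
    reward = -distance + (100.0 if finished else 0.0)   # minimize total distance
    best_next = 0.0 if finished else max(
        (Q[(next_layout, m)] for m in candidate_moves), default=0.0)
    Q[(layout, move)] += alpha * (reward + gamma * best_next - Q[(layout, move)])
```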
QWalking uses Q-Learning to find optimal walking patterns for biped robots and has two novel features. First, QWalking does not require a precise dynamics model; thus a robot can learn to dynamically switch walking patterns without knowing exactly the position of the centre of mass or the angular momentum of each link. Second, inspired by the psychological reward system in the human brain, we introduce two rewards (positive and negative) and a punishment rate to adjust the weight of these two rewards. In this paper, our contributions include the exploration of learning robust and energy-efficient walking patterns using QWalking. We evaluate QWalking in the RoboCup 3D simulator SPARK on a NAO robot, and the robot successfully yields fast walking patterns that adapt to different torso masses. We also explore how to obtain energy-efficient walking patterns using QWalking. Experimental results show a clear decrease in the mechanical cost of transport (MCT).
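A minimal sketch of combining a positive and a negative reward through a punishment rate, as described above; the specific signals and the default rate are illustrative assumptions, not QWalking's actual definitions.

```python
# Sketch: weighting a positive and a negative reward with a punishment rate.
def combined_reward(forward_progress, fall_or_instability, punishment_rate=0.5):
    """forward_progress >= 0 rewards fast walking; fall_or_instability >= 0 penalizes it."""
    r_pos = forward_progress
    r_neg = fall_or_instability
    # the punishment rate shifts the balance between seeking reward and avoiding punishment
    return (1.0 - punishment_rate) * r_pos - punishment_rate * r_neg
```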
QWalking is a model-free biped locomotion method that uses Q-Learning to find optimal walking patterns for a humanoid robot. Having previously tested it in a simulated environment, here we investigate the implementation of QWalking on a real NAO robot to yield fast locomotion patterns. We compare related work on locomotion for the NAO robot and discuss the software infrastructure needed to use QWalking on it. Experimental results show that it develops fast walking patterns that are competitive with the fastest approach currently published for this robot, with the advantage of not requiring precise dynamic models or extensive state sensing.
In this chapter, an intelligent method for generating a marshaling plan for the freight cars in a train is introduced. Initially, freight cars are located in a freight yard in a random layout, and they are moved and lined up on a main track in a certain desired order to assemble an outbound train. Marshaling plans are obtained by a reinforcement learning system based on the processing time. To evaluate the processing time, the total transfer distance of the locomotive and the total number of freight car movements are considered simultaneously. Moreover, by grouping freight cars that have the same destination, the set of candidate desired arrangements of the outbound train is extended. This feature is incorporated into the learning algorithm so that the total processing time is reduced. The order of the freight car movements, the position of each removed car, the layout of the groups in the train, the arrangement of cars within each group, and the number of cars to be moved are then simultaneously optimized to minimize the total processing time required to obtain the desired arrangement of freight cars for the outbound train.