Aerospace and Mechanical Insider on MSN

Hierarchical reinforcement learning boosts air defense efficiency

Modern air defense confrontations demand rapid, precise task assignments in environments where threats evolve within seconds.
Abstract: To address high dynamics, strong uncertainty, and decision-dimensional explosion in air combat, this paper constructs a PPO-based hierarchical tactical decision-making algorithm (PHT-PPO) ...
Abstract: This research article presents a comparison between two mainstream Deep Reinforcement Learning (DRL) algorithms, Asynchronous Advantage Actor-Critic (A3C) and Proximal Policy Optimization ...
A Deep Reinforcement Learning Algorithm with Multi-view Graph Attention Mechanism for Flexible Job Shop Scheduling Problem - githubxrw/FJSP-DRL-PPO ...
This repository contains implementations and comparisons of Proximal Policy Optimization (PPO) and Group Relative Policy Optimization (GRPO) algorithms on standard reinforcement learning environments: ...
Morningstar Quantitative Ratings for Stocks are generated using an algorithm that compares companies that are not under analyst coverage to peer companies that do receive analyst-driven ratings.
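PPO (Proximal Policy Optimization), the classic algorithm that later became widely used in RLHF and large-model training, was once rejected by NIPS 2017. The episode was recently brought up by PPO author John Schulman himself. He summed it up in a single sentence: PPO was rejected by NIPS 2017.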