
@article{jzus.A2600084,
title="Strategic flight planning and conflict management for urban air mobility operations: a mission preference-constrained MARL approach",
author="Yan LI and Xuejun ZHANG and Chuxi WANG and Yan SHEN and Chenglong LI",
journal="Journal of Zhejiang University Science A",
volume="-1",
number="-1",
pages="",
year="2026",
publisher="Zhejiang University Press \& Springer",
doi="10.1631/jzus.A2600084"
}
%0 Journal Article
%T Strategic flight planning and conflict management for urban air mobility operations: a mission preference-constrained MARL approach
%A Yan LI
%A Xuejun ZHANG
%A Chuxi WANG
%A Yan SHEN
%A Chenglong LI
%J Journal of Zhejiang University SCIENCE A
%V -1
%N -1
%P
%@ 1673-565X
%D 2026
%I Zhejiang University Press & Springer
%DOI 10.1631/jzus.A2600084
TY  - JOUR
T1  - Strategic flight planning and conflict management for urban air mobility operations: a mission preference-constrained MARL approach
A1  - Yan LI
A1  - Xuejun ZHANG
A1  - Chuxi WANG
A1  - Yan SHEN
A1  - Chenglong LI
JO  - Journal of Zhejiang University Science A
VL  - -1
IS  - -1
SP  - 
EP  - 0
SN  - 1673-565X
Y1  - 2026
PB  - Zhejiang University Press & Springer
DO  - 10.1631/jzus.A2600084
ER  - 
Abstract: As an emerging low-altitude transportation paradigm, urban air mobility (UAM) is envisioned to support high-density and demand-driven operations involving diverse and flexible mission requests. However, the imbalance between limited urban airspace resources and growing operational demands inevitably causes frequent flight conflicts, posing significant challenges to safe and efficient operations. To address this issue, this paper proposes a multiagent reinforcement learning (MARL) approach to achieve strategic four-dimensional trajectory (4DT) flight planning and conflict management during the preflight window. First, a collaborative optimization framework is established, in which the deconfliction problem is formulated as a multiagent Markov decision process (MAMDP) to enable coordinated decision-making. Then, a mission preference-constrained MARL method is developed by integrating two specialized mechanisms into the multiagent deep deterministic policy gradient (MADDPG) algorithm to address UAM operational characteristics. Specifically, an action masking for mission preference (AMMP) mechanism is implemented to ensure execution compliance, and a hierarchical prioritized experience replay (HPER) mechanism is designed to improve learning efficiency. Simulation results demonstrate that the proposed AMMP-HPER-MADDPG (AH-MADDPG) method achieves an average conflict resolution rate exceeding 96% and a preference awareness rate of 100% in scenarios involving 100 flight plans, significantly outperforming other methods. The proposed approach provides an effective and adaptive solution for ensuring operational safety, mission preference, and flight efficiency in future UAM operations.
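The action-masking idea named in the abstract can be illustrated with a minimal sketch. This is not the authors' AMMP implementation: it assumes a discrete set of candidate actions (hypothetical departure delays in minutes), a hypothetical per-action score from a policy, and a hypothetical mission preference expressed as a maximum acceptable delay. Actions that violate the preference are masked out before the best one is selected, so the chosen action always complies.

```python
import numpy as np


def masked_greedy_action(scores, allowed):
    """Return the index of the best-scoring action among the allowed ones.

    scores  : per-action scores (hypothetical policy output)
    allowed : boolean mask, True where the action satisfies the preference
    """
    scores = np.asarray(scores, dtype=float)
    allowed = np.asarray(allowed, dtype=bool)
    if not allowed.any():
        raise ValueError("mission preference leaves no admissible action")
    # Disallowed actions get -inf so argmax can never select them.
    masked = np.where(allowed, scores, -np.inf)
    return int(np.argmax(masked))


# Hypothetical example: five candidate departure delays (minutes).
delays = np.array([0, 5, 10, 15, 20])
scores = np.array([0.1, 0.4, 0.2, 0.9, 0.3])

# Assumed mission preference: the operator accepts at most 10 minutes of delay.
max_delay = 10
allowed = delays <= max_delay

choice = masked_greedy_action(scores, allowed)
print(delays[choice])
```

Without the mask, the 15-minute delay would be chosen because it has the highest score; with the mask, the selection falls back to the best admissible option.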
On-line Access: 2026-06-22
Received: 2026-02-05
Revision Accepted: 2026-06-16