# Publications

### Learning Domain-Independent Planning Heuristics with Hypergraph Networks

We present the first approach capable of learning domain-independent planning heuristics entirely from scratch. The heuristics we learn map the hypergraph representation of the delete-relaxation of the planning problem at hand, to a cost estimate that approximates that of the least-cost path from the current state to the goal through the hypergraph. We generalise Graph Networks to obtain a new framework for learning over hypergraphs, which we specialise to learn planning heuristics by training over state/value pairs obtained from optimal cost plans. Our experiments show that the resulting architecture, STRIPS-HGNs, is capable of learning heuristics that are competitive with existing delete-relaxation heuristics including LM-cut. We show that the heuristics we learn are able to generalise across different problems and domains, including to domains that were not seen during training.

### Learning Delete-Relaxation Heuristics over Hypergraphs

This paper is subsumed by Learning Domain-Independent Planning Heuristics with Hypergraph Networks in the International Conference on Automated Planning and Scheduling (ICAPS), 2020.

### Learning Heuristics for Planning with Hypergraph Networks

Planning is the fundamental ability of an intelligent agent to reason about what decisions it should make in a given environment to achieve a certain set of goals. State-of-the-art planners use forward-chaining state space search guided by a heuristic. This method of search incrementally builds the search tree from the initial state until a goal is reached, whilst the heuristic is used to guide the search algorithm to what the heuristic identifies as promising parts of the search space.

Deep Learning harnesses the power of neural networks to automatically learn powerful features and knowledge from experience. Although deep learning has gained immense popularity in fields including computer vision and natural language processing, there is still no consensus as to what deep learning architecture is best for reasoning and decision making tasks such as planning. Recent work in deep learning for planning is primarily concerned with learning which planner to apply for a given problem (planner selection), or learning generalised policies which are functions that select which action should be applied by an agent in a given state.

In contrast, this thesis focuses on how we may harness the power of deep learning to learn heuristic functions which exploit the structure of planning problems. We investigate how we may learn heuristics which generalise to problems of larger size than the problems a neural network was trained on within a single domain (i.e., a domain-dependent heuristic). Additionally, we explore the feasibility of learning domain-independent heuristics which are able to generalise across domains.

Our work makes three key contributions. Our first contribution is Hypergraph Networks (HGNs), our novel framework which generalises Graph Networks [Battaglia et al., 2018] to hypergraphs. A hypergraph is a generalisation of a normal graph in which a hyperedge may connect any number of vertices together. The HGN framework may be used to design new hypergraph deep learning models, and inherently supports combinatorial generalisation to hypergraphs with different numbers of vertices and hyperedges. Our second contribution is STRIPS-HGNs, an instance of a Hypergraph Network which is designed to learn heuristics by approximating shortest paths over the hypergraph induced by the delete relaxation of a STRIPS problem. STRIPS-HGNs use a powerful recurrent encode-process-decode architecture which allow them to incrementally propagate messages within the hypergraph in latent space. Our third and final contribution is our detailed empirical evaluation, which rigorously defines the Hypergraph Network configurations and training procedure we use in our experiments. We train and evaluate our STRIPS-HGNs on a variety of domains and show that they are able learn domain-dependent and domain-independent heuristics which potentially outperform $h^{max}$, $h^{add}$ and LM-cut.

### Guiding Search with Generalized Policies for Probabilistic Planning

We examine techniques for combining generalized policies with search algorithms to exploit the strengths and overcome the weaknesses of each when solving probabilistic planning problems. The Action Schema Network (ASNet) is a recent contribution to planning that uses deep learning and neural networks to learn generalized policies for probabilistic planning problems. ASNets are well suited to problems where local knowledge of the environment can be exploited to improve performance, but may fail to generalize to problems they were not trained on. Monte-Carlo Tree Search (MCTS) is a forward-chaining state space search algorithm for optimal decision making which performs simulations to incrementally build a search tree and estimate the values of each state. Although MCTS can achieve state-of-the-art results when paired with domain-specific knowledge, without this knowledge, MCTS requires a large number of simulations in order to obtain reliable state-value estimates. By combining ASNets with MCTS, we are able to improve the capability of an ASNet to generalize beyond the distribution of problems it was trained on, as well as enhance the navigation of the search space by MCTS.

### Guiding MCTS with Generalized Policies for Probabilistic Planning

We examine techniques for combining generalized policies with search algorithms to exploit the strengths and overcome the weaknesses of each when solving probabilistic planning problems. The Action Schema Network (ASNet) is a recent contribution to planning that uses deep learning and neural networks to learn generalized policies for probabilistic planning problems. ASNets are well suited to problems where local knowledge of the environment can be exploited to improve performance, but may fail to generalize to problems they were not trained on. Monte-Carlo Tree Search (MCTS) is a forward-chaining state space search algorithm for optimal decision making which performs simulations to incrementally build a search tree and estimate the values of each state. Although MCTS can achieve state-of-the-art results when paired with domain-specific knowledge, without this knowledge, MCTS requires a large number of simulations in order to obtain reliable estimates in the search tree. By combining ASNets with MCTS, we are able to improve the capability of an ASNet to generalize beyond the distribution of problems it was trained on, as well as enhance the navigation of the search space by MCTS.

### Action Schema Networks with Monte-Carlo Tree Search: The Best of Both Worlds

Planning is the essential ability of an intelligent agent to solve the problem of choosing which action to take in an environment to achieve a certain goal. Planning is an extremely important field within Artificial Intelligence that has countless real-world applications, including robotics and operations research.

Monte-Carlo Tree Search (MCTS) is a state-space search algorithm for optimal decision making that relies on performing Monte-Carlo simulations to incrementally build a search tree, and estimate the values of each state. MCTS can often achieve state-of-the-art performance when combined with domain-specific knowledge. However, without this knowledge, MCTS requires a large number of simulations in order to obtain reliable estimates in the search tree.

The Action Schema Network (ASNets) [Toyer et al., 2018] is a very recent contribution in planning that uses deep learning and neural networks to learn generalized policies for planning problems. ASNets are well suited to problems where the “local knowledge of the environment can help to avoid certain traps”. However, like most machine learning algorithms, an ASNet may fail to generalize to problems that it was not trained on. For example, this could be due to a poor choice of hyperparameters that lead to an undertrained or overtrained network.

This research project is concerned with investigating how we can improve upon the policy learned by an ASNet by combining it with MCTS. Our project has three key contributions. The first contribution is an ingredient-based framework for MCTS that allows us to specify different flavors of MCTS – including those which use the policy learned by an ASNet. Our second contribution is two new methods which allow us to use ASNets to perform simulations in MCTS, and hence directly affect the estimated values of states in the search tree. Our third and final contribution is two new methods for using ASNets in the selection phase of MCTS. This allows us to bias the navigation of the search space towards what an ASNet believes is promising.

Our empirical evaluation demonstrates that by combining MCTS with ASNets, we are able to ‘learn’ what the network did not learn during training, improve suboptimal learning, and be robust to changes in the environment or the domain. We can more reliably and effectively solve planning problems when combining MCTS with ASNets, and hence we achieve the best of both worlds.