Learning Heuristics for Planning with Hypergraph Networks


Planning is the fundamental ability of an intelligent agent to reason about what decisions it should make in a given environment to achieve a certain set of goals. State-of-the-art planners use forward-chaining state space search guided by a heuristic. This method of search incrementally builds the search tree from the initial state until a goal is reached, whilst the heuristic is used to guide the search algorithm to what the heuristic identifies as promising parts of the search space.

Deep Learning harnesses the power of neural networks to automatically learn powerful features and knowledge from experience. Although deep learning has gained immense popularity in fields including computer vision and natural language processing, there is still no consensus as to what deep learning architecture is best for reasoning and decision making tasks such as planning. Recent work in deep learning for planning is primarily concerned with learning which planner to apply for a given problem (planner selection), or learning generalised policies which are functions that select which action should be applied by an agent in a given state.

In contrast, this thesis focuses on how we may harness the power of deep learning to learn heuristic functions which exploit the structure of planning problems. We investigate how we may learn heuristics which generalise to problems of larger size than the problems a neural network was trained on within a single domain (i.e., a domain-dependent heuristic). Additionally, we explore the feasibility of learning domain-independent heuristics which are able to generalise across domains.

Our work makes three key contributions. Our first contribution is Hypergraph Networks (HGNs), our novel framework which generalises Graph Networks [Battaglia et al., 2018] to hypergraphs. A hypergraph is a generalisation of a normal graph in which a hyperedge may connect any number of vertices together. The HGN framework may be used to design new hypergraph deep learning models, and inherently supports combinatorial generalisation to hypergraphs with different numbers of vertices and hyperedges. Our second contribution is STRIPS-HGNs, an instance of a Hypergraph Network which is designed to learn heuristics by approximating shortest paths over the hypergraph induced by the delete relaxation of a STRIPS problem. STRIPS-HGNs use a powerful recurrent encode-process-decode architecture which allow them to incrementally propagate messages within the hypergraph in latent space. Our third and final contribution is our detailed empirical evaluation, which rigorously defines the Hypergraph Network configurations and training procedure we use in our experiments. We train and evaluate our STRIPS-HGNs on a variety of domains and show that they are able learn domain-dependent and domain-independent heuristics which potentially outperform $h^{max}$, $h^{add}$ and LM-cut.

Honours Thesis