
Defining and Detecting Inconsistent System Behavior in Task-Oriented Dialogues

Publication at Faculty of Mathematics and Physics | 2021

Abstract

We present experiments on automatically detecting inconsistent behavior of task-oriented dialogue systems from the dialogue context. We enrich the bAbI/DSTC2 data (Bordes et al., 2017) with automatic annotations of dialogue inconsistencies and demonstrate that inconsistencies correlate with failed dialogues.
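
To illustrate the kind of correlation check described above, here is a minimal hypothetical sketch (not the authors' code) that tests per-dialogue inconsistency counts against a binary dialogue-success outcome with a point-biserial correlation; the record structure and the toy values are invented placeholders.

```python
# Hypothetical sketch: correlating annotated inconsistency counts with
# dialogue outcome. Field names and values are assumptions, not the
# published annotation schema.
from scipy.stats import pointbiserialr

# Per-dialogue records: number of annotated inconsistencies and whether
# the dialogue reached its goal (1) or failed (0).
dialogues = [
    {"inconsistencies": 0, "success": 1},
    {"inconsistencies": 2, "success": 0},
    {"inconsistencies": 1, "success": 0},
    {"inconsistencies": 0, "success": 1},
]
counts = [d["inconsistencies"] for d in dialogues]
outcomes = [d["success"] for d in dialogues]

# A negative r would indicate that more inconsistencies co-occur with
# failed dialogues, matching the trend reported in the abstract.
r, p = pointbiserialr(outcomes, counts)
print(f"point-biserial r = {r:.2f}, p = {p:.3f}")
```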

We hypothesize that using a limited dialogue history and predicting the next user turn can improve inconsistency classification. While both hypotheses are confirmed for a memory-network-based dialogue model, neither holds for a classifier based on the GPT-2 language model, which benefits most from the full dialogue history and achieves an accuracy of 0.99.
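
For concreteness, the following is a minimal sketch of how a GPT-2-based inconsistency classifier over a full dialogue history could be set up with Hugging Face Transformers. It is not the authors' implementation: the `gpt2` checkpoint, the label semantics (0 = consistent, 1 = inconsistent), and the example context are all assumptions, and the classification head here is untrained, so real use would require fine-tuning on the annotated data.

```python
# Hypothetical sketch (not the paper's code): binary inconsistency
# classification of a full dialogue history with GPT-2.
import torch
from transformers import GPT2Tokenizer, GPT2ForSequenceClassification

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

model = GPT2ForSequenceClassification.from_pretrained("gpt2", num_labels=2)
model.config.pad_token_id = tokenizer.pad_token_id

# Full dialogue history concatenated into one context string
# (an invented example in the style of bAbI/DSTC2 restaurant dialogues).
context = (
    "user: i want a cheap italian restaurant "
    "system: what part of town? "
    "user: the north "
    "system: sorry, there is no cheap italian restaurant in the north"
)

inputs = tokenizer(context, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits  # untrained head; fine-tune before use
pred = logits.argmax(dim=-1).item()  # assumed: 0 = consistent, 1 = inconsistent
```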