ReaSCAN: Compositional Reasoning in Language Grounding
ReaSCAN is a synthetic navigation task that requires models to reason about their surroundings in order to interpret syntactically complex commands.
Release Notes
- 11/28/2021: We release a newer version of the non-generalization test sets for different command patterns as ReaSCAN-v1.1.zip.
- 07/29/2021: Our paper is accepted to NeurIPS 2021; reviews are available on OpenReview.
- 06/17/2021: We update model performance results after fixing known issues, and include more compositional splits.
- 06/07/2021: We submit our preprint to NeurIPS 2021.
Getting Started
Step 1: Download ReaSCAN
We generated ReaSCAN using our pipeline with fixed random seeds, so you can reproduce the version of ReaSCAN used in the paper by running the pipeline. We have also uploaded that exact version to an online folder, so you can download it and use it as-is. Note that the dataset files are large; downloading may take a while.
Our generated data is in ReaSCAN-v1.1.zip (note that we hotfixed some existing issues on 06/16/2021 and added newer non-generalization test sets on 11/28/2021), which is saved in a shared drive. The dataset consists of subsets generated for different command patterns (P1: Simple (similar to gSCAN), P2: 1-relative-clause, P3: 2-relative-clauses, P4: 3-relative-clauses) and different compositional splits (see our paper for details about each split).
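After downloading, extract the archive; here is a minimal Python sketch (it assumes ReaSCAN-v1.1.zip sits in your working directory, with each subset listed below as its own subfolder):

```python
import zipfile

# Extract the release archive into the current directory. The archive
# name comes from the shared drive; the internal folder layout is an
# assumption based on the subset names listed below.
with zipfile.ZipFile("ReaSCAN-v1.1.zip") as zf:
    zf.extractall(".")
```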
Random splits that can be used for training your models:
- ReaSCAN-compositional: ReaSCAN all commands, containing train, dev, and test sets.
- ReaSCAN-compositional-p1: ReaSCAN Simple set, containing train, dev, and test sets.
- ReaSCAN-compositional-p2: ReaSCAN 1-relative-clause set, containing train, dev, and test sets.
- ReaSCAN-compositional-p3: ReaSCAN 2-relative-clauses set, containing train, dev, and test sets.
- ReaSCAN-compositional-p1-test: ReaSCAN Simple set, containing test set only. Model performance is reported in the paper.
- ReaSCAN-compositional-p2-test: ReaSCAN 1-relative-clause set, containing test set only. Model performance is reported in the paper.
- ReaSCAN-compositional-p3-test: ReaSCAN 2-relative-clauses set, containing test set only. Model performance is reported in the paper.
- ReaSCAN-compositional-p1-test-updated: UPDATED ReaSCAN Simple set, containing test set only. Model performance is NOT reported in the paper.
- ReaSCAN-compositional-p2-test-updated: UPDATED ReaSCAN 1-relative-clause set, containing test set only. Model performance is NOT reported in the paper.
- ReaSCAN-compositional-p3-test-updated: UPDATED ReaSCAN 2-relative-clauses set, containing test set only. Model performance is NOT reported in the paper.
- ReaSCAN-compositional-p3-rd: ReaSCAN 2-relative-clauses set with random distractors, containing train, dev, and test sets.
Compositional splits that are designed to be zero-shot testing splits:
- ReaSCAN-compositional-a1: ReaSCAN A1 (novel color modifier) compositional split, containing test set only.
- ReaSCAN-compositional-a2: ReaSCAN A2 (novel color attribute) compositional split, containing test set only.
- ReaSCAN-compositional-a3: ReaSCAN A3 (novel size modifier) compositional split, containing test set only.
- ReaSCAN-compositional-b1: ReaSCAN B1 (novel co-occurrence of objects) compositional split, containing test set only.
- ReaSCAN-compositional-b2: ReaSCAN B2 (novel co-occurrence of relations) compositional split, containing test set only.
- ReaSCAN-compositional-c1: ReaSCAN C1 (novel conjunctive clause length) compositional split, containing test set only.
- ReaSCAN-compositional-c2: ReaSCAN C2 (novel relative clauses) compositional split, containing test set only.
You can also generate your own compositional splits by modifying a couple of lines in code/dataset/generate_ReaSCAN_splits.ipynb.
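If you prefer not to touch the generation pipeline, you can also carve out an ad-hoc subset by filtering a released data file directly. A minimal sketch, assuming (as in gSCAN-style data files) that each example stores its instruction as a comma-separated token string under a "command" key; adjust the field names to the actual schema:

```python
import json

# Keep only commands containing a relative clause, which ReaSCAN marks
# with the tokens "that is". The "examples"/"train" layout and the
# "command" field are assumptions; check your file's schema first.
data = json.load(open("data-compositional-splits.txt", "r"))
train = data["examples"]["train"]
subset = [ex for ex in train
          if "that is" in ex["command"].replace(",", " ")]
print(f"Kept {len(subset)} of {len(train)} training examples.")
```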
Step 2: Loading ReaSCAN
Once you have generated the dataset .txt file (in JSON format), you can load any dataset as follows:
```python
import json

path_to_data = "data-compositional-splits.txt"
print(f"Reading dataset from file: {path_to_data}...")
with open(path_to_data, "r") as f:
    data_json = json.load(f)
print(data_json["examples"].keys())
```
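The keys under "examples" are the split names (e.g. train, dev, and test for the random splits; exact names may vary by subset). Continuing from the snippet above, you can peek at the schema of a single example; the "train" key below is an assumption for the random splits:

```python
# Inspect the first training example to see its fields, e.g. the
# command, the target action sequence, and the world state.
train_examples = data_json["examples"]["train"]
print(len(train_examples), "training examples")
print(train_examples[0].keys())
```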
Leaderboard
This section contains the leaderboard of scores obtained by papers on ReaSCAN. To add scores, please open a pull request.
| Split | M-LSTM [1] | GCN-LSTM [2] |
|---|---|---|
| Random | 79.04 ± 1.24 | 98.96 ± 0.59 |
| A1: novel color modifier | 50.36 ± 4.03 | 92.25 ± 0.77 |
| A2: novel color attribute | 14.65 ± 0.55 | 42.05 ± 4.55 |
| A3: novel size modifier | 50.98 ± 3.69 | 87.46 ± 2.22 |
| B1: novel co-occurrence of objects | 52.17 ± 1.63 | 69.74 ± 0.30 |
| B2: novel co-occurrence of relations | 39.41 ± 1.53 | 52.80 ± 2.75 |
| C1: novel conjunctive clause length | 49.68 ± 2.73 | 57.01 ± 7.99 |
| C2: novel relative clauses | 25.74 ± 1.36 | 22.07 ± 2.66 |
| Avg ReaSCAN Score | 40.43 | 60.48 |
[1] Laura Ruis, Jacob Andreas, Marco Baroni, Diane Bouchacourt, and Brenden M. Lake. 2020. "A Benchmark for Systematic Generalization in Grounded Language Understanding." In NeurIPS 2020.
[2] Tong Gao, Qi Huang, and Raymond J. Mooney. 2020. "Systematic Generalization on gSCAN with Language Conditioned Embedding." In AACL-IJCNLP 2020.
Caveats: the random split here is the same one used in our paper; numbers may change with an updated random split.
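For reference, the Avg ReaSCAN Score row is consistent with the unweighted mean over the seven compositional splits A1-C2 (the Random split excluded), which you can verify directly:

```python
# Mean over the seven compositional splits (A1-C2); Random is excluded.
m_lstm = [50.36, 14.65, 50.98, 52.17, 39.41, 49.68, 25.74]
gcn_lstm = [92.25, 42.05, 87.46, 69.74, 52.80, 57.01, 22.07]
print(f"M-LSTM:   {sum(m_lstm) / len(m_lstm):.2f}")      # 40.43
print(f"GCN-LSTM: {sum(gcn_lstm) / len(gcn_lstm):.2f}")  # 60.48
```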
Usage
If you are using ReaSCAN, please consider citing our paper as:
@article{wu-etal-2021-reascan,
title={Rea{SCAN}: Compositional Reasoning in Language Grounding},
author={Wu, Zhengxuan and Kreiss, Elisa and Ong, Desmond C. and Potts, Christopher},
journal={NeurIPS 2021 Datasets and Benchmarks Track},
url={https://openreview.net/forum?id=Rtquf4Jk0jN},
year={2021}}