concepts.benchmark.clevr.dataset.CLEVRDatasetUnwrapped
- class CLEVRDatasetUnwrapped
Bases: FilterableDatasetUnwrapped
The unwrapped CLEVR dataset.
Methods
get_metainfo(index)
- __add__(other)
- __getitem__(index)
Get a sample from the dataset.
- Returns:
a dict of annotations, including:
scene: the scene annotations (raw dict).
objects: the bounding boxes of the objects (a Tensor of shape [N, 4]).
image_index: the index of the image (int).
image_filename: the filename of the image (str).
image: the image (a Tensor of shape [3, H, W]).
question_index: the index of the question (int).
question_raw: the raw question (str).
question_raw_tokenized: the tokenized raw question (list of str).
question: the tokenized question, mapped to integer ids (a Tensor of shape [T]).
question_type: the type of the question (str).
answer: the answer to the question (bool, int, or str).
attribute_{attr_name}: the attribute concept id for each object (a Tensor of shape [N]).
attribute_relation_{attr_name}: the attribute relation concept id for each pair of objects (a Tensor of shape [N, N], flattened to [N * N]).
relation_{attr_name}: the relational concept id for each pair of objects (a Tensor of shape [N, N, NR], flattened to [N * N * NR]).
- Return type:
dict
- Parameters:
index (int)
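The pairwise fields above are described as [N, N] or [N, N, NR] tensors flattened to one dimension. The sketch below shows how a flat position could be mapped back to an object pair. It assumes row-major (C-order) flattening, which the documentation does not state explicitly, and the helper names are hypothetical and not part of the library.

```python
# Index arithmetic for flattened pairwise annotations.
# ASSUMPTION: the flattening is row-major (C order); verify against the
# dataset before relying on it. All helper names here are hypothetical.

def flat_pair_index(i, j, n):
    """Position of the pair (i, j) in an [N, N] tensor flattened to [N * N]."""
    return i * n + j


def unflatten_pair_index(k, n):
    """Inverse of flat_pair_index: recover (i, j) from a flat position k."""
    return divmod(k, n)


def flat_relation_index(i, j, r, n, nr):
    """Position of (i, j, r) in an [N, N, NR] tensor flattened to [N * N * NR]."""
    return (i * n + j) * nr + r


# Small check on a 3 x 3 grid stored as nested lists.
n = 3
grid = [[10 * i + j for j in range(n)] for i in range(n)]
flat = [value for row in grid for value in row]  # row-major flattening
pair_value = flat[flat_pair_index(1, 2, n)]      # same entry as grid[1][2]
```

With this layout, entry `k` of an `attribute_relation_{attr_name}` field would describe the object pair returned by `unflatten_pair_index(k, N)`.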
- __init__(scenes_json, questions_json, image_root, image_transform, vocab_json, output_vocab_json, question_transform=None, incl_scene=True, incl_raw_scene=False)
Initialize the CLEVR dataset.
- Parameters:
scenes_json (str) – the path to the scenes json file.
questions_json (str) – the path to the questions json file.
image_root (str) – the root directory of the images.
image_transform (Callable) – the image transform (torchvision transform).
vocab_json (str | None) – the path to the vocab json file. If None, the vocab will be built from the dataset.
output_vocab_json (str | None) – the path to the output vocab json file. If None, the output vocab will be built from the dataset.
question_transform (Callable | None) – the question transform (a callable). If None, no transform will be applied.
incl_scene (bool) – whether to include the scene annotations (e.g., objects, relationships, etc.).
incl_raw_scene (bool) – whether to include the raw scene annotations.
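As an illustration of the parameters above, the following sketch assembles the keyword arguments for the constructor. The directory layout, file names, and both helper functions are assumptions made for this example. The import path is taken from the page title, and the exact call signature expected of `image_transform` is not documented here, so the caller supplies it.

```python
from pathlib import Path


def clevr_dataset_kwargs(data_root, image_transform, split="train"):
    """Build keyword arguments matching the documented __init__ signature.

    ASSUMPTION: the paths below follow a hypothetical directory layout;
    adjust them to wherever the scenes, questions, and images are stored.
    """
    root = Path(data_root)
    return dict(
        scenes_json=str(root / "scenes" / f"CLEVR_{split}_scenes.json"),
        questions_json=str(root / "questions" / f"CLEVR_{split}_questions.json"),
        image_root=str(root / "images" / split),
        image_transform=image_transform,  # caller-supplied callable
        vocab_json=None,          # None: build the vocab from the dataset
        output_vocab_json=None,   # None: build the output vocab from the dataset
        question_transform=None,  # None: apply no question transform
        incl_scene=True,
        incl_raw_scene=False,
    )


def build_clevr_dataset(data_root, image_transform, split="train"):
    """Instantiate the dataset; needs the `concepts` package and the data files."""
    # Imported lazily so the helper above works without the package installed.
    from concepts.benchmark.clevr.dataset import CLEVRDatasetUnwrapped

    return CLEVRDatasetUnwrapped(**clevr_dataset_kwargs(data_root, image_transform, split))
```

Keeping the argument assembly separate from the instantiation makes it possible to inspect or log the resolved paths before any files are read.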
- __iter__()
- __new__(**kwargs)
- get_metainfo(index)