The annotation files are located at:
dataset/annotation_files/jdocqa_train_all.json
dataset/annotation_files/jdocqa_validation_all.json
dataset/annotation_files/jdocqa_test_all.json
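A minimal sketch for reading one of these files is shown here. The on-disk layout is not described in this document, so the loader assumes either a single JSON array of entries or one JSON object per line; `ANNOTATION_DIR`, `SPLIT_FILES`, and `load_annotations` are hypothetical names introduced for illustration only.

```python
import json
from pathlib import Path
from typing import Any, Dict, List

# Paths taken from the list above; the split keys are illustrative names.
ANNOTATION_DIR = Path("dataset/annotation_files")
SPLIT_FILES = {
    "train": "jdocqa_train_all.json",
    "validation": "jdocqa_validation_all.json",
    "test": "jdocqa_test_all.json",
}


def load_annotations(path) -> List[Dict[str, Any]]:
    """Load annotation entries from a JSON array file or a JSON Lines file.

    Assumption: the file layout is one of these two; adjust if it differs.
    """
    text = Path(path).read_text(encoding="utf-8")
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Fall back to JSON Lines: one JSON object per non-empty line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    if isinstance(data, dict):
        # A single object is wrapped so the return type stays a list.
        return [data]
    return list(data)
```

Under these assumptions, the training split could be read with `load_annotations(ANNOTATION_DIR / SPLIT_FILES["train"])`.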
Annotation entries:
'pdf_category': Document category.
'pdf_name': PDF name.
'question_page_number': Page number where annotators found the answer to the question.
'answer_type': (1) Yes/No questions, (2) Factoid questions, (3) Numerical questions, (4) Open-ended questions.
'type_of_image': (1) Table, (2) Bar chart, (3) Line chart, (4) Pie chart, (5) Map, (6) Other figures, (7) Mixed writing directions (left-to-right and top-to-bottom), (8) Drawings, (9) Others. Note that this entry is used for statistical purposes in our paper, and some labels are missing.
'original_context': Text extracted from the PDF.
'context': 'original_context' with noise removed.
'original_question': Annotated questions.
'question': Question query for models.
'original_answer': Annotated answers.
'no_reason': Unanswerable question -> 0, Answerable question -> 1, Multi-page question -> 2. Flags can be combined, e.g. `1,2`.
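The coded fields above can be decoded with a small helper. This is a sketch under assumptions: the stored type of `answer_type` and `no_reason` is not specified here, so the parser accepts either a number or a comma-joined string such as `1,2`. The names `parse_flags` and `describe_entry` are hypothetical.

```python
from typing import Any, Dict, Set

# Label tables copied from the field descriptions above.
ANSWER_TYPES = {1: "Yes/No", 2: "Factoid", 3: "Numerical", 4: "Open-ended"}
NO_REASON_FLAGS = {0: "unanswerable", 1: "answerable", 2: "multi-page"}


def parse_flags(value: Any) -> Set[int]:
    """Parse a flag value such as 1 or "1,2" into a set of ints.

    Assumption: values are numbers or comma-separated strings of integers.
    """
    if value is None:
        return set()
    if isinstance(value, (int, float)):
        return {int(value)}
    return {int(part) for part in str(value).split(",") if part.strip()}


def describe_entry(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Turn the coded fields of one annotation entry into readable labels."""
    flags = parse_flags(entry.get("no_reason"))
    answer_types = parse_flags(entry.get("answer_type"))
    return {
        "answerable": 1 in flags,
        "multi_page": 2 in flags,
        "no_reason_labels": [NO_REASON_FLAGS[f] for f in sorted(flags) if f in NO_REASON_FLAGS],
        "answer_type_labels": [ANSWER_TYPES[t] for t in sorted(answer_types) if t in ANSWER_TYPES],
    }
```

For instance, an entry with `no_reason` set to `1,2` is reported as both answerable and multi-page, which makes it straightforward to filter out unanswerable questions before evaluation.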
We renamed several categories in the paper for interpretability. The `Document` category in the PDF set is called `Report` in the paper, and the `Kouhou` category in the PDF set is called `Pamphlet` in the paper.
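When comparing results against the paper, the renaming can be applied with a small lookup. This sketch assumes that 'pdf_category' stores the PDF-set names exactly as written here (`Document`, `Kouhou`); `PAPER_CATEGORY_NAMES` and `paper_category` are hypothetical names.

```python
from typing import Any, Dict, Optional

# PDF-set category name -> name used in the paper.
# "Pamphlet" is the spelling assumed here for the paper's category.
PAPER_CATEGORY_NAMES = {
    "Document": "Report",
    "Kouhou": "Pamphlet",
}


def paper_category(entry: Dict[str, Any]) -> Optional[str]:
    """Return the paper's category name for an annotation entry.

    Categories that were not renamed are returned unchanged.
    """
    name = entry.get("pdf_category")
    return PAPER_CATEGORY_NAMES.get(name, name)
```

Categories outside the two renamed ones pass through untouched, so the helper can be applied to every entry.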
Please download from here.