Annotation files

The annotation files for each split are located here:

dataset/annotation_files/jdocqa_train_all.json
dataset/annotation_files/jdocqa_validation_all.json
dataset/annotation_files/jdocqa_test_all.json
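As a minimal sketch (not part of the official release), the files can be read with Python's standard `json` module. The on-disk layout is not documented in this README, so the hypothetical `load_annotations` helper below assumes either a single JSON array of entries or JSON Lines (one entry per line) and accepts both.

```python
import json
from pathlib import Path


def load_annotations(path):
    """Load one JDocQA annotation file into a list of entry dicts.

    Assumption: the file is either a JSON array of entry objects or
    JSON Lines (one object per line); both layouts are handled.
    """
    text = Path(path).read_text(encoding="utf-8")
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Not a single JSON document: fall back to JSON Lines.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    # Wrap a lone object so callers always receive a list.
    return data if isinstance(data, list) else [data]


if __name__ == "__main__":
    splits = {
        "train": "dataset/annotation_files/jdocqa_train_all.json",
        "validation": "dataset/annotation_files/jdocqa_validation_all.json",
        "test": "dataset/annotation_files/jdocqa_test_all.json",
    }
    for name, path in splits.items():
        if Path(path).exists():
            print(name, len(load_annotations(path)))
```

Trying a whole-file parse first and falling back to line-by-line parsing keeps the helper usable whichever of the two layouts the files turn out to use.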

Each annotation entry contains the following fields:

'pdf_category': Document category.
'pdf_name': PDF file name.
'question_page_number': The page number on which the annotators found the answer to the question.
'answer_type': (1): Yes/No questions, (2): Factoid questions, (3): Numerical questions, (4): Open-ended questions.
'type_of_image': (1): Table, (2): Bar chart, (3): Line chart, (4): Pie chart, (5): Map, (6): Other figures, (7): Mixed writing styles (horizontal left-to-right and vertical top-to-bottom text), (8): Drawings, (9): Others. Note that this entry was collected for the statistics reported in our paper, and some labels are missing.
'original_context': Text extracted from the PDF.
'context': 'original_context' with noise removed.
'original_question': Question as written by the annotators.
'question': Question query used as input to the models.
'original_answer': Answer as written by the annotators.
'no_reason': Unanswerable question -> 0, Answerable question -> 1, Multi-page question -> 2. Several flags can be set at once, e.g. `1,2`.
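The coded fields above can be turned into readable labels with a small lookup. This is a hypothetical sketch: the helper names are illustrative, and it assumes that `answer_type`, `type_of_image`, and `no_reason` are stored either as integers or as comma-separated strings such as `1,2`, with a missing or empty value meaning no label (some `type_of_image` labels are missing).

```python
# Label tables transcribed from the field list above.
ANSWER_TYPES = {1: "Yes/No", 2: "Factoid", 3: "Numerical", 4: "Open-ended"}

IMAGE_TYPES = {
    1: "Table",
    2: "Bar chart",
    3: "Line chart",
    4: "Pie chart",
    5: "Map",
    6: "Other figures",
    7: "Mixed writing styles",
    8: "Drawings",
    9: "Others",
}

# no_reason flags; several may be set at once, e.g. "1,2".
NO_REASON_FLAGS = {0: "unanswerable", 1: "answerable", 2: "multi-page"}


def parse_flags(value):
    """Turn 1, "1", or "1,2" into a set of ints.

    Assumption (hypothetical encoding): codes are ints or comma-separated
    strings; None or "" (a missing label) gives an empty set.
    """
    if value is None:
        return set()
    if isinstance(value, int):
        return {value}
    return {int(part) for part in str(value).split(",") if part.strip()}


def to_labels(table, value):
    """Map every code in `value` to its label, keeping unknown codes visible."""
    return [table.get(code, f"unknown ({code})") for code in sorted(parse_flags(value))]


def describe(entry):
    """Return readable labels for the coded fields of one annotation entry."""
    return {
        "answer_type": to_labels(ANSWER_TYPES, entry.get("answer_type")),
        "type_of_image": to_labels(IMAGE_TYPES, entry.get("type_of_image")),
        "no_reason": to_labels(NO_REASON_FLAGS, entry.get("no_reason")),
    }
```

For example, `describe({"answer_type": 2, "no_reason": "1,2"})` gives `Factoid` for the answer type, an empty list for the missing image type, and both `answerable` and `multi-page` for `no_reason`. Unknown codes are reported rather than raising, so an unexpected value shows up in the output instead of stopping a pass over the data.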

pdf_category

For interpretability, we renamed several categories in the paper.

The Document category in the PDF set is called Report in the paper.
The Kouhou category in the PDF set is called Pamphlet in the paper.
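When reporting statistics with the paper's terminology, the two renamings can be applied with a small lookup. This sketch assumes that `pdf_category` holds the PDF-set names verbatim (e.g. `Document`, `Kouhou`) and that every other category name is the same in the PDF set and in the paper; `paper_category` and `count_by_paper_category` are illustrative names, not part of the release.

```python
from collections import Counter

# Renamings listed above; any other category name is assumed
# to be identical in the PDF set and in the paper.
PDF_TO_PAPER_CATEGORY = {
    "Document": "Report",
    "Kouhou": "Pamphlet",
}


def paper_category(pdf_category):
    """Map a 'pdf_category' value from the annotation files to the paper's name."""
    return PDF_TO_PAPER_CATEGORY.get(pdf_category, pdf_category)


def count_by_paper_category(entries):
    """Count annotation entries per category, using the paper's names."""
    return Counter(paper_category(entry.get("pdf_category")) for entry in entries)
```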

Annotated documents

Please download from here.
