Query & annotation representation

Name

Each feature file is named [File ID].pkl, where [File ID] is the corresponding file ID in Flickr30K Entities.
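As a minimal sketch, a feature file could be read with Python's pickle module as shown below. The helper name load_feature_file and its feature_dir argument are hypothetical. The encoding="latin1" argument is an assumption that only matters if the files were pickled under Python 2.

```python
import os
import pickle


def load_feature_file(feature_dir, file_id):
    """Load the dictionary stored in <feature_dir>/<file_id>.pkl.

    Hypothetical helper: feature_dir is wherever the .pkl files were saved.
    """
    path = os.path.join(feature_dir, "%s.pkl" % file_id)
    with open(path, "rb") as f:
        # encoding="latin1" lets Python 3 read pickles written by Python 2
        # (an assumption about how the files were produced).
        return pickle.load(f, encoding="latin1")
```

The returned object is the per-image dictionary described under "Feature organization".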

Feature organization

Each file is a dictionary containing query features and annotations. The name and content of each field (dictionary key) are listed below:
height: Height of each image.
width: Width of each image.
sen_ids: It is an N-dimensional vector, where N is the number of queries for the image. The i-th element is the sentence ID of the i-th query. (In Flickr30K Entities, each image has ~5 sentences, and each sentence has several queries.)
gt_pos_all: It is a list of length N. The i-th element of this list is also a list, recording the IDs of the i-th query's positive proposals among the 100 proposals generated by Selective Search or Edge Box. A proposal is positive if its Intersection over Union (IoU) with the ground truth bounding box of the i-th query is larger than 0.5.
pos_id: It is an N-dimensional vector. The i-th element is the ID of the proposal that overlaps most with the ground truth bounding box of the i-th query. If the IoU of that proposal is less than 0.5, the proposal ID is set to -1.
ss_box: It is a 100 x 4 matrix. Each row holds the coordinates of one proposal generated by Selective Search (for the Referit Game dataset, the proposals are generated by Edge Box). Each row is in the format [xmin, ymin, xmax, ymax].
sen_lang_token: It is a list of length N. The i-th element of this list is also a list, which represents the word ID sequence of the i-th query.
bbx_reg: It is an N x 4 matrix. The i-th row is the regression feature for the proposal whose ID is pos_id[i]. If there is no candidate proposal (pos_id[i] = -1), the i-th regression feature is set to an all-zero vector. The regression features are calculated as in Equation 2 of the Faster R-CNN paper.
bbx_reg_all: It is a list of length N. The i-th element of the list is a len(gt_pos_all[i]) x 4 matrix. The j-th row of this matrix represents the regression feature of the proposal whose ID is gt_pos_all[i][j].
gt_box: It is an N x 4 matrix. The i-th row represents the ground truth bounding box annotation for the i-th query. The annotation is in the form of [xmin, ymin, xmax, ymax].
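
The sketch below illustrates, under stated assumptions, how gt_pos_all, pos_id, and bbx_reg relate to ss_box and gt_box for a single query. It is not the authors' code. The helper names iou, match_proposals, and regression_target are hypothetical, box widths are taken as xmax - xmin (no +1 pixel convention), the proposal list is assumed to be non-empty, and the regression target follows Equation 2 of the Faster R-CNN paper with the proposal playing the role of the anchor.

```python
import math


def iou(box_a, box_b):
    """Intersection over Union of two [xmin, ymin, xmax, ymax] boxes."""
    iw = min(box_a[2], box_b[2]) - max(box_a[0], box_b[0])
    ih = min(box_a[3], box_b[3]) - max(box_a[1], box_b[1])
    if iw <= 0 or ih <= 0:
        return 0.0  # boxes do not overlap
    inter = iw * ih
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)


def match_proposals(proposals, gt, thresh=0.5):
    """Return (gt_pos_all entry, pos_id entry) for one query."""
    scores = [iou(p, gt) for p in proposals]
    # Positive proposals: IoU strictly larger than the threshold.
    positives = [k for k, s in enumerate(scores) if s > thresh]
    # Best proposal, replaced by -1 when its IoU is below the threshold.
    best = max(range(len(scores)), key=lambda k: scores[k])
    pos_id = best if scores[best] >= thresh else -1
    return positives, pos_id


def regression_target(proposal, gt):
    """Offsets (tx, ty, tw, th) from a proposal to its ground truth box."""
    pw, ph = proposal[2] - proposal[0], proposal[3] - proposal[1]
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]
    pcx, pcy = proposal[0] + 0.5 * pw, proposal[1] + 0.5 * ph
    gcx, gcy = gt[0] + 0.5 * gw, gt[1] + 0.5 * gh
    return [(gcx - pcx) / pw, (gcy - pcy) / ph,
            math.log(gw / pw), math.log(gh / ph)]


# Toy example with three proposals instead of 100.
proposals = [[0, 0, 10, 10], [0, 0, 8, 10], [20, 20, 30, 30]]
gt = [0, 0, 10, 10]
positives, pos_id = match_proposals(proposals, gt)
# All-zero vector when there is no candidate proposal.
reg = regression_target(proposals[pos_id], gt) if pos_id >= 0 else [0.0] * 4
```

In this toy case the first two proposals have IoU 1.0 and 0.8 with the ground truth box, so both are positive and the first one is selected as pos_id.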
