This repository contains the figures and annotations for LineCap, a dataset of line charts scraped from scientific figures each accompanied with a crowd-sourced caption describing the trends of individual lines in the figure. For more information, please refer to our IEEE VIS 2022 paper, LineCap: Line Charts for Data Visualization Captioning Models.
The collected annotations are saved in csv
files with the following columns:
image
: name of corresponding image filenum_lines
: number of line plots in the figure (1 to 5)trend_1
: description of line 1's trendtrend_2
: description of line 2's trendtrend_3
: description of line 3's trendtrend_4
: description of line 4's trendtrend_5
: description of line 5's trendtrend_overall
: description of the overall figure trend
<data_split>_no_values.csv
does not include any references to x and y values read from the figure. It replaces any references to these values in the trend descriptions with the token _value_
.
LineCap uses special tokens to refer to axis and line labels:
xlabel
: this token is used to refer to the x-axis label.ylabel
: this token is used to refer to the y-axis label.Line n
: lines are referred to by their number (1 to 5) instead of their labels.
The green text in the figure below shows how special tokens substitute references to line and axis labels in this dataset.
- The lines are numbered from top to bottom based on the order of their label in the figure legend.
- In case of ties (i.e. when two line labels appear to be at the same height), lines are numbered from left to right then top to bottom. For example, in the figure below, the labels of the cubic and linear lines are at the same height in the legend. The cubic line is further to the left, therefore it is line 1 while the linear line is line 2.
- If the figure lines are labeled directly in the chart instead of a legend, we use the labels within the chart to number the lines from top to bottom and in case of ties from left to right. In the figure below, for example, the lines are directly labeled with arrows in the chart. The
Optimal code
line's label is at the top of the chart, therefore it is numbered as line 1. TheExtrapolated performance of the decoder
line has the second highest label in the chart, therefore it is line 2 while theOuter bound
line is line 3 and theUncoded transmission
line is line 4.
- If the figure lines are labeled both directly in the chart and in a legend, the legend is used for ordering the lines. For instance, in the figure below, the lines are labeled both in the chart with arrows and in the legend. The line orders in the chart and the legend do not match. In the chart, the
approximated-MDP
line has the second highest label but in the legend, theLyapunov
line has the second highest label. The lines should be numbered based on the legend in these cases. As such, theLyapunov
line is line 2 while theapproximated-MDP
line is line 3.
LineCap is for non-commercial use only and is released under CC BY-NC-SA 4.0. You agree to the terms of this license by using this dataset.
The figure images in LineCap are from SciCap which is a dataset released under CC BY-NC-SA 4.0.