
Very impressive work! (Feel free to discuss about the paper here!) #1

Closed
Sissuire opened this issue Nov 14, 2022 · 7 comments

@Sissuire

Perceptual quality is always entangled with aesthetic and technical effects, especially for UGC videos. The idea is quite clear and the performance is good!

After reading the work, a few questions came to mind, and I'd like to discuss them with everyone interested in this topic.

  1. Throughout the work, it seems that the disentanglement is responsible for the large improvement. My confusion is why disentanglement should improve performance. Should we simply believe that entangled representations mixing aesthetic and technical features restrict the task, and that disentangled ones work better?

  2. I've seen different network structures adopted in the work (e.g., inflated ConvNeXt, Swin Transformer), whereas the most popular backbone in VQA is probably ResNet-50. So, how do the different networks affect performance? Has anyone conducted a detailed experiment on different network structures?

@teowu
Member

teowu commented Nov 14, 2022

Hi Yongxu, thanks for the thoughtful questions! These are good points to discuss~

  1. For the first question, the disentanglement acts somewhat like masked representation learning: the two decomposed views can be regarded as appropriate masks for this task. Similar strategies have been widely attempted in high-level tasks, and our design shows they are also successful in UGC-VQA (see the sketch at the end of this comment).

  2. For the second question, yes, we will try more backbones! This was also suggested by my co-authors, and we plan to release results for different backbones in this repo. Stay tuned!

For further discussion, you can contact me on WeChat: haoningnanyangtu (also open to everyone interested in this topic).
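
To make the "views as masks" intuition concrete, below is a rough PyTorch sketch of what the two decomposed views could look like, assuming DOVER-style views (a spatially downsampled aesthetic view and a fragment-sampled technical view at native resolution). Function names, grid/patch sizes, and shapes are illustrative assumptions, not the repository's actual API.

```python
# Rough illustration only: view decomposition as two complementary "masks" over
# one clip. Names and sizes here are assumptions, not the repo's API.
import torch
import torch.nn.functional as F

def aesthetic_view(clip: torch.Tensor, size: int = 224) -> torch.Tensor:
    # clip: (T, C, H, W). Spatial downsampling keeps global composition and
    # semantics but masks out most local technical distortions.
    return F.interpolate(clip, size=(size, size), mode="bilinear", align_corners=False)

def technical_view(clip: torch.Tensor, grid: int = 7, patch: int = 32) -> torch.Tensor:
    # Sample a grid of small native-resolution patches ("fragments"): keeps
    # local distortions but masks out / scrambles the global composition.
    T, C, H, W = clip.shape
    rows = torch.linspace(0, H - patch, grid).long()
    cols = torch.linspace(0, W - patch, grid).long()
    out = clip.new_zeros(T, C, grid * patch, grid * patch)
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            out[:, :, i * patch:(i + 1) * patch, j * patch:(j + 1) * patch] = \
                clip[:, :, int(r):int(r) + patch, int(c):int(c) + patch]
    return out

# Example: two views of the same 8-frame clip, each hiding what the other keeps.
clip = torch.rand(8, 3, 540, 960)
a_view, t_view = aesthetic_view(clip), technical_view(clip)
```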

@Sissuire
Author

@teowu Thanks for your kind reply. This is just an open discussion: as far as you know, is there any literature claiming that disentangled representations perform better than entangled ones? (thus we must do representation disentanglement)

@teowu teowu pinned this issue Nov 14, 2022
@teowu
Member

teowu commented Nov 14, 2022

“thus we must do representation disentanglement” Not a must lah.
I think our main goal is to learn aesthetic and technical quality opinions from the overall one, and for improving performance this is just one way among many: adding heavier branches without View Decomposition might also work, but that goes against our wish for more explainable representation learning.

As for whether disentanglement enhances representations, I think this is a common idea in higher-level tasks (our related works also cite some). Most recently, I read a paper at this year's NeurIPS sharing similar ideas, but I cannot find it now... Should I find it, I will post its link here.

@teowu
Member

teowu commented Nov 14, 2022

BTW, I like this discussion, so I pinned it here (as if I were on OpenReview for ICLR or NeurIPS lol)

@teowu teowu changed the title Very impressive work! Very impressive work! (Feel free to discuss about the paper here!) Nov 14, 2022
@Sissuire
Author

@teowu Great thanks :)

@teowu teowu added the good first issue Good for newcomers label Nov 14, 2022
@allexeyj

allexeyj commented Jul 22, 2023

@Sissuire @teowu
I am glad there is an opportunity to discuss this wonderful article. There are some things I could not find in the article; please help.

I have a question regarding fine-tuning DOVER on VQA datasets that only have an overall video quality score. Let's take KoNViD-1k as an example. If you look at the metadata, you can see there is only the overall video quality (hereinafter Qo); there is no technical (hereinafter Qt) or aesthetic (hereinafter Qa) video quality. But if you look at the labels for this dataset, there are three values per video, which seem to be Qa, Qt, and Qo. How did the authors get Qa and Qt if the original dataset only contains Qo? How was the data in labels.txt obtained?

I also found that some labels.txt files have the following structure: -1, -1, MOS (Qo). If I have no Qa and Qt, but do have Qo, can I just set Qa and Qt to -1 in labels.txt?

@teowu
Member

teowu commented Jul 27, 2023

Hi Alex, Q_o is all there is. The DIVIDE-3k database (the only database with Q_a and Q_t, as we proposed) will be released soon. In https://github.com/VQAssessment/DOVER/blob/master/examplar_data_labels/KoNViD/labels.txt, the second and third values are video length and framerate; these are deprecated in other datasets and therefore left as -1 placeholders.
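
For reference, here is a minimal sketch of reading such a labels.txt, assuming comma-separated lines of the form name, length, framerate, MOS, with -1 wherever length/framerate are unavailable; the exact field order and the helper name read_labels are assumptions for illustration.

```python
# Minimal sketch for reading a KoNViD-style labels.txt as described above.
# Assumes comma-separated "name, length, framerate, MOS" lines, with -1 as a
# placeholder for unavailable length/framerate; field order is an assumption.
def read_labels(path: str):
    entries = []
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue
            name, length, fps, mos = [x.strip() for x in line.split(",")]
            entries.append({
                "name": name,
                "length": None if length == "-1" else float(length),
                "framerate": None if fps == "-1" else float(fps),
                "mos": float(mos),  # the overall score Q_o is the only label here
            })
    return entries
```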

@teowu teowu closed this as completed Jul 27, 2023