You can find an explanation of multimodal learning in videos, diagrams, images, and podcasts. Learning from all of these at once is itself multimodal learning: a set of techniques that combine the input from different modalities (images, video, audio, text, etc.) at the same time. "Multi" refers to the simultaneous processing, "modal" refers to the modalities (images, audio, video, etc.), and "learning" refers to the automated process involved (machine or deep learning).
The main objectives of this thesis are:
- Define and explore the state of the art in Multimodal learning.
- Investigate the deep learning techniques that can be designed or adapted for a multimodal scenario.
- Simulate the process on an existing multimodal benchmark dataset to demonstrate the effectiveness of the designed methodology.
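To make "combining the input from different modalities" concrete, here is a minimal sketch of feature-level fusion by concatenation. It assumes each modality has already been encoded into a numeric feature vector by some upstream model; the function names `l2_normalize` and `fuse` and the example vectors are hypothetical, and a real thesis pipeline would more likely learn the fusion with a neural network.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so no modality dominates by magnitude."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(features):
    """Concatenate normalized per-modality vectors in a fixed (sorted) order.

    `features` maps a modality name (e.g. "text", "image") to its feature
    vector. Sorting the names keeps the layout of the fused vector stable.
    """
    fused = []
    for name in sorted(features):
        fused.extend(l2_normalize(features[name]))
    return fused


if __name__ == "__main__":
    # Hypothetical, hand-written feature vectors for illustration only.
    sample = {"text": [3.0, 4.0], "image": [0.0, 2.0]}
    print(fuse(sample))
```

The fused vector can then be passed to any single-input classifier, which is the simplest baseline against which more elaborate multimodal architectures are usually compared.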
References 📚:
Hu, R., & Singh, A. (2021). Transformer is All You Need: Multimodal Multitask Learning with a Unified Transformer. arXiv preprint arXiv:2102.10772.
Lu, J., Batra, D., Parikh, D., & Lee, S. (2019). ViLBERT: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks. arXiv preprint arXiv:1908.02265.
Lin, X., Bertasius, G., Wang, J., Chang, S. F., Parikh, D., & Torresani, L. (2021). Vx2Text: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs. arXiv preprint arXiv:2101.12059.
Interesting projects 💻:
Additional Material:
- BERT Can See Out of the Box
- Learning Visiolinguistic Representations with ViLBERT w/ Stefan Lee - Podcast
- ViLBERT talk demo
News events such as protests, accidents or natural disasters represent a unique information access problem where traditional approaches fail. For example, immediately after an event, the corpus may be sparsely populated with relevant content. Even when, after a few hours, relevant content becomes available, it is often inaccurate or highly redundant. At the same time, crisis events demonstrate a scenario where users urgently need information, especially if they are directly affected by the event. The goal of the TREC Temporal Summarization Track is to develop systems for efficiently monitoring the information associated with an event over time. (Source: TREC Temporal Summarization Track)
The main objectives of this thesis are:
- Define and explore the state of the art in Temporal Summarization.
- Design a new temporal summarization framework able to process data streams from different sources.
- Simulate the process on an existing benchmark dataset.
- Create a web-based dashboard to present live updates about ongoing events.
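The redundancy problem described above can be illustrated with a minimal sketch of a novelty filter over a time-ordered stream. It assumes the stream is a list of `(timestamp, sentence)` pairs and uses Jaccard word overlap with an arbitrary threshold of 0.5; the function names and the threshold are illustrative assumptions, not the method prescribed by the TREC track.

```python
import re


def tokens(text):
    """Lowercased set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def summarize_stream(stream, threshold=0.5):
    """Emit an update only if it is sufficiently different from past updates.

    `stream` is an iterable of (timestamp, sentence) pairs. Items are
    processed in timestamp order; a sentence is kept when its similarity
    to every previously emitted sentence is below `threshold`.
    """
    emitted = []
    seen = []
    for ts, sentence in sorted(stream, key=lambda item: item[0]):
        toks = tokens(sentence)
        if all(jaccard(toks, prev) < threshold for prev in seen):
            emitted.append((ts, sentence))
            seen.append(toks)
    return emitted


if __name__ == "__main__":
    # Hypothetical stream for illustration only.
    updates = [
        (1, "Earthquake hits the city centre"),
        (2, "Earthquake hits city centre"),
        (3, "Rescue teams arrive at the scene"),
    ]
    for ts, sentence in summarize_stream(updates):
        print(ts, sentence)
```

In this example the second item is dropped as a near-duplicate of the first, while the third is emitted as a new update. A dashboard could consume the emitted list as its live feed.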
image by: https://www.sciencedirect.com/science/article/pii/S1474034619306007?via%3Dihub
The long document summarization task entails parsing, analyzing and shortening long content such as patents and scientific papers. It can be addressed with either extractive or abstractive techniques, in either a supervised or an unsupervised manner.
The main objectives of this thesis are:
- Analyze the state-of-the-art methodologies for long document summarization.
- Define a preprocessing pipeline to manipulate long content.
- Design a new patent summarization architecture and compare it with the state of the art.
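As a reference point for the extractive, unsupervised end of that spectrum, here is a minimal sketch that scores each sentence by the average document-level frequency of its words and keeps the top ones. The function name `extractive_summary`, the naive sentence splitter, and the length-based stop-word filter are simplifying assumptions; it is a baseline illustration, not a proposed architecture.

```python
import re
from collections import Counter


def extractive_summary(text, max_sentences=2):
    """Return the highest-scoring sentences, kept in their original order."""
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    # Document-level word frequencies; words of 3 letters or fewer are
    # ignored as a crude stand-in for a stop-word list.
    words = re.findall(r"[a-z]+", text.lower())
    freq = Counter(w for w in words if len(w) > 3)

    def score(sentence):
        toks = re.findall(r"[a-z]+", sentence.lower())
        if not toks:
            return 0.0
        # Average frequency, so long sentences are not favored by length alone.
        return sum(freq[t] for t in toks) / len(toks)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return [sentences[i] for i in keep]


if __name__ == "__main__":
    # Hypothetical input for illustration only.
    doc = (
        "Patents describe inventions. "
        "Patents protect inventions legally. "
        "The weather is nice."
    )
    print(extractive_summary(doc, max_sentences=2))
```

For genuinely long documents such as patents, a preprocessing step would typically split the text into sections first and apply a scorer like this per section, since a single global frequency table tends to favor boilerplate wording.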
References 📚:
Sharma, E., Li, C., & Wang, L. (2019, July). BIGPATENT: A Large-Scale Dataset for Abstractive and Coherent Summarization. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (pp. 2204-2213).
Shi, T., Keneshloo, Y., Ramakrishnan, N., & Reddy, C. K. (2021). Neural abstractive text summarization with sequence-to-sequence models. ACM Transactions on Data Science, 2(1), 1-37.
Meng, R., Thaker, K., Zhang, L., Dong, Y., Yuan, X., Wang, T., & He, D. (2021). Bringing Structure into Summaries: a Faceted Summarization Dataset for Long Scientific Documents. arXiv preprint arXiv:2106.00130.
Kryściński, W., Rajani, N., Agarwal, D., Xiong, C., & Radev, D. (2021). BookSum: A Collection of Datasets for Long-form Narrative Summarization. arXiv preprint arXiv:2105.08209.
image by: https://github.com/soobinseo/cycle-gan
Sentiment analysis is one of the most important tasks in many NLP pipelines. It consists of analyzing a text to classify its sentiment as either positive or negative. Generative models such as GPT-3 open up a large set of possibilities in this scenario. This master thesis will cover both generative language models and sentiment analysis.
The main objectives of this thesis are:
- Define and explore the state of the art in Language Generation.
- Analyze state-of-the-art methodologies in Sentiment Analysis.
- Simulate an innovative pipeline on an existing benchmark dataset.
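For orientation, the classification task itself can be sketched with a tiny lexicon-based baseline. The word lists `POSITIVE` and `NEGATIVE` and the function `classify_sentiment` below are hypothetical and far too small for real use; the sketch only shows the input and output of the task that a learned or generative model would be expected to improve on.

```python
import re

# Hypothetical, illustrative lexicons; a real system would use a trained model
# or a much larger curated resource.
POSITIVE = {"good", "great", "excellent", "love", "happy"}
NEGATIVE = {"bad", "terrible", "awful", "hate", "sad"}


def classify_sentiment(text):
    """Label text by counting positive and negative lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for sample in ["I love this great phone", "Terrible battery, I hate it"]:
        print(sample, "->", classify_sentiment(sample))
```

A baseline like this ignores negation and context, which is one reason generative models are attractive here, for example to produce additional labeled training examples.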
References 📚:
Wang, H., & Zhai, C. (2017). Generative models for sentiment analysis and opinion mining. In A practical guide to sentiment analysis (pp. 107-134). Springer, Cham.
Gupta, R. (2019, May). Data augmentation for low resource sentiment analysis using generative adversarial networks. In ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 7380-7384). IEEE.