Data Science Hypothesis
Tempting, as neural models benefit greatly from large datasets. However, our model may then become biased towards our labeller, and our accuracies may not be grounded in truth. The only way this could work is if we ran the SOTA methods over our personally labelled dataset and compared the accuracies, which seems to be more effort than it is worth.
After deliberation, no. Stance detection is important to fake news detection because there is a 'topic' to agree or disagree with. However, in our domain of deceptive opinions on products and services, stance would more or less reduce to sentiment, so the benefits would be diminished.
Before discovering the low-hanging fruit that is Generative Adversarial Networks, we considered focusing on cross-domain adaptation using review-reviewer embeddings. However, there have been multiple papers on this approach, so the possibility of finding novelty in a method of cross-domain adaptation is low compared with Generative Adversarial Networks, on which only one paper has been produced, and a very promising one at that.
We propose the use of generative adversarial networks [1], following the FakeGAN architecture [2], to aid in the detection of deceptive opinion spam [3]. By combining the most important review- and reviewer-centric features [4, 11] with extensive feature importance selection, which has been shown to increase accuracy [4, 5], word and feature embeddings [14, 15], dimensionality reduction [12, 13], and transfer learning from kernels in the fake news detection [6] and email spam detection [7] domains, we hope to build on the cutting-edge accuracy with a novel approach.
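For orientation, the original two-player GAN objective from [1] can be written as below. This is only the baseline formulation, with $G$ the generator, $D$ the discriminator, $p_{\text{data}}$ the distribution of real samples and $p_z$ the noise prior. The variant used by FakeGAN [2] differs and is not reproduced here.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```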
- Building on FakeGAN [2] and the findings of Ott et al. in 2013 [8], two generators, corresponding to positive- and negative-sentiment deception, could be used to slow down the convergence of accuracy and thus gain a few more points.
- Stance detection [6, 9], a feature found to be important in the domain of fake news detection [9], could be incorporated into our feature set for training.
- Gradient boosted decision trees [10] could be used in place of CNNs in our GAN [1] architecture.
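As a rough illustration of the two-generator idea in the first item above, the sketch below shows how a discriminator batch might be assembled from truthful reviews plus samples from one positive-sentiment and one negative-sentiment generator. Everything here is hypothetical: `StubGenerator`, `build_discriminator_batch`, the toy vocabularies and the 1/0 labelling are illustrative assumptions, not part of FakeGAN [2], and the generators are random stand-ins for learned sequence models.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class StubGenerator:
    """Hypothetical stand-in for a trained generator of one sentiment polarity."""
    polarity: str          # "positive" or "negative"
    vocabulary: List[str]  # toy vocabulary, assumed for illustration only

    def sample(self, rng: random.Random, length: int = 5) -> str:
        # Draw random tokens; a real generator would be a learned sequence model.
        return " ".join(rng.choice(self.vocabulary) for _ in range(length))


def build_discriminator_batch(
    truthful: List[str],
    generators: List[StubGenerator],
    per_generator: int,
    rng: random.Random,
) -> List[Tuple[str, int]]:
    """Mix truthful reviews (label 1) with generated reviews (label 0)."""
    batch = [(text, 1) for text in truthful]
    for gen in generators:
        batch.extend((gen.sample(rng), 0) for _ in range(per_generator))
    rng.shuffle(batch)  # avoid ordering the batch by source
    return batch


rng = random.Random(0)
generators = [
    StubGenerator("positive", ["great", "lovely", "clean", "friendly", "stay"]),
    StubGenerator("negative", ["awful", "dirty", "rude", "noisy", "stay"]),
]
truthful = ["The room was fine and check-in was quick.",
            "Breakfast was average but the location is good."]
batch = build_discriminator_batch(truthful, generators, per_generator=2, rng=rng)
```

Each generator contributes the same number of samples, so neither polarity dominates the generated half of the batch. Whether that balance is the right choice would need to be checked experimentally.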
References:
[1] Goodfellow et al., 2014: 'Generative Adversarial Nets'
[2] Aghakhani et al., 2018: 'Detecting Deceptive Reviews using Generative Adversarial Networks'
[3] Jindal and Liu, 2008: 'Opinion Spam and Analysis'
[4] Crawford et al., 2015: 'Survey of review spam detection using machine learning techniques'
[5] Mukherjee et al., 2013: 'What Yelp Fake Review Filter Might Be Doing?'
[6] Ågren, 2018: 'Combating Fake News with Stance Detection using Recurrent Neural Networks'
[8] Ott et al., 2013: 'Negative Deceptive Opinion Spam'
[10] Hazim et al., 2018: 'Detecting opinion spams through supervised boosting approach'
[11] Mukherjee et al., 2012: 'Spotting fake reviewer groups in consumer reviews'
[15] Wang et al., 2016: 'Learning to Represent Review with Tensor Decomposition for Spam Detection'