Hate and Aggression Analysis in NLP with Explainable AI
Abstract
Social platforms such as Twitter and Facebook have become a primary medium for people to express their thoughts and, owing to a lack of censorship, they often become a haven for hate towards minorities. People of color, Asian people, Muslims, women, transgender people, and LGBTQ+ communities are frequent targets of such online hate and aggression. Although several companies have deployed detection algorithms on their platforms, such comments are hard to detect and often still reach the platforms, creating a hostile space for the targeted people. This research studies and compares different hate and aggression detection algorithms for two languages, English and German. The comparison covers machine learning models (linear SVC, logistic regression, multinomial naive Bayes, and random forests), with variations using feature engineering and bag of words, and deep learning models (CNN+GRU static, TCN static, Seq2Seq), with variations using Word2Vec embeddings. CNN+GRU static with Word2Vec embeddings outperformed all the other techniques, with an accuracy of 68.29%.
This study contains racial slurs and aggressive, harmful language targeted especially at women and people of color. Given the nature of the study, these cannot be omitted. The paper is solely for research purposes and does not support hateful or aggressive speech towards anyone in any manner.
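To make the bag-of-words baselines mentioned in the abstract concrete, the following is a minimal sketch of a multinomial naive Bayes classifier over word counts, written in plain Python. It is not the study's implementation: the function names, the whitespace tokenizer, the Laplace smoothing value (alpha = 1), the two labels, and the mild toy sentences are all assumptions chosen for illustration only.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; a real pipeline would clean more.
    return text.lower().split()


def train_multinomial_nb(texts, labels, alpha=1.0):
    """Fit multinomial naive Bayes on bag-of-words counts (illustrative helper)."""
    class_docs = Counter(labels)          # number of documents per class
    word_counts = defaultdict(Counter)    # per-class word frequencies
    vocab = set()
    for text, label in zip(texts, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {
        "class_docs": class_docs,
        "word_counts": word_counts,
        "vocab": vocab,
        "alpha": alpha,
        "n_docs": len(labels),
    }


def predict(model, text):
    """Return the class with the highest log posterior for the given text."""
    vocab_size = len(model["vocab"])
    alpha = model["alpha"]
    best_label, best_score = None, -math.inf
    for label, n_class_docs in model["class_docs"].items():
        score = math.log(n_class_docs / model["n_docs"])  # log prior
        counts = model["word_counts"][label]
        total = sum(counts.values())
        for token in tokenize(text):
            if token in model["vocab"]:  # words unseen in training are skipped
                # Laplace-smoothed log likelihood of the token under this class
                score += math.log((counts[token] + alpha) / (total + alpha * vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, deliberately mild; not drawn from the study's datasets.
texts = [
    "you are awful and stupid",
    "i hate you people",
    "have a lovely day",
    "thank you for the kind help",
]
labels = ["aggressive", "aggressive", "neutral", "neutral"]

model = train_multinomial_nb(texts, labels)
prediction = predict(model, "you are stupid")
```

The same word-count features could instead feed a linear SVC, logistic regression, or random forest. The deep learning variants named in the abstract replace these counts with Word2Vec embeddings.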