Abstract: This study introduces an annotated dataset of Romanian social media posts for studying antisemitic hate speech and attitudes towards Jewish people. We performed two types of annotation: with three simple tags
(’Neutral’, ’Positive’, ’Negative’), and with five more refined tags (’Neutral’, ’Ambiguous’, ’Jewish Community’, ’Solidarity’, ’Zionism’, ’Antisemitism’). We ran several experiments on this dataset: clustering, automatic classification with classical machine learning models and transformer-based models, and sentiment analysis. The three-class clustering produced well-separated clusters, while, as expected, the five-class clustering produced moderately overlapping groups, except for ’Antisemitism’, which is well separated from the other four groups. We obtained a good F1-score of 0.78 on the three-class classification task with a Romanian BERT model and a moderate F1-score of 0.62 on the five-class classification task with an SVM model. The lowest negative
sentiment was found in the ’Neutral’ class, while the highest was in ’Zionism’ rather than in ’Antisemitism’, as might have been expected. The ’Zionism’ category also displays the highest level of positive sentiment.