Social Media Sentiment
Textual data from the main social media platforms and forums in Hong Kong, including all original posts and replies, are extracted every day. Posts that mention suicide are identified by matching against a list of ‘suicide-related’ keywords and phrases. The proportion of suicide-mentioning posts out of all posts reflects the extent to which suicide-related topics are discussed online in Hong Kong.
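As a rough illustration of the keyword-matching step, the proportion could be computed along the following lines. This is only a sketch under stated assumptions: the keyword list shown is a hypothetical placeholder (the actual list is not published here), and plain case-insensitive substring matching is assumed, which may differ from the matching rules actually used.

```python
from typing import Iterable, Sequence

# Hypothetical placeholder entries; the real keyword list is not given in this note.
KEYWORDS: Sequence[str] = ("keyword_a", "phrase b")


def mentions_keyword(post: str, keywords: Iterable[str] = KEYWORDS) -> bool:
    """Return True if the post contains any keyword (case-insensitive substring match)."""
    text = post.lower()
    return any(keyword.lower() in text for keyword in keywords)


def mention_proportion(posts: Iterable[str], keywords: Iterable[str] = KEYWORDS) -> float:
    """Share of posts that match at least one keyword; 0.0 when there are no posts."""
    keyword_list = list(keywords)
    post_list = list(posts)
    if not post_list:
        return 0.0
    hits = sum(mentions_keyword(post, keyword_list) for post in post_list)
    return hits / len(post_list)
```

Substring matching is used here because Chinese text is not separated by spaces; a production pipeline would likely need additional handling for variants and false positives.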
The textual data are processed, and the sentiment of each post is computed using machine learning algorithms and then categorised as ‘positive’, ‘neutral’ or ‘negative’. A post is categorised as ‘neutral’ if it contains equal numbers of positive and negative words. A post that is too short is not rated.
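The tie and short-post rules can be sketched as below. The machine learning step itself is not reproduced; a simple word-count comparison stands in for it. The word lists, the `MIN_TOKENS` threshold and the function name `label_post` are all hypothetical, since this note does not state the actual vocabulary or the minimum length.

```python
from typing import Optional, Sequence

# Hypothetical stand-in vocabularies; the real lists are not given in this note.
POSITIVE_WORDS = {"good", "happy"}
NEGATIVE_WORDS = {"bad", "sad"}

# Hypothetical threshold; the note only says posts that are "too short" are not rated.
MIN_TOKENS = 3


def label_post(tokens: Sequence[str]) -> Optional[str]:
    """Label a tokenised post, or return None when it is too short to rate."""
    if len(tokens) < MIN_TOKENS:
        return None
    positive = sum(token in POSITIVE_WORDS for token in tokens)
    negative = sum(token in NEGATIVE_WORDS for token in tokens)
    if positive == negative:
        # A tie (including zero matches on both sides) is treated as neutral.
        return "neutral"
    return "positive" if positive > negative else "negative"
```

For example, `label_post(["good", "bad", "day"])` gives `"neutral"` because one positive and one negative word are matched, while a single-token post returns `None`.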
Among the ‘negative’ posts, the ‘anger’, ‘anxiety’ and ‘sadness’ emotions of each post are extracted by matching against the widely adopted Linguistic Inquiry and Word Count (LIWC) dictionary (Chinese version). The relevant LIWC vocabulary lists have been revised to suit the Hong Kong internet context, especially local cyber language, culture and slang, to improve accuracy and sensitivity.
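Dictionary-based emotion matching of this kind might look like the following sketch. The lexicon entries are hypothetical English placeholders, not the LIWC vocabulary or its localised revision, and counting matched tokens per category is an assumption about how matches are scored.

```python
from typing import Dict, Sequence, Set

# Hypothetical placeholder lexicon; the actual LIWC-derived lists are not reproduced here.
EMOTION_LEXICON: Dict[str, Set[str]] = {
    "anger": {"furious", "hate"},
    "anxiety": {"worried", "nervous"},
    "sadness": {"cry", "lonely"},
}


def emotion_counts(tokens: Sequence[str]) -> Dict[str, int]:
    """Count how many tokens of a post fall into each emotion category."""
    return {
        category: sum(token in words for token in tokens)
        for category, words in EMOTION_LEXICON.items()
    }
```

In this sketch a post can score on more than one emotion at once, so `emotion_counts(["i", "hate", "this", "and", "cry"])` registers one match each for anger and sadness and none for anxiety.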
The sentiment line charts are still under development, because natural language processing for Hong Kong sentiment analysis, which must handle local slang and writing styles, is not yet mature. They should be used only as a reference that reflects a partial view of Hong Kong.
Online social media data are scraped and supplied by: Meltwater