Extracting information from twitter data to identify types of assistance for victims of natural disasters: An Indonesian case study

Nurdeni, D. A., Budi, I., & Yunita, A.

Journal of Management Information and Decision Sciences (Print ISSN: 1524-7252; Online ISSN: 1532-5806)

Abstract

Disaster risk research by the Indonesian National Board for Disaster Management puts the number of people potentially exposed to disaster risk in Indonesia at 255 million. Disasters also have severe impacts in Indonesia, including loss of life, loss of property, and damage to public facilities. To limit these risks, an effective response system is critical, especially during the emergency response period. However, assistance to disaster victims is impeded by several factors, including delays in delivering aid, a lack of information about victims' whereabouts, and uneven aid distribution.

To provide fast and trustworthy information, the Indonesian National Board for Disaster Management has developed several information systems, such as DIBI, InAware, Geospatial, Petabencana.id, and InaRisk. These existing systems, however, neither show the disaster site in real time nor indicate what kind of assistance victims require at a given time. To address these gaps, this study develops a model that classifies Twitter text, in real time, by the type of assistance disaster victims need. The victims' locations are also displayed on a map-based dashboard application.

This research applies text mining techniques to Twitter data, combining a multi-label classification strategy with the Stanford NER method for extracting geographical information. The classification algorithms employed are Naive Bayes, Support Vector Machine, and Logistic Regression, each combined with the OneVsRest, Binary Relevance, Label Powerset, and Classifier Chain strategies. Text is represented as N-grams with TF-IDF weighting. The best multi-label classification model in this study combines Support Vector Machine and Classifier Chain with UniGram+BiGram features, achieving 82 percent precision, 70 percent recall, and a 75 percent F1-score. For location extraction, whose output is the input to the geocoding algorithms, Stanford NER achieves an F1-score of 83 percent. The geocoding results, in the form of spatial information, are displayed on a map-based dashboard. In practical terms, this study identifies the best model for extracting from citizens' Twitter data what assistance is suitable for victims, and visualizes victims' locations in real time. The study can be helpful for several stakeholders, such as the government: the Indonesian government's natural disaster management will become more efficient if the assistance delivered to victims is more accurate.
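
The best-performing combination named above (TF-IDF weighted UniGram+BiGram features, a Support Vector Machine, and a Classifier Chain) can be illustrated with a minimal scikit-learn sketch. This is not the authors' implementation: the abstract does not name a library, the assistance categories, the preprocessing, or the hyperparameters. The label names (`food`, `medical`, `shelter`), the toy English tweets, and the choice of `LinearSVC` as the SVM variant are all assumptions made for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multioutput import ClassifierChain
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical assistance categories; the abstract does not list the real ones.
LABELS = ["food", "medical", "shelter"]

# Invented toy tweets (not data from the study), one row of Y per tweet.
TWEETS = [
    "we need food and clean water in the village",
    "children are hungry please send rice and food",
    "injured people need a doctor and medicine",
    "send medicine and a doctor to the clinic",
    "houses collapsed we need tents and blankets",
    "families need tents and shelter tonight",
    "need food and medicine for injured families",
    "no shelter and no food after the flood",
]
# Multi-label targets: a tweet may request several kinds of assistance at once.
Y = np.array([
    [1, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
    [1, 1, 0],
    [1, 0, 1],
])

def build_model():
    """TF-IDF over unigrams and bigrams, then a chain of linear SVMs.

    In a classifier chain, the classifier for each label also receives the
    earlier labels in the chain as extra input features.
    """
    return make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2)),       # UniGram+BiGram features
        ClassifierChain(LinearSVC(random_state=0)),  # chain follows column order of Y
    )

model = build_model().fit(TWEETS, Y)

# Predict a 0/1 indicator per label for an unseen tweet.
pred = model.predict(["please send food and rice"])
predicted_labels = [name for name, flag in zip(LABELS, pred[0]) if flag == 1]
print(predicted_labels)
```

With only eight toy examples the predictions themselves are not meaningful; the sketch only shows how the three components fit together. Replacing `ClassifierChain` with `sklearn.multiclass.OneVsRestClassifier` would give one of the other strategies the study compares.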
