Research Article: 2022 Vol: 21 Issue: 4S
Pannee Suanpang, Suan Dusit University
Pitchaya Jamjuntr, King Mongkut’s University of Technology Thonburi
Phuripoj Kaewyong, Suan Dusit University
Citation Information: Suanpang, P., Jamjuntr, P., & Kaewyong, P. (2022). Tourism route optimization on malware detection with convolutional neural networks. Academy of Strategic Management Journal, 21(S4), 1-10.
Tourism Route, Optimization, Malware Detection, Convolutional Neural Network, Genetic Algorithm
Tourism route optimization is becoming an important method for improving tourism planning so that it better matches travelers’ needs and budgets, especially in next-normal tourism. This study aimed to support tourist route planning for 50 scenic spots, based on the researchers’ prototype of a recommendation system that considers route distance, which is related to the transportation cost of the travel period. Optimal tourist routes were suggested based on tourists’ personal level of satisfaction. Data on each location, the distances, and the transportation costs were collected. The result was a total distance across the 50 attractions of 5272.51 kilometers when only the price was considered. In addition, this paper links tourism route optimization with malware detection using a convolutional neural network. Because threats from malicious software are increasing in both number and complexity, security researchers have developed novel approaches for the automatic detection and classification of malware in place of manual analysis of malware files, which can be time-consuming. Furthermore, techniques have been developed to evade the signature-based detection used by several antivirus companies. Therefore, deep learning techniques for malware classification were applied to identify malware families. In this paper, a convolutional neural network model was used for static malware classification, and experiments were performed on the Malimg data set, which contains malware images converted from portable executable malware binaries. The data set comprised 25 malware families, and the model achieved an accuracy of 96.46%.
Tourism is one of the world economy's most important industries and will be highly significant in the new normal of the post-COVID-19 pandemic period (Jermsittiparsert & Chankoson, 2019; Suanpang et al., 2021). With advances in technology, digital disruption will transform the way tourism businesses operate, as new technology is used to support business operations (Suanpang et al., 2021; Qian & Zhong, 2019). Currently, recommendation systems play a major role in tourism information, resulting in websites that provide tourism information services and in the development of various systems that benefit both tourists and entrepreneurs. As a consequence, tourists have more resources for searching for tourism information, and entrepreneurs have the tools they need to provide service information (Suanpang & Jamjuntr, 2021; Suanpang & Chunhapataragul, 2021; Xiang & Jianlong, 2016). However, the large amount of tourism information requires tourists to spend a lot of time online. A recommendation system can therefore help by presenting tourism routes according to individual interests, and past researchers have already developed recommendation systems for personal travel information to support tourists in planning their travel. Furthermore, a recommendation system enables tourists to obtain products and services that meet their needs with less time spent searching, while operators benefit from offering the right products and services (Qian & Zhong, 2019; Rui, 2012; Suanpang, Netwong & Chunhapataragul, 2021).
In addition to supporting a marketing strategy to increase the number of tourists, this would also make the decision to visit much easier (Chienwattanasook & Jermsittiparsert, 2019). The tourism locations that a user is found to be interested in are stored in a user interest model and used to make recommendations that produce relevant tourism information. However, system developers face a major challenge: there is often insufficient information to draw conclusions about a user's interests, a problem that frequently occurs in systems that use attention-based analysis of data (Suanpang et al., 2021; Suanpang & Jamjuntr, 2021; Suanpang, Netwong & Chunhapataragul, 2021). Therefore, researchers have continued to develop recommendation systems that provide personalized travel information, present locations that match users' interests, and offer ideas for developing interest-seeking techniques (Suanpang et al., 2021; Suanpang & Jamjuntr, 2021; Suanpang, Netwong & Chunhapataragul, 2021).
The global tourism development model has disintegrated due to the belief that resources in traditional scenic spots and scenic destinations extend to folk farming customs and industrial heritage, as well as social infrastructure (Rui, 2012; Xiang & Jianlong, 2016). However, with the rise in the number of available locations, it has become a difficult problem to solve for global tourism, which has made the study of related travel route planning methods the most recent research front in the tourism industry (Alhanjouri & Alfarra, 2011; Kenneth et al., 1992; Qian & Zhong, 2019; Xiong & Schneider, 1992).
Tourism route optimization is a noteworthy research topic, and many methods have been proposed for improving tourism route planning to better match travelers’ demands and cost-effectiveness. Because a tourist route optimization system is designed to be used by any tourist from anywhere via the internet, the system is at risk of malware infection from the internet. In addition, most commercial anti-malware software does not effectively protect the system from unknown malware; therefore, developing anti-malware software that applies new methods can increase the system's malware protection. Malware is unwanted software created to destroy a system or interfere with its operation. It is commonly known in various forms, such as trojans and worms (Ronen et al., 2018). Additionally, it may be classified in many ways because most malware shows similar behavior, such as hiding from detection and deactivating the security system (Suanpang et al., 2021).
In analyzing malware, classification is important because categorizing the various kinds allows users to know how malware could contaminate personal computers, the level of risk it poses, and how to prevent it. When malware is detected, it is assigned to the most appropriate family through a classification mechanism (Gibert et al., 2019). There are numerous approaches for detecting malware in the wild; however, detecting malware is still a challenging task.
Therefore, the aim of this paper was to develop a prototype of a tourism application that uses a genetic algorithm to find a gastronomy tourism route in the Eastern Economic Corridor (EEC) and that uses a Convolutional Neural Network (CNN) for malware detection. The study comprised two sections: tourism route optimization, and malware detection and classification. The genetic algorithm was applied to find a tourism route that visits each city only once and returns to the city of origin while minimizing the length of the tour.
A Genetic Algorithm (GA) is a search procedure that reproduces the process of evolution described by Darwin, based on the genetic mechanism of natural selection (Zhang et al., 2020). The chromosomes of humans and all living things evolve and are continuously inherited through breeding, which causes differences in appearance, skin, and character because the chromosomes of the father and the mother are mixed together to create the offspring’s chromosomes. This combination of dominant genes and recessive genes is called selective breeding, and it is a method for the genetic process of living organisms (Chen et al., 2020; Zhang et al., 2020). The first simple GA version was proposed by Holland in 1975. The basic process of the algorithm is to simulate natural evolution in order to search for an optimal solution. The process begins by encoding the problem’s characteristics as a set of parameters (variables). The parameters are joined into a string to form a chromosome (a solution). A set of individual chromosomes is called the initial population. From the initial population, better chromosomes are generated by selection, crossover, and mutation. Finally, the best chromosome, the one that best meets the target function, is retained (Zhang et al., 2020). Recently, many scholars have applied the GA to the optimization of travel routes, such as Ma (2016); Chen, et al., (2020); Zhang, et al., (2020). They employed the GA to identify optimal travel routes that match the demands of tourists. Moreover, Yu, et al., (2010) applied the GA to transportation management.
Malware Detection with Convolutional Neural Networks
Several studies on malware classification have been performed using CNN architectures. Cui, et al., (2018) detected malicious code variants by converting them to grayscale images and applying a simple CNN model. Kalash, et al., (2018) classified malware by converting malware files into grayscale images, using two different data sets, Malimg (Nataraj et al., 2011) and Microsoft Windows Malicious Software Removal Tool (Ronen et al., 2018). They obtained 98.52% and 99.97% in accuracy, respectively. Yue (2017) also proposed a weighted softmax loss for CNNs for the classification of imbalanced malware images and achieved satisfactory classification results. Additionally, Gibert, et al., (2019) proposed a malware classification model using a CNN that classified malware images; the model consisted of three convolutional layers with one fully connected layer and was tested on two data sets, the Microsoft malware classification challenge data set and the Malimg data set. Their experiments were divided into two sets. The first set of experiments classified malware into nine families and obtained accuracies of 96.2% and 98.4% for the top-1 and top-2 ranked results, respectively. The second set of experiments classified malware into 27 families and obtained accuracies of 82.9% and 89% for the top-1 and top-2 ranked results, respectively. Moreover, Tobiyama, et al., (2016) proposed a malware process detection method by training a Recurrent Neural Network (RNN) to extract features of the process behavior, and then trained a CNN to classify the features extracted by the trained RNN. Vijayakumar, et al., (2019) also introduced a deep learning model based on the CNN and LSTM for malware family categorization. The experiments showed an accuracy of 96.3% on the Malimg data set.
Furthermore, Su, et al., (2018) created one-channel grayscale images from executable binaries in two families and classified them into their related families using a lightweight CNN. They achieved accuracy of 94.0% and 81.8% for malware, respectively.
In this study, tourist route plans for 50 scenic spots were created by the recommendation system. The GA processes used to simulate natural evolution and search for the optimal tourist route plan to meet travelers’ demands were as follows:
Genetic Algorithm Processes
(1) Selection: Chromosomes from the father and the mother, also known as the parents, are selected so that satisfactory chromosomes are inherited and survive. According to Darwin's theory, the most satisfactory chromosomes are the ones passed on. This idea has given rise to many selection models, such as Roulette Wheel Selection, Ranking Selection, Tournament Selection, Elitist Selection, and Steady-state Selection.
(2) Crossover: Crossover is an important process. In genetics, crossover produces diversity among living things and is required for the long-term evolution of suitable solutions. The crossover step takes two chromosomes and mixes them together to obtain a new chromosome. The easiest method is to choose a crossover position at random, copy everything in front of the crossover position from the father, and combine it with everything after the crossover position from the mother to produce the first child. Then, everything in front of the crossover position is copied from the mother and combined with everything after the crossover position from the father to produce the second child.
(3) Mutation: Mutation occurs after the crossover is completed. The offspring produced from the parental generation by crossover are changed at random positions, which can give them new characteristics. When a position is selected for mutation, the value at that position is changed. Mutation is performed on only a few randomly chosen positions of the pattern, changing the value from 1 to 0 or from 0 to 1.
(4) Parameters: The parameters of the GA are the population size (the number of chromosomes used to form each generation), the crossover probability, and the mutation probability. The crossover probability would be 60%-95%, and the mutation probability would be set at 0%-1%. A large number of chromosomes in each generation makes the GA process slower.
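The operators described in (1) to (3) can be sketched on binary chromosomes as follows. This is a minimal illustration and not the authors' implementation: the function names, the six-bit example chromosomes, and the use of a seeded random generator are all assumptions made for the example.

```python
import random

def one_point_crossover(father, mother, rng):
    """Cut both parents at one random position and swap the tails."""
    point = rng.randint(1, len(father) - 1)  # never cut at the very ends
    child_a = father[:point] + mother[point:]
    child_b = mother[:point] + father[point:]
    return child_a, child_b

def bit_flip_mutation(chromosome, rate, rng):
    """Flip each bit (1 -> 0 or 0 -> 1) independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in chromosome]

def roulette_wheel_select(population, fitnesses, rng):
    """Pick one chromosome with probability proportional to its fitness."""
    return rng.choices(population, weights=fitnesses, k=1)[0]

rng = random.Random(0)            # seeded only to make the example repeatable
father = [1, 1, 1, 1, 1, 1]       # hypothetical parent chromosomes
mother = [0, 0, 0, 0, 0, 0]
child_a, child_b = one_point_crossover(father, mother, rng)
mutated = bit_flip_mutation(child_a, rate=0.01, rng=rng)  # 1% mutation rate
```

Each child keeps the head of one parent and the tail of the other, so at every position the two children carry complementary parents' genes.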
Steps of the Genetic Algorithm
The work can be described as follows: Step 1: start searching for solutions to the problem. Step 2: if the answer has still not been found but the specified number of rounds has been reached, stop searching. Step 3: otherwise, search until the target or desired answer is found, then stop searching. Step 4: stop when the answers begin to converge on the most appropriate answer, for example, when the answers obtained from each generation of the population no longer change over consecutive generations. In this paper, the researchers used a list of 51 tourism locations. The mathematical model for the tourism route problems is shown as follows:
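A common way to write the objective for a closed tour of this kind is the travelling-salesman form below. It is offered as an assumed formulation and not necessarily the authors' own model: the symbols $n$ (number of locations, here 51), $\pi$ (the visiting order), $c_k = (x_k, y_k)$ (the coordinates of location $k$), and the Euclidean distance $d$ are introduced here for illustration.

```latex
\min_{\pi} \; L(\pi) = \sum_{i=1}^{n-1} d\bigl(c_{\pi(i)}, c_{\pi(i+1)}\bigr) + d\bigl(c_{\pi(n)}, c_{\pi(1)}\bigr),
\qquad
d(c_a, c_b) = \sqrt{(x_a - x_b)^2 + (y_a - y_b)^2}
```

The last term closes the tour by returning from the final location to the starting one, and $\pi$ ranges over permutations of the $n$ locations so that each is visited exactly once.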
Malware Detection with Convolutional Neural Networks
Malware visualization has been an active research topic during the past few years. One of the proposed solutions comes from a study called Malware Images. For the current study, the researchers used the Malimg data set, downloaded from Kaggle, which contains 9,339 malware images belonging to 25 families/classes. Malimg is one of the malware data sets most often used to feed a CNN. The images are created by converting a malware binary to a vector of 8-bit values and then converting that vector to a grayscale image (Figure 1).
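The binary-to-image conversion can be sketched as below. This is a minimal illustration under stated assumptions: the function name, the fixed `width` parameter, the zero-padding of the last row, and the sample bytes are all hypothetical, since the text does not say how the image width was chosen.

```python
def bytes_to_grayscale_rows(data, width):
    """Reshape raw bytes into rows of 8-bit pixel values (0-255)."""
    if width <= 0:
        raise ValueError("width must be positive")
    pixels = list(data)                       # each byte becomes one pixel
    remainder = len(pixels) % width
    if remainder:
        pixels += [0] * (width - remainder)   # assumption: zero-pad the last row
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]

# Made-up bytes standing in for the contents of an executable file.
sample = bytes([0x4D, 0x5A, 0x90, 0x00, 0x03, 0xFF, 0x10])
image = bytes_to_grayscale_rows(sample, width=4)
# image is a list of rows; 0 renders as black and 255 as white.
```

For a real file the bytes would come from reading the binary in `"rb"` mode, and the resulting rows could be handed to any image library to save as a grayscale picture.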
In the route optimization experiment, the researchers used 100 individuals in each generation and a 1% mutation rate for a given gene, over 2,000 generations. The shortest distance found by each generation was tracked and plotted on a timeline. After 1,500 rounds of evaluation, the values were found to be similar, and the shortest distance was found to be 5,272 kilometers.
Figure 2 shows the genetic algorithm process as the relation between total distance (cost value) and generation; the total distance does not change after generation 1,500. Figure 3 shows the optimal tourism route through the 51 tourism locations, which are:
(164,497),(146,465),(212,403),(291,384),(364,98),(342,81),(369,63),(452,113), (443,265),(457,324),(488,431),(497,453),(499,498),(366,408),(112,185),(107,192), (81,198),(65,377),(58,386),(51,389),(191,109),(244,6),(213,9),(186,68),(249,63), (324,168),(305,207),(204,384),(151,401),(128,404),(29,411),(38,391),(51,357), (332,31),(341,55),(405,42),(403,6),(475,76),(484,185),(422,351),(407,421), (329,161),(161,2),(124,11),(119,53),(35,41),(19,188),(157,432),(216,453),(334,498), (361,487).
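A GA run with the settings reported above (100 individuals, 1% mutation rate, 2,000 generations) might look like the following sketch. It is not the authors' code. Because a route is a permutation of locations, the sketch swaps in order crossover and swap mutation instead of the one-point crossover and bit-flip mutation described for binary chromosomes; those operators, the 20% elitism, the Euclidean distance, and all function names are assumptions. The demo uses only the first eight points of the list above and a short run to keep it fast.

```python
import math
import random

def tour_length(route, coords):
    """Length of the closed tour that returns to the starting point."""
    return sum(
        math.dist(coords[route[i]], coords[route[(i + 1) % len(route)]])
        for i in range(len(route))
    )

def order_crossover(p1, p2, rng):
    """Keep a slice of p1 and fill the rest in the order found in p2."""
    a, b = sorted(rng.sample(range(len(p1)), 2))
    kept = set(p1[a:b])
    rest = [c for c in p2 if c not in kept]
    return rest[:a] + p1[a:b] + rest[a:]

def swap_mutation(route, rate, rng):
    """With probability `rate` per gene, swap it with a random position."""
    route = route[:]
    for i in range(len(route)):
        if rng.random() < rate:
            j = rng.randrange(len(route))
            route[i], route[j] = route[j], route[i]
    return route

def run_ga(coords, pop_size=100, mutation_rate=0.01, generations=2000, seed=0):
    rng = random.Random(seed)
    n = len(coords)
    population = [rng.sample(range(n), n) for _ in range(pop_size)]
    history = []                                   # best distance per generation
    for _ in range(generations):
        population.sort(key=lambda r: tour_length(r, coords))
        history.append(tour_length(population[0], coords))
        elite = population[: pop_size // 5]        # assumption: keep the best 20%
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            child = order_crossover(p1, p2, rng)
            children.append(swap_mutation(child, mutation_rate, rng))
        population = elite + children
    best = min(population, key=lambda r: tour_length(r, coords))
    return best, history

# First eight points from the list above, used as a small demo instance.
demo_coords = [(164, 497), (146, 465), (212, 403), (291, 384),
               (364, 98), (342, 81), (369, 63), (452, 113)]
best, history = run_ga(demo_coords, pop_size=20, generations=50)
```

Plotting `history` against the generation index gives the kind of distance-versus-generation curve described for Figure 2; because the elite routes are carried over unchanged, the curve never increases.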
Table 1 shows the 25 malware families, with a sample malware image from each family: Adialer.C, Agent.FYI, Allaple.A, Allaple.L, Alueron.gen!, Autorun.K, C2LOP.P, C2LOP.gen!g, Dialplatform.B, Dontovo.A, Fakerean, Instantaccess, Lolyda.AA1, Lolyda.AA2, Lolyda.AA3, Lolyda.AT, Malex.gen!J, Obfuscator.AD, Rbot!gen, Skintrim.N, Swizzor.gen!E, Swizzor.gen!I, VB.AT, Wintrim.BX, and Yuner.A. These can be found in the Malimg data set from Kaggle.
The researchers applied a CNN to classify test images from the BIG 2015 data set in order to identify the malware family after training. CNNs are a form of deep learning, which differs from general machine learning, where the user has to extract characteristics manually before feeding them as input to the learning network. Deep learning uses multiple hidden layers of Artificial Neural Networks (ANN) to increase the ability to solve complex problems. CNNs are also intended to mimic the functions of the human brain more closely.
Today, CNNs are often used to extract features from data that are not structured in a fixed form (unstructured data), such as digital images. There are three steps in the calculation according to the CNN architecture: (1) convolution stage, (2) detector stage, and (3) pooling stage.
(1) Convolution Stage: The calculation in this step is based on the same principle as the spatial convolution used in digital image processing, and its aim is to extract features from the digital input image. Computing the convolution applies a linear transformation to the input image according to the spatial data in the filters, and the weights of the filters in each layer determine which details of the image are picked up. The convolution kernels can therefore be trained on the CNN input. The convolution process starts by defining the number of filters to use. Usually one filter extracts one feature, and the size of the sliding window, or kernel size, is specified for each filter. In this step, the spatial convolution between the filter and the input image is computed with a sliding window that scans the input image to generate the feature map. The stride is configured so that the sliding window moves one position at a time to cover the entire input image.
(2) Detector Stage: This step receives the output of the convolution stage and converts it into a non-linear form by using an activation function, such as the Rectified Linear Unit (ReLU). Each position of the convolution result is transformed with the non-linear ReLU function to improve the efficiency of the results.
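The convolution and detector stages can be sketched in plain Python as follows. This is only an illustration: the function names, the 3x3 input, and the 2x2 kernel are made up for the example, and a trained CNN would learn its kernel weights instead of fixing them by hand. As in most CNN libraries, the kernel is applied without flipping.

```python
def conv2d_valid(image, kernel, stride=1):
    """Slide `kernel` over `image` (no padding) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(kh):
                for j in range(kw):
                    # weighted sum of the pixels under the sliding window
                    total += image[r * stride + i][c * stride + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map

def relu(feature_map):
    """Detector stage: replace every negative value with zero."""
    return [[max(0, v) for v in row] for row in feature_map]

image = [            # hypothetical 3x3 grayscale patch
    [3, 2, 0],
    [0, 1, 5],
    [4, 0, 2],
]
kernel = [           # hypothetical 2x2 filter (one filter -> one feature map)
    [1, 0],
    [0, -1],
]
features = conv2d_valid(image, kernel)   # [[2, -3], [0, -1]]
activated = relu(features)               # [[2, 0], [0, 0]]
```

With a stride of 1 the 2x2 window visits four positions on the 3x3 input, so the feature map is 2x2; ReLU then keeps only the positive response.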
Tourism in the next normal is becoming more important in the post-COVID-19 era, especially as tourists require travel planning that lets them reach their destination with convenience, speed, and safety. An important method for planning future travel is applying optimization theory so that a travel route recommendation system saves time and fuel and gives easy access to tourist attractions (Suanpang et al., 2021). This paper contributed to the optimization of tourism routes by choosing an evolutionary estimation method, the genetic algorithm, to find the optimal gastronomy tourism route in the EEC zone, Thailand, for the prototype. The results showed that after 1,500 rounds of evaluation the values were similar and the shortest tourism route was 5,272.51 kilometers, which is consistent with the studies of Alhanjouri & Alfarra (2011); Xiang & Jianlong (2016); Kenneth, et al., (1992). To address the security concern, the researchers developed malware detection and classification using a CNN. After training on the Malimg data set, the model was tested on the BIG 2015 data set, and the result showed an accuracy of 96.46%. Automatic analysis, detection, and classification of malware based on its visual representation has several advantages over traditional signature-based antivirus software. One advantage of this method is that new variations of known malware families can be detected instantly. Thus, this method could prove valuable for the prototype, which operates online and is therefore at risk of receiving hundreds of malware samples daily; this is consistent with the results of Su, et al., (2018); Vijayakumar, et al., (2019). Finally, future work on the prototype should test other types of algorithms to optimize the system and should study the implementation and security of the travel guide system for maximum efficiency.
This work was supported by Suan Dusit University, Thailand. This study was part of the research project no 65-FF-003 “Innovation of Smart Tourism to Promote Tourism in Suphan Buri Province”. It was funded by Suan Dusit University under the Ministry of Higher Education, Science, Research and Innovation, Thailand.
Alhanjouri, M., & Alfarra, B. (2011). Ant colony versus genetic algorithm based on travelling salesman problem. International Journal of Computer Technology & Applications, 2, 570-578.
Chen, Y., Zheng, X., Fang, Z., Yu, Y., Kuang, Z., & Huang, Y. (2020). Research on optimization of tourism route based on genetic algorithm. Journal of Physics: Conference Series, 1575, 012027.
Chienwattanasook, K., & Jermsittiparsert, K. (2019). Factors affecting art museum visitors’ behavior: A study on key factors maximizing satisfaction, post-purchase intentions and commitment of visitors of art museums in Thailand. International Journal of Innovation, Creativity and Change, 6(2), 303-334.
Cui, Z., Xue, F., Cai, X., Cao, Y., Wang, G., & Chen, J. (2018). Detection of malicious code variants based on deep learning. IEEE Transactions on Industrial Informatics, 14(7), 3187–3196.
Gibert, D., Mateu, C., Planes, J., & Vicens, R. (2019). Using convolutional neural networks for classification of malware represented as images. Journal of Computer Virology and Hacking Techniques, 15(1), 15–28.
Jermsittiparsert, K., & Chankoson, T. (2019). Behavior of tourism industry under the situation of environmental threats and carbon emission: Time series analysis from Thailand. International Journal of Energy Economics and Policy, 9(6), 366-372.
Kalash, M., Rocha, M., Mohammed, N., Bruce, N., Wang, Y., & Iqbal, F. (2018). Malware classification with deep convolutional neural networks. In 9th IFIP International Conference on New Technologies, Mobility and Security (NTMS), 1–5.
Ronen, R., Radu, M., Feuerstein, C., Yom Tov, E., & Ahmadi, M. (2018). Microsoft malware classification challenge. arXiv preprint arXiv:1802.10135.
Kenneth C., Gilbertf, R., & Hofstra, B. (1992). A new multiperiod multiple traveling salesman problem with heuristic and application to a scheduling problem. Decision Sciences, 23(1), 250-259.
Ma, X. (2016). Intelligent tourism route optimization method based on the improved genetic algorithm. 2016 International Conference on Smart Grid and Electrical Automation (ICSGEA), 124-127.
Nataraj, L., Karthikeyan, S., Jacob, G., & Manjunath, B. (2011). Malware images: Visualization and automatic classification. In Proceedings of the 8th International Symposium on Visualization for Cyber Security, 4.
Qian, X., & Zhong, X. (2019). Optimal individualized multimedia tourism route planning based on ant colony algorithms and large data hidden mining. Multimedia Tools and Applications, 78(15).
Rui, G. (2012). On the current situation and development trend of China's tourism industry. Tourism overview (industry Edition), 5, 82.
Seok, S., & Kim, H. (2016). Visualized malware classification based on convolutional neural network. Journal of The Korea Institute of Information Security & Cryptology, 26(1), 197–208.
Su, J., Danilo, V., Prasad, S., Daniele, S., Feng, Y., & Sakurai, K. (2018). Lightweight classification of IoT malware based on image recognition. In IEEE 2nd Annual Computer Software and Applications Conference, 2, 664–669.
Suanpang, P., & Jamjuntr, P. (2021a). A chatbot prototype by deep learning supporting tourism. Psychology and Education, 58(4), 1902-1911.
Suanpang, P., & Jamjuntr, P. (2021b). A comparative study of deep learning methods for time-Series forecasting tourism business recovery from the covid 19 pandemic crisis. Journal of Management Information and Decision Science, 24(S2), 1-10.
Suanpang, P., Netwong, T., & Chunhapataragul, T. (2021). Smart tourism destinations influence a tourist’s satisfaction and intention to revisit. Journal of Management Information and Decision Sciences, 24(S1), 1-10.
Suanpang, P., Sopha, C., Jakjarus, C., Leethong-in, P., Tahanklae, P., Panyavacharawongse, C., Phopun, N., & Prasertsut, N. (2021). Innovation for human capital development in the tourism and hospitality industry (First S-Curve) on the Eastern Economic Corridor (EEC) (Chon Buri - Rayong - Chanthaburi - Trat) to enrich international standards and prominence to High Value Services for stimulate Thailand to be World Class Destination and support New Normal paradigm. Bangkok: Suan Dusit University, Thailand.
Tobiyama, S., Yamaguchi, Y., Shimada, H., Ikuse, T., & Yagi, T. (2016). Malware detection with deep neural network using process behavior. In IEEE 40th Annual Computer Software and Applications Conference, 2, 577–582.
Vijayakumar, R., Mamoun, A., Soman, K., Poornachandran, P., & Venkatraman, S. (2019). Robust intelligent malware detection using deep learning. IEEE Access, 7, 46717–46738.
Xiang, C., & Jianlong, Y. (2016). Research on the current situation of China's tourism development. Value engineering, 35(06), 219-222.
Xiong, Y., & Schneider, J.B. (1992). Shortest path within polygon and best path around or through barriers. Journal of Urban Planning & Development, 118(2), 65-79.
Yue, S. (2017). Imbalanced malware images classification: A CNN Based Approach. arXiv preprint arXiv:1708.08042, 2017.
Yu, B., Yang, Z., & Yao, J. (2010). Genetic algorithm for bus frequency optimization. Journal of Transportation Engineering, 136(6), 576-583.
Zhang, Y., Jiao, L., Yu, Z., Lin, Z., & Gan, M. (2020). A tourism route-planning approach based on comprehensive attractiveness. IEEE Access, 8, 39536-39547.
Received: 02-Jan-2022, Manuscript No. ASMJ-21-10048; Editor assigned: 04-Jan-2022, PreQC No. ASMJ-21-10048 (PQ); Reviewed: 09-Jan-2022, QC No. ASMJ-21-10048; Revised: 11-Jan-2022, Manuscript No. ASMJ-21-10048 (R); Published: 16-Jan-2022