Chat bots target popular chat networks to distribute spam and malware. In this paper, we first conduct a series of measurements on a large commercial chat network. Our measurements capture a total of 14 different types of chat bots ranging from simple to advanced. Moreover, we observe that human behavior is more complex than bot behavior.


Although IRC has existed for a long time, it has not gained mainstream popularity. The first-generation chat bots were designed to help operate chat rooms or to entertain chat users.

Our experimental evaluation shows that the proposed classification system is highly effective in differentiating bots from humans. We observed four basic text obfuscation methods that chat bots use to evade filtering or detection.

Section 3 details our measurements of chat bots and humans. An effective detection system against chat bots is in great demand but still missing.

Measurement and Classification of Humans and Bots in Internet Chat

There are two main components in our classification system: (1) an entropy classifier and (2) a machine-learning classifier. Our measurements focus mainly on short-term statistics, as these are the most likely to be useful in chat bot classification.
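To make the idea of a timing-based entropy feature concrete, the following is a minimal sketch, not the system's actual estimator. It assumes a plain Shannon entropy over inter-message delays grouped into fixed-width bins; the bin width, the function names, and the sample timestamps are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def inter_message_delays(timestamps):
    """Return the gaps (in seconds) between consecutive message timestamps."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def shannon_entropy(values, bin_width=5.0):
    """Shannon entropy in bits of the values after fixed-width binning.

    The bin width is an illustrative assumption, not a value from the paper.
    """
    if not values:
        return 0.0
    bins = Counter(int(v // bin_width) for v in values)
    total = len(values)
    return sum((n / total) * math.log2(total / n) for n in bins.values())


# Hypothetical timestamps in seconds, for illustration only.
periodic_sender = [0, 60, 120, 180, 240, 300]
irregular_sender = [0, 4, 37, 41, 95, 210]

periodic_entropy = shannon_entropy(inter_message_delays(periodic_sender))
irregular_entropy = shannon_entropy(inter_message_delays(irregular_sender))

print(f"periodic sender:  {periodic_entropy:.3f} bits")
print(f"irregular sender: {irregular_entropy:.3f} bits")
```

In this toy data the perfectly regular sender collapses into a single bin and scores zero, while the irregular sender spreads over several bins and scores higher. A usable classifier would need many more messages per sender and a calibrated threshold.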

At the same time, Yahoo! However, the large user base and open nature of Internet chat make it an ideal target for malicious exploitation. The remainder of this paper is structured as follows. The different types of chat bots are determined by their triggering mechanisms and text obfuscation schemes.

However, the usage and behavior of bots in botnets are quite different from those of chat bots. The November worms attempted to send malicious links but were blocked by Yahoo! In the paper, we first perform a series of measurements on a large commercial chat network, Yahoo! By early October, chat bots were found in Yahoo! There are individual chat logs from 21 different chat rooms.

Some new features that make IM systems more user-friendly have been back-ported to chat systems.

For example, IRC, a classic chat system, implements a number of IM-like features, such as presence and file transfers, in its current versions. By this method, a template with several synonyms for multiple words can lead to thousands of possible messages. Chat bots employ many text obfuscation techniques used by spam, such as word padding and synonym substitution. Mostly insults and trolling, with lots of bad spelling and bad grammar.
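The combinatorial growth behind synonym substitution can be illustrated with a short sketch. The template, its slots, and the function names below are hypothetical and are not taken from the measured chat logs; the point is only that the number of distinct messages is the product of the number of options per slot, which is why exact-match filters struggle against this technique.

```python
import itertools
import math

# Hypothetical template: each slot lists interchangeable options.
template = [
    ["Hi", "Hello", "Hey"],
    ["check out", "visit", "look at", "see"],
    ["my", "this"],
    ["new", "cool", "great", "awesome", "fun"],
    ["site", "page", "profile"],
    ["today", "now", "tonight", "soon"],
]


def count_variants(slots):
    """Number of distinct messages: the product of the slot sizes."""
    return math.prod(len(options) for options in slots)


def expand(slots):
    """Lazily generate every message the template can produce."""
    for choice in itertools.product(*slots):
        yield " ".join(choice)


variant_count = count_variants(template)
first_variant = next(expand(template))

print(variant_count)
print(first_variant)
```

Six slots with between two and five options each already yield more than a thousand distinct messages, and each extra slot multiplies that count again.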

Internet chat is also a unique networked application, because of its human-to-human interaction and low bandwidth consumption [9]. Based on the measurement study, we propose a classification system to accurately distinguish chat bots from humans.

To create such datasets, we perform log-based classification by reading and labeling a large number of chat logs. The former relates to message timing, and the latter relates to message content. A few countermeasures have been used to defend against the abuse of chat bots, though none of them are very effective.

However, their evaluation is based on a corpus of short e-mail spam messages, due to the lack of data on spim. Users connect to a chat server via chat clients that support a certain chat protocol, and they may browse and join many chat rooms featuring a variety of topics.

We further divide chat bots into four groups: periodic bots, random bots, responder bots, and replay bots. A responder bot sends messages based on programmed responses to specific content in messages posted by other users.
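The triggering mechanism of a responder bot can be pictured as a lookup from message content to canned replies. The sketch below is an assumption about how such a mechanism might look, not a reconstruction of any observed bot; the trigger patterns, replies, and the `respond` function are hypothetical.

```python
import re

# Hypothetical trigger table: (pattern, canned reply).
TRIGGERS = [
    (re.compile(r"\b(hi|hello)\b", re.IGNORECASE), "hello there"),
    (re.compile(r"\bhow are you\b", re.IGNORECASE), "doing fine, and you?"),
]


def respond(message):
    """Return the reply for the first matching trigger, or None if none match."""
    for pattern, reply in TRIGGERS:
        if pattern.search(message):
            return reply
    return None


print(respond("Hello everyone"))
print(respond("anyone seen the game"))
```

Because such a bot only posts when a trigger matches, its message timing follows the humans it reacts to, which distinguishes it from bots that post on a fixed or randomized schedule.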

In addition, our examiner checks the content of URLs and typically observes multiple instances of the same chat bot, which further improves our classification accuracy. Our logs also include some examples of malware spreading via chat rooms. While the entropy classifier requires more messages for detection and is thus slower, it is more accurate at detecting unknown chat bots.

While on-line systems are besieged with chat bots, no systematic investigation on chat bots has been conducted. Chat spam shares some similarities with e-mail spam. Our study is carried out on the Yahoo! In contrast, the machine-learning classifier is mainly based on message content for detection.
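A content-based classifier of this kind can be sketched with a small naive Bayes model over message words. The choice of naive Bayes, the whitespace tokenizer, and the training messages below are assumptions made for illustration; the text above only states that the machine-learning classifier relies on message content.

```python
import math
from collections import Counter

LABELS = ("bot", "human")


def tokenize(message):
    """Lowercase whitespace tokenizer (an illustrative simplification)."""
    return message.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with Laplace smoothing over message words."""

    def __init__(self):
        self.word_counts = {label: Counter() for label in LABELS}
        self.message_counts = Counter()

    def train(self, message, label):
        self.word_counts[label].update(tokenize(message))
        self.message_counts[label] += 1

    def log_score(self, message, label):
        # Assumes every label has at least one training message.
        counts = self.word_counts[label]
        vocabulary = set().union(*self.word_counts.values())
        total_words = sum(counts.values())
        total_messages = sum(self.message_counts.values())
        score = math.log(self.message_counts[label] / total_messages)
        for word in tokenize(message):
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        return score

    def classify(self, message):
        return max(LABELS, key=lambda label: self.log_score(message, label))


# Hypothetical labeled messages, for illustration only.
model = NaiveBayes()
model.train("check out my new pics at my site", "bot")
model.train("visit my site for free pics", "bot")
model.train("did anyone watch the game last night", "human")
model.train("lol that was a great game", "human")

print(model.classify("free pics at my site"))
print(model.classify("great game last night"))
```

A model like this can label a sender from very few messages once it has seen similar content, but it has nothing to go on when a bot uses wording absent from its training data.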

Although we did not perform detailed malware analysis on links posted in the chat rooms and Yahoo! The bots in botnets are malicious programs designed specifically to run on compromised hosts on the Internet, and they are used as platforms to launch a variety of illicit and criminal activities such as credential theft, phishing, and distributed denial-of-service attacks. To the best of our knowledge, this is the first large-scale measurement and classification of chat bots.

Fourth, and most interestingly, chat bots replay human phrases entered by other chat users.

