Conversation Thread Extraction And Topic Detection In Text-Based Chat
Published 2010 · Computer Science
Abstract: Text-based chat systems are widely used within the Department of Defense, but the standard systems available do not provide robust capabilities for search, information retrieval, or information assurance. The objective of this research is to explore methods for extracting conversation threads from text-based chat systems in order to enable such tasks. As part of the research, we manually annotated over 20,000 Internet Relay Chat posts with conversation thread information and constructed a probabilistic model for automatically classifying posts according to conversation thread. We also provide an algorithm for extracting these conversation threads from the chat session to form discrete documents that may be used in a vector space model information retrieval system. We describe how this technique can be used to support search and data mining systems, as well as auditing tasks and guard functions in a security system. Using the developed probabilistic models, we have achieved classification results on par with those of human annotators.
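As a rough illustration of the last step the abstract describes (turning extracted threads into discrete documents for a vector space model), the following minimal sketch assumes that each post already carries a thread label. It is not the authors' algorithm or probabilistic classifier. The function names, the whitespace tokenizer, and the particular TF-IDF weighting are assumptions chosen only for the example.

```python
import math
from collections import Counter, defaultdict


def threads_to_documents(posts):
    """Group (thread_id, text) pairs into one token list per thread.

    Assumes thread labels are already known; tokenization is a plain
    lowercase whitespace split (an assumption for this sketch).
    """
    docs = defaultdict(list)
    for thread_id, text in posts:
        docs[thread_id].extend(text.lower().split())
    return dict(docs)


def tfidf_vectors(docs):
    """Build a sparse TF-IDF vector (dict term -> weight) per thread document."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))  # count each term once per document
    # Smoothed IDF so terms present in every document keep a nonzero weight.
    idf = {term: math.log(n_docs / df) + 1.0 for term, df in doc_freq.items()}
    vectors = {}
    for thread_id, tokens in docs.items():
        term_freq = Counter(tokens)
        vectors[thread_id] = {t: term_freq[t] * idf[t] for t in term_freq}
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, vectors, idf):
    """Rank thread ids by cosine similarity to the query, best match first."""
    term_freq = Counter(query.lower().split())
    # Query terms unseen in the collection get weight 0.0.
    q = {t: term_freq[t] * idf.get(t, 0.0) for t in term_freq}
    return sorted(vectors, key=lambda tid: cosine(q, vectors[tid]), reverse=True)
```

In this sketch, interleaved posts from the same thread are merged into one document, so a query is matched against whole conversations rather than isolated chat lines, which is the motivation the abstract gives for thread extraction before retrieval.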