


Academic | Center researchers Lü Xiaoling (吕晓玲) and Wang Feifei (王菲菲) publish a paper on topic change-point detection in text in DATA MIN KNOWL DISC


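Center researchers Lü Xiaoling and Wang Feifei have published a paper in Data Mining and Knowledge Discovery. The study addresses topic mining in dynamic text data and proposes a new Topic-CD model. The model combines traditional topic modeling with change-point detection, so it can be used to detect change points in text topics and help users better understand how topics in dynamic text evolve.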


Topic Change-point Detection Using a Mixed Bayesian Model


Dynamic text documents, including news articles, user reviews, and blogs, are now commonly encountered in many fields. Accordingly, the topics underlying text streams also change over time. To grasp the topic changes in the increasing accumulation of text documents, there is a great need to develop automatic text analysis models to find the key changes in topics. To this end, this study proposes a topic change-point detection (Topic-CD) model. Different from previous studies, we define the change point of topics from the perspective of hyperparameters associated with topic-word distributions. This allows the model to detect change points underlying the whole topic set. Under this definition, the topic modeling and change point detection are combined in a unified framework and then performed simultaneously using a Markov chain Monte Carlo algorithm. In addition, the Topic-CD model is free from setting the number of change points in advance, which makes it more convenient for practical use. We investigate the performance of the Topic-CD model numerically using synthetic data and three real datasets. The results show that the Topic-CD model identifies the change points in topics well when compared with several state-of-the-art methods.





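Wang Feifei is an associate professor in the School of Statistics. Wang's research focuses on text mining and its business applications, social network analysis, and big data modeling, with papers published in leading Chinese and international journals, including Journal of Econometrics, Journal of Business & Economic Statistics, Journal of Machine Learning Research, and SCIENTIA SINICA Mathematica (中国科学:数学). Wang has led and participated in a number of projects, including National Natural Science Foundation of China projects, major social science projects of the Ministry of Education, and National Key R&D Program projects. Wang received second prize in the Young Teachers' Teaching Skills Competition and an Excellence Award for Online Teaching.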
