For large text corpora, the task of extracting and tracking information about topics, authors, and opinions is very challenging. Applications are numerous and span various domains, including social networks. The authors’ proposed model is a novel contribution to this research area. It is closely related to other probabilistic models, such as latent Dirichlet allocation (LDA) [1] and McCallum’s model [2].
In this paper, Rosen-Zvi et al. propose a new generative model for document collections. Their author-topic (AT) model differs from McCallum’s in that each author is associated with a distribution over topics. This approach leads to numerous applications, such as word sense disambiguation and information retrieval (IR), which are described in detail. Although the authors present a well-grounded, detailed theoretical basis, the choice of fixing the hyperparameters α and β could have been discussed in more depth. The paper also lacks a formal and experimental comparison with a different type of approach, such as a graph-based one [3]. Furthermore, the authors compare their approach with term frequency-inverse document frequency (tf-idf) as if it were an algorithm; in fact, tf-idf is a weighting formula that (sometimes) yields a better representation of textual data, typically in an IR task. Hence, the comparison between the AT model and tf-idf needs more in-depth investigation.
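To make the distinguishing feature concrete, the generative process of the AT model can be sketched as follows: for each word token, an author is chosen uniformly from the document’s author list, a topic is drawn from that author’s topic distribution, and a word is drawn from that topic’s word distribution. The sketch below assumes this process under hypothetical variable names (theta for author-topic distributions, phi for topic-word distributions); it is an illustration, not the authors’ implementation.

```python
import random

def generate_document(authors, theta, phi, n_words, seed=0):
    """Sketch of the author-topic generative process.

    authors: list of author ids for this document
    theta:   dict mapping author id -> list of topic probabilities
    phi:     list of topic-word distributions (one list of word
             probabilities per topic)
    """
    rng = random.Random(seed)
    words = []
    for _ in range(n_words):
        x = rng.choice(authors)                        # author for this token
        z = rng.choices(range(len(phi)), theta[x])[0]  # topic ~ theta[x]
        w = rng.choices(range(len(phi[z])), phi[z])[0] # word ~ phi[z]
        words.append(w)
    return words

# Toy usage: one author who always picks topic 0, which always emits word 0.
doc = generate_document(["a"], {"a": [1.0, 0.0]},
                        [[1.0, 0.0], [0.0, 1.0]], n_words=5)
```

In contrast, McCallum’s earlier model associates a single topic distribution with each author-word pair rather than letting every author mix over all topics per document, which is precisely the modeling difference the review highlights.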
In summary, the authors present an interesting and well-grounded model. That being said, prospective readers should be reasonably familiar with Bayesian statistics.