Computing Reviews
Learning author-topic models from text corpora
Rosen-Zvi M., Chemudugunta C., Griffiths T., Smyth P., Steyvers M. ACM Transactions on Information Systems 28(1): 1-38, 2010. Type: Article
Date Reviewed: Apr 30 2010

For large text corpora, extracting and tracking information about topics, authors, and opinions is very challenging. Applications are numerous and span various domains, including social networks. The authors’ proposed model is a novel contribution to this research area. It is closely related to other probabilistic models, such as latent Dirichlet allocation (LDA) [1] and McCallum’s model [2].

In this paper, Rosen-Zvi et al. propose a new generative model for document collections. Their author-topic (AT) model differs from McCallum’s in that each author is associated with a distribution over topics. This approach leads to numerous applications, such as word sense disambiguation and information retrieval (IR), which are described in detail. Although the authors present a well-grounded, detailed theoretical basis, the choice of fixing the hyperparameters α and β could have been discussed in more depth. The paper lacks a formal and experimental comparison with a different type of approach, such as a graph-based one [3]. Also, the authors compare their approach with term frequency-inverse document frequency (tf-idf) as if it were an algorithm. In fact, tf-idf is a weighting formula that (sometimes) gives a better representation of textual data, typically in an IR task. Hence, the comparison between AT models and tf-idf needs more in-depth investigation.
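
To make the role of the per-author topic distributions concrete, the generative process usually described for an author-topic model can be sketched as follows. This is a minimal illustration, not the authors’ implementation: the function name `generate_document`, the toy vocabulary, and the fixed `theta` and `phi` tables are assumptions made for the example. In the full model, these distributions are drawn from Dirichlet priors governed by α and β and are estimated from data.

```python
import random


def generate_document(authors, theta, phi, vocab, n_words, rng=None):
    """Sample (author, topic, word) triples for one document.

    theta[a] is author a's distribution over topics (hypothetical, fixed here);
    phi[t] is topic t's distribution over the words in vocab (also fixed here).
    """
    rng = rng or random.Random()
    topics = list(range(len(phi)))
    triples = []
    for _ in range(n_words):
        a = rng.choice(authors)                       # one of the document's authors, uniformly
        t = rng.choices(topics, weights=theta[a])[0]  # a topic from that author's mixture
        w = rng.choices(vocab, weights=phi[t])[0]     # a word from that topic
        triples.append((a, t, w))
    return triples


# Toy setup (all values are illustrative assumptions).
vocab = ["model", "topic", "query", "index"]
phi = [[0.5, 0.5, 0.0, 0.0],   # topic 0 favors modeling terms
       [0.0, 0.0, 0.5, 0.5]]   # topic 1 favors retrieval terms
theta = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}

doc = generate_document(["alice", "bob"], theta, phi, vocab,
                        n_words=8, rng=random.Random(0))
print(doc)
```

In this toy setup each author writes about exactly one topic, so every sampled word can be traced back to an author through its topic. Inference in the actual model runs in the opposite direction, recovering the author and topic assignments from the observed words.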

In summary, the authors present an interesting and well-grounded model. That being said, potential readers should be fairly familiar with Bayesian statistics.

Reviewer:  Julien Velcin Review #: CR137949 (1009-0947)
1) Blei, D.M.; Ng, A.Y.; Jordan, M.I. Latent Dirichlet allocation. Journal of Machine Learning Research 3 (2003), 993–1022.
2) McCallum, A. Multi-label text classification with a mixture model trained by EM. In AAAI Workshop on Text Learning, 1999.
3) Mei, Q.; Cai, D.; Zhang, D.; Zhai, C. Topic modeling with network regularization. In Proceedings of the 17th International Conference on World Wide Web. ACM, 2008, 101–110.
Reviewer Selected
Markup Languages (I.7.2 ... )
Clustering (H.3.3 ... )
Text Analysis (I.2.7 ... )
Information Search And Retrieval (H.3.3 )
Natural Language Processing (I.2.7 )