Here we use the reuters dataset from the textanalysis package, as a larger corpus helps to better demonstrate coherence scoring. The reuters dataset is a set of Reuters articles on 10 different commodities.

You can also apply model_coherence to multiple models at once using map_coherence.

```r
# create a model collection
models
#> ℹ A collection of 2 models.

# compute topic coherence
model_collection
#> # A tibble: 2 x 3
#>   num_topics coherence coherence_model
#> 1          2     -14.7
#> 2         10     -14.7
```

Overall this is a decent score, but I'm not too concerned with the actual value. 10 topics was a close second in terms of coherence score (.432), so you can see that it could also have been selected with a different set of parameters. The real test is going through the topics yourself to make sure they make sense for the articles. In the real world you will likely use the map_* functions to run multiple models at once, then assess which is best using the perplexity score.

Why does coherence track interpretability? Simply because a good LDA model usually comes up with better topics that are more human-interpretable, while a bad_lda_model fails to distinguish between two topics and produces topics that are not clear to a human. The u_mass and c_v topic coherences capture this nicely by giving the interpretability of the topics a number: the u_mass and c_v coherence for the good LDA model is much higher (better) than that for the bad LDA model. Hence this coherence measure can be used to compare different topic models based on their human-interpretability.
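To make the u_mass intuition concrete, here is a minimal from-scratch sketch in Python. The function name, toy documents, and topic word lists are all hypothetical (this is not the textanalysis or gensim implementation): for each pair of topic words, it takes the log ratio of their smoothed co-document frequency to the document frequency of the conditioning word, so word pairs that actually co-occur in the corpus score higher.

```python
from math import log

def u_mass_coherence(topic_words, documents):
    """Average UMass-style coherence for one topic (illustrative sketch)."""
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(w):
        # number of documents containing word w
        return sum(1 for d in doc_sets if w in d)

    def co_doc_freq(a, b):
        # number of documents containing both words
        return sum(1 for d in doc_sets if a in d and b in d)

    score, pairs = 0.0, 0
    for i in range(1, len(topic_words)):
        for j in range(i):
            wi, wj = topic_words[i], topic_words[j]
            # +1 smoothing keeps the log defined when the pair never co-occurs
            score += log((co_doc_freq(wi, wj) + 1) / doc_freq(wj))
            pairs += 1
    return score / pairs

# toy corpus of tokenised "articles" (hypothetical data)
docs = [
    ["oil", "barrel", "price", "crude"],
    ["oil", "crude", "export"],
    ["coffee", "harvest", "price"],
    ["coffee", "export", "brazil"],
]

# a topic whose words co-occur scores better than one whose words never do
print(u_mass_coherence(["oil", "crude"], docs) >
      u_mass_coherence(["oil", "coffee"], docs))  # True
```

This is why a "bad" model's topics score poorly: mixing words that never share a document drives each pairwise term toward log of a small smoothed fraction, pulling the average down.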