```{r model-setup, include=F, cache=F, message=F}
knitr::opts_chunk$set(comment=NA, error=T, cache=F, message=F)
library("litdata")
library("dplyr")
options(java.parameters="-Xmx1g") # set the Java heap size before mallet starts the JVM
library("mallet")
```

We can create a topic model of the novels corpus using the same steps you've seen before. How many topics? Jockers suggests using the same number of topics as there are novels, for "purely pedagogical" reasons (144). We could decide to vary this parameter and try again, but let's go with it:

```{r initialize-model}
n_topics <- 43 # same as the number of files, per Jockers
novel_model <- MalletLDA(n_topics)
seed <- 42 # fix the random seed so the run is reproducible
novel_model$model$setRandomSeed(as.integer(seed))
# optimize hyperparameters every 20 iterations, after a burn-in of 50
novel_model$setAlphaOptimization(20, 50)
```

Assuming the instances file has been created, we can load it:

```{r load-instances}
novel_model$loadDocuments("novels.mallet")
```

And run the model:

```{r train-model}
n_iterations <- 500
novel_model$train(n_iterations)
# finish with a few iterations that assign each word its single most probable topic
novel_model$maximize(10)
```

We must save the results. The essential file is the final sampling state:

```{r export-state}
write_mallet_state(novel_model, "novels_model_state.gz")
```

If we saved no other files, at this point we would lose the chunk names, and hence the record of which "documents" belong to which novel. So we should record that metadata as well. It is stored on the MALLET model object and can be retrieved and saved with:

```{r export-chunk-names}
novel_model$getDocumentNames() %>% writeLines("chunk_names.txt")
```

This file is simply the chunk names we created, in the order known to MALLET. We will use it when we read in the sampling state and transform document numbers back into chunk names.

We still haven't stored all the information we could. We should file away the estimated model hyperparameters as well, and possibly the entire model object too, as demonstrated on [Homework 10](http://rci.rutgers.edu/~ag978/litdata/hw10). But in this case we won't use those parameters, so I won't save them.
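If you did want to file the hyperparameters away, here is a sketch of one way it might be done (this is not the Homework 10 procedure). It assumes, as in the chunks above, that the underlying `ParallelTopicModel` is reachable as `novel_model$model`, and that rJava will convert its `alpha` field (one value per topic) and `beta` field (a single value) into ordinary R vectors; the file names here are made up for the example.

```{r export-hyperparameters, eval=F}
# sketch only (not run): save the estimated hyperparameters under illustrative file names
data.frame(topic=1:n_topics, alpha=novel_model$model$alpha) %>%
    write.csv("novels_model_alpha.csv", row.names=F)
writeLines(as.character(novel_model$model$beta), "novels_model_beta.txt")
```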