Several topics in the topicDictionary.txt file are not associated with any articles in the training data, for example tunisiaattack2015, turkeycoupattempt, sanbernardinoshooting, parisattacks and orlandoterrorattack.
This is not surprising, since these tags relate to events that occurred after the period covered by the training data!
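For anyone who wants to check this on their own copy of the data, here is a minimal sketch of how such tags can be found. It assumes that topicDictionary.txt lists one tag per line with the tag in the first column, and that each training article's labels are available as a list of tag strings. Both file-format details are guesses, and the function names and toy labels are hypothetical.

```python
from collections.abc import Iterable


def load_topic_dictionary(path: str) -> list[str]:
    # Assumption (hypothetical format): one tag per line, tag in the first
    # whitespace-separated column; blank lines are skipped.
    with open(path, encoding="utf-8") as f:
        return [line.split()[0] for line in f if line.strip()]


def find_unseen_topics(
    dictionary_topics: Iterable[str],
    training_labels: Iterable[Iterable[str]],
) -> list[str]:
    """Return dictionary tags that never appear as a label on any training article."""
    # Collect every tag used at least once in the training labels.
    seen = {tag for labels in training_labels for tag in labels}
    # Set difference gives the tags with zero training examples; sort for stable output.
    return sorted(set(dictionary_topics) - seen)


if __name__ == "__main__":
    # Toy data for illustration only, not the real competition files.
    dictionary = ["brexit", "parisattacks", "tunisiaattack2015", "football"]
    train = [["brexit", "football"], ["football"]]
    print(find_unseen_topics(dictionary, train))  # → ['parisattacks', 'tunisiaattack2015']
```

A plain set difference is enough here because the only question is whether a tag has zero training examples. Counting occurrences would only matter if you also wanted to flag rare tags.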
Can you confirm whether unseen tags like these will occur in the test data and count toward the accuracy score?
If so, do you have any thoughts to share on applying a supervised learning approach in this setting?