LLMs vs Tagmatic for Automated Text Classification

What's this comparison about?

We compared ChatGPT's performance with Tagmatic's
We checked the important auto-tagging capabilities
We used some simple benchmarking tests
We repeated the tests with Bard
The results were illuminating

LLM and Tagmatic Comparison - summer 2023

LLMs such as ChatGPT and Bard have impressive capabilities in many areas. This comparison covers the essential auto-tagging capabilities relied upon by broadcasters, newsrooms, newspapers, magazines, and B2B publishers globally.

	LLMs	Tagmatic
Does it tell us what the text is about? Given a body of text, does the system respond with what that text is about?
Does it automatically predict tags using in-house tags (1000s in multiple classes)? Each business has its own custom vocabularies to maximize value and to focus on specialized domains. Is the system able to use this custom vocabulary for auto-tagging?
Does it learn new tags on the fly from an ever-changing vocabulary? As journalists, writers, and editors create new tags around their subject matter, is the system able to start training immediately so that it can accurately predict these new tags in future?
Does it continuously monitor and optimize each tag’s accuracy? Based on a feedback loop from subject matter experts, is the system able to continuously optimize each tag individually?
Does it follow and encourage a “house style” for tagging? Based on tagging activity fed back into the system by editorial users, is the system able to supply tag predictions that reflect the learned “house style”?
Does it deliver accurate predictions consistently? Is the system able to reliably apply the same predictions each time it is asked the same question (assuming the relevant training corpora have not changed)?
Does it demonstrate explainability for each tag prediction? Is the system able to provide for each tag prediction the confidence, the last date-time the tag was optimized, and the details of the related training corpus source and size?
Does it produce trustworthy results? Is the system able to provide results that are free of incorrect or fictitious information (also known as “AI hallucinations”)?
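
As an illustration only, the explainability and consistency criteria above could be checked along the following lines. This is a minimal sketch, not Tagmatic's API or any LLM vendor's API: the `TagPrediction` record, the `is_repeatable` helper, and the `toy_classifier` stand-in are all hypothetical names, and the confidence, date, and corpus values are made-up placeholders.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, List


@dataclass(frozen=True)
class TagPrediction:
    """Hypothetical record holding the explainability fields listed in the comparison."""
    tag: str
    confidence: float          # 0.0 to 1.0
    last_optimized: datetime   # last date-time the tag was optimized
    corpus_source: str         # where the training corpus came from
    corpus_size: int           # number of training documents


def is_repeatable(classify: Callable[[str], List[TagPrediction]],
                  text: str, runs: int = 3) -> bool:
    """Return True if `classify` gives the same tags and confidences on every run."""
    baseline = [(p.tag, p.confidence) for p in classify(text)]
    return all(
        [(p.tag, p.confidence) for p in classify(text)] == baseline
        for _ in range(runs - 1)
    )


def toy_classifier(text: str) -> List[TagPrediction]:
    """Deterministic stand-in for a real tagger: a plain keyword lookup."""
    vocabulary = {"election": "Politics", "goal": "Football"}  # placeholder tags
    optimized = datetime(2023, 6, 1, 12, 0)                    # placeholder date
    return [
        TagPrediction(tag, 0.9, optimized, "in-house archive", 1200)
        for keyword, tag in vocabulary.items()
        if keyword in text.lower()
    ]
```

A check such as `is_repeatable(toy_classifier, "Election night coverage")` exercises the consistency question, while the fields on each `TagPrediction` correspond to the details named in the explainability question. A real evaluation would replace `toy_classifier` with calls to the system under test.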

About this feature comparison

This comparison was created from the outcomes of a Data Language "LLMs and Knowledge Graphs" hack event in April 2023.

The hack event looked at the challenges of getting information into a knowledge graph and getting intelligence out of it. ChatGPT was very useful for filling in missing properties and missing concepts in a knowledge graph, and especially good at summarizing information.

However, the Data Language teams had trouble keeping ChatGPT "on the rails" when it came to executing reliable, structured tasks. Automated content classification is one such task: it requires control, precision, explainability, and repeatability.

We expect LLMs to become more controllable over time, particularly with respect to "AI hallucinations".

We will keep tabs on these capabilities, and update this comparison accordingly.