Clustify groups related documents into labeled clusters, providing an overview of the document set and allowing the user to review and categorize related documents together for greater efficiency and consistency. The user chooses whether to group documents that are conceptually similar, are near-duplicates, or belong to the same email thread. Version 3.1 adds the option to automatically ignore email headers, footers, email addresses, and other clutter that can reduce the quality of the results.
Email can be problematic for text analytics because the substantive part of a message is often short but accompanied by a large amount of unimportant text in the headers and footers. The extraneous text can lead the software to choose less informative labels for the clusters, and can cause emails to be grouped together because they share long disclaimers in their footers rather than because they have something important in common. Replies in an email thread can have header and footer text from parent emails embedded in the middle of the body, making the clutter difficult to identify and remove. Clustify 3.1 handles this automatically: it ignores the clutter when analyzing documents and ghosts it when displaying them, so a reviewer can see everything with the proper emphasis.
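To make the idea of ignoring email clutter concrete, here is a minimal Python sketch of the general technique. It is not Clustify's implementation, which is not described here. The function name strip_email_clutter, the header field list, and the footer marker phrases are all assumptions chosen for illustration. The sketch drops header lines (including quoted ones embedded in replies), skips disclaimer blocks up to the next blank line, and removes email addresses before the text would be passed to an analysis step.

```python
import re

# Hypothetical patterns for illustration only; real email needs a broader set.
# Matches header lines, including ones quoted with ">" inside a reply.
HEADER_RE = re.compile(r"^\s*>*\s*(from|to|cc|bcc|sent|date|subject):", re.IGNORECASE)
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# Assumed phrases that open a disclaimer footer.
FOOTER_MARKERS = ("confidentiality notice", "this email and any attachments")


def strip_email_clutter(text):
    """Return text with header lines, disclaimer blocks, and addresses removed."""
    kept = []
    in_footer = False
    for line in text.splitlines():
        lowered = line.strip().lower()
        if in_footer:
            # A disclaimer block is assumed to end at the next blank line.
            if not lowered:
                in_footer = False
            continue
        if any(lowered.startswith(marker) for marker in FOOTER_MARKERS):
            in_footer = True
            continue
        if HEADER_RE.match(line):
            continue
        kept.append(EMAIL_RE.sub("", line))
    return "\n".join(kept).strip()
```

Because the footer check runs on every line rather than only at the end of the message, the same pass also catches disclaimers from parent emails that sit in the middle of a reply. A display layer could use the same line classification to dim the clutter instead of deleting it.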
"We constantly push Clustify to produce better results with minimal effort by the user, and this is a solid step in that direction," says Bill Dimm, the CEO of Hot Neuron. "Version 3.1 makes it even easier to get good, clean results without a lot of tweaking or data polishing by the user."
About Hot Neuron
Hot Neuron LLC is an information retrieval software and services company located in Havertown, Pa. Its Clustify software (http://www.cluster-
Clustify is a trademark of Hot Neuron LLC. Hot Neuron is a registered service mark of Hot Neuron LLC.