About our text analysis data science software

Collaborative text analytics for humans and machine learning

We provide dozens of multilingual features for text mining, data science, human annotation, and machine learning. DiscoverText offers a range of simple to advanced cloud-based software tools that let users quickly and accurately evaluate large amounts of text data. Our customers sort the unstructured free text common in market research, along with its associated metadata, drawn from customer feedback platforms, CRMs, chats, email, large-scale HR and other surveys, public comments to government agencies, Twitter, RSS feeds, and other sources of text data. Find out why we are ranked #1 for text, metadata, and social network analysis support and trusted by hundreds of academic research groups.

Collect, clean, and analyze text data

Unstructured text data is messy

Data scientists working on text analytics know that cleaning data can be time-consuming. DiscoverText users build reusable custom machine classifiers, or “sifters,” to find the most (or least) relevant items before applying other classifiers that sort items into topic, sentiment, and other categories. DiscoverText combines hybrid data science methods (measurement, adjudication, iteration, replication) with established eDiscovery text analytics tools to shorten a process that used to take weeks or months when words were sorted in spreadsheets. Our machine-learning sifters are created in hours, or in just a few minutes using crowdsourcing. We offer an API and support technical integrations with Twitter and SurveyMonkey. Academics trust DiscoverText to help them do better, more transparent research, resulting in more scholarly publications. Legal teams use our document redaction capability to remove names, metadata, email addresses, and other sensitive information and to produce Bates-stamped, spreadsheet-indexed PDF collections.
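To illustrate the general idea behind pattern-based redaction, here is a minimal sketch in Python that replaces email addresses with a placeholder. It is not DiscoverText's redaction engine: the function name `redact_emails`, the `[REDACTED]` placeholder, and the simplified email pattern are all hypothetical choices made for this example, and a production tool would need broader patterns plus handling for names and metadata.

```python
import re

# Simplified email pattern (a hypothetical assumption for illustration;
# real-world addresses can be more varied than this pattern allows).
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_emails(text: str, placeholder: str = "[REDACTED]") -> str:
    """Replace every substring matching EMAIL_PATTERN with a placeholder."""
    return EMAIL_PATTERN.sub(placeholder, text)


if __name__ == "__main__":
    sample = "Contact jane.doe@example.com or support@example.org for details."
    print(redact_emails(sample))
    # prints: Contact [REDACTED] or [REDACTED] for details.
```

A regular expression keeps the sketch short and dependency-free; the same `re.sub` approach extends to other structured identifiers, such as phone numbers, by adding patterns.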

Humans and machines classify text

Point-and-click software anyone can master

Humans are good at some things, and computers are good at others. A consistent back-and-forth between humans and machines increases the ability of both to learn. Our text analytics software and data science methods grew out of a decade of National Science Foundation-funded research into the measurements that accelerate machine learning. Text classification is an old, hard problem, according to no less than Plato. Our proven adjudication method creates gold-standard training sets for machine learning by ranking human annotators over time. Our patented CoderRank approach is critical for ensuring accurate, reliable results when the work of humans or machines is finally evaluated.
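The details of CoderRank are not described here, so the following Python sketch is only a simple stand-in that shows the general idea of ranking annotators against adjudicated labels. It scores each annotator by the share of their labels that match the adjudicated (gold) label and sorts by that score. The function name `rank_annotators`, the tuple format, and the tie-breaking rule are hypothetical assumptions, not the patented method.

```python
from collections import defaultdict


def rank_annotators(annotations, gold):
    """Rank annotators by agreement with adjudicated (gold) labels.

    annotations: iterable of (annotator, item_id, label) tuples.
    gold: dict mapping item_id -> adjudicated label.
    Returns a list of (annotator, accuracy, n_scored) sorted from most to
    least accurate. Items without a gold label are ignored.
    """
    correct = defaultdict(int)
    scored = defaultdict(int)
    for annotator, item_id, label in annotations:
        if item_id not in gold:
            continue  # not yet adjudicated, so it cannot be scored
        scored[annotator] += 1
        if label == gold[item_id]:
            correct[annotator] += 1
    ranking = [
        (annotator, correct[annotator] / scored[annotator], scored[annotator])
        for annotator in scored
    ]
    # Sort by accuracy, breaking ties by how many items were scored.
    ranking.sort(key=lambda row: (row[1], row[2]), reverse=True)
    return ranking


if __name__ == "__main__":
    gold = {"t1": "positive", "t2": "negative", "t3": "neutral"}
    annotations = [
        ("alice", "t1", "positive"),
        ("alice", "t2", "negative"),
        ("alice", "t3", "positive"),
        ("bob", "t1", "positive"),
        ("bob", "t2", "positive"),
        ("carol", "t1", "positive"),
        ("carol", "t4", "negative"),  # t4 has no gold label yet
    ]
    for annotator, accuracy, n in rank_annotators(annotations, gold):
        print(f"{annotator}: {accuracy:.2f} over {n} items")
```

Recomputing the ranking as more items are adjudicated would make it change over time. Note that in this example "carol" ranks first on a single scored item; a more careful scheme would likely weight or smooth scores based on small samples.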

eDiscovery tools that work

Advanced search and sampling techniques

Deduplication and automated clustering of near-duplicates give users a high-level view of the data landscape. With Twitter data, these groupings are a roadmap to the digital footprint of viral Tweets. With public comment data, these groupings reveal form letters and modified versions of them. In large-scale surveys, duplicates and near-duplicates often represent widely held but independently expressed opinions among customers or employees. Our interactive machine classifier histograms allow data science teams to identify the items in a collection that add the most value when coded by humans. These text analytics tools enable purposive sampling that further accelerates the training of machine classifiers.
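As a rough illustration of how near-duplicates can be grouped, here is a minimal Python sketch that compares documents by the Jaccard similarity of their overlapping three-word windows ("shingles") and greedily assigns each document to the first sufficiently similar group. This is a common textbook technique, not necessarily the one DiscoverText uses; the function names, the shingle size `k=3`, and the `threshold=0.5` cutoff are hypothetical choices for the example.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two sets (1.0 if both empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_near_duplicates(texts, threshold=0.5, k=3):
    """Greedily assign each text to the first group whose representative
    (the group's first member) is at least `threshold` similar; otherwise
    start a new group. Returns a list of groups of indices into `texts`.
    """
    groups = []  # each entry: (representative shingle set, member indices)
    for index, text in enumerate(texts):
        sig = shingles(text, k)
        for rep, members in groups:
            if jaccard(sig, rep) >= threshold:
                members.append(index)
                break
        else:
            groups.append((sig, [index]))
    return [members for _, members in groups]


if __name__ == "__main__":
    comments = [
        "Please protect the wetlands in our county from new development.",
        "Please protect the wetlands in our county from any new development.",
        "The new tax proposal will hurt small businesses.",
    ]
    print(group_near_duplicates(comments))
```

Here the first two comments share 6 of 11 distinct shingles (similarity about 0.55), so they land in one group, while the third starts its own, giving `[[0, 1], [2]]`. Comparing every document against every group becomes slow for large collections; techniques such as MinHash with locality-sensitive hashing are typically used to approximate Jaccard similarity at scale.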