About our text analysis data science software
Collaborative text analytics for human and machine-learning
We provide dozens of multilingual, text mining, data science, human annotation, and machine-learning features. DiscoverText offers a range of simple to advanced cloud-based software tools that empower users to evaluate large amounts of text data quickly and accurately. Our users work in a web browser through a point-and-click graphical user interface to sort the unstructured free text common in market research, along with its associated metadata. The same kind of text is found in customer feedback platforms, CRMs, chats, email, large-scale HR and other open-ended survey answers, public comments to government agencies, Twitter, RSS feeds, and other sources. DiscoverText is a GSA-approved small business. Students and professors get free access, training, and project support directly from the founder. Read more than 100 authenticated Capterra reviews to find out why we are ranked #1 by Predictive Analytics Today for text, metadata, and Twitter data analysis and trusted by hundreds of academic research groups.
Collect, clean, and analyze text data
Unstructured text data is messy
Data scientists working on text analytics and machine-learning know that cleaning data can be time-consuming. Users of DiscoverText build reusable custom machine classifiers, or “sifters,” to find the most (or least) relevant items before using other classifiers to sort items into topic, sentiment, and other categories. DiscoverText combines hybrid data science methods (e.g., crowdsourcing, measurement, adjudication, iteration, replication, annotator ranking) with established e-discovery text analytics tools to shorten a process that used to take weeks or months when words were sorted in spreadsheets. With crowdsourcing, our machine-learning sifters can be created in hours or even just a few minutes. We offer an API and support technical integration with Twitter. Academics trust DiscoverText to help them do better, more transparent scientific research that results in scholarly publications. Legal teams use our document redaction capability to remove names, metadata, email addresses, and other sensitive information and to produce Bates-stamped, spreadsheet-indexed PDF collections.
Humans and machines classify text
Point-and-click software anyone can master
Humans are good at some things and computers are good at others. A consistent back and forth between humans and machines increases the ability of both to learn. Our text analytics software and data science methods originate in a decade of National Science Foundation-funded research into the measurements that accelerate machine-learning. Text classification is an old, hard problem, according to no less an authority than Plato. Our unique and proven method of adjudication creates gold standard training sets for machine-learning by ranking human annotators over time. A patented CoderRank approach is critical for ensuring accurate, reliable results when the work of humans or machines is finally evaluated. DiscoverText machine-learning is powered by uClassify.
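For readers who want a feel for what ranking annotators against adjudicated labels can look like, here is a minimal Python sketch. It is not the patented CoderRank method, whose details are not described here. It simply scores each annotator by how often their labels match an adjudicated gold standard. The function name `rank_annotators`, the tuple layout, and the sample data are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def rank_annotators(annotations, gold):
    """Rank annotators by agreement with adjudicated (gold) labels.

    annotations: iterable of (annotator, item_id, label) tuples.
    gold: dict mapping item_id -> adjudicated label.
    Returns a list of (annotator, accuracy, n_scored), best first.
    Items that have no adjudicated label are skipped.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for annotator, item_id, label in annotations:
        if item_id not in gold:
            continue  # nothing to compare against yet
        total[annotator] += 1
        if label == gold[item_id]:
            correct[annotator] += 1
    scores = [(a, correct[a] / total[a], total[a]) for a in total]
    # Higher accuracy first, then more scored items, then name for stability.
    return sorted(scores, key=lambda s: (-s[1], -s[2], s[0]))


# Hypothetical sample data: three annotators, three adjudicated items.
sample_annotations = [
    ("ana", 1, "pos"), ("ana", 2, "neg"), ("ana", 3, "pos"),
    ("ben", 1, "pos"), ("ben", 2, "pos"), ("ben", 3, "neg"),
    ("cy", 1, "pos"), ("cy", 4, "neg"),  # item 4 is not adjudicated
]
sample_gold = {1: "pos", 2: "neg", 3: "pos"}

ranking = rank_annotators(sample_annotations, sample_gold)
for name, accuracy, n_scored in ranking:
    print(f"{name}: {accuracy:.2f} on {n_scored} item(s)")
```

In this sketch an annotator with few scored items can still reach a perfect score, so the sort breaks ties in favor of annotators with more adjudicated items. A production ranking would likely also account for chance agreement and for changes over time.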
eDiscovery tools that work
Advanced search and sampling techniques
Deduplication and automated clustering of near-duplicates give users a high-level sense of the data landscape. With Twitter data, these groupings are a roadmap to the digital footprint of viral Tweets. With public comment data, these groupings are form letters and modified forms. In large-scale surveys, duplicates and near-duplicates are frequently held but independently expressed opinions among customers or employees. Our interactive machine classifier histograms allow data science teams to identify the items in a collection that add the most value when coded by humans. These text analytics tools enable purposive sampling that further accelerates the process of training machine classifiers.
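To make the idea of near-duplicate clustering concrete, the sketch below groups texts by the Jaccard similarity of their word shingles. This is one common textbook approach, shown under assumptions: it is not a description of how DiscoverText implements clustering, and the function names, the three-word shingle size, and the 0.5 threshold are illustrative choices only.

```python
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles for a text, lowercased."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_near_duplicates(texts, threshold=0.5, k=3):
    """Group text indices whose shingle sets have Jaccard >= threshold.

    Union-find merges chains of similar items into one cluster.
    The pairwise comparison is O(n^2): fine for a sketch, too slow
    for very large collections.
    """
    parent = list(range(len(texts)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    sets = [shingles(t, k) for t in texts]
    for i, j in combinations(range(len(texts)), 2):
        if jaccard(sets[i], sets[j]) >= threshold:
            parent[find(j)] = find(i)

    clusters = {}
    for i in range(len(texts)):
        clusters.setdefault(find(i), []).append(i)
    # Largest clusters first, e.g. a widely copied form letter.
    return sorted(clusters.values(), key=len, reverse=True)


# Hypothetical public comments: a form letter, a modified copy, and others.
comments = [
    "Please protect our national parks from new drilling leases",
    "Please protect our national parks from new drilling leases today",
    "I enjoy hiking with my dog on weekends",
    "Please protect our national parks from new drilling leases",
]
print(cluster_near_duplicates(comments))
```

With the sample comments, the exact copy and the lightly modified copy land in the same cluster as the original form letter, while the unrelated comment stays on its own. For large collections, an approximate technique such as MinHash with locality-sensitive hashing is the usual way to avoid comparing every pair.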