Our Vanderbilt University team uses DiscoverText (DT) to support qualitative text analysis of 8,531 high school students’ responses about their in-school experiences of bullying. DiscoverText has offered us powerful ways to perform key steps throughout our coding process. Fundamentally, DT supports parsing our large data set into archives, buckets, and datasets. Thus, we are able to focus on key portions of our large data set to hone our initial hierarchical coding structure while retaining the ability to return to an untouched dataset for final coding. We use the diverse annotation tools in DiscoverText to mark individual problematic items for discussion at meetings. Our team was able to develop a complex coding structure with 58 codes (at one point we had 128) and begin coding within a month and a half. Undoubtedly, DiscoverText’s robust organizational and annotation tools, within an easy-to-use interface, supported this expediency.

Following the development of our coding structure, we employed DiscoverText’s analytic tools to better understand and improve our team’s inter-coder reliability. DT’s real-time coding analytics support decision-making in meetings. Through the use of these tools, we raised our coding reliability from a kappa value of .20 to .82 after five training rounds. Given that four coders are using 58 hierarchical codes on over 8,000 free-response items, those numbers represent a phenomenal increase in reliability. Presently, we are halfway through coding the 8,531 items, using overlapping coding patterns to ensure reliability.

Our team members share their experiences below:

“I am currently working with a research team that must code students’ responses about their bullying experiences. I had never coded before and was introduced to DiscoverText only a few months ago. Fortunately, I have found DiscoverText to be very user-friendly and easy to navigate.
Despite my lack of formal coding experience, I have found the program to run smoothly and have already learned a great deal in such a short period of time. My favorite feature thus far would have to be the code-by-code comparisons. This allows us to discuss any discrepancies among the research team and to increase our reliability. I have enjoyed exploring the features of this program and look forward to discovering what more it can do.” – Abbie, undergraduate, Human and Organizational Development, honors track.

“My team is using DiscoverText to code thousands of brief responses to a survey question about bullying. As someone who is new to qualitative research and coding programs, I have found DiscoverText easy to use. The coding process was very easy for me to learn, and I quickly became efficient at coding responses. Our initial looks at code comparisons have been fairly straightforward for me to figure out as well. As we move forward with more analysis, I anticipate other functions and features of DiscoverText will be similarly straightforward, and I will see more of the power of the program.” – Brian, master’s student, Human Development Counseling.

“I’m working with DiscoverText as part of an academic research team analyzing high school students’ qualitative responses to questions about bullying. As we have been coding responses, we have found the coding process fairly smooth, although not without a few features that we would have done differently. Still, the process of coding is similar to that of other qualitative coding software (I’ve used NVivo). We haven’t yet gotten into any sophisticated filtering or analysis, but I’m expecting that it will be really useful. The biggest impression I’m left with after my three months of using DiscoverText is that it’s a powerful tool, and we’ve only scratched the surface of what it can do.” – Ben, doctoral student, Community Research and Action.
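For readers unfamiliar with the reliability statistic cited earlier: the team's kappa values refer to Cohen's kappa, which corrects raw coder agreement for the agreement expected by chance. DiscoverText computes this internally; the sketch below is only a minimal illustration of the statistic itself, using invented toy labels rather than the project's actual codes.

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders labeling the same items."""
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    # Observed agreement: fraction of items both coders labeled identically.
    po = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: derived from each coder's marginal label frequencies.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    pe = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / (n * n)
    return (po - pe) / (1 - pe)

# Two hypothetical coders apply codes to six responses.
a = ["verbal", "physical", "verbal", "social", "verbal", "physical"]
b = ["verbal", "physical", "social", "social", "verbal", "verbal"]
print(round(cohens_kappa(a, b), 2))  # → 0.48
```

Here the coders agree on 4 of 6 items (67%), but because some of that agreement would occur by chance, kappa lands at .48; perfect agreement yields 1.0.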
Overall, DiscoverText enabled our team’s timely progress through a complex research process. Following coding, we intend to make use of DT’s metadata “tagging” capabilities so that we can meaningfully export coded response summaries to their respective “tagged” schools. Finally, we intend to continue exploring the useful capabilities of DT in our research. We find DiscoverText easy to use and helpful; our questions have been kindly answered by the Texifter support team or solved through the helpful material on DT’s support site. Thanks a lot, DiscoverText!

– Joseph H. Gardella
Document relevance is a key challenge for social media research. The specific problem of “word sense disambiguation” is widespread. If I am interested in “banks” where money is stored, I want to exclude mentions of river banks. If I am Delta the airline, I do not want to see social data about Delta faucets, Delta Force, or those pesky river deltas. If I run a sports team like the Pittsburgh Penguins, the massive number of Facebook posts and tweets about flightless but adorable birds is equally problematic. Very few social media analytics projects can easily avoid the challenge of sorting relevant from irrelevant documents. At Texifter, we have refined a powerful set of tools and techniques for word sense disambiguation. This 5-minute video uses the example of Governor Chris Christie to illustrate how the five pillars of text analytics can help anyone identify and remove irrelevant documents from an ambiguous social data collection. The principles are very similar to spam filtering in email; we use the same mathematics. Using DiscoverText, we argue that an individual or small collaborative team can create a custom machine classifier for the task in just a few hours. Someday, we hope to get this down to a few minutes.
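The spam-filtering mathematics referenced here is, at its core, naive Bayes classification. The sketch below is a minimal, generic illustration of that idea applied to the “bank” example, not Texifter’s actual implementation: count words per class from a handful of hand-labeled documents, then score a new document by combining class priors with smoothed per-class word probabilities.

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """docs: list of (text, label) pairs. Returns class counts, per-class
    word counts, and the vocabulary."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        for w in text.lower().split():
            word_counts[label][w] += 1
            vocab.add(w)
    return class_counts, word_counts, vocab

def classify(text, class_counts, word_counts, vocab):
    """Pick the label with the highest log-probability, using Laplace
    (add-one) smoothing so unseen words never zero out a class."""
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, count in class_counts.items():
        score = math.log(count / total_docs)  # class prior
        total_words = sum(word_counts[label].values())
        for w in text.lower().split():
            score += math.log(
                (word_counts[label][w] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented toy training set: relevant = financial banks, irrelevant = river banks.
training = [
    ("the bank raised interest rates on savings accounts", "relevant"),
    ("deposit your paycheck at the bank branch", "relevant"),
    ("loan approved by the bank this morning", "relevant"),
    ("we fished along the muddy river bank", "irrelevant"),
    ("the canoe drifted toward the grassy bank", "irrelevant"),
    ("flood water spilled over the river bank", "irrelevant"),
]
model = train(training)
print(classify("bank approved my savings loan", *model))  # → relevant
```

With only six labeled documents the classifier already separates the two senses of “bank”; in practice, a human coder labels a sample in DiscoverText and the trained classifier is then applied to the full collection.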
A brief follow-up on Texifter. We successfully migrated DiscoverText to Microsoft’s Azure. The migration was very smooth, though we are going through a period of diminished search and filtering capabilities while the data re-indexes. All other capabilities appear stable. We also launched a new beta product on Azure that allows users to get free estimates (and buy the data) self-serve from the full history of Twitter. The live prototype is “Sifter” (https://sifter.texifter.com). Finally, I have been elected a board member and Treasurer of the Big Boulder Initiative (https://bigboulderconf.com/about/). In that capacity, I will help organize the social data industry association that will launch in June at Big Boulder. 2014 is looking good for Texifter. On January 31, 2014, the company re-acquired all assets and intellectual property related to DiscoverText, including the Sifter stack of language technologies for de-duplication, clustering, coding, and machine learning, as well as the “CoderRank” patent. Going forward, we believe these tools can make a significant impact on the history of information.
We could not be happier about the initial response to the beta test of “Sifter” (https://sifter.texifter.com), a self-serve tool for getting free estimates of the cost to pull samples from the complete (un-deleted) history of Twitter. Using the powerful Gnip-enabled PowerTrack operators, we have a few hundred early adopters testing rules that allow them to pull highly selective samples going back to the very first day of Twitter. For information on pricing to license the Twitter data, please visit: https://sifter.texifter.com/Home/Pricing.
Our new website is finally up. We’ve worked hard to get a beautiful new site ready and we’re proud to show it off. Thanks for reading our blog. We have lots of great blog posts in the works. Please check back or contact us now to find out how we can help you.