We asked folks signing up for the second GNIP beta using DiscoverText why they were doing it. Here is a nice Wordle showing some of the common themes. We also asked for job titles; no surprise, the professors lead the way. The sign-up remains open. Jump in and let us know if you like our Enterprise solution for social media analytics.
Texifter is launching a second beta test period using “Power Track for Twitter,” a fire hose filtering service provided by GNIP. We have streamlined the process of providing Enterprise-class access to the beta test. This beta includes access to an expanding set of tools for archiving, filtering, coding, validating, and machine-classifying text. You can train a custom machine classifier in about 30 minutes. Sign up for the beta test here. GNIP Power Track, in partnership with Twitter, provides users with unrestricted, real-time filtering of the Twitter fire hose. This enriched feature gives DiscoverText users a valuable analytical tool. Not only will GNIP Power Track provide users with access to the full stream of fire hose data, it will also provide Klout scores, language data, retweet frequency, geographic coordinates, and all #hashtags where available in the results. Taken together, this quantity of data and these rich metadata fields will allow users to perform valuable social media analysis within DiscoverText. For more information: info@DiscoverText.com
On November 9th, the Federal Emergency Management Agency (FEMA) conducted its first national test of the Emergency Alert System. In some communities this meant full involvement, with teams responding to mock emergencies and managers monitoring the execution. In the deaf community, the monitoring effort centered on two Twitter hashtags: #SMEM and #DEMX. The #SMEM hashtag is specific to the emergency response community and was created over a year ago; the #DEMX hashtag is specific to the deaf community and was created specifically for this event. Monitoring the usage of these hashtags was Steph Jo Kent, a PhD candidate in Communications at the University of Massachusetts. Steph’s goal was to track the spread of these hashtags throughout the deaf community and the emergency response community, and how they crossed channels. To do this, she utilized DiscoverText, which is how I was lucky enough to become involved in the project. Monitoring these specific Tweets adds to the already diverse functionality of DiscoverText. To start the project, we simply used the Twitter API to harvest uses of #SMEM and #DEMX beginning on November 2. After the event on November 9, we continued to harvest uses of the hashtags. By early December, we had archived nearly 800 Tweets using the hashtag #DEMX and nearly 8,000 Tweets using the hashtag #SMEM. From these two archives, it is possible to break down Tweets by time and person, giving us valuable information about key individuals and how they spread the hashtag. For Steph’s research, it was particularly valuable to isolate the crossover between the two hashtags. Using our search feature, we were able to isolate cases of crossover and bucket those results. This allows us to move from noisy data to a more manageable and germane grouping of Tweets. From here, we utilized the newly optimized Top Meta feature to break down the occurrences by day and by user.
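To illustrate the kind of crossover filtering and day/user breakdown described above, here is a minimal sketch. The tweet records, field names, and usernames are hypothetical stand-ins; the actual DiscoverText search and Top Meta internals are not shown here.

```python
from collections import Counter
from datetime import date

# Hypothetical tweet records; a real archive would hold thousands of these.
tweets = [
    {"user": "alice", "date": date(2011, 11, 8), "text": "Testing tomorrow #SMEM"},
    {"user": "bob",   "date": date(2011, 11, 9), "text": "EAS test underway #SMEM #DEMX"},
    {"user": "carol", "date": date(2011, 11, 9), "text": "Watching the alert #DEMX"},
]

def is_crossover(tweet):
    """A 'crossover' Tweet mentions both hashtags."""
    text = tweet["text"].lower()
    return "#smem" in text and "#demx" in text

# Bucket the crossover cases, as with the search feature described above.
crossover = [t for t in tweets if is_crossover(t)]

# Break the archive down by day and by user.
by_day = Counter(t["date"] for t in tweets)
by_user = Counter(t["user"] for t in tweets)
```

With data like this, the busiest days and most prolific users fall out of the two `Counter` objects directly.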
We were able to discover which days and individuals produced the most Tweets. The information we found allowed us to better visualize how the Tweets broke down before and after the event. The results showed a small number of users producing the majority of Tweets, and that usage of the hashtags was higher prior to the event. Unfortunately, the mass crossover of Tweets that we had envisioned did not occur. There was a minimal amount of crossover, meaning the message did not travel well between the two communities. Steph has posted a detailed analysis of her findings on her blog, where she uses her expertise to analyze the project. In the future, this same methodology can be applied to hashtags that have been created for marketing or other purposes, such as hashtags for television shows and large events. There is valuable information in these hashtags; they reflect an emergent folksonomy that influences how ideas, links, and memes spread over Twitter. Using the GNIP Power Track, these hashtags can be leveraged as metadata, broken down over time, and used to display how well information did or did not travel. Overall, this was a great experiment, and I am happy to have had the opportunity to collaborate with Steph, and to have participated in a project that has the power to influence the way social media is used to interact with the deaf community.
Thanks to some excellent groundwork by Joe Delfino and Sean Kelleher, Joe, Sean, and I were able to make a pilgrimage to Google, Facebook, and Reputation.com for a wildly exciting day of briefings with Q&A. While I’d love to share the details, I can’t! Big secret 😉 However, I can share a few pictures and stories from our day in Silicon Valley… Stu at Google – Takeaway message: “This was a great meeting!” Sean at Google – “I could move to California.” Joe at Google, after spending the week in the Bay Area attending the 2011 Sentiment Symposium and the Text Analytics News Conference – “I am already (in my mind) living in California and running the west coast operation.” Stu and his well-used Camaro. While running a bit behind schedule on the way to Reputation.com, it is alleged the driver took advantage of the fast-moving California 101 freeway, the state’s liberal U-turn policy, certain optional passing strategies based on scenes from action and/or science fiction films, and his passengers’ stomachs. Joe at Facebook – Joe Delfino got us this meeting. Joe gets meetings. Joe is a meeting-getting animal. We like Joe. When my son saw this picture of his Dad at Facebook on Facebook, he said: “Wow Dad; you look really happy!” I sure was happy. We had come from Google feeling deeply engaged by one of the greatest companies in the history of capitalism, and we were sitting in the lobby of another. We had lunch with a gracious host at the company cafeteria and a demo with a diverse group of Facebook sentiment analysts. After years of academic presentations, the freedom to present in jeans and a QDAP t-shirt was a perk that I could probably get used to. The meme ‘west coast office’ was heard frequently as we blazed out of Palo Alto and headed for Redwood City.
After the long day in Silicon Valley, the team got stuck in 101 rush-hour traffic, slightly grouchy and despondent, but made it to a wonderful restaurant, Burma Superstar, in the Pacific Heights neighborhood for beer, food, and good company, near a place where a Hobbit had been spied. By the time we had returned the Camaro and caught the train to the SFO terminal for our red-eye, we all realized the magnitude of the day we had just had. It was a huge lift for our confidence and an exciting glimpse of where Texifter is going. It is nearly certain that Texifter will be back on the West Coast soon.
DiscoverText is rolling out an addition to its analytical toolkit: random sampling. The Web service already offers an array of tools for text analytics and rigorous, team-based qualitative data analysis. These functions include the ability to code and annotate text, measure inter-rater reliability, adjudicate coder validity, attach memos to text, cluster duplicate and near-duplicate documents, share documents, and classify text using an active-learning naive Bayes classifier. While still in beta, random sampling is a key new addition. After DiscoverText users amass extraordinary amounts of social media data (for example via the public Twitter API, the GNIP Power Track, or the Facebook Social Graph), they can now more easily extract a random sample for analysis. The size of the sample is decided by the user in order to accommodate iteration, experimentation, and other scientific methods. The option is streamlined into the dataset creation process: on the new dataset creation page, you see a sample size prompt. This additional method for data prep and analysis augments current information retrieval techniques, such as search with advanced filtering. It also builds up our framework for expanding available NLP methods from straightforward Bayesian classification, which aims to analyze substantial quantities of data in their original bulk form, to a menu of computationally intensive methods that can iterate more quickly and effectively against random data samples. For example, the LDA topic model tool we are releasing will be faster and more effective against smaller random samples. This new feature accommodates both an additional analytical approach and the opportunity to easily compare results between competing (or complementary) analytic methods. We look forward to experimenting with this new tool and hearing about how random sampling will enhance the research of our users and users to come.
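For readers curious about the mechanics, the idea behind the feature can be sketched in a few lines. This is not the DiscoverText implementation, just a minimal illustration of user-chosen, reproducible random sampling without replacement; the archive contents and sample size here are made up.

```python
import random

# Hypothetical archive of tweet texts; a real DiscoverText archive
# may hold hundreds of thousands of items.
archive = [f"tweet {i}" for i in range(100_000)]

# The user picks the sample size; fixing the seed makes the draw
# reproducible, so an experiment can be re-run on the same sample.
sample_size = 1_000
rng = random.Random(42)
sample = rng.sample(archive, sample_size)  # sampling without replacement
```

Because `random.sample` draws without replacement, each item appears at most once, and a smaller sample like this is exactly the kind of input where an iterative method such as LDA can run much faster than against the full archive.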
Special Note to DT Users: We need to turn this feature on one account at a time while we are testing it. Drop us a line if you want to try the tool. We’ll keep you posted on the launch as more dataset modifications are pushed live. As always, if you have any questions, feel free to email us anytime at firstname.lastname@example.org. Your feedback is crucial. Sign up and try it out for yourself at discovertext.com.