The evolution of the API opens the door for third-party developers to access information on social media networks. In the best case, this provides a healthy, democratic flow of information. Yesterday, DiscoverText had “rate limits” imposed on its access to Twitter data. As written, the Twitter API allows 150 unauthenticated calls per hour, per IP address. Authenticated calls (users logged in with their Twitter credentials via OAuth) allow up to 350 calls per hour, per user. In addition, the Twitter Search API has its own internal rate-limiting mechanisms, but Twitter does not publish those specific limits for fear of abuse. Going over any of these limits results in the user being presented with “Error 420,” which simply means that the user is being rate limited. This hampers the ability to harvest Twitter feeds within DiscoverText.

We had never had rate-limit problems before this, but according to timestamps on articles posted on Twitter’s developer website, Twitter may have become more aware of those harvesting large amounts of data (not just us) and, as a result, is cracking down on heavy users. At Texifter, we fully respect the rules and regulations of the Twitter API, and in no way seek to disobey or bend them in our flagship software product, DiscoverText.

On August 18, 2011, the same day we learned of the 420 errors, we performed emergency maintenance to better cope with Twitter rate limitations, handle rate-limit errors more gracefully, and ensure we abide by the Twitter Terms of Service. With that said, in order to continue harvesting information from Twitter and performing our cutting-edge research, we are currently exploring easier and more reliable ways to gather data.

After the maintenance, DiscoverText still allows 1,500 items per fetch, as determined by Twitter’s architecture for the public API. In addition, no extraneous error messages should appear when DiscoverText is being rate limited. Some searches may be silently delayed for five minutes; these fetches will catch up as soon as they can.

In the near future, look for new developments in DiscoverText. We have big plans for our social media API fetching that will greatly enhance our users’ ability to receive timely and actionable social media feeds. We don’t want to reveal too much right this moment, but we’re sure you’ll like what we have in store, and in traditional Texifter style, we’ll plan a large announcement when the time is right.
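For readers curious how the graceful rate-limit handling described above can work, here is a minimal sketch in Python. It is illustrative only: the endpoint, pagination parameters, and five-minute delay reflect the public Search API of this era as described above, but the function names and structure are our own assumptions, not DiscoverText’s actual implementation.

```python
import time
import requests

SEARCH_URL = "http://search.twitter.com/search.json"  # v1 Search API endpoint of this era
RATE_LIMIT_DELAY = 5 * 60  # back off quietly for five minutes when rate limited


def fetch_search_results(query, max_items=1500, per_page=100):
    """Fetch up to max_items tweets for a query, backing off on HTTP 420."""
    results, page = [], 1
    while len(results) < max_items:
        resp = requests.get(SEARCH_URL, params={"q": query, "rpp": per_page, "page": page})
        if resp.status_code == 420:     # "Error 420": we are being rate limited
            time.sleep(RATE_LIMIT_DELAY)  # wait silently, then let the fetch catch up
            continue
        resp.raise_for_status()
        batch = resp.json().get("results", [])
        if not batch:
            break                        # no more pages available for this query
        results.extend(batch)
        page += 1
    return results[:max_items]
```

The key design point is that a 420 response produces no error message for the user; the fetch simply pauses and resumes, which is the behavior described above.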
In a recent series of recommendations, the Administrative Conference of the United States (ACUS) announced findings under the auspices of “Legal Considerations in e-Rulemaking,” from the Committee on Rulemaking. Having spent more than a decade working on e-Rulemaking, I was curious to see what was at the top of their list. It was a relief to find that Item 1, Section A of the Final Recommendations reads:
Consider whether, in light of their comment volume, they could save substantial time and effort by using reliable comment analysis software to organize and review public comments.
The ACUS report continues:
(1) While 5 U.S.C. § 553 requires agencies to consider all comments received, it does not require agencies to ensure that a person reads each one of multiple identical or nearly identical comments. (2) Agencies should also work together and with the eRulemaking program management office (PMO), to share experiences and best practices with regard to the use of such software. [emphasis added]
At Texifter, we know quite a bit about best practices for sorting duplicate and near-duplicate public comments. We have supported and trained Public Comment Analysis Toolkit (PCAT) and DiscoverText users at the USDA, NOAA, FCC, NLRB, SBA, USFWS, and the Treasury Department. Our duplicate detection and near-duplicate clustering save agencies the expense of manually sorting non-substantive modified form letters. DiscoverText is now used in Europe by aviation regulators.

How did we get here? More than 300 agency officials attended workshops, focus groups, and interviews over a 10-year period. Algorithms were developed and tested. Interfaces were designed, built, tested, and rebuilt. Agencies shared millions of public comments and guided us as we tailored a system to work with bulk downloads from their email servers and from the Federal Docket Management System, which gathers the nation’s public comments at Regulations.gov. If “reliable comment analysis software” is needed, Texifter’s flagship product DiscoverText has to be considered a guiding light for some of the key ACUS findings.
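To give a sense of what near-duplicate clustering of public comments involves, here is a generic sketch using word shingles and Jaccard overlap. This is not the algorithm DiscoverText actually uses; the function names, the shingle size, and the 0.8 threshold are assumptions chosen purely for illustration.

```python
import re
from itertools import combinations


def shingles(text, k=5):
    """Lowercased word k-grams used as a crude fingerprint of a comment."""
    words = re.findall(r"[a-z']+", text.lower())
    if len(words) < k:
        return {tuple(words)}  # very short comment: treat the whole text as one shingle
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Overlap between two shingle sets (1.0 = identical, 0.0 = disjoint)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster_near_duplicates(comments, threshold=0.8):
    """Group comments whose shingle overlap exceeds the threshold (e.g., modified form letters)."""
    fingerprints = [shingles(c) for c in comments]
    parent = list(range(len(comments)))  # union-find over comment indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(comments)), 2):
        if jaccard(fingerprints[i], fingerprints[j]) >= threshold:
            parent[find(i)] = find(j)  # merge the two comments' clusters

    clusters = {}
    for i in range(len(comments)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())
```

In practice, clustering near-identical form letters this way lets an analyst read one representative comment per cluster instead of every copy.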
Texifter manages the Coding Analysis Toolkit (CAT), a free, open-source, Web-based, and FISMA-compliant system launched in the fall of 2007 and hosted by the University of Pittsburgh. CAT is the precursor to PCAT and DiscoverText. This is a big day for the CAT team, as we are on the brink of recording the one millionth coding choice in the system.

Why do people like to use this software? Certainly the price helps. However, over the years we have engineered CAT to make some of the most common coding and validation tasks easier. CAT uses a simple keystroke coding interface and predefined text spans to limit the pain caused by using a mouse. More important to regular CAT users are the on-board tools for easily calculating multi-coder reliability. CAT simplifies the process of assigning the same coding task to a group of coders who can code asynchronously via the Web. When the coding is done, it is a simple matter to generate a table of rater reliability statistics for better understanding how different coders use the various codes. When pre-testing a new coding scheme, this on-the-fly measure of reliability is a key learning and training tool we use in QDAP all the time.

Probably the most important innovation introduced by CAT is the adjudication module. The 118,850 adjudication choices recorded to date by CAT users grew out of our practice of comparing multi-coder experiments with pen and paper. Aside from using lots of paper in big experiments, we found ourselves with the time-consuming challenge of transferring our validation choices back into the software we were using at the time. Validation in CAT allows an expert or consensus team to review coding choices one at a time and score them as valid or invalid. The system reports validity as a percentage by code, coder, and project. This (often iterative) step is absolutely critical when training a coding team.
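For readers unfamiliar with these measures, here is a small sketch of the kind of arithmetic behind multi-coder reliability and adjudication validity. Cohen’s kappa is a standard chance-corrected agreement statistic for two coders; we are not claiming CAT computes exactly these functions, and the names and data shapes below are assumptions for illustration.

```python
from collections import Counter


def cohens_kappa(codes_a, codes_b):
    """Chance-corrected agreement between two coders over the same text spans."""
    assert len(codes_a) == len(codes_b) and codes_a
    n = len(codes_a)
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n

    # Expected agreement if both coders assigned codes at their observed base rates.
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    expected = sum((freq_a[c] / n) * (freq_b[c] / n) for c in set(codes_a) | set(codes_b))

    return (observed - expected) / (1 - expected) if expected < 1 else 1.0


def validity_by_code(adjudications):
    """Percent of coding choices the adjudicator marked valid, broken out by code.

    `adjudications` is a list of (code, is_valid) pairs, one per reviewed choice."""
    totals, valid = Counter(), Counter()
    for code, is_valid in adjudications:
        totals[code] += 1
        valid[code] += bool(is_valid)
    return {code: 100.0 * valid[code] / totals[code] for code in totals}
```

For example, `cohens_kappa(["pro", "con", "pro"], ["pro", "con", "con"])` compares two coders over three spans, and `validity_by_code` turns a list of adjudication decisions into the per-code validity percentages described above.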