Using SQUID to do TDD against request-limited web APIs

TL;DR: Go grab the conf file and play around with it.

I had an idea to take advantage of the Meetup API. According to the documentation, they have all the data I need (no page scraping – yeah). The one catch is a limit of 100 requests per hour.

The latest development fad is Test Driven Development. This means that you write automated tests before you write the ‘real’ code. It also means accumulating those tests as you go so you have some coverage against breaking something you had working earlier. However, running all-the-tests all-the-time sounds like it could chew through the API limit pretty quickly.

Based on that statement alone, the TDD faithful would perhaps bombard me with gentle reminders that I really ought to be mocking and stubbing everything so I’m not hitting the web service at all. I don’t disagree, but: 1. I have to develop the real API interface some time. 2. Mocking would basically involve manually hitting every URL I needed and regurgitating that data on demand. But that’s basically what an off-the-shelf HTTP cache does.
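The "hit every URL once, then regurgitate on demand" idea from point 2 can be sketched in a few lines of Python. This is only an illustration of what a cache does for the tests, not how Squid is implemented; the class name `ReplayCache` and the injected `fetch` function are made up for the example.

```python
class ReplayCache:
    """Remember the first response for each URL and replay it forever."""

    def __init__(self, fetch):
        # fetch is any callable that takes a URL and returns the response body.
        # In real use it would hit the web service; here it is injected so the
        # sketch can run without a network connection.
        self._fetch = fetch
        self._store = {}
        self.upstream_calls = 0  # how many requests actually left the cache

    def get(self, url):
        if url not in self._store:
            # Cache miss: spend one request from the hourly budget.
            self.upstream_calls += 1
            self._store[url] = self._fetch(url)
        # Cache hit (or freshly stored): answer from memory.
        return self._store[url]


def fake_fetch(url):
    # Hypothetical stand-in for the real API call.
    return '{"requested": "%s"}' % url


cache = ReplayCache(fake_fetch)
first = cache.get("http://api.meetup.com/members/?format=json")
second = cache.get("http://api.meetup.com/members/?format=json")
```

However many times the test suite asks for the same URL, only the first request counts against the limit.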

“Squid is a caching proxy for the Web supporting HTTP, HTTPS, FTP, and more. It reduces bandwidth and improves response times by caching and reusing frequently-requested web pages. Squid has extensive access controls and makes a great server accelerator. It runs on most available operating systems, including Windows and is licensed under the GNU GPL.”

Squid looks like just the thing. All I should need to do is tweak a few parameters to make it remember API data forever, and I’ll be good to go. And, if I had known the first thing about AJAX and JSONP before I started, it might have been that simple.

Finding the basic parameter, ‘refresh_pattern’, wasn’t too hard. However, I quickly ran into some complications. The Meetup sample library (which used jQuery for the heavy lifting) was passing some dynamic big-number parameters in the query to prevent the response from getting cached. Consider:

http://api.meetup.com/members/?callback=jsonp1268480306079&_=1268480306105&relation=self&key=sekretgarbage&format=json

Squid, at least in the 2.x series, has a feature, storeurl_rewrite_program, which lets you strip such parameters from the URL that Squid uses as its cache key. After much trial and error, I actually got this part working. However, it took me a while to decide that it was actually working, because the test failed to retrieve any data.
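To give a feel for what such a rewrite helper has to do, here is a minimal Python sketch. It is not the script I used. The names `canonical_url`, `VOLATILE_PARAMS` and `rewrite_lines` are made up, and the assumption that each request arrives as one line whose first whitespace-separated field is the URL should be checked against the Squid documentation for your version.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Query parameters that change on every request and so defeat caching.
# These two come from the example URL above; other libraries may use others.
VOLATILE_PARAMS = {"callback", "_"}


def canonical_url(url):
    """Return url with the volatile query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in VOLATILE_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


def rewrite_lines(lines):
    """Yield one canonical URL per input line.

    Assumption: the URL is the first whitespace-separated field on each line.
    A real helper would loop over sys.stdin and flush stdout after every reply.
    """
    for line in lines:
        fields = line.split()
        yield canonical_url(fields[0]) if fields else ""


example = (
    "http://api.meetup.com/members/?callback=jsonp1268480306079"
    "&_=1268480306105&relation=self&key=sekretgarbage&format=json"
)
cache_key = canonical_url(example)
```

With the two volatile parameters gone, every request for the same member data maps to the same cache key.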

After poking around in the source for a bit, and trying out some manual requests, it became apparent that the ‘callback’ parameter was actually part of the JSONP protocol. JSONP is a hack to get around the Same Origin Policy.

The Same Origin Policy is an early attempt to secure the web. In order to prevent a malicious site from grabbing data while you are logged into another account, the browser prevents almost any contact between a script and a domain other than the one from which it was loaded. An easy win for security, and a huge roadblock for every developer of dynamic web sites.

One workaround is JSONP, which basically wraps the data response in some code, usually a single function call. This allows the data to be loaded via a script tag, which is not bound by the same-origin policy. Presumably, since JSONP requires the cooperation of the target server, it will only be used for non-sensitive data, so this method hasn’t had the same limits applied to it.

The important fact at the moment is that every API response is different – it has a gobbledygook generated function name wrapped around the data. I could make Squid give the same file back again and again, but that file would call a function from a previous life – which would either not exist or execute the wrong callback. It is, in short, violently uncacheable.
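A small Python sketch makes the problem concrete. The helpers `wrap_jsonp` and `unwrap_jsonp` are made up for illustration, and the payload is invented, not real Meetup data. The regular expression only handles the simple "one function call around the data" shape described above.

```python
import json
import re


def wrap_jsonp(callback, payload):
    """Build the kind of body a JSONP endpoint sends: callback(<json>)."""
    return "%s(%s)" % (callback, json.dumps(payload))


# A callback name, an opening paren, the JSON text, a closing paren.
_JSONP = re.compile(r"^\s*([A-Za-z_$][\w$.]*)\s*\((.*)\)\s*;?\s*$", re.DOTALL)


def unwrap_jsonp(body):
    """Split a JSONP body into (callback name, decoded data)."""
    match = _JSONP.match(body)
    if match is None:
        raise ValueError("not a simple JSONP response")
    return match.group(1), json.loads(match.group(2))


payload = {"results": [{"name": "example member"}]}  # invented sample data

# Two requests for the same data, each with its own generated callback name.
first_body = wrap_jsonp("jsonp1268480306079", payload)
second_body = wrap_jsonp("jsonp1268480309999", payload)

first_name, first_data = unwrap_jsonp(first_body)
second_name, second_data = unwrap_jsonp(second_body)
```

The two bodies differ as text even though the data inside is identical, so replaying the first body to the second request would call a function that the second page never defined.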

In this case, I was using a piece of code from Meetup that would basically be incorporated as source into my project, so I just took out the ‘callback’ parameter to get plain JSON. With that removed, I could actually take out the storeurl_rewrite_program.

Now that I’ve sat down to write this out, it has occurred to me that my test browser, Safari, is somehow ignoring the same origin policy, or it wouldn’t work at all.

I wrapped the conf file and a readme into a code repository for easy access. The readme has some more detail on the Squid settings, which I don’t feel the need to repeat here.

Posted Monday, May 24th, 2010 under Essay.
