Ben Buckman's Tech Blog brought to you by New Leaf Digital Discoveries in the midst of my digital life Node.js and Arduino: Building the Nodeuino example <p>I've wanted to start building robots with an <a href="">Arduino</a> for years. I've had several fits and starts - building the examples in the <a href=""><em>Getting Started with Arduino</em> book</a>, starting to learn basic electronics from <a href=""><em>Make: Electronics</em></a>. But I never put enough time into it to get much further.</p> <p>I even included my electronics kit in the limited space we had <a href="">moving to Argentina</a>. But it mostly sat on a shelf and got dusty.</p> <p>Until I started seeing all the buzz around Node.js and Arduino - <a href="">nodeuino</a>, <a href="">node.js-powered robots at nodeconf</a>, <a href="">Rick Waldron's</a> awesome <a href="">johnny-five project</a>&#8230; and realized I'm way too late to the party and need to get moving.</p> <p>So I made it my weekend project yesterday, and built the walkLED example in the <a href="">Nodeuino documentation</a>. It uses 6 LEDs and 2 switches; I only have 4 LEDs and 1 switch that doesn't fit on a breadboard, so I simplified it. Here's a little video showing it working. It pans from the circuit diagram in the nodeuino docs, to the circuit in the <a href="">Fritzing</a> circuit-diagramming app, to my circuit built on the breadboard, to the terminal showing the node.js app, to the speed being controlled by the potentiometer.</p> <iframe width="560" height="315" src="" frameborder="0" allowfullscreen></iframe> <p>None of this is original yet - it's all from the example circuit and code created by <a href="">Sebastian Müller</a> (<a href=""></a>). My task now is to understand that code and build on it. 
Then I want to build a circuit that reads some public Twitter feed and shows the number of new tweets on a digital LCD display.</p> arduino node.js Sun, 08 Jul 2012 15:12:15 +0000 ben 7923 at Understanding MapReduce in MongoDB, with Node.js, PHP (and Drupal) <p><em>(Note, some of the details in this post may be outdated. For example, MongoDB now supports <a href="">Aggregation</a> queries which simplify some of these use cases. However, the MapReduce concepts are probably still the same.)</em></p> <p>MongoDB's <a href="">query language</a> is good at extracting whole documents or whole elements of a document, but on its own it can't <a href="">pull specific items</a> from deeply embedded arrays, or calculate relationships between data points, or calculate aggregates. To do that, MongoDB uses an <a href="">implementation</a> of the <a href="">MapReduce</a> methodology to iterate over the dataset and extract the desired data points. Unlike SQL <code>joins</code> in relational databases, which essentially create a massive combined dataset and then extract pieces of it, MapReduce <em>iterates</em> over each document in the set, "reducing" the data piecemeal to the desired results. The <a href="">name</a> was popularized by Google, which needed to scale beyond SQL to index the web. Imagine trying to build the data structure for Facebook, with near-instantaneous calculation of the significance of every friend's friend's friend's posts, with SQL, and you see why MapReduce makes sense.</p> <p>I've been using MongoDB for two years, but only in the last few months started using MapReduce heavily. MongoDB is also introducing a new <a href="">Aggregation</a> framework in 2.1 that is supposed to simplify many operations that previously needed MapReduce. 
However, the latest <a href="">stable release</a> as of this writing is still 2.0.6, so Aggregation isn't officially ready for prime time (and I haven't used it yet).</p> <p>This post is not meant to substitute the copious <a href="">documentation</a> and examples you can find across the web. After reading those, it still took me some time to wrap my head around the concepts, so I want to try to explain those as I came to understand them.</p> <h2>The Steps</h2> <p>A MapReduce operation consists of a <code>map</code>, a <code>reduce</code>, and optionally a <code>finalize</code> function. Key to understanding MapReduce is understanding what each of these functions iterates over.</p> <h3>Map</h3> <p>First, <code>map</code> runs for every document retrieved in the initial query passed to the operation. If you have 1000 documents and pass an empty query object, it will run 1000 times.</p> <p>Inside your <code>map</code> function, you <code>emit</code> a key-value pair, where the key is whatever you want to group by (_id, author, category, etc), and the value contains whatever pieces of the document you want to pass along. The function doesn't <code>return</code> anything, because you can <code>emit</code> multiple key-values per <code>map</code>, but a function can only <code>return</code> 1 result.</p> <p>The purpose of <code>map</code> is to extract small pieces of data from each document. For example, if you're counting articles per author, you could emit the author as the key and the number 1 as the value, to be summed in the next step.</p> <h3>Reduce</h3> <p>The <code>reduce</code> function then receives each of these key-value(s) pairs, for each key emitted from <code>map</code>, with the values in an array. Its purpose is to reduce multiple values-per-key to a single value-per-key. 
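</p>

<p>The grouping that happens between the two steps can be sketched in plain Javascript (purely illustrative - MongoDB does this internally; the keys and values here are made up):</p>

```javascript
// Every value emitted under the same key is collected into one array
// before reduce sees it.
var emitted = [
  { key: 'alice', value: 1 },
  { key: 'bob',   value: 1 },
  { key: 'alice', value: 1 }
];

var grouped = {};
emitted.forEach(function (pair) {
  (grouped[pair.key] = grouped[pair.key] || []).push(pair.value);
});

console.log(JSON.stringify(grouped)); // {"alice":[1,1],"bob":[1]}
```

<p>So <code>reduce</code> would run for <code>alice</code> with both values in one array, and (conceptually) wouldn't be needed for <code>bob</code> at all.</p>

<p>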
At the end of each iteration of your <code>reduce</code> function, you <code>return</code> (not <code>emit</code> this time) a single variable.</p> <p>The number of times <code>reduce</code> runs for a given operation isn't easy to predict. (I asked about it on <a href="">Stack Overflow</a> and the consensus so far is, there's no simple formula.) Essentially <code>reduce</code> runs as many times as it needs to, until each key appears only once. If you emit each key only once, reduce never runs. If you emit most keys once but one special key twice, reduce will run <em>once</em>, getting <code>(special key, [ value, value ])</code>.</p> <p>A rule of thumb with <code>reduce</code> is that the returned value's structure has to be the same as the structure emitted from <code>map</code>. If you emit an object as the value from <code>map</code>, every key in that object has to be present in the object returned from <code>reduce</code>, and vice-versa. If you return an integer from <code>map</code>, return an integer from <code>reduce</code>, and so on. The basic reason is that (as noted above), <code>reduce</code> shouldn't be necessary if a key only appears once. The results of an entire map-reduce operation, run back through the same operation, should return the same results (that way huge operations can be sharded and map/reduced many times). And the output of any given <code>reduce</code> function, plugged back into <code>reduce</code> (as a single-item array), <em>needs to return the same value as went in</em>. (In CS lingo, <code>reduce</code> has to be idempotent. The documentation <a href="">explains</a> this in more technical detail.)</p> <p>Here's a simple JS test, using Node.js' <a href="">assertion API</a>, to verify this. 
To use it, have your mapReduce operation export their methods for a separate test script to import and test:</p> <div class="geshifilter"><pre class="javascript geshifilter-javascript"><span class="co1">// this should export the map, reduce, [finalize] functions passed to MongoDB.</span> <span class="kw2">var</span> mr <span class="sy0">=</span> require<span class="br0">&#40;</span><span class="st0">'./mapreduce-query'</span><span class="br0">&#41;</span><span class="sy0">;</span> &nbsp; <span class="co1">// override emit() to capture locally</span> <span class="kw2">var</span> emitted <span class="sy0">=</span> <span class="br0">&#91;</span><span class="br0">&#93;</span><span class="sy0">;</span> &nbsp; <span class="co1">// (in global scope so map can access it)</span> global.<span class="me1">emit</span> <span class="sy0">=</span> <span class="kw2">function</span><span class="br0">&#40;</span>key<span class="sy0">,</span> val<span class="br0">&#41;</span> <span class="br0">&#123;</span> emitted.<span class="me1">push</span><span class="br0">&#40;</span><span class="br0">&#123;</span>key<span class="sy0">:</span>key<span class="sy0">,</span> value<span class="sy0">:</span>val<span class="br0">&#125;</span><span class="br0">&#41;</span><span class="sy0">;</span> <span class="br0">&#125;</span><span class="sy0">;</span> &nbsp; <span class="co1">// reduce input should be same as output for a single object</span> <span class="co1">// dummyItems can be fake or loaded from DB</span> mr.<span class="me1">map</span>.<span class="me1">call</span><span class="br0">&#40;</span>dummyItems<span class="br0">&#91;</span><span class="nu0">0</span><span class="br0">&#93;</span><span class="br0">&#41;</span><span class="sy0">;</span> &nbsp; <span class="kw2">var</span> reduceRes <span class="sy0">=</span> mr.<span class="me1">reduce</span><span class="br0">&#40;</span>emitted<span class="br0">&#91;</span><span class="nu0">0</span><span class="br0">&#93;</span>.<span 
class="me1">key</span><span class="sy0">,</span> <span class="br0">&#91;</span> emitted<span class="br0">&#91;</span><span class="nu0">0</span><span class="br0">&#93;</span>.<span class="me1">value</span> <span class="br0">&#93;</span><span class="br0">&#41;</span><span class="sy0">;</span> assert.<span class="me1">deepEqual</span><span class="br0">&#40;</span>reduceRes<span class="sy0">,</span> emitted<span class="br0">&#91;</span><span class="nu0">0</span><span class="br0">&#93;</span>.<span class="me1">value</span><span class="sy0">,</span> <span class="st0">'reduce is idempotent'</span><span class="br0">&#41;</span><span class="sy0">;</span></pre></div> <p>A simple MapReduce example is to count the number of posts per author. So in <code>map</code> you could <code>emit('author name', 1)</code> for each document, then in <code>reduce</code> loop over each value and add it to a total. Make sure <code>reduce</code> is adding the actual number in the value, not just 1, because that won't be idempotent. Similarly, you can't just <code>return values.length</code> and assume each value represents 1 document.</p> <h3>Finalize</h3> <p>Now you have a single reduced value per key, which get run through the <code>finalize</code> function once per key.</p> <p>To understand <code>finalize</code>, consider that this is essentially the same as not having a <code>finalize</code> function at all:</p> <div class="geshifilter"><pre class="javascript geshifilter-javascript"><span class="kw2">var</span> finalize <span class="sy0">=</span> <span class="kw2">function</span><span class="br0">&#40;</span>key<span class="sy0">,</span> value<span class="br0">&#41;</span> <span class="br0">&#123;</span> <span class="kw1">return</span> value<span class="sy0">;</span> <span class="br0">&#125;</span></pre></div> <p><code>finalize</code> is not necessary in every MapReduce operation, but it's very useful, for example, for calculating averages. 
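</p>

<p>Putting the three functions together for a posts-per-author count, with an average post length added to show <code>finalize</code> at work (a sketch under assumed field names, runnable outside MongoDB by stubbing <code>emit</code> the same way as the test above):</p>

```javascript
// map: runs once per document ('this'); emits a partial result per author.
var map = function () {
  emit(this.author, { count: 1, totalLength: this.body.length });
};

// reduce: collapses an array of partial results into one of the same shape.
var reduce = function (key, values) {
  var out = { count: 0, totalLength: 0 };
  values.forEach(function (v) {
    out.count += v.count;             // add the actual counts, not just 1
    out.totalLength += v.totalLength;
  });
  return out;
};

// finalize: runs once per key, where the average is safe to compute.
var finalize = function (key, value) {
  value.avgLength = value.totalLength / value.count;
  return value;
};

// quick local check, stubbing emit() to capture locally:
var emitted = [];
global.emit = function (key, value) { emitted.push({ key: key, value: value }); };
map.call({ author: 'ben', body: 'hello world' });
map.call({ author: 'ben', body: 'hi' });
var result = finalize('ben', reduce('ben', [emitted[0].value, emitted[1].value]));
console.log(JSON.stringify(result)); // {"count":2,"totalLength":13,"avgLength":6.5}
```

<p>Note that <code>reduce</code> stays idempotent here: it only sums, and the division waits for <code>finalize</code>.</p>

<p>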
You can't calculate the average in <code>reduce</code> because it can run multiple times per key, so each iteration doesn't have enough data to calculate with.</p> <p>The final results returned from the operation will have one value per key, as returned from <code>finalize</code> if it exists, or from <code>reduce</code> if <code>finalize</code> doesn't exist.</p> <h2>MapReduce in PHP and Drupal</h2> <p>The <a href="">MongoDB library for PHP</a> does not include any special functions for MapReduce. They can be run simply as a generic <a href=""><code>command</code></a>, but that takes a lot of code. I found a <a href="">MongoDB-MapReduce-PHP</a> library on Github which makes it easier. It works, but hasn't been updated in two years, so I forked the library and created <a href="">my own version</a> with what I think are some improvements.</p> <p>The original library by <a href="">infynyxx</a> created an abstract class <code>XMongoCollection</code> that was meant to be <a href="">sub-classed</a> for every collection. I found it more useful to make <code>XMongoCollection</code> directly instantiable, as an extended <em>replacement</em> for the basic <code>MongoCollection</code> class. I added a <code>mapReduceData</code> method which returns the data from the MapReduce operation. For my Drupal application, I added a <code>mapReduceDrupal</code> method which wraps the results and error handling in Drupal API functions.</p> <p>I could then load every collection with <code>XMongoCollection</code> and run <code>mapReduce</code> operations on it directly, like any other query. Note that the actual functions passed to MongoDB are still written in Javascript. 
For example:</p> <div class="geshifilter"><pre class="php geshifilter-php"><span class="co1">// (this should be statically cached in a separate function)</span> <span class="re0">$mongo</span> <span class="sy0">=</span> <span class="kw2">new</span> Mongo<span class="br0">&#40;</span><span class="re0">$server_name</span><span class="br0">&#41;</span><span class="sy0">;</span> <span class="co1">// connection</span> <span class="re0">$mongodb</span> <span class="sy0">=</span> <span class="re0">$mongo</span><span class="sy0">-&gt;</span><span class="me1">selectDB</span><span class="br0">&#40;</span><span class="re0">$db_name</span><span class="br0">&#41;</span><span class="sy0">;</span> <span class="co1">// MongoDB instance</span> &nbsp; <span class="co1">// use the new XMongoCollection class. make it available with an __autoloader.</span> <span class="re0">$collection</span> <span class="sy0">=</span> <span class="kw2">new</span> XMongoCollection<span class="br0">&#40;</span><span class="re0">$mongodb</span><span class="sy0">,</span> <span class="re0">$collection_name</span><span class="br0">&#41;</span><span class="sy0">;</span> &nbsp; <span class="re0">$map</span> <span class="sy0">=</span> <span class="co3">&lt;&lt;&lt;MAP function() { // doc is 'this' emit(this.category, 1); } MAP</span><span class="sy0">;</span> &nbsp; <span class="re0">$reduce</span> <span class="sy0">=</span> <span class="co3">&lt;&lt;&lt;REDUCE function(key, vals) { // have `variable` here passed in `setScope` return something; } REDUCE</span><span class="sy0">;</span> &nbsp; <span class="re0">$mr</span> <span class="sy0">=</span> <span class="kw2">new</span> MongoMapReduce<span class="br0">&#40;</span><span class="re0">$map</span><span class="sy0">,</span> <span class="re0">$reduce</span><span class="sy0">,</span> <span class="kw3">array</span><span class="br0">&#40;</span> <span class="coMULTI">/* limit initial document set with a query here */</span> <span class="br0">&#41;</span><span 
class="br0">&#41;</span><span class="sy0">;</span> &nbsp; <span class="co1">// optionally pass variables to the functions. (e.g. to apply user-specified filters)</span> <span class="re0">$mr</span><span class="sy0">-&gt;</span><span class="me1">setScope</span><span class="br0">&#40;</span><span class="kw3">array</span><span class="br0">&#40;</span><span class="st_h">'variable'</span> <span class="sy0">=&gt;</span> <span class="re0">$variable</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="sy0">;</span> &nbsp; <span class="co1">// 2nd param becomes the temporary collection name, so tmp_mapreduce_example. </span> <span class="co1">// (This is a little messy and could be improved. Stated limitation of v1.8+ not supporting &quot;inline&quot; results is not entirely clear.)</span> <span class="co1">// 3rd param is $collapse_value, see code</span> <span class="re0">$result</span> <span class="sy0">=</span> <span class="re0">$collection</span><span class="sy0">-&gt;</span><span class="me1">mapReduceData</span><span class="br0">&#40;</span><span class="re0">$mr</span><span class="sy0">,</span> <span class="st_h">'example'</span><span class="sy0">,</span> <span class="kw4">FALSE</span><span class="br0">&#41;</span><span class="sy0">;</span></pre></div> <h2>MapReduce in Node.js</h2> <p>The <a href="">MongoDB-Native driver for Node.js</a>, now an official 10Gen-sponsored project, includes a <a href=""><code>collection.mapReduce()</code> method</a>. 
The syntax is like this:</p> <div class="geshifilter"><pre class="javascript geshifilter-javascript">&nbsp; <span class="kw2">var</span> db <span class="sy0">=</span> <span class="kw2">new</span> mongodb.<span class="me1">Db</span><span class="br0">&#40;</span>dbName<span class="sy0">,</span> <span class="kw2">new</span> mongodb.<span class="me1">Server</span><span class="br0">&#40;</span>mongoHost<span class="sy0">,</span> mongoPort<span class="sy0">,</span> <span class="br0">&#123;</span><span class="br0">&#125;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="sy0">;</span> db.<span class="kw3">open</span><span class="br0">&#40;</span><span class="kw2">function</span><span class="br0">&#40;</span>error<span class="sy0">,</span> dbClient<span class="br0">&#41;</span> <span class="br0">&#123;</span> <span class="kw1">if</span> <span class="br0">&#40;</span>error<span class="br0">&#41;</span> <span class="kw1">throw</span> error<span class="sy0">;</span> dbClient.<span class="me1">collection</span><span class="br0">&#40;</span>collectionName<span class="sy0">,</span> <span class="kw2">function</span><span class="br0">&#40;</span>err<span class="sy0">,</span> collection<span class="br0">&#41;</span> <span class="br0">&#123;</span> collection.<span class="me1">mapReduce</span><span class="br0">&#40;</span>map<span class="sy0">,</span> reduce<span class="sy0">,</span> <span class="br0">&#123;</span> out <span class="sy0">:</span> <span class="br0">&#123;</span> inline <span class="sy0">:</span> <span class="nu0">1</span> <span class="br0">&#125;</span><span class="sy0">,</span> query<span class="sy0">:</span> <span class="br0">&#123;</span> ... 
<span class="br0">&#125;</span><span class="sy0">,</span> <span class="co1">// limit the initial set (optional)</span> finalize<span class="sy0">:</span> finalize<span class="sy0">,</span> <span class="co1">// function (optional)</span> verbose<span class="sy0">:</span> <span class="kw2">true</span> <span class="co1">// include stats</span> <span class="br0">&#125;</span><span class="sy0">,</span> <span class="kw2">function</span><span class="br0">&#40;</span>error<span class="sy0">,</span> results<span class="sy0">,</span> stats<span class="br0">&#41;</span> <span class="br0">&#123;</span> <span class="co1">// stats provided by verbose</span> <span class="co1">// ...</span> <span class="br0">&#125;</span> <span class="br0">&#125;</span><span class="br0">&#41;</span><span class="sy0">;</span> <span class="br0">&#125;</span><span class="br0">&#41;</span><span class="sy0">;</span> <span class="br0">&#125;</span><span class="br0">&#41;</span><span class="sy0">;</span></pre></div> <p>It's mostly similar to the <a href="">command-line syntax</a>, except in the CLI, the results are <em>returned</em> from the <code>mapReduce</code> function, while in Node.js they are passed (asynchronously) to the callback.</p> <h3>MapReduce in Mongoose</h3> <p><a href="">Mongoose</a> is a modeling layer on top of the MongoDB-native Node.js driver, and in the latest 2.x release does not have its own support for MapReduce. (It's supposed to be coming in 3.x.) 
But the underlying collection is still available:</p> <div class="geshifilter"><pre class="javascript geshifilter-javascript"><span class="kw2">var</span> db <span class="sy0">=</span> mongoose.<span class="me1">connect</span><span class="br0">&#40;</span><span class="st0">'mongodb://dbHost/dbName'</span><span class="br0">&#41;</span><span class="sy0">;</span> <span class="co1">// (db.connection.db is the native MongoDB driver)</span> &nbsp; <span class="co1">// build a model (`Book` is a schema object)</span> <span class="co1">// model is called 'Book' but collection is 'books'</span> mongoose.<span class="me1">model</span><span class="br0">&#40;</span><span class="st0">'Book'</span><span class="sy0">,</span> Book<span class="sy0">,</span> <span class="st0">'books'</span><span class="br0">&#41;</span><span class="sy0">;</span> &nbsp; ... &nbsp; <span class="kw2">var</span> Book <span class="sy0">=</span> db.<span class="me1">model</span><span class="br0">&#40;</span><span class="st0">'Book'</span><span class="br0">&#41;</span><span class="sy0">;</span> Book.<span class="me1">collection</span>.<span class="me1">mapReduce</span><span class="br0">&#40;</span>...<span class="br0">&#41;</span><span class="sy0">;</span></pre></div> <p>(I actually think this is a case of Mongoose being <em>better</em> without its own abstraction on top of the existing driver, so I hope the new release doesn't make it more complex.)</p> <h2>In sum</h2> <p>I initially found MapReduce very confusing, so hopefully this helps clarify rather than increase the confusion. Please write in the comments below if I've misstated or mixed up anything above.</p> Wed, 20 Jun 2012 19:06:53 +0000 ben 7884 at Quick tip: Share a large MongoDB query object between the CLI and Node.js <p>I was writing a very long MongoDB query in JSON that needed to work both in a <a href="">Mongo CLI</a> script and in a Node.js app. 
Duplicating the JSON for the query across both raised the risk of one changing without the other. So I dug around for a way to share them and came up with this:</p> <p>Create a <span class="geshifilter"><code class="text geshifilter-text">query.js</code></span> file, like so (the query is just an example, substitute your own):</p> <div class="geshifilter"> <pre class="javascript geshifilter-javascript"><span class="co1">// dual-purpose, include me in mongo cli or node.js</span> <span class="kw2">var</span> module <span class="sy0">=</span> module <span class="sy0">||</span> <span class="br0">&#123;</span><span class="br0">&#125;</span><span class="sy0">;</span> <span class="co1">// filler for mongo CLI</span> &nbsp; <span class="co1">// query here is for mongo cli. module.exports is for node.js</span> <span class="kw2">var</span> query <span class="sy0">=</span> module.<span class="me1">exports</span> <span class="sy0">=</span> <span class="br0">&#123;</span> <span class="st0">'$and'</span> <span class="sy0">:</span> <span class="br0">&#91;</span> <span class="br0">&#123;</span> <span class="st0">'$or'</span> <span class="sy0">:</span> <span class="br0">&#91;</span> <span class="br0">&#123;</span> someField<span class="sy0">:</span> <span class="br0">&#123;</span> <span class="st0">'$exists'</span><span class="sy0">:</span> <span class="kw2">false</span> <span class="br0">&#125;</span> <span class="br0">&#125;</span><span class="sy0">,</span> <span class="br0">&#123;</span> someOtherField<span class="sy0">:</span> <span class="nu0">0</span> <span class="br0">&#125;</span> <span class="br0">&#93;</span> <span class="br0">&#125;</span><span class="sy0">,</span> &nbsp; <span class="br0">&#123;</span> <span class="st0">'$or'</span> <span class="sy0">:</span> <span class="br0">&#91;</span> <span class="br0">&#123;</span> <span class="st0">'endDate'</span> <span class="sy0">:</span> <span class="br0">&#123;</span> <span class="st0">'$lt'</span> <span class="sy0">:</span> 
<span class="kw2">new</span> Date<span class="br0">&#40;</span><span class="br0">&#41;</span> <span class="br0">&#125;</span> <span class="br0">&#125;</span><span class="sy0">,</span> <span class="br0">&#123;</span> <span class="st0">'endDate'</span> <span class="sy0">:</span> <span class="br0">&#123;</span> <span class="st0">'$exists'</span> <span class="sy0">:</span> <span class="kw2">false</span> <span class="br0">&#125;</span> <span class="br0">&#125;</span> <span class="br0">&#93;</span> <span class="br0">&#125;</span> &nbsp; <span class="co1">// ... </span> <span class="br0">&#93;</span> <span class="br0">&#125;</span><span class="sy0">;</span></pre></div> <p>Then in your mongo CLI script, use</p> <div class="geshifilter"> <pre class="javascript geshifilter-javascript">load<span class="br0">&#40;</span><span class="st0">'./query.js'</span><span class="br0">&#41;</span><span class="sy0">;</span> <span class="co1">// sets query var</span> &nbsp; db.<span class="me1">items</span>.<span class="me1">find</span><span class="br0">&#40;</span> query<span class="sy0">,</span> &nbsp; <span class="br0">&#123;</span> <span class="co1">// fields to select ...</span> <span class="br0">&#125;</span> <span class="br0">&#41;</span> .<span class="me1">sort</span><span class="br0">&#40;</span><span class="br0">&#123;</span> <span class="co1">// etc...</span> <span class="br0">&#125;</span><span class="br0">&#41;</span><span class="sy0">;</span></pre></div> <p>In your Node.js app, use <span class="geshifilter"><code class="text geshifilter-text">var query = require('./query.js');</code></span> and plug the same query into your MongoDB driver or Mongoose <span class="geshifilter"><code class="text geshifilter-text">find</code></span> function. 
Duplication avoided!</p> mongodb node.js Thu, 24 May 2012 18:10:29 +0000 ben 7812 at Using Node.js to connect with the eBay APIs on <p>We recently rolled out a new space on <a href="">Antiques Near Me</a> called <a href="">ANM Picks</a>, which uses eBay's APIs to programmatically find high-quality antique auctions using the same metrics that Sean (my antique-dealer business partner) uses in his own business. We'll cover the product promotion elsewhere, but I want to write here about how it works under the hood.</p> <p>The first iteration a few weeks ago simply involved an RSS feed being piped into the sidebar of the site. The site is primarily a Drupal 6 application, and Drupal has <a href="">tools</a> for handling feeds, but they're very heavy: They make everything a "node" (Drupal content item), and all external requests have to be run in series using PHP cron scripts on top of a memory-intensive Drupal process - i.e. they're good if you want to pull external content permanently into your CMS, but aren't suited for the kind of ephemeral, 3rd-party data that eBay serves. So I built a Node.js app that loaded the feed periodically, reformatted the items, and served them via Ajax onto the site.</p> <p>The RSS feed gave us very limited control over the filters, however. To really curate the data intelligently, I needed to use eBay's APIs. They have <a href="">dozens of APIs</a> of varying ages and protocols. Fortunately for our purposes, all the services we needed spoke JSON. No single API gave us all the dimensions we needed (that would be too easy!), so the app evolved to make many requests to several different APIs and then cross-reference the results.</p> <p>Node.js is great for working with 3rd-party web services because its IO is all asynchronous. Using <a href="">async</a>, my favorite Node.js flow-control module, the app can run dozens of requests in <a href="">parallel</a> or in a more finely-tuned <a href=""><code>queue</code></a>. 
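</p>

<p>The core idea <code>async.parallel</code> wraps can be sketched in a few lines of plain Javascript (illustrative only; the real library handles many more edge cases):</p>

```javascript
// Fire every task at once; collect results by position; call done
// when the last one finishes (or the first error occurs).
function parallel(tasks, done) {
  var results = [], pending = tasks.length, failed = false;
  tasks.forEach(function (task, i) {
    task(function (err, res) {
      if (failed) return;
      if (err) { failed = true; return done(err); }
      results[i] = res;
      if (--pending === 0) done(null, results);
    });
  });
}

// e.g. each task would wrap one eBay API request:
var finalResults;
parallel([
  function (cb) { setTimeout(function () { cb(null, 'finding'); }, 20); },
  function (cb) { setTimeout(function () { cb(null, 'shopping'); }, 10); }
], function (err, results) {
  finalResults = results;
  console.log(results.join(',')); // finding,shopping - array order, not finish order
});
```

<p>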
(Node.js obviously didn't invent asynchronous flow control - Java and other languages can spawn off threads to do this - but I doubt they do it as easily from a code perspective.)</p> <p>To work with the eBay APIs, I wrote a client library which I published to npm (<code>npm install ebay-api</code>). Code is <a href="">on Github</a>. (Ironically, someone else published another eBay module as I was working on mine, independently; they're both incomplete in different ways, so maybe they'll converge eventually. I like that in the Node.js ecosystem, unlike in Drupal's, two nutcrackers are considered better than one.) The module includes an <code>ebayApiGetRequest</code> method for a single request, <code>paginateGetRequest</code> for a multi-page request, a parser for two of the services, a <code>flatten</code> method that simplifies the data structure (to more easily query the results with MongoDB), and some other helper functions. There are <a href="">examples</a> in the code as well.</p> <p>Back to my app: Once the data is mashed up and filtered, it's saved into MongoDB (using <a href="">Mongoose</a> for basic schema validation, but otherwise as free-form JSON, which Mongo is perfect for). A subset is then loaded into memory for the sidebar Favorites and <a href="">ANM Picks</a>. (The database is only accessed for an individual item if it's no longer a favorite.) All this is frequently reloaded to fetch new items and flush out old ones (under the eBay TOS, we're not allowed to show anything that has passed).</p> <p>The app runs on a different port from the website, so to pipe it through cleanly, I'm using <a href="">Varnish</a> as a proxy. Varnish is already running in front of the Drupal site, so I added another <code>backend</code> in the VCL and activate it based on a subdomain. Oddly, trying to toggle by URL (via <code>req.url</code>) didn't work - it would intermittently load the wrong backend without a clear pattern - so the subdomain was second best. 
It was still necessary to deal with <a href="">Allow-Origin issues</a>, but at least the URL scheme looked better, and the cache splits the load.</p> <p>Having the data in MongoDB means we can multiply the visibility (and affiliate-marketing revenue) through social networks. Each item now has a Facebook <a href="">Like widget</a> which points back to a unique URL on our site (proxied through Drupal, with details visible until it has passed). The client-side JS subscribes to the widgets' events and pings our app, so we can track which items and categories are the most popular, and (by also tracking clicks) make sure eBay is being honest. We're tuning the algorithm to show only high-quality auctions, so the better it does, the more (we hope) they'll be shared organically.</p> <p>Comments? Questions? Interested in using the eBay APIs with Node.js? Feel free to email me or comment below.</p> antiquesnearme APIs mongodb node.js Wed, 23 May 2012 02:58:23 +0000 ben 7801 at JSConf Argentina Over on my <a href="">travel blog</a> I wrote about attending <a href="">JSConf Argentina</a> this past weekend. They flew in a dozen Node.js luminaries, I connected with the Javascript community here in Buenos Aires, as well as people in San Francisco (where I'll probably be moving later this year). It was awesome, <a href="">check out the post</a> (with photos of the very cool venue). javascript node.js Tue, 22 May 2012 20:55:24 +0000 ben 7796 at Liberate your Drupal data for a service-oriented architecture (using Redis, Node.js, and MongoDB) <p>Drupal's basic content unit is a "node," and to build a single node (or to perform any other Drupal activity), the codebase has to be bootstrapped, and everything needed to respond to the request (configuration, database and cache connections, etc) has to be initialized and loaded into memory from scratch. 
Then <span class="geshifilter"><code class="text geshifilter-text">node_load</code></span> runs through the NodeAPI hooks, multiple database queries are run, and the node is built into a single PHP object.</p> <p>This is fine if your web application runs entirely through Drupal, and always will, but what if you want to move toward a more flexible <a href="">Service-oriented architecture</a> (SOA), and share your content (and users) with other applications? For example, build a mobile app with a Node.js backend like <a href="">LinkedIn did</a>; or calculate analytics for business intelligence; or have customer service reps talk to your customers in real-time; or integrate with a ticketing system; or do anything else that doesn't play to Drupal's content-publishing strengths. Maybe you just want to make your data (which is the core of your business, not the server stack) technology-agnostic. Maybe you want to migrate a legacy Drupal application to a different system, but the cost of refactoring all the business logic is prohibitive; with an SOA you could change the calculation and get the best of both worlds.</p> <p>The traditional way of doing this was setting up a web service in Drupal using something like the <a href="">Services module</a>. External applications could request data over HTTP, and Drupal would respond in JSON. Each request has to wait for Drupal to bootstrap, which uses a lot of memory (every enterprise Drupal site I've ever seen has been bogged down by legacy code that runs on every request), so it's slow and doesn't scale well. Rather than relieving some load from Drupal's LAMP stack by building a separate application, you're just adding more load to both apps. To spread the load, you have to keep adding PHP/Apache/Mysql instances horizontally. 
Every module added to Drupal compounds the latency of Drupal's hook architecture (running thousands of <span class="geshifilter"><code class="text geshifilter-text">function_exists</code></span> calls for example), so the stakeholders involved in changing the Drupal app have to include the users of every secondary application requesting the data. With a Drupal-Services approach, other apps will always be second-class citizens, dependent on the legacy system, violating the "loose coupling" principle of SOA.</p> <p>I've been shifting <a href="">my own work</a> from Drupal to <a href="">Node.js</a> over the last year, but I still have large Drupal applications (such as <a href="">Antiques Near Me</a>) which can't be easily moved away, and frankly don't need to be for most use cases. Overall, I tend to think of Drupal as a legacy system, burdened by <a href="">too much cruft</a> and inconsistent architecture, and no longer the best platform for most applications. I've been giving a lot of thought to ways to keep these apps future-proof without rebuilding all the parts that work well as-is.</p> <p>That led me to build what I've called the <strong>"Drupal Liberator"</strong>. It consists of a Drupal module and a Node.js app, and uses <a href="">Redis</a> (a very fast key-value store) for a middleman queue and <a href="">MongoDB</a> for the final storage. Here's how it works:</p> <ul> <li><p>When a node (or user, or other entity type) is saved in Drupal, the module encodes it to JSON (a cross-platform format that's also native to Node.js and MongoDB), and puts it, along with metadata (an md5 checksum of the JSON, timestamp, etc), into a Redis <a href="">hash</a> (a simple key-value object, containing the metadata and the object as a JSON string). It also notifies a Redis <a href="">pub/sub channel</a> of the new hash key. (This uses 13KB of additional memory and 2ms of time for Drupal on the first node, and 1KB/1ms for subsequent node saves on the same request.
If Redis is down, Drupal goes on as usual.)</p></li> <li><p>The Node.js app, running completely independently of Drupal, is listening to the pub/sub channel. When it's pinged with a hash key, it retrieves the hash, <span class="geshifilter"><code class="text geshifilter-text">JSON.parse</code></span>'s the string into a native object, possibly alters it a little (e.g., adding the checksum and timestamp into the object), and saves it into MongoDB (which also speaks JSON natively). The data type (node, user, etc) and other information in the metadata directs where it's saved. Under normal conditions, this whole process from <span class="geshifilter"><code class="text geshifilter-text">node_save</code></span> to MongoDB takes less than a second. If it were to bottleneck at some point in the flow, the Node.js app runs asynchronously, not blocking or straining Drupal in any way.</p></li> <li><p>For redundancy, the Node.js app also polls the hash namespace every few minutes. If any part of the mechanism breaks at any time, or to catch up when first installing it, the timestamp and checksum stored in each saved object allow the two systems to easily find the last synchronized item and continue synchronizing from there.</p></li> </ul> <p>The result is a read-only clone of the data, synchronized almost instantaneously with MongoDB. Individual nodes can be loaded without bootstrapping Drupal (or touching Apache-MySql-PHP at all), as fully-built objects. New apps utilizing the data can be built in any framework or language. The whole Drupal site could go down and the data needed for the other applications would still be usable. 
Complex queries (for node retrieval or aggregate statistics) that would otherwise require enormous SQL joins can be built using <a href="">MapReduce</a> and run without affecting the Drupal database.</p> <p>One example of a simple use case this enables: Utilize the CMS backend to edit your content, but publish it using a thin MongoDB layer and client-side templates. (And outsource comments and other user-write interactions to a service like Disqus.) Suddenly your content displays much faster and under higher traffic with less server capacity, and you don't have to worry about <a href="">Varnish</a> or your Drupal site being "<a href="">Slashdotted</a>".</p> <p>A few caveats worth mentioning: First, it's read-only. If a separate app wants to modify the data in any way (and maintain data integrity across systems), it has to communicate with Drupal, or a synchronization bridge has to be built in the other direction. (This could be the logical next step in developing this approach, and truly make Drupal a co-equal player in an SOA.)</p> <p>Second, you could have Drupal write to MongoDB directly and cut out the middlemen. (And indeed that might make more sense in a lot of cases.) But I built this with the premise of an already strained Drupal site, where adding another database connection would slow it down even further. This aims to put as little additional load on Drupal as possible, with the "Liberator" acting itself as an independent app.</p> <p>Third, if all you needed was instant node retrieval - for example, if your app could query MySql for node ID's, but didn't want to bootstrap Drupal to build the node objects - you could leave them in Redis and take Node.js and MongoDB out of the picture.</p> <p>I've just started exploring the potential of where this can go, so I've run this mostly as a proof-of-concept so far (successfully). 
I'm also not releasing the code at this stage: If you want to adopt this approach to evolve your Drupal system to a service-oriented architecture, I am <a href="">available as a consultant</a> to help you do so. I've started building separate apps in Node.js that tie into Drupal sites with <a href="">Ajax</a> and found the speed and flexibility very liberating. There's also a world of non-Drupal developers who can help you leverage your data, if it could be easily liberated. I see this as opening a whole new set of doors for where legacy Drupal sites can go.</p> drupal mongodb node.js redis SOA Sun, 29 Apr 2012 15:06:20 +0000 ben 7767 at Cracking the cross-domain/Allow-Origin nut <p><em>Update/preface: None of this is relevant in IE, because IE doesn't respect cross-origin rules. You have to use JSONP for IE (so you might as well use JSONP everywhere.)</em></p> <p>The other day, I was setting up an Ajax feed loader - a node.js app pulling an RSS feed every few minutes and parsing it, and a Drupal site requesting a block of HTML from the node.js app via jQuery-ajax - and ran into a brick wall of <a href="">cross-domain/origin</a> (aka Cross-Origin Resource Sharing, or CORS) issues. This occurs any time you ajax-load something on a different subdomain or port from the main page it's loading into. (Even if you pipe the feed through the primary domain, using Varnish for example, if you use a separate hostname for your development site, or a local server, it'll break on those.)</p> <p>In theory, it should be very simple to add an <strong>Access-Control-Allow-Origin</strong> header to the source app - the node.js app in this case - and bypass the restriction.
In practice, it's not nearly so easy.</p> <p>To get at the root of the problem and eliminate quirks in the particular app I was building, I set up 2 local virtualhosts with apache, and tried every combination until it worked.</p> <p>Here are some problems I ran into, and solutions, to save the next person with this issue some time:</p> <ul> <li>Access-Control-Allow-Origin is supposed to allow multiple domains to be set - as in <code></code> - but no combination of these (separating with spaces, commas, or comma+space) actually worked. The solution to this is either to allow all domains with <code>*</code> or to dynamically set the domain to the origin on a per-request basis. (In a typical node.js HTTP server for example, that's found at <code>req.headers.origin</code> - but that only exists if it's called via Ajax in another page.) The latter solution is fine when the source domain is always known, or every request hits the backend, but can be problematic if you're trying to run it on multiple endpoints, or through Varnish.</li> <li>Chrome seems to have some bugs handling these situations, producing inconsistent results with the same environment.</li> <li>The minimal working solution in the Apache experiment turned out to require, besides a valid <code>Access-Control-Allow-Origin</code>, this one header: <code>Access-Control-Allow-Headers: X-Requested-With</code>. (<a href="">Apparently</a> that's used only by Ajax/XmlHttpRequest requests, and without the server explicitly allowing that request header, the request fails.)</li> <li>Before making the <code>GET</code> request for the content itself, some browsers make an <code>OPTIONS</code> request to verify the cross-domain permissions. Several other people running into these problems recommend including this header: <code>Access-Control-Allow-Methods: OPTIONS, GET, POST</code>.
In the Apache experiment it wasn't necessary, but I put it in the final node.js app and it can't hurt.</li> <li>Also from other people's recommendations, a more verbose version of <code>Access-Control-Allow-Headers</code> is possible, though not all of it may be necessary: <code>Access-Control-Allow-Headers: Content-Type, Depth, User-Agent, X-File-Size, X-Requested-With, If-Modified-Since, X-File-Name, Cache-Control</code></li> </ul> <p>Taking the lessons from the Apache experiment back to the node.js app, I used this code. It's written as an <a href="">express</a> middleware function (make sure to run it before <code>app.router</code> or any individual routes). The <code>_</code> character refers to the <a href="">underscore</a> library.</p> <div class="geshifilter"><pre class="javascript geshifilter-javascript">app.<span class="kw2">use</span><span class="br0">&#40;</span><span class="kw2">function</span><span class="br0">&#40;</span>req<span class="sy0">,</span> res<span class="sy0">,</span> next<span class="br0">&#41;</span> <span class="br0">&#123;</span> <span class="kw2">var</span> headers <span class="sy0">=</span> <span class="br0">&#123;</span> <span class="st0">'Cache-Control'</span> <span class="sy0">:</span> <span class="st0">'max-age:120'</span> <span class="co1">// cache for 2m (in varnish and client)</span> <span class="br0">&#125;</span><span class="sy0">;</span> &nbsp; <span class="co1">// allowed origin?</span> <span class="kw1">if</span> <span class="br0">&#40;</span><span class="sy0">!</span>_.<span class="me1">isUndefined</span><span class="br0">&#40;</span>req.<span class="me1">headers</span>.<span class="me1">origin</span><span class="br0">&#41;</span><span class="br0">&#41;</span> <span class="br0">&#123;</span> <span class="co1">// validate (primary, secondary, local-dev)</span> <span class="kw1">if</span> <span class="br0">&#40;</span>req.<span class="me1">headers</span>.<span class="me1">origin</span>.<span class="me1">match</span><span
class="br0">&#40;</span><span class="co2">/domain\.com/</span><span class="br0">&#41;</span> <span class="sy0">||</span> req.<span class="me1">headers</span>.<span class="me1">origin</span>.<span class="me1">match</span><span class="br0">&#40;</span><span class="co2">/secondary\.domain\.com/</span><span class="br0">&#41;</span> <span class="sy0">||</span> req.<span class="me1">headers</span>.<span class="me1">origin</span>.<span class="me1">match</span><span class="br0">&#40;</span><span class="co2">/domain\.local/</span><span class="br0">&#41;</span><span class="br0">&#41;</span> <span class="br0">&#123;</span> headers <span class="sy0">=</span> _.<span class="me1">extend</span><span class="br0">&#40;</span>headers<span class="sy0">,</span> <span class="br0">&#123;</span> <span class="st0">'Access-Control-Allow-Origin'</span><span class="sy0">:</span> req.<span class="me1">headers</span>.<span class="me1">origin</span> <span class="sy0">,</span> <span class="st0">'Access-Control-Allow-Methods'</span><span class="sy0">:</span> <span class="st0">'GET, POST, OPTIONS'</span> <span class="sy0">,</span> <span class="st0">'Access-Control-Allow-Headers'</span><span class="sy0">:</span> <span class="st0">'Content-Type, X-Requested-With, X-PINGOTHER'</span> <span class="sy0">,</span> <span class="st0">'Access-Control-Max-Age'</span><span class="sy0">:</span> <span class="nu0">86400</span> <span class="co1">// 1 day</span> <span class="br0">&#125;</span><span class="br0">&#41;</span><span class="sy0">;</span> <span class="br0">&#125;</span> <span class="br0">&#125;</span> &nbsp; _.<span class="me1">each</span><span class="br0">&#40;</span>headers<span class="sy0">,</span> <span class="kw2">function</span><span class="br0">&#40;</span>value<span class="sy0">,</span> key<span class="br0">&#41;</span> <span class="br0">&#123;</span> res.<span class="me1">setHeader</span><span class="br0">&#40;</span>key<span class="sy0">,</span> value<span class="br0">&#41;</span><span 
class="sy0">;</span> <span class="br0">&#125;</span><span class="br0">&#41;</span><span class="sy0">;</span> &nbsp; next<span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span> <span class="br0">&#125;</span><span class="br0">&#41;</span><span class="sy0">;</span></pre></div> ajax http node.js Fri, 13 Apr 2012 19:44:54 +0000 ben 7734 at New Varnish "Book" I just found out about this so I thought I'd pass it along: It appears the creators of <a href="">Varnish</a> have created a for-profit venture <a href="">"Varnish Software"</a> to market Varnish as a service, and as part of that effort they wrote an e-book, <a href=""><em>The Varnish Book</em></a>. One of the challenges with Varnish has always been the lack of thorough "how-to" documentation, so this is good news. (See my December post, <a href=""><em>Making sense of Varnish caching rules</em></a>.) varnish Wed, 04 Apr 2012 14:34:28 +0000 ben 7717 at Reducing coordinate density in KML files (with a node.js script) <p>Lately I've been tracking my <a href="">bicycle rides</a> with an Android GPS tracking app. The app exports to multiple formats, including Google Maps and KML. I wanted to take all my rides for a week and overlay them on a single map, but the coordinate density was too high - thousands of points for each ride - so GMaps was paginating the maps to reduce the density, and the overlays didn't work.</p> <p>So I needed some way to reduce the coordinate density of the KML files, taking 1 out of every N points, so the route looked the same on the map but with less unnecessary detail.</p> <p>I tried to find an existing tool to do this, but couldn't find one. So I started writing one as a <a href="">node.js</a> script (I'm trying to do everything platform-neutral these days in node). 
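The goal, in sketch form (a hypothetical simplification, not the published script): treat the file as plain lines, keep only every Nth line that looks like a coordinate, and pass everything else through untouched.

```javascript
// Hypothetical sketch of coordinate thinning: no XML parsing, just a regex
// test per line. Coordinate lines in these KML exports look like "lon,lat,alt".
function reduceDensity(kmlText, n) {
  var coordLine = /^\s*-?\d+(\.\d+)?,-?\d+(\.\d+)?(,-?\d+(\.\d+)?)?\s*$/;
  var i = 0;
  return kmlText.split('\n').filter(function (line) {
    if (!coordLine.test(line)) return true; // keep tags and other text as-is
    return i++ % n === 0;                   // keep 1 of every n coordinates
  }).join('\n');
}
```

(In the real tool, per the post, N comes from the shell parameters.)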
First I tried to actually parse the KML using various XML parsers - but the parsers stripped the line breaks between coordinates, so the format broke, and I realized I didn't really need to parse the format at all. I just needed to eliminate some of the lines.</p> <p>The result is a very simple, functional <strong><a href="">KML Coordinate Density Reducer</a></strong>. It reads each line of a KML file, uses regex to determine if it's a coordinate line or not, and if it is a coordinate line, strips all but every Nth line, as specified in the shell parameters.</p> <p>Using the script, I reduced each route from thousands of points to a few hundred, imported them all <a href=";msa=0&amp;ll=-34.575702,-58.418427&amp;spn=0.086925,0.142479">into a single map</a>, and can view or embed the routes all at once.</p> <p><em>Update</em>: Someone wrote an adaptation that <a href="">also checks the distance</a> between points. (See <a href="!/drupol/status/164228686919376898">tweet</a>.)</p> gmaps node.js Tue, 31 Jan 2012 00:48:41 +0000 ben 7607 at Git trick: Cleanup end-of-line changes <p>I was working on a site recently that was moved from one VPS hoster to another, and the ops team that moved it somehow introduced an end-of-line change to every file. I didn't want to commit these junk changes, so I wrote this little snippet to clean them.
For each <em>modified</em> file (not new ones), it counts the number of changed lines, excluding EOL changes, and if there are none, it checks out a clean copy of the file:</p> <div class="geshifilter"> <pre class="bash geshifilter-bash"><span class="kw2">git</span> st <span class="re5">--porcelain</span> <span class="sy0">|</span> <span class="kw2">grep</span> <span class="re5">-r</span> <span class="st0">&quot;^ M&quot;</span> <span class="sy0">|</span> <span class="kw2">awk</span> <span class="st_h">'{ print $2 }'</span> <span class="sy0">|</span> <span class="kw1">while</span> <span class="kw2">read</span> FILE; <span class="kw1">do</span> <span class="re2">LINES</span>=<span class="sy0">`</span><span class="kw2">git</span> <span class="kw2">diff</span> <span class="re5">--ignore-space-at-eol</span> <span class="st0">&quot;<span class="es2">$FILE</span>&quot;</span> <span class="sy0">|</span> <span class="kw2">wc</span> -l<span class="sy0">`</span>; <span class="kw1">if</span> <span class="br0">&#91;</span><span class="br0">&#91;</span> <span class="st0">&quot;<span class="es2">$LINES</span>&quot;</span> <span class="re5">-eq</span> <span class="st0">&quot;0&quot;</span> <span class="br0">&#93;</span><span class="br0">&#93;</span>; <span class="kw1">then</span> <span class="kw2">git</span> checkout <span class="st0">&quot;<span class="es2">$FILE</span>&quot;</span>; <span class="kw3">echo</span> <span class="st0">&quot;<span class="es2">$FILE</span>&quot;</span>; <span class="kw1">fi</span>; <span class="kw1">done</span></pre></div> bash git Mon, 30 Jan 2012 16:39:07 +0000 ben 7605 at Why Node.js? 
Why clients should ask for it, and developers should build with it <p>Following my <a href="">post about my new node.js apps</a>, and since I would like to turn <a href="">New Leaf Digital</a> into a node.js shop, I should write a little about why you, perhaps a potential client with a web project, might want to have it built in node.</p> <p>Node.js is only two years old, but already sustains a vast <a href="">ecosystem</a> of add-on modules, <a href=";ie=UTF-8&amp;q=node.js+tutorials">tutorials</a>, and <a href="">meetups</a>. The energy in the community is palpable and is based on strong fundamentals. Working in Node brings out the <a href="">best parts</a> of web development. Node is built in javascript, a language every developer knows (at least a little bit), so the learning curve is not a deterrent. That's important to consider as a client because, unlike other systems that have <a href="">peaked</a> in their appeal to developers, you can build a Node.js application today and know its platform will be supported for the long haul.</p> <p>Node is truly lightweight: Unlike bloated Swiss army knife frameworks that try to solve every problem out of the box at the expense of performance and comprehension, a Node app starts as a completely blank slate and is only as complex as you make it. So you'll get more bang for your server capacity buck from the get-go. (I've worked on several Drupal projects involving performance, getting each page to load faster by eliminating cruft and bottlenecks. In Node that whole way of thinking is flipped on its head.) Every tiny operation of your app is also light: the whole system is built on a philosophy of "asynchronous" input/output. Think of a node app as a juggler: while each ball is arcing through the air, it's catching and throwing other balls. Interactions don't "block" the flow like a traditional web application. 
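In code, the juggling looks like this - a trivial sketch with a simulated I/O call (the delay and values are made up):

```javascript
// A non-blocking call returns immediately; the program keeps working and the
// result arrives later via the callback. (Simulated I/O, not a real query.)
var order = [];

function queryDb(cb) {
  setImmediate(function () {  // stand-in for a database or network round-trip
    cb(null, 42);
  });
}

order.push('request sent');
queryDb(function (err, result) {
  order.push('result: ' + result);
});
order.push('doing other work');  // runs before the result comes back
```

A blocking platform would sit idle between sending the request and getting the result; here the in-between work happens first.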
So you don't run out of capacity until you're <em>really</em> out of capacity, and a bottleneck in one part of the system won't bring down the rest of it.</p> <p>This asynchronous I/O also makes node.js especially suited to applications involving file handling, interaction with external web services (as in <a href="">Flashcards</a>), or real-time user interaction (as in <a href="">Interactive Lists</a>). These are much harder to scale on traditional platforms, because the operations make other processes wait around while they're off doing their work.</p> <p>Node.js is also perfectly positioned to work with new database technologies, like <a href="">MongoDB</a>, which offer a flexibility not available with traditional SQL/relational databases. Node.js and MongoDB both "speak" the same language natively - javascript - so building or working with JSON APIs is easy. Architectures can be "rapidly prototyped" and changed on the fly as the application concept evolves.</p> <p><strong>So what is node.js <em>not</em> good for?</strong> If you want a robust content management system out of the box for a news publication, for example, you probably want to stick with a platform like Drupal. If you want a simple blog with easy content creation tools and comments and photos, you're still safe with Wordpress. If you're building software for banks to transfer money across the globe, there are probably battle-hardened, traditional ways to do that.</p> <p>But for almost any other web app, node.js might just be the best toolkit to build with. So please <a href="">let me know</a> when you're plotting your next big web idea!</p> node.js Tue, 17 Jan 2012 15:00:00 +0000 ben 7565 at Building a suite of apps in node.js <p>I just launched a suite of <a href="">node.js</a> apps at <strong><a href=""></a></strong>. 
Included for public consumption are my <a href="">Spanish Flashcards</a> app, refactored so each user has their own flashcards, and a new Interactive Lists app, an expansion of a <a href="">proof of concept</a> I built using websockets. They're connected with a common layout and a shared authentication layer using <a href="">Facebook Connect</a>.<a href=""><img src="" style="float:right; margin:1em 0 1em 1em;" /></a></p> <p>The main purpose of building these apps was to learn a complete node.js stack (more on that below) and gain experience coding and troubleshooting node.js apps.</p> <p>The second purpose was to demonstrate production node.js code to prospective clients. <a href="">New Leaf Digital</a> is now officially a half-Drupal, half-Node.js shop (and happy to switch more than half to the latter if there is work to be had).</p> <p>The third purpose (besides using the apps myself) was to allow others to use the apps, as well as to learn from the code. Anyone can <a href="">login</a> using their Facebook ID and all the code is on <a href="">Github</a> (under a CC license).</p> <h3>What do the apps do?</h3> <p><strong>Spanish Flashcards</strong> lets you create flashcards of English-Spanish word translations. Randomly play your flashcards until you get them all right, and look up new words with the <a href="">WordReference API</a>.</p> <p><strong>Interactive Lists</strong> lets you create to-do lists, shopping lists, or any other kinds of list, and share your lists with your friends. As you add and remove items from the list, everyone else sees it immediately in real-time. 
Imagine a scavenger hunt in which everyone is tracking the treasure on their phones, or a family trip to the mall.</p> <p><strong>Auth</strong> (under the hood): a common authentication layer using Facebook Connect, which the other 2 user-facing apps (and the parent app) share.</p> <h3>How they're built</h3> <p>The stack consists of: <a href="">node.js</a> as the engine, <a href="">Express</a> for the web framework, <a href="">Jade</a> for templates, <a href="">Mongoose</a> for MongoDB modeling, <a href=""></a> for real-time two-way communication, <a href="">everyauth</a> + <a href="">mongoose-auth</a> for 3rd party authentication, <a href="">connect-mongodb</a> for session storage, <a href="">async</a> for readable control flow, <a href="">underscore</a> for language add-ons, <a href="">http-proxy</a> for a flexible router. Plus <a href="">connect-less</a> and <a href="">Bootstrap</a> for aesthetics. <a href="">Forever</a> keeps it running.</p> <p>To bring the 4 apps (parent/HQ, auth, flashcards, lists) together, there were a few options: a parent app proxying to child apps running independently; virtual hosts (requiring separate DNS records); or using Connect/Express's "mounting" capability. Mounted apps were the most complex option, but offered the best opportunity to learn the deep innards of Express, and the proxy solution was <a href="">unclear</a> at the time, so I went with mounted apps.</p> <p>Along the way I refactored constantly and hit brick walls dozens of times. In the end it all works (so far!), and the code makes sense. 
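The essence of mounting can be illustrated with a tiny stand-in (hypothetical - not Express's internals): the parent strips its mount prefix from req.url before delegating, which is why a sub-app's routes are written relative to the mount point.

```javascript
// Hypothetical illustration of what mounting at '/path' does conceptually:
// strip the prefix from req.url, delegate, restore. Not Express's real code.
function mount(prefix, subApp) {
  return function (req, res) {
    if (req.url === prefix || req.url.indexOf(prefix + '/') === 0) {
      var original = req.url;
      req.url = req.url.slice(prefix.length) || '/'; // sub-app sees a relative path
      subApp(req, res);
      req.url = original;  // restore for any later handlers
      return true;         // handled
    }
    return false;          // not ours - fall through
  };
}
```

So a flashcards sub-app with a '/cards' route would answer at '/flashcards/cards' once mounted at '/flashcards'.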
Since the parent app is a whole standalone server hogging its port, I added a thin proxy on top which points the subdomain to the app, keeping other subdomains on port 80 open for the future.</p> <p>The app mounting functionality of Express.js is incredibly robust: using the same <code>app.use()</code> syntax as middleware, you can <code>app.use(anotherApp)</code>, or even <code>app.use('/path', anotherApp)</code> to load a whole app at a sub-path. (Then the sub-app's routes all change relative to that "mount point".)</p> <p>Of course in practice, mounting multiple apps is extremely complex. It's also not the most stable approach: a fatal error in any part of the suite will bring down the whole thing. So on a suite of "real" production apps, I wouldn't necessarily recommend the mounting model, but it's useful to understand. And when it works, it's very elegant.</p> <h3>Coming soon</h3> <p>Over the next few weeks, I'll be writing a series of blog posts about specific lessons learned from building these apps. In the meantime, I hope some people <a href="">make good use</a> of them, and please <a href="">report bugs</a> if you find any.</p> <p>Next I'm going to write about <em><a href="">Why Node.js?</a></em> - why you, perhaps a potential client, or perhaps another developer, should consider building your next app in Node.</p> node.js Tue, 17 Jan 2012 14:00:00 +0000 ben 7564 at How to install xhprof on OSX (Snow Leopard or Lion) and XAMPP <p><strong>Note: The same instructions apply to <a href="">XDebug</a> and possibly every other PECL module in this environment.</strong></p> <p><a href="">XHProf</a> is a very neat PHP profiler. Documentation and general installation instructions are not hard to come by, but none of the examples I found had the right steps for my particular setup, Mac OSX Lion with <a href="">XAMPP</a> running php 5.3. (Should apply also to OSX Snow Leopard.) 
So this post fills in that hole; for everything else regarding xhprof look elsewhere.</p> <p>The problem is the "architecture" that xhprof compiles to; OSX and xhprof are 64-bit but XAMPP is apparently 32-bit. (If you're using MAMP instead of XAMPP, you'll have a similar problem; <a href="">here</a> is a solution for that.)</p> <p>After trying multiple variations, the solution that worked was adapted from this <a href="">memcached-on-xampp post</a>:</p> <ol> <li>Download the latest copy of xhprof</li> <li><span class="geshifilter"><code class="text geshifilter-text">sudo phpize</code></span> (not sure if this is actually necessary. You may run into problems here; find the solution to that elsewhere.)</li> <li>If you've been trying other methods that failed, clean up the old junk: <span class="geshifilter"><code class="text geshifilter-text">sudo make clean</code></span></li> <li>The most important part:<br /> <span class="geshifilter"><code class="text geshifilter-text">sudo MACOSX_DEPLOYMENT_TARGET=10.6 CFLAGS='-O3 -fno-common -arch i386 -arch x86_64' LDFLAGS='-O3 -arch i386 -arch x86_64' CXXFLAGS='-O3 -fno-common -arch i386 -arch x86_64' ./configure --with-php-config=/Applications/XAMPP/xamppfiles/bin/php-config-5.3.1</code></span></li> <li><span class="geshifilter"><code class="text geshifilter-text">sudo make</code></span></li> <li><span class="geshifilter"><code class="text geshifilter-text">sudo make install</code></span></li> <li>Add to your php.ini:<br /> <div class="geshifilter"> <pre class="text geshifilter-text">[xhprof]
extension =
xhprof.output_dir = &quot;/Applications/XAMPP/xhprof-logs&quot;</pre></div> </li> </ol> <p>Then run <span class="geshifilter"><code class="text geshifilter-text">php -i | grep xhprof</code></span> and if it worked, it should tell you which version you're running.
If it fails, it will say, <em> mach-o, but wrong architecture in Unknown on line 0</em>.</p> <p>Good luck!</p> <p><em>Update:</em> It's worth mentioning, you'll probably also need to install <a href="">Graphviz</a> so xhprof can find the <span class="geshifilter"><code class="text geshifilter-text">dot</code></span> utility to generate graphics.</p> apache osx xhprof Sat, 14 Jan 2012 20:55:21 +0000 ben 7559 at Unconventional unit testing in Drupal 6 with PhpUnit, upal, and Jenkins <p>Unit testing in Drupal using the standard <a href="">SimpleTest</a> approach has long been one of my pain points with Drupal apps. The main obstacle was setting up a realistic test "sandbox": The SimpleTest module builds a virtual site with a temporary database (within the existing database), from scratch, for every test suite. To accurately test the complex interactions of a real application, you need dozens of modules enabled in the sandbox, and installing all their database schemas takes a long time. If your site's components are exported to <a href="">Features</a>, the tests gain another level of complexity. You could have the test turn on every module that's enabled on the real site, but then each suite takes 10 minutes to run. And that still isn't enough; you also need settings from the variables table, content types real nodes and users, etc.</p> <p>So until recently, it came down to the choice: make simple but unrealistic sandboxes that tested minutia but not the big-picture interactions; or build massive sandboxes for each test that made the testing workflow impossible. 
After weeks of trying to get a SimpleTest environment working on a Drupal 6 application with a lot of custom code, and dozens of hours debugging the tests or the sandbox setups rather than building new functionality, I couldn't justify the time investment, and shelved the whole effort.</p> <p>Then Moshe Weizman pointed me to his alternate <a href="">upal</a> project, which aims to bring the <a href="">PHPUnit</a> testing framework to Drupal, with backwards compatibility for SimpleTest assertions, but not the baggage of SimpleTest's Drupal implementation. Moshe <a href="">recently introduced upal</a> as a proposed testing framework for Drupal 8, especially for core. Separately, a few weeks ago, I started using upal for a different purpose: as a unit testing framework for custom applications in Drupal 6.</p> <p>I <a href="">forked the Github repo</a>, started a backport to D6 (copying from SimpleTest-6 where upal was identical to SimpleTest-7), and fixed some of the holes. More importantly, I'm taking a very different approach to the testing sandbox: I've set up an entirely separate test site, copied wholesale from the dev site (which itself is copied from the production site). 
This means:</p> <ul> <li>I can visually check the test sandbox at any time, because it runs as a virtualhost just like the dev site.</li> <li>All the modules, settings, users, and content are in place for each test, and don't need to be created or torn down.</li> <li>Rebuilding the sandbox is a single operation (with shell scripts to sync MySql, MongoDB, and files, manually triggered in Jenkins)</li> <li>Cleanup of test-created objects occurs (if desired) on a piecemeal basis in <code>tearDown()</code> - <code>drupalCreateNode()</code> (modified) and <code>drupalVariableSet()</code> (added) optionally undo their changes when the test ends.</li> <li><code>setUp()</code> is not needed for most tests at all.</li> <li><code>dumpContentsToFile()</code> (added) replicates SimpleTest's ability to save <code>curl</code>'d files, but on a piecemeal basis in the test code.</li> <li>Tests run <em>fast</em>, and accurately reflect the entirety of the site with all its actual interactions.</li> <li>Tests are run by the <a href="">Jenkins</a> continuous-integration tool and the results are visible in Jenkins using the JUnit xml format.</li> </ul> <h3>How to set it up (with Jenkins, aka Hudson)</h3> <p><em>(Note: the following are not comprehensive instructions, and assume familiarity with shell scripting and an existing installation of Jenkins.)</em></p> <ol> <li>Install upal from <a href="">Moshe's repo</a> (D7) or <a href="">mine</a> (D6). (Some of the details below I added recently, and apply only to the D6 fork.)</li> <li>Install PHPUnit. The <code>pear</code> approach is easiest.</li> <li>Upgrade drush: the notes say, "You currently need 'master' branch of drush after 2011.07.21. Drush 4.6 will be OK -" - this seems to correspond to the HEAD of the <code>7.x-4.x</code> branch in the <a href="">Drush repository</a>.</li> <li>Set up a webroot, database, virtualhost, DNS, etc for your test sandbox, and any scripts you need to build/sync it.</li> <li>Configure phpunit.xml. 
Start with upal's readme, then (D6/fork only) add <code>DUMP_DIR</code> (if wanted), and if HTTP authentication to the test site is needed, <code>UPAL_HTTP_USER</code> and <code>UPAL_HTTP_PASS</code>. In my version I've split the DrupalTestCase class to its own file, and renamed drupal_test_case.php to upal.php, so rename the "bootstrap" parameter accordingly. (Note: the upal notes say it must run at a URL ending in /upal - this is no longer necessary with this approach.)</li> <li>PHPUnit expects the files to be named .php rather than .test - however if you explicitly call an individual .test file (rather than traversing a directory, the approach I took), it might work. You can also remove the <code>getInfo()</code> functions from your SimpleTests, as they don't do anything anymore.</li> <li>If Jenkins is on a different server than the test site (as in my case), make sure Jenkins can SSH over.</li> <li>To use <code>dumpContentsToFile()</code> or the XML results, you'll want a dump directory (set in phpunit.xml), and your test script should wipe the directory before each run, and rsync the files to the build workspace afterwards.</li> <li>To convert PHPUnit's JUnit output to the format Jenkins understands, you'll need the <a href="">xUnit</a> plugin for Jenkins. Then point the Jenkins job to read the XML file (after rsync'ing if running remotely). [Note: the last 3 steps have to be done with SimpleTest and Jenkins too.]</li> <li>Code any wrapper scripts around the above steps as needed.</li> <li>Write some tests! (Consult the <a href="">PHPUnit documentation</a>.)</li> <li>Run the tests!</li> </ol> <h3>Some issues I ran into (which you might also run into)</h3> <ol> <li>PHPUnit, unlike SimpleTest, stops a test function after the first failure. This isn't a bug, it's <a href="">expected behavior</a>, even with <code>--stop-on-failure</code> disabled.
I'd prefer it the other way, but that's how it is.</li> <li>Make sure your test site - like any dev site - does not send any outbound mail to customers, run unnecessary feed imports, or otherwise perform operations not meant for a non-production site.</li> <li>In my case, Jenkins takes 15 minutes to restart (after installing xUnit for example). I don't know why, but keep an eye on the Jenkins log if it's taking you a while too.</li> <li>Also in my case, Jenkins runs behind an Apache reverse-proxy; in that case when Jenkins restarts, it's usually necessary to restart Apache, or else it gets stuck thinking the proxy endpoint is down.</li> <li>I ran into a bug with Jenkins stopping its shell script commands arbitrarily before the end. I worked around it by moving the whole job to a shell script on the Jenkins server (which in turn delegates to a script on the test/dev server).</li> </ol> <p>There is a pending <a href="">pull request</a> to pull some of the fixes and changes I made back into the original repo. In the pull request I've tried to separate what are merely fixes from what goes with the different test-site approach I've taken, but it's still a tricky merge. Feel free to help there, or make your own fork with a separate test site for D7.</p> <p>I now have a working test environment with PHPUnit and upal, with all of the tests I wrote months ago working again (minus their enormous <code>setUp()</code> functions), and I've started writing tests for new code going forward. 
Success!</p> <p><em>(If you are looking for a professional implementation of any of the above, please <a href="">contact me</a>.)</em></p> <p><em>Recent related post: <a href="">Making sense of Varnish caching rules</a></em></p> drupal jenkins phpunit simpletest Wed, 14 Dec 2011 21:53:31 +0000 ben 7480 at Making sense of Varnish caching rules <p><a href="">Varnish</a> is a reverse-proxy cache that allows a site with a heavy backend (such as a Drupal site) and mostly consistent content to handle very high traffic load. The &#8220;cache&#8221; part refers to Varnish storing the entire output of a page in its memory, and the &#8220;reverse proxy&#8221; part means it functions as its own server, sitting in front of Apache and passing requests back to Apache only when necessary.</p> <p>One of the challenges with implementing Varnish, however, is the complex &#8220;VCL&#8221; protocol it uses to process requests with custom logic. The <a href="">syntax</a> is unusual, the <a href="">documentation</a> relies heavily on complex examples, and there don&#8217;t seem to be any books or other comprehensive resources on the software. A recent link on the project site to <a href="">Varnish Training</a> is just a pitch for a paid course. Searching more specifically for Drupal + Varnish will bring up many good results - including <a href="">Lullabot&#8217;s fantastic tutorial</a> from April, and older examples for Mercury - but the latest stable release is now 3.x and many of the examples (written for 2.x) <a href="">don&#8217;t work</a> as written anymore. So it takes a lot of trial and error to get it all working.</p> <p>I&#8217;ve been running Varnish on <a href=""></a>, partly to keep our hosting costs down by getting more power out of less [virtual] hardware. A side benefit is the site&#8217;s ability to respond very nicely if the backend Apache server ever goes down. 
They&#8217;re on separate VPS's (connected via internal private networking), and if the Apache server completely explodes from memory overload, or I simply need to upgrade a server-related package, Varnish will display a themed &#8220;We&#8217;re down for a little while&#8221; message.</p> <p>But it wasn&#8217;t until recently that I got Varnish&#8217;s primary function, caching, really tuned. I spent several days under the hood recently, and while I don&#8217;t want to rehash what&#8217;s already been well covered in <a href="">Lullabot&#8217;s tutorial</a>, here are some other things I learned:</p> <h3>Check syntax before restarting</h3> <p>After you update your VCL, you need to restart Varnish - using <span class="geshifilter"><code class="text geshifilter-text">sudo /etc/init.d/varnish restart</code></span> for instance - for the changes to take effect. If you have a syntax error, however, this will take down your site. So check the syntax first (change the path to your VCL as needed):<br /> <span class="geshifilter"><code class="text geshifilter-text">varnishd -C -f /etc/varnish/default.vcl &gt; /dev/null</code></span></p> <p>If there are errors, it will display them; if not, it shows nothing. Use that as a visual check before restarting. (Unfortunately the exit code of that command is always 0, so you can&#8217;t do check-then-restart as simply as <span class="geshifilter"><code class="text geshifilter-text">check-varnish-syntax &amp;&amp; /etc/init.d/varnish restart</code></span>, but you could <span class="geshifilter"><code class="text geshifilter-text">grep</code></span> the output for the words &#8220;exit 1&#8221; to accomplish the same.)</p> <h3>Logging</h3> <p>The <span class="geshifilter"><code class="text geshifilter-text">std.log</code></span> function allows you to generate arbitrary messages about Varnish&#8217;s processing. 
Add <span class="geshifilter"><code class="text geshifilter-text">import std;</code></span> at the top of your VCL file, and then <span class="geshifilter"><code class="text geshifilter-text">std.log(&quot;DEV: some useful message&quot;)</code></span> anywhere you want. The &#8220;DEV&#8221; prefix is an arbitrary way of differentiating your logs from all the others. So you can then run in the shell, <span class="geshifilter"><code class="text geshifilter-text">varnishlog | grep &quot;DEV&quot;</code></span> and watch only the information you&#8217;ve chosen to see.</p> <p>How I use this:<br /> - At the top of <span class="geshifilter"><code class="text geshifilter-text">vcl_recv()</code></span> I put <span class="geshifilter"><code class="text geshifilter-text">std.log(&quot;DEV: Request to URL: &quot; + req.url);</code></span>, to put all the other logs in context.<br /> - When I <span class="geshifilter"><code class="text geshifilter-text">pipe</code></span> back to apache, I put <span class="geshifilter"><code class="text geshifilter-text">std.log(&quot;DEV: piping &quot; + req.url + &quot; straight back to apache&quot;);</code></span> before the <span class="geshifilter"><code class="text geshifilter-text">return (pipe);</code></span><br /> - On blocked URLs (cron, install), the same<br /> - On static files (images, JS, CSS), I put <span class="geshifilter"><code class="text geshifilter-text">std.log(&quot;DEV: Always caching &quot; + req.url);</code></span><br /> - To understand all the regex madness going on with cookies, I log <span class="geshifilter"><code class="text geshifilter-text">req.http.Cookie</code></span> at every step to see what&#8217;s changed.</p> <p>Plug some of these in, check the syntax, restart Varnish, run <span class="geshifilter"><code class="text geshifilter-text">varnishlog|grep PREFIX</code></span> as above, and watch as you hit a bunch of URLs in your browser. 
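</p>

<p>In keeping with this blog's node.js bent, the same filtering can be done with a tiny node script instead of <code>grep</code>. A hedged sketch: the string filter is the part that matters; the <code>spawn</code> portion assumes <code>varnishlog</code> is installed, so it's left commented out.</p>

```javascript
// Sketch: the node.js equivalent of `varnishlog | grep "DEV"`.
// filterByPrefix() is a plain string test on each line.
function filterByPrefix(text, prefix) {
  return text.split('\n').filter(function (line) {
    return line.indexOf(prefix) !== -1;
  });
}

// Usage (assumes varnishlog is on the PATH):
// const { spawn } = require('child_process');
// const log = spawn('varnishlog');
// log.stdout.on('data', function (chunk) {
//   filterByPrefix(chunk.toString(), 'DEV').forEach(function (l) {
//     console.log(l);
//   });
// });
```

<p>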
Varnish&#8217;s internal logic will quickly start making more sense.</p> <h3>Watch Varnish work with your browser</h3> <p><img src="" alt="Varnish headers in Chrome Inspector" style="float: right; margin: 0 0 1em 1em;"><br /> The Chrome/Safari Inspector and Firebug show the headers for every request made on a page. With Varnish running, look at the Response Headers for one of them: you&#8217;ll see &#8220;Via: Varnish&#8221; if the page was processed through Varnish, or &#8220;Server:Apache&#8221; if it went through Apache. (Using Chrome, for instance, login to your Drupal site and the page should load via Apache (assuming you see page elements not available to anonymous users), then open an Incognito window and it should run through Varnish.)</p> <h3>Add hit/miss headers</h3> <ul> <li>When a page is supposed to be cached (not <span class="geshifilter"><code class="text geshifilter-text">pipe</code></span>'d immediately), Varnish checks if there is an existing hit or miss. To watch this in your Inspector, use this logic:</li> </ul> <div class="geshifilter"> <pre class="text geshifilter-text">sub vcl_deliver { std.log(&quot;DEV: Hits on &quot; + req.url + &quot;: &quot; + obj.hits); &nbsp; if (obj.hits &gt; 0) { set resp.http.X-Varnish-Cache = &quot;HIT&quot;; } else { set resp.http.X-Varnish-Cache = &quot;MISS&quot;; } &nbsp; return (deliver); }</pre></div> <p>Then you can clear the caches, hit a page (using the browser technique above), see &#8220;via Varnish&#8221; and a MISS, hit it again, see a HIT (or not), and know if everything is working.</p> <h3>Clear Varnish when aggregated CSS+JS are rebuilt</h3> <p>If you have CSS/JS aggregation enabled (as recommended), your HTML source will reference long hash-string files. Varnish caches that HTML with the hash string. 
If you clear only those caches (&#8220;requisites&#8221; via Admin Menu or <span class="geshifilter"><code class="text geshifilter-text">cc css+js</code></span> via Drush), Varnish will still have the <em>old</em> references, but the files will have been deleted. Not good. You could simply never use that operation again, but that&#8217;s a little silly.</p> <p>The heavy-handed solution I came up with (I welcome alternatives) is to wipe the Varnish cache when CSS+JS resets. That operation is not hook-able, however, so you have to patch core. In common.inc, <span class="geshifilter"><code class="text geshifilter-text">_drupal_flush_css_js()</code></span>, add:</p> <p> <div class="geshifilter"> <pre class="text geshifilter-text">if (module_exists('varnish') &amp;&amp; function_exists('varnish_purge_all_pages')) { varnish_purge_all_pages(); }</pre></div> </p> <p>This still keeps Memcache and other in-Drupal caches intact, avoiding an unnecessary &#8220;clear all caches&#8221; operation, but makes sure Varnish doesn&#8217;t point to dead files. (You could take it a step further and purge only URLs that are Drupal-generated and not static; if you figure out the regex for that, please share.)</p> <h3>Per-page cookie logic</h3> <p>On <a href=""></a> we have a cookie that remembers the last location you searched, which makes for a nicer UX. That cookie gets added to Varnish&#8217;s page &#8220;hash&#8221; and (correctly) bypasses the cache on pages that take that cookie into account. The cookie is not relevant to the rest of the site, however, so it should be ignored in those cases. How to handle this?</p> <p>There are <a href="">two ways</a> to handle cookies in Varnish: strip cookies you know you don&#8217;t want, as in this old <a href="">Pressflow example</a>, or leave only the cookies you know you <em>do</em> want, as in Lullabot&#8217;s <a href="">example</a>.
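</p>

<p>As a rough illustration of the two strategies - in javascript rather than VCL, with example cookie names (Drupal's <code>SESS*</code> session cookies are typically the ones you must keep; <code>has_js</code> and Google Analytics' <code>__utm*</code> cookies are common strip candidates):</p>

```javascript
// Illustrative only: the two cookie strategies expressed as string
// transforms on a Cookie header. Names/patterns are examples, not VCL.
const UNWANTED = /(^|;\s*)(has_js|__utm\w+)=[^;]*/g; // strip what you know you don't want
const WANTED = /(^|;\s*)(SESS[a-z0-9]+=[^;]*)/;      // keep only what you know you want

function stripUnwanted(cookieHeader) {
  return cookieHeader.replace(UNWANTED, '').replace(/^;\s*/, '');
}

function keepOnlyWanted(cookieHeader) {
  const match = cookieHeader.match(WANTED);
  return match ? match[2] : '';
}
```

<p>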
Each strategy has its pros and cons and works on its own, but it&#8217;s not advisable to <em>combine</em> them. I&#8217;m using Lullabot&#8217;s technique on this site, so to deal with the location cookie, I use <span class="geshifilter"><code class="text geshifilter-text">if-else</code></span> logic: if the cookie is available but <em>not</em> needed (determined by regex like <span class="geshifilter"><code class="text geshifilter-text">req.url !~ &quot;PATTERN&quot; || ...</code></span>), then strip it; otherwise keep it. If the cookie logic you need is more varied but still linear, you could create a series of <span class="geshifilter"><code class="text geshifilter-text">elsif</code></span> statements to handle all the use cases. (Just make sure to roast a huge pot of coffee first.)</p> <h3>Useful add-ons to varnish.module</h3> <ul> <li>Added <span class="geshifilter"><code class="text geshifilter-text">watchdog('varnish', ...)</code></span> commands in varnish.module on cache-clearing operations, so I could look at the logs and spot problems.</li> <li>Added a block to varnish.module with a &#8220;Purge this page&#8221; button for each URL, shown only for admins. (I saw this in an Akamai module and it made a lot of sense to copy. I&#8217;d be happy to post a patch if others want this.)</li> <li>The <a href="">Expire</a> module offers plug-n-play intelligence to selectively clear Varnish URLs only when necessary (clearing a landing page of blog posts only if a blog post is modified, for example). Much better than the default behavior of aggressive clearing &#8220;just in case&#8221;.</li> </ul> <p>I hope this helps people adopt Varnish.
I am also available via my consulting business <a href="">New Leaf Digital</a> for paid implementation, strategic advice, or support for Varnish-aided sites.</p> antiquesnearme drupal varnish Mon, 12 Dec 2011 20:52:31 +0000 ben 7477 at Parse Drupal watchdog logs in syslog (using node.js script) <p>Drupal has the option of outputting its <a href="">watchdog</a> logs to syslog, the file-based core Unix logging mechanism. The log in most cases lives at /var/log/messages, and Drupal's logs get mixed in with all the others, so you need to <code>cat /var/log/messages | grep drupal</code> to filter.</p> <p>But then you still have a big text file that's hard to parse. This is probably a "solved problem" many times over, but recently I had to parse the file specifically for 404'd URLs, and decided to do it (partly out of convenience but mostly to learn how) using <a href="">Node.js</a> (as a scripting language). Javascript is much easier than Bash at simple text parsing.</p> <p>I put the code in a <a href="">Gist, <em>node.js script to parse Drupal logs in linux syslog (and find distinct 404'd URLs)</em></a>. The last few lines of URL filtering can be changed to any other specific use case you might have for reading the logs out of syslog. (This could also be used for reading non-Drupal syslogs, but the mapping applies keys like "URL" which wouldn't apply then.)</p> <p>Note the comment at the top: to run it you'll need node.js and 2 NPM modules as dependencies. 
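</p>

<p>The core of the approach can be sketched with no dependencies at all. This is not the Gist's code - and the pipe-delimited field layout below is an assumption about Drupal's syslog format, so verify it against your own log lines before relying on it:</p>

```javascript
// Hedged sketch (not the Gist itself): map Drupal's pipe-delimited syslog
// entries to objects and collect distinct 404'd URLs. The field layout is
// an assumption -- check it against your own /var/log/messages.
const FIELDS = ['base_url', 'timestamp', 'type', 'ip', 'url', 'referrer', 'uid', 'link', 'message'];

function parseDrupalLine(line) {
  const idx = line.indexOf('drupal:');
  if (idx === -1) return null; // not a Drupal watchdog line
  const parts = line.slice(idx + 'drupal:'.length).trim().split('|');
  const entry = {};
  FIELDS.forEach(function (field, i) { entry[field] = parts[i]; });
  return entry;
}

function distinct404s(lines) {
  const seen = new Set();
  lines.forEach(function (line) {
    const entry = parseDrupalLine(line);
    if (entry && entry.type === 'page not found') seen.add(entry.url);
  });
  return Array.from(seen);
}
```

<p>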
Then take your filtered log (using the <code>grep</code> method above) and pass it as a parameter, and read the output on screen or output with <code>&gt;</code> to another file.</p> drupal linux node.js Tue, 29 Nov 2011 18:06:33 +0000 ben 7422 at Migrating a static grid to a responsive, semantic grid with LessCSS <p>The layout of <a href="">Antiques Near Me</a> (a startup I co-founded) has long been built using the sturdy <a href="" style="font-weight:bold">960.gs</a> grid system (implemented in Drupal 6 using the <a href="">Clean</a> base theme). Grids are very helpful: They allow layouts to be created quickly; they allow elements to be fit into layouts easily; they keep dimensions consistent; they look clean. But they have a major drawback that always bothered me: the <span class="geshifilter"><code class="text geshifilter-text">grid_X</code></span> classes that determine an element's width are in the HTML. That mixes up markup/content and layout/style, which should ideally be completely separated between the HTML and CSS.</p> <p>The rigidity of an in-markup grid becomes especially apparent when trying to implement <span style="font-weight:bold">&quot;responsive&quot; design</span> principles. I'm not a designer, but the basic idea of responsive design for the web, as I understand it, is that a site's layout should adapt automagically to the device it's viewed in. For a nice mobile experience, for example, rather than create a separate mobile site - which I always thought was a poor use of resources, duplicating the content-generating backend - the same HTML can be used with <span style="font-weight:bold"><span class="geshifilter"><code class="text geshifilter-text">@media</code></span> queries</span> in the CSS to make the layout look "native".</p> <p>(I've put together some <a href="" style="font-weight:bold">useful links on Responsive Design and @media queries</a> using Delicious.
The best implementation of a responsive layout that I've seen is on the site of <a href="">FourKitchens</a>.)</p> <p>Besides the 960 grid, I was using <a href="" style="font-weight:bold">LessCSS</a> to generate my styles: it supports variables, mix-ins, nested styles, etc; it generally makes stylesheet coding much more intuitive. So for a while the thought simmered, why not move the static 960 grid into Less (using mixins), and apply the equivalent of <span class="geshifilter"><code class="text geshifilter-text">grid_X</code></span> classes directly in the CSS? Then I read this article in Smashing on <a href="">The Semantic Grid System</a>, which prescribed pretty much the same thing - using Less with a library called <a href="" style="font-weight:bold">semantic.gs</a> - and I realized it was time to actually make it happen.</p> <p>To make the transition, I <a href="">forked semantic.gs</a> and made some modifications: I added .alpha and .omega mixins (to cancel out side margins); for nested styles, I ditched semantic.gs's <span class="geshifilter"><code class="text geshifilter-text">.row()</code></span> approach (which seems to be buggy anyway) and created a <span class="geshifilter"><code class="text geshifilter-text">.nested-column</code></span> mixin instead. I added <span class="geshifilter"><code class="css geshifilter-css"><span class="kw1">clear</span><span class="sy0">:</span><span class="kw2">both</span></code></span> to the <span class="geshifilter"><code class="text geshifilter-text">.clearfix</code></span> mixin (seemed to make sense, though maybe there was a reason it wasn't already in).</p> <p>To maintain the dimensions and classes (as an intermediary step), I made a transitional-960gs.less stylesheet with these rules: <span class="geshifilter"><code class="css geshifilter-css"><span class="co1">@columns: 16; @column-width: 40; @gutter-width: 20;</span></code></span>.
Then I made equivalents of the .grid_X classes (as <a href="">Clean</a>'s implementation had them) with an <span class="geshifilter"><code class="text geshifilter-text">s_</code></span> prefix:</p> <div class="geshifilter"> <pre class="css geshifilter-css"><span class="re1">.s_container</span><span class="sy0">,</span> <span class="re1">.s_container_16</span> <span class="br0">&#123;</span> <span class="kw1">margin-left</span><span class="sy0">:</span> <span class="kw2">auto</span><span class="sy0">;</span> <span class="kw1">margin-right</span><span class="sy0">:</span> <span class="kw2">auto</span><span class="sy0">;</span> <span class="kw1">width</span><span class="sy0">:</span> <span class="co1">@total-width;</span> .clearfix<span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span> <span class="br0">&#125;</span> <span class="re1">.s_grid_1</span> <span class="br0">&#123;</span> .column<span class="br0">&#40;</span><span class="nu0">1</span><span class="br0">&#41;</span><span class="sy0">;</span> <span class="br0">&#125;</span> <span class="re1">.s_grid_2</span> <span class="br0">&#123;</span> .column<span class="br0">&#40;</span><span class="nu0">2</span><span class="br0">&#41;</span><span class="sy0">;</span> <span class="br0">&#125;</span> ... <span class="re1">.s_grid_16</span> <span class="br0">&#123;</span> .column<span class="br0">&#40;</span><span class="nu0">16</span><span class="br0">&#41;</span><span class="sy0">;</span> <span class="br0">&#125;</span></pre></div> <p>The <span class="geshifilter"><code class="text geshifilter-text">s_grid_X</code></span> classes were purely transitional: they allowed me to do a search-and-replace from grid_ to s_grid_ and remove the stylesheet, before migrating all the styles into semantic equivalents. 
Once that was done, the s_grid_ classes could be removed.</p> <p>960.gs and semantic.gs also implement their columns a little differently, one with padding and the other with margins, so what was actually a 1000px-wide layout with 960.gs became a 960px layout with semantic.gs. To compensate for this, I made a wrapper mixin applied to all the top-level wrappers:</p> <div class="geshifilter"> <pre class="css geshifilter-css"><span class="re1">.wide-wrapper</span> <span class="br0">&#123;</span> .s_container<span class="sy0">;</span> <span class="kw1">padding-right</span><span class="sy0">:</span> <span class="re3">20px</span><span class="sy0">;</span> <span class="kw1">padding-left</span><span class="sy0">:</span> <span class="re3">20px</span><span class="sy0">;</span> .clearfix<span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span> <span class="br0">&#125;</span></pre></div> <p>With the groundwork laid, I went through all the grid_/s_grid_ classes in use and replaced them with purely in-CSS semantic mixins. So if a block had a grid class before, now it only had a semantic ID or class, with the grid mixins applied to that selector.</p> <p>Once the primary layout was replicated, I could make it "respond" to @media queries, using a responsive.less sheet. For example:</p> <div class="geshifilter"> <pre class="css geshifilter-css"><span class="coMULTI">/* iPad in portrait, or any screen below 1000px */</span> <span class="co1">@media only screen and (max-device-width: 1024px) and (orientation: portrait), screen and (max-width: 999px) {</span> ... <span class="br0">&#125;</span> &nbsp; <span class="coMULTI">/* very narrow browser, or iPhone -- note that &lt;1000px styles above will apply here too! note: iPhone in portrait is 320px wide, in landscape is 480px wide */</span> <span class="co1">@media only screen and (max-device-width: 480px), only screen and (-webkit-min-device-pixel-ratio: 2), screen and (max-width: 499px) {</span> ...
<span class="br0">&#125;</span> &nbsp; <span class="coMULTI">/* iPhone - portrait */</span> <span class="co1">@media only screen and (max-device-width: 480px) and (max-width: 320px) {</span> ... <span class="br0">&#125;</span></pre></div> <p>Some vitals tools for the process:</p> <ul> <li><a href=""></a> (for Mac), or even better, the new <a href="">CodeKit</a> by the same author compiles and minifies the Less files instantly, so the HTML can refer to normal CSS files.</li> <li>The iOS Simulator (part of XCode) and Android Emulator (with the Android SDK), to simulate how your responsive styles work on different devices. (Getting these set up is a project in itself).</li> <li>To understand what various screen dimensions looked like, I added a simple viewport debugger to show the screen size in the corner of the page (written as a Drupal6/jQuery document-ready "behavior"; fills a #viewport-size element put separately in the template):<br /> <div class="geshifilter"> <pre class="javascript geshifilter-javascript">Drupal.<span class="me1">behaviors</span>.<span class="me1">viewportSize</span> <span class="sy0">=</span> <span class="kw2">function</span><span class="br0">&#40;</span><span class="br0">&#41;</span> <span class="br0">&#123;</span> <span class="kw1">if</span> <span class="br0">&#40;</span><span class="sy0">!</span>$<span class="br0">&#40;</span><span class="st0">'#viewport-size'</span><span class="br0">&#41;</span>.<span class="me1">size</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#41;</span> <span class="kw1">return</span><span class="sy0">;</span> &nbsp; Drupal.<span class="me1">fillViewportSize</span> <span class="sy0">=</span> <span class="kw2">function</span><span class="br0">&#40;</span><span class="br0">&#41;</span> <span class="br0">&#123;</span> $<span class="br0">&#40;</span><span class="st0">'#viewport-size'</span><span class="br0">&#41;</span>.<span class="me1">text</span><span class="br0">&#40;</span> 
$<span class="br0">&#40;</span>window<span class="br0">&#41;</span>.<span class="me1">width</span><span class="br0">&#40;</span><span class="br0">&#41;</span> <span class="sy0">+</span> <span class="st0">'x'</span> <span class="sy0">+</span> $<span class="br0">&#40;</span>window<span class="br0">&#41;</span>.<span class="me1">height</span><span class="br0">&#40;</span><span class="br0">&#41;</span> <span class="br0">&#41;</span> .<span class="me1">css</span><span class="br0">&#40;</span><span class="st0">'top'</span><span class="sy0">,</span> $<span class="br0">&#40;</span><span class="st0">'#admin-menu'</span><span class="br0">&#41;</span>.<span class="me1">height</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="sy0">;</span> <span class="br0">&#125;</span><span class="sy0">;</span> Drupal.<span class="me1">fillViewportSize</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span> $<span class="br0">&#40;</span>window<span class="br0">&#41;</span>.<span class="me1">bind</span><span class="br0">&#40;</span><span class="st0">'resize'</span><span class="sy0">,</span> <span class="kw2">function</span><span class="br0">&#40;</span>event<span class="br0">&#41;</span><span class="br0">&#123;</span> Drupal.<span class="me1">fillViewportSize</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span> <span class="br0">&#125;</span><span class="br0">&#41;</span><span class="sy0">;</span> <span class="br0">&#125;</span><span class="sy0">;</span></pre></div> </li> </ul> <p>After three days of work, the layout is now entirely semantic, and the stylesheet is gone. On a wide-screen monitor it looks exactly the same as before, but it now adapts to narrower screen sizes (you can see this by shrinking the window's width), and has special styles for iPad and iPhone (portrait and landscape), and was confirmed to work on a popular Android tablet. 
It'll be a continuing work in progress, but the experience is now much better on small devices, and the groundwork is laid for future tweaks or redesigns.</p> <p>There are some <strong>downsides</strong> to this approach worth considering:</p> <ul> <li>Mobile devices still load the full CSS and HTML needed for the "desktop" layout, even if not all the elements are shown. This is a problem for performance.</li> <li>The stylesheets are enormous with all the mixins, compounding the previous issue. I haven't examined in depth how much of a problem this actually is, but I'll need to at some point.</li> <li>The contents of the page can only change as much as the <em>stylesheets</em> allow. The <em>order</em> of elements can't change (unless their visible order can be manipulated with CSS floats).</li> </ul> <p>To mitigate these and modify the actual content on mobile devices - to reduce the performance overhead, load smaller images, or put less HTML on the page - would probably require backend modifications that detect the user agent (perhaps using <a href="">Browscap</a>). I've been avoiding that approach until now, but with most of the work done on the CSS side, a hybrid backend solution is probably the next logical step. (For the images, <a href="">Responsive Images</a> could also help on the client side.)</p> <p>See the <a href="">new layout</a> at work, and my <a href="">links on responsive design</a>. I'm curious to hear what other people do to solve these issues.</p> <p><em>Added:</em> It appears the javascript analog to media queries is <a href="">media query lists</a>, which are event-able. 
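</p>

<p>In the browser that looks roughly like the snippet below. Since <code>window.matchMedia</code> only exists in the browser, the breakpoint logic is split out as a plain function; the breakpoint names are my own, mirroring the 999px/499px cutoffs in the @media rules above:</p>

```javascript
// Breakpoint names are illustrative; widths mirror the @media rules above.
function breakpointForWidth(width) {
  if (width <= 499) return 'narrow';
  if (width <= 999) return 'medium';
  return 'wide';
}

// Browser-only usage: a media query list fires an event when its match changes.
// const mql = window.matchMedia('(max-width: 999px)');
// mql.addListener(function (e) {
//   console.log(e.matches ? 'medium or narrower' : 'wide');
// });
```

<p>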
And here's an approach with <a href="">media queries and CSS transition events</a>.</p> css drupal lesscss responsive-design Fri, 25 Nov 2011 16:59:38 +0000 ben 7407 at Generate pager (previous/next) links for an old Blogspot blog using Node.js <p>I have an <a href="">old Blogspot blog</a> that still gets a lot of traffic, but it was very hard to navigate without links from post to post. The template uses an old version of their templating language, and doesn&#8217;t have any tags available to generate pager links within Blogger.</p> <p>So I wrote a <a href="">node</a> app called <a href="">Blogger Pager</a> (code on <a href="">Github</a>) to generate the links, loaded client-side via AJAX.</p> <p><br/></p> <h3>How it works</h3> <ol> <li>Export your blog from Blogspot using the Export functionality. You&#8217;ll get a big XML file.</li> <li>Check out this code on a server with <a href="">node.js</a> installed.</li> <li>Put the exported XML file into the root of this app, as blog-export.xml; or change the path in app.js.</li> <li>Run the app (<span class="geshifilter"><code class="text geshifilter-text">node app.js</code></span>, or with <a href="">forever</a>).</li> <li>The module in posts.js will parse the XML file and generate an in-memory array of all the post URLs and titles. (Uses the <a href="">xml2js</a> library, after trying 3 others that didn&#8217;t work as well/easily.)</li> <li>The module in server.js will respond to HTTP requests (by default on port 3003, set in server.js): <ul> <li><em>/pager</em> handles JSONP requests with a <em>?url</em> parameter, returning a JSON object of the surrounding posts.</li> <li><em>/posts</em> returns an HTML page of all the parsed posts.</li> </ul> </li> <li>The client-side script depends on <a href="">jQuery</a>, so make sure your blog template is loading that: <ul> <li>e.g. 
<span class="geshifilter"><code class="text geshifilter-text">&lt;script src='//'&gt;&lt;/script&gt;</code></span></li> </ul> </li> <li>In your blog template, load the client-side script in this app, exposed at <em>/js/blog-pager-client.js</em>.</li> <li>Change the URL (<span class="geshifilter"><code class="text geshifilter-text">var url</code></span>&#8230;) in the client-side script to the URL of your node app.</li> <li>Save the template, load a post page. (To debug, comment out the <span class="geshifilter"><code class="text geshifilter-text">return</code></span> in <span class="geshifilter"><code class="text geshifilter-text">bloggerPagerLog()</code></span> and open the browser console.)</li> <li>Customize the generated HTML in the client-side <span class="geshifilter"><code class="text geshifilter-text">addPagerForPost()</code></span> function or style with CSS.</li> </ol> <p><br/></p> <h3>Known Limitations</h3> <ol> <li>Only works with a blog <em>export</em>; if your blog is still getting new content, this won&#8217;t read the RSS.</li> </ol> <p><br/></p> <p>Enjoy!<br /> <a href="" title=""></a></p> node.js Fri, 25 Nov 2011 16:30:52 +0000 ben 7406 at Tilt, 3D DOM Inspector for Firefox HTML pages appear in the browser in 2 dimensions, but there are actually 2 additional dimensions to the DOM: the hierarchy of elements, and the z-index. A new DOM inspector for Firefox called Tilt displays the DOM in 3 dimensions, showing the hierarchy of elements (not sure about the z-index). This is what the homepage of <a href=""></a> looks like in 3D. Pretty cool. <br/><br/> <a href=""><img src="" width="300" /></a> antiquesnearme Mon, 31 Oct 2011 00:21:06 +0000 ben 7350 at Great list of new dev tools from Smashing Magazine <p>Here's a really <a href="">superb list</a> of "coding tools and javascript libraries for web developers" from Smashing Magazine.
Some of the ones I'll be playing with in the next few days:</p> <ul> <li><p><a href="">Tilt</a>, Firefox DOM inspection in <em>3D</em>.</p></li> <li><p><a href="">Money.js</a>, an open source API and JS library for currency exchange rates (the code on this will also be good for improving my Node.js techniques)</p></li> <li><p><a href="">Bootstrap development toolkit</a></p></li> <li><p><a href="">IE VMs</a> - might cover more versions of the evil browser than the VirtualBox I use now.</p></li> <li><p>a bunch of remote debuggers for mobile, haven't tried them yet</p></li> <li><p><a href="">Less App</a> - I've been using this one for a long time</p></li> <li><p><a href="">Has.js</a> - test your JS environment for available constructs.</p></li> </ul> <p>and a whole bunch of others. After reading that post, I also added Smashing Magazine to my RSS reader.</p> tools Sun, 30 Oct 2011 23:42:43 +0000 ben 7348 at Exploring the node.js frontier <p>I have spent much of the last few weeks learning and coding in <a href="">Node.js</a>, and I'd like to share some of my impressions and lessons learned for others starting out. If you're not familiar yet, Node.js is a framework for building server-side applications with asynchronous javascript. It's only two years old, but already has a <a href="">vast ecosystem</a> of plug-in "modules" and higher-level frameworks built on top of it.</p> <p>My first application is a simple web app for learning Spanish using flashcards. The <a href="">code is open on Github</a>. The app utilizes basic CRUD (Create-Retrieve-Update-Delete) functionality (of "Words" in this case), form handling, authentication, input validation, and an end-user interface - i.e. the basic components of a web app. I'm using <a href="">MongoDB</a> for the database and <a href="">Express.js</a> (which sits on top of <a href="">Connect</a>, on top of Node) as the web framework. 
For templating I learned <a href="">Jade</a>, and for easier CSS I'm using <a href="">LessCSS</a>.</p> <p>In the process of building it, I encountered numerous challenges and questions, some solved and many still open; found some great resources; and started to train my brain to think of server-side code asynchronously.</p> <h3>Node is a blank slate</h3> <p>Node "out of the box" isn't a web server like Apache; it's more of a language, like Ruby. You start with a blank slate, on top of which you can code a daemon, an IRC server, a process manager, or a blog - there's no automatic handling of virtualhosts, requests, responses, webroots, or any of the components that a LAMP stack (for example) assumes you want. The node community is building infrastructural components that can be dropped in, and I expect that the more I delve into the ecosystem, the more familiar I'll become with those components. At its core, however, Node is simply an API for asynchronous I/O methods.</p> <h3>No more linear flow</h3> <p>I'm used to coding in PHP, which involves linear instructions, each of them "blocking." Take this linear pseudocode snippet for CRUD operations on a "word" object, for example:</p> <div class="geshifilter"> <pre class="text geshifilter-text">if (new word) { render an empty form } else if (editing existing word) { load the word populate the form render the form } else if (deleting existing word) { delete the word redirect back to list }</pre></div> <p>This is easy to do with "blocking" code. Functions <em>return</em> values, discrete input-output functions can be reused in multiple situations, the returned values can be evaluated, each step follows from the previous one. 
This is convenient but limits performance: in a high-traffic PHP-MySQL application, this flow takes up a server process, and if the database is responding slowly under the load, the whole process waits; concurrent processes quickly hog all the server's memory, and a bottleneck in one part of the stack stalls the whole application. In node, the rest of the operations in the "event loop" continue to run, waiting patiently for the database (or any other I/O) callback to respond. </p> <p>Coding that way is not so easy, however. If you try to load the word, for instance, you run the query with an asynchronous callback. There is no <em>return</em> statement on the query function. The rest of the code has to be nested inside that callback, or else the code will keep running and will never get the response. So that bit would look more like this:</p> <div class="geshifilter"> <pre class="text geshifilter-text">load the word ( function(word) { populate the form render the form });</pre></div> <p>But deeply nested code isn't as intuitive as linear code, and it can make function portability very difficult. Suppose you have to run 10 database queries to populate the form - nesting them all inside each other gets very messy, and what if the logic needs to be more conditional, requiring a different nesting order in different cases?</p> <p>There are ways of handling these problems, of course, but I'm just starting to learn them. In the case of the simple "load the word" scenario, Express offers the <span class="geshifilter"><code class="text geshifilter-text">app.param</code></span> construct, which parses parameters in the URL before executing the route callback. 
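A sketch of that contract, with a tiny hand-rolled stand-in for the Express <code>app</code> object so the snippet runs on its own (<code>findWordById</code> and the sample data are assumptions, not the real app's code):

```javascript
// Assumed helper: look up a word, following the node-style
// callback(error, result) convention.
function findWordById(id, callback) {
  var words = { '1': { id: 1, en: 'dog', es: 'perro' } };
  callback(null, words[id]);
}

// Stand-in for the Express app object (real Express registers these
// against routes; this just stores the callbacks).
var app = {
  params: {},
  param: function (name, fn) { this.params[name] = fn; }
};

// Express-style: whenever a route contains :word, load the word into
// the request object before the route callback runs.
app.param('word', function (req, res, next, id) {
  findWordById(id, function (err, word) {
    if (err) return next(err);
    req.word = word;
    next();
  });
});

// Simulate a request to something like /word/1/edit.
var req = {};
app.params.word(req, {}, function () {
  console.log(req.word.es); // prints "perro"
}, '1');
```

In real Express the fourth argument is the matched URL segment; the framework calls the registered function itself.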
So the <span class="geshifilter"><code class="text geshifilter-text">:word</code></span> token tells the app to load a word with a given ID into the <strong>request</strong> object, then it renders the form.</p> <h3>No more ignoring POST and GET</h3> <p>In PHP, if there's a form on a page, the same piece of code processes the page whether its HTTP method is POST or GET. The <span class="geshifilter"><code class="text geshifilter-text">$_REQUEST</code></span> array even combines their parameters. Express doesn't like that, however - there is an <span class="geshifilter"><code class="text geshifilter-text">app.all()</code></span> construct that ignores the method, but the framework seems to prefer separate <span class="geshifilter"><code class="text geshifilter-text">app.get()</code></span> and <span class="geshifilter"><code class="text geshifilter-text"></code></span> routing. (There's apparently some <a href="">controversy/confusion</a> over the additional method PUT, but I steered clear of that for now.)</p> <p>Back to the "word form" scenario: I load the form with GET, but submit the form with POST. That's two routes with essentially duplicate code. I could simply save an entry on POST, or render the form with GET - but if I want to validate the form, it needs to re-render the form when a POST fails validation - so it quickly becomes a mess. Express offers some solutions for this - <span class="geshifilter"><code class="text geshifilter-text">res.redirect('back')</code></span> goes back to the previous URL - but that seems like a hack that doesn't suit every situation. You can see how I handled this <a href="">here</a>, but I haven't yet figured out the best general approach to this problem.</p> <h3>New code needs a restart</h3> <p>In a PHP application, you can edit or deploy the code directly to the webroot, and as soon as it's saved, the next request uses it. 
With node, however, the javascript is loaded into memory when the app is run using the <span class="geshifilter"><code class="text geshifilter-text">node</code></span> command, and it runs the same code until the application is restarted. In its simplest use, this involves a Ctrl+C to stop and <span class="geshifilter"><code class="text geshifilter-text">node app.js</code></span> to restart. There are several pitfalls here:</p> <ul> <li>Sessions (and any other in-app memory items) are lost every time you restart. So anyone using your app is suddenly logged out. For sessions, this is resolved with a database or other external session store; I can imagine other scenarios where this would be more challenging.</li> <li>An uncaught runtime bug can crash the app, and if it's running autonomously on a server, there's nothing built-in to keep it running. One approach to this is a process manager; I'm using <a href="">forever</a>, which was built especially for node, to keep processes running and restart them easily when I deploy new code. Others have built tools within Node that abstract an individual app's process through a separate process-managing app.</li> </ul> <h3>When should the database connect?</h3> <p>Node's architectural philosophy suggests that nothing should be loaded until it's needed. A database connection might not be needed on an empty form, for instance - so it makes sense to open a database connection per request, and only when needed. I tried this approach first, using a "<a href="">route middleware</a>" function to connect on certain requests, and separated the database handling into its own module. That failed when I wanted to keep track of session IDs with MongoDB (using <a href="">connect-mongo</a>) - because a database connection is then needed on every request, and the examples all opened a connection at the top of the app, in the global scope. 
I switched to the latter approach, but I'm not sure which way is better.</p> <h3>Javascript can get very complicated</h3> <ul> <li>As logic flows through nested callbacks, <strong>variable scope</strong> is constantly changing. <span class="geshifilter"><code class="text geshifilter-text">var</code></span> and <span class="geshifilter"><code class="text geshifilter-text">this</code></span> have to be watched very carefully.</li> <li>Writing functions that work portably across use cases without simple <span class="geshifilter"><code class="text geshifilter-text">return</code></span> statements is tricky. (One nice Node convention that covers many of these scenarios is the <span class="geshifilter"><code class="text geshifilter-text">callback(error, result)</code></span> concept, allowing calling functions to know if the result came back successfully in a standard way.)</li> <li>Passing logic flow across node's "modules" is also tricky. Closures are helpful here, passing the <span class="geshifilter"><code class="text geshifilter-text">app</code></span> object to route modules, for instance. But in many cases, it wasn't clear how to divide the code in a way that was simultaneously logical, preserved variable scope, and worked portably with callbacks.</li> <li>Everything - functions, arrays, classes - is an object. Class inheritance is done by instantiating another class/object and then modifying the new object's <span class="geshifilter"><code class="text geshifilter-text">prototype</code></span>. The same object can have the equivalent of "static" functions (by assigning them directly to the object) or instantiated methods (by assigning them to <span class="geshifilter"><code class="text geshifilter-text">prototype</code></span>). It's easy to get confused.</li> <li>Javascript is a little clunky with handling empty values. 
The standard approach still seems to be <span class="geshifilter"><code class="javascript geshifilter-javascript"><span class="kw1">if</span> <span class="br0">&#40;</span><span class="kw1">typeof</span> x <span class="sy0">==</span> <span class="st0">&quot;undefined&quot;</span><span class="br0">&#41;</span></code></span> which, at the very least, is a lot of code to express <span class="geshifilter"><code class="javascript geshifilter-javascript"><span class="kw1">if</span> <span class="br0">&#40;</span>x<span class="br0">&#41;</span></code></span>. I used <a href="">Underscore.js</a> to help with this and other basic object manipulation shortcuts.</li> <li>Because Express processes the request until there's a clear end to the response, and because everything is asynchronous, it's easy to miss a scenario in the flow where something unpredictable happens, no response is sent, and the client/user's browser simply hangs waiting for a response. I don't know if this is bad on the node side - the hanging request probably uses very little resources, since it's not actively doing anything - but it means the code has to handle a lot of possible error scenarios. Unlike in blocking code, you can't just put a catch-all <span class="geshifilter"><code class="text geshifilter-text">else</code></span> at the end of the flow to handle the unknown.</li> </ul> <h3>What my Flashcards app does now</h3> <p>The Spanish Flashcards app currently allows words (with English, Spanish, and part of speech) to be entered, shown in a list, put into groups, and randomly cycled with only one side shown, as a flashcard quiz.<br /> The app also integrates with the <a href="">WordReference API</a> to lookup a new word and enter it - however, as of now, there's a bug in the English-Spanish API that prevents definitions from being returned. 
So I tested it using the English-French dictionary, and hope they'll fix the Spanish one soon.<br /> It's now built to require login, with a single password set in a plain-text config.js file.</p> <h3>Next Steps for the app</h3> <p>I'd like to build out the flashcard game piece, so it remembers what words have been played, lets the player indicate if he got the answer right or wrong, and prioritizes previously-wrong or unseen words over ones that the player already knows.</p> <h3>Where I want to go with Node.js</h3> <p>I've been working primarily with Drupal for several years, and I want to diversify for a number of reasons: I've become very <a href="">frustrated</a> with the direction of Drupal core development, and don't want all my eggs in that basket. Web applications are increasingly requiring real-time, high-concurrency, noSQL infrastructure, which Node is well-suited for and my LAMP/Drupal skillset is not. And maybe most importantly, I find the whole architecture of Node to be fascinating and exciting, and the open-source ecosystem around it is growing organically, extremely fast, and seemingly without top-down direction.</p> <h3>Some resources that helped me</h3> <p>(All of these and many more are in my <a href="">node.js tag on Delicious</a>.)</p> <ul> <li><a href="">Nodejitsu docs</a> - tutorials on conventions and some best practices.</li> <li><a href="">Victor Kane's node intro</a> and <a href="">Lit</a> app, which taught me a lot about CRUD best practices.</li> <li>The API documentation for <a href="">node.js</a>, <a href="">connect</a>, and <a href="">express.js</a>.</li> <li><a href="">HowToNode</a> - seems like a generally good node resource/blog.</li> <li><a href="">NPM</a>, the Node Package Manager, is critical for sharing portable components, and serves as a central directory of Node modules.</li> <li><a href="">2009 talk by Ryan Dahl</a>, the creator of Node.js, introducing the framework.</li> <li><a href="">Forms</a> and <a 
href="">express-form</a>, two libraries for handling form rendering and/or validation. (I tried the former and decided not to use it, but they try to simplify a very basic problem.)</li> </ul> <p>Check out the code for my <a href="">Spanish Flashcards</a> app, and if you're into Node yourself and want to learn more of it together, <a href="">drop me a line!</a></p> express.js mongodb node.js Mon, 17 Oct 2011 02:24:25 +0000 ben 7315 at Brainstorming: Building an advertising system for <p> One of our first revenue-generating features for <a href="">Antiques Near Me</a> (a startup antique sale-finding portal which I co-founded) is basic sponsorship ads, which we&#39;ve simply been calling &quot;featured listings.&quot; On the <a href="">Boston portal</a>, for example, the featured listing would be an antiques business in Boston, displayed prominently, clearly a sponsor, but organic (not a spammy banner ad).</p> <p> I&#39;ve been brainstorming how to build it, and the options span quite a range. I&#39;ll lay out some of my considerations:</p> <ul> <li> The primary stack running the site is Drupal 6 in LAMP, cached behind Varnish to prevent server-side bottlenecks. We could build the whole system in Drupal, with Drupal-based ecommerce to sell it, and render the ads as part of the page. But if advertisers want to see stats (e.g. how many impressions/clicks has their sponsorship generated), a server-side approach has no single place to track impressions.</li> <li> The ad placement logic doesn&#39;t have to be fancy - we want the sponsorships to be exclusive for a given time period - so we don&#39;t need all the fancy math of DFP or&nbsp;<a href="">OpenX</a>&nbsp;for figuring out what ad to place where. 
But the system will eventually need to handle variable pricing, variable time frames, potential &quot;inventory&quot; to check for availability, and other basic needs of an ad system.</li> <li> We&#39;re running AdSense ads through Google&#39;s free <a href="">DFP</a> service, so we could set up placements and ad units for each sponsor there. But that&#39;s a manual process, and we want the ad &quot;real estate&quot; to scale (eventually for each city and antiques category), so in the long run it has to be automated. That requires DFP API integration. I&#39;ve signed up for access to that API, and the PHP library looks robust, but the approval process is opaque, and I&#39;m not sure this is the right approach.</li> <li> A hybrid Drupal-DFP approach, with flexible ad placements in DFP and client-side variables passed in from Drupal to differentiate the stats, sounds nice. But it&#39;s not clear if this is feasible; information I&#39;ve gotten from a big-biz AdOps guy suggests it&#39;s not possible with the free edition.</li> <li> I could build a scalable, in-house, back-end solution using Node.js and MongoDB. In theory this can handle a lot more concurrent traffic (each request being very small and quick) than Drupal/LAMP. Mongo is already in use on the site and I&#39;ve wanted to learn Node for a while. This would require learning Node well enough to deploy this comfortably, with a custom bridge between Drupal (still handling the UI and transactions) and Node. This could take a while to roll out, and adds another moving piece to an already complex stack.</li> <li> Maybe there&#39;s another 3rd party off-the-shelf service to handle this, that could be easily bridged with Drupal?</li> </ul> <p> I&#39;m curious how other sites handle similar requirements. 
Any ideas?</p> antiquesnearme Sun, 21 Aug 2011 21:02:07 +0000 ben 7214 at Launched: KickinKitchen.TV, a kids' cooking show <p>Yesterday we (<a href="">New Leaf Digital</a>) launched the new site for <a href="" style="font-weight:bold">KickinKitchen.TV</a>, a brilliant new kids' cooking show. The site (designed by another firm) is built on Drupal 7, and invites kids to interact with the show by sharing recipes, photos and videos, entering contests, and spreading the word about the show and healthy eating on social networks.</p> <p>On the technical side, the site involved a robust content architecture, access rules allowing editors and users to share content creation, COPPA compliance for the young audience, <a href="">new Mailchimp-Profile2 integration</a> for real-time tracking of content submissions, and some fun jQuery <a href="">animations</a>.</p> <p>I got my start programming from the BASIC page in <em>3-2-1 Contact</em> magazine as a child, and I love to cook, so I think the mission of this show is vital, and I look forward to seeing the website empower those goals.</p> Tue, 16 Aug 2011 05:03:34 +0000 ben 7208 at Mac shell trick: have Growl notify you when something happens <p>Let's say you're waiting for some event to happen, and that event is detectable in your unix terminal. Examples: a DNS record to propagate (use <span class="geshifilter"><code class="text geshifilter-text">ping</code></span> and <span class="geshifilter"><code class="text geshifilter-text">grep</code></span>), a blog post to appear in an aggregator (<span class="geshifilter"><code class="text geshifilter-text">curl</code></span> and <span class="geshifilter"><code class="text geshifilter-text">grep</code></span>), etc. 
Instead of checking it every minute, you can have your terminal run a loop until it finds the condition you've set, and when it does, it'll notify you with <a href="">Growl</a>.</p> <p>I'll use the DNS-test example here:</p> <p>To check if is mapped to, we can run,<br /> <span class="geshifilter"><code class="bash geshifilter-bash"><span class="kw2">ping</span> <span class="re5">-c1</span> <span class="sy0">|</span> <span class="kw2">grep</span> <span class="st0">&quot;;</span>; <span class="kw3">echo</span> <span class="re4">$?</span></code></span></p> <p>Or to check for the appearance of some HTML on a web page, we can do,<br /> <span class="geshifilter"><code class="bash geshifilter-bash">curl <span class="re5">-s</span> http:<span class="sy0">//</span> <span class="sy0">|</span> <span class="kw2">grep</span> <span class="st0">&quot;some content&quot;</span>; <span class="kw3">echo</span> <span class="re4">$?</span></code></span></p> <p>The <span class="geshifilter"><code class="text geshifilter-text">$?</code></span> at the end checks the <strong>exit code</strong> - 0 being success, non-0 being error. 
</p> <p>Now put that in a loop, running every 30 seconds, with a growl popup when it succeeds:</p> <div class="geshifilter"> <pre class="bash geshifilter-bash"><span class="kw1">while</span> <span class="kw2">true</span>; <span class="kw1">do</span> <span class="re2">FOUND</span>=<span class="sy0">`</span><span class="kw2">ping</span> <span class="re5">-c1</span> <span class="sy0">|</span> <span class="kw2">grep</span> <span class="st0">&quot;;</span>; <span class="kw3">echo</span> <span class="re4">$?</span><span class="sy0">`</span>; <span class="kw1">if</span> <span class="br0">&#91;</span><span class="br0">&#91;</span> <span class="st0">&quot;<span class="es2">$FOUND</span>&quot;</span> <span class="re5">-eq</span> <span class="st0">&quot;0&quot;</span> <span class="br0">&#93;</span><span class="br0">&#93;</span>; <span class="kw1">then</span> growlnotify <span class="re5">-t</span> <span class="st0">&quot;Alert&quot;</span> <span class="re5">-m</span> <span class="st0">&quot;FOUND&quot;</span> <span class="re5">-s</span>; <span class="kw3">break</span>; <span class="kw1">fi</span>; <span class="kw2">sleep</span> <span class="nu0">30</span>; <span class="kw1">done</span></pre></div> <p>Or in one line,<br /> <span class="geshifilter"><code class="bash geshifilter-bash"><span class="kw1">while</span> <span class="kw2">true</span>; <span class="kw1">do</span> <span class="re2">FOUND</span>=<span class="sy0">`</span><span class="kw2">ping</span> <span class="re5">-c1</span> <span class="sy0">|</span> <span class="kw2">grep</span> <span class="st0">&quot;;</span>; <span class="kw3">echo</span> <span class="re4">$?</span><span class="sy0">`</span>; <span class="kw1">if</span> <span class="br0">&#91;</span><span class="br0">&#91;</span> <span class="st0">&quot;<span class="es2">$FOUND</span>&quot;</span> <span class="re5">-eq</span> <span class="st0">&quot;0&quot;</span> <span class="br0">&#93;</span><span class="br0">&#93;</span>; <span class="kw1">then</span> 
growlnotify <span class="re5">-t</span> <span class="st0">&quot;Alert&quot;</span> <span class="re5">-m</span> <span class="st0">&quot;FOUND&quot;</span> <span class="re5">-s</span>; <span class="kw3">break</span>; <span class="kw1">fi</span>; <span class="kw2">sleep</span> <span class="nu0">30</span>; <span class="kw1">done</span></code></span></p> mac shell Wed, 10 Aug 2011 21:06:34 +0000 ben 7205 at Drupal’s increasing complexity is becoming a turnoff for developers <p>I&rsquo;ve been developing custom applications with Drupal for three years, a little with 4.7 and 5, primarily with 6, and lately more with 7. Lately I&rsquo;ve become concerned with the trend in Drupal&rsquo;s code base toward increasing complexity, which I believe is becoming a danger to Drupal&rsquo;s adoption.</p> <p> In general when writing code, a solution can solve the current scenario in front of us right now, or it can try to account for future scenarios in advance. I&rsquo;ve seen this referred to as <strong>N-case or N+1 development.</strong> N-case code is efficient, but not robust; N+1 code is abstract and complex, and theoretically allows for an economy of scale, allowing more to be done with less code/work. In practice, it also shifts the burden: <strong>as non-developers want the code to accommodate more use cases, the developers write more code, with more complexity and abstraction.</strong></p> <p> Suppose you want to record a date with a form and save it to a database. You&rsquo;d need an HTML form, a timestamp (integer) field in your schema, and a few lines of code. Throw in a stable jQuery date popup widget and you have more code but not much more complexity. 
Or you could imagine every possible date permutation, all theoretically accessible to non-developers, and you end up with the <strong>14,673 lines in Drupal&rsquo;s <a href="">Date</a> module</strong>.</p> <p> Drupal is primarily a content management system, not simply a framework for efficient development, so it <strong>needs to account for the myriad use cases of non-developer site builders</strong>. This calls for abstracting everything into user interfaces, which takes a lot of code. However, there needs to be a countervailing force in the development process, pushing back against increasing abstraction (in the name of end-user simplicity) for the sake of preserving underlying simplicity. In other words, <strong>there is an inherent tension in Drupal (like any big software project) between keeping the UI both robust and simple, and keeping the code robust and simple </strong> - and increasingly Drupal, rather than trying to maintain a balance, has tended to sacrifice the latter.</p> <p> User interfaces are one form of abstraction; N+infinity APIs - which I&rsquo;m more concerned with - are another, which particularly increase underlying complexity. Drupal has a legacy code base built with partly outdated assumptions, and developers adding new functionality have to make a choice: <strong>rewrite the old code to be more robust but less complex, or add additional abstraction layers on top?</strong> The latter takes less time but easily creates a mess. For example: Drupal 7 tries to abstract nodes, user profiles, actions, etc into &ldquo;entities&rdquo; and attach fields to any kind of entity. Each of these still has its legacy ID, but now there is an additional layer in between tying these &ldquo;entity IDs&rdquo; to their types, and then another layer for &ldquo;bundles,&rdquo; which apply to some entity types but not others. 
The result from a development cycle perspective was a Drupal 7 release that, even delayed a year, lacked components of the Entity system in core (they moved to &ldquo;contrib&rdquo;). The result from a systems perspective is an architecture that has too many layers to make sense if it were built from scratch. <strong>Why not, for example, have everything be a node? </strong> Content as nodes, users as nodes, profiles as nodes, etc. The node table would need to lose legacy columns like &ldquo;sticky&rdquo; - they would become fields - and some node types like &ldquo;user&rdquo; might need fixed meanings in core. Then three structures get merged into one, and the system gets simpler without compromising flexibility.</p> <p> I recently tried to programmatically use the <a href="">Activity</a> module - which used to be a simple way to record user activity - and had to &ldquo;implement&rdquo; the Entities and Trigger APIs to do it, requiring hundreds of lines of code. I gave up on that approach and instead used the elegant core module <a href="">Watchdog</a> - which, with a simple custom report pulling from the existing system, produced the <strong>same end-user effect as Activity with a tiny fraction of the code and complexity</strong>. The fact that Views doesn&rsquo;t natively generate Watchdog reports and Rules doesn&rsquo;t report to Watchdog as an action says a lot, I think, about the way Drupal has developed over the last few years.</p> <p> On a Drupal 7 site I&rsquo;m building now, I&rsquo;ve worked with the Node API, Fields API, Entities API, Form API, Activity API, Rules API, Token API... I could have also worked with the Schema, Views, Exportables, Features, and Batch APIs, and on and on. 
The best definition I&rsquo;ve heard for an API (I believe by Larry Garfield at Drupalcon Chicago) is &ldquo; <strong>the wall between 2 systems</strong>.&rdquo; In a very real way, rather than feeling open and flexible, Drupal&rsquo;s code base increasingly feels like it&rsquo;s erecting barriers and fighting with itself. <strong>When it&rsquo;s necessary to write so much code for so many APIs to accomplish simple tasks, the framework is no longer developer-friendly.</strong> The irony is, the premise of that same Drupalcon talk was the ways APIs create &ldquo;power and flexibility&rdquo; - but that power has come at great cost to the developer experience.</p> <p> I&rsquo;m aware of all these APIs under the hood because I&rsquo;ve seen them develop for a few years. But how is someone new to Drupal supposed to learn all this? (They could start with the <a href="">Definitive Guide to Drupal 7</a>, which sounds like a massive tome.) <strong>Greater abstraction and complexity lead to a steeper learning curve.</strong> <strong>Debugging Drupal - which requires &ldquo;wrapping your head&rdquo; around its architecture - has become a Herculean task. Good developer documentation is scarce because it takes so much time to explain something so complex.</strong></p> <p> There is a cycle: the code gets bigger and harder to understand; the bugs get more use-case-specific and harder to nail down; the issue queues get bloated; the developers have less time to devote to code quality improvement and big-picture architecture decisions. But someone wants all those use cases handled, so the code gets bigger and bigger and harder to understand... as of this writing, Drupal core has 9166 open issues, the Date module has 813, Rules has 494. Queues that big need a staff of dozens to manage effectively, and even if those resources existed, the business case for devoting them can&rsquo;t be easy to make. 
<strong>The challenge here is not simply in maintaining our work; it&rsquo;s in building projects from the get-go that aren&rsquo;t so complicated as to need endless maintenance.</strong></p> <p> Some other examples of excessive complexity and abstraction in Drupal 7:</p> <ul> <li> <a href=""><strong>Field Tokens</strong></a>. This worked in Drupal 6 with contrib modules; to date with Drupal 7, this can&rsquo;t be done. The APIs driving all these separate systems have gotten so complex, that either no one knows how to do this anymore, or the architecture doesn&rsquo;t allow it.</li> <li> The <a href=""><strong>Media</strong></a> module was supposed to be an uber-abstracted API for handling audio, video, photos, etc. As of a few weeks ago, basic YouTube and Vimeo integration didn&rsquo;t work. The parts of Media that did work (sponsored largely by <a href="">Acquia</a>) <a href="">didn&rsquo;t conform</a> to long-standing Drupal standards. Fortunately there were <a href="">workarounds</a> for the site I was building, but their existence is a testament to the unrealistic ambition and excessive complexity of the master project.</li> <li> The Render API, intended to increase flexibility, has compounded the old problem in Drupal of business logic being spread out all over the place. The point in the flow where structured data gets rendered into HTML strings isn&rsquo;t standardized, so knowing how to modify one type of output doesn&rsquo;t help with modifying another. (Recently I tried to modify a <code>date_select</code> field at the code level to show the date parts in a different order - as someone else <a href="">tried to do</a> a year ago - and gave up after hours. The solution ended up being in the UI - so the end-user was given code-free power at the expense of the development experience and overall flexibility.)</li> </ul> <p> Drupal 8 has an &ldquo;<a href="">Initiatives</a>&rdquo; structure for prioritizing effort. 
<strong> I&rsquo;d like to see a new initiative, <em>Simplification</em>: Drupal 8 should have fewer lines of code, fewer APIs, and fewer database tables than Drupal 7.</strong> Every component should be re-justified and eliminated if it duplicates an existing function. And the Drupal 8 contrib space should follow the same principles. I submit that this is more important than any single new feature that can be built, and that if the codebase becomes simpler, adding new features will be easier.</p> <p> A few examples of places I think are ripe for simplifying:</p> <ul> <li> The Form API has too much redundancy. <code>#process</code> handlers are a bear to work with (try altering the <code>#process</code> flow of a date field) and do much the same as <code>#after_build</code>.</li> <li> The render API now has <code>hook_page_build</code>, <code>hook_page_alter</code>, <code>hook_form_alter</code>, <code>hook_preprocess</code>, <code>hook_process</code>, <code>hook_node_view</code>, <code>hook_entity_view</code>, (probably several more for field-level rendering), etc. This makes understanding even a well-architected site built by anyone else an enormous challenge. Somewhere in that mix there&rsquo;s bound to be unnecessary redundancy.</li> </ul> <p> <strong>Usable code isn&rsquo;t a luxury; it&rsquo;s critical to attracting and keeping developers in the project.</strong> I saw a presentation recently on Rapid Prototyping and it reminded me how far Drupal has come from being able to do anything like that. (I don&rsquo;t mean the rapid prototype I did of a <a href="">job listing site</a> - I mean application development, building something <em>new</em>.) 
The demo included a massive data migration accomplished with 4 lines of JavaScript in the MongoDB terminal; by comparison, I recently tried to change a dropdown field to a text field (both identical strings in the database) and Drupal told me it couldn&rsquo;t do that because &ldquo;the field already had data.&rdquo;</p> <p> My own experience is that Drupal is becoming more frustrating and less rewarding to work with. Backend expertise is also harder to learn and find (at the last meetup in Boston, a very large Drupal community, only one other person did freelance custom development). Big firms like Acquia are hiring most of the rest, which is great for Acquia, but skews the product toward enterprise clients, and increases the cost of development for everyone else. If that&rsquo;s the direction Drupal is headed - a project understood and maintained only by large enterprise vendors, for large enterprise users, giving the end-user enormous power but the developer a migraine - let&rsquo;s at least make sure we go that way deliberately and with our eyes open. <strong>If we want the product to stay usable for newbie developers, or even people with years of experience - and ultimately, if we want the end-user experience to </strong><em>work</em><strong> - then the trend has to be reversed toward a better balance.</strong></p> drupal Wed, 10 Aug 2011 20:23:41 +0000 ben 7203 at Upgrading to OSX Lion: Some Gotchas <p>I upgraded my MacBook Pro to OSX Lion today. It's $30 in the App Store (or copy the DMG from someone who paid and it's free). ArsTechnica has an extremely comprehensive <a href="">review of Lion</a> if you want the full details. Here are a few minor hiccups I ran into:</p> <p>&bull; Custom symlinks in /usr/bin were removed. So far I've noticed the symlink for Git missing (/usr/bin/git -&gt; /usr/local/git/bin/git), so I put it back. 
(There's also a StackExchange <a href="">thread</a> about this.)</p> <p>&bull; The AFP protocol on my <a href="">ReadyNAS</a> - which I was using, among other things, for Time Machine backups - is not compatible with Lion. Fortunately the Netgear folks are quick and have a new <a href=";t=55155&amp;sid=614424b8c4d70afa1733983b9362a4cb">beta release</a> with the new protocol; I installed that, and it seems to be working fine.</p> <p>&bull; I turned off the new mobile-inspired "natural" scrolling. It doesn't feel natural to me, and I don't want to get disoriented every time I use someone else's computer.</p> <p>&bull; The new (also mobile-inspired) bouncy scrolling is slightly annoying too, but I can't figure out how to disable that.</p> <p>&bull; The OS seems to handle <strong>memory</strong> much better. I'm running all the usual apps, but it's turning a lot more of the "active" RAM (yellow in Activity Monitor) into "free" (green) RAM. I'm not sure what the overall performance impact is yet, but it's nice to see the OS [apparently] cleaning up dead memory better than before.</p> <p>&bull; Some of the startup items in my user account were removed; I put them back.</p> <p>Otherwise it's been pretty smooth. I like the new Spaces+Exposé hybrid called Mission Control. One of the main reasons I upgraded so quickly was the new full-disk encryption built in, which I'll set up as soon as I can reboot.</p> osx Thu, 04 Aug 2011 19:58:12 +0000 ben 7194 at Drupal 7 / Drush tip: Find all field content using a text format <p>I'm working on a Drupal 7 site and decided one of the <strong>text formats</strong> ("input formats" in D6) was redundant. So I disabled it, and was warned that "any content stored with that format will not be displayed." How do I know what content is using that format? 
This little shell snippet told me:</p> <div class="geshifilter"> <pre class="bash geshifilter-bash">drush sql-query <span class="st0">&quot;show tables like 'field_data_%'&quot;</span> <span class="sy0">|</span> <span class="kw2">tail</span> -n+<span class="nu0">2</span> <span class="sy0">|</span> <span class="kw1">while</span> <span class="kw2">read</span> TABLE; <span class="kw1">do</span> <span class="re2">FIELD</span>=<span class="sy0">`</span>drush sql-query <span class="st0">&quot;show fields in <span class="es2">$TABLE</span> like '%format%';&quot;</span> <span class="sy0">|</span> <span class="kw2">tail</span> -n+<span class="nu0">2</span> <span class="sy0">|</span> <span class="kw2">awk</span> <span class="st_h">'{ print $1 }'</span><span class="sy0">`</span>; <span class="kw3">echo</span> <span class="st0">&quot;<span class="es2">$TABLE</span> - <span class="es2">$FIELD</span>&quot;</span>; <span class="kw1">if</span> <span class="br0">&#91;</span><span class="br0">&#91;</span> <span class="st0">&quot;<span class="es2">$FIELD</span>&quot;</span> <span class="sy0">!</span>= <span class="st0">&quot;&quot;</span> <span class="br0">&#93;</span><span class="br0">&#93;</span>; <span class="kw1">then</span> drush sql-query <span class="st0">&quot;select * from <span class="es3">${TABLE}</span> where <span class="es3">${FIELD}</span>='old_format'&quot;</span>; <span class="kw1">fi</span> <span class="kw1">done</span></pre></div> <p>You'll need to run that in the terminal from your site's webroot and have <a href="">Drush</a> installed. Rename <span class="geshifilter"><code class="text geshifilter-text">old_format</code></span> to the code name of your text format. (<span class="geshifilter"><code class="text geshifilter-text">drush sql-query &quot;select * from {filter_format}&quot;</code></span> will show you that.) 
It'll work as a single command if you copy and paste it (as multiple lines or with line breaks stripped - the semi-colons indicate the end of each statement).</p> <p>Breaking it down:</p> <ol> <li>Find all the tables used for content storage.</li> <li>Find all the 'format' fields in those tables. (They'll only exist if the field uses formats.)</li> <li>Find all the rows in those tables matching the format you want to delete. Alternatively, if you want everything to be in one format, you can see what does <em>not</em> use that format by changing the <span class="geshifilter"><code class="text geshifilter-text">${FIELD}=...</code></span> condition to <span class="geshifilter"><code class="text geshifilter-text">${FIELD}&lt;&gt;'new_format'</code></span>.</li> </ol> <p>This won't fix anything for you, it'll just show you where to go - look at the <span class="geshifilter"><code class="text geshifilter-text">entity_id </code></span> columns (that's the <span class="geshifilter"><code class="text geshifilter-text">nid</code></span> if the content is nodes) and go edit that content.</p> <p>Also note, this is checking the field_<strong>data</strong>_ tables, which (as far as I can tell) track the latest revision. If you are using content revisions you might want to change the first query to <span class="geshifilter"><code class="text geshifilter-text">show tables like 'field_revision_%'</code></span>. 
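The one-liner above can also be spelled out over multiple lines, which makes the three steps easier to follow. The sketch below is illustrative, not a drop-in replacement: the drush command is stubbed here with canned output so the control flow can be read (and run) without a live site. Delete the stub, run it from your webroot, and substitute your format's machine name for old_format to use it for real.

```shell
#!/bin/sh
# Stub mimicking `drush sql-query` tabular output, so the loop below can be
# followed without a Drupal site. The table names and the entity_id value
# are invented for illustration; remove this function to run against Drush.
drush() {
  case "$2" in
    "show tables like 'field_data_%'")
      printf 'Tables\nfield_data_body\nfield_data_field_image\n' ;;
    "show fields in field_data_body like '%format%';")
      printf 'Field\nbody_format\n' ;;
    "show fields in"*)
      printf 'Field\n' ;;             # a field table with no format column
    *)
      printf 'entity_id\n42\n' ;;     # pretend entity 42 uses old_format
  esac
}

find_old_format_rows() {
  # 1. Find all the tables used for content storage.
  drush sql-query "show tables like 'field_data_%'" | tail -n+2 | while read TABLE; do
    # 2. Find the table's 'format' column (only exists if the field uses formats).
    FIELD=$(drush sql-query "show fields in $TABLE like '%format%';" | tail -n+2 | awk '{ print $1 }')
    echo "$TABLE - $FIELD"
    if [ -n "$FIELD" ]; then
      # 3. Find the rows still stored with the format being removed.
      drush sql-query "select * from ${TABLE} where ${FIELD}='old_format'"
    fi
  done
}

find_old_format_rows
```

On a real site, the entity_id values in the output point at the content to go edit.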
I'm not sure why D7 duplicates so much data, but that's for another post.</p> <p><em>Update</em>: I modified the title from <em>Find all content</em> to <em>Find all field content</em> because of the comment by David Rothstein below.</p> drupal Tue, 02 Aug 2011 15:51:59 +0000 ben 7185 at Workaround to variables cache bug in Drupal 6 <p>I run my Drupal <a href="">crons</a> with <a href="">Drush</a> and <a href="">Jenkins</a>, and have been running into a race condition frequently where it tells me, <strong><em>Attempting to re-run cron while it is already running</em></strong>, and fails to run.</p> <p>That error is triggered when the <span class="geshifilter"><code class="text geshifilter-text">cron_semaphore</code></span> variable is found. It's set when cron starts, and is deleted when cron ends, so if it's still there, cron is presumably still running. Except it wasn't really - the logs show the previous crons ended successfully.</p> <p>I dug into it a little further: <span class="geshifilter"><code class="bash geshifilter-bash">drush vget cron_semaphore</code></span> brought up the timestamp value of the last cron, like it was still set. But querying the <span class="geshifilter"><code class="text geshifilter-text">`variables`</code></span> table directly for <span class="geshifilter"><code class="text geshifilter-text">cron_semaphore</code></span> brought up nothing! That tipped me off to the problem - it was <strong>caching</strong> the variables array for too long.</p> <p>Searching the issue brought up a bunch of posts acknowledging the issue, and suggesting that people clear their whole cache to fix it. 
I care about performance on the site in question, so clearing the whole cache every 15 minutes to run cron is not an option.</p> <p>The underlying solution to the problem is very complex, and the subject of several ongoing threads:</p> <ul> <li><a href=""> variable_set() should rebuild the variable cache, not just clear it</a></li> <li><a href="">Optimize variable caching</a></li> <li><a href="">Concurrency problem with variable caching</a></li> </ul> <p>Following Drupal core dev policy now (which is foolish IMHO), if this bug is resolved, it has to be resolved first in 8.x (which won't be released for another 2-3 years), then 7.x, then 6.x. So waiting for that to work for my D6 site in production isn't feasible.</p> <p>As a stopgap, I have Jenkins clear only the <span class="geshifilter"><code class="text geshifilter-text">'variables'</code></span> cache entry before running cron:<br /> <strong><span class="geshifilter"><code class="text geshifilter-text">drush php-eval &quot;cache_clear_all('variables', 'cache');&quot;</code></span></strong></p> <p>That seems to fix the immediate problem of cron not running. It's not ideal, but at least it doesn't clear the entire site cache every 15 minutes.</p> drupal Thu, 19 May 2011 04:48:22 +0000 ben 7118 at A GitHub dev on the importance of side projects <p>GitHub developer Zach Holman wrote a great post a month ago, <a href=""><em>Why GitHub Hacks on Side Projects</em></a> (discovered via <a href=""><em>Signal vs Noise</em></a>). It's about having a culture that encourages quirky side projects, "automated inefficiencies," to give the mind breathing time between big challenges, to promote camaraderie, and to make people smile. I recommend anyone who does creative or technical work read it. Snippet:</p> <blockquote><p>You should build out a side project culture. A Campfire bot is natural for us, since we spend so much time in Campfire, but there’s plenty of other areas. Hack on your continuous integration server. 
An app that picks where you’re having lunch that day. A miniapp that collects and stores employee-created animated gifs. A continuous integration animated lunch machine. It doesn’t matter what it is; if it improves the lives of your coworkers or makes them laugh, it helps build a stronger company culture. And that’s cool.</p></blockquote> side-projects Wed, 11 May 2011 16:51:02 +0000 ben 7104 at Three Quirks of Drupal Database Syntax <p>Database query syntax in Drupal can be finicky, but doing it right - following the <a href="">coding standards</a> <em>as a matter of habit</em> - is very important. Here are three "gotchas" I've run into or avoided recently:</p> <p><strong>1. Curly braces around tables:</strong> Unit testing with <a href="">SimpleTest</a> absolutely requires that table names in all your queries be wrapped in <strong>{curly braces}</strong>. SimpleTest runs in a sandbox with its own, clean database tables, so you can create nodes and users without messing up actual content. It does this by using the existing <strong>table prefix</strong> concept. If you write a query in a module like this,<br /> <span class="geshifilter"><code class="php geshifilter-php"><span class="re0">$result</span> <span class="sy0">=</span> db_query<span class="br0">&#40;</span><span class="st0">&quot;SELECT nid from node&quot;</span><span class="br0">&#41;</span><span class="sy0">;</span></code></span><br /> when that runs in test, it will load from the regular <span class="geshifilter"><code class="text geshifilter-text">node</code></span> table, not the sandboxed one (assuming you have no prefix on your regular database). Having tests write to actual database tables can make your tests break, or real content get lost. 
Instead, all queries (not just in tests) should be written like:</p> <p><span class="geshifilter"><code class="php geshifilter-php"><span class="re0">$result</span> <span class="sy0">=</span> db_query<span class="br0">&#40;</span><span class="st0">&quot;SELECT nid from {node} node&quot;</span><span class="br0">&#41;</span><span class="sy0">;</span></code></span><br /> (The 2nd <span class="geshifilter"><code class="text geshifilter-text">node</code></span> being an optional <em>alias</em> to use later in the query, for example as <span class="geshifilter"><code class="text geshifilter-text">node.nid</code></span> JOINed to another table with a <span class="geshifilter"><code class="text geshifilter-text">nid</code></span> column.) When Drupal runs the query, it will prefix <span class="geshifilter"><code class="text geshifilter-text">{node}</code></span> by context as <span class="geshifilter"><code class="text geshifilter-text">site_node</code></span>, or <span class="geshifilter"><code class="text geshifilter-text">simpletestXXnode</code></span>, to keep the sandboxes separate. Make sure to always curly-brace your table names!<br /> <br/></p> <p><strong>2. New string token syntax:</strong> Quotation marks around <strong>string tokens</strong> are different in Drupal 6 and 7. D7 uses the new <a href="">"DBTNG" abstraction layer</a> (backported to D6 as the <a href="">DBTNG module</a>). 
In Drupal 6, you'd write a query with a string token like this:<br /> <span class="geshifilter"><code class="php geshifilter-php"><span class="re0">$result</span> <span class="sy0">=</span> db_query<span class="br0">&#40;</span><span class="st0">&quot;SELECT nid from {node} where title='<span class="es6">%s</span>'&quot;</span><span class="sy0">,</span> <span class="st_h">'My Favorite Node'</span><span class="br0">&#41;</span><span class="sy0">;</span></code></span><br /> Note the single quotation marks around the placeholder <span class="geshifilter"><code class="text geshifilter-text">%s</code></span>.</p> <p>With D7 or DBTNG, however, the same static query would be written:<br /> <span class="geshifilter"><code class="php geshifilter-php"><span class="re0">$result</span> <span class="sy0">=</span> db_query<span class="br0">&#40;</span><span class="st0">&quot;SELECT nid from {node} WHERE title = :title&quot;</span><span class="sy0">,</span> <span class="kw3">array</span><span class="br0">&#40;</span><span class="st_h">':title'</span> <span class="sy0">=&gt;</span> <span class="st_h">'My Favorite Node'</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="sy0">;</span></code></span><br /> No more quotes around the <span class="geshifilter"><code class="text geshifilter-text">:title</code></span> token - DBTNG puts it in for you when it replaces the placeholder with the string value.<br /> <br/></p> <p><strong>3. Uppercase SQL commands:</strong> Make sure to use UPPERCASE SQL commands (SELECT, FROM, ORDER BY, etc) in queries. Not doing so is valid syntax 99% of the time, but will occasionally trip you up. For example: the <a href="">db_query_range</a> function (in D6) does not like lowercase <span class="geshifilter"><code class="text geshifilter-text">from</code></span>. 
I was using it recently to paginate the results of a big query, like <strong><span class="geshifilter"><code class="text geshifilter-text">select * from {table}</code></span></strong>. The pagination was all messed up, and I didn't know why. Then I changed it to <strong><span class="geshifilter"><code class="text geshifilter-text">SELECT * FROM {table}</code></span></strong> and it worked. Using uppercase like that is a good habit, and in the few cases where it matters, I'll be glad I'm doing it from now on.</p> drupal Sun, 08 May 2011 19:45:14 +0000 ben 7099 at Monitoring Drupal sites with Munin <p>One of the applications I've been working with recently is the <a href="">Munin</a> monitoring tool. Its homepage describes it simply:</p> <blockquote><p>Munin is a networked resource monitoring tool that can help analyze resource trends and "what just happened to kill our performance?" problems. It is designed to be very plug and play. A default installation provides a lot of graphs with almost no work.</p></blockquote> <p><img id="n7055-munin-graph" src="/files/mysql_queries-day.png" style="float: right; width:300px;" alt="Munin graph" title="Munin graph" />Getting Munin set up on an Ubuntu server is very <a href="">easy</a>. (One caveat: a lot of new plugins require the latest version of Munin, which is only available in Ubuntu 10.) Munin works on a "master" and "node" structure, the basic idea being:</p> <ol> <li>On a cron, the master asks all its nodes for all their stats (usually via port 4949, so configure your firewall accordingly).</li> <li>Each node server asks all its plugins for their stats.</li> <li>Each plugin dumps out brief key:value pairs.</li> <li>The master collects all the data and compiles graphs as images on static HTML pages.</li> </ol> <p>Its simplicity is admirable: Each plugin is its own script, written in any executable language. 
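As a concrete (and hypothetical) example of how small a plugin can be, here's a sketch in shell. The "users" graph and field names are invented, and it's written as a function only so it's easy to exercise - a real plugin is a standalone executable dropped into the plugins directory.

```shell
#!/bin/sh
# Hypothetical Munin plugin: graph the number of logged-in users.
munin_plugin_users() {
  case "$1" in
    config)
      # munin-node calls this once to learn how to draw the graph.
      echo "graph_title Logged-in users"
      echo "graph_vlabel users"
      echo "users.label users"
      ;;
    *)
      # Normal fetch: dump the brief key:value pairs from step 3 above.
      echo "users.value $(who | wc -l)"
      ;;
  esac
}

munin_plugin_users config
munin_plugin_users
```

Running it with the config argument shows the metadata the master uses to draw the graph; running it bare emits the value to plot.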
There are common environment variables and output syntax, but otherwise writing or modifying a plugin is very easy. The plugin directory is called <a href="">Munin Exchange</a>. (The latest version of each plugin isn't necessarily on there, though: in some cases searching for the plugin name brought up newer versions on Github.)</p> <p>I set up Munin for two reasons: 1) get notifications of problems, 2) see historical graphs to spot trends and bottlenecks. I have Munin running on a dedicated monitoring server (also running <a href="">Jenkins</a>), since notifications coming from the web server wouldn't be much use if the web server went down. It's currently monitoring three nodes (including itself), giving me stats on memory (total and for specific processes), CPU, network traffic, apache, mysql, <a href="">S3</a> buckets, memcached, varnish, and mongodb. Within a few days of it running, a memory leak on one server became apparent, and the "MySql slow query" spikes that coincide with cron (doing a bunch of stats/aggregation) are illuminating.</p> <p>None of this is Drupal specific, but graphing patterns in Drupal simply requires a plugin, and McGo has fortunately given us a <a href="">Munin module</a> that provides just that. (The package includes two modules: Munin API to define stats and queries, and Munin Defaults with some basic node and user queries.) I asked for maintainer access and modified it a little - the 6.x-2.x branch now uses <a href="">Drush</a> for database queries rather than storing the credentials in the scripts, for example. The module generates the script code which you copy to files in your plugins directory.</p> <p>Conclusions so far: getting Munin to show you graphs on all the major stats of a server takes a few hours (coming at it as a total beginner). Setting up useful notifications is more complicated, though, and will probably have to evolve over time through trial and error. 
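By contrast, a bare-bones is-it-up alert needs almost none of that machinery. Here's a sketch with placeholder URL and messages; in a real cron job the status would come from curl (something like curl -s -o /dev/null -w '%{http_code}' "$URL") and the alert line would be piped to mail.

```shell
#!/bin/sh
# Minimal up/down check, to run from a machine other than the one being
# watched. The URL is a placeholder; the status is passed in here so the
# logic is self-contained, but would come from curl in a real cron job.
check_site() {
  url="$1"; status="$2"
  if [ "$status" != "200" ]; then
    # In a real job, pipe this line to: mail -s "site down: $url" you@example.com
    echo "ALERT: $url returned HTTP $status"
  fi
}

check_site "http://example.com/" 500   # would trigger an alert
check_site "http://example.com/" 200   # healthy, prints nothing
```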
For simple notifications on servers going down, for example, it's easier to set up a simple cron script (on another server) with <span class="geshifilter"><code class="text geshifilter-text">curl</code></span> and <span class="geshifilter"><code class="text geshifilter-text">mail</code></span>, or use the free version of <a href="">CloudKick</a>. Munin's notifications are more suited to spotting spikes and edge cases.</p> munin Fri, 22 Apr 2011 18:41:45 +0000 ben 7055 at Yesterday's Cloud Collapse <p>Amazon's <a href="">EC2</a> cloud hosting system went down for several hours yesterday. I first noticed the disruption because my <a href="">dotCloud</a> instances (which I've been <a href="">playing with</a> for Drupal feasibility) stopped responding. Then a server I'm running on a totally different hosting service, <a href=""></a>, went down at the same time. (It turns out that was just a coincidence; does not run on EC2 according to their support staff.) Anyway, the simultaneous outage made me think it was more than a coincidence, so I googled "cloud outage" and found a breaking CNN story.</p> <p>Mashable <a href="">explains</a> the problem exposed by the outage: EC2 is supposed to be redundant across multiple "Availability Zones," but a cascading failure still managed to bring down the whole system. That article links to a <a href="">more detailed</a> explanation of what happened.</p> <p>I expect there's going to be a knee-jerk reaction now among some [mostly old-school] sysadmins away from the cloud, back to co-locating physical servers in a data center. But I worked at a company that hosted dozens of sites that way, and when the data center had a fire and lost power, their sites all went down for days. The cloud is just an abstraction inside a physical machine. It's an abstraction that allows for tremendous efficiency, cost-savings, and redundancy. 
But physical failures (of power or connectivity) can still bring any infrastructure down.</p> <p>One notable EC2-based service that was not disrupted (according to Mashable at least) was Netflix, because they built sufficient redundancy to handle an entire data center's failure. That's the obvious lesson for customers of any hosting service: if 24/7/365 uptime of your service is absolutely critical, then build in massive redundancy. That applies if you're hosting on physical servers or in the cloud. Redundancy is complicated, and expensive, and like an insurance policy, only seems worthwhile in a crisis. So it's probably not worth the cost for most applications.</p> <p>I'm also somewhat fatalistic about infrastructure in general: it's all very fragile. And the more complex and interdependent our systems become, the more points of failure we introduce. Redundancy itself is kind of two steps forward, one step back, simply because it adds complexity.</p> cloud Fri, 22 Apr 2011 15:33:18 +0000 ben 7053 at Setting up Drupal on DotCloud's server automation platform <p>Managing a properly configured server stack is one of the pain points in developing small client sites. Shared hosting is usually sub-par, setting up a VPS from scratch is overkill, and automation/simplification of the server configuration and deployment is always welcome. So I've been very interested in seeing how <a href="">DotCloud</a> might work for Drupal sites.</p> <p>DotCloud is in the same space as <a href="">Heroku</a>, an automated server/deployment platform for Rails applications. DotCloud is trying to cater to much more than Rails, however: they currently support PHP (for websites and "worker" daemons), Rails, Ruby, Python, MySql, Postgresql, and have Node.js, MongoDB and a whole bunch of other components on their <a href="">roadmap</a>.</p> <p>The basic idea is to automate the creation of pre-configured <a href="">EC2</a> instances using a shell-based API. 
So you create a web and database setup and push your code with four commands:</p> <div class="geshifilter"><pre class="text geshifilter-text">dotcloud create mysite
dotcloud deploy -t php mysite.www
dotcloud deploy -t mysql mysite.db
dotcloud push mysite.www ~/code</pre></div> <p>Each "deployment" is its own EC2 server instance, and you can SSH into each (but without root). The "push" command for deployment can use a Git repository, and files are deployed to the server <a href="">Capistrano</a>-style, with symlinked releases and rollbacks. (This feature alone, of pushing your code and having it automatically deploy, is invaluable.)</p> <p><strong>Getting it to work with Drupal</strong> is a little tricky. First of all, the PHP instances run on <a href="">nginx</a>, not Apache. So the usual .htaccess file in core doesn't apply. Drupal can be deployed on nginx with some contortions, and there is a <a href="">drupal-for-nginx</a> project on Github. However, I write this post after putting in several hours trying to adapt that code to work, and failing. (I've never used nginx before, which is probably the main problem.) 
I'll update it if I figure it out, but in the meantime, this is only a partial success.</p> <p>The basic process is this:</p> <ul> <li>Set up an account (currently needs a beta invitation which you can request)</li> <li>Install the dotcloud client using python's <span class="geshifilter"><code class="text geshifilter-text">easy_install</code></span> app</li> <li>Set up a web (nginx) instance with <span class="geshifilter"><code class="text geshifilter-text">dotcloud deploy</code></span></li> <li>Set up a database (mysql or postgres) instance</li> <li>Set up a local Git repo, download Drupal, and configure settings.php (as shown with <span class="geshifilter"><code class="text geshifilter-text">dotcloud info</code></span>)</li> <li>Push the repository using <span class="geshifilter"><code class="text geshifilter-text">dotcloud push</code></span></li> <li>Navigate to your web instance's URL and install Drupal.</li> <li>To use your own domain, set up a CNAME record and run <span class="geshifilter"><code class="text geshifilter-text">dotcloud alias</code></span>. ("Naked" domains, i.e. without a prefix like www, don't work, however, so you have to rely on DNS-level redirecting.)</li> <li>For added utility, SSH in with <span class="geshifilter"><code class="text geshifilter-text">dotcloud ssh</code></span> and install <a href="">Drush</a>. (Easiest way I found was to put a symlink to the executable in ~/bin.)</li> </ul> <p>The main outstanding issue is that friendly URLs don't work, because of the nginx configuration. I hope to figure this out soon.</p> <p>Some other issues and considerations:</p> <ul> <li>The platform is still in beta, so I experienced a number of API timeouts yesterday. I <a href="!/dot_cloud/status/60151655211085824">mentioned</a> this on Twitter and they said they're working on it; today I had fewer timeouts.</li> <li>The server instances don't give you root access. 
They come fully configured but you're locked into your home directory, like shared hosting. I understand the point here - if you changed the server stack, their API and scaling methodologies wouldn't work - but it means if something in the core server config is wrong, tough luck.</li> <li>The shell (bash in Ubuntu 10.04 in my examples) is missing a Git client, vim, and nano, and some of its configuration (for <span class="geshifilter"><code class="text geshifilter-text">vi</code></span> for instance) is wonky out of the box.</li> <li>The intended deployment direction is one-way, from a local dev environment to the servers, so if you change files on the server, you need to rsync them down. (You can SSH in with the dotcloud app and put on a normal SSH key for rsync.)</li> <li>Because the webroot is a symlink, any uploaded files have to be outside the webroot (as a symlink as well). This is normal on Capistrano setups, but unusual for most Drupal sites (running on shared or VPS hosting).</li> <li>It's free now, but only because it's in beta and they haven't announced pricing. It is yet to be seen if this will be cost-effective when it goes out of beta.</li> <li>They promise automated scaling, but it's not clear how that actually works. (Nowhere in the process is there a choice of RAM capacity, for instance.) Does scaling always involve horizontally adding small instances, and if so, does that make sense for high-performance applications?</li> </ul> <h3>Conclusion so far</h3> <p>The promise of automated server creation and code deployment is very powerful, and this kind of platform could be perfect for static sites, daemons, or some custom apps. If/when I get it working in Drupal, it could be as simple as a shell script to create a whole live Drupal site from nothing in a few seconds.</p> <p><a href="">Try it out!</a> I'm very curious what others' experience or thoughts are on this approach. 
I'd especially love for someone to post a solution to the nginx-rewrite issue in the comments.</p> <h3>Update 4/22:</h3> <p>Their support staff recommended reducing the nginx.conf file to one line:</p> <p><code>try_files $uri $uri/ /index.php?q=$uri;</code></p> <p>And that worked. However, it leaves out all the other recommended <a href="">rules</a> for caching time, excluding private directories, etc. I asked about these and am still waiting for a reply.</p> <p>Also, to get <strong>file uploads</strong> to work properly, you'll need to put your files directory outside of the webroot, and symlink sites/default/files (or equivalent) to that directory, using a <a href="">postinstall</a> script. Mine looks like this (after creating a <span class="geshifilter"><code class="text geshifilter-text">~/drupal-files</code></span> directory):</p> <div class="geshifilter"><pre class="bash geshifilter-bash"><span class="kw2">chmod</span> g+<span class="kw2">w</span> <span class="sy0">/</span>home<span class="sy0">/</span>dotcloud<span class="sy0">/</span>current<span class="sy0">/</span>sites<span class="sy0">/</span>default <span class="sy0">&amp;&amp;</span> \ <span class="kw2">ln</span> <span class="re5">-s</span> <span class="sy0">/</span>home<span class="sy0">/</span>dotcloud<span class="sy0">/</span>drupal-files <span class="sy0">/</span>home<span class="sy0">/</span>dotcloud<span class="sy0">/</span>current<span class="sy0">/</span>sites<span class="sy0">/</span>default<span class="sy0">/</span>files <span class="sy0">&amp;&amp;</span> \ <span class="kw3">echo</span> <span class="st0">&quot;Symlink for files created.&quot;</span></pre></div> <p>That runs whenever you run <span class="geshifilter"><code class="text geshifilter-text">dotcloud push</code></span>, and is similar to sites deployed with Capistrano, where the same kind of symlink approach is generally used.</p> dotcloud hosting nginx Tue, 19 Apr 2011 16:10:05 +0000 ben 7039 at A showcase of red flags: How do 
web shops get away with this? <p>I recently had occasion to review the <a href="">new website</a> of a major bank's <a href="">CRA</a>/charity wing. As a web developer, I'm always curious how other sites are built. This one raised a number of red flags for me, so I'd like to write about it as a showcase. I have three questions on my mind:</p> <ol> <li>How do professional web shops get away with such poor quality work? </li> <li>How do clients know what to look for (and avoid)? </li> <li>With plenty of good web shops out there, why aren't big clients connecting with them?</li> </ol> <p>I don't have the answers yet, but I do want to raise the questions. First, reviewing the site from my developer's perspective:</p> <ul> <li>The page contents are loaded with Javascript. With Javascript turned off, there's a little bit of left nav, and the main area is <a href="">essentially blank</a>. This means the site is unreadable to <a href="">screen readers</a> (browsers for blind people), so the site is not <a href="">508 compliant</a>. Maybe more importantly, it means the contents of the page are invisible to search engines. (See Google's <a href=";cd=1&amp;hl=en&amp;ct=clnk&amp;gl=us&amp;">cached copy</a> of the homepage for example.)</li> <li>The Javascript that pulls in the page contents is loading an XML file with <a href="">AJAX</a> (see line 72 of the <a href="view-source:">homepage source</a>). <a href="">XML</a> is meant for computers to talk to each other, not for human-readable websites, and AJAX is meant for interactive applications, not the main content area of every page. (I can only imagine the workflow for editing the content on the client side: does their CMS output XML? Do they manually edit XML? Or can the content never change without the original developers?)</li> <li>The meta tags are all generic: The <a href="">OpenGraph</a> page title (used by Facebook) across the site is "ShareThis Homepage". 
(<a href="">ShareThis</a> has a "social" widget which I assume they copied the code from, but having those meta values is probably worse than having none at all.)</li> <li>None of the titles are links, so even if Google could read the site, it would just see a lot of <em>Read More</em>'s.</li> <li>From a usability perspective, the 11px font size for most of the content is difficult to read.</li> <li>The <a href="">Initiatives by State map</a> is built in Flash, which makes it unviewable on non-Android mobile devices. Flash is also unnecessary for maps now, given the slew of great HTML5-based mapping tools. Not to mention the odd usability quirks/bugs of the map's interface.</li> </ul> <p>I could go on, but that's enough to make the point. So what's going on here? I've seen enough similar signs in other projects to feel confident in speculating about this one.</p> <p>The vendor wasn't entirely incompetent - the hundreds of lines of Javascript code needed some technical proficiency to write - yet the site ignores so many core principles of good web development circa 2011. Whatever skills were applied here, were misplaced. The "web" has to accommodate our phones, <a href="">TVs</a>, even our <a href="">cars</a>, with "mobile" browsers (broadly defined) expected to eclipse the desktop in the not-too-distant future. That means <a href="">progressive enhancement</a> and basic HTML quality are critical. Web users also have an infinity of sites to visit, so to justify the investment in <em>yet another</em> site, you need some basic <a href="">Search Engine Optimization</a> for people to find you. Building a site that is readable only to a narrow subset of desktop browsers constitutes an unfinished product in my book.</p> <p>On the client side, any site with more than one page, that needs to be updated more than once in a blue moon, needs a <a href="">content management system</a>. 
I don't see the tell-tales of any common CMS here, and the way the contents are populated with AJAX suggests the CMS under the hood is weak or non-existent. Reinventing the wheel with entirely custom code for a site makes it difficult to maintain in the long run: developers with expertise in common frameworks/CMSs won't want to touch it, and whoever does will need a long ramp-up/head-scratching period to understand it. It's also unnecessary with so many tested tools available. So clients need to <em>insist on a CMS</em>, and if a vendor tries to talk them out of one, or claim it will be 3x the price, they need to find a better vendor. I work with <a href="">Drupal</a> and think it's the best fit for many sites (and free of license fees), but there are many good options.</p> <p>The site doesn't say who built it, and searching for relevant keywords doesn't bring up any clearly proud vendors. Was it a web shop at all, or an ad agency that added some token web services to their roster? (General rule: <em>avoid those vendors.</em>) Clients need to see their sites not as another piece of throwaway marketing material, but as a long-term, <em>audience-building</em> investment. Thinking of websites as advertisements that only need to be viewed on Windows running Internet Explorer is missing the point.</p> <p>I wonder, given the client (with <a href="">$10 billion</a> in profit in 2010), how much this site cost. It's not a brochure site, but it's not particularly complex either. The only really custom piece is the map, and the same style could probably be implemented with <a href="">OpenLayers</a> (or <a href="">Google Maps</a> with some compromise from the client on color requirements). 
Whatever they paid, I suspect they could have paid one of the top Drupal shops the same price to build a maintainable, standards-based, truly impressive website, for visitors, internal staff, and reviewing developers alike.</p> <p>Then again, being such a large client means the vendor likely had to deal with all kinds of red tape. Maybe the really good web shops don't connect with that class of client because it's not worth the hassle. But surely the U.S. House of Representatives, in the process of <a href="">moving to Drupal</a>, has its own brand of red tape, and the <a href="">vendor</a> has project managers who can handle it.</p> <p>Websites are complex beasts and evaluating them from the client perspective is not the same as watching a proposed TV commercial. So how do clients without core competencies in web development know what to avoid? <a href=";ie=UTF-8&amp;q=web+development+best+practices">Googling it</a> will only get them so far. But the burden is ultimately on them: we all consume products about which we lack core expertise, and big corporations (as consumers and clients themselves) need to figure out the same <a href="">heuristics</a> as everyone else. Trusting reputable vendors is one approach, but it's a vicious cycle if they're locked into one vendor (as companies with existing B2B relationships often are).</p> <p><em>Diversifying the advice you get</em> is critical. Big projects should have <a href="">RFPs</a> and a bidding process. (That helps enforce a realistic division of labor: little shops like <a href="">mine</a> don't respond to RFPs, but big shops that can afford that investment are happy to manage the project and outsource some development to suit their own core competencies.)</p> <p>The bidding process could even involve the vendors defending their proposals in front of their competitors. Then the top-tier CMS shop can eliminate the static-HTML or Cold Fusion shop from the running before it's too late. 
There are no silver bullets - there's a potential to be fleeced in any market - but in most of them, consumers have figured out ways to spot red flags and protect themselves. Website clients need to educate themselves and catch up.</p> clients Thu, 14 Apr 2011 14:29:58 +0000 ben 7000 at Web Development as a Spiritual Experience <p>I'm setting aside the <a href="">code-heavy posting</a> for a little while to share some abstract thoughts I've had recently. I've been enjoying a brief (intentional) lull in my work load, and the resulting cognitive surplus has made me remember how much I <em>enjoy my work</em> as a web developer. There are many good (and obvious) reasons for this:</p> <ul> <li><p>Web development involves <em>building</em> things. The product may be virtual, but there's a pleasure in the construction (the kind Matthew Crawford talks about in <a href=";tag=benbucknet-20&amp;link_code=as3&amp;camp=211189&amp;creative=373489&amp;creativeASIN=0143117467"><em>Shop Class as Soulcraft</em></a>), a fulfillment from creating something that didn't exist before (that often provides people a clear value).</p></li> <li><p>Web development, insofar as it involves programming, involves a molding of things to one's will. (<a href=";tag=benbucknet-20&amp;link_code=as3&amp;camp=211189&amp;creative=373489&amp;creativeASIN=1430219483"><em>Coders at Work</em></a> is a great read on this subject.) Code can be a pain in the ass sometimes, but it doesn't "disobey". Just as dealing with obstinate people can be stressful, dealing with machines can be comforting.</p></li> <li><p>There's a tremendous amount of <em>mastery</em> involved in web development. Good web developers are familiar with the whole technology stack, from performance-tuning a tiny SQL query to scaling multiple servers. There's always something new to learn, and the learning contributes to a positive feedback loop. 
Web development is not a job for incurious people.</p></li> </ul> <p>These reasons I've known for a while. But recently I've been pondering another reason, maybe the most important one:</p> <p>The web is <em>infinite</em>. On the web anything can talk to anything else. Any knowable information can be discovered and connected to any other information. Physical boundaries are irrelevant. The promise of "open" (source, standards, data) is the most concrete manifestation of progress I can think of, maybe the most real example humanity has ever created. (Maybe this is why the web <a href="">freaks out</a> established powers.) If we expand our imagination a little more, the web could even have the potential to <a href=";tag=benbucknet-20&amp;linkCode=as2&amp;camp=1789&amp;creative=9325&amp;creativeASIN=B003L1ZXCU">restructure</a> the whole fabric of society.</p> <p>So as web developers, we deal with infinity every day. There's always something bigger and cooler to build, some new technology to master, some new innovation around the corner. Getting from <em>"Wouldn't it be cool if..."</em> to <a href="">shipping</a> a brand new creation is just a matter of putting in the time. (That makes <em>time</em> the only real enemy of the developer - but that post is for another time.)</p> <p>I think there's a kind of spiritual experience in working with that kind of infinity. It's not the metaphorical infinity of religion; it's an infinity we can directly perceive through our work. I don't think developers of pre-web desktop software felt the same kind of potential. (That divide between <a href="">proprietary lock-in</a> and <a href="">openness</a> is the reason former <a href="">titans</a> of the tech industry have become irrelevant.) 
But I suspect the original <a href="">pioneers</a> of the web understood the power they were unleashing.</p> <p>Long live the internet.</p> Thu, 14 Apr 2011 04:26:56 +0000 ben 7013 at How to render image fields with Drupal 7 <p>Suppose you have a <a href="">Drupal 7</a> site, with a node containing an image field called <span class="geshifilter"><code class="text geshifilter-text">field_image</code></span>, and you want to pull the URL of the image into your <strong>page</strong> template. (As opposed to a node template, where it's already rendered.) For bonus points, you want the image to be rendered through a "style" (aka Imagecache preset).</p> <p>In Drupal 6, this involved <span class="geshifilter"><code class="text geshifilter-text">theme('imagecache')</code></span> and <span class="geshifilter"><code class="text geshifilter-text">$node-&gt;field_image[0]['filepath']</code></span>. In Drupal 7 it's a little different, because the Files API has been abstracted from the local file system (to handle other types of storage) and Imagecache is now (mostly) in the core Image module.</p> <p>First, I'm separating this into the <strong>preprocessor</strong> logic where we check if the image field exists and get the right code, and the <strong>template</strong> output where we use the finished value. The page preprocessor would probably be in your theme's template.php and look like <span class="geshifilter"><code class="php geshifilter-php"><span class="kw2">function</span> THEME_preprocess_page<span class="br0">&#40;</span><span class="sy0">&amp;</span><span class="re0">$vars</span><span class="br0">&#41;</span></code></span>, and your page template would look like <span class="geshifilter"><code class="text geshifilter-text">page.tpl.php</code></span>.</p> <p>Fortunately, the node object is already in the page preprocessor's variables, as <span class="geshifilter"><code class="text geshifilter-text">$vars['node']</code></span>. 
Let's break this down a little with the available API functions:</p> <div class="geshifilter"> <pre class="php geshifilter-php"><span class="co1">// filename relative to files directory</span> <span class="co1">// e.g. 'masthead.jpg'</span> <span class="re0">$filename</span> <span class="sy0">=</span> <span class="re0">$vars</span><span class="br0">&#91;</span><span class="st_h">'node'</span><span class="br0">&#93;</span><span class="sy0">-&gt;</span><span class="me1">field_image</span><span class="br0">&#91;</span><span class="st_h">'und'</span><span class="br0">&#93;</span><span class="br0">&#91;</span><span class="nu0">0</span><span class="br0">&#93;</span><span class="br0">&#91;</span><span class="st_h">'filename'</span><span class="br0">&#93;</span><span class="sy0">;</span> &nbsp; <span class="co1">// relative path to raw image in 'scheme' format</span> <span class="co1">// e.g. 'public://masthead.jpg'</span> <span class="re0">$image_uri</span> <span class="sy0">=</span> file_build_uri<span class="br0">&#40;</span><span class="re0">$filename</span><span class="br0">&#41;</span><span class="sy0">;</span> &nbsp; <span class="co1">// relative path to 'styled' image (using an arbitrary 'banner' style) in 'scheme' format</span> <span class="co1">// e.g. 'public://styles/banner/public/masthead.jpg'</span> image_style_path<span class="br0">&#40;</span><span class="st_h">'banner'</span><span class="sy0">,</span> <span class="re0">$filename</span><span class="br0">&#41;</span><span class="sy0">;</span> &nbsp; <span class="co1">// raw URL with an image style</span> <span class="co1">// e.g. 
''</span> <span class="co1">// [passing a raw path here will return a very ungraceful fatal error, see]</span> <span class="re0">$vars</span><span class="br0">&#91;</span><span class="st_h">'masthead_raw'</span><span class="br0">&#93;</span> <span class="sy0">=</span> image_style_url<span class="br0">&#40;</span><span class="st_h">'banner'</span><span class="sy0">,</span> <span class="re0">$image_uri</span><span class="br0">&#41;</span><span class="sy0">;</span> &nbsp; <span class="co1">// html for a styled image</span> <span class="co1">// e.g. '&lt;img typeof=&quot;foaf:Image&quot; src=&quot;; alt=&quot;&quot; /&gt;'</span> <span class="re0">$vars</span><span class="br0">&#91;</span><span class="st_h">'masthead'</span><span class="br0">&#93;</span> <span class="sy0">=</span> theme<span class="br0">&#40;</span><span class="st_h">'image_style'</span><span class="sy0">,</span> <span class="kw3">array</span><span class="br0">&#40;</span><span class="st_h">'style_name'</span> <span class="sy0">=&gt;</span> <span class="st_h">'banner'</span><span class="sy0">,</span> <span class="st_h">'path'</span> <span class="sy0">=&gt;</span> <span class="re0">$image_uri</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="sy0">;</span></pre></div> <p>So to do something useful with this:</p> <div class="geshifilter"> <pre class="php geshifilter-php"><span class="kw2">function</span> THEME_preprocess_page<span class="br0">&#40;</span><span class="sy0">&amp;</span><span class="re0">$vars</span><span class="br0">&#41;</span> <span class="br0">&#123;</span> <span class="co1">// pull the masthead image into the page template</span> <span class="kw1">if</span> <span class="br0">&#40;</span><span class="kw3">isset</span><span class="br0">&#40;</span><span class="re0">$vars</span><span class="br0">&#91;</span><span class="st_h">'node'</span><span class="br0">&#93;</span><span class="sy0">-&gt;</span><span class="me1">field_image</span><span 
class="br0">&#41;</span> <span class="sy0">&amp;&amp;</span> <span class="sy0">!</span><span class="kw3">empty</span><span class="br0">&#40;</span><span class="re0">$vars</span><span class="br0">&#91;</span><span class="st_h">'node'</span><span class="br0">&#93;</span><span class="sy0">-&gt;</span><span class="me1">field_image</span><span class="br0">&#91;</span><span class="st_h">'und'</span><span class="br0">&#93;</span><span class="br0">&#91;</span><span class="nu0">0</span><span class="br0">&#93;</span><span class="br0">&#91;</span><span class="st_h">'filename'</span><span class="br0">&#93;</span><span class="br0">&#41;</span><span class="br0">&#41;</span> <span class="br0">&#123;</span> &nbsp; <span class="re0">$filename</span> <span class="sy0">=</span> <span class="re0">$vars</span><span class="br0">&#91;</span><span class="st_h">'node'</span><span class="br0">&#93;</span><span class="sy0">-&gt;</span><span class="me1">field_image</span><span class="br0">&#91;</span><span class="st_h">'und'</span><span class="br0">&#93;</span><span class="br0">&#91;</span><span class="nu0">0</span><span class="br0">&#93;</span><span class="br0">&#91;</span><span class="st_h">'filename'</span><span class="br0">&#93;</span><span class="sy0">;</span> &nbsp; <span class="re0">$image_uri</span> <span class="sy0">=</span> file_build_uri<span class="br0">&#40;</span><span class="re0">$filename</span><span class="br0">&#41;</span><span class="sy0">;</span> &nbsp; <span class="co1">// url</span> <span class="re0">$vars</span><span class="br0">&#91;</span><span class="st_h">'masthead_raw'</span><span class="br0">&#93;</span> <span class="sy0">=</span> image_style_url<span class="br0">&#40;</span><span class="st_h">'banner'</span><span class="sy0">,</span> <span class="re0">$image_uri</span><span class="br0">&#41;</span><span class="sy0">;</span> &nbsp; <span class="co1">// html</span> <span class="re0">$vars</span><span class="br0">&#91;</span><span class="st_h">'masthead'</span><span 
class="br0">&#93;</span> <span class="sy0">=</span> theme<span class="br0">&#40;</span><span class="st_h">'image_style'</span><span class="sy0">,</span> <span class="kw3">array</span><span class="br0">&#40;</span><span class="st_h">'style_name'</span> <span class="sy0">=&gt;</span> <span class="st_h">'banner'</span><span class="sy0">,</span> <span class="st_h">'path'</span> <span class="sy0">=&gt;</span> <span class="re0">$image_uri</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="sy0">;</span> <span class="br0">&#125;</span> <span class="br0">&#125;</span></pre></div> <p>Now in your page template, you have <span class="geshifilter"><code class="text geshifilter-text">$masthead</code></span> (HTML) and <span class="geshifilter"><code class="text geshifilter-text">$masthead_raw</code></span> (URL) [with the <span class="geshifilter"><code class="text geshifilter-text">$vars</code></span> variables now being independently named] so you can do something like this in a PHP block:</p> <div class="geshifilter"> <pre class="php geshifilter-php"><span class="kw2">&lt;?php</span> <span class="kw1">if</span> <span class="br0">&#40;</span><span class="re0">$masthead</span><span class="br0">&#41;</span><span class="sy0">:</span> <span class="sy1">?&gt;</span> &lt;div id=&quot;masthead&quot;&gt;<span class="kw2">&lt;?php</span> <span class="kw1">echo</span> <span class="re0">$masthead</span><span class="sy0">;</span> <span class="sy1">?&gt;</span>&lt;/div&gt; <span class="kw2">&lt;?php</span> <span class="kw1">endif</span><span class="sy0">;</span> <span class="sy1">?&gt;</span></pre></div> <p>A quick-and-dirty alternative would be, directly in page.tpl.php:</p> <div class="geshifilter"> <pre class="php geshifilter-php"><span class="kw1">if</span> <span class="br0">&#40;</span><span class="sy0">!</span><span class="kw3">empty</span><span class="br0">&#40;</span><span class="re0">$node</span><span class="sy0">-&gt;</span><span 
class="me1">field_image</span><span class="br0">&#91;</span><span class="st_h">'und'</span><span class="br0">&#93;</span><span class="br0">&#91;</span><span class="nu0">0</span><span class="br0">&#93;</span><span class="br0">&#91;</span><span class="st_h">'filename'</span><span class="br0">&#93;</span><span class="br0">&#41;</span><span class="br0">&#41;</span> <span class="br0">&#123;</span> <span class="kw1">echo</span> theme<span class="br0">&#40;</span><span class="st_h">'image_style'</span><span class="sy0">,</span> <span class="kw3">array</span><span class="br0">&#40;</span><span class="st_h">'style_name'</span> <span class="sy0">=&gt;</span> <span class="st_h">'banner'</span><span class="sy0">,</span> <span class="st_h">'path'</span> <span class="sy0">=&gt;</span> file_build_uri<span class="br0">&#40;</span><span class="re0">$node</span><span class="sy0">-&gt;</span><span class="me1">field_image</span><span class="br0">&#91;</span><span class="st_h">'und'</span><span class="br0">&#93;</span><span class="br0">&#91;</span><span class="nu0">0</span><span class="br0">&#93;</span><span class="br0">&#91;</span><span class="st_h">'filename'</span><span class="br0">&#93;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="sy0">;</span> <span class="br0">&#125;</span></pre></div> <p>(note <span class="geshifilter"><code class="text geshifilter-text">$vars['node']</code></span> is now <span class="geshifilter"><code class="text geshifilter-text">$node</code></span>)</p> <p>The shorthand version will work, but separating the logic from the output (with a preprocessor and template) is the "best practices" approach.</p> drupal Tue, 12 Apr 2011 18:00:19 +0000 ben 7003 at Good Git GUIs for Mac <p>Today on Twitter, I saw two <a href="">Git</a> apps for OSX worth spreading:</p> <p>Via <a href="!/jayroh/status/55816057356693504">@jayroh</a>: <a href=""><em>Brotherbard</em></a> has an "experimental fork" of GitX which has a nice GUI for branch trees, 
local and remote branches, staging, cherry-picking, and rebasing. (At some point I have to try the last two in this app in particular.) It's free.</p> <p>Via <a href="!/mortendk/status/55781789775568898">@mortendk</a>, if you're willing to spend $59 (and I generally don't mind paying for great Mac software - developers have to eat!) - check out <a href=""><em>Tower</em></a>, which claims to be and likely is "the most powerful Git client for Mac."</p> <p>Of course there's always the terminal, which I'll still use for most operations - <span class="geshifilter"><code class="text geshifilter-text">git log --graph</code></span> shows a rudimentary graph, for instance. Visualizing complex branch trees in a GUI is really nice, and GitK (the Tcl/Tk-based app which comes with the Mac Git package) doesn't have a lot going for it.</p> <p>(If you're new to Git, check out the excellent (and free) <a href="">Pro Git</a> book, or (via <a href="!/danigrrl/status/55774323599872000">@danigrrl</a>) <a href="">a Git tutorial for designers</a>.)</p> git Thu, 07 Apr 2011 02:48:24 +0000 ben 6996 at Lullabot post on configuring Varnish for Drupal <p>Lullabot has a beautifully comprehensive post on <a href="">configuring Varnish for Drupal</a>. This is a realm that has lacked good Drupal documentation in the past, so I haven't utilized Varnish as much as I should, but that will change now.</p> <p><em>Added:</em> Another great Varnish resource is schoefmax's <a href="">Caching With Varnish</a> presentation.</p> varnish Tue, 05 Apr 2011 15:25:18 +0000 ben 6979 at Set up Hudson/Jenkins to notify via GTalk <p>I'm in the process of setting up a <a href="">Jenkins</a> (aka Hudson) Continuous Integration server. 
I wanted some of the jobs to notify me in real time, and what better way than IM (specifically GTalk), which I have running most of the time I'm online?</p> <p>It took me a little while to get this working, and some of the documentation was lacking, so I thought I'd write quick notes on what worked for me.</p> <p>This assumes you have Jenkins already set up. You'll need to install the <a href="">Jabber Plugin</a> (and its dependency Instant Messaging Plugin).</p> <p>With that running, from the dashboard, go to <strong>Manage Jenkins</strong> -&gt; <strong>Configure System</strong>. Scroll down to <strong>Jabber Notification</strong>. Enable it. Enter your credentials according to the official docs:</p> <blockquote><p>If you have a GMail or Google Mail account, specify the full e-mail address, such as <a href=""></a>. If you are a Google App user, specify full e-mail address like <a href=""></a>.<br /> (If you have a european gmail account, you may need <a href=""></a>)<br /> Expand "Advanced settings" and put server address <strong></strong> otherwise it won't work.</p></blockquote> <p>Set the <strong>port</strong> in the Advanced Settings to 5223.<br /> Make sure your firewall allows TCP access to 5223! (On an Ubuntu server with <a href="">UFW</a>, do this with <span class="geshifilter"><code class="text geshifilter-text">sudo ufw allow 5223</code></span>.)</p> <p>In the job configuration, enable <strong>Jabber Notification</strong> and put in the account your local IM client is logged in with. (In my case I set up a Google Apps user for Jenkins, which talks to my personal GTalk account, but that might not be necessary.)</p> <p>Finally, add the GTalk account Jenkins is using to your buddy list (by inviting in GMail, adding in iChat, etc).</p> <p>Now run the job and read the console output. 
If everything's set up right, you should get an IM notification.</p> jenkins Sat, 02 Apr 2011 00:34:35 +0000 ben 6972 at Quick tip: Extract all unique IP addresses from Apache logs <p>Say you wanted to count the number of unique IP addresses hitting your Apache server. It's very easy to do in a Linux (or compatible) shell. </p> <p>First locate the log file for your site. The generic log is generally at <span class="geshifilter"><code class="text geshifilter-text">/var/log/httpd/access_log</code></span> or <span class="geshifilter"><code class="text geshifilter-text">/var/log/apache2/access_log</code></span> (depending on your distro). For virtualhost-specific logs, check the conf files or (if you have one active site and others in the background) run <span class="geshifilter"><code class="bash geshifilter-bash"><span class="kw2">ls</span> <span class="re5">-alt</span> <span class="sy0">/</span>var<span class="sy0">/</span>log<span class="sy0">/</span>httpd</code></span> to see which file is most recently updated.</p> <p>Then spit out one line to see the format:<br /> <span class="geshifilter"><code class="bash geshifilter-bash"><span class="kw2">tail</span> <span class="re5">-n1</span> <span class="sy0">/</span>var<span class="sy0">/</span>log<span class="sy0">/</span>httpd<span class="sy0">/</span>access_log</code></span></p> <p>Find the IP address in that line and count which part it is. 
In this example it's the 1st part (hence <span class="geshifilter"><code class="text geshifilter-text">$1</code></span>):</p> <p><span class="geshifilter"><code class="bash geshifilter-bash"><span class="kw2">cat</span> <span class="sy0">/</span>var<span class="sy0">/</span>log<span class="sy0">/</span>httpd<span class="sy0">/</span>access_log <span class="sy0">|</span> <span class="kw2">awk</span> <span class="st_h">'{print $1}'</span> <span class="sy0">|</span> <span class="kw2">sort</span> <span class="sy0">|</span> <span class="kw2">uniq</span> <span class="sy0">&gt;</span> <span class="sy0">/</span>var<span class="sy0">/</span>log<span class="sy0">/</span>httpd<span class="sy0">/</span>unique-ips.log</code></span></p> <p>You'll now have a list of sorted, unique IP addresses. To figure out the time range, run<br /> <span class="geshifilter"><code class="bash geshifilter-bash"><span class="kw2">head</span> <span class="re5">-n1</span> <span class="sy0">/</span>var<span class="sy0">/</span>log<span class="sy0">/</span>httpd<span class="sy0">/</span>access_log</code></span><br /> to see the start point (and the <span class="geshifilter"><code class="text geshifilter-text">tail</code></span> syntax above for the end point).</p> <p>To count the number of IPs:<br /> <span class="geshifilter"><code class="bash geshifilter-bash"><span class="kw2">cat</span> &nbsp;<span class="sy0">/</span>var<span class="sy0">/</span>log<span class="sy0">/</span>httpd<span class="sy0">/</span>unique-ips.log <span class="sy0">|</span> <span class="kw2">wc</span> <span class="re5">-l</span></code></span></p> <p>To paginate:<br /> <span class="geshifilter"><code class="bash geshifilter-bash"><span class="kw2">more</span> &nbsp;<span class="sy0">/</span>var<span class="sy0">/</span>log<span class="sy0">/</span>httpd<span class="sy0">/</span>unique-ips.log</code></span></p> <p>Have fun.</p> apache linux Wed, 30 Mar 2011 19:41:09 +0000 ben 6966 at Track Freshbooks Expenses in Google Docs 
with PHP and XML <p>I've been trying to automate as much of my financial forecasting as possible, with coding up front that will last a while. My primary tools are <a href="">Freshbooks</a> (for expense and invoice tracking) and <a href="">Google Docs</a> for spreadsheets. I wrote yesterday about pulling data from one spreadsheet into another using <a href=""><span class="geshifilter"><code class="text geshifilter-text">importRange</code></span></a>. Last night I took it several steps further, pulling expenses from the Freshbooks API into XML, then XML to GDocs, and automating tax calculations based on expense category.</p> <h4>1. Freshbooks Expenses to XML</h4> <p>Building on an existing <a href="">freshbooks-php library</a>, I wrote a PHP script called <a href=""><strong>freshbooks_expenses_xml</strong></a>. (Link goes to GitHub.)</p> <p>To get it set up, create a keys.php file, and put the whole package on your server somewhere. Play with the parameters described in the readme to get different XML output.</p> <h4>2. XML to Google Docs</h4> <p>In cell A1 of a clean spreadsheet, enter this function:<br /> <span class="geshifilter"><code class="text geshifilter-text">=importXML(&quot;;date_to=2011-12-31&amp;headers=1&amp;&quot;, &quot;//expenses/*&quot;)</code></span>.<br /> GDocs will fetch the data and populate the spreadsheet. (Note: I had some trouble making the headers consistent with the columns, and worked around it; you might want to do the same by omitting <span class="geshifilter"><code class="text geshifilter-text">headers=1</code></span> in the function and putting in your own.)</p> <h4>3. Making useful tax calculations with the data</h4> <p>For estimated quarterly taxes (as an LLC), I need to know my revenue (calculated in another spreadsheet, not yet but possibly soon also pulled automatically from Freshbooks) minus my business expenses. 
As I learned doing taxes for 2010, not all expense categories are equal: Meals &amp; Entertainment, for example, is generally deducted at 50%, while others are 100%. This is easy to do with <a href="">custom GDocs functions</a>. Next to my expenses (pulled automatically), I have a column for Month, a column for Quarter (using a custom function), and a column for Deduction, using the amount and the category. (To write a custom function, go to <em>Tools &gt; Scripts &gt; Script Editor</em>.)</p> <p>Finally, in my income sheet, I use <span class="geshifilter"><code class="text geshifilter-text">sumif()</code></span> on the range in the other [expenses] sheet with the calculated deductions for that quarter, times my expected tax rate, and I know how much quarterly taxes to pay!</p> <p><em>(Update: A revised version of this post now appears on the <a href="">FreshBooks Developer Blog</a>.)</em></p> freshbooks gdocs php Thu, 24 Mar 2011 15:34:15 +0000 ben 6955 at Google Docs tip: importRange function to pull in other spreadsheet data <p>A great Google Docs Spreadsheets function I've been using a lot lately is <a href=";answer=98757"><code><strong>importRange</strong></code></a>, for pulling in data from one spreadsheet into another. For example, I have one spreadsheet with my income, including calculating my taxes, and another spreadsheet with my budget, where I pull taxes in using <code>importRange</code>. 
Then I can fill out each sheet and not worry about the other.</p> <p>The syntax as described in the documentation is:</p> <blockquote><p><code>=importrange("abcd123abcd123", "sheet1!A1:C10")</code><br /> "abcd123abcd123" is the value in the "key=" attribute on the URL of the target spreadsheet and "sheet1!A1:C10" is the range which is desired to be imported.</p></blockquote> <p>A few notes/lessons I've learned:</p> <ul> <li>Make sure to use double quotes not single quotes, or it'll give a big "ERROR".</li> <li>The values update when you refresh the spreadsheet, and possibly on some timers, but if you want to update on the fly (say, with both sheets open), go to the start of the <code>importRange</code> block and press Ctrl+R (for Refresh).</li> </ul> <p>Another great function I've been using is <a href=";answer=155266"><code>sumif</code></a> (i.e. a conditional sum) - very useful for financial spreadsheets.</p> gdocs Thu, 24 Mar 2011 00:58:00 +0000 ben 6949 at Switching PHP extension to <p>I was having some <a href="">caching issues</a> earlier that I concluded were memcache-related. The memcache terminology is confusing: 'memcache' is the colloquial name, 'memcached' is the daemon, and PHP has both memcache and memcached extensions. The <a href="">memcache module for Drupal</a> supports both, but recommends the memcached version. I was running the other one, so I decided to switch to see if that would fix my problems.</p> <p>The swap was harder than I expected, so here's how I did it, in case anyone else wants to do the same. This assumes you already have the daemon and old memcache library working correctly.</p> <p>First try the simple method. This didn't work for me because I didn't have <a href="">libmemcached</a> installed. 
If it works for you, you're lucky:</p> <p><span class="geshifilter"><code class="bash geshifilter-bash"><span class="kw2">sudo</span> pecl <span class="kw2">install</span> memcached</code></span><br /> (I specified the version, <span class="geshifilter"><code class="text geshifilter-text">memcached-1.0.2</code></span>, to make sure I got the latest stable release, but that number might change by the time you read this.)</p> <p>Anyway, that didn't work for me - I got an error, <span class="geshifilter"><code class="text geshifilter-text">Can't find libmemcached headers</code></span>. The <a href="">documentation</a> specifies a <span class="geshifilter"><code class="text geshifilter-text">--with-libmemcached-dir</code></span> parameter to handle this. But I didn't have the library installed anywhere, so I had to install it. (<em>Fully</em> install it, not just download it.)</p> <p>Here I use <span class="geshifilter"><code class="text geshifilter-text">/opt</code></span> to hold the files, use the <a href="">latest version</a> of libmemcached, and run as <span class="geshifilter"><code class="text geshifilter-text">root</code></span> (otherwise add <span class="geshifilter"><code class="text geshifilter-text">sudo</code></span> to each line, or at least to the <span class="geshifilter"><code class="text geshifilter-text">make install</code></span> step):</p> <div class="geshifilter"> <pre class="bash geshifilter-bash"><span class="kw3">cd</span> <span class="sy0">/</span>opt <span class="kw2">wget</span> http:<span class="sy0">//</span><span class="sy0">/</span>libmemcached<span class="sy0">/</span><span class="nu0">1.0</span><span class="sy0">/</span>0.40a<span class="sy0">/</span>+download<span class="sy0">/</span>libmemcached-<span class="nu0">0.40</span>.tar.gz <span class="kw2">tar</span> <span class="re5">-xzvf</span> libmemcached-<span class="nu0">0.40</span>.tar.gz <span class="kw3">cd</span> libmemcached-<span class="nu0">0.40</span> .<span class="sy0">/</span>configure <span class="kw2">make</span> <span class="kw2">make</span> <span class="kw2">install</span></pre></div> <p>Now try the simple method again: <span class="geshifilter"><code class="text geshifilter-text">sudo pecl install memcached</code></span>. If that still doesn't work, specify the directory manually:</p> <div class="geshifilter"> <pre class="bash geshifilter-bash"><span class="kw3">cd</span> <span class="sy0">/</span>opt pecl download memcached-1.0.2 <span class="kw2">tar</span> zxvf memcached-1.0.2.tgz <span class="kw3">cd</span> memcached-1.0.2 phpize .<span class="sy0">/</span>configure <span class="re5">--with-libmemcached-dir</span>=<span class="sy0">/</span>opt<span class="sy0">/</span>libmemcached-<span class="nu0">0.40</span><span class="sy0">/</span>libmemcached <span class="kw2">make</span> <span class="kw2">make</span> <span class="kw2">install</span></pre></div> <p>(Play around with the <span class="geshifilter"><code class="text geshifilter-text">configure</code></span> line there if it still fails. I tried 100 variations until I got it working - I think with <span class="geshifilter"><code class="text geshifilter-text">pecl install</code></span> after the full <span class="geshifilter"><code class="text geshifilter-text">make install</code></span> on libmemcached - but your results may vary.)</p> <p>If this worked, there should now be a <span class="geshifilter"><code class="text geshifilter-text">memcached.so</code></span> file in your PHP extensions directory.</p> <p>Now for the PHP config: the documentation on memcached's runtime configuration is sparse. The Drupal module recommends setting <span class="geshifilter"><code class="text geshifilter-text">memcache.hash_strategy=&quot;consistent&quot;</code></span>; however, I'm not sure if this has any effect on the memcached extension. In my setup there was a conf.d/memcache.ini file, symlinked to cli/conf.d (for command line config) and apache2/conf.d.
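The post doesn't reproduce the ini file itself, but a minimal sketch of what such a conf.d file might contain after the switch is below - the path, the filename, and whether <code>memcache.hash_strategy</code> even applies to the memcached extension are assumptions that vary by distro and extension version:

```ini
; conf.d/memcached.ini - illustrative sketch, not the author's actual file
extension=memcached.so
; recommended by the Drupal memcache module; possibly only meaningful
; for the older memcache extension rather than memcached:
memcache.hash_strategy="consistent"
```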
I changed the extension call to the new <span class="geshifilter"><code class="text geshifilter-text">memcached.so</code></span> file, removed extraneous configs that didn't seem to be documented anywhere, and set the <span class="geshifilter"><code class="text geshifilter-text">hash_strategy</code></span> for good measure. Then I checked the config with <span class="geshifilter"><code class="text geshifilter-text">apache2ctl configtest</code></span> (this will differ by distro); that checked out, so I restarted Apache. phpinfo() showed the new extension, my caching problem went away, and all seems well so far.</p> memcache php Mon, 14 Mar 2011 00:56:22 +0000 ben 6919 at Mac tip: pipe text from the terminal to the clipboard <p>Via <a href="">Fantastip</a> (from a Google search): if you're in the Mac terminal and want to <a href="">pipe</a> output to the OS X clipboard, use <span class="geshifilter"><code class="bash geshifilter-bash">pbcopy</code></span>, e.g.<br /> <span class="geshifilter"><code class="bash geshifilter-bash"><span class="kw3">echo</span> <span class="st0">&quot;copyme&quot;</span> <span class="sy0">|</span> pbcopy</code></span></p> mac Sat, 26 Feb 2011 18:24:22 +0000 ben 6873 at on Git: how to preserve GitHub repositories for existing modules <p>I've been keeping all my code on <a href="">GitHub</a>, waiting for <a href="">drupal.org</a> to move from CVS to Git. Well, <a href="">it just did</a>! No more bitching about the lousy CVS process - time to start maintaining my module releases again.</p> <p>I want to keep my GitHub repositories' histories, of course. (Keeping the actual code on GitHub isn't critical; it's the commits/branches/tags that are important.) The new repositories on d.o were copied from the old CVS repositories, which contained (because I was lazy / holding out for Git) only final releases at best, or nothing at all, so I want the GitHub history to take precedence.</p> <p><a href="">Git</a> is "distributed," meaning the full repository is cloned everywhere it's used, including multiple "remotes."
(This is unlike CVS or SVN, where there is a single remote server, and local "checkouts.") Anyway, this isn't the place for a Git tutorial (see the great [free] <a href="">Pro-Git book</a> to learn Git and some of my past <a href="">Git tricks</a>). I had assumed when d.o moved to Git, I could simply add a new remote and it would work automagically. It almost does, but needs a little extra work.</p> <p>There is some official d.o documentation for <a href="">Copying your repository onto drupal.org from an existing repository</a>. It clones the GitHub repo as a "mirror" and pushes the merged repo to d.o. This didn't work for me, for some reason. Maybe it'll work for you - try it first and see if it does. Here's what worked for me instead.</p> <p>This is all done through the terminal, from the directory where I've cloned my existing GitHub repository. My remote name for GitHub is <em>origin</em> and the branch is <em>master</em> (the standard convention). I'm going to leave GitHub at <em>origin</em> and add drupal.org as <em>drupal</em>. To get the exact path to your repository, go to the Git Instructions tab on your project page. (I assume you've already agreed to the new TOS and set up SSH keys as it explains there.)</p> <p><em>Add a new remote to the existing (but different) d.o repository:</em><br /> <span class="geshifilter"><code class="text geshifilter-text">git remote add drupal</code></span></p> <p>(A digression: if you try at this point, <span class="geshifilter"><code class="text geshifilter-text">git push drupal master</code></span>, it'll throw an error - </p> <pre> ! [rejected] master -> master (non-fast-forward)</pre><p> - because the repository histories are different and can't be merged normally.)</p> <p>Instead, bring the git.d.o branch in alongside your existing one. (Note that the new 'drupal' remote has its own 'master' branch separate from the one we want.
Hence we're <span class="geshifilter"><code class="text geshifilter-text">fetch</code></span>ing and not <span class="geshifilter"><code class="text geshifilter-text">pull</code></span>ing.)<br /> <span class="geshifilter"><code class="text geshifilter-text">git fetch drupal</code></span></p> <p>Then merge, keeping the existing history ("ours") as the correct version:<br /> <span class="geshifilter"><code class="text geshifilter-text">git merge remotes/drupal/master --strategy=ours</code></span></p> <p>One problem at this point: the git.d.o migration made a good change - "Stripping CVS keywords" (like <span class="geshifilter"><code class="text geshifilter-text">$Id$</code></span>) - which is now gone, because we've dismissed the d.o history. So we get it back with a cherry-pick: <span class="geshifilter"><code class="text geshifilter-text">git log</code></span> shows the commit hash from the migration, so copy the start of the hash, and re-apply it:<br /> <span class="geshifilter"><code class="text geshifilter-text">git cherry-pick ####</code></span>.</p> <p>Check your code to make sure it's good... then<br /> <span class="geshifilter"><code class="text geshifilter-text">git push drupal master</code></span><br /> And it's all up!</p> <p>To create a tag for a new release (example):<br /> <span class="geshifilter"><code class="text geshifilter-text">git tag 6.x-1.0-alpha1</code></span><br /> <span class="geshifilter"><code class="text geshifilter-text">git push drupal 6.x-1.0-alpha1</code></span></p> <p>(Interested in any feedback saying why this is stupid / why the other approaches should have worked / why this is causing the d.o infrastructure horrible damage...
all I know is, it seemed to work for me.)</p> drupal git Fri, 25 Feb 2011 20:13:29 +0000 ben 6865 at Drupal as an Application Framework: Unofficially competing in the BostonPHP Framework Bakeoff <p>BostonPHP hosted a <a href="">PHP Framework Bake-Off</a> last night, a competition among four application frameworks: <a href="">CakePHP</a>, <a href="">Symfony</a>, <a href="">Zend</a>, and <a href="">CodeIgniter</a>. A developer coding in each framework was given 30 minutes to build a simple <a href="">job-posting app</a> (wireframes publicized the day before) in front of a live audience.</p> <p>I asked the organizer if I could enter the competition representing <a href="">Drupal</a>. He replied that Drupal was a Content Management System, not a framework, so it should compete against <a href="">Wordpress</a> and <a href="">Joomla</a>, not the above four. My opinion on the matter was and remains as follows:</p> <ol> <li>The differences between frameworks and robust CMSs are not well defined, and Drupal straddles the line between them.</li> <li>The test of whether a toolkit is a framework is whether the following question yields an affirmative answer: “Can I use this toolkit to build a given application?” For Drupal the answer is clearly yes, even for apps far more advanced than this one.</li> <li>The exclusion reflects a kind of coder-purist snobbery ("it's not a framework if you build any of it in a UI") and a lack of knowledge about Drupal's underlying code framework.</li> <li>In a fair fight, Drupal would either beat Wordpress hands-down building a complex app (because its APIs are far more robust) or fail to show its true colors with a simple blog-style site that better suits WP.</li> </ol> <p>Needless to say, I wasn't organizing the event, so Drupal was not included.</p> <p><strong>So I entered Drupal into the competition anyway.</strong> While the first developer (using CakePHP) coded for 30 minutes on the big screen, I built the app in my chair from the back of the auditorium,
starting with a clean Drupal 6 installation, recording my screen. Below is that recording, with narration added afterwards. (Glance at the <a href="">app wireframes</a> first to understand the task.)</p> <p>Worth noting:</p> <ul> <li>I used Drupal 6 because I know it best; if this were a production app, I would be using the newly released <a href="">Drupal 7</a>.</li> <li>I start, as you can see, with an empty directory on a Linux server and an Apache <a href="">virtualhost</a> already defined.</li> <li>I build a small custom module at the end just to show that code is obviously involved at anything beyond the basic level, but most of the setup is done in the UI.</li> </ul> <p><br/></p> <p><iframe src="" width="681" height="383" frameborder="0"></iframe></p> <p>One irony of the framework-vs-CMS argument is that what makes these frameworks appealing is precisely the automated helpers - be it scaffolding in Symfony, baking in CakePHP, raking in Rails, etc. - that all reduce the need for wheel-reinventing manual coding. After the tools do their thing, the frameworks require code, and Drupal requires (at the basic level) visual component building (followed, of course, by code as the app gets more custom/complex). Why is one approach more "framework"-y or app-y than the other? If I build a complex app in Drupal, and my time spent writing custom code outweighs the UI work (as it usually does), does that change the nature of the framework?</p> <p>Where the CMS nature of Drupal hits a wall in my view is in building apps that aren't compatible with Drupal's basic assumptions. It assumes the basic unit - a piece of "content" called a "node" - should have a title, body, author, and date, for example. If that most basic schema doesn't fit what you're trying to build, then you probably don't want to use Drupal.
But for many apps, it fits well enough, so Drupal deserves a spot on the list of application frameworks, to be weighed for its pros and cons on each project just like the rest.</p> boston drupal php Wed, 23 Feb 2011 16:20:29 +0000 ben 6848 at Launched: the new! <img style="float: right; margin: 0 0 10px 10px;" src="" alt="" /><p>After four months of great work, we recently launched the new <a href="">!</a></p><p><a href="">New Leaf Digital</a> was contracted by <a href="">ViewPoint Creative</a> to implement a website overhaul for <a href="">GlobalPost</a>, America's only primarily-global news organization. The site runs on Drupal 6, with 70,000 nodes, three million monthly hits (pre-relaunch), and was built with Drupal best practices throughout.</p> <p>We'll be publishing a series of blog posts about the site over the coming weeks.</p> drupal Wed, 23 Feb 2011 01:41:00 +0000 ben 6847 at Drupal: Re-Sync Content Taxonomy from core taxonomy <p>I'm working on a major Drupal (6) site migration now, and one of the components that needs to be automatically migrated is taxonomies, mapping old vocabularies and terms to new ones. Core taxonomy stores node terms in the <span class="geshifilter"><code class="text geshifilter-text">term_node</code></span> table, which is fairly easy to work with. However, two taxonomy supplements are making the process a little more complicated:</p> <p>First, we're using <a href=""><strong>Content Taxonomy</strong></a> to generate prettier/smarter taxonomy fields on node forms. The module allows partial vocabularies to be displayed, in nicer formats than the standard multi-select field. The problem with Content Taxonomy for the migration, however, is that it duplicates the term_node links into its own CCK tables. 
If a node is mapped to a term in term_node, but not in the Content Taxonomy table, then when the taxonomy list appears on the node form, the term isn't selected.</p> <p>Ideally, the module would have the ability to re-sync from term_node built in. There's an issue thread related to this - <a href=""><em>Keep core taxonomy &amp; CCK taxonomy synced</em></a> - but it's not resolved.</p> <p>So I wrote a <a href="">Drush</a> command to do this. To run it, rename "MODULE" to your custom module, <strong><em>back up your database</em></strong>, read the code so you understand what it does, and run <span class="geshifilter"><code class="text geshifilter-text">drush sync-content-taxonomy --verbose</code></span>.</p> <p style="font-weight: bold;">Warning: This code only works properly on <em>shared</em> CCK fields, that is, fields with their own tables (<span class="geshifilter"><code class="text geshifilter-text">content_field_X</code></span> tables, not a common <span class="geshifilter"><code class="text geshifilter-text">content_type_Y</code></span> table). Don't use this if your fields are only used by one content type.</p> <script src=""></script><p><span style="font-size:.8em">[<a href="">Embedded Gist - if it's not showing up, click here</a>.]</span></p> <p>The other taxonomy supplement that needs to be migrated is <a href="">Primary Term</a>. I'll be writing a similar Drush script for this in the next few days.</p> <p><em>Update 1/27:</em> There was a bug in the way it cleared the tables before rebuilding; it should be good now.
(Make sure to download the latest Gist.)</p> drupal drush taxonomy Fri, 21 Jan 2011 17:19:55 +0000 ben 6807 at Core hacks: multiple crons and ajax error handling <p>Prodded by the folks at the Boston Drupal meetup tonight, I submitted two patches to the Drupal core (6.19) issue queue that I use on my sites:</p> <p><a href="">Better AJAX/AHAH error handling in ahah.js</a></p> <p><a href="">Multiple crons</a></p> drupal Wed, 05 Jan 2011 01:09:25 +0000 ben 6786 at Adding Logic, Timestamp columns to Drupal's computed fields I just <a href="">posted some code</a> I'm working on to the issue thread of Drupal's <a href="">Computed Field</a> module, copying it here as well. (Background on the module: it allows a value to be "computed" when a node is saved, rather than entered manually like normal CCK fields.) <blockquote><p>I have a need to extend Computed Field with 2 new columns: "Logic" (where I store a serialized array of the steps in the calculation of the computed value) and Timestamp (where I track the timestamp of the computed field, separately from <code>$node-&gt;changed</code>, for saving-in-place updating.)<br> I'm not sure if this is of general interest to the module's users or maintainers, but I thought I'd share the code I've written so far (modifying computed_field), and see if anyone else has a use for this. So far it adds checkboxes to toggle logic and timestamp, creates the columns, and saves the data. The next step will be integrating the new columns properly with Views, so I can run a report showing the computed values, times they were computed (with filtering), and the logic. </p></blockquote> Patch (so far) is <a href="">here</a>. drupal Sun, 02 Jan 2011 06:04:14 +0000 ben 6784 at