
Tech Blog


Jul 8 '12 10:12am

Node.js and Arduino: Building the Nodeuino example

I've wanted to start building robots with an Arduino for years. I've had several fits and starts - building the examples in the Getting Started with Arduino book, starting to learn basic electronics from Make: Electronics. But I never put enough time into it to get much further.

I even included my electronics kit in the limited space we had moving to Argentina. But it mostly sat on a shelf and got dusty.

Until I started seeing all the buzz around Node.js and Arduino - nodeuino, node.js-powered robots at nodeconf, Rick Waldron's awesome johnny-five project… and realized I'm way too late to the party and need to get moving.

So I made it my weekend project yesterday and built the walkLED example in the Nodeuino documentation. It uses 6 LEDs and 2 switches; I only had 4 LEDs and a switch that doesn't fit on a breadboard, so I simplified it. Here's a little video showing it working. It pans from the circuit diagram in the nodeuino docs, to the circuit in the Fritzing circuit-diagramming app, to my circuit built on the breadboard, to the terminal showing the node.js app, and finally to the speed being controlled by the potentiometer.

None of this is original yet - it's all from the example circuit and code created by Sebastian Müller (github.com/semu). My task now is to understand that code and build on it. Then I want to build a circuit that reads some public Twitter feed and shows the number of new tweets on a digital LCD display.

Jun 20 '12 2:06pm

Understanding MapReduce in MongoDB, with Node.js, PHP (and Drupal)

(Note, some of the details in this post may be outdated. For example, MongoDB now supports Aggregation queries which simplify some of these use cases. However, the MapReduce concepts are probably still the same.)

MongoDB's query language is good at extracting whole documents or whole elements of a document, but on its own it can't pull specific items from deeply embedded arrays, or calculate relationships between data points, or calculate aggregates. To do that, MongoDB uses an implementation of the MapReduce methodology to iterate over the dataset and extract the desired data points. Unlike SQL joins in relational databases, which essentially create a massive combined dataset and then extract pieces of it, MapReduce iterates over each document in the set, "reducing" the data piecemeal to the desired results. The name was popularized by Google, which needed to scale beyond SQL to index the web. Imagine trying to build the data structure for Facebook, with near-instantaneous calculation of the significance of every friend's friend's friend's posts, with SQL, and you see why MapReduce makes sense.

I've been using MongoDB for two years, but only in the last few months started using MapReduce heavily. MongoDB is also introducing a new Aggregation framework in 2.1 that is supposed to simplify many operations that previously needed MapReduce. However, the latest stable release as of this writing is still 2.0.6, so Aggregation isn't officially ready for prime time (and I haven't used it yet).

This post is not meant to substitute for the copious documentation and examples you can find across the web. After reading those, it still took me some time to wrap my head around the concepts, so I want to try to explain them as I came to understand them.

The Steps

A MapReduce operation consists of a map, a reduce, and optionally a finalize function. Key to understanding MapReduce is understanding what each of these functions iterates over.

Map

First, map runs for every document retrieved in the initial query passed to the operation. If you have 1000 documents and pass an empty query object, it will run 1000 times.

Inside your map function, you emit a key-value pair, where the key is whatever you want to group by (_id, author, category, etc), and the value contains whatever pieces of the document you want to pass along. The function doesn't return anything, because you can emit multiple key-value pairs per map call, but a function can only return one result.

The purpose of map is to extract small pieces of data from each document. For example, if you're counting articles per author, you could emit the author as the key and the number 1 as the value, to be summed in the next step.

Reduce

The reduce function then receives these key-value pairs: for each key emitted from map, it gets the key along with that key's values collected into an array. Its purpose is to reduce multiple values-per-key to a single value-per-key. At the end of each iteration of your reduce function, you return (not emit this time) a single variable.

The number of times reduce runs for a given operation isn't easy to predict. (I asked about it on Stack Overflow and the consensus so far is, there's no simple formula.) Essentially reduce runs as many times as it needs to, until each key appears only once. If you emit each key only once, reduce never runs. If you emit most keys once but one special key twice, reduce will run once, getting (special key, [ value, value ]).

A rule of thumb with reduce is that the returned value's structure has to be the same as the structure emitted from map. If you emit an object as the value from map, every key in that object has to be present in the object returned from reduce, and vice-versa. If you return an integer from map, return an integer from reduce, and so on. The basic reason is that (as noted above), reduce shouldn't be necessary if a key only appears once. The results of an entire map-reduce operation, run back through the same operation, should return the same results (that way huge operations can be sharded and map/reduced many times). And the output of any given reduce function, plugged back into reduce (as a single-item array), needs to return the same value as went in. (In CS lingo, reduce has to be idempotent. The documentation explains this in more technical detail.)

Here's a simple JS test, using Node.js' assertion API, to verify this. To use it, have your mapReduce module export its map, reduce, and (optionally) finalize functions so a separate test script can import and test them:

// this should export the map, reduce, [finalize] functions passed to MongoDB.
var mr = require('./mapreduce-query');
var assert = require('assert');

// dummy documents to run map over - can be fake or loaded from the DB
var dummyItems = [ /* ... */ ];

// override emit() to capture locally
// (in global scope so map can access it)
var emitted = [];
global.emit = function(key, val) {
  emitted.push({ key: key, value: val });
};

// reduce input should be same as output for a single object
mr.map.call(dummyItems[0]);

var reduceRes = mr.reduce(emitted[0].key, [ emitted[0].value ]);
assert.deepEqual(reduceRes, emitted[0].value, 'reduce is idempotent');

A simple MapReduce example is to count the number of posts per author. So in map you could emit('author name', 1) for each document, then in reduce loop over the values and add them to a total. Make sure reduce adds the actual numbers in the values, not just 1 per value, because that wouldn't be idempotent. Similarly, you can't just return values.length and assume each value represents 1 document.
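
Here's a minimal sketch of that counting example as the two JavaScript functions passed to MongoDB (a posts collection with an author field is assumed for illustration):

var map = function() {
  emit(this.author, 1);           // key = author, value = 1 per document
};

var reduce = function(key, values) {
  var total = 0;
  values.forEach(function(v) {
    total += v;                   // add the actual values (which may already be partial sums)
  });
  return total;                   // same structure (a number) as emitted from map
};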

Finalize

Now you have a single reduced value per key, which gets run through the finalize function once per key.

To understand finalize, consider that this is essentially the same as not having a finalize function at all:

var finalize = function(key, value) {
  return value;
}

finalize is not necessary in every MapReduce operation, but it's very useful, for example, for calculating averages. You can't calculate the average in reduce because it can run multiple times per key, so each iteration doesn't have enough data to calculate with.
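
For instance, here's a minimal average-per-key sketch (the author and commentCount fields are made up for illustration): emit a running sum and count, combine them in reduce, and only divide in finalize, where you know you have the final totals:

var map = function() {
  emit(this.author, { sum: this.commentCount, count: 1 });
};

var reduce = function(key, values) {
  var out = { sum: 0, count: 0 };
  values.forEach(function(v) {
    out.sum += v.sum;     // same object structure in and out, so reduce can run any number of times
    out.count += v.count;
  });
  return out;
};

var finalize = function(key, value) {
  value.avg = value.sum / value.count;   // safe here: finalize runs exactly once per key
  return value;
};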

The final results returned from the operation will have one value per key, as returned from finalize if it exists, or from reduce if finalize doesn't exist.

MapReduce in PHP and Drupal

The MongoDB library for PHP does not include any special functions for MapReduce; the operations can be run as a generic database command, but that takes a lot of code. I found a MongoDB-MapReduce-PHP library on Github which makes it easier. It works, but hasn't been updated in two years, so I forked the library and created my own version with what I think are some improvements.

The original library by infynyxx created an abstract class XMongoCollection that was meant to be sub-classed for every collection. I found it more useful to make XMongoCollection directly instantiable, as an extended replacement for the basic MongoCollection class. I added a mapReduceData method which returns the data from the MapReduce operation. For my Drupal application, I added a mapReduceDrupal method which wraps the results and error handling in Drupal API functions.

I could then load every collection with XMongoCollection and run mapReduce operations on it directly, like any other query. Note that the actual functions passed to MongoDB are still written in Javascript. For example:

// (this should be statically cached in a separate function)
$mongo = new Mongo($server_name);      // connection
$mongodb = $mongo->selectDB($db_name); // MongoDB instance
 
// use the new XMongoCollection class. make it available with an __autoloader.
$collection = new XMongoCollection($mongodb, $collection_name);
 
$map = <<<MAP
  function() {
    // doc is 'this'
    emit(this.category, 1);
  }
MAP;
 
$reduce = <<<REDUCE
  function(key, vals) {
    // `variable` is available here, passed in via setScope() below
    var total = 0;
    vals.forEach(function(v) { total += v; });
    return total;
  }
REDUCE;
 
$mr = new MongoMapReduce($map, $reduce, array( /* limit initial document set with a query here */ ));
 
// optionally pass variables to the functions. (e.g. to apply user-specified filters)
$mr->setScope(array('variable' => $variable));
 
// 2nd param becomes the temporary collection name, so tmp_mapreduce_example. 
// (This is a little messy and could be improved. Stated limitation of v1.8+ not supporting "inline" results is not entirely clear.)
// 3rd param is $collapse_value, see code
$result = $collection->mapReduceData($mr, 'example', FALSE);

MapReduce in Node.js

The MongoDB-Native driver for Node.js, now an official 10Gen-sponsored project, includes a collection.mapReduce() method. The syntax is like this:

 
var db = new mongodb.Db(dbName, new mongodb.Server(mongoHost, mongoPort, {}));
db.open(function(error, dbClient) {
  if (error) throw error;  
  dbClient.collection(collectionName, function(err, collection) {
    collection.mapReduce(map, reduce, { 
        out : { inline : 1 },
        query: { ... },     // limit the initial set (optional)
        finalize: finalize,  // function (optional)
        verbose: true        // include stats
      },
      function(error, results, stats) {   // stats provided by verbose
        // ...
      }
    );
  });
});

It's mostly similar to the command-line syntax, except in the CLI, the results are returned from the mapReduce function, while in Node.js they are passed (asynchronously) to the callback.
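
For comparison, a rough sketch of the shell version (assuming map, reduce, and finalize are already defined in the shell session, and a books collection as in the Mongoose example below):

var res = db.books.mapReduce(map, reduce, {
  out: { inline: 1 },
  query: {},            // optional: limit the initial set
  finalize: finalize    // optional
});
printjson(res.results); // results come back directly, no callback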

MapReduce in Mongoose

Mongoose is a modeling layer on top of the MongoDB-native Node.js driver, and in the latest 2.x release does not have its own support for MapReduce. (It's supposed to be coming in 3.x.) But the underlying collection is still available:

var db = mongoose.connect('mongodb://dbHost/dbName');
// (db.connection.db is the native MongoDB driver)
 
// build a model (`Book` is a schema object)
// model is called 'Book' but collection is 'books'
mongoose.model('Book', Book, 'books');
 
...
 
var Book = db.model('Book');
Book.collection.mapReduce(...);

(I actually think this is a case of Mongoose being better without its own abstraction on top of the existing driver, so I hope the new release doesn't make it more complex.)

In sum

I initially found MapReduce very confusing, so hopefully this helps clarify rather than increase the confusion. Please write in the comments below if I've misstated or mixed up anything above.

May 24 '12 1:10pm

Quick tip: Share a large MongoDB query object between the CLI and Node.js

I was writing a very long MongoDB query in JSON that needed to work both in a Mongo CLI script and in a Node.js app. Duplicating the JSON for the query across both raised the risk of one changing without the other. So I dug around for a way to share them and came up with this:

Create a query.js file, like so (the query is just an example, substitute your own):

// dual-purpose, include me in mongo cli or node.js
var module = module || {};    // filler for mongo CLI
 
// query here is for mongo cli. module.exports is for node.js
var query = module.exports = {
  '$and' : [
    { '$or' : [ 
      { someField: { '$exists': false } }, 
      { someOtherField: 0 } 
    ] },
 
    { '$or' : [ 
      { 'endDate' : { '$lt' : new Date() } },
      { 'endDate' : { '$exists' : false } }
    ] }
 
    // ...  
  ]
};

Then in your mongo CLI script, use

load('./query.js');   // sets query var
 
db.items.find(
    query,
 
    {
      // fields to select ...
    }
  )
  .sort({
    // etc...
  });

In your Node.js app, use var query = require('./query.js'); and plug the same query into your MongoDB driver or Mongoose find function. Duplication avoided!
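
For example, on the Node.js side it might look roughly like this (the Item Mongoose model is hypothetical):

// Node.js: reuse the same query object
var query = require('./query.js');

// `Item` is a previously-defined Mongoose model (hypothetical)
Item.find(query, function(err, items) {
  if (err) throw err;
  console.log('%d items matched', items.length);
});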

May 22 '12 9:58pm

Using Node.js to connect with the eBay APIs on AntiquesNearMe.com

We recently rolled out a new space on Antiques Near Me called ANM Picks, which uses eBay's APIs to programmatically find high-quality antique auctions using the same metrics that Sean (my antique-dealer business partner) uses in his own business. We'll cover the product promotion elsewhere, but I want to write here about how it works under the hood.

The first iteration a few weeks ago simply involved an RSS feed being piped into the sidebar of the site. The site is primarily a Drupal 6 application, and Drupal has tools for handling feeds, but they're very heavy: They make everything a "node" (Drupal content item), and all external requests have to be run in series using PHP cron scripts on top of a memory-intensive Drupal process - i.e. they're good if you want to pull external content permanently into your CMS, but aren't suited for the kind of ephemeral, 3rd-party data that eBay serves. So I built a Node.js app that loaded the feed periodically, reformatted the items, and served them via Ajax onto the site.

The RSS feed gave us very limited control over the filters, however. To really curate the data intelligently, I needed to use eBay's APIs. They have dozens of APIs of varying ages and protocols. Fortunately for our purposes, all the services we needed spoke JSON. No single API gave us all the dimensions we needed (that would be too easy!), so the app evolved to make many requests to several different APIs and then cross-reference the results.

Node.js is great for working with 3rd-party web services because its IO is all asynchronous. Using async, my favorite Node.js flow-control module, the app can run dozens of requests in parallel or in a more finely-tuned queue. (Node.js obviously didn't invent asynchronous flow control - Java and other languages can spawn off threads to do this - but I doubt they do it as easily from a code perspective.)
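
As a rough sketch of that pattern with async (the service names and delays here are stand-ins, not the real eBay calls):

var async = require('async');

// Placeholder tasks standing in for requests to different eBay services;
// each would normally make an HTTP call and pass its parsed JSON to the callback.
function fakeServiceCall(name, delay) {
  return function(callback) {
    setTimeout(function() { callback(null, { service: name }); }, delay);
  };
}

// Run the lookups in parallel and get all the results together in one callback.
async.parallel({
  finding: fakeServiceCall('FindingService', 100),
  shopping: fakeServiceCall('ShoppingService', 150)
}, function(err, results) {
  if (err) throw err;
  console.log(results);   // { finding: {...}, shopping: {...} } - ready to cross-reference
});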

To work with the eBay APIs, I wrote a client library which I published to npm (npm install ebay-api). Code is on Github. (Ironically, someone else published another eBay module as I was working on mine, independently; they're both incomplete in different ways, so maybe they'll converge eventually. I like that in the Node.js ecosystem, unlike in Drupal's, two nutcrackers are considered better than one.) The module includes an ebayApiGetRequest method for a single request, paginateGetRequest for a multi-page request, a parser for two of the services, a flatten method that simplifies the data structure (to more easily query the results with MongoDB), and some other helper functions. There are examples in the code as well.

Back to my app: Once the data is mashed up and filtered, it's saved into MongoDB (using Mongoose for basic schema validation, but otherwise as free-form JSON, which Mongo is perfect for). A subset is then loaded into memory for the sidebar Favorites and ANM Picks. (The database is only accessed for an individual item if it's no longer a favorite.) All this is frequently reloaded to fetch new items and flush out old ones (under the eBay TOS, we're not allowed to show anything that has passed).

The app runs on a different port from the website, so to pipe it through cleanly, I'm using Varnish as a proxy. Varnish is already running in front of the Drupal site, so I added another backend in the VCL and activated it based on a subdomain. Oddly, trying to toggle by URL (via req.url) didn't work - it would intermittently load the wrong backend without a clear pattern - so the subdomain (via req.http.host) was second best. It was still necessary to deal with Allow-Origin issues, but at least the URL scheme looked better, and the cache splits the load.

Having the data in MongoDB means we can multiply the visibility (and affiliate-marketing revenue) through social networks. Each item now has a Facebook Like widget which points back to a unique URL on our site (proxied through Drupal, with details visible until it has passed). The client-side JS subscribes to the widgets' events and pings our app, so we can track which items and categories are the most popular, and (by also tracking clicks) make sure eBay is being honest. We're tuning the algorithm to show only high-quality auctions, so the better it does, the more (we hope) they'll be shared organically.

Comments? Questions? Interested in using the eBay APIs with Node.js? Feel free to email me or comment below.

May 22 '12 3:55pm

JSConf Argentina

Over on my travel blog I wrote about attending JSConf Argentina this past weekend. They flew in a dozen Node.js luminaries, and I connected with the Javascript community here in Buenos Aires, as well as with people from San Francisco (where I'll probably be moving later this year). It was awesome; check out the post (with photos of the very cool venue).
Apr 29 '12 10:06am

Liberate your Drupal data for a service-oriented architecture (using Redis, Node.js, and MongoDB)

Drupal's basic content unit is a "node," and to build a single node (or to perform any other Drupal activity), the codebase has to be bootstrapped, and everything needed to respond to the request (configuration, database and cache connections, etc) has to be initialized and loaded into memory from scratch. Then node_load runs through the NodeAPI hooks, multiple database queries are run, and the node is built into a single PHP object.

This is fine if your web application runs entirely through Drupal, and always will, but what if you want to move toward a more flexible Service-oriented architecture (SOA), and share your content (and users) with other applications? For example, build a mobile app with a Node.js backend like LinkedIn did; or calculate analytics for business intelligence; or have customer service reps talk to your customers in real-time; or integrate with a ticketing system; or do anything else that doesn't play to Drupal's content-publishing strengths. Maybe you just want to make your data (which is the core of your business, not the server stack) technology-agnostic. Maybe you want to migrate a legacy Drupal application to a different system, but the cost of refactoring all the business logic is prohibitive; with an SOA you could change the calculation and get the best of both worlds.

The traditional way of doing this was setting up a web service in Drupal using something like the Services module. External applications could request data over HTTP, and Drupal would respond in JSON. Each request has to wait for Drupal to bootstrap, which uses a lot of memory (every enterprise Drupal site I've ever seen has been bogged down by legacy code that runs on every request), so it's slow and doesn't scale well. Rather than relieving some load from Drupal's LAMP stack by building a separate application, you're just adding more load to both apps. To spread the load, you have to keep adding PHP/Apache/Mysql instances horizontally. Every module added to Drupal compounds the latency of Drupal's hook architecture (running thousands of function_exists calls for example), so the stakeholders involved in changing the Drupal app have to include the users of every secondary application requesting the data. With a Drupal-Services approach, other apps will always be second-class citizens, dependent on the legacy system, not allowing the "loose coupling" principle of SOA.

I've been shifting my own work from Drupal to Node.js over the last year, but I still have large Drupal applications (such as Antiques Near Me) which can't be easily moved away, and frankly don't need to be for most use cases. Overall, I tend to think of Drupal as a legacy system, burdened by too much cruft and inconsistent architecture, and no longer the best platform for most applications. I've been giving a lot of thought to ways to keep these apps future-proof without rebuilding all the parts that work well as-is.

That led me to build what I've called the "Drupal Liberator". It consists of a Drupal module and a Node.js app, and uses Redis (a very fast key-value store) for a middleman queue and MongoDB for the final storage. Here's how it works:

  • When a node (or user, or other entity type) is saved in Drupal, the module encodes it to JSON (a cross-platform format that's also native to Node.js and MongoDB), and puts it, along with metadata (an md5 checksum of the JSON, timestamp, etc), into a Redis hash (a simple key-value object, containing the metadata and the object as a JSON string). It also notifies a Redis pub/sub channel of the new hash key. (This uses 13KB of additional memory and 2ms of time for Drupal on the first node, and 1KB/1ms for subsequent node saves on the same request. If Redis is down, Drupal goes on as usual.)

  • The Node.js app, running completely independently of Drupal, is listening to the pub/sub channel. When it's pinged with a hash key, it retrieves the hash, JSON.parse's the string into a native object, possibly alters it a little (e.g., adding the checksum and timestamp into the object), and saves it into MongoDB (which also speaks JSON natively). The data type (node, user, etc) and other information in the metadata directs where it's saved. Under normal conditions, this whole process from node_save to MongoDB takes less than a second. If it were to bottleneck at some point in the flow, the Node.js app runs asynchronously, not blocking or straining Drupal in any way.

  • For redundancy, the Node.js app also polls the hash namespace every few minutes. If any part of the mechanism breaks at any time, or to catch up when first installing it, the timestamp and checksum stored in each saved object allow the two systems to easily find the last synchronized item and continue synchronizing from there.

The result is a read-only clone of the data, synchronized almost instantaneously with MongoDB. Individual nodes can be loaded without bootstrapping Drupal (or touching Apache-MySql-PHP at all), as fully-built objects. New apps utilizing the data can be built in any framework or language. The whole Drupal site could go down and the data needed for the other applications would still be usable. Complex queries (for node retrieval or aggregate statistics) that would otherwise require enormous SQL joins can be built using MapReduce and run without affecting the Drupal database.
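
The Liberator's code isn't released, but the listener side of that pattern looks roughly like this (the channel, hash field, and database names here are invented for illustration, and error handling is minimal):

// Rough illustration of the listener pattern, not the actual Liberator code.
var redis = require('redis');
var mongodb = require('mongodb');

var sub = redis.createClient();      // subscriber connection (can't run other commands)
var store = redis.createClient();    // second connection for regular commands

var db = new mongodb.Db('liberated', new mongodb.Server('localhost', 27017, {}));
db.open(function(err, dbClient) {
  if (err) throw err;

  sub.subscribe('drupal:saved');     // channel the Drupal module pings (invented name)

  sub.on('message', function(channel, hashKey) {
    store.hgetall(hashKey, function(err, hash) {
      if (err || !hash) return;
      var doc = JSON.parse(hash.data);       // the entity as a JSON string
      doc._checksum = hash.checksum;         // carry the metadata along
      doc._timestamp = hash.timestamp;
      dbClient.collection(hash.type, function(err, collection) {  // e.g. 'node' or 'user'
        if (err) return;
        collection.insert(doc, { safe: true }, function() { /* saved */ });
      });
    });
  });
});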

One example of a simple use case this enables: Utilize the CMS backend to edit your content, but publish it using a thin MongoDB layer and client-side templates. (And outsource comments and other user-write interactions to a service like Disqus.) Suddenly your content displays much faster and under higher traffic with less server capacity, and you don't have to worry about Varnish or your Drupal site being "Slashdotted".

A few caveats worth mentioning: First, it's read-only. If a separate app wants to modify the data in any way (and maintain data integrity across systems), it has to communicate with Drupal, or a synchronization bridge has to be built in the other direction. (This could be the logical next step in developing this approach, and truly make Drupal a co-equal player in an SOA.)

Second, you could have Drupal write to MongoDB directly and cut out the middlemen. (And indeed that might make more sense in a lot of cases.) But I built this with the premise of an already strained Drupal site, where adding another database connection would slow it down even further. This aims to put as little additional load on Drupal as possible, with the "Liberator" acting itself as an independent app.

Third, if all you needed was instant node retrieval - for example, if your app could query MySql for node ID's, but didn't want to bootstrap Drupal to build the node objects - you could leave them in Redis and take Node.js and MongoDB out of the picture.

I've just started exploring the potential of where this can go, so I've run this mostly as a proof-of-concept so far (successfully). I'm also not releasing the code at this stage: If you want to adopt this approach to evolve your Drupal system to a service-oriented architecture, I am available as a consultant to help you do so. I've started building separate apps in Node.js that tie into Drupal sites with Ajax and found the speed and flexibility very liberating. There's also a world of non-Drupal developers who can help you leverage your data, if it could be easily liberated. I see this as opening a whole new set of doors for where legacy Drupal sites can go.

Apr 13 '12 2:44pm

Cracking the cross-domain/Allow-Origin nut

Update/preface: None of this is relevant in IE, because IE doesn't respect cross-origin rules. You have to use JSONP for IE (so you might as well use JSONP everywhere.)

The other day, I was setting up an Ajax feed loader - a node.js app pulling an RSS feed every few minutes and parsing it, and a Drupal site requesting a block of HTML from the node.js app via jQuery-ajax - and ran into a brick wall of cross-domain/origin (aka Cross-Origin Resource Sharing, or CORS) issues. This occurs any time you ajax-load something on a different subdomain or port from the main page it's loading into. (Even if you pipe the feed through the primary domain, using Varnish for example, if you use a separate hostname for your development site, or a local server, it'll break on those.)

In theory, it should be very simple to add an Access-Control-Allow-Origin header to the source app - the node.js app in this case - and bypass the restriction. In practice, it's not nearly so easy.

To get at the root of the problem and eliminate quirks in the particular app I was building, I set up 2 local virtualhosts with apache, and tried every combination until it worked.

Here are some problems I ran into, and solutions, to save the next person with this issue some time:

  • Access-Control-Allow-Origin is supposed to allow multiple domains to be set - as in http://sub1.domain.com http://sub2.domain.com - but no combination of these (separating with spaces, commas, or comma+space) actually worked. The solution to this is either allow all domains with * or dynamically set the domain to the origin on a per request basis. (In a typical node.js HTTP server for example, that's found at req.headers.origin - but that only exists if it's called via Ajax in another page.) The latter solution is fine when the source domain is always known, or every request hits the backend, but can be problematic if you're trying to run it on multiple endpoints, or through Varnish.
  • Chrome seems to have some bugs handling these situations, producing inconsistent results with the same environment.
  • The minimal working solution in the Apache experiment turned out to require, besides a valid Access-Control-Allow-Origin, this one header: Access-Control-Allow-Headers: X-Requested-With. (Apparently that's used only by Ajax/XmlHttpRequest requests, and without the server explicitly allowing that request header, the request fails.)
  • Before making the GET request for the content itself, some browsers make an OPTIONS request to verify the cross-domain permissions. Several other people running into these problems recommended including this header: Access-Control-Allow-Methods: OPTIONS, GET, POST. In the Apache experiment it wasn't necessary, but I put it in the final node.js app and it can't hurt.
  • Also from other people's recommendations, a more verbose version of Access-Control-Allow-Headers is possible, if not all necessary: Access-Control-Allow-Headers: Content-Type, Depth, User-Agent, X-File-Size, X-Requested-With, If-Modified-Since, X-File-Name, Cache-Control

Taking the lessons from the Apache experiment back to the node.js app, I used this code. It's written as an express middleware function (make sure to run it before app.router or any individual routes). The _ character refers to the underscore library.

var _ = require('underscore');   // the underscore library mentioned above

app.use(function(req, res, next) {
  var headers = {
    'Cache-Control' : 'max-age=120'   // cache for 2m (in varnish and client)
  };
 
  // allowed origin?
  if (!_.isUndefined(req.headers.origin)) {
    // validate (primary, secondary, local-dev)
    if (req.headers.origin.match(/domain\.com/) 
    || req.headers.origin.match(/secondary\.domain\.com/) 
    || req.headers.origin.match(/domain\.local/)) {
      headers = _.extend(headers, {
        'Access-Control-Allow-Origin': req.headers.origin
      , 'Access-Control-Allow-Methods': 'GET, POST, OPTIONS'
      , 'Access-Control-Allow-Headers': 'Content-Type, X-Requested-With, X-PINGOTHER'
      , 'Access-Control-Max-Age': 86400   // 1 day
      });
    }
  }
 
  _.each(headers, function(value, key) {
    res.setHeader(key, value);
  });
 
  next();
});
Apr 4 '12 9:34am

New Varnish "Book"

I just found out about this so I thought I'd pass it along: It appears the creators of Varnish have created a for-profit venture "Varnish Software" to market Varnish as a service, and as part of that effort they wrote an e-book, The Varnish Book. One of the challenges with Varnish has always been the lack of thorough "how-to" documentation, so this is good news. (See my December post, Making sense of Varnish caching rules.)
Jan 30 '12 7:48pm

Reducing coordinate density in KML files (with a node.js script)

Lately I've been tracking my bicycle rides with an Android GPS tracking app. The app exports to multiple formats, including Google Maps and KML. I wanted to take all my rides for a week and overlay them on a single map, but the coordinate density was too high - thousands of points for each ride - so GMaps was paginating the maps to reduce the density, and the overlays didn't work.

So I needed some way to reduce the coordinate density of the KML files, taking 1 out of every N points, so the route looked the same on the map but with less unnecessary detail.

I tried to find an existing tool to do this, but couldn't find one. So I started writing one as a node.js script (I'm trying to do everything platform-neutral these days in node). First I tried to actually parse the KML using various XML parsers - but the parsers stripped the line breaks between coordinates, so the format broke, and I realized I didn't really need to parse the format at all. I just needed to eliminate some of the lines.

The result is a very simple, functional KML Coordinate Density Reducer. It reads each line of a KML file, uses regex to determine whether it's a coordinate line, and if it is, strips all but every Nth coordinate line, as specified in the shell parameters.
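
The actual script is linked above; a minimal sketch of that approach (with a deliberately crude coordinate regex) might look like this:

// reduce-kml.js - minimal sketch of the approach, not the actual script
// usage: node reduce-kml.js input.kml 10 > output.kml   (keeps every 10th coordinate)
var fs = require('fs');

var file = process.argv[2];
var keepEvery = parseInt(process.argv[3], 10) || 10;

var coordCount = 0;
fs.readFileSync(file, 'utf8').split('\n').forEach(function(line) {
  // crude check: a coordinate line looks like "lon,lat[,alt]" and nothing else
  if (/^\s*-?\d+\.\d+,-?\d+\.\d+(,-?\d+(\.\d+)?)?\s*$/.test(line)) {
    if (coordCount++ % keepEvery === 0) {
      process.stdout.write(line + '\n');   // keep every Nth coordinate line
    }
  } else {
    process.stdout.write(line + '\n');     // pass everything else through untouched
  }
});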

Using the script, I reduced each route from thousands of points to a few hundred, imported them all into a single map, and can view or embed the routes all at once.

Update: Someone wrote an adaptation that also checks the distance between points. (See tweet.)

Jan 30 '12 11:39am

Git trick: Cleanup end-of-line changes

I was working on a site recently that was moved from one VPS hoster to another, and the ops team that moved it somehow introduced an end-of-line change to every file. I didn't want to commit these junk changes, so I wrote this little snippet to clean them. For each modified file (not new ones), it counts the number of changed lines, excluding EOL changes, and if there are none, it checks out a clean copy of the file:

git status --porcelain | grep "^ M" | awk '{ print $2 }' | while read FILE; do
  LINES=`git diff --ignore-space-at-eol "$FILE" | wc -l`;
  if [[ "$LINES" -eq "0" ]]; then
    git checkout "$FILE"; 
    echo "$FILE"; 
  fi;
done
Jan 17 '12 10:00am

Why Node.js? Why clients should ask for it, and developers should build with it

Following my post about my new node.js apps, and since I would like to turn New Leaf Digital into a node.js shop, I should write a little about why you, perhaps a potential client with a web project, might want to have it built in node.

Node.js is only two years old, but already sustains a vast ecosystem of add-on modules, tutorials, and meetups. The energy in the community is palpable and is based on strong fundamentals. Working in Node brings out the best parts of web development. Node is built in javascript, a language every developer knows (at least a little bit), so the learning curve is not a deterrent. That's important to consider as a client because, unlike other systems that have peaked in their appeal to developers, you can build a Node.js application today and know its platform will be supported for the long haul.

Node is truly lightweight: Unlike bloated Swiss army knife frameworks that try to solve every problem out of the box at the expense of performance and comprehension, a Node app starts as a completely blank slate and is only as complex as you make it. So you'll get more bang for your server capacity buck from the get-go. (I've worked on several Drupal projects involving performance, getting each page to load faster by eliminating cruft and bottlenecks. In Node that whole way of thinking is flipped on its head.) Every tiny operation of your app is also light: the whole system is built on a philosophy of "asynchronous" input/output. Think of a node app as a juggler: while each ball is arcing through the air, it's catching and throwing other balls. Interactions don't "block" the flow like a traditional web application. So you don't run out of capacity until you're really out of capacity, and a bottleneck in one part of the system won't bring down the rest of it.

This asynchronous I/O also makes node.js especially suited to applications involving file handling, interaction with external web services (as in Flashcards), or real-time user interaction (as in Interactive Lists). These are much harder to scale on traditional platforms, because the operations make other processes wait around while they're off doing their work.

Node.js is also perfectly positioned to work with new database technologies, like MongoDB, which offer a flexibility not available with traditional SQL/relational databases. Node.js and MongoDB both "speak" the same language natively - javascript - so building or working with JSON APIs is easy. Architectures can be "rapidly prototyped" and changed on the fly as the application concept evolves.

So what is node.js not good for? If you want a robust content management system out of the box for a news publication, for example, you probably want to stick with a platform like Drupal. If you want a simple blog with easy content creation tools and comments and photos, you're still safe with Wordpress. If you're building software for banks to transfer money across the globe, there are probably battle-hardened, traditional ways to do that.

But for almost any other web app, node.js might just be the best toolkit to build with. So please let me know when you're plotting your next big web idea!

Jan 17 '12 9:00am

Apps.newleafdigital.com: Building a suite of apps in node.js

I just launched a suite of node.js apps at apps.newleafdigital.com. Included for public consumption are my Spanish Flashcards app, refactored so each user has their own flashcards, and a new Interactive Lists app, an expansion of a proof of concept I built using websockets. They're connected with a common layout and a shared authentication layer using Facebook Connect.

The main purpose of building these apps was to learn a complete node.js stack (more on that below) and gain experience coding and troubleshooting node.js apps.

The second purpose was to demonstrate production node.js code to prospective clients. New Leaf Digital is now officially a half-Drupal, half-Node.js shop (and happy to switch more than half to the latter if there is work to be had).

The third purpose (besides using the apps myself) was to allow others to use the apps, as well as to learn from the code. Anyone can login using their Facebook ID and all the code is on Github (under a CC license).

What do the apps do?

Spanish Flashcards lets you create flashcards of English-Spanish word translations. Randomly play your flashcards until you get them all right, and look up new words with the WordReference API.

Interactive Lists lets you create to-do lists, shopping lists, or any other kinds of list, and share your lists with your friends. As you add and remove items from the list, everyone else sees it immediately in real-time. Imagine a scavenger hunt in which everyone is tracking the treasure on their phones, or a family trip to the mall.

Auth (under the hood): a common authentication layer using Facebook Connect, which the other 2 user-facing apps (and the parent app) share.

How they're built

The stack consists of: node.js as the engine, Express for the web framework, Jade for templates, Mongoose for MongoDB modeling, socket.io for real-time two-way communication, everyauth + mongoose-auth for 3rd party authentication, connect-mongodb for session storage, async for readable control flow, underscore for language add-ons, http-proxy for a flexible router. Plus connect-less and Bootstrap for aesthetics. Forever keeps it running.

To bring the 4 apps (parent/HQ, auth, flashcards, lists) together, there were a few options: a parent app proxying to child apps running independently; virtual hosts (requiring separate DNS records); or using Connect/Express's "mounting" capability. Mounted apps were the most complex option, but offered the best opportunity to learn the deep innards of Express, and the proxy solution was unclear at the time, so I went with mounted apps.

Along the way I refactored constantly and hit brick walls dozens of times. In the end it all works (so far!), and the code makes sense. Since the parent app is a whole standalone server hogging its port, I added a thin proxy on top which points the subdomain to the app, keeping other subdomains on port 80 open for the future.

The app mounting functionality of Express.js is incredibly robust: using the same app.use() syntax as middleware, you can app.use(anotherApp), or even app.use('/path', anotherApp) to load a whole app at a sub-path. (Then the sub-app's routes all change relative to that "mount point".)
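
A bare-bones illustration of mounting (the app names are invented, and this uses the Express 2.x createServer style current at the time):

var express = require('express');

// a sub-app with its own routes
var flashcards = express.createServer();
flashcards.get('/', function(req, res) {
  res.send('flashcards home');         // becomes GET /flashcards/ once mounted
});

// the parent app
var parent = express.createServer();
parent.get('/', function(req, res) {
  res.send('suite home');
});

parent.use('/flashcards', flashcards); // mount the sub-app at /flashcards
parent.listen(3000);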

Of course in practice, mounting multiple apps is extremely complex. It's also not the most stable approach: a fatal error in any part of the suite will bring down the whole thing. So on a suite of "real" production apps, I wouldn't necessarily recommend the mounting model, but it's useful to understand. And when it works, it's very elegant.

Coming soon

Over the next few weeks, I'll be writing a series of blog posts about specific lessons learned from building these apps. In the meantime, I hope some people make good use of them, and please report bugs if you find any.

Next I'm going to write about Why Node.js? - why you, perhaps a potential client, or perhaps another developer, should consider building your next app in Node.

Jan 14 '12 3:55pm

How to install xhprof on OSX (Snow Leopard or Lion) and XAMPP

Note: The same instructions apply to XDebug and possibly every other PECL module in this environment.

XHProf is a very neat PHP profiler. Documentation and general installation instructions are not hard to come by, but none of the examples I found had the right steps for my particular setup, Mac OSX Lion with XAMPP running php 5.3. (Should apply also to OSX Snow Leopard.) So this post fills in that hole; for everything else regarding xhprof look elsewhere.

The problem is the "architecture" that xhprof compiles to; OSX and xhprof are 64-bit but XAMPP is apparently 32-bit. (If you're using MAMP instead of XAMPP, you'll have a similar problem; here is a solution for that.)

After trying multiple variations, the solution that worked was adapted from this memcached-on-xampp post:

  1. Download the latest copy of xhprof
  2. sudo phpize (not sure if this is actually necessary. You may run into problems here; find the solution to that elsewhere.)
  3. If you've been trying other methods that failed, clean up the old junk: sudo make clean
  4. The most important part:
    sudo MACOSX_DEPLOYMENT_TARGET=10.6 CFLAGS='-O3 -fno-common -arch i386 -arch x86_64' LDFLAGS='-O3 -arch i386 -arch x86_64' CXXFLAGS='-O3 -fno-common -arch i386 -arch x86_64' ./configure --with-php-config=/Applications/XAMPP/xamppfiles/bin/php-config-5.3.1
  5. sudo make
  6. sudo make install
  7. Add to your php.ini:
    [xhprof]
    extension = xhprof.so
    xhprof.output_dir = "/Applications/XAMPP/xhprof-logs"

Then run php -i | grep xhprof and if it worked, it should tell you which version you're running. If it fails, it will say, xhprof.so: mach-o, but wrong architecture in Unknown on line 0.

Good luck!

Update: It's worth mentioning, you'll probably also need to install Graphviz so xhprof can find the dot utility to generate graphics.

Dec 14 '11 4:53pm

Unconventional unit testing in Drupal 6 with PhpUnit, upal, and Jenkins

Unit testing in Drupal using the standard SimpleTest approach has long been one of my pain points with Drupal apps. The main obstacle was setting up a realistic test "sandbox": The SimpleTest module builds a virtual site with a temporary database (within the existing database), from scratch, for every test suite. To accurately test the complex interactions of a real application, you need dozens of modules enabled in the sandbox, and installing all their database schemas takes a long time. If your site's components are exported to Features, the tests gain another level of complexity. You could have the test turn on every module that's enabled on the real site, but then each suite takes 10 minutes to run. And that still isn't enough; you also need settings from the variables table, content types, real nodes and users, etc.

So until recently, it came down to the choice: make simple but unrealistic sandboxes that tested minutia but not the big-picture interactions; or build massive sandboxes for each test that made the testing workflow impossible. After weeks of trying to get a SimpleTest environment working on a Drupal 6 application with a lot of custom code, and dozens of hours debugging the tests or the sandbox setups rather than building new functionality, I couldn't justify the time investment, and shelved the whole effort.

Then Moshe Weizman pointed me to his alternate upal project, which aims to bring the PHPUnit testing framework to Drupal, with backwards compatibility for SimpleTest assertions, but not the baggage of SimpleTest's Drupal implementation. Moshe recently introduced upal as a proposed testing framework for Drupal 8, especially for core. Separately, a few weeks ago, I started using upal for a different purpose: as a unit testing framework for custom applications in Drupal 6.

I forked the Github repo, started a backport to D6 (copying from SimpleTest-6 where upal was identical to SimpleTest-7), and fixed some of the holes. More importantly, I'm taking a very different approach to the testing sandbox: I've set up an entirely separate test site, copied wholesale from the dev site (which itself is copied from the production site). This means:

  • I can visually check the test sandbox at any time, because it runs as a virtualhost just like the dev site.
  • All the modules, settings, users, and content are in place for each test, and don't need to be created or torn down.
  • Rebuilding the sandbox is a single operation (with shell scripts to sync MySql, MongoDB, and files, manually triggered in Jenkins)
  • Cleanup of test-created objects occurs (if desired) on a piecemeal basis in tearDown() - drupalCreateNode() (modified) and drupalVariableSet() (added) optionally undo their changes when the test ends.
  • setUp() is not needed for most tests at all.
  • dumpContentsToFile() (added) replicates SimpleTest's ability to save curl'd files, but on a piecemeal basis in the test code.
  • Tests run fast, and accurately reflect the entirety of the site with all its actual interactions.
  • Tests are run by the Jenkins continuous-integration tool and the results are visible in Jenkins using the JUnit xml format.

How to set it up (with Jenkins, aka Hudson)

(Note: the following are not comprehensive instructions, and assume familiarity with shell scripting and an existing installation of Jenkins.)

  1. Install upal from Moshe's repo (D7) or mine (D6). (Some of the details below I added recently, and apply only to the D6 fork.)
  2. Install PHPUnit. The pear approach is easiest.
  3. Upgrade drush: the notes say, "You currently need 'master' branch of drush after 2011.07.21. Drush 4.6 will be OK - http://drupal.org/node/1105514" - this seems to correspond to the HEAD of the 7.x-4.x branch in the Drush repository.
  4. Set up a webroot, database, virtualhost, DNS, etc for your test sandbox, and any scripts you need to build/sync it.
  5. Configure phpunit.xml. Start with upal's readme, then (D6/fork only) add DUMP_DIR (if wanted), and if HTTP authentication to the test site is needed, UPAL_HTTP_USER and UPAL_HTTP_PASS. In my version I've split the DrupalTestCase class to its own file, and renamed drupal_test_case.php to upal.php, so rename the "bootstrap" parameter accordingly. (Note: the upal notes say it must run at a URL ending in /upal - this is no longer necessary with this approach.)
  6. PHPUnit expects the files to be named .php rather than .test - however if you explicitly call an individual .test file (rather than traversing a directory, the approach I took), it might work. You can also remove the getInfo() functions from your SimpleTests, as they don't do anything anymore.
  7. If Jenkins is on a different server than the test site (as in my case), make sure Jenkins can SSH over.
  8. To use dumpContentsToFile() or the XML results, you'll want a dump directory (set in phpunit.xml), and your test script should wipe the directory before each run, and rsync the files to the build workspace afterwards.
  9. To convert PHPUnit's JUnit output to the format Jenkins understands, you'll need the xUnit plugin for Jenkins. Then point the Jenkins job to read the XML file (after rsync'ing if running remotely). [Note: the last 3 steps have to be done with SimpleTest and Jenkins too.]
  10. Code any wrapper scripts around the above steps as needed.
  11. Write some tests! (Consult the PHPUnit documentation.)
  12. Run the tests!

Some issues I ran into (which you might also run into)

  1. PHPUnit, unlike SimpleTest, stops a test function after the first failure. This isn't a bug, it's expected behavior, even with --stop-on-failure disabled. I'd prefer it the other way, but that's how it is.
  2. Make sure your test site - like any dev site - does not send any outbound mail to customers, run unnecessary feed imports, or otherwise perform operations not meant for a non-production site.
  3. In my case, Jenkins takes 15 minutes to restart (after installing xUnit for example). I don't know why, but keep an eye on the Jenkins log if it's taking you a while too.
  4. Also in my case, Jenkins runs behind an Apache reverse-proxy; in that case when Jenkins restarts, it's usually necessary to restart Apache, or else it gets stuck thinking the proxy endpoint is down.
  5. I ran into a bug with Jenkins stopping its shell script commands arbitrarily before the end. I worked around it by moving the whole job to a shell script on the Jenkins server (which in turn delegates to a script on the test/dev server).

There is a pending pull request to pull some of the fixes and changes I made back into the original repo. In the pull request I've tried to separate what are merely fixes from what goes with the different test-site approach I've taken, but it's still a tricky merge. Feel free to help there, or make your own fork with a separate test site for D7.

I now have a working test environment with PHPUnit and upal, with all of the tests I wrote months ago working again (minus their enormous setUp() functions), and I've started writing tests for new code going forward. Success!

(If you are looking for a professional implementation of any of the above, please contact me.)

Recent related post: Making sense of Varnish caching rules

Dec 12 '11 3:52pm

Making sense of Varnish caching rules

Varnish is a reverse-proxy cache that allows a site with a heavy backend (such as a Drupal site) and mostly consistent content to handle very high traffic load. The “cache” part refers to Varnish storing the entire output of a page in its memory, and the “reverse proxy” part means it functions as its own server, sitting in front of Apache and passing requests back to Apache only when necessary.

One of the challenges with implementing Varnish, however, is the complex “VCL” protocol it uses to process requests with custom logic. The syntax is unusual, the documentation relies heavily on complex examples, and there don’t seem to be any books or other comprehensive resources on the software. A recent link on the project site to Varnish Training is just a pitch for a paid course. Searching more specifically for Drupal + Varnish will bring up many good results - including Lullabot’s fantastic tutorial from April, and older examples for Mercury - but the latest stable release is now 3.x and many of the examples (written for 2.x) don’t work as written anymore. So it takes a lot of trial and error to get it all working.

I’ve been running Varnish on AntiquesNearMe.com, partly to keep our hosting costs down by getting more power out of less [virtual] hardware. A side benefit is the site’s ability to respond very nicely if the backend Apache server ever goes down. They’re on separate VPS's (connected via internal private networking), and if the Apache server completely explodes from memory overload, or I simply need to upgrade a server-related package, Varnish will display a themed “We’re down for a little while” message.

But it wasn’t until recently that I got Varnish’s primary function, caching, really tuned. I spent several days under the hood recently, and while I don’t want to rehash what’s already been well covered in Lullabot’s tutorial, here are some other things I learned:

Check syntax before restarting

After you update your VCL, you need to restart Varnish - using sudo /etc/init.d/varnish restart for instance - for the changes to take effect. If you have a syntax error, however, this will take down your site. So check the syntax first (change the path to your VCL as needed):
varnishd -C -f /etc/varnish/default.vcl > /dev/null

If there are errors, it will display them; if not, it shows nothing. Use that as a visual check before restarting. (Unfortunately the exit code of that command is always 0, so you can’t do check-then-restart as simply as check-varnish-syntax && /etc/init.d/varnish restart, but you could grep the output for the words “exit 1” to accomplish the same.)

Logging

The std.log function allows you to generate arbitrary messages about Varnish’s processing. Add import std; at the top of your VCL file, and then std.log("DEV: some useful message") anywhere you want. The “DEV” prefix is an arbitrary way of differentiating your logs from all the others. So you can then run in the shell, varnishlog | grep "DEV" and watch only the information you’ve chosen to see.

How I use this:
- At the top of vcl_recv() I put std.log("DEV: Request to URL: " + req.url); to put all the other logs in context.
- When I pipe back to apache, I put std.log("DEV: piping " + req.url + " straight back to apache"); before the return (pipe);
- On blocked URLs (cron, install), the same
- On static files (images, JS, CSS), I put std.log("DEV: Always caching " + req.url);
- To understand all the regex madness going on with cookies, I log req.http.Cookie at every step to see what’s changed.

Plug some of these in, check the syntax, restart Varnish, run varnishlog | grep "DEV" as above, and watch as you hit a bunch of URLs in your browser. Varnish’s internal logic will quickly start making more sense.

Watch Varnish work with your browser

(Screenshot: Varnish headers in the Chrome Inspector.)
The Chrome/Safari Inspector and Firebug show the headers for every request made on a page. With Varnish running, look at the Response Headers for one of them: you’ll see “Via: Varnish” if the page was processed through Varnish, or “Server: Apache” if it went through Apache. (Using Chrome, for instance, log in to your Drupal site and the page should load via Apache (assuming you see page elements not available to anonymous users), then open an Incognito window and it should run through Varnish.)

Add hit/miss headers

When a page is supposed to be cached (not pipe'd immediately), Varnish checks whether it already has a stored copy - a hit or a miss. To watch this in your Inspector, use this logic:
sub vcl_deliver {
  std.log("DEV: Hits on " + req.url + ": " + obj.hits);
 
  if (obj.hits > 0) {
    set resp.http.X-Varnish-Cache = "HIT";
  }
  else {
    set resp.http.X-Varnish-Cache = "MISS";
  }
 
  return (deliver);
}

Then you can clear the caches, hit a page (using the browser technique above), see “via Varnish” and a MISS, hit it again, see a HIT (or not), and know if everything is working.

Clear Varnish when aggregated CSS+JS are rebuilt

If you have CSS/JS aggregation enabled (as recommended), your HTML source will reference long hash-string files. Varnish caches that HTML with the hash string. If you clear only those caches (“requisites” via Admin Menu or cc css+js via Drush), Varnish will still have the old references, but the files will have been deleted. Not good. You could simply never use that operation again, but that’s a little silly.

The heavy-handed solution I came up with (I welcome alternatives) is to wipe the Varnish cache when CSS+JS resets. That operation is not hook-able, however, so you have to patch core. In common.inc, in _drupal_flush_css_js(), add:

if (module_exists('varnish') && function_exists('varnish_purge_all_pages')) {
  varnish_purge_all_pages();
}

This still keeps Memcache and other in-Drupal caches intact, avoiding an unnecessary “clear all caches” operation, but makes sure Varnish doesn’t point to dead files. (You could take it a step further and purge only URLs that are Drupal-generated and not static; if you figure out the regex for that, please share.)

Per-page cookie logic

On AntiquesNearMe.com we have a cookie that remembers the last location you searched, which makes for a nicer UX. That cookie gets added to Varnish’s page “hash” and (correctly) bypasses the cache on pages that take that cookie into account. The cookie is not relevant to the rest of the site, however, so it should be ignored in those cases. How to handle this?

There are two ways to handle cookies in Varnish: strip cookies you know you don’t want, as in this old Pressflow example, or leave only the cookies you know you do want, as in Lullabot’s example. Each strategy has its pros and cons and works on its own, but it’s not advisable to combine them. I’m using Lullabot’s technique on this site, so to deal with the location cookie, I use if-else logic: if the cookie is available but not needed (determined by regex like req.url !~ "PATTERN" || ...), then strip it; otherwise keep it. If the cookie logic you need is more varied but still linear, you could create a series of elsif statements to handle all the use cases. (Just make sure to roast a huge pot of coffee first.)

Useful add-ons to varnish.module

  • Added watchdog('varnish', ...) commands in varnish.module on cache-clearing operations, so I could look at the logs and spot problems.
  • Added a block to varnish.module with a “Purge this page” button for each URL, shown only for admins. (I saw this in an Akamai module and it made a lot of sense to copy. I’d be happy to post a patch if others want this.)
  • The Expire module offers plug-n-play intelligence to selectively clear Varnish URLs only when necessary (clearing a landing page of blog posts only if a blog post is modified, for example). Much better than the default behavior of aggressive clearing “just in case”.

I hope this helps people adopt Varnish. I am also available via my consulting business New Leaf Digital for paid implementation, strategic advice, or support for Varnish-aided sites.

Nov 29 '11 1:06pm

Parse Drupal watchdog logs in syslog (using node.js script)

Drupal has the option of outputting its watchdog logs to syslog, the file-based core Unix logging mechanism. The log in most cases lives at /var/log/messages, and Drupal's logs get mixed in with all the others, so you need to cat /var/log/messages | grep drupal to filter.

But then you still have a big text file that's hard to parse. This is probably a "solved problem" many times over, but recently I had to parse the file specifically for 404'd URLs, and decided to do it (partly out of convenience but mostly to learn how) using Node.js as a scripting language. JavaScript is much easier than Bash for simple text parsing.

I put the code in a Gist, node.js script to parse Drupal logs in linux syslog (and find distinct 404'd URLs). The last few lines of URL filtering can be changed to suit any other specific use case you might have for reading the logs out of syslog. (This could also be used for reading non-Drupal syslogs, but the mapping assigns Drupal-specific keys like "URL" that wouldn't apply in that case.)

Note the comment at the top: to run it you'll need node.js and 2 NPM modules as dependencies. Then take your filtered log (using the grep method above), pass it as a parameter, and read the output on screen or redirect it with > to another file.
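
For a flavor of what the parsing involves, here's a simplified sketch (not the actual Gist; it uses only Node built-ins and assumes Drupal's default pipe-delimited syslog format) that counts distinct 404'd URLs:

// parse-404s.js - simplified sketch, not the actual Gist.
// Assumes Drupal's default pipe-delimited syslog format:
//   base_url|timestamp|type|ip|request_uri|referer|uid|link|message
// Usage: node parse-404s.js drupal-filtered.log
var fs = require('fs');

var lines = fs.readFileSync(process.argv[2], 'utf8').split('\n');
var counts = {};

lines.forEach(function(line) {
  var start = line.indexOf('drupal:');
  if (start === -1) return;                       // not a Drupal entry
  var fields = line.substr(start + 8).split('|'); // skip the "drupal: " tag
  if (fields[2] !== 'page not found') return;     // only 404s
  var url = fields[4];                            // the request_uri field
  counts[url] = (counts[url] || 0) + 1;
});

// Print distinct 404'd URLs, most frequent first.
Object.keys(counts)
  .sort(function(a, b) { return counts[b] - counts[a]; })
  .forEach(function(url) {
    console.log(counts[url] + '\t' + url);
  });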

Nov 25 '11 11:59am

Migrating a static 960.gs grid to a responsive, semantic grid with LessCSS

The layout of Antiques Near Me (a startup I co-founded) has long been built using the sturdy 960.gs grid system (implemented in Drupal 6 using the Clean base theme). Grids are very helpful: They allow layouts to be created quickly; they allow elements to be fit into layouts easily; they keep dimensions consistent; they look clean. But they have a major drawback that always bothered me: the grid-X classes that determine an element's width are in the HTML. That mixes up markup/content and layout/style, which should ideally be completely separated between the HTML and CSS.

The rigidity of an in-markup grid becomes especially apparent when trying to implement "responsive" design principles. I'm not a designer, but the basic idea of responsive design for the web, as I understand it, is that a site's layout should adapt automagically to the device it's viewed in. For a nice mobile experience, for example, rather than create a separate mobile site - which I always thought was a poor use of resources, duplicating the content-generating backend - the same HTML can be used with @media queries in the CSS to make the layout look "native".

(I've put together some useful links on Responsive Design and @media queries using Delicious. The best implementation of a responsive layout that I've seen is on the site of FourKitchens.)

Besides the 960 grid, I was using LessCSS to generate my styles: it supports variables, mix-ins, nested styles, etc; it generally makes stylesheet coding much more intuitive. So for a while the thought simmered, why not move the static 960 grid into Less (using mixins), and apply the equivalent of grid-X classes directly in the CSS? Then I read this article in Smashing on The Semantic Grid System, which prescribed pretty much the same thing - using Less with a library called Semantic.gs - and I realized it was time to actually make it happen.

To make the transition, I forked semantic.gs and made some modifications: I added .alpha and .omega mixins (to cancel out side margins); for nested styles, I ditched semantic.gs's .row() approach (which seems to be buggy anyway) and created a .nested-column mixin instead. I added clear:both to the .clearfix mixin (seemed to make sense, though maybe there was a reason it wasn't already in).

To maintain the 960.gs dimensions and classes (as an intermediate step), I made a transitional-960gs.less stylesheet with these rules: @columns: 16; @column-width: 40; @gutter-width: 20;. Then I made equivalents of the .grid_X classes (as Clean's implementation had them) with an s_ prefix:

.s_container, .s_container_16 {
  margin-left: auto;
  margin-right: auto;
  width: @total-width;
  .clearfix();
}
.s_grid_1 {
  .column(1);
}
.s_grid_2 {
  .column(2);
}
...
.s_grid_16 {
  .column(16);
}

The s_grid_X classes were purely transitional: they allowed me to do a search-and-replace from grid_ to s_grid_ and remove the 960.gs stylesheet, before migrating all the styles into semantic equivalents. Once that was done, the s_grid_ classes could be removed.

960.gs and semantic.gs also implement their columns a little differently, one with padding and the other with margins, so what was actually a 1000px-wide layout with 960.gs became a 960px layout with semantic.gs. To compensate for this, I made a wrapper mixin applied to all the top-level wrappers:

.wide-wrapper {
  .s_container;
  padding-right: 20px;
  padding-left: 20px;
  .clearfix();
}

With the groundwork laid, I went through all the grid_/s_grid_ classes in use and replaced them with purely in-CSS semantic mixins. So if a block had a grid class before, now it only had a semantic ID or class, with the grid mixins applied to that selector.

Once the primary layout was replicated, I could make it "respond" to @media queries, using a responsive.less sheet. For example:

/* iPad in portrait, or any screen below 1000px */
@media only screen and (max-device-width: 1024px) and (orientation: portrait), screen and (max-width: 999px) {
  ...
}
 
/* very narrow browser, or iPhone -- note that <1000px styles above will apply here too! 
note: iPhone in portrait is 320px wide, in landscape is 480px wide */
@media only screen and (max-device-width: 480px), only screen and (-webkit-min-device-pixel-ratio: 2), screen and (max-width: 499px) {
  ...
}
 
/* iPhone - portrait */
@media only screen and (max-device-width: 480px) and (max-width: 320px) {
  ...
}

Some vital tools for the process:

  • Less.app (for Mac), or even better, the new CodeKit by the same author, compiles and minifies the Less files instantly, so the HTML can refer to normal CSS files.
  • The iOS Simulator (part of XCode) and Android Emulator (with the Android SDK), to simulate how your responsive styles work on different devices. (Getting these set up is a project in itself).
  • To understand what various screen dimensions looked like, I added a simple viewport debugger to show the screen size in the corner of the page (written as a Drupal6/jQuery document-ready "behavior"; fills a #viewport-size element put separately in the template):
    Drupal.behaviors.viewportSize = function() {
      if (!$('#viewport-size').size()) return;
     
      Drupal.fillViewportSize = function() {
        $('#viewport-size').text( $(window).width() + 'x' + $(window).height() )
          .css('top', $('#admin-menu').height());
      };
      Drupal.fillViewportSize();
      $(window).bind('resize', function(event){
        Drupal.fillViewportSize();
      });  
    };

After three days of work, the layout is now entirely semantic, and the 960.gs stylesheet is gone. On a wide-screen monitor it looks exactly the same as before, but it now adapts to narrower screen sizes (you can see this by shrinking the window's width), has special styles for iPad and iPhone (portrait and landscape), and was confirmed to work on a popular Android tablet. It'll be a continuing work in progress, but the experience is now much better on small devices, and the groundwork is laid for future tweaks or redesigns.

There are some downsides to this approach worth considering:

  • Mobile devices still load the full CSS and HTML needed for the "desktop" layout, even if not all the elements are shown. This is a problem for performance.
  • The stylesheets are enormous with all the mixins, compounding the previous issue. I haven't examined in depth how much of a problem this actually is, but I'll need to at some point.
  • The contents of the page can only change as much as the stylesheets allow. The order of elements can't change (unless their visible order can be manipulated with CSS floats).

To mitigate these and modify the actual content on mobile devices - to reduce the performance overhead, load smaller images, or put less HTML on the page - would probably require backend modifications that detect the user agent (perhaps using Browscap). I've been avoiding that approach until now, but with most of the work done on the CSS side, a hybrid backend solution is probably the next logical step. (For the images, Responsive Images could also help on the client side.)

See the new layout at work, and my links on responsive design. I'm curious to hear what other people do to solve these issues.

Added: It appears the javascript analog to media queries is media query lists, which are event-able. And here's an approach with media queries and CSS transition events.
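
For example, here's a minimal sketch of watching the same "below 1000px" breakpoint from JavaScript, assuming a browser that supports window.matchMedia:

// Mirror the "below 1000px" breakpoint from responsive.less in JavaScript.
// (Sketch only; window.matchMedia isn't available in older browsers.)
var narrowQuery = window.matchMedia('screen and (max-width: 999px)');

function onNarrowChange(mql) {
  // mql.matches is true whenever the narrow-layout styles are in effect.
  console.log('Narrow layout active: ' + mql.matches);
}

onNarrowChange(narrowQuery);             // check once on load
narrowQuery.addListener(onNarrowChange); // and again whenever the match flips
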

Nov 25 '11 11:30am

Generate pager (previous/next) links for an old Blogspot blog using Node.js

I have an old Blogspot blog that still gets a lot of traffic, but it was very hard to navigate without links from post to post. The template uses an old version of their templating language, and doesn’t have any tags available to generate pager links within Blogger.

So I wrote a node app called Blogger Pager (code on Github) to generate the links, loaded client-side via AJAX.


How it works

  1. Export your blog from Blogspot using the Export functionality. You’ll get a big XML file.
  2. Check out this code on a server with node.js installed.
  3. Put the exported XML file into the root of this app, as blog-export.xml; or change the path in app.js.
  4. Run the app (node app.js, or with forever).
  5. The module in posts.js will parse the XML file and generate an in-memory array of all the post URLs and titles. (Uses the xml2js library, after trying 3 others that didn’t work as well/easily.)
  6. The module in server.js will respond to HTTP requests (by default on port 3003, set in server.js):
    • /pager handles JSONP requests with a ?url parameter, returning a JSON object of the surrounding posts (see the sketch after this list).
    • /posts returns an HTML page of all the parsed posts.
  7. The client-side script depends on jQuery, so make sure your blog template is loading that:
    • e.g. <script src='//ajax.googleapis.com/ajax/libs/jquery/1.X.X/jquery.min.js'></script>
  8. In your blog template, load the client-side script in this app, exposed at /js/blog-pager-client.js.
  9. Change the URL (var url…) in the client-side script to the URL of your node app.
  10. Save the template, load a post page. (To debug, comment out the return in bloggerPagerLog() and open the browser console.)
  11. Customize the generated HTML in the client-side addPagerForPost() function or style with CSS.
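
To make the flow concrete, here's a minimal sketch of that JSONP round trip; the container ID, variable names, and response shape are illustrative assumptions, not copied from blog-pager-client.js:

// Illustrative sketch of the client-side JSONP call; names and the exact
// response shape are assumptions, not copied from blog-pager-client.js.
var pagerUrl = 'http://your-node-host:3003/pager';  // the URL of your node app

jQuery.ajax({
  url: pagerUrl,
  dataType: 'jsonp',                    // jQuery adds the callback=? wrapper
  data: { url: window.location.href },  // the post currently being viewed
  success: function(pager) {
    // Assume a response like { prev: {url, title}, next: {url, title} }.
    var html = '';
    if (pager.prev) {
      html += '<a href="' + pager.prev.url + '">&laquo; ' + pager.prev.title + '</a> ';
    }
    if (pager.next) {
      html += '<a href="' + pager.next.url + '">' + pager.next.title + ' &raquo;</a>';
    }
    jQuery('#blog-pager').html(html);   // a container element added to the template
  }
});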


Known Limitations

  1. Only works with a blog export; if your blog is still getting new content, this won’t read the RSS.


Enjoy!
https://github.com/newleafdigital/blogger_pager

Oct 30 '11 7:21pm

Tilt, 3D Dom Inspector for Firefox

HTML pages appear in the browser in 2 dimensions, but there are actually 2 additional dimensions to the DOM: the hierarchy of elements, and the z-index. A new DOM inspector for Firefox called Tilt displays the DOM in 3 dimensions, showing the hierarchy of elements (not sure about the z-index). This is what the homepage of AntiquesNearMe.com looks like in 3D. Pretty cool.

Oct 30 '11 6:42pm

Great list of new dev tools from Smashing Magazine

Here's a really superb list of "coding tools and javascript libraries for web developers" from Smashing Magazine. Some of the ones I'll be playing with in the next few days:

  • Tilt, Firefox DOM inspection in 3D.

  • Money.js, an open source API and JS library for currency exchange rates (the code on this will also be good for improving my Node.js techniques)

  • Bootstrap development toolkit

  • IE VMs - might cover more versions of the evil browser than the VirtualBox I use now.

  • a bunch of remote debuggers for mobile, haven't tried them yet

  • Less App - I've been using this one for a long time

  • Has.js - test your JS environment for available constructs.

and a whole bunch of others. After reading that post, I also added Smashing Magazine to my RSS reader.