Tech Blog :: node.js


Apr 29 '12 11:06am

Liberate your Drupal data for a service-oriented architecture (using Redis, Node.js, and MongoDB)

Drupal's basic content unit is a "node," and to build a single node (or to perform any other Drupal activity), the codebase has to be bootstrapped, and everything needed to respond to the request (configuration, database and cache connections, etc) has to be initialized and loaded into memory from scratch. Then node_load runs through the NodeAPI hooks, multiple database queries are run, and the node is built into a single PHP object.

This is fine if your web application runs entirely through Drupal, and always will, but what if you want to move toward a more flexible Service-oriented architecture (SOA), and share your content (and users) with other applications? For example, build a mobile app with a Node.js backend like LinkedIn did; or calculate analytics for business intelligence; or have customer service reps talk to your customers in real-time; or integrate with a ticketing system; or do anything else that doesn't play to Drupal's content-publishing strengths. Maybe you just want to make your data (which is the core of your business, not the server stack) technology-agnostic. Maybe you want to migrate a legacy Drupal application to a different system, but the cost of refactoring all the business logic is prohibitive; with an SOA you could change the calculation and get the best of both worlds.

The traditional way of doing this was setting up a web service in Drupal using something like the Services module. External applications could request data over HTTP, and Drupal would respond in JSON. Each request has to wait for Drupal to bootstrap, which uses a lot of memory (every enterprise Drupal site I've ever seen has been bogged down by legacy code that runs on every request), so it's slow and doesn't scale well. Rather than relieving some load from Drupal's LAMP stack by building a separate application, you're just adding more load to both apps. To spread the load, you have to keep adding PHP/Apache/MySQL instances horizontally. Every module added to Drupal compounds the latency of Drupal's hook architecture (running thousands of function_exists calls, for example), so the stakeholders involved in changing the Drupal app have to include the users of every secondary application requesting the data. With a Drupal-Services approach, other apps will always be second-class citizens, dependent on the legacy system, which defeats the "loose coupling" principle of SOA.

I've been shifting my own work from Drupal to Node.js over the last year, but I still have large Drupal applications (such as Antiques Near Me) which can't be easily moved away, and frankly don't need to be for most use cases. Overall, I tend to think of Drupal as a legacy system, burdened by too much cruft and inconsistent architecture, and no longer the best platform for most applications. I've been giving a lot of thought to ways to keep these apps future-proof without rebuilding all the parts that work well as-is.

That led me to build what I've called the "Drupal Liberator". It consists of a Drupal module and a Node.js app, and uses Redis (a very fast key-value store) for a middleman queue and MongoDB for the final storage. Here's how it works:

  • When a node (or user, or other entity type) is saved in Drupal, the module encodes it to JSON (a cross-platform format that's also native to Node.js and MongoDB), and puts it, along with metadata (an md5 checksum of the JSON, timestamp, etc), into a Redis hash (a simple key-value object, containing the metadata and the object as a JSON string). It also notifies a Redis pub/sub channel of the new hash key. (This uses 13KB of additional memory and 2ms of time for Drupal on the first node, and 1KB/1ms for subsequent node saves on the same request. If Redis is down, Drupal goes on as usual.)

  • The Node.js app, running completely independently of Drupal, is listening to the pub/sub channel. When it's pinged with a hash key, it retrieves the hash, JSON.parse's the string into a native object, possibly alters it a little (e.g., adding the checksum and timestamp into the object), and saves it into MongoDB (which also speaks JSON natively). The data type (node, user, etc) and other information in the metadata directs where it's saved. Under normal conditions, this whole process from node_save to MongoDB takes less than a second. If it were to bottleneck at some point in the flow, the Node.js app runs asynchronously, not blocking or straining Drupal in any way.

  • For redundancy, the Node.js app also polls the hash namespace every few minutes. If any part of the mechanism breaks at any time, or to catch up when first installing it, the timestamp and checksum stored in each saved object allow the two systems to easily find the last synchronized item and continue synchronizing from there.
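
The consumer step described in the second bullet can be sketched without Redis or MongoDB at all: a function that takes the hash as a plain object, verifies the checksum, parses the JSON, and returns the document to store. This is only an illustration under assumptions. The field names (`type`, `id`, `checksum`, `timestamp`, `json`), the `toDocument` helper, and the collection naming are hypothetical and are not taken from the Liberator code.

```javascript
const crypto = require('crypto');

// Hypothetical shape of the hash written by the Drupal module:
// { type: 'node', id: '42', checksum: '<md5 of json>', timestamp: '1335711960', json: '{...}' }
function toDocument(hash) {
  // Verify that the JSON string was not truncated or altered on the way.
  const actual = crypto.createHash('md5').update(hash.json, 'utf8').digest('hex');
  if (actual !== hash.checksum) {
    throw new Error('Checksum mismatch for ' + hash.type + ':' + hash.id);
  }

  // Parse into a native object and fold the metadata into it,
  // so later sync runs can compare timestamps and checksums.
  const doc = JSON.parse(hash.json);
  doc._checksum = hash.checksum;
  doc._synced = Number(hash.timestamp);

  // The entity type decides where the document is stored (assumed naming).
  return { collection: hash.type + 's', doc: doc };
}
```

The returned `collection` and `doc` would then be handed to whatever MongoDB driver the app uses, most likely as an upsert keyed on the entity ID.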

The result is a read-only clone of the data, synchronized almost instantaneously with MongoDB. Individual nodes can be loaded without bootstrapping Drupal (or touching Apache-MySql-PHP at all), as fully-built objects. New apps utilizing the data can be built in any framework or language. The whole Drupal site could go down and the data needed for the other applications would still be usable. Complex queries (for node retrieval or aggregate statistics) that would otherwise require enormous SQL joins can be built using MapReduce and run without affecting the Drupal database.

One example of a simple use case this enables: Utilize the CMS backend to edit your content, but publish it using a thin MongoDB layer and client-side templates. (And outsource comments and other user-write interactions to a service like Disqus.) Suddenly your content displays much faster and under higher traffic with less server capacity, and you don't have to worry about Varnish or your Drupal site being "Slashdotted".

A few caveats worth mentioning: First, it's read-only. If a separate app wants to modify the data in any way (and maintain data integrity across systems), it has to communicate with Drupal, or a synchronization bridge has to be built in the other direction. (This could be the logical next step in developing this approach, and truly make Drupal a co-equal player in an SOA.)

Second, you could have Drupal write to MongoDB directly and cut out the middlemen. (And indeed that might make more sense in a lot of cases.) But I built this with the premise of an already strained Drupal site, where adding another database connection would slow it down even further. This aims to put as little additional load on Drupal as possible, with the "Liberator" acting itself as an independent app.

Third, if all you needed was instant node retrieval - for example, if your app could query MySql for node ID's, but didn't want to bootstrap Drupal to build the node objects - you could leave them in Redis and take Node.js and MongoDB out of the picture.

I've just started exploring the potential of where this can go, so I've run this mostly as a proof-of-concept so far (successfully). I'm also not releasing the code at this stage: If you want to adopt this approach to evolve your Drupal system to a service-oriented architecture, I am available as a consultant to help you do so. I've started building separate apps in Node.js that tie into Drupal sites with Ajax and found the speed and flexibility very liberating. There's also a world of non-Drupal developers who can help you leverage your data, if it could be easily liberated. I see this as opening a whole new set of doors for where legacy Drupal sites can go.

Apr 13 '12 3:44pm

Cracking the cross-domain/Allow-Origin nut

The other day, I was setting up an Ajax feed loader - a node.js app pulling an RSS feed every few minutes and parsing it, and a Drupal site requesting a block of HTML from the node.js app via jQuery-ajax - and ran into a brick wall of cross-domain/origin (aka Cross-Origin Resource Sharing, or CORS) issues. This occurs any time you ajax-load something on a different subdomain or port from the main page it's loading into. (Even if you pipe the feed through the primary domain, using Varnish for example, if you use a separate hostname for your development site, or a local server, it'll break on those.)

In theory, it should be very simple to add an Access-Control-Allow-Origin header to the source app - the node.js app in this case - and bypass the restriction. In practice, it's not nearly so easy.

To get at the root of the problem and eliminate quirks in the particular app I was building, I set up 2 local virtualhosts with apache, and tried every combination until it worked.

Here are some problems I ran into, and solutions, to save the next person with this issue some time:

  • Access-Control-Allow-Origin is supposed to allow multiple domains to be set - as in http://sub1.domain.com http://sub2.domain.com - but no combination of these (separating with spaces, commas, or comma+space) actually worked. The solution to this is either allow all domains with * or dynamically set the domain to the origin on a per request basis. (In a typical node.js HTTP server for example, that's found at req.headers.origin - but that only exists if it's called via Ajax in another page.) The latter solution is fine when the source domain is always known, or every request hits the backend, but can be problematic if you're trying to run it on multiple endpoints, or through Varnish.
  • Chrome seems to have some bugs handling these situations, producing inconsistent results with the same environment.
  • The minimal working solution in the Apache experiment turned out to require, besides a valid Access-Control-Allow-Origin, this one header: Access-Control-Allow-Headers: X-Requested-With. (Apparently that's used only by Ajax/XMLHttpRequest requests, and without the server explicitly allowing that request header, the request fails.)
  • Before making the GET request for the content itself, some browsers make an OPTIONS request to verify the cross-domain permissions. Several other people running into these problems recommended including this header: Access-Control-Allow-Methods: OPTIONS, GET, POST. In the Apache experiment it wasn't necessary, but I put it in the final node.js app and it can't hurt.
  • Also from other people's recommendations, a more verbose version of Access-Control-Allow-Headers is possible, if not all necessary: Access-Control-Allow-Headers: Content-Type, Depth, User-Agent, X-File-Size, X-Requested-With, If-Modified-Since, X-File-Name, Cache-Control

Taking the lessons from the Apache experiment back to the node.js app, I used this code. It's written as an express middleware function (make sure to run it before app.router or any individual routes). The _ character refers to the underscore library.

app.use(function(req, res, next) {
  var headers = {
    'Cache-Control' : 'max-age=120'   // cache for 2m (in varnish and client)
  };
 
  // allowed origin?
  if (!_.isUndefined(req.headers.origin)) {
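    // validate (primary domain and its subdomains, local-dev); the patterns are
    // anchored so that hosts merely containing "domain.com" are rejected
    if (req.headers.origin.match(/^https?:\/\/([\w-]+\.)*domain\.com(:\d+)?$/)
    || req.headers.origin.match(/^https?:\/\/domain\.local(:\d+)?$/)) {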
      headers = _.extend(headers, {
        'Access-Control-Allow-Origin': req.headers.origin
      , 'Access-Control-Allow-Methods': 'GET, POST, OPTIONS'
      , 'Access-Control-Allow-Headers': 'Content-Type, X-Requested-With, X-PINGOTHER'
      , 'Access-Control-Max-Age': 86400   // 1 day
      });
    }
  }
 
  _.each(headers, function(value, key) {
    res.setHeader(key, value);
  });
 
  next();
});
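
The origin check can also be factored into a small function with no Express or underscore dependency, which makes it easy to test in isolation. This is a sketch under assumptions: the `corsHeaders` name and the hostnames are placeholders, and the patterns are anchored so that a hostname that merely contains `domain.com` does not pass. It also sets `Vary: Origin`, which tells caches such as Varnish that the response differs per requesting origin.

```javascript
// Allowed origins (placeholders): the primary domain with any subdomain, and a local dev host.
const ALLOWED_ORIGINS = [
  /^https?:\/\/([a-z0-9-]+\.)*domain\.com(:\d+)?$/i,
  /^https?:\/\/domain\.local(:\d+)?$/i
];

// Returns the CORS headers for a given Origin header value,
// or an empty object if the origin is missing or not allowed.
function corsHeaders(origin) {
  if (typeof origin !== 'string') return {};
  const allowed = ALLOWED_ORIGINS.some(function (pattern) {
    return pattern.test(origin);
  });
  if (!allowed) return {};
  return {
    'Access-Control-Allow-Origin': origin,
    'Access-Control-Allow-Methods': 'GET, POST, OPTIONS',
    'Access-Control-Allow-Headers': 'Content-Type, X-Requested-With',
    'Vary': 'Origin'
  };
}
```

In a middleware, each entry of the result would be applied with res.setHeader(key, value), as in the loop above.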

Jan 30 '12 8:48pm

Reducing coordinate density in KML files (with a node.js script)

Lately I've been tracking my bicycle rides with an Android GPS tracking app. The app exports to multiple formats, including Google Maps and KML. I wanted to take all my rides for a week and overlay them on a single map, but the coordinate density was too high - thousands of points for each ride - so GMaps was paginating the maps to reduce the density, and the overlays didn't work.

So I needed some way to reduce the coordinate density of the KML files, taking 1 out of every N points, so the route looked the same on the map but with less unnecessary detail.

I tried to find an existing tool to do this, but couldn't find one. So I started writing one as a node.js script (I'm trying to do everything platform-neutral these days in node). First I tried to actually parse the KML using various XML parsers - but the parsers stripped the line breaks between coordinates, so the format broke, and I realized I didn't really need to parse the format at all. I just needed to eliminate some of the lines.

The result is a very simple, functional KML Coordinate Density Reducer. It reads each line of a KML file, uses regex to determine if it's a coordinate line or not, and if it is a coordinate line, strips all but every Nth line, as specified in the shell parameters.
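
The line-based idea can be sketched in a few lines. This is not the published script, just a minimal version under assumptions: each coordinate sits on its own line as "lon,lat" or "lon,lat,alt", and the names `COORDINATE_LINE` and `reduceDensity` are made up for the example.

```javascript
// A coordinate line is assumed to hold exactly one "lon,lat" or "lon,lat,alt" tuple.
const COORDINATE_LINE = /^\s*-?\d+(\.\d+)?,-?\d+(\.\d+)?(,-?\d+(\.\d+)?)?\s*$/;

// Keeps every non-coordinate line, plus every nth coordinate line (starting with the first).
function reduceDensity(kmlText, n) {
  if (!(n >= 1)) throw new Error('n must be a number >= 1');
  let seen = 0;
  return kmlText.split('\n').filter(function (line) {
    if (!COORDINATE_LINE.test(line)) return true;   // markup: always keep
    const keep = (seen % n === 0);
    seen += 1;
    return keep;
  }).join('\n');
}
```

A command-line wrapper would read the file named in process.argv[2] with fs.readFileSync, take N from process.argv[3], and print the result. Note that this simple version can drop the final point of a route, which slightly shortens the drawn line.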

Using the script, I reduced each route from thousands of points to a few hundred, imported them all into a single map, and can view or embed the routes all at once.

Update: Someone wrote an adaptation that also checks the distance between points. (See tweet.)

Jan 17 '12 11:00am

Why Node.js? Why clients should ask for it, and developers should build with it

Following my post about my new node.js apps, and since I would like to turn New Leaf Digital into a node.js shop, I should write a little about why you, perhaps a potential client with a web project, might want to have it built in node.

Node.js is only two years old, but already sustains a vast ecosystem of add-on modules, tutorials, and meetups. The energy in the community is palpable and is based on strong fundamentals. Working in Node brings out the best parts of web development. Node applications are written in javascript, a language every developer knows (at least a little bit), so the learning curve is not a deterrent. That's important to consider as a client because, unlike other systems that have peaked in their appeal to developers, you can build a Node.js application today and know its platform will be supported for the long haul.

Node is truly lightweight: Unlike bloated Swiss army knife frameworks that try to solve every problem out of the box at the expense of performance and comprehension, a Node app starts as a completely blank slate and is only as complex as you make it. So you'll get more bang for your server capacity buck from the get-go. (I've worked on several Drupal projects involving performance, getting each page to load faster by eliminating cruft and bottlenecks. In Node that whole way of thinking is flipped on its head.) Every tiny operation of your app is also light: the whole system is built on a philosophy of "asynchronous" input/output. Think of a node app as a juggler: while each ball is arcing through the air, it's catching and throwing other balls. Interactions don't "block" the flow like a traditional web application. So you don't run out of capacity until you're really out of capacity, and a bottleneck in one part of the system won't bring down the rest of it.

This asynchronous I/O also makes node.js especially suited to applications involving file handling, interaction with external web services (as in Flashcards), or real-time user interaction (as in Interactive Lists). These are much harder to scale on traditional platforms, because the operations make other processes wait around while they're off doing their work.

Node.js is also perfectly positioned to work with new database technologies, like MongoDB, which offer a flexibility not available with traditional SQL/relational databases. Node.js and MongoDB both "speak" the same language natively - javascript - so building or working with JSON APIs is easy. Architectures can be "rapidly prototyped" and changed on the fly as the application concept evolves.

So what is node.js not good for? If you want a robust content management system out of the box for a news publication, for example, you probably want to stick with a platform like Drupal. If you want a simple blog with easy content creation tools and comments and photos, you're still safe with Wordpress. If you're building software for banks to transfer money across the globe, there are probably battle-hardened, traditional ways to do that.

But for almost any other web app, node.js might just be the best toolkit to build with. So please let me know when you're plotting your next big web idea!

Jan 17 '12 10:00am

Apps.newleafdigital.com: Building a suite of apps in node.js

I just launched a suite of node.js apps at apps.newleafdigital.com. Included for public consumption are my Spanish Flashcards app, refactored so each user has their own flashcards, and a new Interactive Lists app, an expansion of a proof of concept I built using websockets. They're connected with a common layout and a shared authentication layer using Facebook Connect.

The main purpose of building these apps was to learn a complete node.js stack (more on that below) and gain experience coding and troubleshooting node.js apps.

The second purpose was to demonstrate production node.js code to prospective clients. New Leaf Digital is now officially a half-Drupal, half-Node.js shop (and happy to switch more than half to the latter if there is work to be had).

The third purpose (besides using the apps myself) was to allow others to use the apps, as well as to learn from the code. Anyone can log in using their Facebook ID and all the code is on Github (under a CC license).

What do the apps do?

Spanish Flashcards lets you create flashcards of English-Spanish word translations. Randomly play your flashcards until you get them all right, and look up new words with the WordReference API.

Interactive Lists lets you create to-do lists, shopping lists, or any other kinds of list, and share your lists with your friends. As you add and remove items from the list, everyone else sees it immediately in real-time. Imagine a scavenger hunt in which everyone is tracking the treasure on their phones, or a family trip to the mall.

Auth (under the hood): a common authentication layer using Facebook Connect, which the other 2 user-facing apps (and the parent app) share.

How they're built

The stack consists of: node.js as the engine, Express for the web framework, Jade for templates, Mongoose for MongoDB modeling, socket.io for real-time two-way communication, everyauth + mongoose-auth for 3rd party authentication, connect-mongodb for session storage, async for readable control flow, underscore for language add-ons, http-proxy for a flexible router. Plus connect-less and Bootstrap for aesthetics. Forever keeps it running.

To bring the 4 apps (parent/HQ, auth, flashcards, lists) together, there were a few options: a parent app proxying to child apps running independently; virtual hosts (requiring separate DNS records); or using Connect/Express's "mounting" capability. Mounted apps were the most complex option, but offered the best opportunity to learn the deep innards of Express, and the proxy solution was unclear at the time, so I went with mounted apps.

Along the way I refactored constantly and hit brick walls dozens of times. In the end it all works (so far!), and the code makes sense. Since the parent app is a whole standalone server hogging its port, I added a thin proxy on top which points the subdomain to the app, keeping other subdomains on port 80 open for the future.

The app mounting functionality of Express.js is incredibly robust: using the same app.use() syntax as middleware, you can app.use(anotherApp), or even app.use('/path', anotherApp) to load a whole app at a sub-path. (Then the sub-app's routes all change relative to that "mount point".)

Of course in practice, mounting multiple apps is extremely complex. It's also not the most stable approach: a fatal error in any part of the suite will bring down the whole thing. So on a suite of "real" production apps, I wouldn't necessarily recommend the mounting model, but it's useful to understand. And when it works, it's very elegant.

Coming soon

Over the next few weeks, I'll be writing a series of blog posts about specific lessons learned from building these apps. In the meantime, I hope some people make good use of them, and please report bugs if you find any.

Next I'm going to write about Why Node.js? - why you, perhaps a potential client, or perhaps another developer, should consider building your next app in Node.

Nov 29 '11 2:06pm

Parse Drupal watchdog logs in syslog (using node.js script)

Drupal has the option of outputting its watchdog logs to syslog, the file-based core Unix logging mechanism. The log in most cases lives at /var/log/messages, and Drupal's logs get mixed in with all the others, so you need to cat /var/log/messages | grep drupal to filter.

But then you still have a big text file that's hard to parse. This is probably a "solved problem" many times over, but recently I had to parse the file specifically for 404'd URLs, and decided to do it (partly out of convenience but mostly to learn how) using Node.js (as a scripting language). Javascript is much easier than Bash at simple text parsing.

I put the code in a Gist, node.js script to parse Drupal logs in linux syslog (and find distinct 404'd URLs). The last few lines of URL filtering can be changed to any other specific use case you might have for reading the logs out of syslog. (This could also be used for reading non-Drupal syslogs, but the mapping applies keys like "URL" which wouldn't apply then.)

Note the comment at the top: to run it you'll need node.js and 2 NPM modules as dependencies. Then take your filtered log (using the grep method above) and pass it as a parameter, and read the output on screen or output with > to another file.
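
For readers who want the gist without opening the Gist, here is a dependency-free sketch of the same idea. It is not the script itself. It assumes Drupal's default pipe-separated syslog format (base URL, timestamp, type, IP, request URI, referer, uid, link, message), the default "drupal" syslog identity, and that 404s are logged with the type "page not found"; the function names are made up.

```javascript
// Field order assumed from Drupal's default syslog format.
const FIELDS = ['base_url', 'timestamp', 'type', 'ip', 'request_uri',
                'referer', 'uid', 'link', 'message'];

// Parses one syslog line into an object, or returns null if it is not a Drupal entry.
function parseLine(line) {
  const marker = 'drupal: ';
  const start = line.indexOf(marker);
  if (start === -1) return null;
  const parts = line.slice(start + marker.length).split('|');
  if (parts.length < FIELDS.length) return null;
  const entry = {};
  FIELDS.forEach(function (name, i) { entry[name] = parts[i]; });
  // The message comes last and may itself contain pipes, so rejoin the remainder.
  entry.message = parts.slice(FIELDS.length - 1).join('|');
  return entry;
}

// Returns the distinct request URLs that were logged as "page not found".
function distinct404s(lines) {
  const urls = {};
  lines.forEach(function (line) {
    const entry = parseLine(line);
    if (entry && entry.type === 'page not found') urls[entry.request_uri] = true;
  });
  return Object.keys(urls);
}
```

Feeding it a file is a matter of reading the filtered log, splitting on newlines, and passing the array to distinct404s.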

Nov 25 '11 12:30pm

Generate pager (previous/next) links for an old Blogspot blog using Node.js

I have an old Blogspot blog that still gets a lot of traffic, but it was very hard to navigate without links from post to post. The template uses an old version of their templating language, and doesn’t have any tags available to generate pager links within Blogger.

So I wrote a node app called Blogger Pager (code on Github) to generate the links, loaded client-side via AJAX.


How it works

  1. Export your blog from Blogspot using the Export functionality. You’ll get a big XML file.
  2. Check out this code on a server with node.js installed.
  3. Put the exported XML file into the root of this app, as blog-export.xml; or change the path in app.js.
  4. Run the app (node app.js, or with forever).
  5. The module in posts.js will parse the XML file and generate an in-memory array of all the post URLs and titles. (Uses the xml2js library, after trying 3 others that didn’t work as well/easily.)
  6. The module in server.js will respond to HTTP requests (by default on port 3003, set in server.js):
    • /pager handles JSONP requests with a ?url parameter, returning a JSON object of the surrounding posts.
    • /posts returns an HTML page of all the parsed posts.
  7. The client-side script depends on jQuery, so make sure your blog template is loading that:
    • e.g. <script src='//ajax.googleapis.com/ajax/libs/jquery/1.X.X/jquery.min.js'></script>
  8. In your blog template, load the client-side script in this app, exposed at /js/blog-pager-client.js.
  9. Change the URL (var url…) in the client-side script to the URL of your node app.
  10. Save the template, load a post page. (To debug, comment out the return in bloggerPagerLog() and open the browser console.)
  11. Customize the generated HTML in the client-side addPagerForPost() function or style with CSS.
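
The lookup behind /pager (step 6) boils down to finding a post's neighbours in the in-memory array and wrapping the answer for JSONP. The following is a sketch of that logic, not the code from the repository; the function names and the `{ url, title }` shape are assumptions.

```javascript
// posts: array of { url, title } in chronological order (as parsed from the export).
// Returns the neighbours of the post at `url`, or null if the URL is unknown.
function surroundingPosts(posts, url) {
  const index = posts.findIndex(function (post) { return post.url === url; });
  if (index === -1) return null;
  return {
    previous: index > 0 ? posts[index - 1] : null,
    next: index < posts.length - 1 ? posts[index + 1] : null
  };
}

// Wraps a payload in a JSONP callback, restricting the callback name to safe characters.
function toJsonp(callbackName, payload) {
  if (!/^[\w$.]+$/.test(callbackName)) throw new Error('Invalid callback name');
  return callbackName + '(' + JSON.stringify(payload) + ');';
}
```

Validating the callback name matters because it is echoed back into a script that the browser executes.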


Known Limitations

  1. Only works with a blog export; if your blog is still getting new content, this won’t read the RSS.


Enjoy!
https://github.com/newleafdigital/blogger_pager

Oct 16 '11 10:24pm

Exploring the node.js frontier

I have spent much of the last few weeks learning and coding in Node.js, and I'd like to share some of my impressions and lessons-learned for others starting out. If you're not familiar yet, Node.js is a framework for building server-side applications with asynchronous javascript. It's only two years old, but already has a vast ecosystem of plug-in "modules" and higher-level frameworks built on top of it.

My first application is a simple web app for learning Spanish using flashcards. The code is open on Github. The app utilizes basic CRUD (Create-Retrieve-Update-Delete) functionality (of "Words" in this case), form handling, authentication, input validation, and an end-user interface - i.e. the basic components of a web app. I'm using MongoDB for the database and Express.js (which sits on top of Connect, on top of Node) as the web framework. For templating I learned Jade, and for easier CSS I'm using LessCSS.

In the process of building it, I encountered numerous challenges and questions, some solved and many still open; found some great resources; and started to train my brain to think of server-side code asynchronously.

Node is a blank slate

Node "out of the box" isn't a web server like Apache; it's more of a language, like Ruby. You start with a blank slate, on top of which you can code a daemon, an IRC server, a process manager, or a blog - there's no automatic handling of virtualhosts, requests, responses, webroots, or any of the components that a LAMP stack (for example) assumes you want. The node community is building infrastructural components that can be dropped in, and I expect that the more I delve into the ecosystem, the more familiar I'll become with those components. At its core, however, Node is simply an API for asynchronous I/O methods.

No more linear flow

I'm used to coding in PHP, which involves linear instructions, each of them "blocking." Take this linear pseudocode snippet for CRUD operations on a "word" object, for example:

if (new word) {
  render an empty form
}
else if (editing existing word) {
  load the word
  populate the form
  render the form
}
else if (deleting existing word) {
  delete the word
  redirect back to list
}

This is easy to do with "blocking" code. Functions return values, discrete input-output functions can be reused in multiple situations, the returned values can be evaluated, each step follows from the previous one. This is convenient but limits performance: in a high-traffic PHP-MySql application, this flow takes up a server process, and if the database is responding slowly under the load, the whole process waits; concurrent processes quickly hog all the server's memory, and a bottleneck in one part of the stack stalls the whole application. In node, the rest of the operations in the "event loop" continue to run, waiting patiently for the database (or any other I/O) callback to respond.

Coding that way is not so easy, however. If you try to load the word, for instance, you run the query with an asynchronous callback. There is no return statement on the query function. The rest of the code has to be nested inside that callback, or else the code will keep running and will never get the response. So that bit would look more like this:

load the word ( function(word) {
  populate the form
  render the form
});

But deeply nested code isn't as intuitive as linear code, and it can make function portability very difficult. Suppose you have to run 10 database queries to populate the form - nesting them all inside each other gets very messy, and what if the logic needs to be more conditional, requiring a different nesting order in different cases?
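
One common way around the nesting, when the queries don't depend on each other, is to start them all at once and collect the results with a counter. The sketch below is hand-rolled for illustration and is not the API of any particular library; `parallel` and the stand-in `loadWord` query are hypothetical names.

```javascript
// Runs every task at once and calls done(err, results) exactly once:
// with the first error, or with all results in task order.
function parallel(tasks, done) {
  const results = new Array(tasks.length);
  let pending = tasks.length;
  let finished = false;
  if (pending === 0) return done(null, results);

  tasks.forEach(function (task, i) {
    task(function (err, result) {
      if (finished) return;                       // ignore callbacks after an error
      if (err) { finished = true; return done(err); }
      results[i] = result;
      pending -= 1;
      if (pending === 0) { finished = true; done(null, results); }
    });
  });
}

// Stand-in for a database query (hypothetical); responds on a later tick.
function loadWord(id, callback) {
  setImmediate(function () { callback(null, { id: id, spanish: 'palabra ' + id }); });
}

// Usage: two "queries", one flat callback instead of two nested ones.
parallel([
  function (cb) { loadWord(1, cb); },
  function (cb) { loadWord(2, cb); }
], function (err, words) {
  if (err) return console.error(err);
  console.log('loaded', words.length, 'words; populate and render the form here');
});
```

The code stays one level deep no matter how many queries are added, at the cost of not being able to use one query's result as input to the next.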

There are ways of handling these problems, of course, but I'm just starting to learn them. In the case of the simple "load the word" scenario, Express offers the app.param construct, which parses parameters in the URL before executing the route callback. So the :word token tells the app to load a word with a given ID into the request object, then it renders the form.

No more ignoring POST and GET

In PHP, if there's a form on a page, the same piece of code processes the page whether its HTTP method is POST or GET. The $_REQUEST array even combines their parameters. Express doesn't like that, however - there is an app.all() construct that ignores the method, but the framework seems to prefer separate app.get() and app.post() routing. (There's apparently some controversy/confusion over the additional method PUT, but I steered clear of that for now.)

Back to the "word form" scenario: I load the form with GET, but submit the form with POST. That's two routes with essentially duplicate code. I could simply save an entry on POST, or render the form with GET - but what if I want to validate the form, then it needs to render the form when a POST fails validation - so it quickly becomes a mess. Express offers some solutions for this - res.redirect('back') goes back to the previous URL - but that seems like a hack that doesn't suit every situation. You can see how I handled this here, but I haven't yet figured out the best general approach to this problem.
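
One way to avoid duplicating the logic across the two routes is to move the decision into a plain function that both routes call. This is a sketch of that idea rather than what the Flashcards app does; the function names, the field names, and the `/words` redirect target are assumptions.

```javascript
// Returns a list of validation errors for a submitted word (empty if valid).
function validateWord(fields) {
  const errors = [];
  if (!fields.english || !fields.english.trim()) errors.push('English is required');
  if (!fields.spanish || !fields.spanish.trim()) errors.push('Spanish is required');
  return errors;
}

// Decides what to do for a request to the word form, independent of the framework.
// GET renders the form; POST validates, then either re-renders the form with the
// errors and the submitted values, or asks the caller to save and redirect.
function wordFormAction(method, fields) {
  fields = fields || {};
  if (method === 'GET') return { action: 'render', values: fields, errors: [] };
  if (method === 'POST') {
    const errors = validateWord(fields);
    if (errors.length) return { action: 'render', values: fields, errors: errors };
    return { action: 'save', values: fields, redirect: '/words' };
  }
  return { action: 'reject', status: 405 };
}
```

The app.get() and app.post() routes then shrink to calling wordFormAction(req.method, req.body) and acting on the result, so a failed POST re-renders the same form without a redirect.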

New code needs a restart

In a PHP application, you can edit or deploy the code directly to the webroot, and as soon as it's saved, the next request uses it. With node, however, the javascript is loaded into memory when the app is run using the node command, and it runs the same code until the application is restarted. In its simplest use, this involves a Ctrl+C to stop and node app.js to restart. There are several pitfalls here:

  • Sessions (and any other in-app memory items) are lost every time you restart. So anyone using your app is suddenly logged out. For sessions, this is resolved with a database or other external session store; I can imagine other scenarios where this would be more challenging.
  • An uncaught runtime bug can crash the app, and if it's running autonomously on a server, there's nothing built-in to keep it running. One approach to this is a process manager; I'm using forever, which was built especially for node, to keep processes running and restart them easily when I deploy new code. Others have built tools within Node that abstract an individual app's process through a separate process-managing app.

When should the database connect?

Node's architectural philosophy suggests that nothing should be loaded until it's needed. A database connection might not be needed on an empty form, for instance - so it makes sense to open a database connection per request, and only when needed. I tried this approach first, using a "route middleware" function to connect on certain requests, and separated the database handling into its own module. That failed when I wanted to keep track of session IDs with MongoDB (using connect-mongo) - because a database connection is then needed on every request, and the examples all opened a connection at the top of the app, in the global scope. I switched to the latter approach, but I'm not sure which way is better.
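
A middle ground between the two approaches is to connect lazily but only once: the first caller that needs the database opens the connection, and everyone after that reuses it. The sketch below is driver-agnostic. `lazyConnection` is a hypothetical helper, and `connect` stands for whatever function your driver provides, assumed here to take a callback(error, connection).

```javascript
// Wraps a connect(callback) function so the connection is opened on first use
// and shared afterwards. Callers that arrive while it is opening are queued.
function lazyConnection(connect) {
  let db = null;
  let waiting = null;   // callbacks queued while a connection attempt is in flight

  return function getDb(callback) {
    if (db) return callback(null, db);
    if (waiting) { waiting.push(callback); return; }
    waiting = [callback];
    connect(function (err, connection) {
      const queue = waiting;
      waiting = null;              // allows a retry after a failed attempt
      if (!err) db = connection;
      queue.forEach(function (cb) { cb(err, err ? undefined : connection); });
    });
  };
}
```

Routes that never touch the database never trigger a connection, while a session store that needs one on every request simply triggers it on the first request.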

Javascript can get very complicated

  • As logic flows through nested callbacks, variable scope is constantly changing. var and this have to be watched very carefully.
  • Writing functions that work portably across use cases without simple return statements is tricky. (One nice Node convention that covers many of these scenarios is the callback(error, result) concept, allowing calling functions to know if the result came back successfully in a standard way.)
  • Passing logic flow across node's "modules" is also tricky. Closures are helpful here, passing the app object to route modules, for instance. But in many cases, it wasn't clear how to divide the code in a way that was simultaneously logical, preserved variable scope, and worked portably with callbacks.
  • Everything - functions, arrays, classes - is an object. Class inheritance is done by instantiating another class/object and then modifying the new object's prototype. The same object can have the equivalent of "static" functions (by assigning them directly to the object) or instantiated methods (by assigning them to prototype). It's easy to get confused.
  • Javascript is a little clunky with handling empty values. The standard approach still seems to be if (typeof x == "undefined") which, at the very least, is a lot of code to express if (x). I used Underscore.js to help with this and other basic object manipulation shortcuts.
  • Because Express processes the request until there's a clear end to the response, and because everything is asynchronous, it's easy to miss a scenario in the flow where something unpredictable happens, no response is sent, and the client/user's browser simply hangs waiting for a response. I don't know if this is bad on the node side - the hanging request probably uses very little resources, since it's not actively doing anything - but it means the code has to handle a lot of possible error scenarios. Unlike in blocking code, you can't just put a catch-all else at the end of the flow to handle the unknown.
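
To make the point about "static" functions, prototype methods, and inheritance concrete, here is a small sketch using made-up `Card` and `VerbCard` constructors. Instead of instantiating the parent, it uses Object.create, a common variant that sets up the same prototype chain without running the parent constructor.

```javascript
// Constructor: per-instance data is assigned to `this`.
function Card(front, back) {
  this.front = front;
  this.back = back;
}

// Instance method: shared by all cards through the prototype.
Card.prototype.flip = function () {
  return new Card(this.back, this.front);
};

// "Static" function: attached to the constructor itself, not to instances.
Card.fromPair = function (pair) {
  return new Card(pair[0], pair[1]);
};

// Inheritance: VerbCard's prototype is an object that delegates to Card.prototype.
function VerbCard(front, back, tense) {
  Card.call(this, front, back);   // run the parent constructor on this instance
  this.tense = tense;
}
VerbCard.prototype = Object.create(Card.prototype);
VerbCard.prototype.constructor = VerbCard;

// Method that exists only on the subclass.
VerbCard.prototype.label = function () {
  return this.front + ' (' + this.tense + ')';
};
```

A VerbCard instance can call flip() because the lookup falls through to Card.prototype, but it has no fromPair, since that function lives on the Card constructor rather than on the prototype chain.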

What my Flashcards app does now

The Spanish Flashcards app currently allows words (with English, Spanish, and part of speech) to be entered, shown in a list, put into groups, and randomly cycled with only one side shown, as a flashcard quiz.
The app also integrates with the WordReference API to look up a new word and enter it - however, as of now, there's a bug in the English-Spanish API that prevents definitions from being returned. So I tested it using the English-French dictionary, and hope they'll fix the Spanish one soon.
It's built now to require login, with a single password set in a plain-text config.js file.

Next Steps for the app

I'd like to build out the flashcard game piece, so it remembers what words have been played, lets the player indicate if he got the answer right or wrong, and prioritizes previously-wrong or unseen words over ones that the player already knows.

Where I want to go with Node.js

I've been working primarily with Drupal for several years, and I want to diversify for a number of reasons: I've become very frustrated with the direction of Drupal core development, and don't want all my eggs in that basket. Web applications are increasingly requiring real-time, high-concurrency, noSQL infrastructure, which Node is well-suited for and my LAMP/Drupal skillset is not. And maybe most importantly, I find the whole architecture of Node to be fascinating and exciting, and the open-source ecosystem around it is growing organically, extremely fast, and seemingly without top-down direction.

Some resources that helped me

(All of these and many more are in my node.js tag on Delicious.)

  • Nodejitsu docs - tutorials on conventions and some best practices.
  • Victor Kane's node intro and Lit app, which taught me a lot about CRUD best practices.
  • The API documentation for node.js, connect, and express.js.
  • HowToNode - seems like a generally good node resource/blog.
  • NPM, the Node Package Manager, is critical for sharing portable components, and serves as a central directory of Node modules.
  • 2009 talk by Ryan Dahl, the creator of Node.js, introducing the framework.
  • Forms and express-form, two libraries for handling form rendering and/or validation. (I tried the former and decided not to use it, but they try to simplify a very basic problem.)

Check out the code for my Spanish Flashcards app, and if you're into Node yourself and want to learn more of it together, drop me a line!

Oct 26 '10 12:22am

Node.js tutorial

This is a wonderful tutorial: Learning Server-Side JavaScript with Node.js

Oct 25 '10 9:49pm

DevSeed's Data Browser built on Node.js and Mongo

It's a beautiful thing to see Development Seed, one of the most development-intensive Drupal shops, branching into Node.js and MongoDB, entirely away from Drupal.

In fact, I'm going to set up a Node.js server right now.