Drupal posts by Ben Buckman @ New Leaf Digital http://benbuckman.net/feed en Understanding MapReduce in MongoDB, with Node.js, PHP (and Drupal) http://benbuckman.net/tech/12/06/understanding-mapreduce-mongodb-nodejs-php-and-drupal <p>MongoDB's <a href="http://www.mongodb.org/display/DOCS/Advanced+Queries">query language</a> is good at extracting whole documents or whole elements of a document, but on its own it can't <a href="https://jira.mongodb.org/browse/SERVER-828">pull specific items</a> from deeply embedded arrays, or calculate relationships between data points, or calculate aggregates. To do that, MongoDB uses an <a href="http://www.mongodb.org/display/DOCS/MapReduce">implementation</a> of the <a href="http://en.wikipedia.org/wiki/MapReduce">MapReduce</a> methodology to iterate over the dataset and extract the desired data points. Unlike SQL <code>joins</code> in relational databases, which essentially create a massive combined dataset and then extract pieces of it, MapReduce <em>iterates</em> over each document in the set, "reducing" the data piecemeal to the desired results. The <a href="http://en.wikipedia.org/wiki/MapReduce">name</a> was popularized by Google, which needed to scale beyond SQL to index the web. Imagine trying to build the data structure for Facebook, with near-instantaneous calculation of the significance of every friend's friend's friend's posts, with SQL, and you see why MapReduce makes sense.</p> <p>I've been using MongoDB for two years, but only started using MapReduce heavily in the last few months. MongoDB is also introducing a new <a href="http://www.mongodb.org/display/DOCS/Aggregation">Aggregation</a> framework in 2.1 that is supposed to simplify many operations that previously needed MapReduce. 
However, the latest <a href="http://www.mongodb.org/downloads">stable release</a> as of this writing is still 2.0.6, so Aggregation isn't officially ready for prime time (and I haven't used it yet).</p> <p>This post is not meant to substitute for the copious <a href="http://www.mongodb.org/display/DOCS/MapReduce">documentation</a> and examples you can find across the web. After reading those, it still took me some time to wrap my head around the concepts, so I want to try to explain those as I came to understand them.</p> <h2>The Steps</h2> <p>A MapReduce operation consists of a <code>map</code>, a <code>reduce</code>, and optionally a <code>finalize</code> function. Key to understanding MapReduce is understanding what each of these functions iterates over.</p> <h3>Map</h3> <p>First, <code>map</code> runs for every document retrieved in the initial query passed to the operation. If you have 1000 documents and pass an empty query object, it will run 1000 times.</p> <p>Inside your <code>map</code> function, you <code>emit</code> a key-value pair, where the key is whatever you want to group by (_id, author, category, etc), and the value contains whatever pieces of the document you want to pass along. The function doesn't <code>return</code> anything, because you can <code>emit</code> multiple key-values per <code>map</code>, but a function can only <code>return</code> 1 result.</p> <p>The purpose of <code>map</code> is to extract small pieces of data from each document. For example, if you're counting articles per author, you could emit the author as the key and the number 1 as the value, to be summed in the next step.</p> <h3>Reduce</h3> <p>The <code>reduce</code> function then receives each key emitted from <code>map</code>, along with an array of all the values emitted for that key. Its purpose is to reduce multiple values-per-key to a single value-per-key. 
At the end of each iteration of your <code>reduce</code> function, you <code>return</code> (not <code>emit</code> this time) a single variable.</p> <p>The number of times <code>reduce</code> runs for a given operation isn't easy to predict. (I asked about it on <a href="http://stackoverflow.com/questions/11121299/mapreduce-with-mongodb-how-many-times-does-reduce-run">Stack Overflow</a> and the consensus so far is, there's no simple formula.) Essentially <code>reduce</code> runs as many times as it needs to, until each key appears only once. If you emit each key only once, reduce never runs. If you emit most keys once but one special key twice, reduce will run <em>once</em>, getting <code>(special key, [ value, value ])</code>.</p> <p>A rule of thumb with <code>reduce</code> is that the returned value's structure has to be the same as the structure emitted from <code>map</code>. If you emit an object as the value from <code>map</code>, every key in that object has to be present in the object returned from <code>reduce</code>, and vice-versa. If you return an integer from <code>map</code>, return an integer from <code>reduce</code>, and so on. The basic reason is that (as noted above), <code>reduce</code> shouldn't be necessary if a key only appears once. The results of an entire map-reduce operation, run back through the same operation, should return the same results (that way huge operations can be sharded and map/reduced many times). And the output of any given <code>reduce</code> function, plugged back into <code>reduce</code> (as a single-item array), <em>needs to return the same value as went in</em>. (In CS lingo, <code>reduce</code> has to be idempotent. The documentation <a href="http://www.mongodb.org/display/DOCS/MapReduce#MapReduce-ReduceFunction">explains</a> this in more technical detail.)</p> <p>Here's a simple JS test, using Node.js' <a href="http://nodejs.org/docs/latest/api/assert.html">assertion API</a>, to verify this. 
To use it, have your mapReduce operation export its methods for a separate test script to import and test:</p> <div class="geshifilter"><pre class="javascript geshifilter-javascript">var assert = require('assert');

// this should export the map, reduce, [finalize] functions passed to MongoDB.
var mr = require('./mapreduce-query');

// override emit() to capture locally
var emitted = [];

// (in global scope so map can access it)
global.emit = function(key, val) {
  emitted.push({key: key, value: val});
};

// reduce input should be same as output for a single object
// dummyItems can be fake or loaded from DB
mr.map.call(dummyItems[0]);

var reduceRes = mr.reduce(emitted[0].key, [ emitted[0].value ]);
assert.deepEqual(reduceRes, emitted[0].value, 'reduce is idempotent');</pre></div> <p>A simple MapReduce example is to count the number of posts per author. So in <code>map</code> you could <code>emit('author name', 1)</code> for each document, then in <code>reduce</code> loop over each value and add it to a total. Make sure <code>reduce</code> is adding the actual number in the value, not just 1, because that won't be idempotent. Similarly, you can't just <code>return values.length</code> and assume each value represents 1 document.</p> <h3>Finalize</h3> <p>Now you have a single reduced value per key, which gets run through the <code>finalize</code> function once per key.</p> <p>To understand <code>finalize</code>, consider that this is essentially the same as not having a <code>finalize</code> function at all:</p> <div class="geshifilter"><pre class="javascript geshifilter-javascript">var finalize = function(key, value) {
  return value;
};</pre></div> <p><code>finalize</code> is not necessary in every MapReduce operation, but it's very useful, for example, for calculating averages. 
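</p>

<p>For instance, averaging a numeric field per key might look like the following sketch (the <code>pages</code> and <code>author</code> field names are made up for illustration): <code>reduce</code> only accumulates a sum and a count, and <code>finalize</code> does the division:</p>

```javascript
var map = function() {
  // runs in MongoDB with each document as `this`
  emit(this.author, { sum: this.pages, count: 1 });
};

var reduce = function(key, values) {
  var result = { sum: 0, count: 0 };
  values.forEach(function(v) {
    result.sum += v.sum;     // add the accumulated sums, not 1 per value,
    result.count += v.count; // so re-reducing reduced output changes nothing
  });
  return result;
};

var finalize = function(key, value) {
  return value.sum / value.count;
};
```

<p>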
You can't calculate the average in <code>reduce</code> because it can run multiple times per key, so each iteration doesn't have enough data to calculate with.</p> <p>The final results returned from the operation will have one value per key, as returned from <code>finalize</code> if it exists, or from <code>reduce</code> if <code>finalize</code> doesn't exist.</p> <h2>MapReduce in PHP and Drupal</h2> <p>The <a href="http://php.net/manual/en/class.mongodb.php">MongoDB library for PHP</a> does not include any special functions for MapReduce. MapReduce operations can be run as a generic <a href="http://www.php.net/manual/en/mongodb.command.php"><code>command</code></a>, but that takes a lot of code. I found a <a href="https://github.com/infynyxx/MongoDB-MapReduce-PHP">MongoDB-MapReduce-PHP</a> library on Github which makes it easier. It works, but hasn't been updated in two years, so I forked the library and created <a href="https://github.com/newleafdigital/MongoDB-MapReduce-PHP">my own version</a> with what I think are some improvements.</p> <p>The original library by <a href="https://github.com/infynyxx">infynyxx</a> created an abstract class <code>XMongoCollection</code> that was meant to be <a href="https://github.com/infynyxx/MongoDB-MapReduce-PHP/blob/master/examples/animal_tags2.php">sub-classed</a> for every collection. I found it more useful to make <code>XMongoCollection</code> directly instantiable, as an extended <em>replacement</em> for the basic <code>MongoCollection</code> class. I added a <code>mapReduceData</code> method which returns the data from the MapReduce operation. For my Drupal application, I added a <code>mapReduceDrupal</code> method which wraps the results and error handling in Drupal API functions.</p> <p>I could then load every collection with <code>XMongoCollection</code> and run <code>mapReduce</code> operations on it directly, like any other query. Note that the actual functions passed to MongoDB are still written in JavaScript. 
For example:</p> <div class="geshifilter"><pre class="php geshifilter-php">// (this should be statically cached in a separate function)
$mongo = new Mongo($server_name); // connection
$mongodb = $mongo-&gt;selectDB($db_name); // MongoDB instance

// use the new XMongoCollection class. make it available with an __autoloader.
$collection = new XMongoCollection($mongodb, $collection_name);

$map = &lt;&lt;&lt;MAP
function() {
  // doc is 'this'
  emit(this.category, 1);
}
MAP;

$reduce = &lt;&lt;&lt;REDUCE
function(key, vals) {
  // have `variable` here passed in `setScope`
  return something;
}
REDUCE;

$mr = new MongoMapReduce($map, $reduce, array( /* limit initial document set with a query here */ ));

// optionally pass variables to the functions. (e.g. to apply user-specified filters)
$mr-&gt;setScope(array('variable' =&gt; $variable));

// 2nd param becomes the temporary collection name, so tmp_mapreduce_example.
// (This is a little messy and could be improved. Stated limitation of v1.8+ not supporting &quot;inline&quot; results is not entirely clear.)
// 3rd param is $collapse_value, see code
$result = $collection-&gt;mapReduceData($mr, 'example', FALSE);</pre></div> <h2>MapReduce in Node.js</h2> <p>The <a href="https://github.com/mongodb/node-mongodb-native">MongoDB-Native driver for Node.js</a>, now an official 10Gen-sponsored project, includes a <a href="https://github.com/mongodb/node-mongodb-native/blob/master/lib/mongodb/collection.js#L1040"><code>collection.mapReduce()</code> method</a>. 
The syntax is like this:</p> <div class="geshifilter"><pre class="javascript geshifilter-javascript">var db = new mongodb.Db(dbName, new mongodb.Server(mongoHost, mongoPort, {}));
db.open(function(error, dbClient) {
  if (error) throw error;
  dbClient.collection(collectionName, function(err, collection) {
    collection.mapReduce(map, reduce,
      {
        out: { inline: 1 },
        query: { ... },      // limit the initial set (optional)
        finalize: finalize,  // function (optional)
        verbose: true        // include stats
      },
      function(error, results, stats) {
        // stats provided by verbose
        // ...
      });
  });
});</pre></div> <p>It's mostly similar to the <a href="http://www.mongodb.org/display/DOCS/MapReduce">command-line syntax</a>, except in the CLI, the results are <em>returned</em> from the <code>mapReduce</code> function, while in Node.js they are passed (asynchronously) to the callback.</p> <h3>MapReduce in Mongoose</h3> <p><a href="http://mongoosejs.com/">Mongoose</a> is a modeling layer on top of the MongoDB-native Node.js driver, and in the latest 2.x release does not have its own support for MapReduce. (It's supposed to be coming in 3.x.) 
But the underlying collection is still available:</p> <div class="geshifilter"><pre class="javascript geshifilter-javascript">var db = mongoose.connect('mongodb://dbHost/dbName');
// (db.connection.db is the native MongoDB driver)

// build a model (`Book` is a schema object)
// model is called 'Book' but collection is 'books'
mongoose.model('Book', Book, 'books');

...

var Book = db.model('Book');
Book.collection.mapReduce(...);</pre></div> <p>(I actually think this is a case of Mongoose being <em>better</em> without its own abstraction on top of the existing driver, so I hope the new release doesn't make it more complex.)</p> <h2>In sum</h2> <p>I initially found MapReduce very confusing, so hopefully this helps clarify rather than increase the confusion. 
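</p>

<p>One thing that helped me: the whole pipeline can be dry-run in plain JavaScript, with no MongoDB involved, by grouping emitted pairs by key and feeding each group through <code>reduce</code> and <code>finalize</code> by hand. (This is a rough simulation of the posts-per-author example, not how MongoDB actually batches its reduces.)</p>

```javascript
// simulate a MapReduce run over plain JS objects: count posts per author
var docs = [
  { author: 'ben', title: 'a' },
  { author: 'ben', title: 'b' },
  { author: 'moshe', title: 'c' }
];

var emitted = {}; // key -> array of emitted values
var emit = function(key, value) {
  (emitted[key] = emitted[key] || []).push(value);
};

var map = function() { emit(this.author, 1); };

var reduce = function(key, values) {
  var total = 0;
  values.forEach(function(v) { total += v; }); // sum the actual values, so it stays idempotent
  return total;
};

var finalize = function(key, value) { return value; };

// run map once per document, then reduce+finalize once per key
var results = {};
docs.forEach(function(doc) { map.call(doc); });
Object.keys(emitted).forEach(function(key) {
  results[key] = finalize(key, reduce(key, emitted[key]));
});

console.log(results); // { ben: 2, moshe: 1 }
```

<p>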
Please write in the comments below if I've misstated or mixed up anything above.</p> http://benbuckman.net/tech/12/06/understanding-mapreduce-mongodb-nodejs-php-and-drupal#comments drupal mongodb node.js php Wed, 20 Jun 2012 19:06:53 +0000 ben 7884 at http://benbuckman.net Liberate your Drupal data for a service-oriented architecture (using Redis, Node.js, and MongoDB) http://benbuckman.net/tech/12/04/liberate-your-drupal-data-service-oriented-architecture-using-redis-nodejs-and-mongodb <p>Drupal's basic content unit is a "node," and to build a single node (or to perform any other Drupal activity), the codebase has to be bootstrapped, and everything needed to respond to the request (configuration, database and cache connections, etc) has to be initialized and loaded into memory from scratch. Then <span class="geshifilter"><code class="text geshifilter-text">node_load</code></span> runs through the NodeAPI hooks, multiple database queries are run, and the node is built into a single PHP object.</p> <p>This is fine if your web application runs entirely through Drupal, and always will, but what if you want to move toward a more flexible <a href="http://en.wikipedia.org/wiki/Service-oriented_architecture">Service-oriented architecture</a> (SOA), and share your content (and users) with other applications? For example, build a mobile app with a Node.js backend like <a href="http://venturebeat.com/2011/08/16/linkedin-node/">LinkedIn did</a>; or calculate analytics for business intelligence; or have customer service reps talk to your customers in real-time; or integrate with a ticketing system; or do anything else that doesn't play to Drupal's content-publishing strengths. Maybe you just want to make your data (which is the core of your business, not the server stack) technology-agnostic. 
Maybe you want to migrate a legacy Drupal application to a different system, but the cost of refactoring all the business logic is prohibitive; with an SOA you could change the calculation and get the best of both worlds.</p> <p>The traditional way of doing this was setting up a web service in Drupal using something like the <a href="http://drupal.org/project/services">Services module</a>. External applications could request data over HTTP, and Drupal would respond in JSON. Each request has to wait for Drupal to bootstrap, which uses a lot of memory (every enterprise Drupal site I've ever seen has been bogged down by legacy code that runs on every request), so it's slow and doesn't scale well. Rather than relieving some load from Drupal's LAMP stack by building a separate application, you're just adding more load to both apps. To spread the load, you have to keep adding PHP/Apache/Mysql instances horizontally. Every module added to Drupal compounds the latency of Drupal's hook architecture (running thousands of <span class="geshifilter"><code class="text geshifilter-text">function_exists</code></span> calls for example), so the stakeholders involved in changing the Drupal app have to include the users of every secondary application requesting the data. With a Drupal-Services approach, other apps will always be second-class citizens, dependent on the legacy system, not allowing the "loose coupling" principle of SOA.</p> <p>I've been shifting <a href="http://newleafdigital.com">my own work</a> from Drupal to <a href="http://nodejs.org">Node.js</a> over the last year, but I still have large Drupal applications (such as <a href="http://antiquesnearme.com">Antiques Near Me</a>) which can't be easily moved away, and frankly don't need to be for most use cases. 
Overall, I tend to think of Drupal as a legacy system, burdened by <a href="http://benbuckman.net/drupal-excessive-complexity">too much cruft</a> and inconsistent architecture, and no longer the best platform for most applications. I've been giving a lot of thought to ways to keep these apps future-proof without rebuilding all the parts that work well as-is.</p> <p>That led me to build what I've called the <strong>"Drupal Liberator"</strong>. It consists of a Drupal module and a Node.js app, and uses <a href="http://redis.io">Redis</a> (a very fast key-value store) for a middleman queue and <a href="http://www.mongodb.org">MongoDB</a> for the final storage. Here's how it works:</p> <ul> <li><p>When a node (or user, or other entity type) is saved in Drupal, the module encodes it to JSON (a cross-platform format that's also native to Node.js and MongoDB), and puts it, along with metadata (an md5 checksum of the JSON, timestamp, etc), into a Redis <a href="http://redis.io/topics/data-types">hash</a> (a simple key-value object, containing the metadata and the object as a JSON string). It also notifies a Redis <a href="http://redis.io/topics/pubsub">pub/sub channel</a> of the new hash key. (This uses 13KB of additional memory and 2ms of time for Drupal on the first node, and 1KB/1ms for subsequent node saves on the same request. If Redis is down, Drupal goes on as usual.)</p></li> <li><p>The Node.js app, running completely independently of Drupal, is listening to the pub/sub channel. When it's pinged with a hash key, it retrieves the hash, <span class="geshifilter"><code class="text geshifilter-text">JSON.parse</code></span>'s the string into a native object, possibly alters it a little (e.g., adding the checksum and timestamp into the object), and saves it into MongoDB (which also speaks JSON natively). The data type (node, user, etc) and other information in the metadata directs where it's saved. 
Under normal conditions, this whole process from <span class="geshifilter"><code class="text geshifilter-text">node_save</code></span> to MongoDB takes less than a second. If it were to bottleneck at some point in the flow, the Node.js app runs asynchronously, not blocking or straining Drupal in any way.</p></li> <li><p>For redundancy, the Node.js app also polls the hash namespace every few minutes. If any part of the mechanism breaks at any time, or to catch up when first installing it, the timestamp and checksum stored in each saved object allow the two systems to easily find the last synchronized item and continue synchronizing from there.</p></li> </ul> <p>The result is a read-only clone of the data, synchronized almost instantaneously with MongoDB. Individual nodes can be loaded without bootstrapping Drupal (or touching Apache-MySql-PHP at all), as fully-built objects. New apps utilizing the data can be built in any framework or language. The whole Drupal site could go down and the data needed for the other applications would still be usable. Complex queries (for node retrieval or aggregate statistics) that would otherwise require enormous SQL joins can be built using <a href="http://en.wikipedia.org/wiki/MapReduce">MapReduce</a> and run without affecting the Drupal database.</p> <p>One example of a simple use case this enables: Utilize the CMS backend to edit your content, but publish it using a thin MongoDB layer and client-side templates. (And outsource comments and other user-write interactions to a service like Disqus.) Suddenly your content displays much faster and under higher traffic with less server capacity, and you don't have to worry about <a href="http://benbuckman.net/tech/tag/varnish">Varnish</a> or your Drupal site being "<a href="http://en.wikipedia.org/wiki/Slashdot_effect">Slashdotted</a>".</p> <p>A few caveats worth mentioning: First, it's read-only. 
If a separate app wants to modify the data in any way (and maintain data integrity across systems), it has to communicate with Drupal, or a synchronization bridge has to be built in the other direction. (This could be the logical next step in developing this approach, and truly make Drupal a co-equal player in an SOA.)</p> <p>Second, you could have Drupal write to MongoDB directly and cut out the middlemen. (And indeed that might make more sense in a lot of cases.) But I built this with the premise of an already strained Drupal site, where adding another database connection would slow it down even further. This aims to put as little additional load on Drupal as possible, with the "Liberator" acting itself as an independent app.</p> <p>Third, if all you needed was instant node retrieval - for example, if your app could query MySql for node ID's, but didn't want to bootstrap Drupal to build the node objects - you could leave them in Redis and take Node.js and MongoDB out of the picture.</p> <p>I've just started exploring the potential of where this can go, so I've run this mostly as a proof-of-concept so far (successfully). I'm also not releasing the code at this stage: If you want to adopt this approach to evolve your Drupal system to a service-oriented architecture, I am <a href="http://newleafdigital.com">available as a consultant</a> to help you do so. I've started building separate apps in Node.js that tie into Drupal sites with <a href="http://en.wikipedia.org/wiki/Ajax_(programming)">Ajax</a> and found the speed and flexibility very liberating. There's also a world of non-Drupal developers who can help you leverage your data, if it could be easily liberated. 
I see this as opening a whole new set of doors for where legacy Drupal sites can go.</p> http://benbuckman.net/tech/12/04/liberate-your-drupal-data-service-oriented-architecture-using-redis-nodejs-and-mongodb#comments drupal mongodb node.js redis SOA Sun, 29 Apr 2012 15:06:20 +0000 ben 7767 at http://benbuckman.net Unconventional unit testing in Drupal 6 with PhpUnit, upal, and Jenkins http://benbuckman.net/tech/11/12/unconventional-unit-testing-drupal-6-phpunit-upal-and-jenkins <p>Unit testing in Drupal using the standard <a href="http://drupal.org/project/simpletest">SimpleTest</a> approach has long been one of my pain points with Drupal apps. The main obstacle was setting up a realistic test "sandbox": The SimpleTest module builds a virtual site with a temporary database (within the existing database), from scratch, for every test suite. To accurately test the complex interactions of a real application, you need dozens of modules enabled in the sandbox, and installing all their database schemas takes a long time. If your site's components are exported to <a href="http://drupal.org/project/features">Features</a>, the tests gain another level of complexity. You could have the test turn on every module that's enabled on the real site, but then each suite takes 10 minutes to run. And that still isn't enough; you also need settings from the variables table, content types, real nodes and users, etc.</p> <p>So until recently, it came down to the choice: make simple but unrealistic sandboxes that tested minutiae but not the big-picture interactions; or build massive sandboxes for each test that made the testing workflow impossible. 
After weeks of trying to get a SimpleTest environment working on a Drupal 6 application with a lot of custom code, and dozens of hours debugging the tests or the sandbox setups rather than building new functionality, I couldn't justify the time investment, and shelved the whole effort.</p> <p>Then Moshe Weizman pointed me to his alternate <a href="https://github.com/weitzman/upal">upal</a> project, which aims to bring the <a href="https://github.com/sebastianbergmann/phpunit/">PHPUnit</a> testing framework to Drupal, with backwards compatibility for SimpleTest assertions, but not the baggage of SimpleTest's Drupal implementation. Moshe <a href="https://acquia.com/upal">recently introduced upal</a> as a proposed testing framework for Drupal 8, especially for core. Separately, a few weeks ago, I started using upal for a different purpose: as a unit testing framework for custom applications in Drupal 6.</p> <p>I <a href="https://github.com/newleafdigital/upal">forked the Github repo</a>, started a backport to D6 (copying from SimpleTest-6 where upal was identical to SimpleTest-7), and fixed some of the holes. More importantly, I'm taking a very different approach to the testing sandbox: I've set up an entirely separate test site, copied wholesale from the dev site (which itself is copied from the production site). 
This means:</p> <ul> <li>I can visually check the test sandbox at any time, because it runs as a virtualhost just like the dev site.</li> <li>All the modules, settings, users, and content are in place for each test, and don't need to be created or torn down.</li> <li>Rebuilding the sandbox is a single operation (with shell scripts to sync MySQL, MongoDB, and files, manually triggered in Jenkins).</li> <li>Cleanup of test-created objects occurs (if desired) on a piecemeal basis in <code>tearDown()</code> - <code>drupalCreateNode()</code> (modified) and <code>drupalVariableSet()</code> (added) optionally undo their changes when the test ends.</li> <li><code>setUp()</code> is not needed for most tests at all.</li> <li><code>dumpContentsToFile()</code> (added) replicates SimpleTest's ability to save <code>curl</code>'d files, but on a piecemeal basis in the test code.</li> <li>Tests run <em>fast</em>, and accurately reflect the entirety of the site with all its actual interactions.</li> <li>Tests are run by the <a href="http://jenkins-ci.org/">Jenkins</a> continuous-integration tool and the results are visible in Jenkins using the JUnit XML format.</li> </ul> <h3>How to set it up (with Jenkins, aka Hudson)</h3> <p><em>(Note: the following are not comprehensive instructions, and assume familiarity with shell scripting and an existing installation of Jenkins.)</em></p> <ol> <li>Install upal from <a href="https://github.com/weitzman/upal">Moshe's repo</a> (D7) or <a href="https://github.com/newleafdigital/upal">mine</a> (D6). (Some of the details below I added recently, and apply only to the D6 fork.)</li> <li>Install PHPUnit. The <code>pear</code> approach is easiest.</li> <li>Upgrade drush: the notes say, "You currently need 'master' branch of drush after 2011.07.21.
Drush 4.6 will be OK - http://drupal.org/node/1105514" - this seems to correspond to the HEAD of the <code>7.x-4.x</code> branch in the <a href="http://drupal.org/project/drush/git-instructions">Drush repository</a>.</li> <li>Set up a webroot, database, virtualhost, DNS, etc. for your test sandbox, and any scripts you need to build/sync it.</li> <li>Configure phpunit.xml. Start with upal's readme, then (D6/fork only) add <code>DUMP_DIR</code> (if wanted), and if HTTP authentication to the test site is needed, <code>UPAL_HTTP_USER</code> and <code>UPAL_HTTP_PASS</code>. In my version I've split the DrupalTestCase class to its own file, and renamed drupal_test_case.php to upal.php, so rename the "bootstrap" parameter accordingly. (Note: the upal notes say it must run at a URL ending in /upal; this is no longer necessary with this approach.)</li> <li>PHPUnit expects the files to be named .php rather than .test; however, if you explicitly call an individual .test file (rather than traversing a directory, the approach I took), it might work. You can also remove the <code>getInfo()</code> functions from your SimpleTests, as they don't do anything anymore.</li> <li>If Jenkins is on a different server than the test site (as in my case), make sure Jenkins can SSH over.</li> <li>To use <code>dumpContentsToFile()</code> or the XML results, you'll want a dump directory (set in phpunit.xml), and your test script should wipe the directory before each run, and rsync the files to the build workspace afterwards.</li> <li>To convert PHPUnit's JUnit output to the format Jenkins understands, you'll need the <a href="https://wiki.jenkins-ci.org/display/JENKINS/xUnit+Plugin">xUnit</a> plugin for Jenkins. Then point the Jenkins job to read the XML file (after rsync'ing if running remotely). [Note: the last 3 steps have to be done with SimpleTest and Jenkins too.]</li> <li>Code any wrapper scripts around the above steps as needed.</li> <li>Write some tests!
(Consult the <a href="http://www.phpunit.de/manual/current/en/index.html">PHPUnit documentation</a>.)</li> <li>Run the tests!</li> </ol> <h3>Some issues I ran into (which you might also run into)</h3> <ol> <li>PHPUnit, unlike SimpleTest, stops a test function after the first failure. This isn't a bug, it's <a href="http://stackoverflow.com/questions/5651663/phpunit-multiple-assertions-in-a-single-test-only-first-failure-seen">expected behavior</a>, even with <code>--stop-on-failure</code> disabled. I'd prefer it the other way, but that's how it is.</li> <li>Make sure your test site - like any dev site - does not send any outbound mail to customers, run unnecessary feed imports, or otherwise perform operations not meant for a non-production site.</li> <li>In my case, Jenkins takes 15 minutes to restart (after installing xUnit for example). I don't know why, but keep an eye on the Jenkins log if it's taking you a while too.</li> <li>Also in my case, Jenkins runs behind an Apache reverse-proxy; in that case when Jenkins restarts, it's usually necessary to restart Apache, or else it gets stuck thinking the proxy endpoint is down.</li> <li>I ran into a bug with Jenkins stopping its shell script commands arbitrarily before the end. I worked around it by moving the whole job to a shell script on the Jenkins server (which in turn delegates to a script on the test/dev server).</li> </ol> <p>There is a pending <a href="https://github.com/weitzman/upal/pull/2">pull request</a> to pull some of the fixes and changes I made back into the original repo. In the pull request I've tried to separate what are merely fixes from what goes with the different test-site approach I've taken, but it's still a tricky merge. 
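</p> <p>To give a flavor of what these tests look like against the full-site sandbox, here's a hypothetical example (the content type and assertions are invented; <code>drupalCreateNode()</code> is the modified version described above):</p> <div class="geshifilter"> <pre class="text geshifilter-text">class ArticleVisibilityTest extends DrupalTestCase {
  public function testAnonymousSeesPublishedArticle() {
    // created in the real sandbox site; undone in tearDown()
    $node = $this->drupalCreateNode(array('type' => 'article', 'status' => 1));
    $this->drupalGet('node/' . $node->nid);
    $this->assertResponse(200);
    $this->assertText($node->title);
  }
}</pre></div> <p>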
Feel free to help there, or make your own fork with a separate test site for D7.</p> <p>I now have a working test environment with PHPUnit and upal, with all of the tests I wrote months ago working again (minus their enormous <code>setUp()</code> functions), and I've started writing tests for new code going forward. Success!</p> <p><em>(If you are looking for a professional implementation of any of the above, please <a href="http://newleafdigital.com/contact">contact me</a>.)</em></p> <p><em>Recent related post: <a href="https://benbuckman.net/tech/11/12/making-sense-varnish-caching-rules">Making sense of Varnish caching rules</a></em></p> http://benbuckman.net/tech/11/12/unconventional-unit-testing-drupal-6-phpunit-upal-and-jenkins#comments drupal jenkins phpunit simpletest Wed, 14 Dec 2011 21:53:31 +0000 ben 7480 at http://benbuckman.net Making sense of Varnish caching rules http://benbuckman.net/tech/11/12/making-sense-varnish-caching-rules <p><a href="https://www.varnish-cache.org">Varnish</a> is a reverse-proxy cache that allows a site with a heavy backend (such as a Drupal site) and mostly consistent content to handle very high traffic load. The &#8220;cache&#8221; part refers to Varnish storing the entire output of a page in its memory, and the &#8220;reverse proxy&#8221; part means it functions as its own server, sitting in front of Apache and passing requests back to Apache only when necessary.</p> <p>One of the challenges with implementing Varnish, however, is the complex &#8220;VCL&#8221; protocol it uses to process requests with custom logic. The <a href="https://www.varnish-cache.org/docs/3.0/reference/vcl.html#syntax">syntax</a> is unusual, the <a href="https://www.varnish-cache.org/docs">documentation</a> relies heavily on complex examples, and there don&#8217;t seem to be any books or other comprehensive resources on the software. 
A recent link on the project site to <a href="http://kristianlyng.wordpress.com/2011/12/01/varnish-training/">Varnish Training</a> is just a pitch for a paid course. Searching more specifically for Drupal + Varnish will bring up many good results - including <a href="http://www.lullabot.com/articles/varnish-multiple-web-servers-drupal">Lullabot&#8217;s fantastic tutorial</a> from April, and older examples for Mercury - but the latest stable release is now 3.x and many of the examples (written for 2.x) <a href="https://www.varnish-cache.org/docs/3.0/installation/upgrade.html">don&#8217;t work</a> as written anymore. So it takes a lot of trial and error to get it all working.</p> <p>I&#8217;ve been running Varnish on <a href="http://antiquesnearme.com">AntiquesNearMe.com</a>, partly to keep our hosting costs down by getting more power out of less [virtual] hardware. A side benefit is the site&#8217;s ability to respond very nicely if the backend Apache server ever goes down. They&#8217;re on separate VPS's (connected via internal private networking), and if the Apache server completely explodes from memory overload, or I simply need to upgrade a server-related package, Varnish will display a themed &#8220;We&#8217;re down for a little while&#8221; message.</p> <p>But it wasn&#8217;t until recently that I got Varnish&#8217;s primary function, caching, really tuned. I spent several days under the hood recently, and while I don&#8217;t want to rehash what&#8217;s already been well covered in <a href="http://www.lullabot.com/articles/varnish-multiple-web-servers-drupal">Lullabot&#8217;s tutorial</a>, here are some other things I learned:</p> <h3>Check syntax before restarting</h3> <p>After you update your VCL, you need to restart Varnish - using <span class="geshifilter"><code class="text geshifilter-text">sudo /etc/init.d/varnish restart</code></span> for instance - for the changes to take effect. If you have a syntax error, however, this will take down your site. 
So check the syntax first (change the path to your VCL as needed):<br /> <span class="geshifilter"><code class="text geshifilter-text">varnishd -C -f /etc/varnish/default.vcl &gt; /dev/null</code></span></p> <p>If there are errors, it will display them; if not, it shows nothing. Use that as a visual check before restarting. (Unfortunately the exit code of that command is always 0, so you can&#8217;t do check-then-restart as simply as <span class="geshifilter"><code class="text geshifilter-text">check-varnish-syntax &amp;&amp; /etc/init.d/varnish restart</code></span>, but you could <span class="geshifilter"><code class="text geshifilter-text">grep</code></span> the output for the words &#8220;exit 1&#8221; to accomplish the same.)</p> <h3>Logging</h3> <p>The <span class="geshifilter"><code class="text geshifilter-text">std.log</code></span> function allows you to generate arbitrary messages about Varnish&#8217;s processing. Add <span class="geshifilter"><code class="text geshifilter-text">import std;</code></span> at the top of your VCL file, and then <span class="geshifilter"><code class="text geshifilter-text">std.log(&quot;DEV: some useful message&quot;)</code></span> anywhere you want. The &#8220;DEV&#8221; prefix is an arbitrary way of differentiating your logs from all the others. 
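</p> <p>In context, the two pieces look like this (a minimal sketch):</p> <div class="geshifilter"> <pre class="text geshifilter-text">import std;

sub vcl_recv {
  std.log(&quot;DEV: Request to URL: &quot; + req.url);
  # ... the rest of your request logic ...
}</pre></div> <p>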
So you can then run in the shell, <span class="geshifilter"><code class="text geshifilter-text">varnishlog | grep &quot;DEV&quot;</code></span> and watch only the information you&#8217;ve chosen to see.</p> <p>How I use this:<br /> - At the top of <span class="geshifilter"><code class="text geshifilter-text">vcl_recv()</code></span> I put <span class="geshifilter"><code class="text geshifilter-text">std.log(&quot;DEV: Request to URL: &quot; + req.url);</code></span>, to put all the other logs in context.<br /> - When I <span class="geshifilter"><code class="text geshifilter-text">pipe</code></span> back to apache, I put <span class="geshifilter"><code class="text geshifilter-text">std.log(&quot;DEV: piping &quot; + req.url + &quot; straight back to apache&quot;);</code></span> before the <span class="geshifilter"><code class="text geshifilter-text">return (pipe);</code></span><br /> - On blocked URLs (cron, install), the same<br /> - On static files (images, JS, CSS), I put <span class="geshifilter"><code class="text geshifilter-text">std.log(&quot;DEV: Always caching &quot; + req.url);</code></span><br /> - To understand all the regex madness going on with cookies, I log <span class="geshifilter"><code class="text geshifilter-text">req.http.Cookie</code></span> at every step to see what&#8217;s changed.</p> <p>Plug some of these in, check the syntax, restart Varnish, run <span class="geshifilter"><code class="text geshifilter-text">varnishlog|grep PREFIX</code></span> as above, and watch as you hit a bunch of URLs in your browser. Varnish&#8217;s internal logic will quickly start making more sense.</p> <h3>Watch Varnish work with your browser</h3> <p><img src="http://dl.dropbox.com/u/3862873/varnish-headers-inspector.png" alt="Varnish headers in Chrome Inspector" style="float: right; margin: 0 0 1em 1em;"><br /> The Chrome/Safari Inspector and Firebug show the headers for every request made on a page. 
With Varnish running, look at the Response Headers for one of them: you&#8217;ll see &#8220;Via: Varnish&#8221; if the page was processed through Varnish, or &#8220;Server: Apache&#8221; if it went through Apache. (Using Chrome, for instance, log in to your Drupal site and the page should load via Apache (assuming you see page elements not available to anonymous users), then open an Incognito window and it should run through Varnish.)</p> <h3>Add hit/miss headers</h3> <p>When a page is supposed to be cached (not <span class="geshifilter"><code class="text geshifilter-text">pipe</code></span>'d immediately), Varnish checks if there is an existing hit or miss. To watch this in your Inspector, use this logic:</p> <div class="geshifilter"> <pre class="text geshifilter-text">sub vcl_deliver {
  std.log(&quot;DEV: Hits on &quot; + req.url + &quot;: &quot; + obj.hits);

  if (obj.hits &gt; 0) {
    set resp.http.X-Varnish-Cache = &quot;HIT&quot;;
  }
  else {
    set resp.http.X-Varnish-Cache = &quot;MISS&quot;;
  }

  return (deliver);
}</pre></div> <p>Then you can clear the caches, hit a page (using the browser technique above), see &#8220;via Varnish&#8221; and a MISS, hit it again, see a HIT (or not), and know if everything is working.</p> <h3>Clear Varnish when aggregated CSS+JS are rebuilt</h3> <p>If you have CSS/JS aggregation enabled (as recommended), your HTML source will reference long hash-string files. Varnish caches that HTML with the hash string. If you clear only those caches (&#8220;requisites&#8221; via Admin Menu or <span class="geshifilter"><code class="text geshifilter-text">cc css+js</code></span> via Drush), Varnish will still have the <em>old</em> references, but the files will have been deleted. Not good. You could simply never use that operation again, but that&#8217;s a little silly.</p> <p>The heavy-handed solution I came up with (I welcome alternatives) is to wipe the Varnish cache when CSS+JS resets.
That operation is not hook-able, however, so you have to patch core. In common.inc, <span class="geshifilter"><code class="text geshifilter-text">_drupal_flush_css_js()</code></span>, add:</p> <div class="geshifilter"> <pre class="text geshifilter-text">if (module_exists('varnish') &amp;&amp; function_exists('varnish_purge_all_pages')) {
  varnish_purge_all_pages();
}</pre></div> <p>This still keeps Memcache and other in-Drupal caches intact, avoiding an unnecessary &#8220;clear all caches&#8221; operation, but makes sure Varnish doesn&#8217;t point to dead files. (You could take it a step further and purge only URLs that are Drupal-generated and not static; if you figure out the regex for that, please share.)</p> <h3>Per-page cookie logic</h3> <p>On <a href="http://antiquesnearme.com">AntiquesNearMe.com</a> we have a cookie that remembers the last location you searched, which makes for a nicer UX. That cookie gets added to Varnish&#8217;s page &#8220;hash&#8221; and (correctly) bypasses the cache on pages that take that cookie into account. The cookie is not relevant to the rest of the site, however, so it should be ignored in those cases. How to handle this?</p> <p>There are <a href="https://www.varnish-cache.org/trac/wiki/VCLExampleRemovingSomeCookies">two ways</a> to handle cookies in Varnish: strip cookies you know you don&#8217;t want, as in this old <a href="https://wiki.fourkitchens.com/display/PF/Configure+Varnish+for+Pressflow">Pressflow example</a>, or leave only the cookies you know you <em>do</em> want, as in Lullabot&#8217;s <a href="http://www.lullabot.com/articles/varnish-multiple-web-servers-drupal">example</a>. Each strategy has its pros and cons and works on its own, but it&#8217;s not advisable to <em>combine</em> them.
I&#8217;m using Lullabot&#8217;s technique on this site, so to deal with the location cookie, I use <span class="geshifilter"><code class="text geshifilter-text">if-else</code></span> logic: if the cookie is available but <em>not</em> needed (determined by regex like <span class="geshifilter"><code class="text geshifilter-text">req.url !~ &quot;PATTERN&quot; || ...</code></span>), then strip it; otherwise keep it. If the cookie logic you need is more varied but still linear, you could create a series of <span class="geshifilter"><code class="text geshifilter-text">elsif</code></span> statements to handle all the use cases. (Just make sure to roast a huge pot of coffee first.)</p> <h3>Useful add-ons to varnish.module</h3> <ul> <li>Added <span class="geshifilter"><code class="text geshifilter-text">watchdog('varnish', ...)</code></span> commands in varnish.module on cache-clearing operations, so I could look at the logs and spot problems.</li> <li>Added a block to varnish.module with a &#8220;Purge this page&#8221; button for each URL, shown only for admins. (I saw this in an Akamai module and it made a lot of sense to copy. I&#8217;d be happy to post a patch if others want this.)</li> <li>The <a href="http://drupal.org/project/expire">Expire</a> module offers plug-n-play intelligence to selectively clear Varnish URLs only when necessary (clearing a landing page of blog posts only if a blog post is modified, for example). Much better than the default behavior of aggressive clearing &#8220;just in case&#8221;.</li> </ul> <p>I hope this helps people adopt Varnish.
I am also available via my consulting business <a href="http://newleafdigital.com">New Leaf Digital</a> for paid implementation, strategic advice, or support for Varnish-aided sites.</p> http://benbuckman.net/tech/11/12/making-sense-varnish-caching-rules#comments antiquesnearme drupal varnish Mon, 12 Dec 2011 20:52:31 +0000 ben 7477 at http://benbuckman.net Parse Drupal watchdog logs in syslog (using node.js script) http://benbuckman.net/tech/11/11/parse-drupal-watchdog-logs-syslog-using-nodejs-script <p>Drupal has the option of outputting its <a href="http://api.drupal.org/api/drupal/includes--bootstrap.inc/function/watchdog/7">watchdog</a> logs to syslog, the file-based core Unix logging mechanism. The log in most cases lives at /var/log/messages, and Drupal's logs get mixed in with all the others, so you need to <code>cat /var/log/messages | grep drupal</code> to filter.</p> <p>But then you still have a big text file that's hard to parse. This is probably a "solved problem" many times over, but recently I had to parse the file specifically for 404'd URLs, and decided to do it (partly out of convenience but mostly to learn how) using <a href="http://nodejs.org">Node.js</a> (as a scripting language). Javascript is much easier than Bash at simple text parsing.</p> <p>I put the code in a <a href="https://gist.github.com/1405720">Gist, <em>node.js script to parse Drupal logs in linux syslog (and find distinct 404'd URLs)</em></a>. The last few lines of URL filtering can be changed to any other specific use case you might have for reading the logs out of syslog. (This could also be used for reading non-Drupal syslogs, but the mapping applies keys like "URL" which wouldn't apply then.)</p> <p>Note the comment at the top: to run it you'll need node.js and 2 NPM modules as dependencies. 
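</p> <p>The core of such a parser is simple. Here's a tiny standalone sketch (not the actual Gist; Drupal's syslog output is pipe-delimited, but the field positions below are an assumption and the sample lines are fake):</p>

```javascript
// Extract distinct 404'd URLs from Drupal watchdog lines in syslog.
// Drupal's syslog format is roughly: base_url|timestamp|type|ip|request_uri|...
// (check the field positions against your own log lines).
function distinct404s(logText) {
  var seen = {};
  logText.split('\n').forEach(function(line) {
    var parts = line.split('|');
    if (parts.length < 5) return; // not a watchdog line
    var type = parts[2], url = parts[4];
    if (type === 'page not found') seen[url] = true;
  });
  return Object.keys(seen);
}

// Fake sample lines for illustration:
var sample = [
  'Nov 29 18:06:33 web1 drupal: http://example.com|1322589993|page not found|127.0.0.1|http://example.com/bad-path|http://example.com/|0||bad-path',
  'Nov 29 18:06:35 web1 drupal: http://example.com|1322589995|cron|127.0.0.1|http://example.com/cron.php|||0||Cron run completed'
].join('\n');

console.log(distinct404s(sample)); // prints: [ 'http://example.com/bad-path' ]
```

<p>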
Then take your filtered log (using the <code>grep</code> method above) and pass it as a parameter, and read the output on screen, or redirect it with <code>&gt;</code> to another file.</p> http://benbuckman.net/tech/11/11/parse-drupal-watchdog-logs-syslog-using-nodejs-script#comments drupal linux node.js Tue, 29 Nov 2011 18:06:33 +0000 ben 7422 at http://benbuckman.net Migrating a static 960.gs grid to a responsive, semantic grid with LessCSS http://benbuckman.net/tech/11/11/migrating-static-960gs-grid-responsive-semantic-grid-lesscss <p>The layout of <a href="http://antiquesnearme.com">Antiques Near Me</a> (a startup I co-founded) has long been built using the sturdy <a href="http://960.gs" style="font-weight:bold">960.gs</a> grid system (implemented in Drupal 6 using the <a href="http://drupal.org/project/clean">Clean</a> base theme). Grids are very helpful: They allow layouts to be created quickly; they allow elements to be fit into layouts easily; they keep dimensions consistent; they look clean. But they have a major drawback that always bothered me: the <span class="geshifilter"><code class="text geshifilter-text">grid-X</code></span> classes that determine an element's width are in the HTML. That mixes up markup/content and layout/style, which should ideally be completely separated between the HTML and CSS.</p> <p>The rigidity of an in-markup grid becomes especially apparent when trying to implement <span style="font-weight:bold">&quot;responsive&quot; design</span> principles. I'm not a designer, but the basic idea of responsive design for the web, as I understand it, is that a site's layout should adapt automagically to the device it's viewed in.
For a nice mobile experience, for example, rather than create a separate mobile site - which I always thought was a poor use of resources, duplicating the content-generating backend - the same HTML can be used with <span style="font-weight:bold"><span class="geshifilter"><code class="text geshifilter-text">@media</code></span> queries</span> in the CSS to make the layout look "native".</p> <p>(I've put together some <a href="http://delicious.com/thebuckst0p/responsive-design" style="font-weight:bold">useful links on Responsive Design and @media queries</a> using Delicious. The best implementation of a responsive layout that I've seen is on the site of <a href="http://fourkitchens.com">FourKitchens</a>.)</p> <p>Besides the 960 grid, I was using <a href="http://lesscss.org" style="font-weight:bold">LessCSS</a> to generate my styles: it supports variables, mix-ins, nested styles, etc; it generally makes stylesheet coding much more intuitive. So for a while the thought simmered, why not move the static 960 grid into Less (using mixins), and apply the equivalent of <span class="geshifilter"><code class="text geshifilter-text">grid-X</code></span> classes directly in the CSS? 
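</p> <p>The idea, in rough sketch form (the mixin and the selectors below are invented for illustration, not actual semantic.gs code):</p> <div class="geshifilter"> <pre class="text geshifilter-text">// define the grid math once, in Less
@column-width: 40px;
@gutter-width: 20px;
.column(@n) {
  float: left;
  width: @n * @column-width + (@n - 1) * @gutter-width;
  margin: 0 (@gutter-width / 2);
}

// then apply it in the stylesheet, keeping the markup semantic
#sidebar { .column(4); }
#content { .column(12); }</pre></div> <p>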
Then I read this article in Smashing on <a href="http://coding.smashingmagazine.com/2011/08/23/the-semantic-grid-system-page-layout-for-tomorrow/">The Semantic Grid System</a>, which prescribed pretty much the same thing - using Less with a library called <a href="http://semantic.gs/" style="font-weight:bold">Semantic.gs</a> - and I realized it was time to actually make it happen.</p> <p>To make the transition, I <a href="https://github.com/newleafdigital/semantic.gs">forked</a> semantic.gs and made some modifications: I added .alpha and .omega mixins (to cancel out side margins); for nested styles, I ditched semantic.gs's <span class="geshifilter"><code class="text geshifilter-text">.row()</code></span> approach (which seems to be buggy anyway) and created a <span class="geshifilter"><code class="text geshifilter-text">.nested-column</code></span> mixin instead. I added <span class="geshifilter"><code class="css geshifilter-css"><span class="kw1">clear</span><span class="sy0">:</span><span class="kw2">both</span></code></span> to the <span class="geshifilter"><code class="text geshifilter-text">.clearfix</code></span> mixin (seemed to make sense, though maybe there was a reason it wasn't already in).</p> <p>To maintain the 960.gs dimensions and classes (as an intermediary step), I made a transitional-960gs.less stylesheet with these rules: <span class="geshifilter"><code class="css geshifilter-css"><span class="co1">@columns: 16; @column-width: 40; @gutter-width: 20;</span></code></span>. 
Then I made equivalents of the .grid_X classes (as <a href="http://drupal.org/project/clean">Clean</a>'s implementation had them) with an <span class="geshifilter"><code class="text geshifilter-text">s_</code></span> prefix:</p> <div class="geshifilter"> <pre class="css geshifilter-css">.s_container, .s_container_16 {
  margin-left: auto;
  margin-right: auto;
  width: @total-width;
  .clearfix();
}
.s_grid_1 { .column(1); }
.s_grid_2 { .column(2); }
...
.s_grid_16 { .column(16); }</pre></div> <p>The <span class="geshifilter"><code class="text geshifilter-text">s_grid_X</code></span> classes were purely transitional: they allowed me to do a search-and-replace from grid_ to s_grid_ and remove the 960.gs stylesheet, before migrating all the styles into semantic equivalents.
Once that was done, the s_grid_ classes could be removed.</p> <p>960.gs and semantic.gs also implement their columns a little differently, one with padding and the other with margins, so what was actually a 1000px-wide layout with 960.gs became a 960px layout with semantic.gs. To compensate for this, I made a wrapper mixin applied to all the top-level wrappers:</p> <div class="geshifilter"> <pre class="css geshifilter-css">.wide-wrapper {
  .s_container;
  padding-right: 20px;
  padding-left: 20px;
  .clearfix();
}</pre></div> <p>With the groundwork laid, I went through all the grid_/s_grid_ classes in use and replaced them with purely in-CSS semantic mixins. So if a block had a grid class before, now it only had a semantic ID or class, with the grid mixins applied to that selector.</p> <p>Once the primary layout was replicated, I could make it "respond" to @media queries, using a responsive.less sheet. For example:</p> <div class="geshifilter"> <pre class="css geshifilter-css">/* iPad in portrait, or any screen below 1000px */
@media only screen and (max-device-width: 1024px) and (orientation: portrait), screen and (max-width: 999px) {
  ...
}

/* very narrow browser, or iPhone -- note that &lt;1000px styles above will apply here too!
   note: iPhone in portrait is 320px wide, in landscape is 480px wide */
@media only screen and (max-device-width: 480px), only screen and (-webkit-min-device-pixel-ratio: 2), screen and (max-width: 499px) {
  ...
}

/* iPhone - portrait */
@media only screen and (max-device-width: 480px) and (max-width: 320px) {
  ...
}</pre></div> <p>Some vital tools for the process:</p> <ul> <li><a href="http://incident57.com/less/">Less.app</a> (for Mac), or even better, the new <a href="http://incident57.com/codekit/">CodeKit</a> by the same author, compiles and minifies the Less files instantly, so the HTML can refer to normal CSS files.</li> <li>The iOS Simulator (part of Xcode) and Android Emulator (with the Android SDK), to simulate how your responsive styles work on different devices. (Getting these set up is a project in itself.)</li> <li>To understand what various screen dimensions looked like, I added a simple viewport debugger to show the screen size in the corner of the page (written as a Drupal 6/jQuery document-ready "behavior"; fills a #viewport-size element put separately in the template):<br /> <div class="geshifilter"> <pre class="javascript geshifilter-javascript">Drupal.behaviors.viewportSize = function() {
  if (!$('#viewport-size').size()) return;

  Drupal.fillViewportSize = function() {
    $('#viewport-size').text(
      $(window).width() + 'x' + $(window).height()
    )
    .css('top', $('#admin-menu').height());
  };
  Drupal.fillViewportSize();
  $(window).bind('resize', function(event) {
    Drupal.fillViewportSize();
  });
};</pre></div> </li> </ul> <p>After three days of work, the layout is now entirely semantic, and the 960.gs stylesheet is gone.
On a wide-screen monitor it looks exactly the same as before, but it now adapts to narrower screen sizes (you can see this by shrinking the window's width), and has special styles for iPad and iPhone (portrait and landscape), and was confirmed to work on a popular Android tablet. It'll be a continuing work in progress, but the experience is now much better on small devices, and the groundwork is laid for future tweaks or redesigns.</p> <p>There are some <strong>downsides</strong> to this approach worth considering:</p> <ul> <li>Mobile devices still load the full CSS and HTML needed for the "desktop" layout, even if not all the elements are shown. This is a problem for performance.</li> <li>The stylesheets are enormous with all the mixins, compounding the previous issue. I haven't examined in depth how much of a problem this actually is, but I'll need to at some point.</li> <li>The contents of the page can only change as much as the <em>stylesheets</em> allow. The <em>order</em> of elements can't change (unless their visible order can be manipulated with CSS floats).</li> </ul> <p>To mitigate these and modify the actual content on mobile devices - to reduce the performance overhead, load smaller images, or put less HTML on the page - would probably require backend modifications that detect the user agent (perhaps using <a href="http://drupal.org/project/browscap">Browscap</a>). I've been avoiding that approach until now, but with most of the work done on the CSS side, a hybrid backend solution is probably the next logical step. (For the images, <a href="https://github.com/filamentgroup/responsive-images">Responsive Images</a> could also help on the client side.)</p> <p>See the <a href="http://antiquesnearme.com">new layout</a> at work, and my <a href="http://delicious.com/thebuckst0p/responsive-design">links on responsive design</a>. 
I'm curious to hear what other people do to solve these issues.</p> <p><em>Added:</em> It appears the javascript analog to media queries is <a href="https://developer.mozilla.org/en/CSS/Using_media_queries_from_code">media query lists</a>, which are event-able. And here's an approach with <a href="http://www.paulrhayes.com/2011-11/use-css-transitions-to-link-media-queries-and-javascript/">media queries and CSS transition events</a>.</p> http://benbuckman.net/tech/11/11/migrating-static-960gs-grid-responsive-semantic-grid-lesscss#comments 960.gs css drupal lesscss responsive-design Fri, 25 Nov 2011 16:59:38 +0000 ben 7407 at http://benbuckman.net Drupal’s increasing complexity is becoming a turnoff for developers http://benbuckman.net/drupal-excessive-complexity <p>I&rsquo;ve been developing custom applications with Drupal for three years, a little with 4.7 and 5, primarily with 6, and lately more with 7. Lately I&rsquo;ve become concerned with the trend in Drupal&rsquo;s code base toward increasing complexity, which I believe is becoming a danger to Drupal&rsquo;s adoption.</p> <p> In general when writing code, a solution can solve the current scenario in front of us right now, or it can try to account for future scenarios in advance. I&rsquo;ve seen this referred to as <strong>N-case or N+1 development.</strong> N-case code is efficient, but not robust; N+1 code is abstract and complex, and theoretically allows for an economy of scale, allowing more to be done with less code/work. In practice, it also shifts the burden: <strong>as non-developers want the code to accommodate more use cases, the developers write more code, with more complexity and abstraction.</strong></p> <p> Suppose you want to record a date with a form and save it to a database. You&rsquo;d need an HTML form, a timestamp (integer) field in your schema, and a few lines of code. Throw in a stable jQuery date popup widget and you have more code but not much more complexity. 
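That simple case can be sketched in a few lines - JavaScript here for brevity, with an illustrative function name; the point is only that the N-case version is a trivial conversion to a timestamp integer:

```javascript
// Sketch (not from any Drupal module): the N-case approach to storing a
// date is just "convert the picked date to a Unix timestamp integer".
function dateToTimestamp(isoDate) {
  // e.g. the value a date popup widget hands back, normalized to midnight UTC
  return Math.floor(Date.parse(isoDate + 'T00:00:00Z') / 1000);
}

console.log(dateToTimestamp('2011-08-10')); // an integer, ready for an INT column
```

The widget feeds a string into the form; the schema stores the integer.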
Or you could imagine every possible date permutation, all theoretically accessible to non-developers, and you end up with the <strong>14,673 lines in Drupal&rsquo;s <a href="http://drupal.org/project/date">Date</a> module</strong>.</p> <p> Drupal is primarily a content management system, not simply a framework for efficient development, so it <strong>needs to account for the myriad use cases of non-developer site builders</strong>. This calls for abstracting everything into user interfaces, which takes a lot of code. However, there needs to be a countervailing force in the development process, pushing back against increasing abstraction (in the name of end-user simplicity) for the sake of preserving underlying simplicity. In other words, <strong>there is an inherent tension in Drupal (like any big software project) between keeping the UI both robust and simple, and keeping the code robust and simple </strong> - and increasingly Drupal, rather than trying to maintain a balance, has tended to sacrifice the latter.</p> <p> User interfaces are one form of abstraction; N+infinity APIs - which I&rsquo;m more concerned with - are another, which particularly increase underlying complexity. Drupal has a legacy code base built with partly outdated assumptions, and developers adding new functionality have to make a choice: <strong>rewrite the old code to be more robust but less complex, or add additional abstraction layers on top?</strong> The latter takes less time but easily creates a mess. For example: Drupal 7 tries to abstract nodes, user profiles, actions, etc into &ldquo;entities&rdquo; and attach fields to any kind of entity. Each of these still has its legacy ID, but now there is an additional layer in between tying these &ldquo;entity IDs&rdquo; to their types, and then another layer for &ldquo;bundles,&rdquo; which apply to some entity types but not others. 
The result from a development cycle perspective was a Drupal 7 release that, even delayed a year, lacked components of the Entity system in core (they moved to &ldquo;contrib&rdquo;). The result from a systems perspective is an architecture that has too many layers to make sense if it were built from scratch. <strong>Why not, for example, have everything be a node? </strong> Content as nodes, users as nodes, profiles as nodes, etc. The node table would need to lose legacy columns like &ldquo;sticky&rdquo; - they would become fields - and some node types like &ldquo;user&rdquo; might need fixed meanings in core. Then three structures get merged into one, and the system gets simpler without compromising flexibility.</p> <p> I recently tried to programmatically use the <a href="http://drupal.org/project/activity">Activity</a> module - which used to be a simple way to record user activity - and had to &ldquo;implement&rdquo; the Entities and Trigger APIs to do it, requiring hundreds of lines of code. I gave up on that approach and instead used the elegant core module <a href="http://api.drupal.org/api/drupal/includes--bootstrap.inc/function/watchdog/7">Watchdog</a> - which, with a simple custom report pulling from the existing system, produced the <strong>same end-user effect as Activity with a tiny fraction of the code and complexity</strong>. The fact that Views doesn&rsquo;t natively generate Watchdog reports and Rules doesn&rsquo;t report to Watchdog as an action says a lot, I think, about the way Drupal has developed over the last few years.</p> <p> On a Drupal 7 site I&rsquo;m building now, I&rsquo;ve worked with the Node API, Fields API, Entities API, Form API, Activity API, Rules API, Token API... I could have also worked with the Schema, Views, Exportables, Features, and Batch APIs, and on and on.
The best definition I&rsquo;ve heard for an API (I believe by Larry Garfield at Drupalcon Chicago) is &ldquo; <strong>the wall between 2 systems</strong>.&rdquo; In a very real way, rather than feeling open and flexible, Drupal&rsquo;s code base increasingly feels like it&rsquo;s erecting barriers and fighting with itself. <strong>When it&rsquo;s necessary to write so much code for so many APIs to accomplish simple tasks, the framework is no longer developer-friendly.</strong> The irony is, the premise of that same Drupalcon talk was the ways APIs create &ldquo;power and flexibility&rdquo; - but that power has come at great cost to the developer experience.</p> <p> I&rsquo;m aware of all these APIs under the hood because I&rsquo;ve seen them develop for a few years. But how is someone new to Drupal supposed to learn all this? (They could start with the <a href="http://definitivedrupal.org/">Definitive Guide to Drupal 7</a>, which sounds like a massive tome.) <strong>Greater abstraction and complexity lead to a steeper learning curve.</strong> <strong>Debugging Drupal - which requires &ldquo;wrapping your head&rdquo; around its architecture - has become a Herculean task. Good developer documentation is scarce because it takes so much time to explain something so complex.</strong></p> <p> There is a cycle: the code gets bigger and harder to understand; the bugs get more use-case-specific and harder to nail down; the issue queues get bloated; the developers have less time to devote to code quality improvement and big-picture architecture decisions. But someone wants all those use cases handled, so the code gets bigger and bigger and harder to understand... as of this writing, Drupal core has 9166 open issues, the Date module has 813, Rules has 494. Queues that big need a staff of dozens to manage effectively, and even if those resources existed, the business case for devoting them can&rsquo;t be easy to make. 
<strong>The challenge here is not simply in maintaining our work; it&rsquo;s in building projects from the get-go that aren&rsquo;t so complicated as to need endless maintenance.</strong></p> <p> Some other examples of excessive complexity and abstraction in Drupal 7:</p> <ul> <li> <a href="http://drupal.org/node/691078"><strong>Field Tokens</strong></a>. This worked in Drupal 6 with contrib modules; to date with Drupal 7, this can&rsquo;t be done. The APIs driving all these separate systems have gotten so complex, that either no one knows how to do this anymore, or the architecture doesn&rsquo;t allow it.</li> <li> The <a href="http://drupal.org/project/media"><strong>Media</strong></a> module was supposed to be an uber-abstracted API for handling audio, video, photos, etc. As of a few weeks ago, basic YouTube and Vimeo integration didn&rsquo;t work. The parts of Media that did work (sponsored largely by <a href="http://acquia.com">Acquia</a>) <a href="http://drupal.org/node/1139514">didn&rsquo;t conform</a> to long-standing Drupal standards. Fortunately there were <a href="http://drupal.org/project/video_filter">workarounds</a> for the site I was building, but their existence is a testament to the unrealistic ambition and excessive complexity of the master project.</li> <li> The Render API, intended to increase flexibility, has compounded the old problem in Drupal of business logic being spread out all over the place. The point in the flow where structured data gets rendered into HTML strings isn&rsquo;t standardized, so knowing how to modify one type of output doesn&rsquo;t help with modifying another. (Recently I tried to modify a <code>date_select</code> field at the code level to show the date parts in a different order - as someone else <a href="http://drupal.org/node/890236">tried to do</a> a year ago - and gave up after hours. 
The solution ended up being in the UI - so the end-user was given code-free power at the expense of the development experience and overall flexibility.)</li> </ul> <p> Drupal 8 has an &ldquo;<a href="http://groups.drupal.org/drupal-initiatives">Initiatives</a>&rdquo; structure for prioritizing effort. <strong> I&rsquo;d like to see a new initiative, <em>Simplification</em>: Drupal 8 should have fewer lines of code, fewer APIs, and fewer database tables than Drupal 7.</strong> Every component should be re-justified and eliminated if it duplicates an existing function. And the Drupal 8 contrib space should follow the same principles. I submit that this is more important than any single new feature that can be built, and that if the codebase becomes simpler, adding new features will be easier.</p> <p> A few examples of places I think are ripe for simplifying:</p> <ul> <li> The Form API has too much redundancy. <code>#process</code> handlers are a bear to work with (try altering the <code>#process</code> flow of a date field) and do much the same as <code>#after_build</code>.</li> <li> The render API now has <code>hook_page_build</code>, <code>hook_page_alter</code>, <code>hook_form_alter</code>, <code>hook_preprocess</code>, <code>hook_process</code>, <code>hook_node_view</code>, <code>hook_entity_view</code> (probably several more for field-level rendering), etc. This makes understanding even a well-architected site built by anyone else an enormous challenge. Somewhere in that mix there&rsquo;s bound to be unnecessary redundancy.</li> </ul> <p> <strong>Usable code isn&rsquo;t a luxury, it&rsquo;s critical to attracting and keeping developers in the project.</strong> I saw a presentation recently on Rapid Prototyping and it reminded me how far Drupal has come from being able to do anything like that.
(I don&rsquo;t mean the rapid prototype I did of a <a href="http://benbuckman.net/tech/11/02/drupal-application-framework-bostonphp-competition">job listing site</a> - I mean application development, building something <em>new</em>.) The demo included a massive data migration accomplished with 4 lines of javascript in the MongoDB terminal; by comparison, I recently tried to change a dropdown field to a text field (both identical strings in the database) and Drupal told me it couldn&rsquo;t do that because &ldquo;the field already had data.&rdquo;</p> <p> My own experience is that Drupal is becoming more frustrating and less rewarding to work with. Backend expertise is also harder to learn and find (at the last meetup in Boston, a very large Drupal community, only one other person did freelance custom development). Big firms like Acquia are hiring most of the rest, which is great for Acquia, but skews the product toward enterprise clients, and increases the cost of development for everyone else. If that&rsquo;s the direction Drupal is headed - a project understood and maintained only by large enterprise vendors, for large enterprise users, giving the end-user enormous power but the developer a migraine - let&rsquo;s at least make sure we go that way deliberately and with our eyes open. <strong>If we want the product to stay usable for newbie developers, or even people with years of experience - and ultimately, if we want the end-user experience to </strong><em>work</em><strong> - then the trend has to be reversed toward a better balance.</strong></p> http://benbuckman.net/drupal-excessive-complexity#comments drupal Wed, 10 Aug 2011 20:23:41 +0000 ben 7203 at http://benbuckman.net Drupal 7 / Drush tip: Find all field content using a text format http://benbuckman.net/tech/11/08/drupal-7-drush-tip-find-all-field-content-using-text-format <p>I'm working on a Drupal 7 site and decided one of the <strong>text formats</strong> ("input formats" in D6) was redundant. 
So I disabled it, and was warned that "any content stored with that format will not be displayed." How do I know what content is using that format? This little shell snippet told me:</p> <div class="geshifilter"> <pre class="bash geshifilter-bash">drush sql-query <span class="st0">&quot;show tables like 'field_data_%'&quot;</span> <span class="sy0">|</span> <span class="kw2">tail</span> -n+<span class="nu0">2</span> <span class="sy0">|</span> <span class="kw1">while</span> <span class="kw2">read</span> TABLE; <span class="kw1">do</span> <span class="re2">FIELD</span>=<span class="sy0">`</span>drush sql-query <span class="st0">&quot;show fields in <span class="es2">$TABLE</span> like '%format%';&quot;</span> <span class="sy0">|</span> <span class="kw2">tail</span> -n+<span class="nu0">2</span> <span class="sy0">|</span> <span class="kw2">awk</span> <span class="st_h">'{ print $1 }'</span><span class="sy0">`</span>; <span class="kw3">echo</span> <span class="st0">&quot;<span class="es2">$TABLE</span> - <span class="es2">$FIELD</span>&quot;</span>; <span class="kw1">if</span> <span class="br0">&#91;</span><span class="br0">&#91;</span> <span class="st0">&quot;<span class="es2">$FIELD</span>&quot;</span> <span class="sy0">!</span>= <span class="st0">&quot;&quot;</span> <span class="br0">&#93;</span><span class="br0">&#93;</span>; <span class="kw1">then</span> drush sql-query <span class="st0">&quot;select * from <span class="es3">${TABLE}</span> where <span class="es3">${FIELD}</span>='old_format'&quot;</span>; <span class="kw1">fi</span> <span class="kw1">done</span></pre></div> <p>You'll need to run that in the terminal from your site's webroot and have <a href="http://drupal.org/project/drush">Drush</a> installed. Rename <span class="geshifilter"><code class="text geshifilter-text">old_format</code></span> to the code name of your text format.
(<span class="geshifilter"><code class="text geshifilter-text">drush sql-query &quot;select * from {filter_format}&quot;</code></span> will show you that.) It'll work as a single command if you copy and paste it (as multiple lines or with line breaks stripped - the semi-colons indicate the end of each statement).</p> <p>Breaking it down:</p> <ol> <li>Find all the tables used for content storage.</li> <li>Find all the 'format' fields in those tables. (They'll only exist if the field uses formats.)</li> <li>Find all the rows in those tables matching the format you want to delete. Alternatively, if you want everything to be in one format, you can see what does <em>not</em> use that format by changing the <span class="geshifilter"><code class="text geshifilter-text">${FIELD}=...</code></span> condition to <span class="geshifilter"><code class="text geshifilter-text">${FIELD}&lt;&gt;'new_format'</code></span>.</li> </ol> <p>This won't fix anything for you, it'll just show you where to go - look at the <span class="geshifilter"><code class="text geshifilter-text">entity_id </code></span> columns (that's the <span class="geshifilter"><code class="text geshifilter-text">nid</code></span> if the content is nodes) and go edit that content.</p> <p>Also note, this is checking the field_<strong>data</strong>_ tables, which (as far as I can tell) track the latest revision. If you are using content revisions you might want to change the first query to <span class="geshifilter"><code class="text geshifilter-text">show tables like 'field_revision_%'</code></span>. 
I'm not sure why D7 duplicates so much data, but that's for another post.</p> <p><em>Update</em>: I modified the title from <em>Find all content</em> to <em>Find all field content</em> because of the comment by David Rothstein below.</p> http://benbuckman.net/tech/11/08/drupal-7-drush-tip-find-all-field-content-using-text-format#comments drupal Tue, 02 Aug 2011 15:51:59 +0000 ben 7185 at http://benbuckman.net Workaround to variables cache bug in Drupal 6 http://benbuckman.net/tech/11/05/workaround-variables-cache-bug-drupal-6 <p>I run my Drupal <a href="http://drupal.org/cron">crons</a> with <a href="http://drupal.org/project/drush">Drush</a> and <a href="http://jenkins-ci.org">Jenkins</a>, and have been running into a race condition frequently where it tells me, <strong><em>Attempting to re-run cron while it is already running</em></strong>, and fails to run.</p> <p>That error is triggered when the <span class="geshifilter"><code class="text geshifilter-text">cron_semaphore</code></span> variable is found. It's set when cron starts, and is deleted when cron ends, so if it's still there, cron is presumably still running. Except it wasn't really - the logs show the previous crons ended successfully.</p> <p>I dug into it a little further: <span class="geshifilter"><code class="bash geshifilter-bash">drush vget cron_semaphore</code></span> brought up the timestamp value of the last cron, like it was still set. But querying the <span class="geshifilter"><code class="text geshifilter-text">`variables`</code></span> table directly for <span class="geshifilter"><code class="text geshifilter-text">cron_semaphore</code></span> brought up nothing! That tipped me off to the problem - it was <strong>caching</strong> the variables array for too long.</p> <p>Searching the issue brought up a bunch of posts acknowledging the issue, and suggesting that people clear their whole cache to fix it. 
I care about performance on the site in question, so clearing the whole cache every 15 minutes to run cron is not an option.</p> <p>The underlying solution to the problem is very complex, and the subject of several ongoing Drupal.org threads:</p> <ul> <li><a href="http://drupal.org/node/973436"> variable_set() should rebuild the variable cache, not just clear it</a></li> <li><a href="http://drupal.org/node/987768">Optimize variable caching</a></li> <li><a href="http://drupal.org/node/249185">Concurrency problem with variable caching</a></li> </ul> <p>Following Drupal core dev policy now (which is foolish IMHO), if this bug is resolved, it has to be resolved first in 8.x (which won't be released for another 2-3 years), then 7.x, then 6.x. So waiting for that to work for my D6 site in production isn't feasible.</p> <p>As a stopgap, I have Jenkins clear only the <span class="geshifilter"><code class="text geshifilter-text">'variables'</code></span> cache entry before running cron:<br /> <strong><span class="geshifilter"><code class="text geshifilter-text">drush php-eval &quot;cache_clear_all('variables', 'cache');&quot;</code></span></strong></p> <p>That seems to fix the immediate problem of cron not running. It's not ideal, but at least it doesn't clear the entire site cache every 15 minutes.</p> http://benbuckman.net/tech/11/05/workaround-variables-cache-bug-drupal-6#comments drupal Thu, 19 May 2011 04:48:22 +0000 ben 7118 at http://benbuckman.net Three Quirks of Drupal Database Syntax http://benbuckman.net/tech/11/05/three-quirks-drupal-database-syntax <p>Database query syntax in Drupal can be finicky, but doing it right - following the <a href="http://drupal.org/coding-standards">coding standards</a> <em>as a matter of habit</em> - is very important. Here are three "gotchas" I've run into or avoided recently:</p> <p><strong>1. 
Curly braces around tables:</strong> Unit testing with <a href="http://drupal.org/simpletest">SimpleTest</a> absolutely requires that table names in all your queries be wrapped in <strong>{curly braces}</strong>. SimpleTest runs in a sandbox with its own, clean database tables, so you can create nodes and users without messing up actual content. It does this by using the existing <strong>table prefix</strong> concept. If you write a query in a module like this,<br /> <span class="geshifilter"><code class="php geshifilter-php"><span class="re0">$result</span> <span class="sy0">=</span> db_query<span class="br0">&#40;</span><span class="st0">&quot;SELECT nid from node&quot;</span><span class="br0">&#41;</span><span class="sy0">;</span></code></span><br /> when that runs in test, it will load from the regular <span class="geshifilter"><code class="text geshifilter-text">node</code></span> table, not the sandboxed one (assuming you have no prefix on your regular database). Having tests write to actual database tables can make your tests break, or real content get lost. Instead, all queries (not just in tests) should be written like:</p> <p><span class="geshifilter"><code class="php geshifilter-php"><span class="re0">$result</span> <span class="sy0">=</span> db_query<span class="br0">&#40;</span><span class="st0">&quot;SELECT nid from {node} node&quot;</span><span class="br0">&#41;</span><span class="sy0">;</span></code></span><br /> (The 2nd <span class="geshifilter"><code class="text geshifilter-text">node</code></span> being an optional <em>alias</em> to use later in the query, for example as <span class="geshifilter"><code class="text geshifilter-text">node.nid</code></span> JOINed to another table with a <span class="geshifilter"><code class="text geshifilter-text">nid</code></span> column.) 
When Drupal runs the query, it will prefix <span class="geshifilter"><code class="text geshifilter-text">{node}</code></span> by context as <span class="geshifilter"><code class="text geshifilter-text">site_node</code></span>, or <span class="geshifilter"><code class="text geshifilter-text">simpletestXXnode</code></span>, to keep the sandboxes separate. Make sure to always curly-brace your table names!<br /> <br/></p> <p><strong>2. New string token syntax:</strong> Quotation marks around <strong>string tokens</strong> are different in Drupal 6 and 7. D7 uses the new <a href="http://drupal.org/developing/api/database">"DBTNG" abstraction layer</a> (backported to D6 as the <a href="http://drupal.org/project/dbtng">DBTNG module</a>). In Drupal 6, you'd write a query with a string token like this:<br /> <span class="geshifilter"><code class="php geshifilter-php"><span class="re0">$result</span> <span class="sy0">=</span> db_query<span class="br0">&#40;</span><span class="st0">&quot;SELECT nid from {node} where title='<span class="es6">%s</span>'&quot;</span><span class="sy0">,</span> <span class="st_h">'My Favorite Node'</span><span class="br0">&#41;</span><span class="sy0">;</span></code></span><br /> Note the single quotation marks around the placeholder <span class="geshifilter"><code class="text geshifilter-text">%s</code></span>.</p> <p>With D7 or DBTNG, however, the same static query would be written:<br /> <span class="geshifilter"><code class="php geshifilter-php"><span class="re0">$result</span> <span class="sy0">=</span> db_query<span class="br0">&#40;</span><span class="st0">&quot;SELECT nid from {node} WHERE title = :title&quot;</span><span class="sy0">,</span> <span class="kw3">array</span><span class="br0">&#40;</span><span class="st_h">':title'</span> <span class="sy0">=&gt;</span> <span class="st_h">'My Favorite Node'</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="sy0">;</span></code></span><br /> No more quotes around 
the <span class="geshifilter"><code class="text geshifilter-text">:title</code></span> token - DBTNG puts it in for you when it replaces the placeholder with the string value.<br /> <br/></p> <p><strong>3. Uppercase SQL commands:</strong> Make sure to use UPPERCASE SQL commands (SELECT, FROM, ORDER BY, etc) in queries. Not doing so is valid syntax 99% of the time, but will occasionally trip you up. For example: the <a href="http://api.drupal.org/api/drupal/includes--database.mysql.inc/function/db_query_range/6">db_query_range</a> function (in D6) does not like lowercase <span class="geshifilter"><code class="text geshifilter-text">from</code></span>. I was using it recently to paginate the results of a big query, like <strong><span class="geshifilter"><code class="text geshifilter-text">select * from {table}</code></span></strong>. The pagination was all messed up, and I didn't know why. Then I changed it to <strong><span class="geshifilter"><code class="text geshifilter-text">SELECT * FROM {table}</code></span></strong> and it worked. Using uppercase like that is a good habit, and in the few cases where it matters, I'll be glad I'm doing it from now on.</p> http://benbuckman.net/tech/11/05/three-quirks-drupal-database-syntax#comments drupal Sun, 08 May 2011 19:45:14 +0000 ben 7099 at http://benbuckman.net Monitoring Drupal sites with Munin http://benbuckman.net/tech/11/04/monitoring-drupal-sites-munin <p>One of the applications I've been working with recently is the <a href="http://munin-monitoring.org/">Munin</a> monitoring tool. Its homepage describes it simply:</p> <blockquote><p>Munin is a networked resource monitoring tool that can help analyze resource trends and "what just happened to kill our performance?" problems. It is designed to be very plug and play. 
A default installation provides a lot of graphs with almost no work.</p></blockquote> <p><img id="n7055-munin-graph" src="/files/mysql_queries-day.png" style="float: right; width:300px;" alt="Munin graph" title="Munin graph" />Getting Munin set up on an Ubuntu server is very <a href="http://munin-monitoring.org/wiki/LinuxInstallation">easy</a>. (One caveat: a lot of new plugins require the latest version of Munin, which is only available in Ubuntu 10.) Munin works on a "master" and "node" structure, the basic idea being:</p> <ol> <li>On a cron, the master asks all its nodes for all their stats (usually via port 4949, so configure your firewall accordingly).</li> <li>Each node server asks all its plugins for their stats.</li> <li>Each plugin dumps out brief key:value pairs.</li> <li>The master collects all the data and compiles graphs as images on static HTML pages.</li> </ol> <p>Its simplicity is admirable: Each plugin is its own script, written in any executable language. There are common environment variables and output syntax, but otherwise writing or modifying a plugin is very easy. The plugin directory is called <a href="http://exchange.munin-monitoring.org/">Munin Exchange</a>. (The latest version of each plugin isn't necessarily on there, though: in some cases searching for the plugin name brought up newer versions on Github.)</p> <p>I set up Munin for two reasons: 1) get notifications of problems, 2) see historical graphs to spot trends and bottlenecks. I have Munin running on a dedicated monitoring server (also running <a href="http://jenkins-ci.org">Jenkins</a>), since notifications coming from the web server wouldn't be much use if the web server went down. It's currently monitoring three nodes (including itself), giving me stats on memory (total and for specific processes), CPU, network traffic, apache, mysql, <a href="https://s3.amazonaws.com/">S3</a> buckets, memcached, varnish, and mongodb.
Within a few days of it running, a memory leak on one server became apparent, and the "MySql slow query" spikes that coincide with cron (doing a bunch of stats/aggregation) are illuminating.</p> <p>None of this is Drupal specific, but graphing patterns in Drupal simply requires a plugin, and McGo has fortunately given us a <a href="http://drupal.org/project/munin">Munin module</a> that provides just that. (The package includes two modules: Munin API to define stats and queries, and Munin Defaults with some basic node and user queries.) I asked for maintainer access and modified it a little - the 6.x-2.x branch now uses <a href="http://drupal.org/project/drush">Drush</a> for database queries rather than storing the credentials in the scripts, for example. The module generates the script code which you copy to files in your plugins directory.</p> <p>Conclusions so far: getting Munin to show you graphs on all the major stats of a server takes a few hours (coming at it as a total beginner). Setting up useful notifications is more complicated, though, and will probably have to evolve over time through trial and error. For simple notifications on servers going down, for example, it's easier to set up a simple cron script (on another server) with <span class="geshifilter"><code class="text geshifilter-text">curl</code></span> and <span class="geshifilter"><code class="text geshifilter-text">mail</code></span>, or use the free version of <a href="https://www.cloudkick.com/free-basic-checks">CloudKick</a>. Munin's notifications are more suited to spotting spikes and edge cases.</p> http://benbuckman.net/tech/11/04/monitoring-drupal-sites-munin#comments munin Fri, 22 Apr 2011 18:41:45 +0000 ben 7055 at http://benbuckman.net Setting up Drupal on DotCloud's server automation platform http://benbuckman.net/tech/11/04/setting-drupal-dotclouds-server-automation-platform <p>Managing a properly configured server stack is one of the pain points in developing small client sites. 
Shared hosting is usually sub-par, setting up a VPS from scratch is overkill, and automation/simplification of the server configuration and deployment is always welcome. So I've been very interested in seeing how <a href="https://www.dotcloud.com/">DotCloud</a> might work for Drupal sites.</p> <p>DotCloud is in the same space as <a href="http://www.heroku.com/">Heroku</a>, an automated server/deployment platform for Rails applications. DotCloud is trying to cater to much more than Rails, however: they currently support PHP (for websites and "worker" daemons), Rails, Ruby, Python, MySql, Postgresql, and have Node.js, MongoDB and a whole bunch of other components on their <a href="http://docs.dotcloud.com/components/roadmap/">roadmap</a>.</p> <p>The basic idea is to automate the creation of pre-configured <a href="http://aws.amazon.com/ec2/">EC2</a> instances using a shell-based API. So you create a web and database setup and push your code with four commands:</p> <div class="geshifilter"><pre class="text geshifilter-text">dotcloud create mysite dotcloud deploy -t php mysite.www dotcloud deploy -t mysql mysite.db dotcloud push mysite.www ~/code</pre></div> <p>Each "deployment" is its own EC2 server instance, and you can SSH into each (but without root). The "push" command for deployment can use a Git repository, and files are deployed to the server <a href="https://github.com/capistrano/capistrano">Capistrano</a>-style, with symlinked releases and rollbacks. (This feature alone, of pushing your code and having it automatically deploy, is invaluable.)</p> <p><strong>Getting it to work with Drupal</strong> is a little tricky. First of all, the PHP instances run on <a href="http://wiki.nginx.org/">nginx</a>, not Apache. So the usual .htaccess file in core doesn't apply. Drupal can be deployed on nginx with some contortions, and there is a <a href="https://github.com/perusio/drupal-with-nginx">drupal-for-nginx</a> project on Github. 
However, I write this post after putting in several hours trying to adapt that code to work, and failing. (I've never used nginx before, which is probably the main problem.) I'll update it if I figure it out, but in the meantime, this is only a partial success.</p> <p>The basic process is this:</p> <ul> <li>Set up an account (currently needs a beta invitation, which you can request)</li> <li>Install the dotcloud client using Python's <span class="geshifilter"><code class="text geshifilter-text">easy_install</code></span> tool</li> <li>Set up a web (nginx) instance with <span class="geshifilter"><code class="text geshifilter-text">dotcloud deploy</code></span></li> <li>Set up a database (mysql or postgres) instance</li> <li>Set up a local Git repo, download Drupal, and configure settings.php (as shown with <span class="geshifilter"><code class="text geshifilter-text">dotcloud info</code></span>)</li> <li>Push the repository using <span class="geshifilter"><code class="text geshifilter-text">dotcloud push</code></span></li> <li>Navigate to your web instance's URL and install Drupal.</li> <li>To use your own domain, set up a CNAME record and run <span class="geshifilter"><code class="text geshifilter-text">dotcloud alias</code></span>. ("Naked" domains, i.e. without a prefix like www, don't work, however, so you have to rely on DNS-level redirecting.)</li> <li>For added utility, SSH in with <span class="geshifilter"><code class="text geshifilter-text">dotcloud ssh</code></span> and install <a href="http://drupal.org/project/drush">Drush</a>. (The easiest way I found was to put a symlink to the executable in ~/bin.)</li> </ul> <p>The main outstanding issue is that friendly URLs don't work, because of the nginx configuration. I hope to figure this out soon.</p> <p>Some other issues and considerations:</p> <ul> <li>The platform is still in beta, so I experienced a number of API timeouts yesterday.
I <a href="http://twitter.com/#!/dot_cloud/status/60151655211085824">mentioned</a> this on Twitter and they said they're working on it; today I had fewer timeouts.</li> <li>The server instances don't give you root access. They come fully configured but you're locked into your home directory, like shared hosting. I understand the point here - if you changed the server stack, their API and scaling methodologies wouldn't work - but it means if something in the core server config is wrong, tough luck.</li> <li>The shell (bash in Ubuntu 10.04 in my examples) is missing a Git client, vim, and nano, and some of its configuration (for <span class="geshifilter"><code class="text geshifilter-text">vi</code></span> for instance) is wonky out of the box.</li> <li>The intended deployment direction is one-way, from a local dev environment to the servers, so if you change files on the server, you need to rsync them down. (You can SSH in with the dotcloud app and add a normal SSH key for rsync.)</li> <li>Because the webroot is a symlink, any uploaded files have to be outside the webroot (as a symlink as well). This is normal on Capistrano setups, but unusual for most Drupal sites (running on shared or VPS hosting).</li> <li>It's free now, but only because it's in beta and they haven't announced pricing. It remains to be seen whether this will be cost-effective when it goes out of beta.</li> <li>They promise automated scaling, but it's not clear how that actually works. (Nowhere in the process is there a choice of RAM capacity, for instance.) Does scaling always involve horizontally adding small instances, and if so, does that make sense for high-performance applications?</li> </ul> <h3>Conclusion so far</h3> <p>The promise of automated server creation and code deployment is very powerful, and this kind of platform could be perfect for static sites, daemons, or some custom apps.
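</p>

<p>Strung together, the basic flow could collapse into a single provisioning script. A hypothetical sketch, built only from the CLI commands covered above (the site name, code path, and domain are all placeholders; it prints each step for review rather than executing it):</p>

```shell
#!/bin/sh
# Hypothetical one-shot DotCloud provisioning plan for a Drupal site.
# All names (mysite, ~/code, www.example.com) are placeholders, and the
# dotcloud CLI must be installed and authenticated before running these
# steps for real.
SITE=${1:-mysite}
CODE=${2:-$HOME/code}

# Print each step rather than executing it, so the plan can be reviewed;
# change the body to: run() { "$@"; } to actually execute the commands.
run() { echo "+ $*"; }

run dotcloud create "$SITE"
run dotcloud deploy -t php "$SITE.www"
run dotcloud deploy -t mysql "$SITE.db"
run dotcloud push "$SITE.www" "$CODE"
run dotcloud alias "$SITE.www" www.example.com
```

<p>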
If/when I get it working in Drupal, it could be as simple as a shell script to create a whole live Drupal site from nothing in a few seconds.</p> <p><a href="http://dotcloud.com">Try it out!</a> I'm very curious what others' experience or thoughts are on this approach. I'd especially love for someone to post a solution to the nginx-rewrite issue in the comments.</p> <h3>Update 4/22:</h3> <p>Their support staff recommended reducing the nginx.conf file to one line:</p> <p><code>try_files $uri $uri/ /index.php?q=$uri;</code></p> <p>And that worked. However, it leaves out all the other recommended <a href="https://github.com/perusio/drupal-with-nginx">rules</a> for caching time, excluding private directories, etc. I asked about these and am still waiting for a reply.</p> <p>Also, to get <strong>file uploads</strong> to work properly, you'll need to put your files directory outside of the webroot, and symlink sites/default/files (or equivalent) to that directory, using a <a href="http://docs.dotcloud.com/build/">postinstall</a> script. 
Mine looks like this (after creating a <span class="geshifilter"><code class="text geshifilter-text">~/drupal-files</code></span> directory):</p> <div class="geshifilter"><pre class="bash geshifilter-bash"><span class="kw2">chmod</span> g+<span class="kw2">w</span> <span class="sy0">/</span>home<span class="sy0">/</span>dotcloud<span class="sy0">/</span>current<span class="sy0">/</span>sites<span class="sy0">/</span>default <span class="sy0">&amp;&amp;</span> \
<span class="kw2">ln</span> <span class="re5">-s</span> <span class="sy0">/</span>home<span class="sy0">/</span>dotcloud<span class="sy0">/</span>drupal-files <span class="sy0">/</span>home<span class="sy0">/</span>dotcloud<span class="sy0">/</span>current<span class="sy0">/</span>sites<span class="sy0">/</span>default<span class="sy0">/</span>files <span class="sy0">&amp;&amp;</span> \
<span class="kw3">echo</span> <span class="st0">&quot;Symlink for files created.&quot;</span></pre></div> <p>That runs whenever you run <span class="geshifilter"><code class="text geshifilter-text">dotcloud push</code></span>, and is similar to sites deployed with Capistrano, where the same kind of symlink approach is generally used.</p> http://benbuckman.net/tech/11/04/setting-drupal-dotclouds-server-automation-platform#comments dotcloud hosting nginx Tue, 19 Apr 2011 16:10:05 +0000 ben 7039 at http://benbuckman.net A showcase of red flags: How do web shops get away with this? http://benbuckman.net/tech/11/04/showcase-red-flags-how-do-web-shops-get-away <p>I recently had occasion to review the <a href="http://www.citigroup.com/citi/citizen/community/index.html">new website</a> of a major bank's <a href="http://en.wikipedia.org/wiki/Community_Reinvestment_Act">CRA</a>/charity wing. As a web developer, I'm always curious how other sites are built. This one raised a number of red flags for me, so I'd like to write about it as a showcase.
I have three questions on my mind:</p> <ol> <li>How do professional web shops get away with such poor quality work? </li> <li>How do clients know what to look for (and avoid)? </li> <li>With plenty of good web shops out there, why aren't big clients connecting with them?</li> </ol> <p>I don't have the answers yet, but I do want to raise the questions. First, reviewing the site from my developer's perspective:</p> <ul> <li>The page contents are loaded with Javascript. With Javascript turned off, there's a little bit of left nav, and the main area is <a href="http://www.citigroup.com/citi/citizen/community/community_by_map.html">essentially blank</a>. This means the site is unreadable to <a href="http://en.wikipedia.org/wiki/Screen_reader">screen readers</a> (browsers for blind people), so the site is not <a href="http://en.wikipedia.org/wiki/Section_508_Amendment_to_the_Rehabilitation_Act_of_1973">508 compliant</a>. Maybe more importantly, it means the contents of the page are invisible to search engines. (See Google's <a href="http://webcache.googleusercontent.com/search?q=cache:hHFViVvObisJ:www.citigroup.com/citi/citizen/community/index.html+http://www.citigroup.com/citi/citizen/community/index.html&amp;cd=1&amp;hl=en&amp;ct=clnk&amp;gl=us&amp;source=www.google.com">cached copy</a> of the homepage for example.)</li> <li>The Javascript that pulls in the page contents is loading an XML file with <a href="http://en.wikipedia.org/wiki/Ajax_(programming)">AJAX</a> (see line 72 of the <a href="view-source:http://www.citigroup.com/citi/citizen/community/index.html">homepage source</a>). <a href="http://en.wikipedia.org/wiki/XML">XML</a> is meant for computers to talk to each other, not for human-readable websites, and AJAX is meant for interactive applications, not the main content area of every page. (I can only imagine the workflow for editing the content on the client side: does their CMS output XML? Do they manually edit XML? 
Or can the content never change without the original developers?)</li> <li>The meta tags are all generic: The <a href="https://developers.facebook.com/docs/opengraph/">OpenGraph</a> page title (used by Facebook) across the site is "ShareThis Homepage". (<a href="http://sharethis.com/">ShareThis</a> has a "social" widget which I assume they copied the code from, but having those meta values is probably worse than having none at all.)</li> <li>None of the titles are links, so even if Google could read the site, it would just see a lot of <em>Read More</em>'s.</li> <li>From a usability perspective, the 11px font size for most of the content is difficult to read.</li> <li>The <a href="http://www.citigroup.com/citi/citizen/community/community_by_map.html">Initiatives by State map</a> is built in Flash, which makes it unviewable on non-Android mobile devices. Flash is also unnecessary for maps now, given the slew of great HTML5-based mapping tools. Not to mention the odd usability quirks/bugs of the map's interface.</li> </ul> <p>I could go on, but that's enough to make the point. So what's going on here? I've seen enough similar signs in other projects to feel confident in speculating about this one.</p> <p>The vendor wasn't entirely incompetent - the hundreds of lines of Javascript code needed some technical proficiency to write - yet the site ignores so many core principles of good web development circa 2011. Whatever skills were applied here, were misplaced. The "web" has to accommodate our phones, <a href="http://www.intel.com/inside/smarttv/">TVs</a>, even our <a href="http://www.ford.com/technology/sync/">cars</a>, with "mobile" browsers (broadly defined) expected to eclipse the desktop in the not-too-distant future. That means <a href="http://accessites.org/site/2007/02/graceful-degradation-progressive-enhancement/">progressive enhancement</a> and basic HTML quality are critical. 
Web users also have an infinity of sites to visit, so to justify the investment in <em>yet another</em> site, you need some basic <a href="http://en.wikipedia.org/wiki/Search_engine_optimization">Search Engine Optimization</a> for people to find you. Building a site that is readable only to a narrow subset of desktop browsers constitutes an unfinished product in my book.</p> <p>On the client side, any site with more than one page, that needs to be updated more than once in a blue moon, needs a <a href="http://en.wikipedia.org/wiki/Content_management_system">content management system</a>. I don't see the tell-tales of any common CMS here, and the way the contents are populated with AJAX suggests the CMS under the hood is weak or non-existent. Reinventing the wheel with entirely custom code for a site makes it difficult to maintain in the long run: developers with expertise in common frameworks/CMSs won't want to touch it, and whoever does will need a long ramp-up/head-scratching period to understand it. It's also unnecessary with so many tested tools available. So clients need to <em>insist on a CMS</em>, and if a vendor tries to talk them out of one, or claim it will be 3x the price, they need to find a better vendor. I work with <a href="http://drupal.org">Drupal</a> and think it's the best fit for many sites (and free of license fees), but there are many good options.</p> <p>The site doesn't say who built it, and searching for relevant keywords doesn't bring up any clearly proud vendors. Was it a web shop at all, or an ad agency that added some token web services to their roster? (General rule: <em>avoid those vendors.</em>) Clients need to see their sites not as another piece of throwaway marketing material, but as a long-term, <em>audience-building</em> investment. 
Thinking of websites as advertisements that only need to be viewed on Windows running Internet Explorer is missing the point.</p> <p>I wonder, given the client (with <a href="http://en.wikipedia.org/wiki/Citigroup">$10 billion</a> in profit in 2010), how much this site cost. It's not a brochure site, but it's not particularly complex either. The only really custom piece is the map, and the same style could probably be implemented with <a href="http://openlayers.org/">OpenLayers</a> (or <a href="http://code.google.com/apis/maps/index.html">Google Maps</a> with some compromise from the client on color requirements). Whatever they paid, I suspect they could have paid one of the top Drupal shops the same price to build a maintainable, standards-based, truly impressive website, for visitors, internal staff, and reviewing developers alike.</p> <p>Then again, being such a large client means the vendor likely had to deal with all kinds of red tape. Maybe the really good web shops don't connect with that class of client because it's not worth the hassle. But surely the U.S. House of Representatives, in the process of <a href="http://buytaert.net/us-house-of-representatives-using-drupal">moving to Drupal</a>, has its own brand of red tape, and the <a href="http://www.phase2technology.com/">vendor</a> has project managers who can handle it.</p> <p>Websites are complex beasts, and evaluating them from the client perspective is not the same as watching a proposed TV commercial. So how do clients without core competencies in web development know what to avoid? <a href="http://www.google.com/search?sourceid=chrome&amp;ie=UTF-8&amp;q=web+development+best+practices">Googling it</a> will only get them so far. But the burden is ultimately on them: we all consume products about which we lack core expertise, and big corporations (as consumers and clients themselves) need to figure out the same <a href="http://en.wikipedia.org/wiki/Heuristic">heuristics</a> as everyone else.
Trusting reputable vendors is one approach, but it's a vicious cycle if they're locked into one vendor (as companies with existing B2B relationships often are).</p> <p><em>Diversifying the advice you get</em> is critical. Big projects should have <a href="http://en.wikipedia.org/wiki/Request_for_proposal">RFPs</a> and a bidding process. (That helps enforce a realistic division of labor: little shops like <a href="http://newleafdigital.com">mine</a> don't respond to RFPs, but big shops that can afford that investment are happy to manage the project and outsource some development to suit their own core competencies.)</p> <p>The bidding process could even involve the vendors defending their proposals in front of their competitors. Then the top-tier CMS shop can eliminate the static-HTML or Cold Fusion shop from the running before it's too late. There are no silver bullets - there's a potential to be fleeced in any market - but in most of them, consumers have figured out ways to spot red flags and protect themselves. Website clients need to educate themselves and catch up.</p> http://benbuckman.net/tech/11/04/showcase-red-flags-how-do-web-shops-get-away#comments clients Thu, 14 Apr 2011 14:29:58 +0000 ben 7000 at http://benbuckman.net Drupal.org on Git: how to preserve GitHub repositories for existing modules http://benbuckman.net/tech/11/02/drupalorg-git-how-preserve-github-repositories-existing-modules <p>I've been keeping all my code on <a href="http://github.com/newleafdigital">GitHub</a>, waiting for <a href="http://drupal.org">Drupal.org</a> to move from CVS to Git. Well, <a href="http://drupal.org/node/1068664">it just did</a>! No more bitching about the lousy CVS process; time to start maintaining my module releases again.</p> <p>I want to keep my GitHub repositories' histories, of course. (Keeping the actual code on GitHub isn't critical, it's the commits/branches/tags that are important.)
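</p>

<p>Concretely, "important" means the commits, branches, and tags in the clone. A quick inventory, runnable in any repo (the throwaway repo here exists only to make the sketch self-contained; all names are made up):</p>

```shell
#!/bin/sh
# Inventory what a migration must preserve: commits, branches, tags.
# A disposable repo is created so this runs anywhere; in practice you'd
# run the last three commands inside your real clone.
set -e
cd "$(mktemp -d)"
git init -q mymodule
cd mymodule
git config user.email dev@example.com
git config user.name "Dev"
echo 'placeholder' > mymodule.module
git add . && git commit -qm "initial import"
git tag 6.x-1.0

git log --oneline   # the commit history
git branch -a       # local and remote-tracking branches
git tag -l          # release tags
```

<p>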
The new repositories on d.o were copied from the old CVS repositories, which contained (because I was lazy / holding out for Git) only final releases at best, or nothing at all, so I want the GitHub history to take precedence.</p> <p><a href="http://git-scm.org">Git</a> is "distributed," meaning the full repository is cloned everywhere it's used, including multiple "remotes." (This is unlike CVS or SVN, where there is a single remote server, and local "checkouts.") Anyway, this isn't the place for a Git tutorial (see the great [free] <a href="http://progit.org/book/">Pro-Git book</a> to learn Git and some of my past <a href="http://benbuckman.net/tech/tag/git">Git tricks</a>). I had assumed when d.o moved to Git, I could simply add a new remote and it would work automagically. It almost does, but needs a little extra work.</p> <p>There is some official d.o documentation for <a href="http://drupal.org/node/1059322#copy-repo">Copying your repository onto Drupal.org from an existing repository</a>. It clones the GitHub repo as a "mirror" and pushes the merged repo to d.o. This didn't work for me, for some reason. Maybe it'll work for you - try it first and see if it does. This worked for me instead.</p> <p>This is all done through the terminal, from the directory where I've cloned my existing GitHub repository. My remote name for Github is <em>origin</em> and the branch is <em>master</em> (the standard convention). I'm going to leave GitHub at <em>origin</em> and add drupal.org as <em>drupal</em>. To get the exact path to your git.drupal.org repository, go to the Git Instructions tab in your project page. 
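</p>

<p>The crux of what follows is merging an unrelated upstream history into your own while keeping your content intact. That can be rehearsed safely first on two throwaway local repositories; a sketch with made-up names (note: Git 2.9+ additionally requires <code>--allow-unrelated-histories</code>, a flag that didn't exist when this was written):</p>

```shell
#!/bin/sh
# Rehearse an "ours"-strategy merge of two unrelated histories using
# disposable repos; every name here is made up.
set -e
cd "$(mktemp -d)"

# Stand-in for the migrated d.o repository:
git init -q upstream && cd upstream
git checkout -q -b master
git config user.email dev@example.com && git config user.name "Dev"
echo 'cvs version' > mymodule.module
git add . && git commit -qm "migrated from CVS"
cd ..

# Stand-in for the GitHub clone whose history should win:
git init -q mine && cd mine
git checkout -q -b master
git config user.email dev@example.com && git config user.name "Dev"
echo 'github version' > mymodule.module
git add . && git commit -qm "real GitHub history"

git remote add drupal ../upstream
git fetch -q drupal
# Join the two histories, keeping our tree as the canonical one:
git merge --strategy=ours --allow-unrelated-histories \
    -m "merge d.o history, keeping ours" drupal/master

cat mymodule.module   # still "github version"
```

<p>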
(I assume you've already agreed to the new TOS and set up SSH keys as the Git Instructions tab explains.)</p> <p><em>Add a new remote to the existing (but different) d.o repository:</em><br /> <span class="geshifilter"><code class="text geshifilter-text">git remote add drupal USER@git.drupal.org:project/PROJECT.git</code></span></p> <p>(A digression: if you try at this point, <span class="geshifilter"><code class="text geshifilter-text">git push drupal master</code></span>, it'll throw an error - </p> <pre> ! [rejected] master -> master (non-fast-forward)</pre><p> - because the repository histories are different and can't be merged normally.)</p> <p>Instead, pull the git.d.o branch alongside your existing one. (Note that the new 'drupal' remote has its own 'master' branch separate from the one we want. Hence we're <span class="geshifilter"><code class="text geshifilter-text">fetch</code></span>ing and not <span class="geshifilter"><code class="text geshifilter-text">pull</code></span>ing.)<br /> <span class="geshifilter"><code class="text geshifilter-text">git fetch drupal</code></span></p> <p>Then merge, but keeping the existing history ("ours") as the correct version:<br /> <span class="geshifilter"><code class="text geshifilter-text">git merge remotes/drupal/master --strategy=ours</code></span></p> <p>One problem at this point: the git.d.o migration made a good change: "Stripping CVS keywords" (like <span class="geshifilter"><code class="text geshifilter-text">$Id$</code></span>). That's now gone, because we've dismissed the d.o history. So we get it back with a cherry-pick: <span class="geshifilter"><code class="text geshifilter-text">git log</code></span> shows the commit hash from the migration, so copy the start of the hash, and re-apply it:<br /> <span class="geshifilter"><code class="text geshifilter-text">git cherry-pick ####</code></span>.</p> <p>Check your code to make sure it's good...
then<br /> <span class="geshifilter"><code class="text geshifilter-text">git push drupal master</code></span><br /> And it's all up!</p> <p>To create a tag for a new release (example):<br /> <span class="geshifilter"><code class="text geshifilter-text">git tag 6.x-1.0-alpha1</code></span><br /> <span class="geshifilter"><code class="text geshifilter-text">git push drupal 6.x-1.0-alpha1</code></span></p> <p>(Interested in any feedback saying why this is stupid / why the other approaches should have worked / why this is causing the d.o infrastructure horrible damage... all I know is, it seemed to work for me.)</p> http://benbuckman.net/tech/11/02/drupalorg-git-how-preserve-github-repositories-existing-modules#comments drupal git Fri, 25 Feb 2011 20:13:29 +0000 ben 6865 at http://benbuckman.net Drupal as an Application Framework: Unofficially competing in the BostonPHP Framework Bakeoff http://benbuckman.net/tech/11/02/drupal-application-framework-bostonphp-competition <p>BostonPHP hosted a <a href="http://www.meetup.com/bostonphp/events/16011906/">PHP Framework Bake-Off</a> last night, a competition among four application frameworks: <a href="http://cakephp.org/">CakePHP</a>, <a href="http://www.symfony-project.org/">Symfony</a>, <a href="http://framework.zend.com/">Zend</a>, and <a href="http://codeigniter.com/">CodeIgniter</a>. A developer coding in each framework was given 30 minutes to build a simple <a href="https://bostonphp.mybalsamiq.com/projects/bostonphpjobboard/grid">job-posting app</a> (wireframes publicized the day before) in front of a live audience.</p> <p>I asked the organizer if I could enter the competition representing <a href="http://drupal.org">Drupal</a>. He replied that Drupal was a Content Management System, not a framework, so it should compete against <a href="http://wordpress.org/">Wordpress</a> and <a href="http://joomla.org/">Joomla</a>, not the above four. 
My opinion on the matter was and remains as follows:</p> <ol> <li>The differences between frameworks and robust CMSs are not well defined, and Drupal straddles the line between them.</li> <li>The test of whether a toolkit is a framework is whether the following question yields an affirmative answer: “Can I use this toolkit to build a given application?” Here Drupal clearly passes, and for apps far more advanced than this one.</li> <li>The exclusion reflects a kind of coder-purist snobbery ("it's not a framework if you build any of it in a UI") and lack of knowledge about Drupal's underlying code framework.</li> <li>In a fair fight, Drupal would either beat Wordpress hands-down building a complex app (because its APIs are far more robust) or fail to show its true colors with a simple blog-style site that better suits WP.</li> </ol> <p>Needless to say, I wasn't organizing the event, so Drupal was not included.</p> <p><strong>So I entered Drupal into the competition anyway.</strong> While the first developer (using CakePHP) coded for 30 minutes on the big screen, I built the app in my chair from the back of the auditorium, starting with a clean Drupal 6 installation, recording my screen. Below is that recording, with narration added afterwards.
(Glance at the <a href="https://bostonphp.mybalsamiq.com/projects/bostonphpjobboard/grid">app wireframes</a> first to understand the task.)</p> <p>Worth noting:</p> <ul> <li>I used Drupal 6 because I know it best; if this were a production app, I would be using the newly released <a href="http://drupal.org/drupal-7.0">Drupal 7</a>.</li> <li>I start, as you can see, with an empty directory on a Linux server and an Apache <a href="http://httpd.apache.org/docs/2.2/vhosts/">virtualhost</a> already defined.</li> <li>I build a small custom module at the end just to show that code is obviously involved at anything beyond the basic level, but most of the setup is done in the UI.</li> </ul> <p><br/></p> <p><iframe src="http://player.vimeo.com/video/20286577" width="681" height="383" frameborder="0"></iframe></p> <p>One irony of the framework-vs-CMS argument is that what makes these frameworks appealing is precisely the automated helpers - be it scaffolding in Symfony, baking in CakePHP, raking in Rails, etc - that all reduce the need for wheel-reinventing manual coding. After the tools do their thing, the frameworks require code, and Drupal requires (at the basic level) visual component building (followed, of course, by code as the app gets more custom/complex). Why is one approach more "framework"-y or app-y than the other? If I build a complex app in Drupal, and my time spent writing custom code outweighs the UI work (as it usually does), does that change the nature of the framework?</p> <p>Where the CMS nature of Drupal hits a wall in my view is in building apps that aren't compatible with Drupal's basic assumptions. It assumes the basic unit - a piece of "content" called a "node" - should have a title, body, author, and date, for example. If that most basic schema doesn't fit what you're trying to build, then you probably don't want to use Drupal.
But for many apps, it fits well enough, so Drupal deserves a spot on the list of application frameworks, to be weighed for its pros and cons on each project just like the rest.</p> http://benbuckman.net/tech/11/02/drupal-application-framework-bostonphp-competition#comments boston drupal php Wed, 23 Feb 2011 16:20:29 +0000 ben 6848 at http://benbuckman.net Drupal: Re-Sync Content Taxonomy from core taxonomy http://benbuckman.net/tech/11/01/drupal-re-sync-content-taxonomy-core-taxonomy <p>I'm working on a major Drupal (6) site migration now, and one of the components that needs to be automatically migrated is taxonomies, mapping old vocabularies and terms to new ones. Core taxonomy stores node terms in the <span class="geshifilter"><code class="text geshifilter-text">term_node</code></span> table, which is fairly easy to work with. However, two taxonomy supplements are making the process a little more complicated:</p> <p>First, we're using <a href="http://drupal.org/project/content_taxonomy"><strong>Content Taxonomy</strong></a> to generate prettier/smarter taxonomy fields on node forms. The module allows partial vocabularies to be displayed, in nicer formats than the standard multi-select field. The problem with Content Taxonomy for the migration, however, is that it duplicates the term_node links into its own CCK tables. If a node is mapped to a term in term_node but not in the Content Taxonomy table, the term won't show as selected when the taxonomy field appears on the node form.</p> <p>Ideally, the module would have the ability to re-sync from term_node built in. There's an issue thread related to this - <a href="http://drupal.org/node/368918"><em>Keep core taxonomy &amp; CCK taxonomy synced</em></a> - but it's not resolved.</p> <p>So I wrote a <a href="http://drupal.org/project/drush">Drush</a> command to do this.
To run it, rename "MODULE" to your custom module's name, <strong><em>back up your database</em></strong>, read the code so you understand what it does, and run <span class="geshifilter"><code class="text geshifilter-text">drush sync-content-taxonomy --verbose</code></span>.</p> <p style="font-weight: bold;">Warning: This code only works properly on *shared* CCK fields, that is, fields with their own tables (<span class="geshifilter"><code class="text geshifilter-text">content_field_X</code></span> tables, not a common <span class="geshifilter"><code class="text geshifilter-text">content_type_Y</code></span> table). Don't use this if your fields are only used by one content type.</p> <script src="https://gist.github.com/789963.js?file=Drupal%20-%20Re-Sync%20Content%20Taxonomy.php"></script><p><span style="font-size:.8em">[<a href="https://gist.github.com/789963">Embedded Gist - if it's not showing up, click here</a>.]</span></p> <p>The other taxonomy supplement that needs to be migrated is <a href="http://drupal.org/project/primary_term">Primary Term</a>. I'll be writing a similar Drush script for this in the next few days.</p> <p><em>Update 1/27:</em> There was a bug in the way it cleared the tables before rebuilding; it should be fixed now. (Make sure to download the latest Gist.)</p> http://benbuckman.net/tech/11/01/drupal-re-sync-content-taxonomy-core-taxonomy#comments drupal drush taxonomy Fri, 21 Jan 2011 17:19:55 +0000 ben 6807 at http://benbuckman.net Google Analytics Custom Variables http://benbuckman.net/tech/10/12/google-analytics-custom-variables <p>I have a need on several projects to use Google Analytics for lists of "Most Popular" content. Drupal's core node_counter stats are DB write-heavy and don't work on sites behind a full-page cache like Varnish or Akamai.
So querying the <a href="http://code.google.com/apis/analytics/docs/">Analytics API</a> for the most popular pages is a good alternative.</p> <p>Since I last looked into this over a year ago, several modules have filled in big pieces of this. The <a href="http://drupal.org/project/google_analytics_api">Google Analytics API</a> module handles API calls. <a href="http://drupal.org/project/ga_importer">Google Analytics Importer</a> claims to import stats into node_counter, but there’s no code released yet. <a href="http://drupal.org/project/ga_tokenizer">Google Analytics Tokenizer</a> could be used to put custom tokens into the Analytics tracker. <a href="http://drupal.org/project/google_analytics_counter">Google Analytics Counter</a> “can extrapolate page view count for cached values” and does a bunch of other stuff. The last one is the most interesting, but it runs on the fly rather than on cron; a cron run (every 15 minutes or so) seems to make more sense for a global and relatively real-time list.</p> <p>Also, none of these modules track the node ID; they use the URL/alias as a proxy. That’s not necessarily a bad approach, but aliases can change, and nids would be more direct. Analytics offers five "custom variable" slots, so why not use them for core variables like nid, user ID (if it can bypass page caching), and any other backend-set variables? So I wrote a <a href="http://github.com/newleafdigital/ga_customvars">Google Analytics Custom Variables</a> module (on GitHub, not drupal.org; I'll gladly push another remote when d.o moves to Git); it sets nid and uid by default and allows the other three to be set with a hook. I haven’t been able to test it yet on the Analytics-report side, but I’ve verified that it’s putting the right JS into the page according to the API documentation.</p> <p>Next step is to use the Google Analytics API module to query the data on cron.
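</p>

<p>The scheduling piece itself is trivial; a hypothetical crontab entry for the 15-minute cadence mentioned earlier (the Drush path and site root are placeholders):</p>

```text
# Run Drupal's cron every 15 minutes so the Analytics-derived stats
# stay reasonably fresh; paths are placeholders.
*/15 * * * * /usr/bin/drush --root=/var/www/mysite cron >/dev/null 2>&1
```

<p>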
I can probably use 90% of the Google Analytics Counter code with some tweaks.</p> <p><em>Update:</em> I added a Drupal.org <a href="http://drupal.org/project/ga_customvars">project page</a> pointing to the Github code. </p> <p><em>Update 12/7:</em> The module now retrieves node statistics from the custom 'nid' variable.</p> http://benbuckman.net/tech/10/12/google-analytics-custom-variables#comments analytics drupal Wed, 01 Dec 2010 22:29:43 +0000 ben 6711 at http://benbuckman.net Redefine PHP functions at runtime with runkit http://benbuckman.net/tech/10/11/redefine-php-functions-runtime-runkit <p>This is new to me: the <a href="http://pecl.php.net/package/runkit">runkit</a> extension available through PECL allows PHP functions to be <a href="http://us2.php.net/manual/en/function.runkit-function-redefine.php">redefined</a>, and <a href="http://us2.php.net/runkit">other code-level changes</a> to be made at runtime. I came across this through the <a href="http://drupal.org/project/path_alias_xt">Extended Path Alias module</a> for Drupal, which <a href="http://drupalcode.org/viewvc/drupal/contributions/modules/path_alias_xt/README.txt?view=markup">requires</a> runkit for some functionality. The potential for this is huge: it could redefine how whole frameworks are built, allowing for core functions to be cleanly redefined instead of hooked as is currently done in Drupal.</p> <p>Update: Looks like the <a href="http://drupal.org/project/drupal_override_function">Drupal Override Function module</a> implements the extension. 
Also, the <a href="http://us2.php.net/manual/en/function.runkit-lint.php">runkit_lint</a> function validates PHP syntax (like <span class="geshifilter"><code class="text geshifilter-text">php -l</code></span>) at runtime, making <span class="geshifilter"><code class="text geshifilter-text">eval()</code></span> much safer to use.</p> http://benbuckman.net/tech/10/11/redefine-php-functions-runtime-runkit#comments drupal Thu, 04 Nov 2010 03:52:37 +0000 ben 6612 at http://benbuckman.net Drupal Dojo 9/29: Git workflows http://benbuckman.net/tech/10/09/drupal-dojo-929-git-workflows <p>Learned tonight at the unofficial Drupal Dojo in JP (copied from <a href="http://bh.starswithstripes.com/node/1">BHirsch</a>):</p> <blockquote><p> Helpful links about forking (with GitHub):<br /> <a href="http://help.github.com/forking/">http://help.github.com/forking/</a><br /> <a href="http://help.github.com/pull-requests/">http://help.github.com/pull-requests/</a></p> <p>Undo a local change to a specific file<br /> (note: this checkout trick only works with modified files, not new files):<br /> <span class="geshifilter"><code class="text geshifilter-text">$ git checkout filename</code></span><br /> OR<br /> <span class="geshifilter"><code class="text geshifilter-text">$ git checkout path/to/file</code></span></p> <p>Note: -- (double dash) separates all your options from the file path.
E.g.:<br /> <span class="geshifilter"><code class="text geshifilter-text">$ git command op1 op2 op3 -- path/to/file</code></span></p> <p>Undo a committed change to a single file:<br /> <span class="geshifilter"><code class="text geshifilter-text">$ git checkout 915108a7 -- filename</code></span></p> <p>Undo all uncommitted local changes:<br /> <span class="geshifilter"><code class="text geshifilter-text">$ git reset --hard</code></span></p> <p>Undo everything up to this point:<br /> <span class="geshifilter"><code class="text geshifilter-text">$ git reset --hard thispoint</code></span><br /> e.g.:<br /> <span class="geshifilter"><code class="text geshifilter-text">$ git reset --hard HEAD</code></span><br /> <span class="geshifilter"><code class="text geshifilter-text">$ git reset --hard 915108a71552</code></span></p> <p>To undo a specific commit (and only that one commit), use revert:<br /> <span class="geshifilter"><code class="text geshifilter-text">$ git revert 915108a71552</code></span></p> <p>Undo a bunch of commits all at once:<br /> <span class="geshifilter"><code class="text geshifilter-text">$ git revert 915108a71..b05f9b935e29fd</code></span><br /> Note: this makes you log a message for each commit being undone, and each individual reversion becomes its own commit.</p> <p>Deleting a local and remote branch (where example is the name of an example branch):<br /> <span class="geshifilter"><code class="text geshifilter-text">$ git branch example</code></span><br /> <span class="geshifilter"><code class="text geshifilter-text">$ git checkout example</code></span><br /> <span class="geshifilter"><code class="text geshifilter-text">$ git push origin example</code></span><br /> <span class="geshifilter"><code class="text geshifilter-text">$ git checkout master</code></span><br /> <span class="geshifilter"><code class="text geshifilter-text">$ git branch -D example # now local branch is deleted</code></span><br /> <span class="geshifilter"><code class="text geshifilter-text">$ git push origin :example # this deletes the branch from remote</code></span></p> <p>Undo a bunch of commits all at once with a single commit:<br /> <span class="geshifilter"><code class="text geshifilter-text">$ git revert -n 915108a71..b05f9b935e29fd</code></span><br /> <span class="geshifilter"><code class="text geshifilter-text">$ git commit -m &quot;this is the only message i need because it all gets done in one commit.&quot;</code></span> </p></blockquote> http://benbuckman.net/tech/10/09/drupal-dojo-929-git-workflows#comments drupal git Thu, 30 Sep 2010 00:57:44 +0000 ben 6477 at http://benbuckman.net Abandoned GMap.module, json_server FAIL; is contrib quality slipping? http://benbuckman.net/tech/10/09/abandoned-gmapmodule-jsonserver-fail-contrib-quality-slipping <p>I've been building a lot of custom functionality over the last few weeks on top of the <a href="http://drupal.org/project/gmap">GMap</a> module for Drupal, but I was constantly pushing the limits of the module and finally decided to drop it completely. The module still uses v2 of the GMaps API, which is officially deprecated, and it abstracts the GMaps API into its own API, probably making direct PHP-to-map logic easier but adding unnecessary complexity when client-side API code is needed. (That's needed for anything the module doesn't implement, which is a lot.) The more custom code I wrote, the less value the module seemed to provide, to the point that it now seems like more of a hindrance than a utility.</p> <p>So I decided to switch to a custom implementation of GMaps API v3. To get the data out of Drupal, it seemed like <a href="http://drupal.org/project/json_server">json_server</a> (a supplement to <a href="http://drupal.org/project/services">Services</a>) would be a robust way of building the map data as a private web service to feed into the client-side map-building code.
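$ git push">
<p>Stripped of the Services plumbing, such a feed amounts to serializing map points as JSON for the client-side v3 code to consume. A rough sketch (field names are made up; in Drupal the array would come from a node query and be output with <code>drupal_json()</code> in a page callback):</p>

```php
<?php
// Sketch: build a JSON feed of map points for client-side Google Maps v3
// code. Field names are illustrative, not json_server's actual format.
function map_points_to_json(array $nodes) {
  $points = array();
  foreach ($nodes as $node) {
    $points[] = array(
      'nid'   => $node['nid'],
      'title' => $node['title'],
      'lat'   => $node['lat'],
      'lng'   => $node['lng'],
    );
  }
  return json_encode(array('points' => $points));
}

echo map_points_to_json(array(
  array('nid' => 1, 'title' => 'Cambridge', 'lat' => 42.373, 'lng' => -71.110),
)), "\n";
```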
But json_server is absurdly buggy: I had to create and/or apply two patches (<a href="http://drupal.org/node/924058"><em>Use drupal_json instead of drupal_to_js</em></a> and <a href="http://drupal.org/node/839388"><em>Error response failure</em></a>) just to get it to work properly, and other patches like <a href="http://drupal.org/node/239847"><em>Use Drupal.settings.basePath</em></a> to apply other best practices. Two hours to get a module that's been out since July 2007 to work is absurd, IMHO.</p> <p>I don't know if this is a general trend or just my experience lately, but I feel like so much of the contrib Drupal code I've tried recently has been really lacking. The whole Drupal development experience seems to have gone downhill in the last few months. I hope that changes soon, but the delays with the D7 release make me wonder if there's something fundamentally wrong with the Drupal community's development model.</p> http://benbuckman.net/tech/10/09/abandoned-gmapmodule-jsonserver-fail-contrib-quality-slipping#comments drupal Mon, 27 Sep 2010 15:00:53 +0000 ben 6454 at http://benbuckman.net Contributed: Blogger Importer module to import Blogger blogs into Drupal http://benbuckman.net/tech/10/09/contributed-blogger-importer-module-import-blogger-blogs-drupal <p>As part of <a href="http://newleafdigital.com">New Leaf Digital's</a> commitment to contributing back to the <a href="http://drupal.org">Drupal</a> community, I created the <a href="http://drupal.org/project/blogger_importer">Blogger Importer module</a>, to import blogs from <a href="http://blogger.com">Blogger/Blogspot</a> into Drupal. (The Data Liberation Front had <a href="http://www.dataliberation.org/google/blogger">solutions</a> for WordPress and other CMSs, but not for Drupal.)</p> <p>My use case was the migration of my <a href="http://benb-xc-06.blogspot.com/">2006 motorcycle trip blog</a> into Drupal.
(Ever since Blogger limited the number of posts on a page, breaking the monthly archives, the blog was very difficult to navigate.) I'm in the process of migrating the content (comments are next, then theming), and I'll post an update here when it's moved.</p> <p>There is a funny history to this: the very first module I wrote for Drupal was meant to solve the same use case - that time with my <a href="http://benbuckman.net">personal blog</a> - but it relied on Blogger's FTP publishing (now defunct) and was aptly titled "Crazy Blogger Migrator," not suitable for public consumption. This one uses Blogger's XML export format and should be much better!</p> <p>The code lives on <a href="http://github.com/newleafdigital/drupal_blogger_importer">NewLeafDigital's GitHub space</a>, and will (unfortunately) be sync'd manually to Drupal's CVS repository until d.o is on Git. Releases can be downloaded at <strong><a href="http://drupal.org/project/blogger_importer">http://drupal.org/project/blogger_importer</a></strong> or from <a href="http://github.com/newleafdigital/drupal_blogger_importer">GitHub</a> (but the latter is likely to be more up-to-date).</p> <p>I'm very curious to hear who else has a use for this module - please <a href="http://benbuckman.net/contact">let me know</a> if you do, or if you have suggestions for additional functionality to add.</p> <p><strong>Update 9/12:</strong> The latest dev release now imports <strong>comments</strong> in addition to nodes. 
The release node should be <a href="http://drupal.org/node/909270">here</a> within the next 12 hours (when the d.o batch runs); in the meantime, check out the latest from GitHub.</p> http://benbuckman.net/tech/10/09/contributed-blogger-importer-module-import-blogger-blogs-drupal#comments drupal Tue, 07 Sep 2010 20:08:13 +0000 ben 6397 at http://benbuckman.net DrupalCamp "Productivity Hacks" Session Video Online http://benbuckman.net/tech/10/08/drupalcamp-productivity-hacks-session-video-online <p>The video from my DrupalCampNYC session, <em>A Developer's Arsenal of Productivity Hacks</em>, is now online thanks to Joel Moore of <a href="http://banghousestudios.com/">Bang! House Studios</a>. <a href="http://benbuckman.net/tech/10/07/drupalcampnyc8-session-developers-arsenal-productivity-hacks">Enjoy!</a></p> http://benbuckman.net/tech/10/08/drupalcamp-productivity-hacks-session-video-online#comments drupal Sun, 15 Aug 2010 01:08:20 +0000 ben 6358 at http://benbuckman.net Drupal: Using CCK Fields's Inconsistent Tables in Custom Queries http://benbuckman.net/tech/10/08/drupal-using-cck-fieldss-inconsistent-tables-custom-queries <p><a href="http://drupal.org">Drupal</a> has an inconsistent data structure for <a href="http://drupal.org/project/cck">CCK</a> fields: if a field is only in one content type, it's stored in a <span class="geshifilter"><code class="text geshifilter-text">content_type_XX</code></span> table as a column, but if it's shared across multiple content types, it moves to its own <span class="geshifilter"><code class="text geshifilter-text">content_field_XX</code></span> table.
<a href="http://drupal.org/project/views">Views</a> figures out where fields are located automatically, but in custom SQL queries, this can be a real pain - you can write a query that works one day, then share a field, and the query breaks.</p> <p>I asked for a solution to this on <a href="http://drupal.org/irc">IRC</a> and was pointed to this post by drewish, <a href="http://drewish.com/content/2010/06/correctly_accessing_cck_fields_in_sql_queries"><em>Correctly accessing CCK fields in SQL queries</em></a>. I adapted that method a little to create this helper function:</p> <div class="geshifilter"><pre class="php geshifilter-php"><span class="co4">/** * function to get the TABLE or COLUMN for a CCK FIELD * method adapted from http://drewish.com/content/2010/06/correctly_accessing_cck_fields_in_sql_queries * * @param $field_name * @param $type 'table' or 'column' * * to get both table and column, run function twice with each $type */</span> <span class="kw2">function</span> helper_cck_field_sql<span class="br0">&#40;</span><span class="re0">$field_name</span><span class="sy0">,</span> <span class="re0">$type</span><span class="br0">&#41;</span> <span class="br0">&#123;</span> <span class="re0">$field</span> <span class="sy0">=</span> content_fields<span class="br0">&#40;</span><span class="re0">$field_name</span><span class="br0">&#41;</span><span class="sy0">;</span> <span class="re0">$db_info</span> <span class="sy0">=</span> content_database_info<span class="br0">&#40;</span><span class="re0">$field</span><span class="br0">&#41;</span><span class="sy0">;</span> &nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span><span class="re0">$type</span> <span class="sy0">==</span> <span class="st_h">'table'</span> <span class="sy0">&amp;&amp;</span> <span class="kw3">isset</span><span class="br0">&#40;</span><span class="re0">$db_info</span><span class="br0">&#91;</span><span class="st_h">'table'</span><span class="br0">&#93;</span><span 
class="br0">&#41;</span><span class="br0">&#41;</span> <span class="br0">&#123;</span> <span class="kw1">return</span> <span class="re0">$db_info</span><span class="br0">&#91;</span><span class="st_h">'table'</span><span class="br0">&#93;</span><span class="sy0">;</span> <span class="br0">&#125;</span> <span class="kw1">elseif</span> <span class="br0">&#40;</span><span class="re0">$type</span> <span class="sy0">==</span> <span class="st_h">'column'</span> <span class="sy0">&amp;&amp;</span> <span class="kw3">isset</span><span class="br0">&#40;</span><span class="re0">$db_info</span><span class="br0">&#91;</span><span class="st_h">'columns'</span><span class="br0">&#93;</span><span class="br0">&#91;</span><span class="st_h">'value'</span><span class="br0">&#93;</span><span class="br0">&#91;</span><span class="st_h">'column'</span><span class="br0">&#93;</span><span class="br0">&#41;</span><span class="br0">&#41;</span> <span class="br0">&#123;</span> <span class="kw1">return</span> <span class="re0">$db_info</span><span class="br0">&#91;</span><span class="st_h">'columns'</span><span class="br0">&#93;</span><span class="br0">&#91;</span><span class="st_h">'value'</span><span class="br0">&#93;</span><span class="br0">&#91;</span><span class="st_h">'column'</span><span class="br0">&#93;</span><span class="sy0">;</span> <span class="br0">&#125;</span> <span class="kw1">return</span> <span class="kw4">NULL</span><span class="sy0">;</span> <span class="br0">&#125;</span></pre></div> http://benbuckman.net/tech/10/08/drupal-using-cck-fieldss-inconsistent-tables-custom-queries#comments drupal Mon, 02 Aug 2010 14:37:02 +0000 ben 6304 at http://benbuckman.net