Tech Blog :: Identify differences between Drupal databases with Drush, mysqldump, and diff

Jul 7 '10 9:22pm

Identify differences between Drupal databases with Drush, mysqldump, and diff

(This code will be part of my DrupalCamp NYC session on A Developer's Toolkit of Productivity Hacks.)

I had to do a brute-force deployment today, synchronizing functionality and content in 2 directions between a development and live site that split months ago. So I needed to see exactly what changed on each site's database to plan the deployment process. This scriptlet, run on the server with both sites, helped a lot.

(You'll want to do all this from a directory outside the webroot because you don't want the public to download SQL dumps.)

• Set up shortcuts to each site, as user 'admin', passing all prompts (not crucial for the rest but generally useful):

DEV="drush --root=/path/to/dev/site -u admin -y "
PROD="drush --root=/path/to/live/site -u admin -y "

• Make shortcuts to mysqldump each site's database, sorting by primary key and with full insert's - so each record is its own line and can be matched to the other. (You can use $DEV sql-connect and $PROD sql-connect, and copy/paste from there:

PRODDUMP="mysqldump --order-by-primary --skip-extended-insert -hHOST -uUSER -pPASS PRODDB "
DEVDUMP="mysqldump --order-by-primary --skip-extended-insert -hHOST -uUSER -pPASS DEVDB "

• Now get a list of all your tables. (This assumes one site has a more complete set):
$DEV sql-query "show tables" > tables.log

(Alternately, you could do this for both tables by combining them with >>, piping (|) to uniq and back to the file, for the full list from both sides.)

• Now get each table from development and production, and diff them. (The semicolons allow this to be consolidated to 1 line; I separate it here for readability.)

mkdir tables;
cat tables.log | while read TABLE; do 
  $DEVDUMP $TABLE > tables/${TABLE}-dev.sql;
  $PRODDUMP $TABLE > tables/${TABLE}-prod.sql; 
  diff -u tables/${TABLE}-prod.sql tables/${TABLE}-dev.sql > tables/${TABLE}.diff;
  echo "Done with $TABLE";

• Now delete the files with no differences, they're just clutter. Those seemed to be 10 lines long, so I useD that as the magic number:

cat tables.log | while read TABLE; do 
  LCOUNT=`cat tables/${TABLE}.diff | wc -l`; 
  echo "$TABLE : $LCOUNT";
    if [[ $LCOUNT -lt 11 ]]; then 
      rm tables/${TABLE}.diff && rm tables/${TABLE}-dev.sql && rm tables/${TABLE}-prod.sql && echo "DELETED $TABLE"; 

You can probably also ignore the cache and session tables. But this will give you a good view of what nodes, variables, users, and so on have changed.

If anyone wants to turn this into a Drush plugin (a la drush sql-diff PROD DEV --destination=tables), please do!

What kind of hacks, shortcuts, and tricks do you use in your development workflow? Let me know so I can include them in the session!

Why not use a MySQL diff tool - for example: (lists several)

These are a lot less fiddly than diffing SQL dumps as text, and have the ability to compare structure and data separately (and table by table etc). Some of them also have the ability to determine the delta in terms of MySQL commands to get from one to the other.

I might mention that the first iteration of AutoPilot catered to this style of change management. While somewhat effective, it also had two flaws:

1. it required developers to have INTIMATE knowledge of table structures of the modules they were using.
2. it did not allow for any content that spanned two environments. In particular, I'm referring to node IDs. Of course, this applies across the board for any table that uses a sequence.

While I think there's some utility in diffing records between tables on different environments, I think the solution to reconciling the two environments likely lies in APIs and architecture. It really is the role of the developer to make sure that content can be migrated.

Have you tried SchemaSync?

This is a fabulous tutorial. Really useful stuff for tracking changes. Thank you very much!!