Tech Blog :: unix


Jul 7 '10 9:22pm
Tags

Identify differences between Drupal databases with Drush, mysqldump, and diff

(This code will be part of my DrupalCamp NYC session on A Developer's Toolkit of Productivity Hacks.)

I had to do a brute-force deployment today, synchronizing functionality and content in 2 directions between a development and live site that split months ago. So I needed to see exactly what changed on each site's database to plan the deployment process. This scriptlet, run on the server with both sites, helped a lot.

(You'll want to do all this from a directory outside the webroot because you don't want the public to download SQL dumps.)

• Set up shortcuts to each site, as user 'admin', passing all prompts (not crucial for the rest but generally useful):

DEV="drush --root=/path/to/dev/site -u admin -y "
PROD="drush --root=/path/to/live/site -u admin -y "

• Make shortcuts to mysqldump each site's database, sorting by primary key and with full insert's - so each record is its own line and can be matched to the other. (You can use $DEV sql-connect and $PROD sql-connect, and copy/paste from there:

PRODDUMP="mysqldump --order-by-primary --skip-extended-insert -hHOST -uUSER -pPASS PRODDB "
DEVDUMP="mysqldump --order-by-primary --skip-extended-insert -hHOST -uUSER -pPASS DEVDB "

• Now get a list of all your tables. (This assumes one site has a more complete set):
$DEV sql-query "show tables" > tables.log

(Alternately, you could do this for both tables by combining them with >>, piping (|) to uniq and back to the file, for the full list from both sides.)

• Now get each table from development and production, and diff them. (The semicolons allow this to be consolidated to 1 line; I separate it here for readability.)

mkdir tables;
cat tables.log | while read TABLE; do 
  $DEVDUMP $TABLE > tables/${TABLE}-dev.sql;
  $PRODDUMP $TABLE > tables/${TABLE}-prod.sql; 
  diff -u tables/${TABLE}-prod.sql tables/${TABLE}-dev.sql > tables/${TABLE}.diff;
  echo "Done with $TABLE";
 done

• Now delete the files with no differences, they're just clutter. Those seemed to be 10 lines long, so I useD that as the magic number:

cat tables.log | while read TABLE; do 
  LCOUNT=`cat tables/${TABLE}.diff | wc -l`; 
  echo "$TABLE : $LCOUNT";
    if [[ $LCOUNT -lt 11 ]]; then 
      rm tables/${TABLE}.diff && rm tables/${TABLE}-dev.sql && rm tables/${TABLE}-prod.sql && echo "DELETED $TABLE"; 
    fi;  
 done

You can probably also ignore the cache and session tables. But this will give you a good view of what nodes, variables, users, and so on have changed.

If anyone wants to turn this into a Drush plugin (a la drush sql-diff PROD DEV --destination=tables), please do!

What kind of hacks, shortcuts, and tricks do you use in your development workflow? Let me know so I can include them in the session!

Apr 23 '10 2:53pm
Tags

Bash tip: find the extensions of large files

I was exploring a site today with a webroot that filled more than 15 GBs. To see the full list of big files (>1 MB), I did:

find . -size +1024k

And to find the file extensions of the biggest files, I did:

find . -size +1024k | while read BIGFILE; do echo "$BIGFILE" | awk '{print tolower($0)}' | awk -F . '{print $NF}'; done | sort | uniq