Tech Blog :: svn

Aug 4 '10 10:51pm

Moving to Git, part 1

I've been using Subversion for years and it's high time to switch to Git. So I finally got moving on that last night, starting with various non-site projects like my contributed Drupal modules.

So far:

  • Set up an account at GitHub (I've been hosting my own SVN repo and don't want to bother with that anymore; besides, GitHub offers so much more than code repositories.)
  • Install Git for Mac package.
  • Install SVN2Git.
  • On my server, back up my SVN repo with tar (in case I lose some critical files or file history in the process).
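
A minimal sketch of that tar backup, assuming the repository lives in a single directory on the server. The function name `backup_svn_repo`, the date-stamped file name and the example path are placeholders for illustration, not part of the setup described above:

```shell
# Sketch: archive an SVN repository directory into a date-stamped tarball.
# backup_svn_repo and the paths passed to it are hypothetical names.
backup_svn_repo() {
  local repo_dir="$1"            # e.g. /var/svn/repo (assumed location)
  local dest_dir="${2:-$HOME}"   # where the tarball is written
  local stamp archive
  stamp=$(date +%Y%m%d)
  archive="$dest_dir/$(basename "$repo_dir")-backup-$stamp.tar.gz"
  # -C keeps absolute paths out of the archive
  tar -czf "$archive" -C "$(dirname "$repo_dir")" "$(basename "$repo_dir")" \
    && echo "$archive"
}
```

Calling something like `backup_svn_repo /var/svn/repo ~/backups` should print the path of the archive it wrote; `tar -tzf` on that file lists what went in.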

My projects are all in subdirectories of one SVN repo, but in Git they're supposed to be separate. For each project in SVN:

  • Create a repo on GitHub
  • Create a holder for the new repository locally: mkdir PROJECT
  • Initialize a repository in that directory: git init
  • Import from SVN: svn2git https://path/to/svn/dir --rootistrunk
  • Connect the local repo to the GitHub repo: git remote add origin
  • Push/sync it: git push origin master
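
The per-project steps above can be sketched as one function. This is only an illustration: `migrate_project` and its three arguments are hypothetical, and the remote URL is whatever GitHub shows for the repo created in the first step.

```shell
# Sketch: move one SVN subdirectory into its own Git repo and push it.
# migrate_project and its argument names are made up for this example.
migrate_project() {
  local project="$1"      # local directory name for the new repo
  local svn_url="$2"      # e.g. https://path/to/svn/dir
  local remote_url="$3"   # the GitHub repo URL (assumed to exist already)
  (
    # run in a subshell so the caller's working directory is untouched;
    # the && chain stops at the first step that fails
    mkdir -p "$project" && cd "$project" &&
    git init &&
    svn2git "$svn_url" --rootistrunk &&
    git remote add origin "$remote_url" &&
    git push origin master
  )
}
```

Chaining with && means a failed import never reaches the push, which seemed like the safer default for a one-time migration.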

In some cases I was pulling the SVN directories via svn:externals into other sites. So as a stopgap, I switched the external URL to GitHub's read-only SVN access, and it moved over seamlessly.

SVN2Git imports the entire history of the files, including trunk/tags/branches if applicable (in this case the --rootistrunk flag indicates a flat structure), so I could then delete the files from my SVN repo.

Next steps:

  • Read Pro Git (free) and other Git resources I've collected.
  • Learn how to establish a Git-centric development/Drupal workflow.
  • Start moving whole sites from SVN to Git.

For anyone interested, the code for my Drupal modules and other code I'll release publicly over time will live at (Unfortunately for now, contributed Drupal modules still have to be copied to CVS, but that's moving to Git eventually as well.)

May 5 '10 7:13pm

Upgrade SVN to 1.6 on Ubuntu 8

If you're running Ubuntu 8 ("Hardy") - a common setup for Slicehost servers including the one running this site - then you're probably still using SVN (client) v1.4.6. That's the latest version in apt-get with the standard libraries. But this post explains how to get 1.6. The only unexpected bit was a dependency on an Apache upgrade, but the whole process took less than 2 minutes, with less than 10 seconds of Apache downtime.
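
To check whether a box still has the old client, a small sketch like this could help. `version_at_least` and `svn_client_version` are hypothetical helper names, and the comparison leans on `sort -V` from GNU coreutils, which is an assumption about the server:

```shell
# Sketch: compare dotted version strings, e.g. to see if svn is >= 1.6.
# Both function names are made up; sort -V needs GNU coreutils.
version_at_least() {
  # succeeds when $1 >= $2: the smaller of the two must be $2
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Print just the client version number
svn_client_version() {
  svn --version --quiet
}
```

Usage would be along the lines of `version_at_least "$(svn_client_version)" 1.6 || echo "still on an old client"`.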

Apr 2 '10 1:09pm

GitHub SVN Support

Apparently this was not an April Fools joke despite the timing: GitHub now supports SVN [read-only] access to its Git repositories. So a project can be primarily in SVN but pull in Git files with svn:externals. That could be extremely useful on a lot of projects I've seen.

Dec 16 '09 11:13am

How to add an unversioned live site into SVN

Recently at work, I've been moving a bunch of older sites into SVN. (The plan is to migrate everything to Git soon.) I've done it on 3 big sites so far and got it down to a simple process.

Some background on the circumstances I documented this in:
  • This is all done on a *nix server via an SSH'd bash shell.
  • These sites have live and staging versions which, because they weren't versioned, were out of sync. Diff'ing all the differences between the sites seemed prohibitively complicated, so after consulting with everyone who uses the staging site, I put the live site into SVN and pushed it over to staging.
  • These sites have various mini-sites working inside them, some Drupal and some Java apps which periodically write static HTML files to the server. Years of these files have accumulated to many gigabytes of data, most of which didn't need to be in SVN. So I made liberal use of svn:ignore.
  • These sites are based on a main server which pushes out to multiple load-balanced servers, which are what visitors actually interact with. So coordination with our sysadmin to pause that export process momentarily prevented complications in the last steps.
  • You might need sudo access to do some of these steps. Be careful about preserving the file ownership/permissions.

So here's how it goes. I'm using generic site and directory names; substitute as needed:
ssh webserver
## back up the webroot (-p preserves ownership, permissions and timestamps)
cp -Rp /var/www /var/www-backup
# and/or
tar -czf ~/mysite-backup.tar.gz /var/www
# copy the site outside the live webroot (into home dir)
# (rsync parameters preserve links and permissions; see 'man rsync' for details)
mkdir ~/mysite-copy/
sudo rsync -rlpogt /var/www/ ~/mysite-copy/
# should be the same size
du -sh ~/mysite-copy; du -sh /var/www
# make svn space
svn mkdir http://svn/repo/mysite -m"creating import space for mysite"
svn mkdir http://svn/repo/mysite/tags -m"creating tags dir"
## import the folders that should be fully versioned from the copy
## don't import the folders that should be svn:ignore'd
## (here I'm making a date-stamped static "tag" copy and then copying to trunk for the future)
cd ~/mysite-copy
svn import somedir http://svn/repo/mysite/tags/mysite-webserver-MMDDYY/somedir -m"importing mysite somedir"
# ... repeat for other directories
  # check what's worth versioning in the root
  du -hc --max-depth 1
    ## (might need to fix ownership and permissions for import to work, e.g.
      sudo chgrp -R webgroup .
      sudo chmod -R g+rwx .
    ## or just sudo the whole import
    ## if import fails, restart it (server seems to handle, no partial imports)
## checkout the [partially imported] site
mkdir ~/mysite-checkout
cd ~/mysite-checkout/
svn co http://svn/repo/mysite/tags/mysite-webserver-MMDDYY/ ~/mysite-checkout/
# copy everything over checkout
rsync -rlpogt ~/mysite-copy/ ~/mysite-checkout/ --progress
## put other folders in svn:ignore
## (can be done several ways, this requires manual editing but is simple)
## find folders in mysite-copy root, pipe thru sort, save to .svnignore
for D in `find . -mindepth 1 -maxdepth 1 -type d`; do
  echo ${D##./}
done | sort > .svnignore
## then edit the list: remove the files/folders already imported, and remove .svn
nano .svnignore
# once .svnignore is good,
svn propset svn:ignore . -F .svnignore
# (remember to keep those in sync, so the file is always a reference for propset'ing)
# confirm ignores, no changes
svn status
# or to see everything,
svn status --no-ignore
# DO NOT DO 'svn add *' !!! -- that will add the ignored dirs!
# add root files 
# (this only adds what appears in the status report; I'm borrowing this from someone else)
svn status | sed -ne "/^?/ {s/^? *//;s/ /\\\ /g;p;}" | xargs svn add
# commit
svn commit ~/mysite-checkout -m"A VERY DESCRIPTIVE MESSAGE"
## (I ran into conflicts here with case-sensitive duplicates; 
## in that case, identify and delete the extras, svn cleanup, commit again)
## make sure /trunk doesn't exist already (otherwise it'll go into a subfolder)
svn ls http://svn/repo/mysite/trunk
svn cp http://svn/repo/mysite/tags/mysite-webserver-MMDDYY http://svn/repo/mysite/trunk -m"copying mysite-webserver tag to trunk"
## make sure all subsequent commits go to the trunk
svn switch http://svn/repo/mysite/trunk
## checkout again into a directory parallel to the webroot
sudo mkdir /var/www-WC
cd /var/www-WC/
# (using sudo here, then fixing the permissions; might not be necessary in all cases)
sudo svn co http://svn/repo/mysite/trunk/ /var/www-WC/
sudo chgrp -R webgroup /var/www-WC
sudo chmod -R g+rw /var/www-WC/
## copy everything else (and recently updated files)
sudo rsync -rlpogt /var/www/ /var/www-WC/ --progress
## (run twice, 2nd time should have little/no output)
# check size & differences (excluding SVN data)
du -sh --exclude .svn /var/www/; du -sh --exclude .svn /var/www-WC/
diff --recursive --exclude .svn /var/www /var/www-WC
# (assuming it's identical or close enough...)
## commit anything newer...
## at this point it's worth checking with your sysadmin (if you're not a sysadmin yourself) that any load-balancer exports or the like are shut off
## flip the webroot
sudo mv /var/www /var/www-OLD; sudo mv /var/www-WC /var/www;
## test it, du and diff again
## keep the -OLD directory for a while just in case, or tarball it and put it somewhere else
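
The step that builds the svn:ignore list can also be sketched as a function that leaves out .svn up front and copes with directory names containing spaces. `list_ignore_candidates` is a hypothetical name, and the output still needs the manual pruning described in the listing before it goes to propset:

```shell
# Sketch: print the top-level directories of a working copy, one per line,
# sorted, skipping SVN's own metadata directory.
# list_ignore_candidates is a made-up helper name.
list_ignore_candidates() {
  local root="${1:-.}"
  find "$root" -mindepth 1 -maxdepth 1 -type d ! -name .svn \
    -exec basename {} \; | LC_ALL=C sort
}
```

From the checkout root, `list_ignore_candidates . > .svnignore` would produce the starting file; using -exec basename instead of a for loop over backticks avoids word-splitting on names with spaces.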
That's pretty much it... let me know if I missed anything, if you have any questions, comments, etc. Good luck!

Nov 30 '09 11:18pm

SVN Checksum Mismatches

I've been working on some web apps lately, with codebases that weren't version controlled, multiple versions of the site with no process for syncing (except tarballs and FTP), PHP mixed in with SSI, etc. A huge cross-site project included a number of these sites, so I decided to make some basic tuneup/cleanup a prerequisite. The code's all in SVN now (Git's on the medium-term agenda), which has its benefits of course, but also introduces a whole new slew of problems.

The latest is hundreds of "checksum mismatch" errors on the server when I try to update (or sometimes commit) a versioned file. My understanding is that SVN uses checksums (hashes of contents and metadata) to quickly identify local modifications. But the production team is FTPing files directly over the versioned copies, and somewhere in there, I'm hypothesizing, the metadata is broken, SVN can't recognize the file anymore, and it becomes effectively corrupted.

I'm sort of stuck on this one now, but I need to resolve it fast. Does anyone know any fixes for this problem?

Nov 10 '09 3:33pm

Shell shortcut for SVN URL

This took me some time to figure out. It's a shell command to extract the URL from an svn info command. Run this (or put it in .bash_profile):

alias svnurl="svn info | egrep '^URL: (.*)' | sed s/URL\:\ //"

Then you can do things like svn ls $(svnurl) or svn log $(svnurl) to run commands on the remote repository. I expect this'll save me a bunch of time in the future.
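
The same idea as a pair of functions, in case an alias gets awkward (for example when passing a path). This is a sketch rather than a replacement: `extract_svn_url` is a hypothetical helper that only filters text, so it can be tried on saved `svn info` output without a working copy.

```shell
# Sketch: function form of the svnurl alias.
# extract_svn_url is a made-up name; it reads `svn info` output on stdin
# and prints only the value of the line that starts with "URL: ".
extract_svn_url() {
  sed -n 's/^URL: //p'
}

# Print the URL of the current working copy (or of the path given)
svnurl() {
  svn info "$@" | extract_svn_url
}
```

Anchoring the pattern at the start of the line means a "Repository Root:" line is not picked up by accident.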

And this is another one I've used for a while, to view changes in a working copy, especially useful when there are externals that output text even when there are no changes:

alias svnnew='svn st | grep -e "^[!M?~ACDR ]"'

Oct 9 '09 1:12pm

SVN Rollback

A great reference tip via @madeleinep77:

svn merge -rHEAD:[revision#] [svn-repository-url] [path-to-local-copy]
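
A hedged sketch of how that might be wrapped, assuming the rollback target is a local working copy. `svn_rollback` is a hypothetical name; passing the path twice uses the working copy's own URL as the merge source and the same path as the destination. The merge only changes the working copy, so nothing is rolled back in the repository until you review and commit.

```shell
# Sketch: reverse-merge a working copy back to an earlier revision.
# svn_rollback is a made-up helper, not an svn subcommand.
svn_rollback() {
  local rev="$1" path="${2:-.}"
  # accept only a plain revision number
  case "$rev" in
    ''|*[!0-9]*)
      echo "usage: svn_rollback REVISION [PATH]" >&2
      return 2
      ;;
  esac
  # undo everything between HEAD and REVISION in the working copy
  svn merge -r "HEAD:$rev" "$path" "$path"
}
```

For example, `svn_rollback 120 .` followed by `svn diff` and `svn commit` would take the directory back to its state at r120.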

Jul 23 '09 4:54pm

Using SVN Merge for Dev-Prod Web Deployments

Note: As of SVN 1.5 and newer, merging has gotten a little easier, so some of this may be obsolete.

We use Subversion (SVN) to manage our code, and typically branch the code from 'dev' to 'prod' at launch time, with each project's tech lead determining the best way to manage the two repositories in parallel for subsequent development. SVN, like any decent versioning system, has branching and merging functionality, but until recently I had little success (and some disaster) trying to use it. So I used the cruder but tried-and-true method of locally rsync'ing dev to prod (with -rC for recursive and ignoring SVN metadata) and committing the changes to each separately. That mostly works fine, except it can be a very tedious process (using FileMerge to check every file and figure out which branch's version is correct); it makes SVN's logs very difficult to understand (with no clear history to a file and duplicate commits of everything); it's easy to make a mistake and mess up a branch; and it seemed unnecessary given SVN's built-in functionality. In some cases, the dev and prod branches got so out of whack that the developers had given up on merging them, and all development was done on local copies of the production branch, an inefficient (and dangerous) method that made branches pointless.

So on my latest post-launch project, I decided to figure out SVN merging. The first deployment process took 2+ hours with a splitting headache at the end; then 1 more lucid hour; now it's 5 minutes, so I do this even for extremely urgent fixes. svn help merge describes multiple ways to use the command; this is how I've gotten it to work.

SVN Merge essentially diff's one branch (e.g. dev) from R1 (some revision) to R2 (some other revision, generally HEAD, the latest commit), then applies those differences/patches to a second branch (e.g. prod, in the same repository). This is all done locally, on working copies of each branch. The result is local changes to the second branch, which you then commit to that branch. Since the branches are in the same repository, the commit numbers are sequential: changes to dev might be in commits/revisions 101, 102, 103, then merged to prod, committed at r104. Subsequent commits to dev start at r105, etc. If all development is done on dev, R1 is generally the revision at which the 2nd branch (prod) was last merged and committed.
So an example merge logic would be, take all changes to dev since r104 [to HEAD], and copy them to prod. But I'm getting ahead of myself.

The most important thing to keep track of is the revision at which each branch or merge is done. I keep a log as a GDoc; you can run 'svn log' each time to get the same information.
Also before I get into syntax, some precautions:

  • always use the --dry-run option with 'svn merge' the first time, to check what it wants to do.
  • back up all the copies of the site that might be changed (local and servers' dev and prod).
  • very important: make sure your local working copies are all updated ('svn up'), so the HEAD revisions match.
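
The update and dry-run precautions can be sketched as a small wrapper. This is an illustration only: `preview_then_merge` is a hypothetical name, and it assumes the two working copies sit side by side as plain directories (backups are still up to you).

```shell
# Sketch: update both working copies, show a dry run, then ask before
# doing the real merge. preview_then_merge is a made-up helper name.
preview_then_merge() {
  local from_rev="$1" src="$2" dst="$3" answer
  # make sure HEAD matches in both working copies
  svn up "$src" || return 1
  svn up "$dst" || return 1
  # preview what the merge would change
  svn merge --dry-run -r"${from_rev}:HEAD" "${src}@HEAD" "$dst" || return 1
  printf 'Apply this merge to %s? [y/N] ' "$dst"
  read -r answer
  [ "$answer" = "y" ] || { echo "aborted"; return 1; }
  svn merge -r"${from_rev}:HEAD" "${src}@HEAD" "$dst"
}
```

Anything other than a lowercase y at the prompt leaves the destination untouched, which is the behaviour I'd want when the dry run shows something wonky.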

On this particular project, I have a /project folder, under which are dev and prod folders, each holding a working copy of its respective branch (of the same site). So /project gives me a good vantage point: I can 'svn up dev; svn up prod' to update both, and the folder names make for clear syntax.
For the initial launch, I had branched dev (via svn cp) to prod at r833. I had then done some post-launch debugging directly on the prod branch, so it needed to be merged back to dev, to resume development there.
The syntax (in a Unix terminal on OSX), from /project, was like this:
svn merge -r833:HEAD prod@HEAD dev

The logic: Take all the changes on prod since revision #833, and copy them to dev.
(It can take a while to run the merge, so don't worry if you don't see any response for a while.)
When that's done, run 'svn st dev' to make sure that everything looks right, and commit dev. Keep track of the new revision #; you'll need it later. In this case let's say it was r900.

With dev now an accurate copy of prod, I then made some changes to the dev code, so dev is now at r950.
Fast forward a couple of days, I want to deploy the new code from dev back to prod.
From /project again, with /project/dev and /project/prod updated and all wanted changes in dev committed to the dev branch, I did:
svn merge -r900:HEAD dev@HEAD prod

Note r900 -- that's the committed revision from the last merge. Prod hasn't changed since r900, but dev has. All the changes I want are on dev between r900 and HEAD. So this does exactly that: copies all those changes from dev since r900, from my local dev (@HEAD since it's updated), to my local prod. Again run a dry run first to make sure it's not doing anything wonky --
svn merge -r900:HEAD dev@HEAD prod --dry-run

-- then a live run, then commit again to prod.
I said dev was previously at r950, so the new merged revision on prod should be r951.

So a week later dev's at r999, I want to merge to prod again:
svn merge -r951:HEAD dev@HEAD prod

That creates r1000 on prod ... a few days later,
svn merge -r1000:HEAD dev@HEAD prod

... and so on. The key is to keep track of that merged revision number, because all changes made subsequent to that merge on the other branch will need to be merged back.
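
One way to keep that number handy is a plain text file next to the working copies, instead of (or alongside) a separate doc. The sketch below is an assumption-heavy illustration: the file name `merge-log.txt` and the three function names are made up, and it relies on the one-active-branch rule so that the last recorded merge commit is always the right starting revision.

```shell
# Sketch: track merge commits in a text file and build the next command.
# MERGE_LOG, record_merge, last_merge_rev and next_merge_cmd are all
# hypothetical names for this example.
MERGE_LOG="${MERGE_LOG:-merge-log.txt}"

# Call after committing a merge: record_merge REVISION SOURCE DEST
record_merge() {
  printf '%s %s %s\n' "$1" "$2" "$3" >> "$MERGE_LOG"
}

# Print the revision of the most recently recorded merge commit
last_merge_rev() {
  tail -n 1 "$MERGE_LOG" | cut -d' ' -f1
}

# Print (but do not run) the next merge command: next_merge_cmd SOURCE DEST
next_merge_cmd() {
  echo "svn merge -r$(last_merge_rev):HEAD $1@HEAD $2"
}
```

With the numbers from this post, `record_merge 900 prod dev` followed by `next_merge_cmd dev prod` prints `svn merge -r900:HEAD dev@HEAD prod`, ready to paste with --dry-run added.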

If I wanted to copy prod back to dev, let's say because users have uploaded files on the live server which I've committed to the prod branch, I could use the original prod-to-dev syntax, like so:
svn merge -rXXXX:HEAD prod@HEAD dev

(this time the source being changes on prod, the destination being my local dev, the opposite of the others.)

It could get tricky if the same files are being modified simultaneously on both branches, but in our work this shouldn't be necessary -- one site branch is considered the active development branch at any time, everyone on the team is notified, and conflicts are handled locally. What I have done successfully is committed uploaded files to the prod branch, and merged from dev to prod, with the new prod files (missing in the dev branch) being left alone, as they should. I suspect with more complicated simultaneous modifications, anyway, that the principle would be the same, it would just be the quantity needed to wrap your head around that would increase. I haven't tested scenarios like that yet.

Points to remember:

  • keep track of the revision # from the last committed merge
  • think in terms of copying changes in a source since some point to a destination.
  • only work on 1 active development branch at a time
  • work on updated local working copies
  • use --dry-run first and back everything up
  • have Advil available for the first few times.

The beauty of this approach, in the end, is a clear history of each file in the logs, no duplication, synchronized branches, and a much cleaner deployment process. I'm going to use it on all the future projects that I can, and recommend that my colleagues do the same.

(Please let me know if anything I've written here is unclear or blatantly wrong. ben at echoditto dot com.)
Good luck!

More resources: