Tech Blog :: SVN Checksum Mismatches


Dec 1 '09 12:18am

SVN Checksum Mismatches

I've been working on some very messy web apps lately: codebases that weren't version controlled, multiple versions of the same site with no process for syncing (except tarballs and FTP), PHP mixed in with SSI, and so on. A huge cross-site project included a number of these sites, so I decided to make some basic tuneup/cleanup a prerequisite. The code's all in SVN now (Git's on the medium-term agenda), which has its benefits of course, but also introduces a whole new slew of problems.

The latest is hundreds of "checksum mismatch" errors on the server when I try to update (or sometimes commit) a versioned file. My understanding is that SVN uses checksums (hashes of contents and metadata) to quickly identify local modifications. But the production team is FTPing files directly over the versioned copies. My hypothesis is that somewhere in that process the metadata gets broken, SVN can't recognize the file anymore, and the working copy becomes effectively corrupted.

I'm sort of stuck on this one now, but I need to resolve it fast. Does anyone know any fixes for this problem?

SVN keeps a pristine copy of each file as it was checked out, along with its checksum, in the hidden .svn folders. It compares that pristine copy to your working file to see what's changed. It also checks the pristine copy against the stored checksum, to make sure the pristine copy itself has not changed.

If anything modifies the files in the .svn folders, you'll get trouble. If you transfer files via FTP (or an SSH/SFTP client that offers a text mode) and the ASCII/image transfer mode gets messed up, the files in the .svn folders may be transferred in ASCII mode instead of image (a.k.a. binary) mode, and the end-of-line (EOL) characters get translated. A file whose EOLs were translated no longer matches its stored checksum. This is especially likely if you're working cross-platform.
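To see why an ASCII-mode transfer would be enough to break a checksum, here is a minimal Python sketch. The function `simulate_ascii_mode_upload` is a hypothetical stand-in for what an FTP client might do when sending from a Unix machine to a Windows one, and the sample file contents are made up. SHA-1 is used only for illustration; the point holds for any hash.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 hex digest of a byte string."""
    return hashlib.sha1(data).hexdigest()


def simulate_ascii_mode_upload(data: bytes) -> bytes:
    """Hypothetical stand-in for an ASCII-mode transfer: LF becomes CRLF.

    Existing CRLF pairs are normalized to LF first so they are not doubled.
    """
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


# Made-up file contents with Unix line endings.
original = b"<?php\necho 'hello';\n"
transferred = simulate_ascii_mode_upload(original)

# The text looks identical in an editor, but the bytes (and so the hash) differ.
print("original:   ", sha1_hex(original))
print("transferred:", sha1_hex(transferred))
print("hashes match:", sha1_hex(original) == sha1_hex(transferred))
```

The two files would look the same in most editors, which is why this kind of corruption is easy to miss.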

SVN may have some auto EOL correction that I don't know about, in which case this whole post is moot. But at least it's something to check for.

You can investigate this yourself. Compute SHA-1 hashes of all the files in one of the (smaller) .svn folders on your local hard drive, then do the same on the server that has the "checksum" issues, and a third time on an unmodified working copy that you check out on the server itself. My guess is that some of the hashes will differ, which will show you which files have changed. Good luck; we're trying to figure this out on our end as well.
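One way to do that comparison is sketched below in Python. The helper names (`hash_tree`, `diff_manifests`) are my own, not anything SVN provides, and the script assumes you can run Python on both machines and copy the resulting manifest between them (for example as JSON).

```python
import hashlib
import os


def sha1_of_file(path, chunk_size=65536):
    """Hash a file in binary mode, reading in chunks so large files are fine."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_tree(root):
    """Map each file's path (relative to root) to its SHA-1 hex digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            manifest[os.path.relpath(full_path, root)] = sha1_of_file(full_path)
    return manifest


def diff_manifests(left, right):
    """Return sorted paths whose hashes differ or that exist on one side only."""
    all_paths = set(left) | set(right)
    return sorted(p for p in all_paths if left.get(p) != right.get(p))
```

Run `hash_tree` on the same .svn folder in each of the three copies, then pass any two manifests to `diff_manifests`. If the paths it reports are text files whose sizes differ by exactly one byte per line, EOL translation is a likely culprit.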
