Tapestry Training -- From The Source

Let me help you get your team up to speed in Tapestry ... fast. Visit howardlewisship.com for details on training, mentoring and support!

Saturday, September 15, 2007

Mac OS X bundles vs. Subversion

If you work on Mac OS X, you may have noticed the cool way Macs deal with complex documents, things like Keynote presentations, or even applications themselves. They're stored as directories (Mac OS X calls these bundles), and the Finder hides this, making them look and act like individual files. This works nicely: often the contents of a bundle are simple text and XML files ... whereas the equivalent under Windows is either a very proprietary (and potentially fragile) binary format, or multiple files and folders that YOU have to treat as a unit.

Alas, this all breaks down when using Subversion. You can't just check MyPresentation.key into SVN ... Subversion will create those pesky .svn directories inside the bundle, and they will be destroyed every time you save your presentation.

My solution is to convert each bundle into an archive and check in the archive instead; the bundle folders themselves are marked as svn:ignore. I guess this reveals that I mostly use SVN as a safe, structured backup.
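Marking the bundles as ignored only has to be done once per directory. As a sketch (not part of prepsvn itself), the svn:ignore property holds one file-name glob per line, so patterns for the default extensions can be built and applied with `svn propset`; `ignore_patterns` and `mark_bundles_ignored` are hypothetical helper names.

```ruby
# Hypothetical helper: build an svn:ignore value that hides the bundle folders
# but still lets the .tar.bz2 archives show up for commit.
def ignore_patterns(extensions)
  # svn:ignore holds one file-name glob pattern per line.
  extensions.map { |ext| "*.#{ext}" }.join("\n")
end

# Hypothetical helper: apply the patterns to one working-copy directory.
# svn:ignore only affects that directory's immediate children, and propset
# replaces whatever svn:ignore value the directory already has.
def mark_bundles_ignored(dir, extensions)
  system("svn", "propset", "svn:ignore", ignore_patterns(extensions), dir)
end
```

For example, `mark_bundles_ignored(".", %w{pages key oo3 graffle})`. Since `*.key` matches `talk.key` but not `talk.key.tar.bz2`, the archive stays visible to Subversion.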

In any case, manually creating those archives can be a pain. So ... out comes my solution to many problems: Ruby.

The goal here is to find bundles that need to be archived, to do it efficiently (only updating an archive when necessary), and to do it recursively, seeking out bundles in sub-directories.


# Used to prepare a directory for commit to Subversion. This is necessary for certain file types on Mac OS X because what appear to be files in the Finder
# are actually directories (Mac uses the term "bundle" for this concept). It is useless to put the .svn folder inside such a directory, because it will
# tend to be deleted whenever the "file" is saved.
# Instead, we want to compress the directory to a single archive file; the bundle will be marked as svn:ignore.
# We use tar with Bzip2 compression, which is resource intensive to create, but
# compresses much better than GZip or PKZip.
# The trick is that we only want to create the archive when necessary: when the
# archive does not exist, or when any file in the bundle is newer than the archive.

require 'optparse'

# Set via command line options

$extensions = %w{pages key oo3 graffle}
$recursive = true
$dry_run = false

# Queue of folders to search (for bundles)

$queue = []

def matching_extension(name)
  dotx = name.rindex('.')
  return false unless dotx
  ext = name[dotx + 1 .. -1]
  return $extensions.include?(ext)
end

# Iterate over the directory, identifying bundles that may need to be compressed and (if recursive) subdirectories
# to search.
# dirpath: string path for a directory
def search_directory(dirpath)
  Dir.foreach(dirpath) do |name|
    # Skip hidden files and directories
    next if name.start_with?(".")
    path = File.join(dirpath, name)
    next unless File.directory?(path)
    if matching_extension(name)
      update_archive path
      # Don't search inside the bundle itself
      next
    end
    $queue << path if $recursive
  end
end

def needs_update(bundle_path, archive_path)
  return true unless File.exist?(archive_path)
  archive_mtime = File.mtime(archive_path)
  # The archive exists ... can we find a file inside the bundle that's newer?
  # This won't catch deletions, but that's ok.  Bundles tend to get completely
  # overwritten when any tiny thing changes.
  dirqueue = [bundle_path]

  until dirqueue.empty?
    dirpath = dirqueue.pop
    Dir.foreach(dirpath) do |name|
      # Skip "." and ".."; ".." is the parent folder, whose mtime changes whenever
      # an entry (such as another archive) is added to or removed from it.
      next if name == "." || name == ".."
      path = File.join(dirpath, name)
      dirqueue << path if File.directory?(path)
      # Is this file newer?
      return true if File.mtime(path) > archive_mtime
    end
  end
  return false
end

def update_archive(path)
  archive = path + ".tar.bz2"
  return unless needs_update(path, archive)

  if $dry_run
    # Report only; don't create anything
    puts "Would create #{archive}"
    return
  end

  puts "Creating #{archive}"
  dir = File.dirname(path)
  bundle = File.basename(path)
  # Could probably fork and do it in a subshell
  system "tar --create --file=#{archive} --bzip2 --directory=#{dir} #{bundle}"
end


$opts = OptionParser.new do |opts|
  opts.banner = "Usage: prepsvn [options]"

  opts.on("-d", "--dir DIR", "Add directory to search (if no directory specified, current directory is searched)") do |value|
    $queue << value
  end

  opts.on("-e", "--extension EXTENSION", "Add another extension to match when searching for bundles to archive") do |value|
    $extensions << value
  end

  opts.on("-N", "--non-recursive", "Do not search non-bundle sub directories for files to archive") do
    $recursive = false
  end

  opts.on("-D", "--dry-run", "Identify what archives would be created") do
    $dry_run = true
  end

  opts.on("-h", "--help", "Help for this command") do
    puts opts
    exit
  end
end

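# Named usage_error rather than fail, since a top-level fail would override Kernel#fail.
def usage_error(message)
  puts "Error: #{message}"
  puts $opts
  exit 1
end

begin
  $opts.parse!
rescue OptionParser::ParseError
  # Covers unknown options as well as missing arguments (e.g. "-d" with no DIR)
  usage_error $!
end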

# If no --dir specified, use the current directory.

if $queue.empty?
  $queue << Dir.getwd
end

until $queue.empty?
  search_directory $queue.pop
end

I do love Ruby syntax: it is so minimal, and it lets me follow my personal mantra, "less is more".
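In that same spirit, Ruby's standard Find library could replace the hand-rolled directory queue in needs_update, since it already walks a directory tree recursively. A minimal sketch of the same staleness check, assuming the same `.tar.bz2` naming; `bundle_stale?` is a hypothetical name:

```ruby
require 'find'

# True when the archive is missing, or when the bundle or anything inside it
# has a modification time newer than the archive. Like needs_update, this
# does not notice deleted files.
def bundle_stale?(bundle_path, archive_path)
  return true unless File.exist?(archive_path)
  archive_mtime = File.mtime(archive_path)
  # Find.find yields bundle_path itself, then every file and folder below it
  # (hidden ones included), but never "." or "..".
  Find.find(bundle_path) do |path|
    return true if File.mtime(path) > archive_mtime
  end
  false
end
```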

I'm sure there are some edge cases that aren't handled well, such as spaces in path names, and maybe issues related to permissions. But it works for me.

You do need to have tar installed in order to build the archives. I can't remember if that's built into Mac OS X (probably) or whether I obtained it using Fink.

In any case, you need to remember to execute prepsvn in your workspace before you check in, so that it can spot file bundles that need archiving. It would be awesome if Subversion supported client-side check-in hooks to do this automatically.
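Lacking such hooks, one workaround is a tiny wrapper that runs prepsvn and only then hands over to `svn commit`. A sketch, assuming prepsvn is on your PATH; the wrapper and its `commit_with_prep` method are hypothetical, and the command runner is a parameter so the sequencing can be checked without Subversion:

```ruby
# Hypothetical wrapper (say, a script named svnci): refresh the bundle
# archives, then commit, passing any extra arguments through to svn commit.
def commit_with_prep(args, runner = method(:system))
  # Stop if prepsvn fails (or isn't found), so stale archives are never committed.
  return false unless runner.call("prepsvn")
  runner.call("svn", "commit", *args)
end

# As a standalone script, finish with:
#   exit(commit_with_prep(ARGV) ? 0 : 1)
```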


Tobias Roeser said...

Wouldn't it be a better solution to copy the (hidden) .svn directory aside while you work with the file-like directory, and copy it back before committing or updating?

Doing it this way would have the following advantages:

- copying aside is much easier than packing and unpacking

- you will place each part of your virtual Mac OS X file under version control separately, which lets you see the differences better. Most of the time only the XML file changes, not the other content (e.g. in OmniGraffle the XML changes more often than any included TIFFs)

- it is more natural for Subversion to store (many) small text files than one huge compressed, and therefore binary, file, especially as directories are versioned too.
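Tobias's suggestion could be scripted too. A minimal sketch, not taken from his comment, that moves every .svn folder out of a bundle into a side directory before editing and moves them back before committing or updating; `stash_svn`, `restore_svn` and their helpers are hypothetical names, and the side directory is assumed to live outside the bundle and to start out empty.

```ruby
require 'find'
require 'fileutils'
require 'pathname'

# Every .svn directory under +root+ (without descending into the .svn folders).
def svn_dirs_under(root)
  found = []
  Find.find(root) do |path|
    if File.basename(path) == ".svn" && File.directory?(path)
      found << path
      Find.prune
    end
  end
  found
end

# Move each .svn directory under +from_root+ to the same relative path under +to_root+.
def move_svn_dirs(from_root, to_root, create_parents:)
  svn_dirs_under(from_root).each do |path|
    rel = Pathname.new(path).relative_path_from(Pathname.new(from_root)).to_s
    target = File.join(to_root, rel)
    raise "#{target} already exists" if File.exist?(target)
    parent = File.dirname(target)
    if create_parents
      FileUtils.mkdir_p(parent)
    elsif !File.directory?(parent)
      warn "Skipping #{rel}: its folder no longer exists"
      next
    end
    FileUtils.mv(path, target)
  end
end

# Before editing: move the .svn folders out of the bundle into +stash+.
def stash_svn(bundle, stash)
  move_svn_dirs(bundle, stash, create_parents: true)
end

# Before committing or updating: put them back. Folders the application
# removed while saving are skipped with a warning.
def restore_svn(stash, bundle)
  move_svn_dirs(stash, bundle, create_parents: false)
end
```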

When dealing with OpenDocument files in Subversion, I sometimes try the opposite of your suggestion: I unpack the zip (odt) file and commit the unpacked directory to Subversion. After checking out, and before editing the file, I have to compress it again, of course. Your script could come in handy for this case. :)
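One catch when re-zipping an OpenDocument file: the OpenDocument package format asks for a file named mimetype to be the first entry in the zip, stored uncompressed and without extra header fields. A sketch that drives the Info-ZIP `zip` command in that order; `odf_zip_commands` and `repack_odf` are hypothetical helpers, and `zip` is assumed to be installed.

```ruby
# Hypothetical helper: the two Info-ZIP command lines for rebuilding an
# OpenDocument file, run from inside the unpacked directory.
def odf_zip_commands(out_file)
  out = File.expand_path(out_file)
  [
    # 1. mimetype first, stored without compression (-0) and without
    #    extra file attributes (-X).
    ["zip", "-X", "-0", out, "mimetype"],
    # 2. Everything else, compressed and recursive, skipping mimetype and any
    #    .svn folders (assumes Info-ZIP wildcards, where * also matches "/").
    ["zip", "-X", "-r", out, ".", "-x", "mimetype", "*.svn/*"]
  ]
end

# Rebuild +out_file+ from +unpacked_dir+; true if both zip runs succeed.
def repack_odf(unpacked_dir, out_file)
  out = File.expand_path(out_file)
  # Start fresh: zip would otherwise update an existing archive in place.
  File.delete(out) if File.exist?(out)
  odf_zip_commands(out).all? { |cmd| system(*cmd, chdir: unpacked_dir) }
end
```

Because the arguments are passed as a list, no shell is involved, so the `*.svn/*` pattern reaches zip unexpanded.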

Jim said...

Hi Howard!

I was contemplating doing something similar with some presentations I'm putting together. I would probably base it on Rake.

Here's a simplified version of what you have:

PRESENTATION = "rake.key"
ARCHIVE = "#{PRESENTATION}.tar.bz2"

task :pre_commit => ARCHIVE

file ARCHIVE => FileList["#{PRESENTATION}/**/*"] do
  sh "tar --create --file=#{ARCHIVE} --bzip2 #{PRESENTATION}"
end

It only does a single (hardcoded) presentation, but by basing it on Rake, I get the "update the archive only if a file has changed" behaviour automatically. Turning it into a full "find all docs and presentations" implementation like yours would still be about ten lines of code altogether.
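Generalizing Jim's approach to every bundle below the current directory might look like this sketch. It assumes prepsvn's default extensions and `.tar.bz2` naming; `define_archive_tasks` and `BUNDLE_EXTENSIONS` are hypothetical names.

```ruby
# Implicit in a Rakefile; here so the file also loads as a plain script.
require 'rake'

BUNDLE_EXTENSIONS = %w{pages key oo3 graffle}

# Define one file task per bundle found below the current directory and
# return the archive names.
def define_archive_tasks(extensions)
  bundles = Dir.glob("**/*.{#{extensions.join(',')}}").select { |path| File.directory?(path) }
  bundles.map do |bundle|
    archive = "#{bundle}.tar.bz2"
    # The archive depends on the bundle and everything in it, so Rake
    # rebuilds it only when one of those is newer than the archive.
    file archive => [bundle, *Dir.glob("#{bundle}/**/*")] do
      sh "tar", "--create", "--file=#{archive}", "--bzip2",
         "--directory=#{File.dirname(bundle)}", File.basename(bundle)
    end
    archive
  end
end

task :pre_commit => define_archive_tasks(BUNDLE_EXTENSIONS)
```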

Steven Grimm said...

I prefer to use a version control system such as git or Mercurial that doesn't pollute my subdirectories with a bunch of extra files; no special hacks required. (In the case of git, you can even use it to communicate with a central Subversion repository if you really need to have Subversion in the picture for whatever reason.)

Merlyn Albery-Speyer said...

My reaction was the same as Tobias's. Of course, I'd probably do something silly like write it in Erlang.

mielvanacker said...

I also agree with Tobias. I keep Keynote bundles in subversion as directories.

To overcome the overwrite problem, I keep two copies. One copy is a subversion working copy, the other copy is the one I open with Keynote.

I then regularly import changes from the Keynote copy into the svn working copy with a custom script, "wcimport", which you can find here: http://ssel.vub.ac.be/svn-gen/bdefrain/svnscripts/

The script basically makes the working copy identical to some other copy, but it does not touch the ".svn" files in the working copy. I find it very useful in a general context, for example when following some upstream source.
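The idea can be sketched in Ruby. This is not wcimport itself (its source is at the URL above) but a hypothetical `mirror_into_wc` that makes a working copy match a source tree while skipping every .svn folder. It only changes files on disk; new or removed files still need `svn add` or `svn delete`.

```ruby
require 'fileutils'

# Make +wc+ mirror +src+, skipping .svn directories on both sides.
def mirror_into_wc(src, wc)
  # Copy or overwrite everything from the source.
  Dir.children(src).each do |name|
    next if name == ".svn"
    from = File.join(src, name)
    to = File.join(wc, name)
    if File.directory?(from)
      FileUtils.rm_f(to) unless File.directory?(to) # a file became a folder
      FileUtils.mkdir_p(to)
      mirror_into_wc(from, to)
    else
      FileUtils.rm_rf(to) if File.directory?(to)    # a folder became a file
      FileUtils.cp(from, to, preserve: true)
    end
  end
  # Remove entries that no longer exist in the source; svn will report
  # them as missing until they are deleted with svn delete.
  Dir.children(wc).each do |name|
    next if name == ".svn"
    FileUtils.rm_rf(File.join(wc, name)) unless File.exist?(File.join(src, name))
  end
end
```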

neffs said...

You should quote the variables in the tar command:

system "tar --create --file=\'#{archive}\' --bzip2 --directory=\'#{dir}\' \'#{bundle}\'"

Now it works with filenames containing spaces.
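Single quotes will still trip over a name that itself contains an apostrophe. Another option is to pass the command to Kernel#system as separate arguments: Ruby then runs tar directly, without a shell, so no quoting is needed at all. A sketch of update_archive's last step written that way; `tar_command` and `create_archive` are hypothetical helpers, split out so the arguments can be checked.

```ruby
# Build tar's argument list; each element reaches tar unchanged,
# spaces and quotes included, because no shell is involved.
def tar_command(path)
  archive = path + ".tar.bz2"
  ["tar", "--create", "--file=#{archive}", "--bzip2",
   "--directory=#{File.dirname(path)}", File.basename(path)]
end

def create_archive(path)
  system(*tar_command(path))
end
```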