Switched to SVN

We switched from CVS to SVN at OSAF last week, and so far it has been fairly uneventful. That's not to say the conversion was issue-free (we had more than our share), but the issues have all been either foreseen or manageable.

Like all good conversions, this one started with research, a plan, testing and then finally a good dose of patience :)

Let me apologize in advance for the length of this journal entry -- unfortunately it's a lot of info to cover! At the very end is an example of the conversion script I ran.

The Research

I've known about the cvs2svn tool for a while now and I've even used it on a couple of smaller (i.e. personal) projects before, but that was a while ago so I figured I would grab the latest version and read its docs. Wow, this tool has really "grown up" -- heck, it even has its own tigris project page now.

I also googled the heck out of all the various combinations of "CVS to SVN conversion" I could think of and found, surprisingly, not that many hits. The ones that offered more than a passing mention of having performed a conversion all seemed to use either home-grown scripts (not a path I wanted to take, to be honest) or cvs2svn. I took that as a good sign for the cvs2svn tool :)

After doing a cursory reading of the cvs2svn --help page, I ran it against a backup of the CVS repository to see how painful it was going to be :) and was pleased to find out that it seemed to be complaining about things in the CVS repository that needed to be complained about - more later in the more detailed part of the journal.

After the initial testing I started on the plan, something I figured would be basic - right. How hard can a conversion be?

Ahem

Well

Let's just say that any reasonably large repository will have three basic things you need to plan for:

  1. The data itself -- cvs2svn takes care of that
  2. The people checking into the repository -- frequent emails and links to SVN clients
  3. Non-people uses of the data

To be honest, that last one caused the most issues. Even after making the plan and doing test runs I still found some external processes that were consumers of the repository and needed to be tweaked.

Testing

I've jumped to the testing part because in some ways it's the easiest :) and also I wanted to leave the planning part for last.

Testing comes down to doing the following steps over and over until you have no question marks left in your notes:

  1. Run cvs2svn on a backup copy of your CVS repository
  2. Create a new SVN repository and run svnadmin load on the generated dump file (reasons for using a dump file come later - trust me, it's easier)
  3. Check-out the same branch/tag from both CVS and SVN into fresh directories
  4. Compare all files between the two (removing .svn/ and CVS/ directories)

Once you have this basic conversion process down cold you are ready to finalize the plans for how your SVN repository will look.
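For anyone who wants a starting point, the comparison in step 4 could be sketched roughly like this. It is only a sketch: the function name compare_checkouts and the two directory arguments are made up for illustration, and it assumes a diff that supports -x (GNU and BSD diff both do).

```shell
# Compare a CVS checkout against an SVN checkout of the same branch/tag,
# ignoring the bookkeeping directories each tool leaves behind.
# compare_checkouts is a hypothetical helper name, not part of any tool.
# Usage: compare_checkouts CVS_DIR SVN_DIR
compare_checkouts() {
    cvs_dir="$1"
    svn_dir="$2"
    # -r  recurse into subdirectories
    # -q  only report which files differ, not the differences themselves
    # -x  skip anything named CVS or .svn
    diff -r -q -x CVS -x .svn "$cvs_dir" "$svn_dir"
}
```

The function returns 0 when the two trees match and non-zero otherwise, so it can gate the next step of a test run. Files containing expanded keywords such as $Id$ may legitimately show up as different, since the two systems expand them differently.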

Planning

Subversion requires a different way of thinking about the items being checked into the repository basically because of how it handles branching and tagging. We chose to go with the common Trunk/Branches/Tags layout and that meant I didn't have to come up with any elaborate importing or post-import scripts to move things around.

Because our CVS repository was rather large and included a number of sub-projects, we decided to move the sub-projects into SVN repositories of their own. This necessitated an interesting pre-conversion step. My first thought was to run cvs2svn on each root folder of the CVS repository, generate a dump file for each one, and then just svnadmin load them into the appropriate new SVN repository. Sounded good on paper :)

But it seems that svnadmin load will not let you load different dump files into an existing repository except in a specific location. It's easier to show with an example than it is to say it in words:

This is how I wanted it to look:

        trunk/
            module1/
            module2/
            module3/
        branches/
            module1/
            module2/
            module3/

But after you run your first "svnadmin load /repos/path < foo.dump" you get this:

        trunk/
            module1/
        branches/
            module1/

and then when you run "svnadmin load /repos/path < bar.dump" you get this error: "adding path : trunk ...svn: File already exists: filesystem..." -- basically telling you that you already have a root directory named trunk. My first thought was "well duh - so do it anyway!" but then I realized that most likely this was to prevent someone from loading a dump of the same structure on top of an existing repository.

After fooling around for a couple of minutes with the --parent-dir parameter I realized that the simplest way was to build a CVS tree with the directory structure we wanted and then run cvs2svn on it to create a single dump file. It turned out to be as easy as mv'ing the appropriate module directories into a new tree and cp'ing the CVSROOT directory so cvs2svn would be happy.
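The mv/cp shuffle described above might look something like the following. Treat it as a sketch only: build_cvs_tree and its arguments are hypothetical names, and because it moves directories it should only ever be pointed at a backup copy of the CVS repository.

```shell
# Assemble a new CVS tree containing just the modules destined for one
# SVN repository, plus a copy of CVSROOT so cvs2svn accepts the tree.
# build_cvs_tree is a hypothetical helper name.
# Usage: build_cvs_tree SRC_REPO NEW_REPO MODULE...
build_cvs_tree() {
    src="$1"
    dest="$2"
    shift 2
    mkdir -p "$dest" || return 1
    # CVSROOT is copied (not moved) so it can be reused for other trees.
    cp -R "$src/CVSROOT" "$dest/" || return 1
    for module in "$@"; do
        # Each module is moved out of the backup and into the new tree.
        mv "$src/$module" "$dest/" || return 1
    done
}
```

cvs2svn is then run once against the new tree, which yields a single dump file with all the chosen modules under one trunk/branches/tags layout.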

Ok, so that takes care of item 1 above -- well, ok, not quite, but enough for me to continue. Item 2 was really easy - just point people to the home page of the Subversion project -- it already has quite a few links for the many clients that are available.

That leaves item 3 ...

Let me just outline the items I finally ended up with on my todo list:

  1. pre and post commit scripts for CVS that had to be converted to SVN
  2. converting cvs keyword replacement to svn's property settings
  3. converting .cvsignore to svn:ignore properties
  4. upgrading viewcvs to handle SVN
  5. "upgrading" tinderbox clients to deal with SVN
  6. "upgrading" bonsai to know about SVN
  7. finding all the cron jobs that use CVS or other tools to do their work
  8. cleaning up file and directory permissions in the SVN repository
  9. cleaning up the left-over CVS files in the repository

Item 1 -- pre and post commit scripts

Subversion comes with excellent hook templates that make life much easier for this task. At OSAF we had one of each to deal with. The pre-commit script we run is a custom program that checks that no .py file contains tabs and that all files use Unix (LF) line endings. Once I had converted the existing program over, it was just a matter of telling svn to run it:

      #!/bin/sh
      REPOS="$1"
      TXN="$2"
      SVNLOOK=/usr/bin/svnlook
      # Reject the commit if the check fails (arguments quoted so paths with spaces survive)
      python /svn/scripts/notabs.py "$REPOS" "$TXN" "$SVNLOOK" || exit 1
      exit 0
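The notabs.py program itself isn't shown here, but the two checks it performs can be sketched in plain shell. This is not the OSAF script, just an illustration of the idea; has_tab and has_cr are made-up helper names that read file content on stdin.

```shell
# Succeeds (exit 0) if stdin contains at least one tab character.
has_tab() {
    grep -q "$(printf '\t')"
}

# Succeeds (exit 0) if stdin contains at least one carriage return,
# i.e. the content has DOS-style or old Mac-style line endings.
has_cr() {
    grep -q "$(printf '\r')"
}

# In a pre-commit hook the content of a file in the pending transaction
# could be fed in with svnlook, along these lines (an assumption about
# how one might wire it up; $path is a hypothetical variable):
#   "$SVNLOOK" cat -t "$TXN" "$REPOS" "$path" | has_tab && exit 1
```

Keeping each check as a small filter on stdin means it can be tried out on ordinary files before it goes anywhere near a live hook.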

The post-commit script we previously used fed cvstoys the information needed to generate a mailing list message about the checkin. Since the notify part was all we used of cvstoys, I was able to simplify my life by using the most excellent SVN::Notify Perl package, and I set it up to run like so:

        #!/bin/sh
        REPOS="$1"
        REV="$2"
        # post-commit is handed a revision number, so svnlook needs -r (not -t)
        AUTHOR=`/usr/bin/svnlook author -r "$REV" "$REPOS"`
        /usr/bin/svnnotify -p "$REPOS" -r "$REV" -l /usr/bin/svnlook \
            -P "[commits] ($AUTHOR)" -d -H HTML::ColorDiff \
            -t "commits@osafoundation.org"

The only custom part of the above is that at OSAF we were used to having the committer's nickname appear on the subject line, so I grabbed that information from a call to "svnlook author" and injected it into the -P argument.

Item 2 -- cvs keyword replacements. This turned out to be a non-issue -- the latest version of cvs2svn takes care of the more common ones (Id, Author, Revision) already! But just in case someone reading this still needs to know how to do it, this is what I had planned on needing:

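      # The $ is escaped so egrep treats it as a literal, not an end-of-line anchor.
      # Note: each propset overwrites svn:keywords, so a file matching several
      # patterns ends up with only the last value set.
      egrep -rl '\$Id: '       * | grep -v /.svn/ | xargs svn propset svn:keywords Id
      egrep -rl '\$Author: '   * | grep -v /.svn/ | xargs svn propset svn:keywords Author
      egrep -rl '\$Revision: ' * | grep -v /.svn/ | xargs svn propset svn:keywords Revision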

Item 3 -- converting .cvsignore to svn:ignore properties. Another non-issue as the latest cvs2svn performs this task for you, but unlike Item 2 I hadn't worked out exactly what I was going to do so I don't have a sample to show.
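For readers stuck on an older cvs2svn, one way the .cvsignore conversion could be approached in a working copy is sketched below. It is an untested-in-production sketch, not something that was run at OSAF: convert_cvsignore is a made-up name, and the SVN variable exists only so the command can be swapped out for a dry run.

```shell
# Command used to talk to Subversion; override with SVN=echo for a dry run.
SVN=${SVN:-svn}

# For every .cvsignore in a working copy, set the svn:ignore property on
# the containing directory from that file's contents.
# convert_cvsignore is a hypothetical helper name.
# Usage: convert_cvsignore [WORKING_COPY]
convert_cvsignore() {
    root="${1:-.}"
    # Skip the .svn administrative directories entirely.
    find "$root" -name .svn -prune -o -type f -name .cvsignore -print |
    while IFS= read -r f; do
        dir=$(dirname "$f")
        # -F reads the property value from the named file.
        "$SVN" propset svn:ignore -F "$f" "$dir"
    done
}
```

One caveat: .cvsignore allows several whitespace-separated patterns on one line, while svn:ignore expects one pattern per line, so files written in the former style would need splitting first.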

Item 4 -- upgrading viewcvs to handle svn. Again a non-issue - the latest viewcvs handles both svn and cvs. Another reason to not be the one with the arrows in your back :)

Item 5 -- "upgrading" tinderbox clients to deal with SVN. This one required mostly brute force: walking thru the code and replacing CVS with SVN, as the SVN folks have made most of the common CVS commands the same in SVN. The biggest issue I had to deal with was that all of the tinderbox client code used the -D option for a checkout to ensure that the checkout received only the items as of a given moment. Remember that because CVS commits are not atomic, a CVS checkout could hand you a file that was part of a whole slew of changes the server hadn't finished receiving yet. Now that SVN commits are atomic transactions, that issue is gone.
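If several checkouts need to agree on exactly the same state, a revision number can play the role the -D timestamp used to. The following is a sketch of that idea, not the tinderbox code: parse_revision and pinned_checkout are hypothetical names, and it assumes "svn info" prints an English "Revision: N" line.

```shell
# Command used to talk to Subversion; can be overridden for testing.
SVN=${SVN:-svn}

# Print the revision number found in `svn info` output read from stdin.
parse_revision() {
    sed -n 's/^Revision: //p'
}

# Look up the current revision of URL once, then check out exactly that
# revision, so later checkouts can be given the same number.
# Usage: pinned_checkout URL DEST
pinned_checkout() {
    url="$1"
    dest="$2"
    rev=$("$SVN" info "$url" | parse_revision)
    # Give up if no revision number could be determined.
    [ -n "$rev" ] || return 1
    "$SVN" checkout -q -r "$rev" "$url" "$dest"
}
```

Because every commit is atomic, checking out revision N always yields complete changesets, which is what the -D trick was approximating under CVS.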

Item 6 -- "upgrading" Bonsai to know about SVN. This is currently a work in progress -- I'll post another journal entry when I've finished this task.

Item 7 -- finding all the cron jobs that use CVS or other tools to do their work. This is the part that has been the most fun. I managed to get most of them on the day of the switch, but I'm sure there are some that only run weekly or the like that I haven't found yet.
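A crude but handy way to hunt for stragglers is to grep the cron directories for anything mentioning CVS. The sketch below assumes a grep with -r support; find_cvs_jobs is a made-up name, and the cron locations differ from system to system.

```shell
# List every file under the given paths that mentions "cvs" in any case.
# This also catches names like viewcvs or /usr/local/cvsrep.
# find_cvs_jobs is a hypothetical helper name.
# Usage: find_cvs_jobs PATH...
find_cvs_jobs() {
    # -r recurse, -l print matching file names only, -i ignore case
    grep -rli 'cvs' "$@" 2>/dev/null
}

# Typical invocation (paths are assumptions; adjust for your system):
#   find_cvs_jobs /etc/crontab /etc/cron.d /etc/cron.daily /var/spool/cron
```

It will not find jobs that call a wrapper script living somewhere else, so any script paths named in the matching crontab entries are worth feeding back through the same search.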

Item 8 -- cleaning up file and directory permissions in the svn repository. Since we run our svnserve process not as root but as a restricted user, many times the permissions of the repository have to be cleaned up immediately after the import:

        chown -R apache:svn /svn/repository

        chmod -R g+w /svn/repository

Item 9 -- cleaning up the left-over CVS files in the repository. After the conversion has "settled", take the time to run thru and svn delete all of the .cvsignore files and any others that may be left-over.

Sample Conversion Script

        echo Backing up CVS repository
        tar czf /svn/work/cvsrep_backup.tgz /usr/local/cvsrep
        cd /svn/work
        tar xzf cvsrep_backup.tgz

        echo dumping
        cd /svn/work/cvs2svn-1.2.1
        ./cvs2svn --dump-only --use-cvs --dumpfile /svn/work/chandler.dump \
            /svn/work/usr/local/cvsrep

        echo creating new repositories
        svnadmin create --fs-type=fsfs /svn/chandler

        echo updating owner and privs
        chown -R apache:svn /svn/chandler
        chmod -R g+w /svn/chandler/db

        echo loading
        svnadmin load /svn/chandler < /svn/work/chandler.dump
