Sunday 10 February 2013

Moving source control from svn to git

Ok, git tricked me here by having a clone operation that isn't really a clone. In my book the result of a clone should be identical to the original. In the git world that's not quite true - the clone doesn't have branches like the original. Sure it has the branches tucked away somewhere inside the .git folder, and you can check them out from the remote and track them as a local branch if you need to, but if you list the branches available to you you'll find just master. Now, if you take it one step further and clone the first clone you made, the second clone seems to have no visibility of the branches at all...

Now, in general use, the no-branches-in-a-clone workflow makes a lot of sense - you don't want to know about someone else's branches by default, it would just get too messy. However, when you want to clone a repository to move it to a new server (or in my case to move it from svn to git), the git clone operation is a problem.

Fortunately the solution is to run a fairly simple script to check out each remote branch and track it as a local branch.

for remote in $(git branch -r | grep -v trunk); do git checkout -b "$remote" "$remote"; done




I'm working in a repository that came from svn - hence the 'trunk' check. If you are pulling from git, replace the grep -v trunk with grep -v master.
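If the clone came from an ordinary git remote rather than from svn, the one-liner above ends up creating local branches with the remote's name baked in (origin/foo rather than foo). Here's a rough sketch of a variant that avoids that. It assumes the remote is called origin, and the function names local_branch_name and track_remote_branches are made up for this example.

```shell
# Sketch only: assumes a plain git remote (default name "origin").
# Both function names are hypothetical helpers, not git commands.

# Turn a full remote ref into the local branch name we want,
# e.g. "refs/remotes/origin/feature/x" with remote "origin" -> "feature/x".
local_branch_name() {
    printf '%s\n' "${1#"refs/remotes/$2/"}"
}

# Create a local tracking branch for every branch on the remote,
# skipping the HEAD alias and any branch that already exists locally.
track_remote_branches() {
    trb_remote="${1:-origin}"
    git for-each-ref --format='%(refname)' "refs/remotes/$trb_remote" |
    while read -r trb_ref; do
        trb_name=$(local_branch_name "$trb_ref" "$trb_remote")
        if [ "$trb_name" = "HEAD" ]; then
            continue
        fi
        if git show-ref --verify --quiet "refs/heads/$trb_name"; then
            continue
        fi
        git branch --track "$trb_name" "$trb_ref"
    done
}
```

Run track_remote_branches (or track_remote_branches someremote) from inside the clone. It uses git branch rather than git checkout -b, so the working tree stays on whatever branch you already had checked out.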



Doing the actual pull from svn to git is pretty simple using the git svn command:


git svn clone "svn://server/repository"  "repository" -T trunk -b branches [-t tags]




Once you've done that, run the remote branch checkout script above to track all the remote branches. Note that it's best to access your svn repository via a network protocol (svn://, https:// or svn+ssh://) rather than as a raw svn repository on a local path - this is because git svn seems to only understand really old svn repository formats, so it will fail with repositories made by a recent version of svn.


I'm using gitolite to host my git repositories on my linux box - that seems to expect raw or 'bare' git repositories. I used the standard gitolite methods to add a new repository, then deleted the newrepo.git folder that got created and copied the .git folder into its place. Fix the ownership/rights on the new folder if necessary (chown -R git:git newrepo.git ; chmod -R 700 newrepo.git) and you're done.



Friday 8 February 2013

Rewriting History - in an SVN Repository

I've been thinking about moving one of my personal projects from SVN source control to git, mainly because of the offline commit ability in git, which has been 'coming soon' forever in SVN.

Git has a fairly nice import functionality to pull an SVN repository into git, but it's really designed around the standard SVN repository layout of /trunk, /branches and /tags. My repository doesn't look like that - for some reason (most likely it was the first time I had ever used SVN, many years ago) I created it without any of those conventional folders. Bad idea #1....

My SVN repo also has a lot of separate projects in it as folders at the top level - this is probably bad idea #2. Then, a couple of years back I needed to branch one of my projects, so I created a top level /trunk folder, moved all the folders into that, created a top level /branches and made a branch there. Bad idea #3 I'm thinking.

So, my repository is a mess and the project I'm most interested in - SharpCap - can be found in /trunk/SharpCap and /branches/SharpCap/[branch]. Yuck.

Maybe I could pull the whole lot into git and prune and chop it into shape afterwards - however my git-fu is still weak and I'm not going to try that yet. Instead I decided to try to rewrite the history of my SVN repo to make it fit the standard layout more nicely. Of course I'm doing this on a *copy* of the SVN repository, not the real thing.

Firstly, I'm working on Linux - Ubuntu 12.04 LTS to be precise. I expect all the steps below can be run on Windows, but you'll be messing about installing perl and python and goodness knows what else - easier on Linux by far.

Anyway, first the tools for the job :

For the svndumpfilter tools (svndumpfilter2 and svndumpfilter3) you'll just need to download the script files. svn-dump-reloc can be installed via cpan - install perl via apt-get if you don't have it, then run cpan, work through the initial questions and then do install SALVA/SVN-DumpReloc-0.02.tar.gz.
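Before starting, it may be worth checking that everything is actually on the PATH under the names used in the commands in this post. A small sketch, where missing_tools is a made-up helper name:

```shell
# Sketch only: missing_tools is a hypothetical helper.
# Prints each named command that cannot be found on PATH and
# returns non-zero if at least one is missing.
missing_tools() {
    mt_status=0
    for mt_tool in "$@"; do
        if ! command -v "$mt_tool" >/dev/null 2>&1; then
            printf '%s\n' "$mt_tool"
            mt_status=1
        fi
    done
    return "$mt_status"
}
```

For this job the call would be something like missing_tools svnadmin svndumpfilter2 svndumpfilter3 svn-dump-reloc, assuming you saved the downloaded scripts under those names and made them executable.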





The basic technique is to use svnadmin dump to dump the repository to file, use one of the filter tools to modify the repository, then use svnadmin load to bring the modified dump back into a new repository. So, assuming that your repository is at /home/svn/myrepo, you might do this...

svnadmin dump /home/svn/myrepo > /tmp/myrepo.dmp
cat /tmp/myrepo.dmp | svndumpfilter3 --untangle /home/svn/myrepo /path/to/keep/in/repository > /tmp/filtered.dmp
mkdir /tmp/filtered && svnadmin create /tmp/filtered
svnadmin load /tmp/filtered < /tmp/filtered.dmp

What's going on there? First we dump the original repository to a file, then we pass it through svndumpfilter3 to only keep a particular path or paths, then load back into a new repository. svndumpfilter3 needs to know about the location of the actual repository the dump file came from - in some cases it goes back to the repository to dig out extra information to help it deal with moves, copies, etc in the repository.

So, in my case the svndumpfilter command is

cat /tmp/myrepo.dmp | svndumpfilter3 --untangle /home/svn/myrepo /SharpCap /trunk/SharpCap /branches/SharpCap > /tmp/filtered.dmp

This pulls out the bits of the repository I'm interested in and throws out the rest. 

Now, it's not quite as simple as it looks to reload this dump into a repository - if you try it, you'll find it just fails. This is because we haven't included the creation of /trunk or /branches in our filter, so the first revision that tries to do something in one of those folders will fail to load because the folder is missing. You'll get an error like this :

svnadmin: File not found: transaction '307-8j', path 'branches/SharpCap'

Here's how to step around that by creating the parent folders first.

mkdir /tmp/filtered && svnadmin create /tmp/filtered
svn mkdir -m "make trunk" file:///tmp/filtered/trunk
svn mkdir -m "make branches" file:///tmp/filtered/branches
svnadmin load /tmp/filtered < /tmp/filtered.dmp


Now, while svndumpfilter3 seems to be the best choice for pruning the repository (it gets confused much less often than svndumpfilter2 or the original svndumpfilter), it doesn't have an option to drop empty revisions from the dump file. If you've pruned out a significant chunk of a repository, you'll most likely want to get rid of those, and this is where svndumpfilter2 comes in handy.

cat /tmp/filtered.dmp | svndumpfilter2  --drop-empty-revs --renumber-revs /tmp/filtered trunk branches SharpCap > /tmp/renumbered.dmp

So, what we've done there is reload the 'filtered' dump into a temporary repository - this is because we need the repository for svndumpfilter2 to work with - and then process the dump again to drop the empty revisions. By specifying 'trunk' and 'branches' and 'SharpCap' to svndumpfilter2 I have told it to include everything in the source dump (add tags too, if you use those), so I'm just using it to renumber revisions rather than filter anything here.
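To see how many empty revisions were dropped, you can count the revision records in each dump file. This is only a rough sketch: count_revisions is a made-up name, it relies on grep supporting -a (GNU and BSD grep do), and it would over-count if a versioned file happened to contain a line starting with "Revision-number: ".

```shell
# Sketch only: count_revisions is a hypothetical helper.
# Counts the "Revision-number:" headers in an svn dump file.
# -a makes grep treat the (partly binary) dump as text.
count_revisions() {
    LC_ALL=C grep -a -c '^Revision-number: ' "$1"
}
```

Comparing count_revisions /tmp/filtered.dmp with count_revisions /tmp/renumbered.dmp should show how much the history shrank.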


With me so far? Good... Now for the tricky bit - we need to re-arrange the history of the repository folder structure. Basically the mapping I want to do is as follows:

/SharpCap -> /trunk
/trunk/SharpCap -> /trunk
/branches/SharpCap/<branch> -> /branches/<branch>
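To check which paths a dump really contains, before or after relocating, you can pull the distinct path prefixes out of its Node-path headers. A sketch under the same caveats as any line-based poking at a dump file: list_path_prefixes is a made-up name, and a versioned file containing a line that starts with "Node-path: " would confuse it.

```shell
# Sketch only: list_path_prefixes is a hypothetical helper.
# $1: dump file, $2: how many leading path components to keep (default 2).
# Prints the distinct path prefixes found in the Node-path headers.
list_path_prefixes() {
    lpp_depth="${2:-2}"
    LC_ALL=C grep -a '^Node-path: ' "$1" |
        cut -d' ' -f2- |
        cut -d/ -f"1-$lpp_depth" |
        LC_ALL=C sort -u
}
```

On the dump before relocation, list_path_prefixes /tmp/renumbered.dmp 1 should show SharpCap, branches and trunk. After a successful relocation only branches and trunk should be left.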

First try is just to use svn-dump-reloc three times - i.e.

cat /tmp/renumbered.dmp | svn-dump-reloc '/trunk/SharpCap' '/trunk' | svn-dump-reloc '/SharpCap' '/trunk' | svn-dump-reloc '/branches/SharpCap' '/branches' > moved.dmp

This should move any item that was historically in /trunk/SharpCap into /trunk, and the same with anything in /SharpCap. The final bit should move anything in /branches/SharpCap into /branches. Unfortunately the dump won't load. 

The reason the dump won't load is that I have a very interesting revision in it right now. Before the relocation, that revision used to say 'copy the contents of /SharpCap to /trunk/SharpCap and then delete /SharpCap'.  After the relocation it now says 'copy the contents of /trunk to /trunk and then delete /trunk'. Ooops. In the actual dumpfile, the revision looked like this :

Revision-number: 149
Prop-content-length: 126
Content-length: 126

K 8
svn:date
V 27
2010-09-14T19:39:50.182787Z
K 7
svn:log
V 26
move main folders to trunk
K 10
svn:author
V 5
robin
PROPS-END

Node-path: trunk
Node-kind: dir
Node-action: add
Node-copyfrom-rev: 148
Node-copyfrom-path: trunk

Node-path: trunk
Node-action: delete

So, of course the next revision after that one that tries to do anything to /trunk fails with the same old error:

svnadmin: File not found: transaction '159-4f', path '/trunk'

Basically all I needed to do was to get rid of this revision from my dump file - vi was quite sufficient to do that on my (~150MB) dump. For bigger dumps you might need to use a more powerful editor or work out another way to remove it - the svnadmin dump command allows you to specify a revision range, so you could load the renumbered.dmp file into a repository and then dump it in two parts (-r 0:148 and -r 150:HEAD, that sort of thing), then cat the two together.
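If you'd rather not hunt through the dump by eye, the two manual steps (spotting the revision that copies a path onto itself, then cutting it out) can be sketched with awk. Both function names are made up for this example, and the approach is line-based, so treat it as an assumption that it is safe for your dump: file contents with NUL bytes or very odd lines may not survive every awk implementation intact.

```shell
# Sketch only: find_self_copies and drop_revision are hypothetical helpers.

# Print the revision number of every node whose copy source path is the
# same as its own path (the "copy /trunk to /trunk" case).
find_self_copies() {
    LC_ALL=C awk '
        /^Revision-number: /    { rev  = substr($0, 18) }
        /^Node-path: /          { path = substr($0, 12) }
        /^Node-copyfrom-path: / { if (substr($0, 21) == path) print rev }
    ' "$1"
}

# Write the dump in $1 to stdout, leaving out the whole record for
# revision $2 (everything from its Revision-number header up to the next).
drop_revision() {
    LC_ALL=C awk -v drop="$2" '
        /^Revision-number: / { skipping = ($0 == ("Revision-number: " drop)) }
        !skipping            { print }
    ' "$1"
}
```

In my case that would be find_self_copies moved.dmp to confirm it reports 149, then drop_revision moved.dmp 149 > fixed.dmp. If you go the svnadmin dump route with two revision ranges instead, the second dump probably wants --incremental, since without it the first revision in the range is written out as a full copy of the tree rather than just that revision's changes.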

So, finally I have an svn repository that has 'always' had the structure I want. Now to try loading it into git...