Friday, April 04, 2008

Searching SVN trees with Ack (for Ruby on Ubuntu)

Searching through a subversion tree with grep is a pain in the ass. It wastes time searching through files I don't want to look into and its syntax is inconvenient for the task. Fortunately, there is a tool called ack which is suited specifically for matching text in VCS trees (including svn trees). Here's how to set it up on Ubuntu Gutsy (apparently you won't need to do all this in the next Ubuntu release; there's a package for Hardy).

First, you're going to need some relevant libraries. Run sudo apt-get install perl-doc to start. Next, download the File-Next archive from CPAN. Extract the archive and cd into the directory, then run perl Makefile.PL && make && make test && sudo make install to install it.

Now for ack itself. Run svn checkout http://ack.googlecode.com/svn/trunk/ ack-read-only to check out the source tree from Google Code, then cd into the directory. If you want to edit some of the types that are built into the program, here's where to do it. For example, I want the --ruby flag to search haml files as well. Running grep -R rhtml * shows four files that mention rhtml (fewer than that actually need changing, but let's go ahead and edit all of them anyway...). The ruby line in Ack.pm, for instance, now reads ruby => [qw( rb rhtml rjs rxml erb haml )]. (Types can also be set on the command line using the --type-add and --type-set flags.) Once you're finished tweaking types, use the same perl Makefile.PL && make && make test && sudo make install command to install ack.
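Since File-Next and ack are both installed with the same four-step sequence, the steps could be wrapped in a small shell function. This is only a sketch: install_perl_dist is a name I made up for illustration, and it assumes perl, make and sudo are already available.

```shell
# Hypothetical helper (not part of ack or File-Next): build, test and
# install a Perl distribution from its unpacked source directory.
install_perl_dist() {
    # Run in a subshell so the caller's working directory is left alone.
    (
        cd "$1" || exit 1
        # Same sequence as in the post; stops at the first failing step.
        perl Makefile.PL && make && make test && sudo make install
    )
}
```

You would call it once on the directory the File-Next archive unpacked into, and once on ack-read-only after any type edits.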

For extra credit, you can also add flags to a ~/.ackrc file; they are applied on every invocation of ack. Personally, I add the --ruby flag because .rb, .haml and .rhtml files seem to be all that I'm searching these days.

And voila! Now a simple ack -f gives me a listing of all the ruby files in my svn tree (and no vi swp files, svn files, log files, etc.). ack "regex" returns a grep-like search of the relevant parts of the subversion tree, with the parts of each line that matched the regex highlighted. SWEET! For more information, run ack --help for general help, or ack --help type for a listing of the file types it recognizes.

Do you know how hard this would have been with find/grep/xargs? Well, tough. grep -RE "regex" * | grep -v svn | grep -v log | ...etc... searches all files in the tree and might exclude matching lines if they contain one of the strings being filtered out by the later grep instances. That syntax can be condensed into a single alternation with grep -RE "regex" * | grep -v -E \(log\|svn\) but it still has the same problems. grep also has an --exclude flag which skips files whose names match a glob (although the manpage would seem to imply otherwise), but it doesn't have regex capabilities.

So maybe we can use find and xargs? Nope, the output from an svn tree will kill xargs, which will complain that the "argument line is too long" if too much gets sent to it: grep -E "regex" `find . -regex '.*\.\(rhtml\|rb\|haml\)' | xargs -0`

Perhaps find's -exec flag could be of use? Well, only if you want to kill your computer with the overhead of spawning a new grep process (forking, context switching, etc.) for each file matched... find . -regex "regex" -exec grep -E "regex" '{}' \; will really do your CPU in. Then there's the cumbersome grep -E "regex" $(find . -regex "regex" | grep -vE \(log\|svn\)) ... well, let's just say I'm glad I found ack at this point.
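To make the grep -v problem concrete, here is a small sketch that builds a throwaway tree and counts matches both ways. The file names and the 'def deploy' pattern are made up for the demo. The second pipeline is a find/xargs variant that does hold up, assuming a find and xargs that support -print0 and -0 (GNU findutils does): xargs -0 expects NUL-separated names, so find has to emit them with -print0.

```shell
# Throwaway tree that mimics an svn checkout (all names are hypothetical).
tree=$(mktemp -d)
mkdir -p "$tree/app/.svn" "$tree/app/views" "$tree/log"
printf '%s\n' 'def deploy'           > "$tree/app/user.rb"
printf '%s\n' 'def deploy_from_svn'  > "$tree/app/tasks.rb"
printf '%s\n' 'def deploy'           > "$tree/app/.svn/user.rb.svn-base"
printf '%s\n' '-# def deploy helper' > "$tree/app/views/index.haml"
printf '%s\n' 'called def deploy'    > "$tree/log/development.log"

# Naive approach: search everything, then filter the output lines.
# The tasks.rb match is lost because the line itself contains "svn".
naive_count=$(cd "$tree" && grep -RE 'def deploy' . \
    | grep -v svn | grep -v log | wc -l | tr -d ' ')

# Filter by file instead: prune .svn directories, keep only the wanted
# extensions, and hand NUL-separated names to xargs.
careful_count=$(cd "$tree" && find . -name .svn -prune -o -type f \
    \( -name '*.rb' -o -name '*.rhtml' -o -name '*.haml' \) -print0 \
    | xargs -0 grep -E 'def deploy' | wc -l | tr -d ' ')

echo "naive: $naive_count, find/xargs: $careful_count"
rm -rf "$tree"
```

The second pipeline finds the line the first one drops, but it is a lot to type compared with ack "def deploy".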

UPDATE:
Put --type-add=ruby=.haml in your .ackrc to add haml as a type dynamically.
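Putting the update together with the --ruby default mentioned earlier, a ~/.ackrc might look something like the fragment below. This is a sketch assuming one option per line and the ack version covered in this post; the flags are the ones already described above.

```
--type-add=ruby=.haml
--ruby
```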

1 comment:

Wolf said...

Regular expressions are really wonderful for parsing HTML or matching patterns. I use them a lot when I code. Actually, when I learn any new language, the first thing I check is whether it supports regex. I feel at ease when I find that it does.

http://icfun.blogspot.com/2008/04/ruby-regular-expression-handling.html

Here is a post about Ruby regex. I wrote it when I was first learning Ruby regex, so it should be helpful for new coders.