Thursday, February 21, 2008

Ruby (vs Python): a preliminary assessment

I've been using Ruby for a little over a month now. While this hasn't been enough time to explore the depths of the language, it has given me ample time to evaluate how I feel about some of the more superficial aspects of Ruby. There are some things I like about it, a few things I don't care about, and a lot of things I don't like. In some aspects it is quite similar to Python, my favorite programming language. In the following sections I lay out what I do, kinda, and don't like about Ruby, sometimes in comparison to or in contrast with Python to put things in context.

Things I like about Ruby


Naming conventions are enforced The fewer surprises there are in code, the easier it is to read. I would be very surprised, for example, if a C VARIABLE_IN_ALL_CAPS were not a constant. It only takes one joker working on code to completely screw with convention and degrade readability of code for everyone else working on it. Ruby heads off this problem at the pass. Instance variables look like @this, class variables look like @@this, constants look like THIS, global variables look like $this, etc. This is a Good Idea. I can easily live without the flexibility to name my class h0We\/ER_I_want. This also makes the code easier to parse and syntax highlight.

Block Comments I'm not sure how Python missed the boat on this one. The "large chunks of code shouldn't be commented anyway, so take them out" and "get a decent text editor that can :50,70s/^/#/" and "that's what doc comments are for" school of thought just doesn't resonate with me. Too often there is some large chunk of code lying around that just isn't applicable anymore and quickly commenting it out is a temporary solution.

Constants There is just something about the assurance that certain variables are read-only that makes me sleep well at night. Otherwise, I need to worry about things getting overwritten, and that's no fun. I guess there's always CONVENTION to prevent this in Python, but that doesn't stop someone from breaking CONVENTION += 1

Examples are the norm in documentation Examples are the quickest way to learn about new code. However, the docs could benefit from some non-example elaboration most of the time. How is one supposed to figure out corner cases? I guess "examples first" goes along with the whole Ruby-has-no-formal-spec thing.

'case' syntax The syntax of Ruby's case statement (analogous to switch in other languages) is pretty rad. It's a great feature for when multiple properties of one variable need to be tested. The case statement accepts comparisons, ranges, regular expressions, etc. Very cool.

Things I could go either way on

Regular expressions built into the language Regular expressions are cool, and I use them a lot, but I just don't know if they have a place amongst the pantheon of hashes, lists and the like. They are convenient, but overuse might turn your code into *gasp* Perl. No one likes unreadable ASCII-splatter, although it's tough for Ruby to get that bad. Plus, half the time I am using regexes some other function might do just as well, like rindex. Maybe that's just my laziness that Ruby caters to.

Interned strings It's like a pointer, I get it. It saves memory over using a string object, cool. Was it really worth including in the language? Not sure. It seems like their most common uses, such as keys in hashes, could be implemented under the hood anyway (well, at least in a real language like Python where hash keys are immutable).

! and ? as legal characters in method names I can understand that does_this_work? is a natural way to read that a function returns a boolean. I get that destroy! suggests an internal change of state of the object. But then there are the problems. First, not every method that returns a boolean or changes an object's state is forced to use these constructs. If they were enforced like all-caps constants and @ before instance members, maybe I'd be more bullish on them. Second, what qualifies as a change of state can get kind of messy: maybe a method only conditionally changes its object's state; is an exclamation mark appropriate then? The exclamation mark can also be misread, especially if parentheses are omitted (looks an awful lot like a logical not operator, doesn't it?). And now for my nitpicky objections: ? and ! are harder to type than letters and vim doesn't auto-complete the full method name by default (it completes everything but the last character) if one of those characters is included.

Marriage to Rails On the one hand, it's nice to have a large support community for tools that one is likely to use. On the other hand, it's annoying to wade through a bunch of comments by people on forums and mailing lists that are completely irrelevant to what you're trying to do with the language. These people are normally exclusively concerned with getting their website up and running, aren't terribly gifted programmers and don't really care about the language per se. Some bloggers assume that any talk about Ruby is in the context of Rails because... well, isn't that what you're using it for? What else could you possibly do with Ruby?

Indentation Python has a unique indentation system. In most cases (Makefiles) whitespace indentation is a horrible idea, but Python manages to make it work. It reduces noise on the screen and is pretty intuitive. I'm not convinced that Python wouldn't be more effective, however, with a traditional bracket-defined indentation scheme which Ruby is close to. It's a toss-up for me.

Things I don't like about Ruby


TMTOWTDI There's more than one way to do it in Ruby, alright -- you're smothered with options. This is the dark path that leads to Perl. Take looping as an example. In Python, you have for and while. That's it. In Ruby, you have for, while, times, upto, downto, begin/end and each (there are probably more that I missed). What's wrong with for? It's really difficult to be intimately familiar with this many options. And this symptom doesn't just plague looping, it also plagues the standard library. Do we really need every one of <<, push and + to append to the end of an array? Why have both brackets and "end" to enclose a block? Has anyone ever used all three versions of defining a class method in any mood but malice and spite? I guess we can all take solace in the many ways in Ruby via active_support to compute the number of bytes in a gigabyte: 1.gigabyte, 1024.megabytes, 1048576.kilobytes, etc.

Go with the Flow There is a prevailing attitude with Ruby users, especially in Rails, to favor convention over configuration. This means that tools that espouse this philosophy reward going with the flow and punish breaking out of the mold and customizing your applications. My attitude towards this philosophy is this: unless you are creating the most pedestrian and generic of applications, you are going to need to customize the application at some point. When that happens, it is much better to have a tool that allows flexibility than one that disdains it. Otherwise you are going to need to go down to the core of the tool to make minor changes to your application, and that will cost you in time and sanity.

Hashes are broken In Python, it's impossible to change the keys in a dictionary because those keys must be immutable when they go in. Python also nicely throws a KeyError when something you're looking for in a dictionary is not there. Ruby does neither of these things; the programmer is expected to maintain the immutability of hash keys. High-level languages are supposed to take care of important details like these for us humans who are busy thinking about other things. In fact, the Python FAQ lists this as an "unacceptable solution" for an alternative implementation of hashes: "Allow lists as keys but tell the user not to modify them. This would allow a class of hard-to-track bugs in programs when you forgot or modified a list by accident." Default return values aren't that much of a design travesty, but I still prefer the error. Besides, if you really want a default return value in Python, you can just use setdefault when accessing the element that may or may not be there instead of setting a default value for the whole hash at creation.
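
To make the Python side of this concrete, here is a minimal sketch, assuming Python 3 (the snippets elsewhere in this post are Python 2 era). The dictionary and its keys are made up purely for illustration.

```python
# Hypothetical data, for illustration only.
inventory = {"apples": 3}

# A mutable list is rejected as a key at insertion time.
try:
    inventory[["pears", "plums"]] = 5
    list_key_error = None
except TypeError as exc:
    list_key_error = exc

# A missing key raises KeyError instead of quietly handing back a default.
try:
    inventory["kiwis"]
    missing_key_error = None
except KeyError as exc:
    missing_key_error = exc

# Per-lookup defaults: get() leaves the dict untouched,
# while setdefault() also stores the default under the key.
kiwis = inventory.get("kiwis", 0)
limes = inventory.setdefault("limes", 0)
```

The default is chosen at each lookup rather than once for the whole dictionary, so forgetting to ask for one still fails loudly.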

Ranges are broken This absolutely baffles me. Why did Ruby not choose to incorporate Python's elegant slicing syntax and instead chose a less powerful and more confusing one? Why did Ruby choose to create an unnecessary class called Range when they just could have had a function called, oh I don't know, range that is much more powerful, simple and general and that returns lists/arrays? Why do you need to call range.to_a to convert your useless Range object to an array in Ruby when they could have just used a function like range and gotten to the point? Furthermore, why are there two ways to create ranges, .. (two periods) and ... (three periods) that look EXACTLY THE SAME! Argh.
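
For reference, this is roughly the Python slicing and range behavior I have in mind, sketched under the assumption of Python 3; the variable names are only illustrative. One caveat: in Python 3, range returns a lazy object and needs list() to become a list, whereas in the Python 2 I was writing against it returned a list directly.

```python
letters = list("abcdefgh")

first_three = letters[:3]     # start defaults to the beginning
last_two = letters[-2:]       # negative indices count from the end
every_other = letters[::2]    # third slot is the step
backwards = letters[::-1]     # negative step walks in reverse

# range(start, stop, step): stop is always excluded.
evens = list(range(0, 10, 2))
countdown = list(range(3, 0, -1))
```

There is a single convention to remember here: the end of a slice or range is always exclusive.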

Optional omission of syntax elements Sure, it's less typing in Ruby, but remember what I said earlier about fewer surprises equaling more readable code? It holds here as well. Is this_thing a variable or a function call? This also rules out the convenient introspection syntax that Python affords, where naming a function without parentheses yields the function object itself instead of calling it. And the fact that sometimes you need the "do" keyword and sometimes you don't is also a design decision that just begs for unnecessary tripping over the syntax.
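
A small sketch of the Python introspection point, assuming Python 3; greet and handler are hypothetical names invented for the example.

```python
def greet(name):
    """Return a greeting for name."""
    return "Hello, " + name

# No parentheses: this is the function object itself, nothing is called.
handler = greet

# Parentheses: this is an actual call.
result = greet("world")

# The function object can be inspected like any other value.
handler_name = handler.__name__
handler_doc = handler.__doc__
is_callable = callable(handler)
```

Because a call always needs parentheses, a bare name is never ambiguous between "the thing" and "the result of running the thing".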

Inline control flow constructs When I first saw these, I thought "Wow! I can now do in one line what previously took me three in Ruby!" And then I started reading a lot of lines and wondering why they weren't executing. And then I read those lines a bit further and saw that they had conditional statements tacked on to the end of them. And then I vowed to never use them again. One example of how this can go bad is tacking on "and return" to the end of a long statement. If you're looking at the code, and not scrolling right far enough (and your lines are so long that you need to scroll, which may or may not be a good idea), it looks like the code falls through after every statement instead of returning. Confusing.

Different names for the same method find_all? What the hell does that do? Oh, it's the same as select. Collect? What's that? Oh, it's the same as map. Too bad I only found these things out after going to the documentation. Why can't one function just have one name, so we can all have a common point of reference? Oh, right, because Ruby "allows you to program in your natural language." Which means confusing everybody else, including myself.

Different precedence for seemingly identical operators ! binds more tightly than not. && and || bind more tightly than "and" and "or". This is incredibly confusing; why not just make them the same precedence? Changing one form to the other form when it is right near another operator whose precedence is between the two could alter the control flow of your code.

Inability to override = (and non-obvious performance gotcha) There is some fine print to go along with the "every operator is just a function" propaganda. One annoying exception to the rule is the = operator. Now, why would anyone want to override the equals sign? First let me point out that it's not just =, but also -=, +=, *=, etc. that cannot be overridden, so it affects a substantial set of important operators that are literally expanded to x = x + y, etc. Now here's the kicker: for arrays and strings, these expanded forms are orders of magnitude slower than their true method counterparts. So I want to override += for arrays to have it call concat or <<, which are both efficient, as a way of aliasing all +='s in the existing codebase. In C++ and other languages, this would be quite possible (never mind that you wouldn't be facing this problem in C++ to begin with) but in C Ruby, a design decision has been made that no one can override the sacred = and therefore coders will inevitably trip over this performance gotcha. Why? I'm not sure... maybe this or that post may offer some insight. In any event, don't use += and its cousins for anything but numbers or risk serious performance hits (changing one to the other is further complicated by their difference in precedence... both behave differently within a conditional operator, for example. Grrr.)
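
For contrast, Python does let a class hook the augmented assignment operators. The sketch below assumes Python 3, and the Log class is a made-up example, not anything from a real library: it defines __iadd__ so that += extends in place instead of building a new object.

```python
class Log:
    """Hypothetical container used only to illustrate += overriding."""

    def __init__(self, entries=None):
        self.entries = list(entries or [])

    def __add__(self, other):
        # Plain + builds and returns a brand-new Log.
        return Log(self.entries + list(other))

    def __iadd__(self, other):
        # += mutates this Log in place and returns the same object.
        self.entries.extend(other)
        return self


log = Log(["start"])
alias = log
log += ["step one", "step two"]
same_log = alias is log          # still the same object after +=

combined = log + ["done"]        # + gives a separate object

# Built-in lists behave the same way: += extends in place.
items = [1, 2]
items_alias = items
items += [3]
same_list = items_alias is items
```

If a class leaves out __iadd__, Python falls back to x = x + y, which is the only behavior Ruby offers.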

Confusing operator overloading << is a way to print things out (a la C++), define a class method, and append to the end of an array. Only the last one makes sense to me; the other two seem out of place. Remember what I said about fewer surprises = better code readability? It's beginning to become quite a theme...

No list comprehensions [heavy_sigh for each_time_I_can_not_do_this in Ruby if I_am_particularly_exasperated] I guess there's always select and collect. Too bad I need to decide if I want all the items in the list or only a subset before I can select the correct method to use (instead of at the end in Python, where I can insert if statements if necessary). I also haven't discovered a way in Ruby (yet?) to do the related task of a one-line instantiation of a hash. In Python, this would be d = dict((x, None) for x in [1,2,3])
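
Here is a short sketch of what I mean on the Python side, assuming Python 3; the lists are arbitrary sample data. The point is that the filter is an optional clause tacked on at the end, so the same construct covers both "all items" and "only some items".

```python
numbers = [1, 2, 3, 4, 5, 6]

# Transform every item (the job of collect/map).
squares = [n * n for n in numbers]

# Same construct, with a filter added at the end (select plus collect).
even_squares = [n * n for n in numbers if n % 2 == 0]

# One-line dictionary construction, as written above.
d = dict((x, None) for x in [1, 2, 3])

# Newer Pythons (2.7 and 3.x) also accept a dict comprehension.
d_comprehension = {x: None for x in [1, 2, 3]}
```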

Perl-like operators << <=> and =~? Sorry, I just can't get into it. Keep it simple, please. Function calls with letters are good.

Lists and hashes are printed strangely puts {1=>2,2=>3,3=>4} gives me 122334 in Ruby. WTF? I thought I was dealing with a hash, not a number. print {1: 2, 2: 3, 3: 4} gives me {1: 2, 2: 3, 3: 4} in Python. puts [1,2,3] gives me numbers on separate lines in Ruby. print [1,2,3] gives me [1, 2, 3] in Python. The seemingly small detail that data structures are printed out exactly how they are written in code makes a world of difference.
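
A quick sketch of the Python behavior, assuming Python 3 (where print is a function, unlike the Python 2 statements above, and dictionaries keep insertion order). The round trip through ast.literal_eval is my own addition to show that the printed text is valid source for the same value.

```python
import ast

table = {1: 2, 2: 3, 3: 4}
numbers = [1, 2, 3]

# str() is what print() uses; for these built-ins it matches the literal syntax.
table_text = str(table)
numbers_text = str(numbers)

print(table)
print(numbers)

# Parsing the printed text gives back an equal data structure.
table_round_trip = ast.literal_eval(table_text)
numbers_round_trip = ast.literal_eval(numbers_text)
```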

Strings are unintuitive Strings are iterated over by word, not by character in Ruby: "one two".each {|x| puts x} in Ruby outputs "one" and "two" on separate lines. Compare this to for i in "one two": print i in Python, which prints out a character on each line. To iterate over each character in a string in Ruby, you'll have to use each_byte (in which case you'll find that Ruby will print out numbers unless you convert the numbers with .chr... this problem also occurs when selecting a character from a string, as in str[5] even though the docs say otherwise) or each_char.
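
The Python behavior described above, as a minimal sketch assuming Python 3; the variable names are just for illustration.

```python
text = "one two"

# Iterating a string yields one-character strings, not numbers or words.
chars = [ch for ch in text]

# Indexing also yields a one-character string.
sixth_char = text[5]

# Word-by-word iteration has to be requested explicitly.
words = text.split()
```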

Bad decisions on naming Why is there a built in variable called $DEBUG? Doesn't everyone have a global variable called that in their program at some point or another? Well, overwriting this variable unknowingly will cause your code to do very strange things, like disregard begin/rescue/ensure blocks. I found this out the hard way. Python's is __debug__. Who is going to create a variable called __debug__ in their program? Nobody, that's who. The closest thing Ruby has to a constructor is called initialize. Why so long? __init__ works fine, and programmers who failed literature can actually spell it.

Shortcuts for everything "p" for print. The attr family of keywords for attribute access. Why not just create usable constructs and stick with them, instead of inventing unwieldy constructs and then working around them with shortcuts?

Hype Ruby is definitely the flavor of the week in programming languages. This results in a bunch of n00bs flocking to the language when they want to learn to program. Now, I have nothing against people who want to learn how to program. In fact, I'm thrilled that anyone wants to take a little time to learn the art. But the experience of a community has benefits: better tools and better discussions. As a result the conversations in Ruby forums and mailing lists are generally of a lower quality than those on Python mailing lists, which tend to be frequented by more knowledgeable, veteran coders.

Japanese Origins As an English speaker, I appreciate that most of the cutting-edge blog posts and PEPs and what not are in English. Ruby originated from Japan, so this is not the case there. It's frustrating for me to see a link like "the solution to your problem is here" only to be led to a page which to me might as well be in... well, Japanese. I am sure I would feel differently if I were Japanese, of course.

OSX/TextMate-centric community A large number of Ruby users use Macs. Not sure why, that's just the way it is. They also seem to like TextMate. Me? I hate Apple and love vim. The more people that use the tools that you use, the higher probability that someone has already solved the configuration problem that is bugging you. Although I will note that the default settings for Ruby in vim are very nice... there's at least one knowledgeable vim user writing Ruby out there.

Younger Language Python has been around since the late 80s. Ruby has been around since 1993 (both for certain definitions of "around"). Python has a head start on Ruby in terms of developer time. A more mature language is likely to have fewer bugs, more features, less experimentation and more users. The same goes for libraries written for that language: a greater number of better-implemented libraries will be available to users of a language that has been around longer. All other things being equal, the older language has to win the stability and reliability battle.

Unfamiliar keywords Instead of try/catch (or except)/finally, Ruby has begin/rescue/ensure. Instead of continue, Ruby has next. Instead of else if or elif, Ruby has elsif. Instead of switch/case, Ruby has case/when. I guess every language commits this foul to some extent, but it's just one more thing to trip over.

Implicit returns Another Perlism that I despise. To me, a function looks like it doesn't return anything if it doesn't have the "return" keyword in there at some point. It's much easier to read (and maintain) code if the points at which a function could return something are explicitly outlined.
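
What I am used to from Python, as a tiny sketch assuming Python 3; both function names are hypothetical. A function body without a return statement hands back None, no matter what its last expression computed.

```python
def total_without_return(values):
    # The sum is computed and then thrown away: no return statement.
    sum(values)


def total(values):
    # The exit point and the returned value are spelled out.
    return sum(values)


missing = total_without_return([1, 2, 3])
present = total([1, 2, 3])
```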

Things I don't have an opinion on... yet

Introspection/Reflection Haven't really dug that deeply into what Ruby can do here yet...

Package management The gems system vs letting apt-get do everything. Neither has failed me in any way thus far, but then again, I haven't really tried to do anything remotely complicated with either and I'm not a sysadmin presiding over a myriad of different configurations...

Threading Ruby doesn't support native threads. What are Python's options? This looks like a good article and what about Stackless Python?

Object Oriented stuff Namespaces, modules, Inheritance, etc. I'm sure there's some Python vs Ruby flamewar going on out there, I just have to find the thread...