RailsConf 2009 Day Two

Day Two got off to a good start. Engine Yard did a promotional pitch. The speakers could have been a bit more polished, but it was interesting stuff about their one-button deployment, and overall not bad for an advertisement.

Next up was Chris Wanstrath. He started with a lead-in on how to become a famous Rails developer: focus on yourself, your blog readership numbers, your Twitter follower count, and so on. He then talked about how he went from being an unemployed college dropout to co-founder of the very successful GitHub by sharing code. His point was that it’s better to focus on the community: share code, contribute to open source projects, even write documentation for existing projects. Being a good developer trumps being a famous developer. The complete text of the talk is online here.

For the first session of the day, it was a tough call between Rack/Sinatra and Metric Fu. I finally went with:

Using metric_fu to Make Your Rails Code Better – Jake Scruggs

The central theme of this talk was how to use automated code analysis to decide where to spend your refactoring cycles. He invoked Carlin’s law (anyone going slower than me is stupid; anyone going faster than me is crazy), applied to programming: as your skills change over time, you see the same code differently.

He touched on coverage as a baseline that should already be part of your code analysis, then went on to complexity analysis, reviewing two tools for measuring the “complexity” of your code: Flog and Saikuro. Flog examines your code (flog -g app for a Rails app) and gives you a (somewhat arbitrary) numeric score measuring its relative complexity. Basically, 0-10 is awesome (and practically unattainable), 11-20 is okay for real-world methods, and it goes downhill from there. If you have scores of 200+, refactor immediately! Flog is somewhat opinionated about what counts as more or less complex, but it generally does a good job of helping you avoid the “icebergs”.

Saikuro gives a more concrete result: the “score” is the number of branches through a method, including ternary operators, foo if/unless bar, etc. This is a plus over Flog, and the score usually indicates about how many tests you should have (one per branch). The downside to Saikuro is that it does not pick up on dynamically defined methods, whereas Flog does.
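To make the branch counting concrete, here is a small sketch. The method shipping_cost and its numbers are made up for illustration, and the count in the comments follows the usual cyclomatic-complexity rule of thumb (decision points plus one) rather than output copied from Saikuro.

```ruby
# Hypothetical example method: two decision points, so a cyclomatic
# complexity of 3 by the "decision points + 1" rule of thumb.
def shipping_cost(order_total, express)
  base = order_total > 100 ? 0 : 5   # ternary operator: one decision point
  base += 10 if express              # modifier-if: one decision point
  base
end

# Roughly one check per branch, following the guideline from the talk.
checks = {
  [150, false] => 0,   # free shipping, no express surcharge
  [50,  false] => 5,   # standard fee
  [50,  true]  => 15   # standard fee plus express surcharge
}

checks.each do |(total, express), expected|
  actual = shipping_cost(total, express)
  puts "#{total}, #{express}: #{actual == expected ? 'ok' : 'FAILED'}"
end
```

Each extra ternary or trailing if/unless adds another branch, and with it another case worth testing.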

Next, we walked through a refactoring example, where Jake showed how to use high Flog scores as a hit list for what to refactor next. He also mentioned that better readability trumps lower complexity scores. One thing to keep in mind: because the scores are generated by automated tools, they should be taken as a guide, not a law. A good point was brought up during Q&A: there is currently no way to “flag” a high-scoring method as acceptable, so if you have a justifiably complex method that you choose to live with (e.g. for readability), then you’ll have to live with the flogging you will receive. I’m sure patches would be welcome if someone wanted to fix this!

On to code smells, and Reek and Roodi, two tools that identify them (overly large methods, etc.). Reek tends to warn about smaller issues than Roodi and can produce false positives. Roodi generally has fewer complaints, so if it warns about something, it should probably be fixed!

Next up was Flay, which detects non-DRY-ness in code: anything from strict copy-n-paste, to functionally identical blocks with different variable names, to do..end blocks that match curly-brace blocks.
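As an illustration of that kind of structural duplication, here is a sketch. All method names are hypothetical, and whether Flay would actually report a pair this small depends on its threshold settings, so treat it as a picture of the idea rather than a reproduction of Flay output.

```ruby
# Two hypothetical methods with the same structure: only the variable
# names and the block style (do..end vs. curly braces) differ.
def total_prices(items)
  sum = 0
  items.each do |item|
    sum += item[:price] * item[:quantity]
  end
  sum
end

def total_costs(lines)
  acc = 0
  lines.each { |line| acc += line[:price] * line[:quantity] }
  acc
end

# One possible DRY replacement that both call sites could share.
def line_total(rows)
  rows.inject(0) { |sum, row| sum + row[:price] * row[:quantity] }
end

sample = [
  { :price => 2, :quantity => 3 },
  { :price => 5, :quantity => 1 }
]

puts total_prices(sample) == total_costs(sample)
puts line_total(sample) == total_prices(sample)
```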

Also covered was a way to track source control churn. At this point, you’re probably thinking “How can I keep up with all this?” Luckily, there’s metric_fu, which wraps all of these tools into one package and brings all this code analysis goodness to your project. Install the gem, then run rake metrics:all. See http://metric-fu.rubyforge.org/ for more info and installation instructions. I’m looking forward to adding this bag of tricks to our CI toolset.

Rails 3: Step off of the Golden Path – Matt Aimonetti

Matt started off with a history of programming languages and how Ruby came to be, including some of Matz’ core philosophies embodied in the language. Moving along to Rails, he talked about the growth of Rails and the desire for increased performance and options that led to the split between Rails and Merb. This led us in to the discussion of the current and future state of affairs for Rails 3.

Currently, as DHH mentioned in the opening keynote, there is no official release of Rails 3. However, much work has been done, and a direction and a set of ideas are emerging for what the official release will include:

improved performance
increased modularity
agnosticism
public API
mountable apps

Matt emphasized that there will be no drastic changes: by default, rails app will generate an application very similar to what you would get today under 2.x. What goes away is the idea of “the one true Rails way” of building an app; the framework will be less opinionated. Even so, you should go through a process of justification to see if you really need something different from the default stack.

Some of the options you will be able to choose from:

JavaScript frameworks, including jQuery, YUI, ExtJS, MooTools, Prototype, or the ability to write your own, and plug it in.
Different templating engines: HAML, ERb (this is already doable in Rails)
Different ORMs: ActiveRecord, DataMapper, Sequel, Hibernate, non-RDBMS stores like CouchDB, Tokyo Cabinet, etc.

At this point, Matt gave a demo of some of the nicer features in DataMapper, contrasted with ActiveRecord:

DataMapper reuses the existing Ruby object for both sides of a has_many / belongs_to relationship. In other words, if I load parent and child records from the database and compare parent.object_id with child.parent.object_id, under DataMapper these will point to the same object automatically, while with ActiveRecord they will be separate objects. (Note that inverse_of was recently checked into Rails, which enables this in ActiveRecord as well.)
DataMapper does automatic lazy loading as well as strategic eager loading, so in this scenario:

    @parent = Parent.find(12345)
    @parent.children.each do |child|
      puts child.name
    end

ActiveRecord would need a hint (:include => :children) added to the original query to avoid the N+1 query problem, whereas DataMapper is clever enough to figure that out and generate 2 SQL queries for you automatically.

The ability to have multiple repositories (which looks like it means databases), and a copy method on models to clone data from one database to another — one use case would be an automatic archive or backup process that copies data generated within the last week to a backup database.
Query Path, allowing more flexibility in SQL condition generation (e.g. WHERE name LIKE 'foo')
One potential gotcha that was mentioned: DataMapper does not support STI and polymorphic associations as well as ActiveRecord does.
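The lazy-versus-eager loading point in the list above is easiest to see when looping over several parents. What follows is a minimal sketch using a hypothetical in-memory FakeDB that merely counts simulated queries; it does not use ActiveRecord or DataMapper and makes no claim about how either is implemented.

```ruby
# Hypothetical in-memory stand-in for a database that counts "queries".
class FakeDB
  attr_reader :query_count

  def initialize(children)
    @children = children   # array of { :parent_id => ..., :name => ... }
    @query_count = 0
  end

  # Simulates: SELECT DISTINCT parent_id FROM children
  def parent_ids
    @query_count += 1
    @children.map { |c| c[:parent_id] }.uniq
  end

  # Simulates: SELECT * FROM children WHERE parent_id = ?
  def children_for(parent_id)
    @query_count += 1
    @children.select { |c| c[:parent_id] == parent_id }
  end

  # Simulates: SELECT * FROM children WHERE parent_id IN (...)
  def children_for_all(ids)
    @query_count += 1
    matching = @children.select { |c| ids.include?(c[:parent_id]) }
    matching.group_by { |c| c[:parent_id] }
  end
end

rows = [
  { :parent_id => 1, :name => "Ann" },
  { :parent_id => 1, :name => "Bob" },
  { :parent_id => 2, :name => "Cy" },
  { :parent_id => 3, :name => "Di" }
]

# Naive lazy loading: one query for the parents, then one per parent (N+1).
lazy_db = FakeDB.new(rows)
lazy_db.parent_ids.each { |id| lazy_db.children_for(id) }

# Batched loading: one query for the parents, one for all of their children.
eager_db = FakeDB.new(rows)
children_by_parent = eager_db.children_for_all(eager_db.parent_ids)

puts "lazy:  #{lazy_db.query_count} queries"   # 1 + 3 parents = 4
puts "eager: #{eager_db.query_count} queries"  # 2
```

With three parents the naive loop issues four queries and the batched version two; the gap grows with every additional parent.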

Finally, he highlighted some options that would be available for even further customization, such as defining your own:

file structure
router DSL
request handling

But he voiced the opinion that the vast majority of Rails apps will not need anything like this: make sure your need justifies coloring outside the lines.

All in all a good presentation, though perhaps with a bit too much focus on DataMapper specifically. That said, I personally enjoyed the DataMapper bits, and might have to try it out on a project if it’s a fit.

Art of the Ruby Proxy for Scale, Performance, and Monitoring – Ilya Grigorik

I skipped out on the afternoon sessions, so my next talk was Ilya’s (never disappointing). Ilya spoke about EM-Proxy, his EventMachine-based proxy, and gave good example code showing how EM-Proxy can be used to implement transparent and intercepting proxies.

It started with an itch at PostRank, Ilya’s blog aggregation solution. An effective staging environment should closely resemble the production environment, but their production environment spanned nearly 80 (virtual) servers on EC2’s cloud, and spinning up that many servers just for staging was an expensive proposition. Simulating production traffic is also a challenge, as you end up trying to store production logs and “replay” them into the staging environment.

Their solution was to set aside a group of the servers as a staging app server pool and put a proxy in front that transparently (to the end user) intercepts incoming requests, sends them to both the production and staging pools simultaneously, and returns only the production response to the user. The staging response is used internally for benchmarking, testing output, etc. With this strategy, the more static parts of the system (web servers, load balancers, etc.) can be shared across environments, and the staging environment is testing the part that actually changes (the application servers).

The first example code was a transparent proxy that simply forwarded a request from one port to another. Ilya built on this to show how you could alter request/response data on the fly. Finally, he built up to the original scenario: duplexing a single request across two (or more) backend servers, but returning only a specific response. As he mentioned in the talk, one strategy for servicing a request as fast as possible might be to send it to all machines in your pool, then respond with whichever response comes back first.
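The duplexing idea can be sketched in plain Ruby. This is not EM-Proxy’s API: the duplex helper and the two lambda “backends” are hypothetical stand-ins, and threads take the place of the event loop. For simplicity it also waits for every backend before replying, which a real proxy would likely avoid.

```ruby
# Hypothetical backends: plain lambdas standing in for app server pools.
backends = {
  :production => lambda { |request| "prod handled #{request}" },
  :staging    => lambda { |request| "staging handled #{request}" }
}

# Send the same request to every backend, return only the primary's
# response, and pass the other responses to an optional observer block
# (for benchmarking or comparing output).
def duplex(request, backends, primary, &observer)
  threads = backends.map do |name, backend|
    Thread.new { [name, backend.call(request)] }
  end
  responses = threads.map(&:value).to_h   # waits for all backends

  responses.each do |name, body|
    observer.call(name, body) if observer && name != primary
  end
  responses.fetch(primary)
end

observed = []
reply = duplex("GET /", backends, :production) do |name, body|
  observed << [name, body]
end

puts reply
puts observed.inspect
```

The caller only ever sees the production reply, while the staging reply lands in observed for later inspection.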

These examples were centered around HTTP requests, but he went on to show some other examples of how this is not protocol-specific: you are just dealing with data over a socket connection, so as long as you understand the underlying protocol, EM-Proxy could be useful. His examples showed SMTP proxies for accepting/rejecting incoming mail by email address and implementing a spam filter by forwarding the incoming mail to Defensio before passing it along to your real SMTP server. The final example was pretty clever: an implementation of EM-Proxy to reduce the memory overhead of beanstalkd by selectively delaying queue inserts based on the scheduled execution time — basically buffering far future jobs into the database instead of immediately inserting into the work queue.
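The beanstalkd buffering idea boils down to a routing decision on each job’s delay. Here is a rough sketch under stated assumptions: the threshold, the job hashes, and the two arrays standing in for the work queue and the database are all hypothetical, and the release step is my guess at how parked jobs would find their way back, not something shown in the talk.

```ruby
# Hypothetical threshold: jobs due further out than this are parked.
BUFFER_THRESHOLD_SECONDS = 600

# Decide where a job goes based on how far in the future it is scheduled.
def route_job(job, queue, store, threshold = BUFFER_THRESHOLD_SECONDS)
  if job[:delay] > threshold
    store << job    # stand-in for a database insert
  else
    queue << job    # stand-in for putting the job on the work queue
  end
end

# Assumed release step: after `elapsed` seconds, move parked jobs that are
# now within the threshold into the queue with their remaining delay.
def release_due_jobs(queue, store, elapsed, threshold = BUFFER_THRESHOLD_SECONDS)
  due, waiting = store.partition { |job| job[:delay] - elapsed <= threshold }
  due.each { |job| queue << job.merge(:delay => job[:delay] - elapsed) }
  store.replace(waiting)
end

queue = []
store = []
route_job({ :id => 1, :delay => 30 },   queue, store)
route_job({ :id => 2, :delay => 3600 }, queue, store)
puts "queued: #{queue.map { |j| j[:id] }.inspect}, parked: #{store.map { |j| j[:id] }.inspect}"

release_due_jobs(queue, store, 3300)
puts "queued: #{queue.map { |j| j[:id] }.inspect}, parked: #{store.map { |j| j[:id] }.inspect}"
```

Only near-term jobs occupy queue memory at any moment; everything else waits in cheaper storage.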

Slides are available here: http://bit.ly/ruby-proxy

Another packed day at RailsConf 09, one more to go!