On the Inception of the Ruby Object System, at ChicagoRuby

Class, superclass, metaclass, singleton class, eigenclass, class << self, class Class, Class.new. We all know that Ruby is an object-oriented language, and specifically a class-based language. One of the strengths of Ruby is that it can be extremely usable without your having to really understand what's going on behind the scenes. Drift from the happy path, however, and you can find yourself wandering in a dream-world of class << self, or trying to grok the difference between 'include' and 'extend'. This talk will try to explain why we have classes and metaclasses, and how it all hangs together (and a bit about the movie Inception, to keep things from getting too boring).


Debugging the Rails Asset Pipeline with Heroku Buildpacks

When I was working on a Rails 3.0.3 application, trying to speed up the user experience, tools like YSlow and PageSpeed kept telling me to do things that would have been so much easier with the asset pipeline – combine, minify, and so on.

In my current 3.1.1 application, I’ve found that assets can be really slow in development mode – probably why they optimized dev mode for 3.2. One solution involved tracking down some misconfigured static asset serving in my rails engines. Another step involved turning off a lot of autoloading during asset serving.

Debugging the Asset Pipeline on Heroku

There are a number of challenges to running the asset pipeline on Heroku, especially when you insist on running an unconventional Rails layout, with requirejs-rails to boot. The biggest challenges are that the error messages are awful, and that the environment is unique, which means local testing of the asset compile step only gets you halfway there. You also have to get the precompile step to run at all.

Get it to compile locally

I skipped this step at first. There are a number of simple things which could be wrong – such as having config.assets.compile or config.serve_static_assets turned off. While I have no reason to run precompiled assets locally, it's still possible to run rake assets:precompile.

Start running the rake tasks from the rails directory to make sure the basic process is working. Once that runs, run rake assets:clean and try again.

Get it to compile from the repo root directory

I ran into a lot of trouble here because the asset precompile process actually shells out to new rake instances, which start from scratch, context-wise. Since I had made loading Rails conditional in my base Rakefile (to make running other tasks faster), this didn't work out of the box. I set up the asset tasks to depend on a task which changed the directory; then, when rake shelled out, it ran in the rails directory and picked up the standard Rails Rakefile.
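
For illustration, the arrangement looked something like this (a sketch only; task and directory names are stand-ins, and the base Rakefile is assumed to load the Rails tasks elsewhere):

# Base Rakefile: give the asset tasks a prerequisite that changes into the
# Rails directory, so that when assets:precompile shells out to a fresh rake,
# that rake starts in rails/ and picks up the standard Rails Rakefile there
task :in_rails_dir do
  Dir.chdir('rails')
end

task 'assets:precompile' => :in_rails_dir
task 'assets:clean'      => :in_rails_dir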

Compile with a bogus database address

If you have set up your project to use Heroku-style DATABASE_URL by default, this is just a matter of passing the right variable.

DATABASE_URL=postgres://foo/bar rake assets:precompile

Sometimes code will connect to the database during precompile, but Heroku doesn't have a database available at that time. I ran into this problem with ActiveAdmin, and had to add a hack to routes.rb to prevent loading the ActiveAdmin routes (since fixed upstream):

break if ARGV.join.include?('assets:precompile')

Get asset precompilation to TRY to run on Heroku

My application was coming up as a Ruby/Rack application. This was enough to get it to run, but asset precompilation wasn’t running, which broke some pages. The odd directory structure was throwing it off, so we’ll need a few concessions to standard rails directory structure. But which ones? Reading the Heroku Ruby Buildpack source tells us that it needs two files to exist:

config/environment.rb (to show up as Rails 2+)
config/application.rb (containing Rails::Application, to also be Rails 3+)

Debug asset compilation with a custom buildpack

At first the error I was getting looked like a Ruby syntax error. The only sense I could make of it, however, was as a JavaScript syntax error. Unfortunately, the reference was to a line number in a combined file, and minification had destroyed the line numbers. This can be worked around by temporarily setting config.assets.compress = false in production.rb. Now I could see that the JavaScript engine was choking on CoffeeScript which hadn't been compiled.

But what is actually going wrong? Some information can be gotten by running rake with --trace – but we don't control the precompile command on Heroku. Or rather, we can – with a custom buildpack.

On the latest Cedar stack, Heroku has built a much more versatile platform, that can and does run systems other than Rails and Ruby. This process is orchestrated by buildpacks, which literally run the slug compile by calling a small number of programs, which then do pretty much anything they like. The Ruby buildpack is conveniently written in Ruby.

Since the buildpack is just code, we can modify the asset call to include the --trace flag. (You could use my Ruby buildpack fork, but I might make more changes at any time, and Heroku appears to use it 'live'.)

Heroku's platform has multiple buildpacks. Each includes a detect script that tries to determine if a repo is appropriate for it. I don't know whether a custom buildpack becomes exclusive, or just gets first crack at the project. In any case, it appears that the only way to use a buildpack is to set it at instance creation time, so I created a new one for asset debugging.

heroku create --stack cedar --buildpack http://github.com/heroku/heroku-buildpack-ruby.git

Use Your Node

The problem turned out to be a missing node binary. In order to enforce its total isolation, Heroku gives each slug compile its own local, relative bin/node. Since I had proven locally that the easiest way to make the forking rake behave was to cd into the rails subdirectory, the compile no longer had node at the correct relative path. I created a railshost/bin/node script to call the appropriate path (make sure to pass the arguments!). Heroku support later suggested a symlink, though I was leery of running a link through the version control system.
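
The shim itself is tiny. Something along these lines should do it (a sketch; the relative path is an assumption about the slug layout):

#!/usr/bin/env ruby
# railshost/bin/node: forward to the slug-level node binary,
# passing every argument through untouched
exec File.expand_path('../../../bin/node', __FILE__), *ARGV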

Checklist

  • Check your config variables
  • Run it locally
  • Change directories as necessary
  • Add stubs to make project detect as Rails
  • Make sure you aren’t connecting to the database
  • Make sure you can use the project-relative path bin/node
  • Turn off asset compression to investigate JS errors (and don’t forget to turn it back on)
  • Use --trace to debug rake tasks
  • Use a custom buildpack to debug the process on Heroku.

I Spilled Queue All Over Myself

My program has a data import process, which will need to be refreshed from time to time. I started out focusing on the code, and for a long time tests were the only thing running it. I prepared fairly high-level operations, and then wrapped a Rakefile around them. There was still a lot of setup code, so I extracted an even higher-level set of operations from the rake tasks, so they could be just as easily driven by code. Now I just need some way to trigger those operations.

The application is hosted on Heroku at present. Heroku provides a scheduler add-on that goes up to daily intervals (though a week would probably be sufficient for my purposes). However, Heroku wants large jobs done by worker processes, and the policy is to cut off other processes that run too long.

This naturally leads to the use of some sort of communication system to instruct the background workers to do something. Commonly, some sort of queue is used, and Resque in particular has been getting some buzz lately. There is a free Redis add-on at Heroku, which is limited by size – not a concern for a communication structure.

Resque queues a class name and a set of parameters. Class names are handy because they are global constants, which can be looked up at the other end – if you want to get really technical, it doesn’t even have to be the same class. Resque calls a perform method on the class, passing the parameters.
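
To make that concrete, a minimal Resque job looks something like this (the class, queue, and operation names here are invented for the example):

# Anything with a queue and a perform class method can be a job
class RefreshImport
  @queue = :imports

  def self.perform(source_id)
    ImportOperations.refresh(source_id)   # one of those high-level operations
  end
end

# Resque stores the class *name* and parameters, and calls
# RefreshImport.perform(42) on a worker at the other end
Resque.enqueue(RefreshImport, 42)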

This sounds pretty handy, at first blush. Just define a perform method (and say which queue to put it into) and pretty much anything can become a background job, without having to inherit from some special class or anything of the sort. The problem, of course, is that you can put it everywhere.

This became especially evident when I added Heroku worker autoscaling, to avoid having an expensive process running all day to support what will often be a few seconds of work each day. (IronWorker looks interesting, but can't access Heroku's shared databases yet.) The autoscaling module has to be mixed into each class which is acting as a worker. As I started out, this involved a fair bit of repetition. There are, of course, methods to handle the repetition, but between that and sprinkling perform and queue around, I was getting queue all over the application.

So, time for a new gem. The background gem has a module to wrap up the autoscaling and other common configuration. It also becomes the only piece of the application which has a direct dependency on the underlying queuing system (should Resque prove inappropriate at a later time). The background gem also depends on any other gems it needs in order for the background jobs to get their work done.
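
The shape of the wrapper is roughly this (a sketch; the module names and the autoscaling placeholder are mine, not a real API):

module Background
  # Stand-in for the real worker-autoscaling logic
  module Autoscaling
    def after_enqueue_scale_up(*_args)   # hook naming follows Resque plugin conventions
      # scale workers up/down here
    end
  end

  module Job
    def self.extended(base)
      base.extend(Autoscaling)
      base.instance_variable_set(:@queue, :background)
    end
  end
end

# Application code never mentions Resque or autoscaling directly
class RefreshImport
  extend Background::Job

  def self.perform(source_id)
    # ...
  end
end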


Putting Rails in its Place

Rails likes to own your application. It gets pride of place in the project's root directory, setting up its app, config, and db. If Rails is just a delivery mechanism, why does it get the root directory?

I've got an application which ended up being more backend logic than CRUD. I'm starting to wonder if Rails is even an appropriate vehicle. In any case, I've broken out the Rails-independent pieces into vendor/modules. This has worked fabulously, except I'm getting really tired of typing vendor/modules, even with auto-complete.

Vendor is a piece of Rails convention that says it's something alien and outside the application, tacked on. In fact, Rails itself is feeling more and more alien. Getting it out of the way was a multiple-step process.

  1. Extract the core functionality into a Rails engine, hopefully flushing out some dependencies.
  2. Lift the Rails host application into vendor/modules
  3. Chop out vendor/modules so that the app is a directory full of gems.

The first step is fairly standard gem extraction – filling out a gemspec, shuffling Gemfile dependencies, adjusting some paths. There was one little hiccup with Bundler that I want to write about separately, but it comes down to making sure that the new engine requires all its dependencies from lib/mygem.rb. The rest is paths, paths, paths.

Autoload Paths

Once the main Rails app is at the same level as the modules, the autoload path has to change from Dir["#{config.root}/vendor/modules/**/lib/"] to Dir["#{config.root}/../**/lib/"]

Gemfile

As a precursor to the refactoring, I changed from the block form of path to the imperative form, which gives it the same effect as source. This allowed gemspecs to search the right places for my custom gems, without having to list each one in the Gemfile. Once the modules got dropped a few levels, the master Gemfile had to change from specifying a path of 'vendor/modules' to '.'.
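
In Gemfile terms, the change looked roughly like this (a sketch; the gem name is a placeholder):

# Before: block form, scoped to the gems listed inside it
#
#   path 'vendor/modules' do
#     gem 'my_engine'
#   end

# After: imperative form behaves like `source`, so gemspec dependencies on my
# custom gems resolve against it; once everything moved, the path became '.'
path '.'
gem 'my_engine'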

My master Gemfile has a lot more direct references than I would like, but gemspecs don't allow for the kind of grouping that the standard Rails Bundler configuration thrives on.

The rails command

The rails terminal command looks for script/rails to determine whether it’s in a rails project. Without that, you do have to type out railshost/script/rails, but from there everything is normal.

git

I found that the default .gitignore wasn’t covering some files in the moved rails directory, so I copied the file and adjusted both copies appropriately for their new positions.

Heroku

A lot of changes relate to running on Heroku, because it’s designed as a generalized hosting system and can’t know about my custom directory structure. Fortunately between Rails, Rack, and Heroku there are enough hooks to make everything work out. (Note: I’m running on the cedar stack.)

Gemfile

I already mentioned the Gemfile a little, but remember that Heroku is only going to install from a Gemfile at the project root.

config.ru

I knew from some previous research on running Rails in a subdirectory on Heroku that it was possible. Essentially, Heroku looks for a config.ru “rackup file”, which Rails includes by default. Heroku just needs a rackup file at the root level, pointing at the appropriate place.
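
Mine looks roughly like this (the railshost directory and application class name are from my layout; yours will differ):

# config.ru at the repository root
require ::File.expand_path('../railshost/config/environment', __FILE__)
run Railshost::Application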

Rakefile

Many tasks happen by way of the base-level Rakefile. In particular, it's how background workers and other processes are often launched. A copy of the Rails Rakefile with appropriately adjusted paths should do the job. (Myself, I'm using a modified copy for tasks that aren't part of the Rails app.)
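
A sketch, assuming the same railshost/ layout as above:

#!/usr/bin/env rake
# Root-level Rakefile: the standard Rails one with the path adjusted
require File.expand_path('../railshost/config/application', __FILE__)
require 'rake'

Railshost::Application.load_tasks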

Procfile/Foreman

I started using Foreman a little while ago, both to identify my own worker command and to start up a set of support processes for local development. A well-configured Rakefile will obviate some needs, but I had to adjust the path to the rails command for running the server, as well as some configuration paths for Postgres and Redis in my development environment.
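
For illustration, the Procfile ends up looking something like this (the worker task is my guess, based on the Resque setup described elsewhere):

web: bundle exec railshost/script/rails server -p $PORT
worker: bundle exec rake resque:work QUEUE=*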

database.yml

Under normal circumstances, Heroku can detect that you've got a Rails app and update the database.yml file to parse out their DATABASE_URL. Move the file from its standard location, however, and they just aren't that clever. You can see what Heroku's standard database.yml looks like by running a heroku console on a standard Rails app and simply printing it out.

>> puts File.read('config/database.yml')

I ended up with a dash of Sequel on top of ActiveRecord, so I had already developed a DatabaseSpecification gem to translate between schemes, which made for a less verbose database config file. I was able to copy the dynamic development configuration to the production block and carry on with little trouble.

Files to watch

  • application.rb
  • config.ru
  • database.yml
  • Gemfile
  • .gitignore
  • Procfile
  • Rakefile
  • calls to rails

Rails 3.2 Autoloading, In Theory

While I was digging into the implementation of autoload_paths and company, I started out in edge Rails and saw some of the work under way for 3.2. The material seemed relevant, since my last article is shortly to become a little obsolete, but it carries the heavy disclaimer that I haven't had to fight with this yet – I've just read the source.

Rails 3.2 will introduce lazier autoloading. It will only reload files if some file has been changed. I initially read this as saying that only changed files would be reloaded, but on further inspection (and more careful reading of the changelog), all it’s doing is changing the triggering mechanism for the same thing it was doing before.

Most things actually work the same – only the first and last steps change. Our last step (to_cleanup) in the pre-3.2 system was to clear out the autoloaded constants. Instead, we have a first step (to_prepare) which examines all the autoloadable files to see if any have changed. If any file has changed, we resume with the step as before – clear out the constants so that they will trigger autoloading again.

The process is optional, controlled once again by a config variable.

config.reload_classes_only_on_change

Here’s the process in another form

reload_classes_only_on_change = false (old behavior)

  1. Do the request, loading constants as needed
  2. Clear the autoloaded constants

reload_classes_only_on_change = true (new behavior)

  1. Clear the autoloaded constants if any potentially loadable file has changed
  2. Do the request, loading constants as needed

It wasn't entirely true to say 'any potentially loadable file'. The new process has its own set of config variables:

    config.watchable_files
    config.watchable_dirs

However, our old friend autoload_paths gets appended to them down in the bowels of Rails. This means that:

  1. You probably don't have to configure anything new
  2. You can't stop it from watching those paths either (short of turning the whole feature off)

In truth, however, you aren't stuck with anything: the file watcher itself is completely configurable. Just set config.file_watcher to your own class. Look up ActiveSupport::FileUpdateChecker for an example and documentation. Of course, it's less likely to be 'your own class' than a class optimized for your platform, provided by a gem.
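
Swapping one in is a one-liner (MyFastFileWatcher is hypothetical; it just has to mirror FileUpdateChecker's interface – roughly, initialize with files, dirs, and a block, plus updated? and execute):

# config/application.rb (or an environment file)
config.file_watcher = MyFastFileWatcher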

Fewer Restarts?

The file_watcher gets added to an array of reloaders. One of those reloaders watches the routes file. Meanwhile, the database schema gets added to watchable_files. There might be fewer reasons to restart your server in the future.


A Delegate Matter

From time to time I hear rallying cries to use the Ruby standard library instead of building things from scratch. One piece I've been experimenting with is delegation. What I've learned is that delegation isn't all rainbows and unicorns. It still complects your program, and you still have to have a deep understanding of what is going on in order to understand the problems that arise.

Meet the Delegation

delegate.rb is a piece of pure Ruby included in the standard library. The source is readable, if a little obtuse when you aren't comfortable with Ruby metaprogramming. It includes some comment documentation, which provides a few usage examples.

Delegation is the forwarding of responsibilities to something else. In object-oriented programming, it means sending some or all messages to another object. In Ruby's case, it generally means that you want to delegate all messages except for the ones explicitly defined on the object.

Ruby’s delegate.rb defines three classes.

  • Delegator is intended as an abstract base class for the others. You could use it directly, but you usually won’t.
  • SimpleDelegator seems to be intended for use when you want to hand out a reference to an object, and then change the object that is actually receiving messages without having to tell everybody you gave the reference to – a kind of proxy. I've used it as a superclass without much trouble, to override a few methods and forward the rest.
  • DelegateClass, the comments claim, should be the most common use case. It expects you to name the class of the object you want to forward to, at which point it defines forwarding methods for exactly that set of methods. This requires having a class, possibly abstract, that represents the set of messages you might want to send. I haven’t used it because I tend towards the message-sending view of OOP, where you don’t need to know the type of the object beforehand.

I was using SimpleDelegator as a superclass to set up a kind of pseudo-prototypal relation. I was fearful of doing a more direct prototypal style; I really ought to benchmark it some time.

In any case, I set up a series of tests and my delegation scheme worked well.

And Then, I Added Rails

Some time ago, I added the necessary bits to serve XML and JSON from the Rails app, which at the moment is mostly serving as an API backend. That also got set up and worked fine.

Later, I took some of the objects being returned through the API and wrapped SimpleDelegator subclasses around them, to provide information enhanced by another API. This seemed to work well, and the HTML results reflected the updated information.

You probably noticed I said “HTML results”. When I went back to grab a JSON snapshot for another test, I found a problem. (Actually I found a couple problems, but I’m only going to discuss one here.)

The JSON results were showing data for the unwrapped object. I re-ran my tests and everything was fine. I looked at the HTML results, and they were splendid. For some reason, when Rails tried to convert my object into JSON, it grabbed the wrong data.

It may be worth noting that the object included a custom to_hash method, and earlier experimentation had proven that this method was used by the JSON serialization routines to convert objects. (Unless they are ActiveRecord objects, in which case you need a serializable_hash. Okay, so I’m discussing two bugs.)

One possible explanation is that when the delegated object gets to_hash, it uses its own properties instead of going through the delegator. The problem is, I foresaw this issue, and I had the tests to prove that it worked. Outside Rails.

Let's get perfectly clear on what is happening. For the sake of argument, let's pretend we are getting data from service A, say a movie database. We then want to call a second service B, and use data from that second service to enhance what we show the user. Let's say B is a list of playtimes at nearby theaters. So I want to return one result for each playtime, with each item providing a time attribute, and overriding the title to include the time.
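
In code, the wrapper looks roughly like this (a sketch; the attribute names are invented for the example):

require 'delegate'

class Playtime < SimpleDelegator
  attr_reader :time

  def initialize(movie, time)
    super(movie)          # unknown messages will be forwarded to the movie
    @time = time
  end

  def title
    "#{__getobj__.title} (#{@time})"
  end
end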

When we ask a result for its rating, the call goes

Playtime:22 => Movie:11#rating

Where I'm using '11' to stand in for object identity. When we ask for the title, it returns the modified one:

Playtime:22#title

The actual ancestor chains are a bit more complicated, but this is what matters for our purposes. Let's just get one thing straight: say we haven't redefined a basic method, like inspect. It will still get handled, because it's defined on Object.

Playtime:22 => Movie:11 -> Object:11#inspect

If you are paying careful attention, I’ve already let the cat out of the bag, so to speak. But there’s still a plot twist to come.

To catch Rails at its dirty business, I sent in a spy.

require 'logger'

class Spy
  # The original used the app's logger; a plain stderr logger will do here
  def log
    @log ||= Logger.new($stderr)
  end

  def to_hash
    {}
  end

  def method_missing(method, *args)
    log.debug {"CALL: #{method}(#{args.inspect}) {#{block_given?}}"}
    log.debug {caller(6).take(2)}
    return nil
  end

  def respond_to?(method, include_private = false)
    log.debug {"PROBE: #{method}"}
    log.debug {caller(6).take(2)}
    super
  end
end

What this tells us is that Rails is probing (via respond_to?), in order, for as_json and then to_hash.

So, it checks for as_json, doesn’t find it, and then checks for to_hash, right?

The backtrace rewards investigation:

activesupport-3.1.1/lib/active_support/json/encoding.rb:149:in `as_json'

Wait, we didn't have an as_json to call, right? Why did it go on to check for to_hash?

149:    if respond_to?(:to_hash)

But it says it’s in as_json?

148:  def as_json(options = nil) #:nodoc:

Wait for it….

147: class Object

Oh. I guess we did have one. Three cheers for “Freedom Patching”. (I think I heard three… somethings.)

Okay, so we've got an as_json method. It just sniffs out to_hash and calls it. Big deal, right?

Remember how I helpfully included object identity in the message diagrams earlier?

Playtime:22 => Movie:11 -> Object:11#as_json

Perfectly logical, right? But we have to be very careful about our understanding of message sending. When an object sends a message (to_hash) to itself, it doesn't know anything about upstream delegators.

Movie:11 -> Object:11#to_hash

And so our JSON has the Movie version of the properties, without any helpful playtimes.

Update: I seem to have rediscovered self schizophrenia.

Being A Little Forwardable

The comments in delegate.rb include a helpful pointer to Forwardable, another standard library module. Forwardable requires you to explicitly list messages to be forwarded.

require 'forwardable'

extend Forwardable
def_delegators :@movie, :rating, :type

Since the object being forwarded to is explicit, you can also send messages to different objects. As an added bonus, Forwardable is a module instead of a class, so it doesn't take up the one and only inheritance slot.

What is significant for the present purpose is that Forwardable won’t forward unknown messages like as_json. Since the message doesn’t get passed down, the object identity doesn’t change, and we don’t get such surprising behavior.

Playtime:22#rating => Movie:11#rating

Playtime:22 -> Object:22#as_json
Playtime:22#to_hash

There is another module called SingleForwardable that I don’t completely understand. I think it sends the messages to a module instead of an instance variable.


Rails Autoloading: cleaning up the mess

Just when I thought I understood Rails autoloading, it threw two more curve balls at me.

Managing Database Connections with to_cleanup

When a database application is running along nicely in production, you don't want to be creating and destroying connections all the time. ActiveRecord and company can handle this easily in development because they are library code, which doesn't get reloaded between requests.

I, however, had a Rails-independent library. It started out as a database conversion, so I used Sequel, which allowed me to throw hashes at the database without undue ceremony (other than setting up a schema). Eventually, a query got added as well, with a database connection kept around as a class variable.

Problem: when the class gets unloaded, the reference to the connection gets lost, but the connection is not actually disconnected. Not only does the app have to connect again (minor), but the database eventually runs out of connections, and my app gets an exception. (In the case of Postgres, PGError: FATAL: sorry, too many clients already.)

Fortunately, we can hook into the reloading process and request to be disconnected beforehand. Reloading appears to be managed by ActionDispatch::Reloader, which has a to_cleanup method. Pass it a block to be run during the unloading process.

This is a delicate procedure. Putting the call in a config/initializers/ file doesn't work, because it gets scheduled after the cleanup block that actually unloads the classes – it will reload the class and then not find a database connection to disconnect. The call actually has to appear in development.rb or another suitably early place in order to get a chance at the connection before the class's untimely demise.

ActionDispatch::Reloader.to_cleanup {Thingy.disconnect}

(There is no magic to disconnect; it's a method I wrote.)
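
Roughly along these lines, assuming a Sequel connection held at class level (a sketch, not the real Thingy):

class Thingy
  def self.db
    @db ||= Sequel.connect(ENV['DATABASE_URL'])
  end

  def self.disconnect
    @db.disconnect if @db
    @db = nil
  end
end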

Class Configuration with to_prepare

A common pattern for configuring Ruby code is to set properties on a class or module. (I’m actually starting to distrust this – it’s shared global state – but we’ll roll with it for now.)

Problem: when Rails unloads the classes, the configuration gets lost. The settings work once, and then disappear on the next request.

Fortunately, ActionDispatch::Reloader also has a to_prepare method that gets called before each request. For the price of putting my code in a block that gets re-run every time (it’s only in development.rb after all), I can ensure that the parameters are set every time.

ActionDispatch::Reloader.to_prepare do
  Thingy.perform_timing = true
end

Give Rails Autoloading a Boot to the Head

In my last post, on doing modularity like you mean it, I discussed the fact that Rails makes using modular applications challenging because you don't get automatic reloading. Rails engines have the necessary hooks to get things working relatively smoothly, but if you are writing truly Rails-independent code, you are on your own. Now that I've confirmed my suspicions by reading the source, I'm going to talk about how to give Rails autoloading a boot to the head, and keep gem-based development from driving you crazy.

The code itself is somewhat long and convoluted – just what you’d expect from a library that hacks core Ruby semantics, and has been battle tested by being embedded in one of the most popular Ruby projects. The tests cover a lot of corner cases, and I imagine that many of them had their origin in bugs filed by frustrated users. What I’m really going to describe is my own understanding of the process, which has been confirmed by reading the code.

Note that my code check, and almost all of my experience, is with Rails 3.0 and 3.1.

Rails, during development, operates like this:

  • An unknown constant is detected
  • The constant name is converted to a relative path
  • The autoload paths are consulted, to see if any of them contains the desired file. The file is loaded.
  • Constants defined by loading the file are tracked.
  • If the file doesn’t create the constant that triggered the load, an error is raised.
  • At the end of the request, all the autoloaded constants (which were tracked) are removed. This allows them to be reloaded on the next request and receive updated code.

At almost every step in this process, things can go wrong.

An Unknown Constant is Detected

Ruby is somewhat famous for method_missing, which allows you to intercept an undefined method and provide some behavior instead of (or in addition to) an exception. Less well known is its cousin, const_missing.
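
Here is const_missing in miniature (an illustrative toy, not the Rails implementation):

module Lazy
  def self.const_missing(name)
    warn "somebody referenced Lazy::#{name}"
    const_set(name, Class.new)   # supply a value instead of raising NameError
  end
end

Lazy::Whatever   # trips const_missing and gets a freshly minted class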

This is far from the most common stumbling block, but it's still helpful to know how it works. If you define your own const_missing, Rails autoloading won't work. Perhaps more common: if your Rails-independent code requires your files, the constants will be defined, and won't trip const_missing. While this may seem obvious, we'll be returning to this fact shortly.

The other important detail is that if you have nested namespaces, the outer namespace gets fully resolved before the inner namespaces have a chance to trip const_missing. To fully understand the implications of this, we need to work through the next step.

The Constant Name is Converted to a Relative Path

The assumption is made that there is a one-to-one correspondence between the fully qualified module name and the relative file path. You violate this assumption at your own peril.

Fully qualified means the complete name needed to reach the constant, starting from the base level. Gem::Submodule::MyClass is fully qualified. MyClass or Submodule::MyClass are not. You might be able to use the shorter forms inside the appropriate module or class declaration, but I honestly haven’t investigated how smart the autoloader is in this case.

It’s the fully qualified name that gets converted to a path. So in the above example, gem/submodule/my_class.rb. Notice that namespace levels get converted to directories, and CamelCase gets converted to under_score. If classes and modules don’t exactly match this convention, they aren’t going to be found, with one exception.

Now that I've established how filenames are created, I can revisit the point about the order constants are looked up in. Recall that const_missing trips "in order". In our running example, the first thing that will happen is that Gem will generate a call to const_missing – not Gem::Submodule::MyClass. If gem.rb defines Gem::Submodule::MyClass, then things can stop there: the later constants are defined, so const_missing won't fire, and no more autoloading will occur. If you define the parts in the higher-level namespace, you aren't forced to define files and directories for the parts.

The opposite, however, is not true. If you define Gem::Submodule::MyClass in gem/submodule/my_class.rb, but don't have a gem/submodule.rb, the autoloading process will fail when it hits Gem::Submodule and can't find gem/submodule.rb. Workarounds include having a stub file, or declaring the module in gem.rb.
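
The second workaround looks like this, sticking with the running example:

# gem.rb
module Gem
  module Submodule    # empty declaration is enough; Gem::Submodule::MyClass
  end                 # still autoloads from gem/submodule/my_class.rb
end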

The Autoload Paths are Consulted

Just as Ruby has a load path where it looks for files to require, Rails has autoload paths, where it tries to append the relative path for a missing constant to find someplace to load it from. Specifically, it is config.autoload_paths, or ActiveSupport::Dependencies.autoload_paths. A recent experience indicates you should use the config form in application.rb (right where the comment suggests it),

config.autoload_paths += Dir["#{config.root}/vendor/modules/**/lib/"]

and the fully qualified form if you need to add some later, such as in development.rb or an initializer.

ActiveSupport::Dependencies.autoload_paths += %w[ .... ]

Also of note is autoload_once_paths, which enjoy the same searching benefits, but don’t have their constants tracked (and eventually unloaded).

Constants Defined by Loading the File are Tracked.

The autoload system keeps track of the constants defined during an autoload operation. It records these as autoloaded constants, for later use.

It's important to note that 'autoloaded constants' are a key feature in Rails' ability to reload files during development without restarting the server. If you caused a constant to be defined by an explicit require (or even a straight-up definition), it won't be marked as autoloaded. But I'm getting two steps ahead of myself.

If the File Doesn’t Define the Constant that Triggered the Load, an Error is Raised.

This is the infamous LoadError: "Expected #{file_path} to define #{qualified_name}", which means pretty much what it says. When const_missing gets called, it by necessity knows which constant was missing, and to keep running the program it has to provide a value for that constant. It's going to get that value by attempting to read the constant again after loading its best guess for the appropriate file. If the constant isn't defined, not only would there be no value to provide, but const_missing would trigger again and you'd get an infinite loop (at least until the stack ran out).

At the End of the Request, All the Autoloaded Constants are Removed.

In order to get automatic reloading, the system will have to trigger autoloading again next time around. In order for that to happen, the constants need to be missing for const_missing. During request cleanup, all the autoloaded constants are purged, so that they will be able to trigger again, bringing in fresh code.

If you are developing your application as gems, the first challenge is to get listed in autoload_paths. After that, you may still have the issue that files were loaded by require, so their constants weren't marked as autoloaded. Fortunately, there is also a configuration parameter for this (I keep it in development.rb):

  ActiveSupport::Dependencies.explicitly_unloadable_constants += %w[ Gem AnotherGem ]

This is actually a separate list from the autoloaded constants. The autoload list is cleared after each run (with the expectation that it will be repopulated), whereas the system attempts to remove the explicit constants every time.

And that’s Rails autoloading in a nutshell – at least until the Rails 3.2 announcement where “we now only reload classes from files you’ve actually changed”.


Fast Rails Tests are a Half-Hearted Effort

There is a niggling problem with fast Rails tests. Actually, it's kind of a fundamental problem: if you are testing code that doesn't need Rails, why is that code in your Rails project? If it is, you're feeling the power of the dark side.

Getting fast rails tests requires some effort. You need a separate directory. Those tests may have a separate spec helper. You have to prevent them from loading the default spec helper. You need to tell all your tools which directory to use, and slap them upside the head when they make assumptions. Some will ignore you anyway. The fact is that all this is swimming upstream – we already have a well established method for making completely independent tests.

“Rails is a delivery mechanism, not an architecture.” – Uncle Bob Martin, Architecture: The Lost Years

Uncle Bob has a solution for monolithic Rails projects: break them up into separate gems. The really nice thing about this is that gems have their own tests. Now you are swimming downstream – going with the flow and working the way the tools expect to be used. It's a lot harder to accidentally load Rails, and if you by chance use a Rails-ism, the tests will just fail. (Given sufficient coverage.)

Fast Rails tests are a half-hearted effort because they only separate the tests. If the code you are testing is really independent of Rails, a whole-hearted effort would lift the code itself, along with its tests, into its own gem (“I'm a real boy now!”). Since I don't need to share these gems between projects, I've been putting them into /vendor/modules. I shied away from other directories like /vendor/gems because I didn't want other tools making assumptions about where those gems came from or what it is or isn't safe to do with them. Everything still exists within one source control repository, but I've got my eye on a future where the modules become full-fledged gems shared between multiple projects.

Separate gems save you from jumping through hoops to run your tests. There’s just one little problem: now you have to jump through hoops to run your application.

“Once you start down the dark path, forever will it dominate your destiny” – Yoda

Rails wants to own your code and eat your soul. It wants total control of your app and where its files are located. If you move the cheese, Rails will get all pouty, pee on the carpet, and dig up the flowerbeds.

One of the conveniences Rails provides is automatic code reloading. It’s the kind of thing you don’t notice until it’s gone. Rails only provides automatic code reloading for user files it owns. If it’s coming from “a gem”… well son, that there’s library code, which will obviously never change. So it doesn’t need automatic code reloading, which would after all, just be overhead.

The alternative is, just like every other Ruby application, to stop the program and start it from scratch to see your changes. In case you haven't noticed, Rails takes a long time to start up. It's a long way from fast Rails tests. One defense, of course, is to run your fast non-Rails tests and try to get everything working before integration. In the real world, we still have integration problems, which means you'll still be cycling the Rails server.

The appropriate levers to fix this are ActiveSupport::Dependencies.autoload_paths and ActiveSupport::Dependencies.explicitly_unloadable_constants. However, I still run into problems and haven’t yet done a sufficient deep-dive on the problem to really understand how to properly set things up.


Fast Rails Tests

Ruby on Rails gives us incredible power: MVC architecture, automatic reloading, easy model relations, caching, asset management. It also gives us incredible overhead, especially when it comes to testing.

I recently ran across this in Even Faster Web Sites, quoting Jakob Nielsen's research:

  • 0.1 second is about the limit for having the user feel that the system is reacting instantaneously, meaning that no special feedback is necessary except to display the result.
  • 1.0 second is about the limit for the user’s flow of thought to stay uninterrupted, even though the user will notice the delay. Normally, no special feedback is necessary during delays of more than 0.1 but less than 1.0 second, but the user does lose the feeling of operating directly on the data.
  • 10 seconds is about the limit for keeping the user’s attention focused on the dialogue. For longer delays, users will want to perform other tasks while waiting for the computer to finish, so they should be given feedback indicating when the computer expects to be done. Feedback during the delay is especially important if the response time is likely to be highly variable, since users will then not know what to expect.

Where do your tests lie on that scale?

Rails Overhead

Rails tests tend to be slow for two reasons:

  • Loading the entire framework with all its nifty features on every test run.
  • Using the entire framework with all its nifty features on every test.

Not using the full stack (especially not hitting the database) basically comes down to mocking and stubbing, which is a topic unto itself. What I'm going to focus on here is not loading the entire framework. That doesn't mean addressing loading won't cause some mocking and stubbing – anywhere you butt up against the Rails API or ActiveSomething, you're going to have to stub that bit out to avoid loading it, an exercise that should make a nice object lesson in modularity and coupling.

Fast Rails Tests

Getting separately testable pieces won't be easy. You actually need something that doesn't depend on Rails, such as code that accesses an API, or that can be easily stubbed out, such as pulling a couple of methods into a module. You can do some refactoring in the standard test environment, but then you've got to figure out how to actually test them without Rails.

Corey Haines demonstrated /spec_no_rails. That was a little long for me, as well as a little negative, so I went with /spec_clean. I wanted, of course, to have /spec be the plain tests and /spec_rails the slow ones, but that runs afoul of the many and various tools, such as autotest, which make assumptions about your directory structure.

I also tried having a /spec/clean directory, but I ran into trouble with the default load paths and assumptions made about spec_helper.rb. In the end I had a separate directory for spec_clean. To avoid the ambiguity of spec_helper, I had spec_rails_helper and spec_clean_helper. Getting this to work with the default load path required creating /spec/spec_clean_helper.rb and having it load the correct file, which could then adjust the load paths appropriately.

You have to be careful adjusting the load paths. Oh, and in your tests, don't forget that you won't have Rails' magic constant loading, so you'll have to require everything explicitly. My first draft of spec_clean_helper looked like this:

# dir is the directory this helper lives in
dir = File.dirname(__FILE__)

$LOAD_PATH << dir
$LOAD_PATH << File.join(dir, "../app")
Dir[File.join(dir, "support/**/*.rb")].each {|f| require f}
require File.join(dir, '../config/initializers/my_init')

RSpec.configure do |config| ...

If you want to go a step further and avoid Bundler's startup overhead, you'll also have to make provisions for anything outside of Rubygems – directly loading anything that falls under Bundler's path or git options.
