I sold my cow and all I got were these url helpers
Posted by Craig Ambrose on April 20, 2008 at 11:19 PM
In rails applications, we link to other pages in our application by generating a url which maps to a particular controller (class) and action (method) using a rule which we call a route. Back in rails 1.0, we would do something like this:
<%= link_to 'Edit User', {:controller => 'users', :action => 'edit', :id => @user} %>
This is not an article on routing for dummies, I presume you already know this stuff. However, I want to recap why we do this, in case anyone has forgotten the reason for all this.
To give a point of comparison, let’s assume that there were no routes in rails, and that code was directed to a particular place based on the default rule of “:controller/:action/:id?:other_params”. A link might look like this:
<%= link_to 'Edit User', "/users/edit/#{@user.to_param}" %>
That’s actually shorter than the above. Clearly brevity isn’t the main goal here. So what are the goals of customisable routes?
Human (and SEO) Friendly URLs
If we need further parameters, we don’t want to introduce a question mark into the url. We want it to keep looking like a directory structure. If we are creating a new user inside group 5, we want a url like /groups/5/users/new, instead of /users/new?group_id=5. This goal is probably not one you have forgotten, so let’s jump on to the next one.
A Single Point of Change For Url Mappings
It’s a well known bad smell in any piece of software if changing your mind about one simple concept requires you to make changes all through the code (Martin Fowler calls this “Shotgun Surgery”). If our client says, “can you change all references to ‘users’ in the urls to saying ‘people’ instead”, or “can you prefix all admin urls with /admin” then we would expect to be able to do so without too much trouble.
The beautiful thing about routing in rails is that the routes control both the generation and the parsing of urls. Back when I wrote PHP apps, I had code to parse the urls (big ugly case statements) and in some cases I had code to generate the urls, but I never bothered to create one simple system for doing both.
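To make that “one rule, both directions” idea concrete, here is a deliberately tiny router sketch in Ruby. ToyRoute and everything in it are hypothetical and bear no resemblance to the real rails routing code; the point is only that a single rule string is used both to build a url from a params hash and to parse a url back into one.

```ruby
# A deliberately tiny, hypothetical router: one rule drives both url
# generation and url recognition, so changing the rule changes both.
class ToyRoute
  def initialize(pattern, defaults = {})
    @segments = pattern.split('/').reject(&:empty?)
    @defaults = defaults
  end

  # Turn a params hash into a path, or nil if this rule doesn't apply.
  def generate(params)
    return nil unless @defaults.all? { |key, value| params[key].to_s == value.to_s }
    parts = @segments.map do |segment|
      next segment unless segment.start_with?(':')
      value = params[segment[1..-1].to_sym]
      return nil if value.nil?
      value.to_s
    end
    '/' + parts.join('/')
  end

  # Turn a path back into a params hash, or nil if it doesn't match.
  def recognize(path)
    parts = path.split('/').reject(&:empty?)
    return nil unless parts.size == @segments.size
    params = @defaults.dup
    @segments.zip(parts) do |segment, part|
      if segment.start_with?(':')
        params[segment[1..-1].to_sym] = part
      elsif segment != part
        return nil
      end
    end
    params
  end
end

# The only place that knows the url says 'people' rather than 'users'.
edit_user = ToyRoute.new('/people/:id/edit', :controller => 'users', :action => 'edit')
edit_user.generate(:controller => 'users', :action => 'edit', :id => 5)  # => "/people/5/edit"
edit_user.recognize('/people/5/edit')  # controller 'users', action 'edit', id '5'
```

Changing 'people' back to 'users' means editing one string, and both the generated links and the incoming requests follow along.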
So, with those two goals in mind, let’s travel back to the present and look at resource routes in rails 2. When we create a route with map.resources, a bunch of special helper methods are also created. This allows us to replace our initial example with:
<%= link_to 'Edit User', edit_user_path(@user) %>
Let’s look at the pros and cons of that.
Firstly, it’s much shorter. The hard coded string was a fair bit shorter too, so we know that brevity isn’t always the main goal, but short is generally not a bad thing.
It’s a little bit more English-like, in that it contains fewer symbols. However, this also means that it is less semantic. It’s easy to learn how to read urls that are specified as a hash of controller, action and params. I know how to read the resource helpers too, but there are a few different rules to learn in order to parse them mentally, and I find it takes new rails programmers a little while to figure them out.
It’s overloadable. Since it’s a method, we can declare a helper of the same name and do something completely different. This can be handy, although in practice it’s a bit dangerous, since there are other ways to generate the same url, so you’re not guaranteed to intercept all calls. It would also come as something of a surprise to our programmer, who has worked so hard to figure out how the RESTful routing helpers work. The principle of least surprise is worth considering.
By and large, I’m still kind of in favour of the new notation at this stage. When I first learnt it, I thought, “wow, that looks much nicer”, and that feeling is a very important argument in its favour. We’ve gained a lot with this new functionality. I just want to mention the one feature we sold off without even noticing.
When generating a url, we are now directly linking the view code to a single routing rule.
“WTF?”, I hear you say.
Haven’t we already established that the rails routing system decouples url generation from the controller code that it maps to, allowing us to configure the interface between them in one place (routes.rb)? Why am I now saying that I’ve linked the url generation in my view to a single route and lost my ability to vary it at will?
Let’s say that we wanted to map all uses of the user edit action to a totally new url. It could look like anything we wanted. Previously, we had the power to do so because our request to generate a url just gave the keys and values, our action just accepted key/value parameters, and the string we used for the url in between was totally up to the routes.
Now we’re using a method unique to this action to generate the url. By calling edit_user_path(@user), we’re not actually giving up the flexibility to decide what that method does, but if we wanted to make it map to anything other than the edit action on the users controller, nested inside no other resources, then we’d be violating all the conventions that we’d built up in order to understand the use of these helpers.
So, if we want to do something like move this action to a different resource, we find that we need to go through and use a new set of helper methods for all the links. Since we need to change each link to do this, we’re really not much better off than if we’d used hard coded strings in the links.
If you wanted to rename the users resource to ‘people’, it’s quite a tricky operation. I’ve done it many times, and without foolproof refactoring tools you need to search for strings like user_ and look at each method call to see if it’s a url helper which should be renamed to edit_person_path or similar.
Recently, I’ve been experimenting with going back to expressing things as a hash. In doing so, I follow a set of conventions:
- Always put spaces either side of the => operator.
- Always use symbols for the keys.
- Always use single quotes as string delimiters for the values, if they are string literals.
This gives much more precise things to search and replace for. Let’s say that I want to rename the users resource to people. I can search for all strings matching :controller => 'users'.
I’m not necessarily saying that this approach is the best, but I think that we should all consider that the goal here is simplicity. The simplest code isn’t necessarily the shortest. The simplest code is the easiest to read, the easiest to learn its full meaning, and the easiest to change. When we got all excited about named url helper methods, I’m not sure it was at all clear how much we were giving up in return.
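Because the formatting is fixed, that search can be a plain substring match. A small, hypothetical Ruby script along these lines would list every reference for review; the script name, the file extensions and the directory passed in are assumptions.

```ruby
#!/usr/bin/env ruby
# Hypothetical helper: list every hash-style url that names a controller.
# It relies on the conventions above (spaces around =>, symbol keys,
# single-quoted values), so one exact string finds every reference.
require 'find'

# Return [line_number, line] pairs that mention the controller.
def controller_references(text, controller)
  needle = ":controller => '#{controller}'"
  text.each_line.with_index(1)
      .select { |line, _number| line.include?(needle) }
      .map { |line, number| [number, line.strip] }
end

# Walk a directory tree and print matches grep-style (path:line: text).
def scan_project(root, controller)
  Find.find(root) do |path|
    next unless File.file?(path) && path =~ /\.(rb|erb|rhtml)\z/
    controller_references(File.read(path), controller).each do |number, line|
      puts "#{path}:#{number}: #{line}"
    end
  end
end

# Usage: ruby find_controller_refs.rb app users
scan_project(ARGV[0], ARGV[1]) if __FILE__ == $PROGRAM_NAME && ARGV.size == 2
```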
Storing Images from remote URLs
Posted by Craig Ambrose on February 22, 2008 at 12:17 AM
Here’s a handy little modification to attachment_fu to allow the model to have its file data set from a remote url. I’ve been doing this for a while, but recently altered it to compute the mime-type from the downloaded file rather than just basing it off the extension. This is because some web sites host files (such as images) at urls without a file extension.
As this is a monkey patch for attachment_fu, I follow Chris Wanstrath’s evil twin convention and put it in a plugin called attachment_fu_hacks.
This code is also dependent on the mimetype-fu plugin. Mimetype-fu calculates the mime type of a file using the *nix “file” command, rather than using file extensions as the mime-types gem does. If you’re using OSX, then this won’t work unless you change the two occurrences of file -bir in mimetype_fu.rb to file -br --mime (which is compatible with both OSX and linux). I’ve submitted that change to the author, so hopefully it will be incorporated into future versions.
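As a rough sketch of those two steps in plain Ruby (download the remote file, then work out its type from its leading bytes rather than from the url), here is a version that swaps a small hand-rolled magic-number check in for mimetype-fu and the file command. It only recognises PNG, JPEG and GIF, does not follow redirects, and the method names are hypothetical; it is not attachment_fu’s API.

```ruby
require 'net/http'
require 'uri'
require 'tempfile'

# Magic numbers (leading bytes) for a few common image formats.
IMAGE_SIGNATURES = {
  "\x89PNG\r\n\x1A\n".b => 'image/png',
  "\xFF\xD8\xFF".b      => 'image/jpeg',
  'GIF87a'.b            => 'image/gif',
  'GIF89a'.b            => 'image/gif'
}.freeze

# Guess a mime type from the file's first bytes instead of its extension.
def sniff_mime_type(data)
  head = data.byteslice(0, 8).b
  IMAGE_SIGNATURES.each { |signature, type| return type if head.start_with?(signature) }
  'application/octet-stream'
end

# Fetch a remote url into a binary Tempfile and sniff its type.
# The caller is responsible for closing and unlinking the Tempfile.
def download_to_tempfile(url)
  response = Net::HTTP.get_response(URI.parse(url))
  raise "download failed: #{response.code}" unless response.is_a?(Net::HTTPSuccess)
  file = Tempfile.new('remote')
  file.binmode
  file.write(response.body)
  file.rewind
  [file, sniff_mime_type(response.body)]
end
```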
respond_to.email, or how to handle incoming emails in rails RESTfully
Posted by Craig Ambrose on February 09, 2008 at 04:49 AM
There’s a bunch of information around on how to handle incoming emails with your rails application, in particular the wiki page, but I have some concerns with the methods that are being suggested, and in this article I present an alternative which I’ve been trying out and I really like.
Handling incoming email is, in essence, very simple. All you need to do is get the email, which is a big chunk of text, parse it with a ruby email class, such as TMail (which is used by ActionMailer), and perform some action. If you’re only handling a few specific addresses, it might be best to fetch the email via POP3, and I’ve done that before using a daemon to regularly poll the pop account.
POP3 is not a viable solution if you want to handle all email for a certain domain. At this point, we probably want to talk about SMTP.
A Very Short Guide to SMTP
Simple Mail Transfer Protocol is pretty damn cool if you ask me. It’s dead simple: basically, the client can only say “hello, here’s an email from X to Y”. Just like HTTP, it’s fully push based. There’s no polling; emails get pushed across the internet. Also like HTTP, it has a whole stack of response codes, which are of course geared to sending an email rather than talking to a web resource.
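Those response codes are grouped by their first digit, which is most of what you need to read a mail server’s replies: 2xx means the command was accepted, 3xx means the server wants more input (354 after DATA), 4xx is a temporary failure worth retrying, and 5xx is a permanent refusal. A small Ruby illustration, with a made-up method name:

```ruby
# Classify an SMTP reply line by the first digit of its three-digit code.
def smtp_reply_class(line)
  code = line[0, 3].to_s
  return :unexpected unless code =~ /\A[2-5]\d\d\z/
  case code[0]
  when '2' then :success            # e.g. 250 OK
  when '3' then :intermediate       # e.g. 354, send the message data
  when '4' then :transient_failure  # try again later
  when '5' then :permanent_failure  # e.g. 550, mailbox unavailable
  end
end
```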
Using Postfix Mail Filters to Call Ruby
Postfix is a common open source SMTP server. Before I looked at it, it seemed big and scary. After a few hours of expert help, I wonder what seemed so complicated. One of the basic ways that we can use postfix to push mail to our rails app is by specifying a command-line script which gets executed whenever postfix gets an email. This is the first option presented on the rails wiki, and they suggest using a script which calls the receive method of one of your ActionMailer classes.
My Concern
If we’re going to use ActionMailer to parse an email, and then presumably fire off a bunch of ActiveRecord code to make changes to your database as a result, clearly we’re loading the entire rails stack. Every time we get an email we’re loading the entire rails stack. This seems like how we handled web requests back in the day when there was only mod_cgi. No shared resources between requests, a big performance hit for loading all of rails and then getting rid of it each time, and the concern that we can only handle as many incoming emails as we have RAM on our server as the rails code takes up a bunch of memory.
What I Want
I don’t want to have to worry about the resources I need to scale my email server; I already do that with my application servers. I want to handle emails in a way that re-uses an in-memory copy of the rails classes and can be scaled in a predictable way.
That sounds a lot like a mongrel cluster.
We all have one of those already, right? So why not handle incoming mail over HTTP? It’s dead easy, it scales well, and the result is really Rails-ish.
respond_to.email
I was hoping to get a plugin out of this. It’d be so handy that people would queue for miles to download it. The trouble is, it’s not even enough code to bother with: it’s only about three lines of ruby and the same of postfix config. So, let’s call this a pattern. I’ll describe how to do it, and you can all run off and do it yourself.
Step One: Install Postfix
Install postfix on one of your servers. For any sizable rails site, I like to have a little VPS just for daemons, cron jobs, scripts, and the mail server, to keep it separate from all the web stuff. On ubuntu, this was as easy as “sudo apt-get install postfix”. For the default configuration type, I chose “internet site”.
Step Two: Setup Your MX Record
For mail to start arriving at your mail server, you need to add an MX record to your DNS which points at the hostname of your server. Depending on your host, you probably have a web interface to do this, and it’s probably dead easy.
The Magic Script!
Create a file called mail_handler.rb, and pop it somewhere in your rails project. I created a /bin directory for it. Don’t use a rake task; the goal here is not to load in any unnecessary stuff. Here are the contents:
#!/usr/bin/ruby
require 'net/http'
require 'uri'
Net::HTTP.post_form(URI.parse('http://www.craigambrose.com/emails'), { "email" => STDIN.read })
If ruby is somewhere else on your machine, change the line at the top to be correct (try “which ruby” on that machine to see where it is). I’ve chosen to hardcode in the url that I want to post the email to so that I don’t have to load any other files. If you have more deployment environments to worry about, you might want to put the target url in a yml file and parse it here. Just don’t load your rails environment file, that’s the whole point of this.
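If you do go the yml route, a minimal sketch might look like the following. The config/mail_handler.yml file, its layout and the MAIL_HANDLER_ENV variable are all hypothetical; since postfix runs the script with a minimal environment, you may prefer to simply change the default environment per deployment. It still loads nothing but the standard library.

```ruby
#!/usr/bin/env ruby
# Hypothetical variant of bin/mail_handler.rb that reads its target url
# from config/mail_handler.yml instead of hardcoding it. Only the standard
# library is loaded; the Rails environment is never touched.
#
# config/mail_handler.yml (example contents):
#   production:
#     url: http://www.craigambrose.com/emails
#   staging:
#     url: http://staging.example.com/emails
require 'net/http'
require 'uri'
require 'yaml'

# Look up the url for one environment, failing loudly if it's missing.
def target_url(config_path, environment)
  YAML.load_file(config_path).fetch(environment).fetch('url')
end

# Only POST when run as the script postfix invokes, not when loaded elsewhere.
if File.basename($PROGRAM_NAME) == 'mail_handler.rb'
  config_path = File.expand_path('../config/mail_handler.yml', __dir__)
  environment = ENV['MAIL_HANDLER_ENV'] || 'production'
  Net::HTTP.post_form(URI.parse(target_url(config_path, environment)), 'email' => STDIN.read)
end
```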
Configuring Postfix to Call the Script
In this example, the domain that I want to handle email for is “craigambrose.com”. Everywhere you see this, replace it with your own domain name. Most of the commands below need root access.
In /etc/postfix/main.cf
mydestination = localhost.localdomain, localhost, craigambrose.com
virtual_maps = hash:/etc/postfix/virtual
alias_maps = hash:/etc/aliases
In /etc/postfix/virtual (this is a file, you may need to create this)
@craigambrose.com rails_mailer
The above says to redirect any address at craigambrose.com to the alias “rails_mailer”, which I’ll create next. You could run multiple rails apps off the same server by giving them all unique aliases. If you only want to match some addresses, list them individually on the left, or switch the map to a regexp: table so that you can match them with regular expressions.
To apply this change to virtuals, run:
postmap /etc/postfix/virtual
In /etc/aliases
rails_mailer: "|/var/www/apps/craigambrose/current/bin/mail_handler.rb"
On the left is the alias we just pointed to; on the right is the path to my script, so change it as necessary. The pipe character before the script path means “the following is a shell command, not an email address”.
To apply this change to aliases, run:
postalias /etc/aliases
To apply the main configuration changes to postfix, run:
/etc/init.d/postfix reload
Testing the Setup
I should now be able to send an email to “someaddress@craigambrose.com”. To see it get processed by postfix, we might want to watch the postfix info log:
tail -f /var/log/mail.info
When the mail is processed, you should see a line like:
to=<someaddress@craigambrose.com>, orig_to=<root>, relay=local, delay=2, status=sent (delivered to command: /var/www/apps/craigambrose/current/mail_handler.rb)
Then, go peek at your rails app logs. You should see that the mail has been passed through by the script. Even if you haven’t written an action to handle it yet, the log entry should be there.
Troubleshooting
If you didn’t see the correct line in your postfix logs, then perhaps there’s a problem with your DNS setup. You could try talking to postfix directly. Mail servers listen on port 25, and you can telnet into them and speak SMTP directly. Try “telnet YOUR_SERVER_IP 25”, then try typing in what the client says in the sample SMTP communication on Wikipedia, with the example address changed to the domain that you want to test. If that works, but sending email didn’t, you’ll need to investigate your DNS setup.
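If you’d rather script that conversation than type it into telnet, a rough Ruby sketch using only the socket library might look like this. The helper names are hypothetical, it uses plain HELO with no authentication or TLS, and it assumes the server sends one reply per step, as a basic SMTP exchange does.

```ruby
require 'socket'

# Build the client's half of a minimal SMTP conversation. Each entry expects
# exactly one reply from the server. Message lines starting with '.' are
# doubled ("dot-stuffing") so they aren't taken as the end-of-data marker.
def smtp_steps(helo_name, from, to, message)
  lines = message.split(/\r?\n/).map { |line| line.start_with?('.') ? ".#{line}" : line }
  ["HELO #{helo_name}", "MAIL FROM:<#{from}>", "RCPT TO:<#{to}>", 'DATA',
   (lines + ['.']).join("\r\n"), 'QUIT']
end

# Read one (possibly multi-line) reply, e.g. "250-..." continued by "250 ...".
def read_reply(sock)
  reply = []
  while (line = sock.gets)
    reply << line
    break unless line[3] == '-'
  end
  reply.join
end

# Hypothetical helper: talk to a mail server directly and print every reply,
# much like typing the commands into telnet by hand.
def send_test_email(host, from, to, message, port = 25)
  TCPSocket.open(host, port) do |sock|
    puts read_reply(sock)                     # 220 greeting
    smtp_steps('localhost', from, to, message).each do |step|
      sock.write("#{step}\r\n")
      puts read_reply(sock)                   # 250s, 354 after DATA, 221 after QUIT
    end
  end
end

# Usage (not run here):
#   send_test_email('YOUR_SERVER_IP', 'me@example.com',
#                   'someaddress@craigambrose.com', "Subject: test\n\nHello")
```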
Handling the Rails Action
The target url I put in my mail script was http://www.craigambrose.com/emails, so the mail is going to get POSTed to that resource. With normal rails resource routes, that means that we’re expecting to handle the email in the create action of the EmailsController. That seems very sensible to me. My script puts the unparsed email into params[:email].
To parse it with TMail, all you need to do is:
require 'tmail'
email = TMail::Mail.parse(params[:email])
Alternatively you could pass it to the “receive” method of any ActionMailer derived class, which does the above automatically.
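Sometimes the action only needs a couple of headers (say, the To address, to work out which record the email belongs to) before doing anything expensive. As a naive stand-in for that first peek, and not a replacement for TMail, a few lines of plain Ruby can split the raw text from params[:email] into headers and body. split_headers is a hypothetical name, and it does no MIME or encoded-word decoding.

```ruby
# Naive stand-in for a quick peek at headers: split a raw message into a
# header hash (lower-cased names) and the body. No MIME or encoded-word
# decoding, and a repeated header keeps only its last value.
def split_headers(raw)
  head, body = raw.split(/\r?\n\r?\n/, 2)
  headers = {}
  last = nil
  head.to_s.each_line do |line|
    line = line.chomp
    if line =~ /\A[ \t]/ && last
      headers[last] = "#{headers[last]} #{line.strip}"   # folded header line
    elsif line =~ /\A([^:\s]+):\s*(.*)\z/
      last = Regexp.last_match(1).downcase
      headers[last] = Regexp.last_match(2)
    end
  end
  [headers, body.to_s]
end

# In a create action, something like:
#   headers, body = split_headers(params[:email])
#   recipient = headers['to']
```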
I’ve had some reports that TMail is both a little slow, and also not quite up to parsing all the possible ways that an email might be encoded in the big bad world. That’s a subject for another blog post.
Final Performance Note
When postfix calls your script, it makes sure that only a certain number of calls occur concurrently. The default is 20, which seems pretty good to me. If you’d like to tweak this, use the following setting in main.cf (and don’t forget to reload postfix afterwards):
default_destination_concurrency_limit = 30
Acknowledgments
Setting up servers is not my area of expertise. Many thanks to Andrew Snow of Octopus for the postfix help, and to Pete Yandell for sharing some of the lessons learned on his great mailing list site 9cays.
Image management that will scale
Posted by Craig Ambrose on November 27, 2007 at 02:59 AM
There is a lot of conflicting information around about handling user uploaded images in rails applications. I’ve done it a number of different ways, and the good news is that it’s not too hard to move from one system to another. However, dealing with scaling issues is a pain and it’s nice to get it right first go. So, here are some problems that I’ve encountered recently, along with some solutions.
Files Per Directory Limit
Depending on which OS you use for hosting, you’ve probably got a limit to the number of files (or directories) you can put inside a given directory. It’s usually about 32,000. While this seems like a long way off, if your site accepts user content then hopefully this will eventually become a problem for you. There have been various talks and articles written about different hashing systems for file names, but it’s worth mentioning that this is basically a solved problem, and you shouldn’t have to tackle it yourself.
If you’re still using file_column, as I am for a few things, then this one might bite you. The simplest solution, I think, is to migrate to attachment_fu. The file system store for attachment_fu implements file name based hashing, and the s3 and database stores don’t suffer from the problem at all. Also, the way in which attachment_fu handles pluggable storage classes means that you could also slip in your own custom storage system later without having to change the way that you use attachment_fu in your models.
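The idea behind those hashing schemes is simple enough to show. One common approach is id partitioning: zero-pad the record’s id and split the digits into fixed-width directory names. The sketch below illustrates the general technique only; it is not attachment_fu’s code, and partitioned_path is a made-up name.

```ruby
# Id partitioning: zero-pad the record id and split it into four-digit
# directory names, so no single directory holds more than 10,000 entries.
def partitioned_path(id, filename)
  raise ArgumentError, 'id must be a non-negative Integer' unless id.is_a?(Integer) && id >= 0
  digits = format('%08d', id)
  digits = digits.rjust((digits.length + 3) / 4 * 4, '0')   # pad to a multiple of 4
  File.join(*digits.scan(/\d{4}/), filename)
end

partitioned_path(1234, 'photo.jpg')   # => "0000/1234/photo.jpg"
```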
If you’re thinking of making the switch, here’s an article I wrote on migrating from file column to attachment_fu.
RMagick Memory Leaks
RMagick is really handy, and so just about every rails image handling tutorial on the internet recommends its use. I’m using it all over the place. My advice to you is: don’t. It turns out that RMagick leaks memory every time it manipulates an image. I haven’t measured the amount myself, but I’m told it’s quite a bit. Certainly I’ve been having resource consumption problems with scripts using RMagick heavily. So, say goodbye to it.
DHH recommended just using the ImageMagick binaries manually. That’s basically a good idea, but a slightly easier way of doing that is to use the mini_magick gem. Mini magick provides a ruby API, but under the hood it just calls the ImageMagick command line tools. Attachment_fu comes with a mini_magick processor, so you can just add “:processor => :MiniMagick” to your call to has_attachment and you’re in business. Khamsouk Souvanlasy wrote a good tutorial on using mini_magick with acts_as_attachment.
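For the “just call the binaries” route, a minimal sketch might look like the following. It assumes ImageMagick’s convert command is on the PATH, and the method names are hypothetical. Passing an argument array to system avoids going through a shell, so odd characters in uploaded file names can’t be misinterpreted.

```ruby
# Build an ImageMagick `convert` resize call as an argument array.
# A geometry such as '200x200>' only shrinks images larger than the box.
def resize_command(source, destination, geometry)
  ['convert', source, '-resize', geometry, destination]
end

# Run it without a shell; system returns false or nil on failure.
def resize_image!(source, destination, geometry)
  system(*resize_command(source, destination, geometry)) or
    raise "convert failed for #{source}"
end

# Usage: resize_image!('upload.jpg', 'thumb.jpg', '100x100')
```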
Cropping
The one thing I noticed in using attachment_fu instead of file column is that file column resizes images and crops them nicely, the way that you would expect. By default, attachment_fu tends to stretch them. This has been covered better by other people, so I just want to mention it, because stretching is almost certainly not what you want, and until Rick fixes it, I’d suggest making a small change to the plugin yourself. There are a number of articles on the subject, but I think the best one is probably over at toolman tim’s blog.
Don’t just go with Tim’s solution though; have a look at the comments, and you will find options for the different image processors. I used “labrat’s” suggested fix for mini magick.
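Whichever fix you use, the arithmetic behind resize-and-crop is the same: scale by whichever ratio makes the image cover the target box, then trim the overflow evenly. A hypothetical helper that computes the numbers is below; recent ImageMagick releases can also do the whole thing in one call with -resize 100x100^ -gravity center -extent 100x100.

```ruby
# Resize-and-crop ("cover") arithmetic: scale the image just enough to
# cover the target box, then crop the overflow equally from both sides.
def cover_crop(src_w, src_h, target_w, target_h)
  scale = [target_w.to_f / src_w, target_h.to_f / src_h].max
  scaled_w = (src_w * scale).round
  scaled_h = (src_h * scale).round
  {
    :resize => [scaled_w, scaled_h],
    :offset => [(scaled_w - target_w) / 2, (scaled_h - target_h) / 2],
    :crop   => [target_w, target_h]
  }
end

# An 800x600 photo into a 100x100 thumbnail: resize to 133x100,
# then crop 100x100 starting 16 pixels from the left.
cover_crop(800, 600, 100, 100)
```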
Amazon S3
Amazon S3 appears to be a great solution for handling user generated images, and I’m starting to use it a fair bit. One word of warning however, is that I’ve already started to encounter an occasional communication error with amazon, as discussed in this thread and I don’t yet know how serious it is or how easily fixed. I’ll post some more on this subject when I’m better informed.
Caching Makes Your Brain Explode
Posted by Craig Ambrose on November 13, 2007 at 04:20 AM
I’ve been spending a lot of time recently trying to make boxedup.com scale. Before I started, I’d watched the right screen-casts, read the right books, and I thought I knew what had to be done to speed up rails applications when the need arose.
Boy, was I wrong.
A quick look at the three methods of caching rails pages reveals that page caching is of no use to a site which insists on displaying the current user on all pages (as most of them seem to). Next up is action caching, which does let me execute before and after filters, allowing me to handle the logged in user, but caches the entire rendered action, including the layout, so once again I can’t display the currently logged in user. There are possibly some ways around this, but since action caching is really just a specialised form of fragment caching, let’s talk about that.
Fragment Caching
Fragment caching does work. In fact, my first attempts at it benchmarked so well in my simplistic “load this page 100 times in httperf” tests that I dived in head first. The books on this subject, particularly the pragprog one, give the impression that this is pretty straightforward. It’s not. There are some massive gotchas that will bring even a fairly low traffic site to its knees if you don’t watch out for them.
It’s All About Expiry
You can’t consider caching without thinking about cache expiry. In rails, this is typically done with cache sweepers. For fragment caching, the sweepers call the expire_fragment method. This can take a string, which matches the fragment name exactly, or it can take a regular expression.
Gotcha #1, Don’t Use expire_fragment With A Regex
First up, this doesn’t work with memcache anyway, it only works with the file system cache. There’s nothing that wrong with the file system cache. Reading from it is faster than rendering a template. Expiring from it, however, is pretty slow. Expiring from it using a regex is absolutely appalling. The reason why is better explained in this article by Adam Doppelt.
So, if you can’t expire it with a regex, that leaves you the following options for expiry:
- Time based expiry. There are some plugins that add this feature to the file system store. Memcache gives it to you for free, and if you’re relying on this heavily, I’d use memcache.
- Being in one of those good situations where the number of possible fragments is known, and you can expire them each explicitly. This didn’t work for me in some of the critical areas that I needed to cache.
- Storing a list (in the database) of the caches that you built up which need to be expired if a certain thing is changed.
Don’t Expire, Just Render it Obsolete
This article so far doesn’t really capture how much pain this stuff has caused me, and I’ll try and cover some other points in other articles. For now, lets jump straight to the good bit.
I’ve read a lot of articles on caching, but this is the best, go read it:
The Secret To Memcached – by Tobias Lütke
Tobias also struggled with expiry, and his solution is to take advantage of the fact that if you’re using memcache, then you can never cache too many items. The oldest ones get pushed out when you run out of space.
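Memcached gets away with this because, when it runs out of memory, it evicts the least recently used items rather than refusing to store new ones. A toy Ruby cache with the same eviction behaviour (a stand-in for illustration, not memcached’s implementation; TinyLRU is a made-up name) shows why stale entries never need to be deleted by hand:

```ruby
# A tiny least-recently-used cache. Ruby hashes keep insertion order, so
# re-inserting a key on every read moves it to the "newest" end, and the
# first key is always the least recently used one.
class TinyLRU
  def initialize(capacity)
    @capacity = capacity
    @store = {}
  end

  def read(key)
    return nil unless @store.key?(key)
    @store[key] = @store.delete(key)    # mark as most recently used
  end

  def write(key, value)
    @store.delete(key)
    @store[key] = value
    @store.shift while @store.size > @capacity   # evict the oldest entries
    value
  end

  # Return the cached value, or compute it with the block and store it.
  def fetch(key)
    @store.key?(key) ? read(key) : write(key, yield)
  end

  def keys
    @store.keys
  end
end
```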
So, here’s my first bit of advice. If you’re building a real site, go straight to memcache. If you’re not building a site for big traffic, don’t cache, just optimise any really stupid queries that are giving you trouble. If you’re using memcache, be sure to run monit too.
So, we’re running memcache, and we don’t want to expire our fragments. Instead, try and find fragment keys that don’t need to be expired, because they will be replaced if the data changes.
The one I’ve just implemented was a stream of recent activity, much like facebook. Each little type of activity had a different template, and the rendering of this took up a lot of time. Fetching the data was also non-trivial. However, if I wanted to expire a cache of the activity stream, then I’d need to do so any time something occurred on the site that triggered an activity for this user.
<% cache ["activity_stream", @latest_activity.id, @user.id].join('/') do %>
... render the activities
<% end %>
There’s the code. The real example had a few more parameters, but you can see here that the magic is in the fact that I used @latest_activity.id as part of the key. I’m still having to query that from the database, but it’s pretty simple to do, and all I really need is one little integer, instead of all the activities and their associated objects. If a new activity is created for this user, then this id will have changed, and so I’ll be asking for a different cache key.
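Stripped of Rails, the pattern looks something like the sketch below. A plain Hash stands in for memcache, and the method names are hypothetical. Joining the key parts with a separator also keeps, say, ids 12 and 3 from colliding with ids 1 and 23.

```ruby
# Key-based "expiry": the key embeds the id of the newest activity, so a
# new activity simply produces a new key and the old fragment is never
# read again (a real memcache would eventually evict it).
FRAGMENTS = {}

def activity_stream_key(user_id, latest_activity_id)
  ['activity_stream', latest_activity_id, user_id].join('/')
end

# Return the cached fragment for this key, rendering it only on a miss.
def cached_fragment(key)
  return FRAGMENTS[key] if FRAGMENTS.key?(key)
  FRAGMENTS[key] = yield
end
```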
Benchmarks for this are looking really good. I’ll let you know how it goes in the wild, but I’m not expecting too many problems as most of my previous troubles have been to do with expiry, and this code doesn’t need any expiry. No sweepers, no regular expressions. It’s simple, and it scales.
How Do You Freeze Your Rails Version?
Posted by Craig Ambrose on May 15, 2007 at 03:45 AM
The first time a Ruby on Rails release introduced a few backwards incompatible changes, it caused a bit of an uproar (the one I’m thinking of was 1.0.1, which broke Typo, amongst other things). Suddenly, everyone realised that we were writing web applications that weren’t just dependent on Ruby on Rails, they were dependent on a particular version of Rails.
So, these days we all know that we need to lock our Rails applications into using a set version. There are two ways of doing this, and each has its pros and cons.
[1] Freezing Rails in the Vendor Directory
If your application finds a copy of Rails in the ./vendor/rails directory, then it will use that instead of whatever rails gems are installed on the system. This is really handy, and it’s the method that I initially adopted for all my sites when the rails 1.0.1 problem occurred. At first, I was copying the Rails files into my project subversion repositories, but I quickly learned that the easier way to do it was with subversion externals.
From the root of your rails application, execute:
svn propedit svn:externals vendor/
The subversion externals properties for that directory will now be editable in your default editor. Each line in this property represents a link to an external repository, with the name of the local folder to export to first, then the repository URL. So, for example, to use rails version 1.2.3, we would use the following:
rails http://svn.rubyonrails.org/rails/tags/rel_1-2-3/
You can find the URL of that tag, or any other rails tag or branch (or the trunk itself) by browsing the rails repository.
The upside of this method, is that your application is safe to deploy on almost any machine with the correct ruby stack installed, even if the rails gems are not present.
The downside is that rails is pretty large, and every time you do a svn checkout, it has to grab it all. In particular, this slows down your svn deploy command. If you’re using deprec, it can really slow down your svn setup command too, because the file permissions have to be set on all those folders.
[2] Specifying the Rails Version in environment.rb
This is the accepted method now, and is done by default in all new rails applications. To lock in that same version number, all you need to do is add the following to your environment.rb, if it isn’t already present:
RAILS_GEM_VERSION = '1.2.3' unless defined? RAILS_GEM_VERSION
This means that rails will load the gem for rails 1.2.3, if it exists. If it doesn’t exist, it will throw an error rather than run your application. Remember that gem doesn’t remove old versions when it installs new ones, so even if a newer version of rails is installed on the server, if the correct version was there once, it should still be there.
These days, I’m moving my sites over to using this method, for the reasons of hard disk space and speed that I mentioned above.
The only downside is, I need to make sure that the right version of the gem is installed. However, most applications have other gem dependencies too, apart from just rails.
How do you ensure that the required gems are installed on a new server? If you’re logging in to the server manually and installing the gems, have a think about automation. Surely this is the province of the cap setup task. It might seem easy to do it now, but don’t forget that you might have to do it again when you move servers in six months’ time. Also, when that happens, you might be in a hurry. You might also be migrating to a multi-server cluster.
Don’t wait, automate now. Wack all your setup needs into cap tasks. Consider using before or after filters on the cap setup action. Please note that if you’re using deprec, it already adds these filters, so you’ll need to use an action called after_after_setup. This problem is fixed in capistrano 2, which should be out as a stable release very soon, but until then, you can do it the hard way.
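As a starting point for automating the gem side of that, here is a hypothetical Ruby script you could run on each server (for example from a task hooked after setup). The script name, the REQUIRED_GEMS list and the method names are all assumptions; it only reports what’s missing and prints the matching gem install commands.

```ruby
# Hypothetical check_gems.rb: compare the gem versions an app needs with
# what RubyGems reports as installed, and print commands to fill the gaps.
REQUIRED_GEMS = [['rails', '1.2.3']]   # example list; add your app's other gems

# Pairs from +required+ whose exact version isn't in +installed+.
def missing_gems(required, installed)
  required.reject { |name, version| installed.fetch(name, []).include?(version) }
end

def install_commands(missing)
  missing.map { |name, version| "gem install #{name} -v #{version}" }
end

# { 'rails' => ['1.2.3', ...] } for the versions present on this machine.
def installed_versions(names)
  names.each_with_object({}) do |name, found|
    found[name] = Gem::Specification.find_all_by_name(name).map { |spec| spec.version.to_s }
  end
end

if File.basename($PROGRAM_NAME) == 'check_gems.rb'
  names = REQUIRED_GEMS.map(&:first)
  puts install_commands(missing_gems(REQUIRED_GEMS, installed_versions(names)))
end
```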
