Image management that will scale
Posted by Craig Ambrose on November 27, 2007 at 02:59 AM
There is a lot of conflicting information around about handling user uploaded images in rails applications. I’ve done it a number of different ways, and the good news is that it’s not too hard to move from one system to another. However, dealing with scaling issues is a pain and it’s nice to get it right first go. So, here are some problems that I’ve encountered recently, along with some solutions.
Files Per Directory Limit
Depending on which OS you use for hosting, you’ve probably got a limit to the number of files (or directories) you can put inside a given directory. It’s usually about 32,000. While this seems like a long way off, if your site accepts user content then hopefully this will eventually become a problem for you. There have been various talks and articles written about different hashing systems for file names, but it’s worth mentioning that this is basically a solved problem, and you shouldn’t have to tackle it yourself.
If you’re still using file_column, as I am for a few things, then this one might bite you. The simplest solution, I think, is to migrate to attachment_fu. The file system store for attachment_fu implements file name based hashing, and the s3 and database stores don’t suffer from the problem at all. Also, the way in which attachment_fu handles pluggable storage classes means that you could also slip in your own custom storage system later without having to change the way that you use attachment_fu in your models.
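To illustrate the general idea of spreading files over nested directories, here is a minimal sketch. The helper name `partitioned_path` and the four-digit segment size are my own assumptions for illustration; this is not presented as the code any particular plugin uses.

```ruby
# Hypothetical helper, not taken from any plugin: spread files across
# nested directories so no single directory collects too many entries.
def partitioned_path(id, filename)
  # Zero-pad the id to at least 8 digits, then split it into chunks of
  # up to 4 digits, e.g. 12345 -> ["0001", "2345"].
  segments = format("%08d", id).scan(/.{1,4}/)
  File.join(*segments, filename)
end

puts partitioned_path(12345, "avatar.jpg")  # prints "0001/2345/avatar.jpg"
```

With four-digit segments, each directory level holds at most 10,000 entries for ids below 100 million, which stays well under a limit of around 32,000.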
If you’re thinking of making the switch, here’s an article I wrote on migrating from file column to attachment_fu.
RMagick Memory Leaks
RMagick is really handy, and so just about every Rails image handling tutorial on the internet recommends its use. I’m using it all over the place. My advice to you is: don’t ever do this. It turns out that RMagick leaks memory every time it manipulates an image. I haven’t measured the amount myself, but I’m told it’s quite a bit. Certainly I’ve been having resource consumption problems with scripts using RMagick heavily. So, say goodbye to it.
DHH recommended just calling the ImageMagick binaries manually. That’s basically a good idea, but a slightly easier way of doing that is to use the mini_magick gem. mini_magick provides a Ruby API, but under the hood it just calls the ImageMagick command line tools. attachment_fu comes with a mini_magick processor, so you can just add ":processor => :MiniMagick" to your call to has_attachment and you’re in business. Khamsouk Souvanlasy wrote a good tutorial on using mini_magick with acts_as_attachment.
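For a rough idea of what calling the binaries yourself looks like, here is a minimal sketch. It assumes ImageMagick's `convert` tool is installed and on the PATH; the helper names `resize_command` and `resize_image` are hypothetical, and this is not how mini_magick itself is implemented.

```ruby
# Hypothetical helper: build the argument list for ImageMagick's
# `convert` command line tool (assumed to be installed and on the PATH).
def resize_command(source, destination, geometry)
  ["convert", source, "-resize", geometry, destination]
end

# Run the command in a child process. Passing the arguments separately
# avoids going through a shell. Returns true only if the command ran and
# exited successfully.
def resize_image(source, destination, geometry)
  system(*resize_command(source, destination, geometry)) == true
end
```

Because the work happens in a separate process, whatever memory the tool uses is handed back to the operating system when that process exits, rather than building up inside your Ruby process.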
Cropping
The one thing I noticed in using attachment_fu instead of file_column is that file_column resizes images and crops them nicely, the way that you would expect. By default, attachment_fu tends to stretch them. This has been covered better by other people, so I just want to mention it because cropping is almost certainly what you want, and until Rick fixes it, I’d suggest making a small change to the plugin yourself. There are a number of articles on the subject, but I think the best one is probably over at toolman tim’s blog.
Don’t just go with Tim’s solution though. Have a look at the comments, and you will find options for the different image processors. I used “labrat’s” suggested fix for mini_magick (paste here).
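To make the difference between stretching and cropping concrete, here is a small sketch of the arithmetic behind a scale-then-center-crop. The helper name `crop_plan` is hypothetical, and this is not the fix from Tim's post or its comments, just the general calculation.

```ruby
# Hypothetical helper: work out how to fill a target box without
# distorting the image. Scale until the image covers the box in both
# dimensions, then trim the overflow equally from each side.
def crop_plan(src_w, src_h, dst_w, dst_h)
  # Use the larger ratio so neither dimension ends up short of the box.
  scale = [dst_w.to_f / src_w, dst_h.to_f / src_h].max
  scaled_w = (src_w * scale).round
  scaled_h = (src_h * scale).round
  {
    :scaled_width  => scaled_w,
    :scaled_height => scaled_h,
    :x_offset      => (scaled_w - dst_w) / 2,
    :y_offset      => (scaled_h - dst_h) / 2
  }
end

plan = crop_plan(800, 600, 100, 100)
# An 800x600 photo is scaled to 133x100, then 16 pixels are skipped on
# the left before taking the 100x100 crop.
```

Stretching, by contrast, scales each dimension by its own ratio, which is what squashes a landscape photo into a square thumbnail.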
Amazon S3
Amazon S3 appears to be a great solution for handling user generated images, and I’m starting to use it a fair bit. One word of warning however, is that I’ve already started to encounter an occasional communication error with amazon, as discussed in this thread and I don’t yet know how serious it is or how easily fixed. I’ll post some more on this subject when I’m better informed.
Migrating from file_column to attachment_fu
Posted by Craig Ambrose on September 09, 2007 at 03:25 AM
About a year ago, file_column was one of the most popular plugins for storing files, particularly images, in Rails applications. These days, the most popular plugin is Rick Olson’s attachment_fu.
The main advantage of attachment_fu is its ability to store the files either on the file system, in binary fields in the database, or on Amazon S3. The pluggable nature of the code also makes it fairly easy to support some other storage service.
I’ve avoided moving over because file_column actually provides more comprehensive image manipulation features, but there comes a time in the life-cycle of most applications where file system storage just doesn’t cut it in a multi-server environment.
There are already tutorials on using attachment_fu, so I’m presuming that you already know how to use it. I’m just going to help you make the switch. Let’s start with some code. Here’s my migration for moving across the data:
class CreateProfilePhotosFromFileColumn < ActiveRecord::Migration
  def self.up
    for profile in Profile.find(:all)
      # Read the file_column value straight from the database so the migration
      # keeps working after file_column is removed from the Profile model.
      image_filename = select_value "SELECT image FROM profiles WHERE id = #{profile.id}"
      unless image_filename.blank?
        image_path = RAILS_ROOT + "/public/system/profile/image/#{profile.id}/#{image_filename}"
        # Open in binary mode; the block form closes the file handle for us.
        File.open(image_path, 'rb') do |image_file|
          photo = ProfilePhoto.new(:profile_id => profile.id)
          photo.set_from_file(image_file)
          photo.save!
        end
      end
    end
  end

  def self.down
    execute "DELETE FROM profile_photos"
  end
end
In this example, I previously used file column in the field called “image” of my Profile model. Now, I have a new model called ProfilePhoto, which belongs_to Profile.
Although I’m happy to loop through all profiles using regular Active Record finders, note that I didn’t use the image method of Profile. I know that after this works, I’m about to remove everything to do with file_column, and so to play it safe, I use select_value to fetch the file_column image name directly from the database. This is ugly, but good policy in general for producing migrations that keep working after the code changes.
The other trick here is the call to “set_from_file”. This method doesn’t exist in attachment_fu, and was the first (of several) glaring omissions that I noticed in this plugin. To make this migration work, you’ll need to make a few changes to attachment_fu.rb yourself.
The following goes inside the InstanceMethods module:
def set_from_file(source_file)
  source_file_extension = File.extname(source_file.path).reverse.chomp('.').reverse
  source_file_name = File.basename(source_file.path)
  self.content_type = self.class.mime_type_from_extension(source_file_extension)
  self.filename = source_file_name
  self.temp_data = source_file.read
end
The following goes inside the ClassMethods module:
def mime_type_from_extension(extension)
  MIME::Types.type_for(extension).first.simplified
end
And the following goes at the top of the file:
require 'mime/types'
You’ll also need to “gem install mime-types”, although you will already have this if you installed the amazon s3 library.
This code is not hugely error tolerant. It presumes that all your records with file columns contain valid image files that are going to be accepted by your attachment_fu model. It also assumes that you set up attachment_fu correctly of course.
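If you want the migration to be a little more forgiving, one option is to check that each file really exists before trying to open it. The following is only a sketch: the helper name `legacy_image_path` is hypothetical, and the directory layout is simply the one used in my migration above.

```ruby
# Hypothetical helper: return the path to a record's old file_column
# image, or nil when the filename is blank or the file is missing on disk.
def legacy_image_path(root, profile_id, image_filename)
  return nil if image_filename.nil? || image_filename.strip.empty?

  path = File.join(root, "public/system/profile/image",
                   profile_id.to_s, image_filename)
  File.file?(path) ? path : nil
end
```

In the migration, you could call this with RAILS_ROOT and skip (or log) any profile for which it returns nil, instead of letting File.open raise and abort the whole run.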
If it works, I would then add further migrations to remove the old file_column field from the Profile model, and to remove the file_column files themselves from the hard disk.
You’ll probably find the set_from_file method to be a valuable addition to attachment_fu for other purposes too. Our applications often receive their data in ways other than just HTTP post, and being able to save a file object seems like a pretty obvious addition.
