Using Model factories with RSpec
Posted by Craig Ambrose on February 23, 2008 at 10:12 PM
I had a lot of questions and concerns about RSpec when I first made the switch a few months ago. I’d seen a few talks on BDD in general, and watched the peepcode screencasts on rspec, and I could see that BDD tended to encourage a fairly good style of testing. I managed to find answers for most of my questions, so here’s a little list of the challenges and resolutions that I faced when I made the switch.
When I first started on rails, like a good little agile developer, I mocked out everything for my unit tests and ensured that I never actually hit the database and thus never required fixtures. I used the mocha library predominantly, but after a month or so of this I had gotten totally tangled up.
The problem was that active record objects are a bit like an iceberg where only a part of them is visible in the code, and the rest is lurking below the murky waters of your database. My tests were missing heaps of bugs caused by queries generating invalid SQL and by reliance upon properties which may or may not exist depending on the database schema. They were also far too complicated, as they required lines and lines of setup code to handle all the wacky ways in which a model could interact with a database.
My solution eventually was to give in, and test things the way DHH does. Not particularly elegant, certainly not “correct” in terms of what I’d been taught regarding how to test, but it did work. The upside of these tests which often integrated a number of layers together was that I got great coverage. The downside was that I’d often get a raft of test failures if I broke a single line of code, and some things were just a bit too much effort to test and tended to motivate me not to bother. Also, the dependence on fixtures meant that as the test situations got more complex, I’d have to add more fixtures, and in doing so break other tests.
When moving to RSpec, I threw all this away, and started again on mocking everything out. RSpec helped out with a few nice extra methods, like mock_model, which can sensibly provide a mock ActiveRecord object that has the bare essentials already (like the id and to_param methods). This time I did better, but I still got a little tangled up, and I found it really hard to test methods where I was querying for records. With a mock based approach, all I could test was that I’d sent the query that I expected, not that it was actually valid SQL or that it would fetch any sensible results.
The resolution, as it turns out, lies somewhere in between.
I use mocking pretty heavily still, particularly when I’m creating or updating records. But, I also often want to deal with real data, particularly when it’s being fetched rather than created. Fixtures were a bad idea, because they created a great big data set that had to be valid for all tests. There are some plugins to provide different sets of fixtures for different scenarios, but I think that having the data used for a test in a different file is also downright confusing. Setting up for each test needs to be easy.
The solution for me was the use of a model factory, which is a pattern in some use amongst my colleagues at Cogent. I’d left the worlds of C# and C++ with a bit of a dislike of factories, being a pattern that, like dependency injection, seemed to add a great deal of complexity to a problem that is much more easily solved in a dynamic language. However, a simple model factory for testing is a different kettle of fish.
Let’s say I’m testing that I can’t create a user if their email address already exists. If I was using fixtures, I’d create a user with the email address I was going to try out, but that would affect all other tests. If I created a user directly in the test then I’d need to update my test each time I changed the information needed to create a valid user. Instead, a model factory makes it look like this:
describe User, "when being created" do
  it "should require email address to be unique" do
    model_factory.user(:email => "craig@craigambrose.com")
    user = User.create(:email => "craig@craigambrose.com")
    user.should_not be_valid
    user.errors.on(:email).should == "must be unique"
  end
end
The user method on model_factory creates and returns a valid user. It can be called as many times as I want, and it will always return a valid user. If User contains fields that must be unique, then the data I use to populate it contains integers which increment each time. The factory methods also take a hash of options which can be used instead of these default values, which I use whenever I want to set an attribute that I care about in the test. This way, if I change what is required to create a user, all I need to change is the model factory, not every single test.
I don’t use the model factory all the time. I still make heavy use of mock_model. For each case, I try and determine which method will yield the simplest test that actually forces me to write the code that is needed, rather than just appearing to give coverage over the lines of code. Sometimes I use some mocking and some concrete objects in the same test.
I haven’t provided the code for the model_factory itself. It’s very straightforward and I’ll leave it as an exercise for the reader. Drop me a line if you have a particularly clever implementation that you’d like to share or are interested in seeing some of my code.
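As a starting point for that exercise, here is a minimal sketch of what such a factory might look like. It is not the code used in the project described above. The ModelFactory class, the FakeUser stand-in (used so the example runs without ActiveRecord), and the login attribute are all assumptions made for illustration. In a real project the factory would call create! on the actual User model.

```ruby
# Hypothetical sketch of a model factory; not the author's implementation.
# FakeUser stands in for an ActiveRecord model so the example runs on its own.
FakeUser = Struct.new(:login, :email) do
  def self.create!(attributes)
    new(attributes[:login], attributes[:email])
  end
end

class ModelFactory
  def initialize(user_class = FakeUser)
    @user_class = user_class
    @sequence = 0
  end

  # Creates a valid user. Defaults contain an incrementing integer so that
  # unique fields never collide; any options passed in override the defaults.
  def user(options = {})
    n = next_sequence
    defaults = { :login => "user#{n}", :email => "user#{n}@example.com" }
    @user_class.create!(defaults.merge(options))
  end

  private

  def next_sequence
    @sequence += 1
  end
end

# Helper so that specs can simply call model_factory.user(...)
def model_factory
  @model_factory ||= ModelFactory.new
end
```

Merging the caller's options over a hash of defaults keeps each test focused on the one attribute it cares about, while the counter takes care of uniqueness.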
Storing Images from remote URLs
Posted by Craig Ambrose on February 22, 2008 at 12:17 AM
Here’s a handy little modification to attachment_fu to allow the model to have its file data set from a remote url. I’ve been doing this for a while but recently altered it to compute the mime-type from the downloaded file, rather than just basing it off the extension. This is because some web sites host files (such as images) at urls without a file extension.
As this is a monkey patch for attachment_fu, I follow Chris Wanstrath’s evil twin convention and put it in a plugin called attachment_fu_hacks.
This code is also dependent on the mimetype-fu plugin. Mimetype-fu calculates the mime type of a file using the *nix “file” command, rather than using file extensions as the mime-types gem does. If you’re using OSX, then this won’t work unless you change the two occurrences of file -bir in mimetype_fu.rb to file -br --mime (which is compatible with both OSX and linux). I’ve submitted that change to the author so hopefully it will be incorporated into future versions.
respond_to.email, or how to handle incoming emails in rails RESTfully
Posted by Craig Ambrose on February 09, 2008 at 04:49 AM
There’s a bunch of information around on how to handle incoming emails with your rails application, in particular the wiki page, but I have some concerns with the methods that are being suggested, and in this article I present an alternative which I’ve been trying out and I really like.
Handling incoming email is, in essence, very simple. All you need to do is get the email, which is a big chunk of text, parse it with a ruby email class, such as TMail (which is used by ActionMailer), and perform some action. If you’re only handling a few specific addresses, it might be best to fetch the email via POP3, and I’ve done that before using a daemon to regularly poll the pop account.
POP3 is not a viable solution if you want to handle all email for a certain domain. At this point, we probably want to talk about SMTP.
A Very Short Guide to SMTP
Simple Mail Transfer Protocol is pretty damn cool if you ask me. It’s dead simple: basically, the client can only say “hello, here’s an email from X to Y”. Just like HTTP, it’s fully push based. There’s no polling; emails get pushed across the internet. Just like HTTP, it has a whole stack of response codes, which are of course appropriate to trying to send an email rather than talking to a web resource.
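To make the “hello, here’s an email from X to Y” idea concrete, here is a small sketch that lists the lines an SMTP client sends for a single message. The method name smtp_client_lines and the hosts and addresses passed to it are made up for illustration. It only builds the lines and does not open a connection.

```ruby
# Illustrative only: the lines an SMTP client sends to deliver one message.
# The server answers each step with a numeric code (220 greeting on connect,
# 250 for accepted commands, 354 after DATA, 221 after QUIT).
def smtp_client_lines(client_host, from, to, message)
  [
    "HELO #{client_host}",   # say hello and identify ourselves
    "MAIL FROM:<#{from}>",   # envelope sender (X)
    "RCPT TO:<#{to}>",       # envelope recipient (Y)
    "DATA",                  # announce that the message follows
    message,                 # headers, a blank line, then the body
    ".",                     # a lone dot ends the message
    "QUIT"
  ]
end

puts smtp_client_lines(
  "client.example.com",
  "sender@example.com",
  "someaddress@example.com",
  "Subject: test\r\n\r\nHello there"
)
```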
Using Postfix Mail Filters to Call Ruby
Postfix is a common open source SMTP server. Before I looked at it, it was big and scary. After a few hours of expert help, I wonder what seemed so complicated. One of the basic ways that we can use postfix to push mail to our rails app is by specifying a command line script which gets executed whenever postfix gets an email. This is the first option presented on the rails wiki, and they suggest using a script which calls the receive method of one of your ActionMailer classes.
My Concern
If we’re going to use ActionMailer to parse an email, and then presumably fire off a bunch of ActiveRecord code to make changes to your database as a result, clearly we’re loading the entire rails stack. Every time we get an email we’re loading the entire rails stack. This seems like how we handled web requests back in the day when there was only mod_cgi. No shared resources between requests, a big performance hit for loading all of rails and then getting rid of it each time, and the concern that we can only handle as many incoming emails as we have RAM on our server as the rails code takes up a bunch of memory.
What I Want
I don’t want to have to worry about the resources I need to scale my email server; I already do that with my application servers. I want to handle emails in a way that re-uses an in-memory copy of the rails classes and can be scaled in a predictable way.
That sounds a lot like a mongrel cluster.
We all have one of those already, right? So why not handle incoming mail over HTTP? It’s dead easy, it scales well, and the result is really Rails-ish.
respond_to.email
I was hoping to get a plugin out of this. It’d be so handy that people would queue for miles to download it. The trouble is, it’s actually not even enough code to bother: it’s only about three lines of ruby and the same of postfix config. So, let’s call this a pattern. I’ll describe how to do it, and you can all run off and do it yourself.
Step One: Install Postfix
Install postfix on one of your servers. For any sizable rails site, I like to have a little VPS just for daemons, cron jobs, scripts, and the mail server, to keep it separate from all the web stuff. On ubuntu, this was as easy as “sudo apt-get install postfix”. For the default configuration type, I chose “internet site”.
Step Two: Setup Your MX Record
For mail to start arriving at your mail server, you need to add an MX record to your DNS which points at the hostname of your server. Depending on your host, you probably have a web interface to do this, and it’s probably dead easy.
The Magic Script!
Create a file called mail_handler.rb, and pop it somewhere in your rails project. I created a /bin directory for it. Don’t use a rake task; the goal here is not to load in any unnecessary stuff. Here’s the contents.
#!/usr/bin/ruby
require 'net/http'
require 'uri'

Net::HTTP.post_form URI.parse('http://www.craigambrose.com/emails'), { "email" => STDIN.read }
If ruby is somewhere else on your machine, change the line at the top to be correct (try “which ruby” on that machine to see where it is). I’ve chosen to hardcode in the url that I want to post the email to so that I don’t have to load any other files. If you have more deployment environments to worry about, you might want to put the target url in a yml file and parse it here. Just don’t load your rails environment file, that’s the whole point of this.
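If you do go down the yml route, one possible shape is sketched below. The helper names parse_target_url and forward_email, the file name mail_handler.yml, and the layout of the YAML (one url key per environment) are all assumptions for the sake of the example. The only libraries loaded are from ruby’s standard library, so the rails environment stays out of the picture.

```ruby
require 'net/http'
require 'uri'
require 'yaml'

# Hypothetical helper: picks the target url for one environment out of
# YAML text shaped like:
#   production:
#     url: http://www.craigambrose.com/emails
def parse_target_url(yaml_text, environment)
  YAML.load(yaml_text).fetch(environment).fetch("url")
end

# Posts the raw email in the "email" parameter, as the script above does.
def forward_email(raw_email, url)
  Net::HTTP.post_form(URI.parse(url), { "email" => raw_email })
end

# The script itself would then end with something like this
# (mail_handler.yml is an assumed file name):
#   config = File.read(File.join(File.dirname(__FILE__), "mail_handler.yml"))
#   forward_email(STDIN.read, parse_target_url(config, "production"))
```

Using fetch rather than [] means a missing environment or a missing url key raises an error straight away instead of posting to nil.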
Configuring Postfix to Call the Script
In this example, the domain that I want to handle email for is “craigambrose.com”. Everywhere you see this, replace it with your own domain name. Most of the commands below need root access.
In /etc/postfix/main.cf
mydestination = localhost.localdomain, localhost, craigambrose.com
virtual_maps = hash:/etc/postfix/virtual
alias_maps = hash:/etc/aliases
In /etc/postfix/virtual (this is a file, you may need to create this)
@craigambrose.com rails_mailer
The above says to redirect any address at craigambrose.com to the alias “rails_mailer”, which I’ll create next. You could run multiple rails apps off the same server by giving them all unique aliases. On the left, you can use a regular expression to match addresses if you only want to match some of them.
To apply this change to virtuals, run:
postmap /etc/postfix/virtual
In /etc/aliases
rails_mailer: "|/var/www/apps/craigambrose/current/bin/mail_handler.rb"
That’s the alias we created on the left. On the right is the path to my script, change as necessary. The pipe character before the script path means “the following is a shell command, not an email address”.
To apply this change to aliases, run:
postalias /etc/aliases
To apply the main configuration changes to postfix, run:
/etc/init.d/postfix reload
Testing the Setup
I should be able to send an email now to “someaddress@craigambrose.com”. To see it get processed by postfix, we might want to watch the postfix info log:
tail -f /var/log/mail.info
When the mail is processed, you should see a line like:
to=<someaddress@craigambrose.com>, orig_to=<root>, relay=local, delay=2, status=sent (delivered to command: /var/www/apps/craigambrose/current/bin/mail_handler.rb)
Then, go peek at your rails app logs. You should see that the mail has been passed through by the script. Even if you haven’t written an action to handle it yet, the log entry should be there.
Troubleshooting
If you didn’t see the correct line in your postfix logs, then perhaps there’s a problem with your DNS setup. You could try talking to postfix directly. Mail servers listen on port 25, and you can telnet into them and speak directly. Try “telnet YOUR_SERVER_IP 25” and then try typing in what the client says in the sample SMTP communication on wikipedia, with the example address changed to the domain that you want to test. If that works, but sending email didn’t, you’ll need to investigate your DNS setup.
Handling the Rails Action
The target url I put in my mail script was http://www.craigambrose.com/emails, so the mail is going to get POSTed to that resource. With normal rails resource routes, that means that we’re expecting to handle the email in the create action of the EmailsController. That seems very sensible to me. My script puts the unparsed email into params[:email].
To parse it with TMail, all you need to do is:
require 'tmail'
email = TMail::Mail.parse(params[:email])
Alternatively you could pass it to the “receive” method of any ActionMailer derived class, which does the above automatically.
I’ve had some reports that TMail is both a little slow, and also not quite up to parsing all the possible ways that an email might be encoded in the big bad world. That’s a subject for another blog post.
Final Performance Note
When postfix is calling your script, it ensures that only a certain number of calls occur concurrently. The default is 20, which seems pretty good to me. If you’d like to tweak this, use the following setting in main.cf (and don’t forget to reload postfix afterwards).
default_destination_concurrency_limit = 30
Acknowledgments
Setting up servers is not my area of expertise. Many thanks to Andrew Snow of Octopus for the postfix help and Pete Yandell for sharing some of the lessons learned on his great mailing list site 9cays.
