Pondering the Cloud – infrastructure is boring

Welcome Back!

One thing I find quite fascinating about the current industry obsession with “The Cloud” is the fact that it’s about infrastructure.

Infrastructure is boring.

A lot of the enthusiasm seems misplaced. A bit like seeing the Model-T for the first time and being excited by the Freeway. Many (if not most) of the “cloud” startups seem focused on this infrastructure level … building systems like Amazon’s EC2 and the RightScale management suite. And of course, this is perhaps understandable – we have to have the infrastructure first and the payoff for the winners of this race will be big. Microsoft and Windows big.

But.

The real excitement is in what happens next …

Where does software go when computing is in the cloud?

CloudKit, pondering the future

I ponder the future and trawl the web.

Recently in my travels I found the awesome CloudKit.

CloudKit provides schema-free, auto-versioned, RESTful JSON storage with optional OpenID and OAuth support, including OAuth Discovery.

CloudKit is similar in some respects to CouchDB – data is stored as JSON and accessed via REST.

I looked at CouchDB in some detail for a project I have in the works and was very impressed with its capabilities, particularly the ability to query the datastore using JavaScript and a Map/Reduce algorithm.
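For anyone who has not met Map/Reduce before, here is a rough sketch of the idea in plain Ruby. This is not CouchDB's API (CouchDB views are written in JavaScript); the documents and field names below are made up purely for illustration.

```ruby
require "json"

# Hypothetical documents, as they might come back from a JSON store.
docs = JSON.parse(<<~DOCS)
  [
    {"type": "post", "author": "alice", "words": 300},
    {"type": "post", "author": "bob",   "words": 120},
    {"type": "post", "author": "alice", "words": 80}
  ]
DOCS

# Map step: emit a [key, value] pair for each document.
pairs = docs.map { |doc| [doc["author"], doc["words"]] }

# Reduce step: combine all of the values that share a key.
totals = pairs.group_by(&:first)
              .transform_values { |group| group.sum { |_author, words| words } }

puts totals.inspect
```

The map step decides what to index, the reduce step decides how to summarise it. In a real document store the two steps run inside the database rather than in application code.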

In the end, I have decided that the benefits of persisting* with MySQL (or PostgreSQL) are too great to abandon relational technology just yet – the Rails ecosystem has such good support for these technologies in Active Record.

* And yes, pun intended.

Collecting performance metrics in Rails

Further to my earlier comment that:

When discussing performance no opinion should be accepted without a metric

One of the great things about Rails is that there is a plugin for just about everything, and collecting performance metrics is no exception. Most of this is because of Ruby and its incredible meta-programming flexibility (but that’s another story). Rails has built-in support for some basic performance monitoring on the request stack (although in recent versions this has been extracted into a plugin), but there are some excellent alternatives.

RPM by NewRelic:

New Relic RPM is a Ruby on Rails performance monitoring application that lets you see and understand application performance metrics in real time so you can fix Rails problems fast. RPM is intuitive. It’s granular. And, it’s a 10-second Rails plug-in install.

metric_fu on GitHub:

metric_fu is a set of rake tasks that make it easy to generate metrics reports.  It uses Saikuro, Flog, Rcov, and Rails’ built-in stats task to create a series of reports.  It’s designed to integrate easily with CruiseControl.rb by placing files in the Custom Build Artifacts folder.

There are some great screencasts on Scaling Rails available from New Relic’s Rails Lab. And yes, scale is orthogonal to performance, but some of the discovery techniques are the same.

Updated:

A commenter has correctly pointed out that metric_fu is not a performance analysis tool as such. I guess the subtext of what I am saying is that tools like these help track complexity – and complexity is often one of the underlying causes of performance issues.

You have no right to your opinion on performance

I was discussing with a friend how to approach some performance issues in the application he works on (for a quite large company you have probably heard of).

As is typical in any contact with performance issues, the problem isn’t particularly with the code (as in, it can probably be fixed), but with the ongoing discussions with the team.

So much of performance is really voodoo and superstition, and everyone has their own beliefs. Some are valid, of course, but in my experience there is always someone going on about using single (‘) instead of double (“) quotes or something similar.*

This leads me to the only rule you really need in these situations:

When discussing performance no opinion should be accepted without a metric

If you don’t have any metrics, you don’t know what your performance issue actually is, and without knowing what the problem is you can’t possibly have a solution.
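To make the rule concrete, here is a minimal sketch of how the quotes argument could be settled with a number instead of an opinion, using Ruby's standard Benchmark module. The helper name time_it and the iteration count are my own choices for illustration, and the timings will vary from machine to machine.

```ruby
require "benchmark"

# Run the given block `iterations` times and return the elapsed seconds.
def time_it(iterations)
  Benchmark.realtime do
    iterations.times { yield }
  end
end

name = "world"
iterations = 100_000

# Single-quoted string with concatenation versus double-quoted interpolation.
single = time_it(iterations) { 'hello ' + name }
double = time_it(iterations) { "hello #{name}" }

puts format("single-quoted + concat: %.4fs", single)
puts format("double-quoted + interp: %.4fs", double)
```

Whatever the numbers turn out to be, the discussion can now be about a measurement rather than a belief, and about whether a difference of that size matters next to the real bottleneck.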

* In PHP, a string defined ’string’ is faster than “string” because the latter will interpolate and render variables. However, this level of “faster” (or ‘faster’) is so small as to be irrelevant and any discussion is a distraction.

The Big Ball of Mud

I have been revisiting one of my favourite explorations of software architecture: The Big Ball of Mud.

The key takeaway for me is the following message about developing features first:

You need to deliver quality software on time, and under budget.

Therefore focus first on features and functionality, then focus on architecture and performance.

Gonzo Software Development

I would like to introduce you to an idea of mine:

Gonzo Software Development

The ideas here are half-baked, half-formed and largely not thought out at all.

As you shall see, that’s mostly the point.

I think this is a start of a manifesto.

It’s a vibe I have, a feeling about the way to do things, based on my experience over the last 10 years and the last 18 months in particular – an independent developer, working as a lone-wolf and an occasional team player and exposed to maybe a dozen organisations in various states of decay, chaos and excellence, and occasionally all of these together.

From Wikipedia on Gonzo Journalism:

Gonzo journalism is a style of journalism which is written subjectively, often including the reporter as part of the story via a first person narrative. The style tends to blend factual and fictional elements to emphasize an underlying message and engage the reader …

Gonzo journalism tends to favor style over accuracy and often uses personal experiences and emotions to provide context for the topic or event being covered. It disregards the ‘polished’ edited product favored by newspaper media and strives for the gritty factor …

In other contexts, gonzo has come to mean “with reckless abandon,” or, more broadly, “extreme”.

Gonzo Software Development is agile development performed with reckless abandon. It’s an approach that says good enough is good enough, and riffs fast and furious to discover the “good enough”.

Gonzo Software is subjective, everyone has the tools to create and connect.

Gonzo Software Development is not afraid of code.

Or humans.

Gonzo wants the humans to not be afraid of the code.

Gonzo doesn’t pretend the human can be taken out of the process.

Gonzo wants to help the human be smarter, faster.

Gonzo, to paraphrase, is many small pieces, loosely and intelligently connected.

Gonzo Software Development is a work in progress.

Watch this space.

Filtering and Named Scopes

I have been delighting in the power of Active Record’s Named Scopes and recently discovered a technique for cleanly adding user-driven filtering to Active Record models using Named Scopes and a little bit of Ruby magic.

Named Scopes provide a clean way of adding finders to your Active Record Models – collecting complex finder logic into granular methods that can then be chained together to perform complex combinations of queries. Named scopes are eminently testable as each defined scope can be tested individually, as well as the actual combinations of scope-chains.

Take the following example from some recent production code:

named_scope :by_author, lambda { |*args| { :conditions => ["author_id = ?", args.first] } }

named_scope :by_state, lambda { |*args| { :conditions => ["state = ?", args.first || "published"] } }

# by_date expects a [start, end] pair; the end defaults to now when omitted.
named_scope :by_date, lambda { |*args|
  start_time, end_time = args.first[0], args.first[1] || Time.now
  { :conditions => ["published_at BETWEEN ? AND ? OR updated_at BETWEEN ? AND ?",
                    start_time, end_time, start_time, end_time] }
}

As you can see, some quite complex logic can be wrapped into the named scopes, including parameters.

The scopes can be chained, which wraps everything into a single query:


Model.by_author(author_id).by_state("published").by_date([10.days.ago])

Once you have your named scopes set up, you can add some magic to dynamically chain them for user-based filtering.

What follows is from Caboose’s post, The awesomest filter and sort ever.

In the controller I set up some code to grab incoming parameters and pass them to the Model (in this case called BlogPost).


filter_opts = {}
filter_opts[:page] = params[:page] || 1
filter_opts[:author] = params[:user]
filter_opts[:state] = params[:state]
filter_opts[:date] = [start_date, end_date]

@blog_posts = BlogPost.find_by_filter(filter_opts)

I have removed some of the processing logic here, but the idea is that the user selects from a range of filtering options in the User Interface (selecting a date range, for example) and the controller grabs, cleans and validates these as appropriate and passes them through to the model.
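The processing logic I removed is specific to the application, but as a rough sketch the cleaning step might look something like the following. Everything here is hypothetical: the helper names build_filter_opts and parse_date, the ALLOWED_STATES list, and the start_date and end_date parameter names are assumptions made for illustration, not the production code.

```ruby
require "date"

# Hypothetical whitelist of states a user is allowed to filter on.
ALLOWED_STATES = %w[draft published archived].freeze

# Parse a date string, returning nil instead of raising on bad input.
def parse_date(value)
  Date.parse(value.to_s)
rescue ArgumentError
  nil
end

# Build the options hash for the model from raw request parameters,
# keeping only the values that pass basic validation.
def build_filter_opts(params)
  opts = {}
  opts[:page]   = [params[:page].to_i, 1].max
  opts[:author] = params[:user] if params[:user].to_s =~ /\A\d+\z/
  opts[:state]  = params[:state] if ALLOWED_STATES.include?(params[:state])

  start_date = parse_date(params[:start_date])
  end_date   = parse_date(params[:end_date])
  opts[:date] = [start_date, end_date] if start_date
  opts
end

p build_filter_opts({ :page => "2", :user => "42", :state => "published",
                      :start_date => "2009-01-15" })
```

Leaving a key out of the hash when its value is missing or invalid is a deliberate choice: the model can then skip the corresponding scope entirely rather than filtering on a bad value.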

The find_by_filter method is where all of the magic happens. We add the valid scopes to an array, and then chain them all together using inject.


def self.find_by_filter(opts = {})
  # Collect [scope_name, argument] pairs for the options that were supplied.
  scopes = []
  scopes << [:by_author, opts[:author]] if opts[:author]
  scopes << [:by_state, opts[:state]] if opts[:state]
  scopes << [:by_date, opts[:date]] if opts[:date]

  order = opts[:order] || "published_at DESC"
  page = opts[:page] || 1

  # Apply each scope in turn, then paginate the combined result.
  scopes.inject(BlogPost) { |model, scope|
    model.scopes[scope[0]].call(model, scope[1])
  }.paginate(:all, :order => order, :page => page)
end

The final line is the magic. Using inject with the model as the accumulator basically emulates the chained call we saw earlier (by_author.by_state.by_date), with the added advantage that only the scopes with relevant options (defined in opts) are called by find_by_filter.

As you can see, not only are the named scopes chained together, but I am adding paginate for good measure. Records are cleanly paginated.
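If the inject trick feels opaque, it can be tried outside Rails. The sketch below uses a made-up FakeQuery class as a stand-in for a scope chain; it is not Active Record, and it calls the scopes with public_send rather than through the model's scopes hash, but the folding pattern is the same.

```ruby
# A stand-in for a scope chain: each call returns a new object carrying
# the accumulated conditions. Hypothetical, not Rails code.
class FakeQuery
  attr_reader :conditions

  def initialize(conditions = [])
    @conditions = conditions
  end

  def by_author(id)
    FakeQuery.new(conditions + ["author_id = #{id.to_i}"])
  end

  def by_state(state)
    FakeQuery.new(conditions + ["state = '#{state}'"])
  end
end

opts = { :author => 7, :state => nil }

# Only options that were actually supplied become scopes.
scopes = []
scopes << [:by_author, opts[:author]] if opts[:author]
scopes << [:by_state, opts[:state]] if opts[:state]

# inject threads the accumulator through each scope in turn,
# which is equivalent to writing FakeQuery.new.by_author(7) by hand.
result = scopes.inject(FakeQuery.new) do |query, (name, arg)|
  query.public_send(name, arg)
end

puts result.conditions.inspect
```

Because :state is nil in this example, by_state is never called and only the author condition ends up in the result.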

For more information on named scopes see the Ruby on Rails Active Record Guide.

Reboot

I am rebooting.

New vision.

New direction.

New ambition.

A work in progress.

And it starts now …