A Primer on Ruby C Extensions Part 2 - FFI

In my previous post I covered building a simple C extension for Ruby. There are times, however, when you may need to call functions defined in an existing native library. Fortunately there is a tool precisely for this job: FFI, or Foreign Function Interface. As defined by Wikipedia, “A foreign function interface (or FFI) is a mechanism by which a program written in one programming language can call routines or make use of services written in another”. Thanks to the team working on Ruby FFI we have a relatively easy set of tools to work with.

First, let's say there is already a native library available that defines a function for determining whether a number is prime. Let's call this libprime.so. Tying into this library is actually quite simple using Ruby FFI. The first step, obviously, is installing it; gem install ffi should do the trick. Now we need to take a look at the code defined in libprime.so.

#include <stdint.h>
#include <math.h>

/* Returns 1 if n is prime, 0 otherwise. */
int prime(int n)
{
  if (n < 2)
    return 0;
  if (n == 2)
    return 1;
  if ((n & 1) == 0)
    return 0;

  /* Trial division by odd numbers up to sqrt(n). */
  uint64_t sqrt_n = ((uint64_t)sqrt(n)) + 1;
  for (uint64_t i = 3; i <= sqrt_n; i += 2) {
    if (n % i == 0)
      return 0;
  }
  return 1;
}

As you can see, the library exports a function named prime. So our next step is to tie into the library using FFI.

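require 'ffi'

module Primed
  extend FFI::Library
  # Load the native library that exports prime()
  ffi_lib "libprime.so"
  # C signature: int prime(int n)
  attach_function :prime, [:int], :int

  # Wrapper meant to be mixed into Integer, so self is the number to test
  def prime?
    prime(self) != 0
  end

end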

Looks simple enough, right? Let me quickly explain what this is doing. Our first task is obviously to make FFI available to us. We call extend FFI::Library and that gives us all the tools that we need to work with. Second, we need to define the library to tie into, in this case libprime.so. Third, we call attach_function. This takes 3 arguments: the name of the library’s function to call, an array of the argument types the function takes, and the data type the function returns. Finally, we wrap the function to our liking and we’re done. For more information check out the FFI Wiki over at GitHub.
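The wrapper above relies on self being an integer, so the module has to be mixed into Integer before something like 7.prime? will work. Here is a minimal sketch of that last step. To keep it runnable without libprime.so, the prime method below is a hypothetical pure-Ruby stand-in for the function that attach_function would normally provide; only the prime? wrapper and the include mirror the real thing.

```ruby
# Stand-in for the FFI-backed module: in the real version, `prime` would be
# created by attach_function and would call into libprime.so.
module Primed
  # Hypothetical pure-Ruby replacement for the native prime(int) function.
  # Returns 1 for primes and 0 otherwise, mirroring the C return values.
  def prime(n)
    return 0 if n < 2
    return 1 if n == 2
    return 0 if n.even?
    i = 3
    while i * i <= n
      return 0 if (n % i).zero?
      i += 2
    end
    1
  end

  # Same wrapper as in the FFI module: translate the C-style int to a boolean.
  def prime?
    prime(self) != 0
  end
end

# Mixing the module into Integer is what makes `7.prime?` possible.
class Integer
  include Primed
end

puts 7.prime?   # true
puts 9.prime?   # false
```

With the real FFI module the include is the same; only the body of prime is replaced by the call into the native library.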

A Primer on Ruby C Extensions

NOTE: This is a repost of an older post of mine from another blog.

I will address one of the primary uses for a C extension in Ruby: speed. Due to its very nature, Ruby is slow (as compared to compiled languages like C). It gets the job done, but sometimes it takes its sweet time doing it. Sometimes it is necessary to speed things up a bit, and this is where C extensions come in. There are several methods of implementing extensions, from the generic C extension to ruby-inline. In this particular article I will focus on the generic C extension.

In this example, I am going to use a fairly inefficient piece of Ruby code I created a while ago for Project Euler (Problem 10) for finding the sum of all primes under 2000000:

class Integer

  def prime?
    return true if self == 2
    return false if self < 2 || (self & 1) == 0
    limit = Math.sqrt(self).floor
    i = 3
    while i <= limit
      # Test the current candidate before stepping to the next odd number
      return false if (self % i) == 0
      i += 2
    end
    true
  end

end

numbers = (2..2000000).to_a

numbers = numbers.select { |n| n.prime? }
puts numbers.inject { |result, element| result + element }

UPDATE: Changed the initial value of i on line 7 from 1 to 3, as otherwise 3 and 5 would return false.

At the time that I wrote this, I was relatively unaware of more efficient ways of finding prime numbers (such as Euler's sieve); however, the code still ran within the allotted 2 minute window (52 seconds), so I went with it. Now to speed it up. To write a C extension you need, at a bare minimum, two things:

  • an extconf.rb file – this file is used by Ruby to generate the Makefile that is used to compile the extension
  • the source file for the extension (in this case primed.c)

Here is a look at these two files for my new version of problem 10:

primed.c

#include "ruby.h"
#include <stdint.h>
#include <math.h>

/* Handle to the Primed module, set in Init_primed */
VALUE Primed;

/* With arity -2, Ruby passes self and an array of the remaining arguments */
VALUE method_prime(VALUE obj, VALUE args)
{
  int n = NUM2INT(obj);
  if (n < 2)
    return Qfalse;
  if (n == 2)
    return Qtrue;
  if ((n & 1) == 0)
    return Qfalse;

  /* Trial division by odd numbers up to sqrt(n) */
  uint64_t sqrt_n = ((uint64_t)sqrt(n)) + 1;
  for (uint64_t i = 3; i <= sqrt_n; i += 2) {
    if (n % i == 0)
      return Qfalse;
  }
  return Qtrue;
}

/* Called by Ruby when the compiled extension is loaded */
void Init_primed(void) {
  Primed = rb_define_module("Primed");
  rb_define_method(Primed, "prime?", method_prime, -2);
}

extconf.rb

# Loads mkmf which is used to make makefiles for Ruby extensions
require 'mkmf'

# Give it a name
extension_name = 'primed'

# The destination
dir_config(extension_name)

# Do the work
create_makefile(extension_name)

First let me explain primed.c. The objective of this extension is to determine whether or not a number is prime, so that an integer can call x.prime? and get back true or false. It is essentially identical to the method used in the pure Ruby script above. One of the first things you may notice is this line:

VALUE Primed

VALUE is a data type defined by Ruby that represents a Ruby object in C. It is essentially a pointer-sized handle: for most objects it refers to the struct that holds the object's data, while a few simple values such as true, false, nil, and small integers are encoded directly in it. In this case, it will refer to the Primed module in Ruby, so the data behind it describes the instance methods, variables, etc. for that module. All Ruby objects are represented in C by VALUE, regardless of their type within the Ruby VM; treating one as anything else will likely result in a segfault.

Next we define the actual function to calculate whether the value is prime. Note that because we need to return a Ruby object, we set the return type as VALUE as well. Qtrue and Qfalse directly represent true and false in Ruby, and also behave correctly in a C condition (Qtrue evaluates as true, Qfalse evaluates as false).

Finally we see the Init_primed function. When Ruby loads a compiled extension (for example through require), it calls the Init_name function for that extension, where name matches the extension's name. It is here we actually create the Primed module and bind the method_prime function to the Ruby method prime?. Both functions used are pretty self-explanatory as to what they do, except for the last argument to rb_define_method, which is essentially the arity, or number of arguments to expect in the Ruby method. In this case, -2 causes Ruby to pass self as the first argument to the method_prime function, and a Ruby array of any other arguments as the second.

Now we have all of our code. The last thing to put in place is extconf.rb:

# Loads mkmf which is used to make makefiles for Ruby extensions
require 'mkmf'

# Give it a name
extension_name = 'primed'

# The destination
dir_config(extension_name)

# Do the work
create_makefile(extension_name)

Pretty simple, right? Now when you call ruby extconf.rb it will generate a Makefile, and running make then builds the extension. And the final result? Using the C extension the code runs in just under 3 seconds. Still not really efficient, but it demonstrates the point. When Ruby’s speed is the bottleneck, using C is a viable and easy option.
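For comparison, here is a rough sketch of the sieve approach mentioned earlier, in plain Ruby. It uses the Sieve of Eratosthenes, which is a simpler relative of Euler's sieve and not the code I originally ran; the method name sum_of_primes_below is made up for this example. Instead of testing each number separately, it crosses off the multiples of every prime it finds and sums whatever is left.

```ruby
# Sum of all primes below `limit`, using the Sieve of Eratosthenes.
def sum_of_primes_below(limit)
  return 0 if limit < 3

  # is_prime[k] stays true until k is crossed off as a multiple of a smaller prime
  is_prime = Array.new(limit, true)
  is_prime[0] = is_prime[1] = false

  n = 2
  while n * n < limit
    if is_prime[n]
      # Start at n * n: smaller multiples were already crossed off
      (n * n).step(limit - 1, n) { |multiple| is_prime[multiple] = false }
    end
    n += 1
  end

  total = 0
  is_prime.each_with_index { |prime, k| total += k if prime }
  total
end

puts sum_of_primes_below(10)   # 17 (2 + 3 + 5 + 7)
```

Calling sum_of_primes_below(2_000_000) covers the same range as the Project Euler problem. The trade-off is memory: the sieve keeps one flag per number below the limit, whereas trial division needs none.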

Go to Part 2 - FFI

The Wrong and Right of Rewrite

Almost every developer has faced it. Staring down code with frustration building until something inside you just snaps. Suddenly you see the light. This code is crap. Useless. There is too much duplication. It’s too hard to understand. There’s no documentation and the guy who wrote it went missing on a jungle safari in the Congo. It’s time to rewrite. Or is it?

The rewrite is an easy trap to fall into. You have to ask yourself, though: am I considering a rewrite for the right reasons? The answer lies in your ability to be honest with yourself. First, a rewrite is NEVER a snap decision. If it took you less than a few days to determine you need to rewrite, then you are 99.99% likely to be doing it for the wrong reasons. Having recently found myself considering a rewrite, I decided to stop myself and think it through. Here is a small list of various reasons for a rewrite. Some good, some bad.

The Wrong Reasons (or the most commonly used excuses)

  • I hate PHP. Or whatever language it is you find a particular distaste for. If this is your reason, you have probably spoiled yourself (I’m talking to most Ruby developers, including myself). Changing languages should only be a factor if you already have legitimate reasons for doing a rewrite.
  • I don’t want to learn the code. The actual typical excuse is that it was just “done wrong”. I think this is the most common excuse for a rewrite. Face it, working with other people’s code is a part of being a Software Engineer.
  • The person that wrote the original code sucks. This is often closely linked to the above. They suck because I cannot understand it.
  • It’s not the way I would have done it. Well of course it isn’t. Code style is almost as unique to a Software Engineer as a fingerprint.

The Right Reasons

  • Unusable in its current state. If the codebase is unusable a rewrite is justified. Keep in mind that unusable code doesn’t mean unreadable code, or code you don’t understand, or code you don’t want to debug.
  • NOBODY can read it. Okay, if it is that unreadable then it is likely that a rewrite is justified.
  • The codebase cannot support the current requirements of the software, either due to language limitations or because the software was created in a way that prevents the requirements being met. A good, albeit controversial, example is Twitter’s move from Ruby to Java.

Lightweight API Authentication With Lua and NGiNX

I was recently looking for a way to easily handle API authentication with as little overhead as possible. So first a little background. At Scan we are following a service-oriented structure, where each service defines and maintains its own API endpoints. The “API” is in fact nothing more than a glorified proxy that forwards the request to the appropriate service, which then handles the logic. This is all handled through NGiNX. More on the pros and cons of that structure in a future post. That being said, my goals were simple:

  • Little to no long-term code maintenance. I want to be able to forget it is even there.
  • Fast. After all, it is authenticating every API request.
  • I wanted it to behave almost like a Ruby on Rails before_filter call. It should authenticate the key before the API endpoints are actually even hit.

Enter the lua-nginx-module. Lua is an extremely lightweight, fast scripting language made popular by its ability to be embedded in almost anything, even in other scripting languages. In this case, the Lua runtime can be compiled directly into NGiNX. The lua-nginx-module conveniently adds an NGiNX configuration option called access_by_lua which seems to fit all of the above requirements nicely:

  • Little or no maintenance - the entire authentication script run by NGiNX is a total of less than 70 lines of code (with comments and logging removed), and depending on requirements could be done in less.
  • Fast - Lua is on average five times faster than Ruby, and is embedded in the actual webserver. It also runs concurrently. Each NGiNX worker gets its own Lua runtime.
  • Prevents access before it hits the first endpoint - The access_by_lua acts in the same way as a simple basic authentication directive would. It will stop the call from ever continuing to the API endpoints if authentication fails.

The best thing is that it is flexible. How the Lua script authenticates is entirely up to the developer. It can read from a file, call a back end service, or even read directly from a database. It is entirely up to you.