Ruby security pitfalls, and how to avoid them
Ruby is a very versatile language. It combines the simplicity of an elegant syntax with powerful features such as support (and encouragement) for monkey patching. Thanks to the popularity of the Rails framework, Ruby is one of the top ten languages in use today. With so many people writing Ruby code, there are plenty of common mistakes that developers can make, and many of them carry serious security implications.
Sharp knives
From its inception Ruby has, quite rightly so, built a reputation for being packed with features focused on developer happiness. A famous quote about Ruby, from the Rails doctrine, goes a little something like this.
Ruby includes a lot of sharp knives in its drawer of features. There’s nothing programmatically in Ruby to stop you using its sharp knives to cut ties with reason.
That's a fairly accurate assessment! When you provide sharp knives, it's only a matter of time before someone ends up seriously hurt.
Let's take a look at some of the common ways Ruby, and Rails, developers can do a lot of damage by overlooking some important security considerations. We'll also see how that damage can be prevented with some minor tweaks and adjustments to the code.
Unsafe deserialization
It's always a bad idea to execute anything that the user might have submitted. Serializing and deserializing user input does not seem like such a bad idea because no code is being run. But it is! In fact, unsafe deserialization is one of the OWASP Top Ten, a basic checklist for web security.
Ruby's built-in YAML library, based on Psych, has support for serializing custom data types to YAML and back. See this serialization code here and the YAML it produces.
Deserializing this YAML gives back the original data type.
As you can see, the line --- !ruby/object:Set in the YAML describes how to re-instantiate objects from their text representations. But this opens up a slew of attack vectors that can escalate to remote code execution (RCE) whenever instantiating an object can run code.
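To make the round trip concrete, here is a minimal sketch. It uses a hypothetical Coupon class instead of Set, because the exact YAML emitted for Set can differ between Ruby versions. The helper name unsafe_yaml_load is also an assumption for illustration. On Psych 4 (bundled with Ruby 3.1 and later) the unrestricted loader is called YAML.unsafe_load, while older versions did the same thing in plain YAML.load.

```ruby
require "yaml"

# Hypothetical custom class, used only for illustration.
class Coupon
  attr_accessor :code, :discount

  def initialize(code, discount)
    @code = code
    @discount = discount
  end
end

# Loads YAML without any class restrictions.
# Psych 4+ names this unsafe_load; older versions did it in YAML.load.
def unsafe_yaml_load(yaml)
  loader = YAML.respond_to?(:unsafe_load) ? :unsafe_load : :load
  YAML.public_send(loader, yaml)
end

# The dumped document starts with a tag naming the class: !ruby/object:Coupon
yaml = YAML.dump(Coupon.new("WELCOME", 10))
puts yaml

# The unrestricted loader rebuilds whatever class the tag names.
puts unsafe_yaml_load(yaml).class
```

The important detail is that the document, not your code, decides which class gets instantiated. With user-supplied YAML, that decision belongs to the attacker.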
Solution
The solution is to use safe-loading. It's a very small change, just using the YAML::safe_load function instead of YAML::load, but it completely blocks loading of custom classes.
Standard types like hashes and arrays can still be serialized to and deserialized from YAML documents just like before, which is what most people wanted to do in the first place.
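A small sketch of the difference, assuming a hypothetical helper named load_untrusted_yaml and a made-up payload. YAML.safe_load raises Psych::DisallowedClass when a document asks for a class that has not been explicitly permitted, and parses plain hashes and arrays as usual.

```ruby
require "yaml"

# Hypothetical payload of the kind an attacker might submit.
untrusted = "--- !ruby/object:Coupon\ncode: WELCOME\n"

# Returns the parsed data, or nil when the document asks for a disallowed class.
def load_untrusted_yaml(text)
  YAML.safe_load(text)
rescue Psych::DisallowedClass
  nil
end

# The tagged document is rejected, so this prints nil.
p load_untrusted_yaml(untrusted)

# Plain hashes, arrays and strings still load normally.
p load_untrusted_yaml("name: Ada\nroles:\n- admin\n- editor\n")
```

Returning nil is just one way to handle the rejection. In a real app you would more likely log the attempt and respond with an error.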
IO hijacking
If your app needs to read or write user-specified files from the disk or make API requests to user-specified URLs, you might be familiar with code that looks like this.
This looks fairly innocuous, right? Ask the user for a file name and print out its contents line-by-line till the end of the file. And it works as you would expect too! But if you don't see the problem, you likely didn't read the documentation thoroughly enough.
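The pattern described above might look something like this sketch. The method name print_file is an assumption. The important part is the bare call to open, which resolves to Kernel#open.

```ruby
# Prints a file line by line. The bare `open` here is Kernel#open, which is
# the risky part: it does more than open files.
def print_file(name)
  open(name) do |file|
    file.each_line { |line| puts line }
  end
end
```

In a real app the name would arrive from a form field, a query parameter or a prompt, for example print_file(gets.chomp).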
You see, the Kernel::open function that we used in the code snippet above has one additional feature: it can also be used to spawn processes and pipe output from them. See what happens when you pass an input that starts with the pipe (|) character.
This executes the date command and pipes the output from that command back to Ruby, which reads it just like it would read a file. Of course a malicious user would be executing commands much more destructive than date. This is basically RCE, a remote user executing code on your servers, with the full privileges available to your web-app.
At the risk of sounding repetitive, never ever run code that comes from an untrusted source such as a user. If you absolutely have to, at the very least limit their access to resources that you have explicitly chosen.
Solution
Never use Kernel::open. The better alternative is to use File::open, URI::open or IO::open, whichever fits your use case. Let's try this again, this time with File::open instead of Kernel::open, and see how it works differently. It's only slightly different but so much more secure. No more access to shell commands.
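Here is a minimal sketch of the safer version, assuming a hypothetical helper named read_user_file. File.open only ever treats its argument as a path, so an input beginning with a pipe is just an odd file name that most likely does not exist. Recent Ruby releases have also started deprecating the pipe behavior of Kernel#open, but relying on File.open keeps the intent explicit on every version.

```ruby
# Reads a user-named file with File.open, which treats its argument as a
# path and never as a command to run.
def read_user_file(name)
  File.open(name) { |file| file.read }
rescue SystemCallError => e
  # Missing files, permission problems and similar errors end up here.
  "cannot read #{name.inspect}: #{e.class}"
end

# The pipe is now part of a (nonexistent) file name, so nothing is executed.
puts read_user_file("| date")
```

Turning the error into a message is only for demonstration. An application would normally surface a proper error to the user, and should also restrict which directories a user-supplied path may point to.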
SQL injection
We've all heard of SQL injection. If you've been living under a rock, this is what it looks like. It's what happens when you directly use user input from the frontend, without sanitization, in an SQL query on the backend.
Let's say you have a table users in which you're searching by a name entered in a form. Here is how you might do this in Ruby.
But this is very very unsafe. Consider this malicious input from a nefarious user.
The query, as seen below on the first line, becomes faulty due to the malicious input, as seen below on the second line. The WHERE clause is now always true and gives you every user in the table.
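The effect is easy to reproduce with plain string interpolation. This sketch assumes a hypothetical helper named unsafe_name_query and a classic malicious input. No database is involved, it only shows the SQL text that would be sent.

```ruby
# Builds a query by pasting user input into the SQL text (do not do this).
def unsafe_name_query(name)
  "SELECT * FROM users WHERE name = '#{name}'"
end

puts unsafe_name_query("Alice")
# → SELECT * FROM users WHERE name = 'Alice'

puts unsafe_name_query("' OR '1'='1")
# → SELECT * FROM users WHERE name = '' OR '1'='1'
```

The quote characters in the input close the string literal early, and the rest of the input is interpreted as SQL.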
As you can see, this is a pretty severe error, and injection is another entry in the OWASP Top Ten.
Solution
The solution is again very simple. Use one of these two ways.
Both of these approaches are designed to sanitize the value before generating the query and are therefore impervious to this attack.
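In a Rails app, two common ways to do this are hash conditions and placeholder conditions. The sketch below assumes a hypothetical ActiveRecord model named User, and the method names are made up for illustration. It only does something useful inside an application where that model exists.

```ruby
# Both methods assume a Rails app with a hypothetical ActiveRecord model `User`.

# 1. Hash conditions: ActiveRecord builds the WHERE clause and quotes the value.
def users_named(name)
  User.where(name: name)
end

# 2. Placeholder conditions: the value is passed separately from the SQL text
#    and is quoted before it is substituted for the question mark.
def users_named_with_placeholder(name)
  User.where("name = ?", name)
end
```

In both cases the user input never becomes part of the SQL text that you write, so quote characters in the input are treated as data rather than as syntax.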
Auto incrementing IDs
When you create a Rails model, it creates the id field for you. It's generally an auto-incrementing integer field. For the most part, this is a sufficient implementation. Integer IDs have the advantage of being simple and since they keep incrementing, there's no chance of collisions. But what they offer in simplicity, they lose in security.
- They make scaling harder than it already is. If a lot of servers operate independently of each other, it's likely that they will end up assigning the same number to different resources.
- They make it easy to identify how many objects you have. A competitor can just sign up and see their own ID to know how many customers you have and can even estimate business metrics like growth rate.
- They make your data easy to loop over and steal. A malicious user could in theory just enumerate and iterate over the entire list one by one and fetch all the resources that haven't been restricted.
- They can leak sensitive information. Unlisted resources that are only visible to link holders rely on the unguessable nature of these IDs to work and would break if these IDs were integers.
These are not hypothetical scenarios by the way. There are many sites that just do this, and consequently many sites have in the past exposed sensitive personal information to people who tried changing the numbers in the URL.
Solution
If you're building a web application in 2020, use UUIDs. Rails makes this way too easy. With this single change, your models will all use UUIDs as primary keys.
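One way to make that change is through the Rails generator settings, sketched below. The initializer location is an assumption, and your database needs UUID support (with PostgreSQL this typically means enabling an extension such as pgcrypto). The guard lets the snippet run outside Rails too, where it simply falls back to showing what a UUID looks like using the standard library. The helper name new_record_id is hypothetical.

```ruby
require "securerandom"

# In a Rails app this would live in an initializer such as
# config/initializers/generators.rb (the file name is an assumption).
# Newly generated models and migrations then use UUID primary keys.
if defined?(Rails)
  Rails.application.config.generators do |g|
    g.orm :active_record, primary_key_type: :uuid
  end
end

# Outside Rails, the standard library can generate the same kind of value.
def new_record_id
  SecureRandom.uuid
end

puts new_record_id
```

Note that the generator setting affects models created afterwards. Existing tables keep their integer keys until you migrate them.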
Or if you want to use integers, use large ones that are chosen at random. Let's take a look at YouTube videos, more specifically let's look at the URL structure of YouTube videos. This is what a YouTube video URL looks like.
The string of seemingly random letters at the end is the video ID. It's basically a random number between $0$ and $64^{11}$ (yes, that's over 73 quintillion numbers!) encoded in Base64. Not exactly a UUID, but not an auto-incrementing integer field either. Why doesn't YouTube use simple integer IDs that go 1, 2, 3... and so on? Well, now you know why.
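If you want IDs in a similar style, Ruby's standard library gets you most of the way. This is only a sketch of the idea, not a description of how YouTube generates its IDs. The helper name random_public_id is hypothetical, and it draws 64 random bits, which is a subset of the $64^{11}$ range mentioned above.

```ruby
require "securerandom"

# Returns an 11-character ID drawn from 64 random bits, encoded with the
# URL-safe Base64 alphabet (A-Z, a-z, 0-9, "-" and "_").
def random_public_id
  SecureRandom.urlsafe_base64(8)
end

puts random_public_id
```

Random IDs can collide, however rarely, so a unique index on the column and a retry on conflict are still a good idea.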
Slicing tomatoes
Yes, Ruby's power of expression makes it easy to introduce security issues but the solution is not to switch off these features or migrate to a different language. This brings us back to the Rails doctrine quote from earlier. The very next sentence to that quote is this.
We enforce such good senses by convention, by nudges, and through education. Not by banning sharp knives from the kitchen and insisting everyone use spoons to slice tomatoes.
The key takeaway from all of these examples is to never trust your users. User-provided input should not be deserialized, evaluated or rendered directly. The only way to stay safe is to be careful about what you write and to thoroughly audit what you have written.
Static code analyzers
Static code analysis tools such as linters and vulnerability scanners can help you find a lot of issues before they get exploited in the wild. A good linter like RuboCop can find and notify you of security problems like unsafe deserialization and IO hijacking, and can offer suggestions as to possible fixes. Brakeman, a popular vulnerability scanner, can find SQL injection among a laundry list of other possible security lapses. Both of these are free and open source tools that you should absolutely be using in your development and CI workflows.
DeepSource
You should also consider automating this entire audit and review process using code review automation tools like DeepSource, which scans your code on every commit and for every PR through its linters and security analyzers, and can automatically fix a multitude of issues. DeepSource also has its own custom-built analyzers for most languages that are constantly improved and kept up-to-date. And it’s incredibly easy to set up!
Who knew it could be so simple?
Have fun slicing tomatoes, and be careful not to hurt your fingers in the process!