20 Mar 2017, 22:56

A tour of Rust, Pt. 2

In my last post, I talked about what I think is the significance of the programming language Rust, and why I wanted to try learning it. Today, I take a look at Exercism.io (a sort-of social network for programming exercises) and its series of Rust challenges. These are definitely easy problems (so far); the focus is on learning how to solve them in a language that is new to you. We’ll see where I tripped up while trying to grasp some of the Rust language features, and some Rust I’ve learned so far.

Exercism.io Setup and Workflow

Assuming you also use MacOS with Homebrew, it’s just a couple of steps:

$ brew update && brew install exercism
$ exercism configure --key=YOUR_API_KEY
$ exercism configure --dir=~/Documents/Exercism

And then each new programming exercise is fetched this way:

$ exercism fetch rust		# will download the next exercise, "whatever"
$ cd ~/Documents/Exercism/whatever
$ mkdir src && touch src/lib.rs   # then, work on your solution in lib.rs

Each exercise asks you to write a Rust library that exports a function or two, such that you implement some behavior instructed in the README.md for that exercise. The folder structure provided is that of a Rust “crate”, which is what Rust (or rather, its build tool, Cargo) calls a source package. You will define a pub fn whatev() in src/lib.rs, which is the filename convention for a Rust library crate (as opposed to an executable crate, which would have a fn main() defined in a src/main.rs). Cargo.toml is the manifest file that defines the crate: the version string, the dependencies, author.

Each challenge comes with unit tests in tests/whatever.rs (technically, integration tests. Rust allows you to write the unit tests inline with the source files themselves). You can run the tests using Rust’s build tool, Cargo:

$ cargo test	# both compiles your library and runs the tests

If you just wanted to compile, you could run cargo build. Were this an executable crate rather than a library crate, you could also cargo run, but with a library crate run has no meaning, so we cargo test. Note: if you had noticed that the binaries are rather large, that is because Cargo builds debug binaries by default. For release, you would use cargo build —release.

Once all of the unit tests pass for an exercise, you can submit your solution like so:

$ exercism submit src/lib.rs

Rust Debugging in the VS Code editor

Although I discussed setting up a Rust development environment in the last post, Andrew Hobden has also documented the setup and use of other tools in the Rust development toolchain, and his writeup may be worth a look. For me right now, I don’t want the added headache of trying to work with any alpha or “nightly build” features of Rust. What was important for me was debugging in VS Code, so I appreciated his help with that. While I previously had no luck getting the “Native Debug” VS Code extension to work, using Andrew’s instructions, I did get the “LLDB Debugger” extension to work.

But do you need to manually recreate .vscode/launch.json and .vscode/tasks.json for every project you want to debug? That blows – Well sort of. In VS Code you can click the debug icon in the sidebar and then the settings wheel icon in the debug pane that appears, and VS Code will create a mostly-complete launch.json file to which you just have to add:

"preLaunchTask": "cargo",
"sourceLanguages": ["rust"]

And of course, you’ll have to fix the final part of the path for “program” (your debug target). So there isn’t that much to do manually for each new project. But when you tell VS Code to run a “preLaunchTask”, as above, you then have to define that “task” in a .vscode/tasks.json file, but it’s the same every time, just copy and paste it from your last project. A hassle compared to debugging with a real IDE, but a minor hassle at least.

Exercism Exercises 1-10

1: Hello World, and strings in Rust

It looks like the developers of this challenge changed their answer format somewhere along the way during development, and now it actually contains conflicting instructions. Fortunately, this is the only challenge with this problem, but ignore the README.md this time as well as the GETTING_STARTED.md. As with most of these challenges, the most important file is tests/hello-world.rs which defines the Cargo unit tests and gives the guiding examples of what your code is supposed to produce. In this case, it is very simple, you just need to produce the string “Hello, World!” using a function called fn hello.

But what is the correct function declaration for fn hello? First, it has to be a function that is published by your Rust library for external callers, thus it is pub fn hello.

It is not taking any arguments (despite what the muddled insructions state), so it is pub fn hello().

And it returns a string, so it is…uh-oh. Here, Rust makes this harder than you might expect. There is the primitive type for representing strings, str, and then there is a String type (from the Rust standard library). They seem similar and completely redundant at first, but their purposes and usages are different in a variety of ways that will trip you up if you don’t understand what each one means. This duality of Strings and string literals is essential to understand, and it is poorly explained (if explained at all) in every Rust tutorial I’ve seen. If people need to write long posts explaining the difference, I think the language documentation could be doing a better job here.

Complicating matters, str is used synonymously in Rust documentation with “string” and “string literal”, and a reference to a subset of a String is an &str, a.k.a. “string slice”. In fact, a function that takes a &str can be passed a &String (a concept in Rust called coercion, i.e. implicit type conversion), but a function that takes a &String cannot be passed a &str. Wow. Confused yet? Just wait until you try to concatenate two strings. We’ll get to that later.

If you choose to use String, the return type is simple to understand, but you have to build a String instance out of a string literal using .to_string() or String::from, which is non-obvious:

pub fn hello() -> String {
  "Hello, World!".to_string()  // alternatively, String::from("Hello, World!")
}

If you choose instead to use str, the actual returned value needs nothing special, but the return type is by borrowed reference (hence the ampersand) and requires a lifetime specifier, something unique to Rust:

pub fn hello() -> &'static str {
    "Hello, World!"
}

This is to say, hello() returns a reference to an immutable string literal. In other words, a pointer to the string “Hello, World!” and the caller of hello() cannot change that string using a dereference of this pointer. The reference is valid for static, a lifetime duration defined as the “duration of the entire program.” This is basically a guarantee to the caller that this reference will always be valid. String literals will always get a static lifetime because they’re hard-coded in the compiled Rust binary’s data section; they are never deallocated.

So, with a simple HelloWorld example we’ve had to introduce ourselves to the three big concepts unique to Rust: ownership, reference borrowing, and lifetimes. We’ve also tripped over the str/String duality and the concept of coercion. As we struggle to comprehend these concepts, they’ll be responsible for the majority of our compile-time errors. This is the Rust learning curve.

2: Gigasecond, and including external crates

Hint, for this one, you’ll be needing the Chrono crate, because the Rust standard library currently has no library for handling the concept of time. Your lib.rs file begins with:

extern crate chrono;
use chrono::*;

And your Cargo.toml file declares this dependency as well:

[dependencies]
chrono = "0.2"

When you cargo test, Cargo will automatically fetch and install the crate “Chrono” for you. Nice! Now you can add seconds to times and compare times to one another.

The instructions for this challenge may mislead you to try to use the caret as an exponentiation operator:

A gigasecond is 10^9 (1,000,000,000) seconds.

Yes, but Rust (like C before it) lacks an exponentiation operator. Not only does 10^9 not define the value 1_000_000_000, it also doesn’t generate a compile-time error. Instead, it is interpreted as the bitwise XOR of 10 and 9: in other words, 10^9 equals 3 (surprise, LOL). Again, the official Rust documentation (“The Rust Programming Language”) is a bit lacking with its complete absence of explanation of the operators that Rust actually has and does not have, a fundamental part of any language. Instead, you should consult the “Rust Language Reference” for this information. That said, if you really want to do exponentiation, several of the primitive types have exponentiation methods: the floating point types f32 and f64 which offer the powi(i32) method, and the integer types i32 and i64 which offer pow(u32).

let ten = 10_i64;
ten.pow(9)  // this is 1,000,000,000

3: Leap, and the modulus operator

There is little to learn from this exercise except the proper use of the % operator, which, again, was up to you to find in the Language Reference. It’s an “either you know this trick or you don’t” challenge, but popular in whiteboard programming questions in job interviews, and occasionally useful in real life. Example snippet:

// On every year that is evenly divisible by 4:
    if candidate_year % 4 == 0 {
        // Except every year that is evenly divisible by 100:
        if candidate_year % 100 == 0 {

4: Raindrops, and the modulus operator again

This is a simple integer-to-string (hint: some_value.to_string()) and integer-factoring challenge. Again, the modulo operator is all you need, and this exercise fails to add any new lesson really.

5: Bob, and iterators

Given some &str prompt you can iterate over every character in the string in a for loop, without having to use pointers or indices as a C programmer is tempted to do:

for character in prompt.chars() {
	// do stuff to each character
}

And in fact, it is basically impossible to loop across a str in any other way, because you cannot use array indexing on a str as you might with a string in C, and create a for loop that ranges from prompt[0] to prompt[prompt.len()]. Even for Rust types where that pattern is possible, it is discouraged: find your loop ranges using iterators, which are returned by methods like .chars() or .iter(). The code above automatically turns character into a value of type char because prompt.chars is a range of char values.

Rust’s char type has some handy methods, for example: if character.is_alphabetic() and if character.is_uppercase().

6: Beer Song, string concatenation, and the match statement

String concatenation in Rust is completely bonkers:

let a = "foo";
let b = "bar";

println!("{}", a + b);                          // invalid
let c = a + &b;                                 // invalid
let c: String = a + b.to_string();              // invalid
let c: String = a.to_string() + b.to_string(); 	// invalid

let c: String = a.to_string() + b;              // valid!
let c: String = a.to_string() + &b.to_string(); // valid!
c.push_str(" more stuff on the end");           // valid!

The strings a and b here are str instances. The str type lacks any kind of concatenation operator, so you can’t use a + to concatenate them when the left operand is a str. However when the left operand is a String you totally can use the + because String does have the concatenation operator.

The String type is growable, whereas str is an annoyingly restricted type of data that mucks up everything it touches. You can’t build a str out of other str; you can’t even build a String out of two str without bending yourself into a pretzel. You may seek to avoid str altogether, but you can’t. Because every Rust string literal is a str, we are forced to work with both str and String, upconverting the former to the latter with .to_string(), and/or connecting them onto the end of a String with its .push_str() method.

But at least you can use the match keyword to help with this challenge:

// Form the appropriate English words to refer to the bottle or bottles:
fn bottles(quantity: u32) -> String {
    match quantity {
        0 => "No more bottles".to_string(),
        1 => "1 bottle".to_string(),
        _ => quantity.to_string() + " bottles",
    }
}

That’s a lot cleaner than an if—else-if—else would have been.

7: Difference of Squares

This one should be a review, you can use iterators (in the form of for-loop ranges), and exponentiation isn’t necessary if you just want to do squares:

let somevalue = 123;
let mut square;
square = somevalue * somevalue;

8: Sum of Multiples

Another review challenge. Tests you again on using iterators, and the modulus operator (“is a multiple of x” is synonymous with “is cleanly divisible by x”). You might use nested loops, or maybe something fancier like closures. I think this post is long enough without addressing that concept.

9: Grains, and the panic macro

Back on exercise 2 we learned how to do exponentiation in Rust, so that’s half of this challenge. The other hint is that the unit tests are testing for error conditions indicated by “panic.” The way to panic in Rust is via the panic macro:

if s < 1 || s > 64 {
	panic!("Square must be between 1 and 64");
}

10: Hamming, unwrap, and the Result type

And finally (for now), we learn how to multiplex a return value and an error condition into one type, Result.

If we choose the return value for our function to be -> Result<i32, &'static str> then we are declaring that we might return either Ok(123) or Err("Something went wrong").

Some of the unit tests are checking to see if the returned condition is an error of any kind: .is_err(). Using Err() satisfies that check.

Rust Initial Impressions

So after getting beyond “Hello world” and trying a few exercises, my initial impressions of Rust as a language are that its strictness is its defining characteristic. You could even say it’s a pain, honestly, not what I could call a joy to work in (although speaking ill of Rust invites the fans to show up and blame you for failing to love it). The payoff doesn’t have to be rapid prototyping joy, though, it just has to be the more secure code that you are ostensibly creating by being so strict and explicit about everything. That’s okay too.

The Good The Bad
Cargo build tool The official documentation for Rust’s language and standard library
Passionate community Condescending comments like “I can tell you’re an imperative language guy” when you don’t use closures
rustfmt for automated code style enforcement (cargo fmt) Learning curve for the errors emitted by the Rust compiler
Rust can be debugged with rust-lldb or rust-gdb and this mostly works within VS Code Your errors will all be compile-time anyway, for better or worse
Expressive method names like .is_alphabetic() are a welcome improvement to C standard lib Any time you have to use strings (String vs str, string concatenation, etc.) you will wonder if Rust will ever catch on

27 Feb 2017, 13:06

A Look at the Rust Programming Language

Where to Find More Execution Performance

Moore’s Law is just about done. It once described a trend of transistor count doubling every 24 months (enabled by increasing the density of transistors by making them ever-smaller). Now:

Between the introduction of 65 nm and 45 nm chips, about 23 months passed. To get from 45 nm to 32 nm took about 27 months, 28 months to go down from there to 22 nm and 30 months to shrink to the current 14 nm process. And that’s where Intel has been stuck since September 2014.

Intel might release 10nm scale chips in late 2017, which would mean that they worked 36-40 months in order to shrink from 14nm to 10nm scale. In other words, the most recent density doubling (the shrink from 22nm to 10nm), by the time it happens, will have taken over 5 years. The next doubling is likely to take at least that long, assuming the multiple breakthroughs required to do so can even be achieved. 10nm is already fairly close to the atomic scale: ~45 silicon atoms across (one atom: 0.22nm). One of the obstacles at this scale to be addressed is quantum tunneling, not that I pretend to understand it.

Of course, Moore’s Law can be satisfied one other way without changing density, which is to simply use bigger and bigger processor dies. You may have seen charts showing that transistor count continues to increase on schedule with Moore’s Law, but this is only true for dedicated GPUs and high-end server CPUs, which are already up against cost practicality limits due to these die sizes.

Even if we were still on track for Moore’s Law, increasing transistor counts alone have provided diminishing returns as of late. Recent density increases have mainly just served to reduce power draw and to make more space on the CPU die dedicated to graphics rendering (an ideal parallelizable task). Tech being an optimistic culture makes it slow to acknowledge the obvious truth here: CPU cores aren’t getting significantly faster. Unless your work is on a mobile device or can be delegated to a GPU or server farm, your only performance upgrades since 2010 have been I/O-related ones.

Granted, transistor density improvements have continued to increase CPU power efficiency. But I have a Intel “Core i7” (2.66 GHz i7-620M, 2-core) laptop that will turn 7 years old in a couple of months, and today’s equivalent CPUs still offer only a marginal performance improvement for tasks that aren’t 3D graphics. The equivalent CPU today, the Intel “Core i7” (2.7GHz i7-7500U, 2-core), has single-threaded performance only about 60% better than my CPU from 7 years ago. Not enough to make me throw out my old laptop.

All of this background is to make my point, which is that the next performance leap has to come from improved software, rather than relying on “free” improvements from new hardware. A few software methods for achieving a generational improvement in performance might be:

  • Parallelism
  • Optimizing compilers
  • Moving tasks from interpreted languages back to compiled languages

All of these things are already happening, but it’s the last one that I’m interested in most.

Parallelism

Parallelism has brought great performance improvements in graphics, “AI,” and large data set processing (so-called “Big Data”), and is the reason why GPUs continue to march forward in transistor count (although, again, check out those increasing die sizes; those are approaching their own limits of practicality). The problem with parallelism, though, is that while there are some workloads that are naturally suited to it, others aren’t and never will be. Sometimes, computing Task B is dependent on the outcome of Task A, and there is just no way to split up Task A. Even when parts of a task can be parallelized, there are swiftly diminishing returns to adding more cores, as described by Amdahl’s Law. What parallelized processing does scale well for is large data sets, although the home user is not typically handling large data sets, and won’t directly benefit from this kind of parallelism.

Optimizing Compilers

Here are Daniel J Bernstein’s 2015 slides about the death of “optimizing compilers,” or rather, that despite all the hype about them, we are still manually tuning the performance critical portions of our programs. The optimizing compilers’ optimization of non-critical code portions is irrelevant, or at least not worth the effort put into optimizing compilers. It appears that a compiler to generically optimize any code as well as an expert human could, would require something like a general AI with a full contextual understanding of the problem being solved by the code. Such a thing doesn’t exist, and is not on the horizon.

Better (Safer) Compiled Languages

C and C++ never really left us, and neither have all of the inherent memory errors in code programmed in C and C++. That includes Java, whose runtime is still written in C. The Java runtime has been the source of many “Java” security issues over the years, to the point where the Java plug-in was effectively banned from all web browsers. Despite that, the rest of the browser is also written in C and C++, and just as prone to these problems. There hasn’t been any viable alternative but to try to sandbox and privilege-reduce the browser, because any safer language is too slow.

The real cost of C and C++ ’s performance is their high maintenance burdens: coding in them means always opening up subtle concurrency errors, memory corruption bugs, and information leak vulnerabilities. This is why simply improving the C++ standard library and adding more and more features to the language has not altered its basic value proposition to developers, who have already fled to “safe” languages.

That’s where the experimental language, Rust, comes in. It’s a compiled systems programming language with performance on par with (or better than) C++, but with compile-time restrictions on memory management and concurrency that should prevent entire classes of bugs. At some point in the next 5 years, I predict that we will see Rust (or something like it, whether it’s Swift or some new really strict C++ compiler) slowly start replacing C/C++ wherever performance and security are both primary concerns. It’s exciting to think that a well-designed compiled language could solve most of the reasons for the ~20-year flight away from native code programming.

Having played with Rust for a few days, I can say it will certainly not replace Python for ease of development, but it’s a really interesting disruptor for anyone writing native code. Security researchers should also take notice.

Rust Programming Language

For what it’s worth, Rust was the “Most Loved Programming Language of 2016 in the Stack Overflow Developer Survey.” It enforces memory management and safety at compile-time. Some memory safety features of the language include:

  • Rust does not permit null pointers or dangling pointers. Since pointers are never NULL, you can always safely dereference a pointer.

  • There are no “void” pointers.

  • Pointers can not be downcast to a more specific type, only upcast to a more generic type. If generic data structures are needed, you use parameterized types/functions.

  • Variables can be allocated on the heap and are cleaned up without the need for “free” or “delete.”

  • Concurrent-access race conditions are impossible, because every piece of data is either:

    • mutable (reference from a single “owner” at a time, owner re-assigned if needed) OR
    • immutable (multiple references can exist)

(there can be only one mutable reference, or an aribtrary number of immutable references to the same allocation, but never both [credit: @vitiral])

If you just wanted a statically typed, compiled language with a modern standard library that is easy to extend, you could also choose Go. But Rust claims to be all of that, plus faster and safer. Rust will work in embedded devices and other spaces currently occupied by C/C++; Go will not. Some think Rust is just fundamentally better, but I am not qualified to judge that.

Rust and parallelism

Rust makes parallelization an integral part of the language, with support for all of the necessary parallel programming primitives. Parallelized versions of various programming constructs can be swapped in without changing your existing code. This is possible because the Rust language forces the programmer to specify more about how data will be used, which prevents race conditions at runtime by turning them into errors at compile time, instead.

Concept of “Ownership” in Rust

The major innovation of the Rust language (inspired by a prior language, “Cyclone”) is that its compiler, in order to do memory management and prevent race conditions at compile time, tracks “ownership” of all variables in the code. Once a variable is used (like in a call to a function) it is considered to be passed to a new “owner,” and using it in a subsequent statement is illegal and would trigger a compiler error. If the developer’s intention was to copy-on-use (“clone”), they must specify that in their code. For certain simple data types (integers, etc.), they are automatically copied-on-use without any explicit intent from the developer. Another aspect of ownership in Rust is that all variables are (what in C/C++ would be called) const, by default. In Rust, if you want a variable to be mutable, it has to be explicitly stated in the declaration.

This concept is the foundation of the Rust language. It’s hard to grasp at first, since it is very different from programming in C or C++, or even Java. The most detailed explanation of Rust ownership that I’ve seen is this article by Chris Morgan, but to actually learn the concept I’d recommend starting with this 25 minute video by Nikolas Matsakis.

At first, it seems like another mental burden on the programmer, but adopting this concept of memory management means the programmer is also relieved of having to manage memory with carefully paired calls to malloc() and free() (or new and delete). “So what, isn’t this what you get with C# or Java?” Not quite: those languages use a Garbage Collector to track references to data at runtime, which has an inherent performance overhead and whose “stop-the-world” resource management can be inconsistent and unpredictable. Rust does it in the language, at compile time. So, without the use of a Garbage Collector, Rust makes memory management (and concurrent access to data) safe again.

Rust is a Drop-In Replacement for C

Just like C/C++, Rust can be coupled to Python or any other language with a native interface, in order to leverage the strengths of both. And, debugging Rust programs is officially supported by GDB. This works the other way around too, i.e., you can build a Rust program on top of native code libraries written in C/C++. Mozilla is even working on a web browser engine in Rust, to replace Gecko, the Firefox engine. Benchmarks in 2014 showed a 300% increase in performance vs Gecko, and by early 2016, it was beating Webkit and Chrome as well (at least in some hand-picked benchmarks where they leverage Rust’s ease of parallelism to delegate a bunch of stuff to the GPU). If you’re interested in the details of how Rust can improve browser engines, Mozilla wrote about it here. Buried in the paper is a detail that they seem to have downplayed elsewhere, though: the new browser engine is actually still bootstrapped by an existing codebase, so it’s still 75% C/C++ code. On the other hand, that also goes to show how Rust integrates well with C/C++.

Rust has a Package Manager, which is also its Build Tool

Makefiles are impossible to write and debug, and basically you’re always just copy-pasting a previous Makefile into the new one, or hoping an IDE or build tool abstracts away all that crap for you, which is why this wheel has been reinvented many times. I generally don’t have a favorite build tool (they’re all bad), since it always seems to come down to a manual troubleshooting cycle of acquiring all the right dependencies. The worst is having a build system that is a big layer cake of scripts on top of XML on top of Makefiles.

Rust package manager “Cargo” simply uses TOML files to describe what a Rust project needs in order to build, and when you build with Cargo, it just goes out and gets those dependencies for you. Plus, the packages are served from Crates.io, so if you’re keeping score that’s a double tech hipster bonus for using both the .io domain and TOML.

Installation and Hello World

Assuming you’re using MacOS like me (there is plenty of info out there already for Windows and Linux users) and you have Homebrew:

    $ brew install rust
    $ rustc --version
    rustc 1.15.0

You probably want an editor with Rust syntax highlighting and code completion. These are your choices. I went with Visual Studio Code, aka VS Code. It’s not what I’d call an IDE, and I still haven’t gotten it to integrate with a debugger, but hopefully JetBrains will step up and make a Rust IDE – once there is a market for it.

VS Code doesn’t understand Rust out of the box. Launching VS Code, hit Command-P to open the in-app console:

ext install vscode-rust
(install the top search result, should be the extension by kalitaalexey)

Optionally, you can install a GDB/LLDB integration layer to attempt to debug from VS Code (in theory – YMMV but I haven’t gotten it to work for LLDB with C++ yet, let alone Rust):

ext install webfreak.debug
(install the top search result)

Notice in the bottom right: “Rust tools are missing” … click install. It will invoke Cargo (the Rust package manager) to download, compile, and install more of the Rust toolchain for you: racer, rustfmt, rustsym, etc. And all of the dependencies for those. Go have a coffee, this will take a while. About 18 minutes on my system.

Finally: close VS Code, and open up Terminal so we can put all these new Rust binaries on your $PATH.

$ open -a /Applications/TextEdit.app ~/.bash_profile

Add the line export PATH="/Users/yourusername/.cargo/bin:$PATH" and save.

Open a new instance of VS Code. It should no longer tell you that Rust tools are missing. 👍🏻

Test the environment with a Hello World in Rust! Save the following as hello.rs:

fn main() {
    println!("Hello World!");
}

Open “View -> Integrated Terminal.” From here you can compile by hand like a peasant, because VS Code isn’t an actual IDE.

bash-3.2$ cd ~/Desktop
bash-3.2$ rustc hello.rs
bash-3.2$ ./hello
Hello World!

But for a realistic scenario, we could have also used Cargo to both create a new Rust project and then build it.

In a future post, I will share my thoughts on what it’s like to try to actually write a program in Rust.

Rust References