Rvalue references and move semantics 101

Rvalue references and move semantics can hardly be counted among the beginner’s topics in C++. After first encountering them in one of Scott Meyers’s books, I felt deeply confused. Like, what would you ever need them for? And why make an already complex language even more so? It turns out that not only are rvalue references useful, they also simplify common patterns!

Dynamic resource management, the old way

Take a look at the following rudimentary String implementation:

#include <cstddef>
#include <cstring>
#include <ostream>

class String {
private:
  std::size_t length_ {0};
  char *data_ { new char[1] { '\0' }};

public:
  String(const char * str):
    length_{ std::strlen(str) },
    data_{ new char[length_ + 1] }
  {
    std::memcpy(data_, str, length_ + 1);
  }

  String(const String& other):
    length_{ other.length_ },
    data_{ new char[ length_ + 1]}
  {
    std::memcpy(data_, other.data_, length_ + 1);
  }

  String& operator=(const String& other) {
    if (this != &other) {               // guard against self-assignment
      if (length_ != other.length_) {   // reallocate only if sizes differ
        auto data = new char[other.length_ + 1];
        delete[] data_;
        data_ = data;
        length_ = other.length_;
      }
      std::memcpy(data_, other.data_, length_ + 1);
    }

    return *this;
  }

  ~String() noexcept {
      delete[] data_;
  }
    
  friend inline std::ostream& operator<<(std::ostream& out, const String& str) {
    return out << str.data_;
  }
};

String manages a dynamic memory resource (the char array data_). To do so, it implements the RAII idiom and, adhering to the rule of three, defines three special member functions:

  • copy constructor,
  • copy assignment operator,
  • destructor.

Other classes can use String to store text data without worrying about memory management:

class File {
private:
  String name_;

public:
  File(const String& name):
    name_{ name }
  {}
      
  friend inline std::ostream& operator<<(std::ostream& out, const File& file){
    return out << "File{ name='" << file.name_ << "' }";
  }
};

The snippet of code below demonstrates how String and File are used together. When executed, it prints data.in, followed by File{ name='data.in' } on the next line.

#include <iostream>

int main() {
  String name{ "data.in" };
  File file{ name };
  std::cout << name << '\n';
  std::cout << file;
}

Remarkably, owing to the RAII guarantees, the heap memory claimed by name and by file’s name_ member is automatically released when those objects go out of scope. You can verify that by adding an output statement to String’s destructor (and including <iostream>):

String::~String() noexcept {
    std::cout << "Destroying String with content: '" << data_ << "'\n";
    delete[] data_;
}

And yet, can you spot a disturbing issue with the current implementation of String and File? Yes, it is…

Copies, copies everywhere

The String object name, which holds data.in, is used to create a File instance. In fact, that’s its only purpose; there is no other reason for it to exist. It’s created and then passed to File’s constructor, where a copy of it is made to initialize File::name_. That makes two String objects coming to life in this short program. One of them is an unnecessary copy, and you should get rid of it. As a first attempt, you could skip creating name altogether by writing:

File file{ "data.in" };

This works because of the implicit conversion from a string literal to String, provided by String’s converting constructor String(const char*). But an extra object is still created, and a copy is still made. The additional String object appears as the argument to File’s constructor. It only lives until the File has been constructed, bound to the constructor’s const reference parameter, but it’s there and it’s being copied. That’s just silly: why would you need two Strings when one suffices?
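To make the extra object visible, here is a minimal sketch that counts String constructions, copies and destructions. The static counters (constructed, copied, destroyed), the check function and the trimmed-down String are hypothetical instrumentation, not part of the article’s class; copy assignment is deleted only because the sketch doesn’t need it. It assumes C++17 for the inline static members.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Trimmed-down String with hypothetical counters (C++17 inline statics).
class String {
public:
  static inline int constructed = 0;  // every constructor call
  static inline int copied = 0;       // copy-constructor calls only
  static inline int destroyed = 0;    // destructor calls

  String(const char* str)
      : length_{std::strlen(str)}, data_{new char[length_ + 1]} {
    std::memcpy(data_, str, length_ + 1);
    ++constructed;
  }

  String(const String& other)
      : length_{other.length_}, data_{new char[length_ + 1]} {
    std::memcpy(data_, other.data_, length_ + 1);
    ++constructed;
    ++copied;
  }

  String& operator=(const String&) = delete;  // not needed in this sketch

  ~String() noexcept {
    delete[] data_;
    ++destroyed;
  }

private:
  std::size_t length_;
  char* data_;
};

class File {
public:
  File(const String& name) : name_{name} {}

private:
  String name_;
};

// Builds a File straight from a literal and checks what it cost.
void check_literal_costs_one_copy() {
  String::constructed = String::copied = String::destroyed = 0;
  {
    File file{"data.in"};  // temporary String, then a copy into name_
    assert(String::constructed == 2);
    assert(String::copied == 1);
    assert(String::destroyed == 1);  // the temporary argument is already gone
  }
  assert(String::destroyed == 2);    // RAII: both buffers were released
}
```

The temporary dies at the end of the declaration, so only the copy inside file survives until the closing brace.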

Stealing an object’s content

There is a solution to this craziness. The main burden of String is the ownership of a piece of heap memory. That’s a resource that can be “stolen” by another String. Stealing is done by hijacking the ownership of the heap-allocated memory:

class String {
public:
  void steal_content(String& other) {
    // stealing from yourself would only empty the string
    if (this == &other) return;

    // delete the memory currently owned
    delete[] data_;

    // actual "stealing"
    length_ = other.length_;
    data_ = other.data_;

    // make sure the other object is still valid
    // and that it doesn't refer to its old data
    other.length_ = 0;
    other.data_ = new char[1]{};
  }
};

In steal_content, after the memory is stolen, the freshly robbed other object is set to the valid state of an empty string. You don’t want two Strings pointing to the same heap memory: both destructors would call delete[] on it.
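The effect of steal_content can be checked with a small sketch. It assumes two read-only accessors, size() and c_str(), that the article’s String doesn’t have, and it deletes copying only to keep the class short. The point to observe is that the buffer pointer changes owner while no characters are copied.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

class String {
public:
  String(const char* str)
      : length_{std::strlen(str)}, data_{new char[length_ + 1]} {
    std::memcpy(data_, str, length_ + 1);
  }
  String(const String&) = delete;             // copying is not needed here
  String& operator=(const String&) = delete;
  ~String() noexcept { delete[] data_; }

  void steal_content(String& other) {
    if (this == &other) return;  // self-steal would only empty the string
    delete[] data_;              // release the buffer we own
    length_ = other.length_;     // take over other's buffer
    data_ = other.data_;
    other.length_ = 0;           // leave other as a valid empty string
    other.data_ = new char[1]{};
  }

  // hypothetical accessors, used only by the checks below
  std::size_t size() const { return length_; }
  const char* c_str() const { return data_; }

private:
  std::size_t length_;
  char* data_;
};

void check_steal() {
  String from{"data.in"};
  String to{""};
  const char* buffer = from.c_str();

  to.steal_content(from);
  assert(to.c_str() == buffer && to.size() == 7);        // same block, new owner
  assert(from.size() == 0 && from.c_str()[0] == '\0');   // valid and empty
}
```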

Naturally, stealing is not really the official name for what happens in steal_content; it’s usually called moving. Naming details aside, you can use this function in File to avoid making a copy of the name argument:

class File {
private:
  String name_;

public:
  File(String& name):
    name_{ }    // default initialize
  {
    name_.steal_content(name);  // move the name object
  }

};

But this has a nasty side effect. Because the constructor’s parameter became a non-const lvalue reference (otherwise the moved-from object couldn’t be modified), the constructor cannot be used as before:

File file{ "data.in" };
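The line above fails because of the reference binding rules. The following sketch uses std::string as a stand-in for String and a hypothetical take() overload pair to show which reference each kind of argument binds to:

```cpp
#include <cassert>
#include <string>

// Hypothetical overload pair; std::string stands in for String.
int take(const std::string&) { return 1; }  // const lvalue reference
int take(std::string&) { return 2; }        // non-const lvalue reference

void check_binding() {
  std::string name{"data.in"};
  const std::string const_name{"data.in"};

  assert(take(name) == 2);                    // modifiable lvalue: String& wins
  assert(take(const_name) == 1);              // const lvalue: only const& fits
  assert(take(std::string{"data.in"}) == 1);  // temporary: only const& binds
  assert(take("data.in") == 1);               // conversion creates a temporary
}
```

Remove the const& overload and the last two calls stop compiling, which is exactly what happens to File when its only constructor takes String&.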

Only a const lvalue reference can bind to the temporary String that’s implicitly created from “data.in”. Naturally, you could provide two constructor overloads, one taking a const String& and one taking a String&. That could work in this case, but the approach doesn’t scale to other scenarios. What if a const/non-const overload pair already exists in the codebase? Perhaps a silly one, like in the example below:

class File {
private:
  String name_;

public:

  // throws on invalid file names
  File(const String& name):
    name_{ throw_if_invalid(name) }
  {}

  // tries to fix the file name if it's invalid
  File(String& name):
    name_{ try_repair(name) }
  {}
};

A common solution to exploding overload sets is using a tag. A tag is a special, empty structure that’s added to the parameter list of a function to distinguish between overloads. Something like this suffices:

struct MovableTag{};

You could add support for MovableTag to both String and File, leading to:

class String {
public:
  // copy constructor
  String(const String& other);  

  // move constructor with a tag
  String(String& other, MovableTag):
    length_{ other.length_ },
    data_{ other.data_ }
  {
    other.length_ = 0;
    other.data_ = new char[1]{};
  }

};

class File {
public:
  // throws on invalid file names
  File(const String& name):
    name_{ throw_if_invalid(name) }
  {}

  // tries to fix the file name if it's invalid
  File(String& name):
    name_{ try_repair(name) }
  {}

  // just moves name, tagged overload
  File(String& name, MovableTag):
    name_{ name, MovableTag{} }
  {}
};

As a point of interest, a parameter of type MovableTag is left unnamed in function definitions. This has two advantages. First, there’s no compiler warning about an unused parameter. Second, it clearly signals that the parameter exists only for overload resolution. (Since an empty tag carries no data, compilers can usually optimize it away either way.)
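Here is a self-contained sketch of the tag technique in isolation. Holder is a hypothetical class that uses std::string as its payload and std::string::swap to do the stealing; the unnamed MovableTag parameter only steers overload resolution.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

struct MovableTag {};  // no data: it only selects an overload
static_assert(std::is_empty_v<MovableTag>);

// Hypothetical holder; std::string stands in for the article's String.
class Holder {
public:
  explicit Holder(const std::string& s) : value_{s} {}  // copies s

  // Tagged overload; the tag parameter is deliberately left unnamed.
  Holder(std::string& s, MovableTag) {
    value_.swap(s);  // value_ starts empty, so s ends up empty
  }

  const std::string& value() const { return value_; }

private:
  std::string value_;
};

void check_tag() {
  std::string name{"data.in"};

  Holder copy{name};                 // copy overload: name is untouched
  assert(copy.value() == "data.in" && name == "data.in");

  Holder moved{name, MovableTag{}};  // tagged overload: name is robbed
  assert(moved.value() == "data.in" && name.empty());
}
```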

With both File and String taking advantage of the tag, the code becomes:

#include <iostream>

int main() {
  String name{ "data.in" };
  File file{ name, MovableTag{} };
  std::cout << name << '\n';
  std::cout << file;
}

It’s a bit uglier, but it works just fine, printing an empty line (name has been robbed of its content), followed by File{ name='data.in' }. Sadly, besides the subjective ugliness, tagging does not extend to a move assignment operator. In other words, making something along these lines work:

class String {
public:
  // ill-formed: an overloaded operator= takes exactly one parameter
  String& operator=(String& other, MovableTag) {
    if (this != &other) {
      delete[] data_;
      data_ = other.data_;
      length_ = other.length_;
      other.length_ = 0;
      other.data_ = new char[1]{};
    }
    return *this;
  }
};

Is not possible: an overloaded operator= must take exactly one parameter, so there’s no room for a tag. To make the code more readable and to allow move assignment, you’ll need:

Stealing by wrapping

The idea is straightforward: instead of using a tag object, you use a wrapper to distinguish between the overloads. The wrapper must be generic to support arbitrary types:

template <typename T>
struct Movable {
  T& obj;
};

// deduction guide
template <typename T>
Movable(T) -> Movable<T>;

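The wrapper has only one data member, a non-const lvalue reference to an object that can be moved from. Because Movable is an aggregate (it has no user-declared constructors), C++17 class template argument deduction cannot deduce T on its own, so a deduction guide is added to enable writing code like (C++20 can deduce aggregates without the guide):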

String name{ "data.in" };
Movable movable{ name };  // deduction guide used to deduce T=String

You could even go a step further and supplement Movable with a function template:

template <typename T>
Movable<T> as_movable(T& t) {
  return {t};
}

/* ~~~ */
String name{ "data.in" };
auto movable = as_movable( name );  // template argument deduction gives T=String
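Both spellings can be checked at compile time. This sketch assumes C++17 and uses std::string in place of String; check_movable() is a hypothetical helper. The deduction guide and as_movable should both produce a Movable<std::string> that refers to the original object rather than to a copy.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

template <typename T>
struct Movable {
  T& obj;  // non-owning reference to the object that may be moved from
};

// Deduction guide: lets `Movable m{x};` deduce T from x (needed in C++17).
template <typename T>
Movable(T) -> Movable<T>;

template <typename T>
Movable<T> as_movable(T& t) {
  return {t};  // aggregate initialization of the returned Movable<T>
}

void check_movable() {
  std::string name{"data.in"};

  Movable m1{name};            // class template argument deduction
  auto m2 = as_movable(name);  // function template argument deduction
  static_assert(std::is_same_v<decltype(m1), Movable<std::string>>);
  static_assert(std::is_same_v<decltype(m2), Movable<std::string>>);

  assert(&m1.obj == &name && &m2.obj == &name);  // no copy of name was made
}
```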

With Movable in place, you can refactor String to support move construction and move assignment with:

class String {
private:
  std::size_t length_ {0};
  char *data_ { new char[1] { '\0' }};
public:
  // copy constructor and assignment op.
  String(const String& other);
  String& operator=(const String& other);

  // move constructor
  String(Movable<String>& movable):
    length_{ movable.obj.length_ },
    data_{ movable.obj.data_ }
  {
    movable.obj.length_ = 0;
    movable.obj.data_ = new char[1]{};
  }

  // move assignment operator
  String& operator=(Movable<String>& movable) {
    if (this != &movable.obj) {
      delete[] data_;
      data_ = movable.obj.data_;
      length_ = movable.obj.length_;
      movable.obj.length_ = 0;
      movable.obj.data_ = new char[1]{};
    }
    return *this;
  }

  // destructor
  ~String() noexcept;
};

In this implementation, Movable<T> is used to distinguish the move overloads of the constructor and the assignment operator from their copy counterparts. The move variants still steal the data from the String passed to them as a wrapped argument. Movable made it possible to write a move assignment operator, something that couldn’t be done with a tag.
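Filling in the declared-but-not-shown members gives a self-contained version of the Movable-based String. This is a sketch: copy assignment is deleted rather than implemented, and size()/c_str() are hypothetical accessors added only for the checks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

template <typename T>
struct Movable {
  T& obj;
};

template <typename T>
Movable<T> as_movable(T& t) { return {t}; }

class String {
public:
  String(const char* str)
      : length_{std::strlen(str)}, data_{new char[length_ + 1]} {
    std::memcpy(data_, str, length_ + 1);
  }
  String(const String& other)
      : length_{other.length_}, data_{new char[length_ + 1]} {
    std::memcpy(data_, other.data_, length_ + 1);
  }
  String& operator=(const String&) = delete;  // omitted from this sketch

  // move constructor: selected by the Movable<String> wrapper
  String(Movable<String>& movable)
      : length_{movable.obj.length_}, data_{movable.obj.data_} {
    movable.obj.length_ = 0;
    movable.obj.data_ = new char[1]{};
  }

  // move assignment operator
  String& operator=(Movable<String>& movable) {
    if (this != &movable.obj) {
      delete[] data_;
      data_ = movable.obj.data_;
      length_ = movable.obj.length_;
      movable.obj.length_ = 0;
      movable.obj.data_ = new char[1]{};
    }
    return *this;
  }

  ~String() noexcept { delete[] data_; }

  // hypothetical accessors, used only by the checks below
  std::size_t size() const { return length_; }
  const char* c_str() const { return data_; }

private:
  std::size_t length_;
  char* data_;
};

void check_movable_string() {
  String a{"data.in"};
  auto wrapped_a = as_movable(a);
  String b{wrapped_a};  // move construction
  assert(b.size() == 7 && a.size() == 0);

  String c{"results.out"};
  auto wrapped_b = as_movable(b);
  c = wrapped_b;        // move assignment
  assert(std::strcmp(c.c_str(), "data.in") == 0 && b.size() == 0);
}
```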

You should also change File to support String’s new move constructor:

class File {
private:
  String name_;
public:
  // copies String
  File(const String& name):
    name_{ name }
  {}
  
  // moves String
  File(Movable<String>& name):
    name_{ name }
  {}    
};

And finally, you’ll enjoy clean code with both copying and moving variants available:

#include <iostream>

int main() {
  String input_name{ "data.in" };
  String output_name{ "results.out" };

  // moves input_name
  auto movable{ as_movable(input_name) };
  File input{ movable };

  // copies output_name
  File output{ output_name };
}

With a small caveat… Movable<String> is taken by non-const reference by the constructors. Consequently, a Movable created in place (a temporary) cannot be passed to them:

#include <iostream>

int main() {
  String input_name{ "data.in" };
  String output_name{ "results.out" };

  // moves input_name, OK
  auto movable{ as_movable(input_name) };
  File input{ movable };

  // moves output_name, ERROR
  File output{ as_movable(output_name) };
}

The snippet of code above produces an error saying that you cannot bind a non-const lvalue reference to a temporary object created with as_movable. You could, potentially, solve this problem by changing the signatures of the constructors to:

String::String( const Movable<String>& movable);
File::File( const Movable<String>& movable);

And, shockingly, it would work! This trick exploits one of the dark corners of C++: constness does not propagate through reference members. Inside a const Movable<String>, the expression movable.obj still refers to a non-const String, because the String that obj refers to is not part of Movable. Consequently, even though Movable is passed by const reference, you can modify the String referenced by obj and move its data.
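The loophole can be demonstrated directly. In this sketch, std::string stands in for String and drain() is a hypothetical function: the wrapper is taken by const reference, yet the string it refers to is emptied, and a temporary wrapper is accepted too.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

template <typename T>
struct Movable {
  T& obj;
};

// Hypothetical function: the wrapper arrives by const reference, yet the
// wrapped string is emptied, because const doesn't reach through obj.
void drain(const Movable<std::string>& movable, std::string& target) {
  static_assert(std::is_same_v<decltype((movable.obj)), std::string&>);
  target.swap(movable.obj);  // target takes the content
  movable.obj.clear();       // the source is left empty
}

void check_const_loophole() {
  std::string target;

  std::string name{"data.in"};
  const Movable<std::string> wrapper{name};  // even a const wrapper works
  drain(wrapper, target);
  assert(target == "data.in" && name.empty());

  std::string other{"results.out"};
  drain(Movable<std::string>{other}, target);  // so does a temporary one
  assert(target == "results.out" && other.empty());
}
```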

The code works as intended and expresses the programmer’s intent cleanly (as_movable). Yet it feels like a fraud: an object that is about to be moved from is passed (indirectly) by const reference. Writing code like this puts your future self in trouble. There must be a better way, and it came in C++11 with…

Rvalue references

To understand how rvalue references work, it’s enough to replace each Movable<String>& with String&& (skipping the const qualifier) and each as_movable with std::move (declared in <utility>). After making those changes and the necessary adjustments, String becomes:

class String {
private:
  std::size_t length_ {0};
  char *data_ { new char[1] { '\0' }};
public:
  // copy constructor and assignment op.
  String(const String& other);
  String& operator=(const String& other);

  // move constructor
  String(String&& movable):
    length_{ movable.length_ },
    data_{ movable.data_ }
  {
    movable.length_ = 0;
    movable.data_ = new char[1]{};
  }

  // move assignment operator
  String& operator=(String&& movable) {
    if (this != &movable) {
      delete[] data_;
      data_ = movable.data_;
      length_ = movable.length_;
      movable.length_ = 0;
      movable.data_ = new char[1]{};
    }
    return *this;
  }

  // destructor
  ~String() noexcept;
};
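Put together, the rvalue-reference version of String can be exercised with std::move. As before, this is a sketch: copy assignment is deleted instead of implemented, and size()/c_str() are hypothetical accessors used only by the checks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <utility>

class String {
public:
  String(const char* str)
      : length_{std::strlen(str)}, data_{new char[length_ + 1]} {
    std::memcpy(data_, str, length_ + 1);
  }
  String(const String& other)
      : length_{other.length_}, data_{new char[length_ + 1]} {
    std::memcpy(data_, other.data_, length_ + 1);
  }
  String& operator=(const String&) = delete;  // omitted from this sketch

  // move constructor: takes over the buffer, leaves `movable` empty but valid
  String(String&& movable)
      : length_{movable.length_}, data_{movable.data_} {
    movable.length_ = 0;
    movable.data_ = new char[1]{};
  }

  // move assignment operator
  String& operator=(String&& movable) {
    if (this != &movable) {
      delete[] data_;
      data_ = movable.data_;
      length_ = movable.length_;
      movable.length_ = 0;
      movable.data_ = new char[1]{};
    }
    return *this;
  }

  ~String() noexcept { delete[] data_; }

  // hypothetical accessors, used only by the checks below
  std::size_t size() const { return length_; }
  const char* c_str() const { return data_; }

private:
  std::size_t length_;
  char* data_;
};

void check_rvalue_string() {
  String a{"data.in"};
  const char* buffer = a.c_str();

  String b{std::move(a)};  // move constructor: the buffer changes owner
  assert(b.c_str() == buffer && a.size() == 0);

  String c{"results.out"};
  c = std::move(b);        // move assignment: c frees its old buffer first
  assert(c.c_str() == buffer && b.size() == 0);
  assert(std::strcmp(c.c_str(), "data.in") == 0);
}
```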

And the File class is:

class File {
private:
  String name_;
public:
  // copies String
  File(const String& name):
    name_{ name }
  {}
  
  // moves String
  File(String&& name):
    name_{ std::move(name) }    // <- notice std::move
  {}    
};

Notice a small addition: there’s now a std::move when initializing name_ in the second File constructor. That’s because of a language rule: a named variable is an lvalue, even when its type is an rvalue reference. Rvalue-referenceness doesn’t propagate, so inside the constructor name is an lvalue, and name_{ name } would call String’s copy constructor. It needs to be marked as an rvalue again, and that’s the job of std::move. Yes, you read that correctly: std::move doesn’t move anything. It’s just a cast to an rvalue reference (essentially static_cast<String&&>(name)), similar to how as_movable marked an object as potentially movable-from by wrapping it into Movable.
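The rule that a named rvalue reference is an lvalue can be observed with a pair of overloads. The probe() and pass_*() functions below are hypothetical helpers, and std::string stands in for String. The sketch also shows that std::move behaves like a static_cast and, by itself, leaves the object untouched.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical probes reporting which reference kind an argument binds to.
int probe(const std::string&) { return 1; }  // lvalues (and const rvalues)
int probe(std::string&&) { return 2; }       // non-const rvalues

// name has type std::string&&, but the expression `name` is an lvalue.
int pass_plain(std::string&& name) { return probe(name); }
int pass_moved(std::string&& name) { return probe(std::move(name)); }
int pass_cast(std::string&& name) {
  return probe(static_cast<std::string&&>(name));  // what std::move boils down to
}

void check_named_rvalue_ref() {
  assert(pass_plain("data.in") == 1);  // rvalue-referenceness did not propagate
  assert(pass_moved("data.in") == 2);  // std::move marks it as an rvalue again
  assert(pass_cast("data.in") == 2);   // same effect as the explicit cast

  std::string name{"data.in"};
  static_assert(std::is_same_v<decltype(std::move(name)), std::string&&>);
  std::string&& ref = std::move(name);  // just a reference, nothing is moved
  assert(&ref == &name && name == "data.in");
}
```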

With those changes, the main part of the code becomes:

int main() {
  String input_name{ "data.in" };
  String output_name{ "results.out" };

  // copies input_name
  File input{ input_name };

  // moves output_name
  File output{ std::move(output_name) };
}

And voilà! The program works as intended.

Can you spot the difference between rvalue references and the home-cooked solution? With rvalue references, you can mark an object as movable with std::move in place, right when passing it to a function. With as_movable, that was impossible because a non-const lvalue reference cannot bind to a temporary object. Rvalue references don’t have this problem: they are designed to bind both to objects marked with std::move and to temporaries. So even the code below, where an rvalue reference to String binds directly to a temporary returned by a function, works:

String get_name() { return "data.in"; }  // hypothetical body

int main() {
  File file{ get_name() };
}
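To see all three cases side by side, here is a sketch of File’s two constructors, with std::string standing in for String, a hypothetical moved() flag recording which constructor ran, and a hypothetical get_name() body:

```cpp
#include <cassert>
#include <string>
#include <utility>

// File-like sketch; the moved_ flag is hypothetical instrumentation.
class File {
public:
  File(const std::string& name) : name_{name}, moved_{false} {}
  File(std::string&& name) : name_{std::move(name)}, moved_{true} {}

  bool moved() const { return moved_; }
  const std::string& name() const { return name_; }

private:
  std::string name_;
  bool moved_;
};

std::string get_name() { return "data.in"; }  // hypothetical body

void check_file_overloads() {
  std::string input_name{"data.in"};
  File input{input_name};               // lvalue: copy overload
  assert(!input.moved() && input_name == "data.in");

  std::string output_name{"results.out"};
  File output{std::move(output_name)};  // marked with std::move: move overload
  assert(output.moved() && output.name() == "results.out");

  File file{get_name()};                // temporary: binds to std::string&&
  assert(file.moved() && file.name() == "data.in");

  File literal{"data.in"};              // implicit conversion makes a temporary
  assert(literal.moved());
}
```

The last case is worth noting: once an rvalue-reference overload exists, even File file{ "data.in" } moves the implicitly created temporary instead of copying it.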

Summary

What have you learned? Hopefully, quite a handful:

  • Moving is actually “stealing” dynamic resources owned by an object and placing them into another one.
  • Moving is possible without rvalue references.
  • Tagging and wrapping help with overload resolution.
  • Objects that are tagged as movable can be moved-from.
  • In modern C++, an rvalue reference is used to “tag” an object as movable.
  • std::move casts an object to an rvalue reference, marking it as movable.
  • std::move doesn’t move anything, it’s actually closer in its meaning to as_movable.