C++11 | Bulldozer00's Blog

Crash!!!!

February 9, 2016 bulldozer00 2 comments

Holy crap! I can’t remember the last time I crashed a GCC C++ compiler. But looky here at this seg fault:

Well, I didn’t cause the crash, a colleague did. When was the last time you crashed a compiler?

Categories: C++11 Tags: compiler crash

RaiiGuard

February 6, 2016 bulldozer00 3 comments

In my application domain, the following shared object pattern occurs quite frequently:

Typically, Shared is a small, dynamically changing, in-memory, database and two or more threads need to perform asynchronous CRUD operations on its content. I’ve seen, and have written, code like this more times than I can count:

  Shared so{};
  //The following idiom is exception-unsafe
  so.lock();
  so.doStuff(); //doStuff() better not throw!
  so.unlock();

The problem with the code is that if doStuff() (or any other code invoked within its implementation) throws an exception, then unlock() won’t get executed. Thus, any other thread that attempts to access so by calling so.lock() will get blocked… forever… deadlock city.

Here is one way of achieving exception safety, but it’s clunky:

  Shared so{};
  //Clunky way to achieve exception safety
  so.lock();
  try {
    so.doStuff(); //doStuff() can throw!
  }
  catch(...) {
    so.unlock();
  }
  so.unlock();

I use Stroustrup’s classic RAII (Resource Acquisition Is Initialization) technique to achieve exception safety:

  Shared so{};
  RaiiGuard<Shared> guardedSo{so};
  guardedSo->doStuff(); //It's OK for doStuff() to throw!

Since the solution involves constructing/destructing a RaiiGuard<Shared> object in the same scope as the Shared object it encapsulates and protects from data races, the tradeoff is an exception-safety gain for a bit of performance loss.

Here is the simple RaiiGuard class template that I use to achieve the non-clunky way to exception safety:

template<typename T>
class RaiiGuard {

public:
  explicit RaiiGuard(T& obj) : _obj(obj) {
    _obj.lock();
  }

  T* operator->() {
    return &_obj;
  }

  ~RaiiGuard() {
    _obj.unlock();
  }

private:
  T& _obj;

};

It locks access to an object of type T on construction and unlocks it on destruction – the definition of RAII and RRID (Resource Release Is Destruction). The user obtains access to the “monitored” underlying object by invoking T* RaiiGuard<T>::operator->().

The only requirement imposed on the template type T is that it implements the lock() and unlock() functions as follows:


class T {

public:

  void lock() {
    m.lock();
  }

  void unlock() {
    m.unlock();
  }

  void doStuff() {
  }

private:

  mutable std::mutex m;

};

Since they all contain lock()/unlock() pairs, any of the STL mutex types can be substituted for std::mutex.

How do you achieve exception safety for the shared object pattern?

Categories: C++11 Tags: exception safety, RAII, RaiiGuard

Stack, Heap, Pool

September 14, 2015 bulldozer00 25 comments

Update 10/7/15: Because this post lacked a context-setting introduction, I wrote a prequel for it after the fact. Please read the prequel before reading this post.

Three Ways

The figure below shows three ways of allocating memory from within a C++ application: stack, heap, custom written pool.

Comparing Performance

The code below measures the performance of the three allocation methods. It computes and prints the time to allocate 1000, 1MB objects from: the stack, the C++ free store, and a custom-written memory pool (the implementation of Pool appears at the end of this post).

#include <chrono>
#include <iostream>
#include <array>
#include "Pool.h"

class BigObject {

public:

  //Each BigObject occupies 1MB of RAM
  static const int32_t NUM_BYTES{1000 *1024};

  void reset() {} //Required by the Pool class template

  void setFirstChar(char c) {byteArray[0] = c;}

private:

  std::array<char, NUM_BYTES> byteArray{};

};

//Compare the performance of three types of memory allocation:
//Stack, free store (heap), custom-written pool
int main() {

  using namespace std::chrono; //Reduce verbosity of subsequent code
  constexpr int32_t NUM_ALLOCS{1000};
  void processObj(BigObject* obj);

  //Time heap allocations
  auto ts = system_clock::now();
  for(int32_t i=0; i<NUM_ALLOCS; ++i) {
    BigObject* obj{new BigObject{}};
    //Do stuff.......
    processObj(obj);
  }
  auto te = system_clock::now();
  std::cout << "Heap Allocations took "
            << duration_cast<milliseconds>(te-ts).count()
            << " milliseconds"
            << std::endl;

  //Time stack allocations
  ts = system_clock::now();
  for(int32_t i=0; i<NUM_ALLOCS; ++i) {
    BigObject obj{};
    //Do stuff.......
    processObj(&obj);
  } //obj is auto-destroyed here
  te = system_clock::now();
  std::cout << "Stack Allocations took "
            << duration_cast<milliseconds>(te-ts).count()
            << " milliseconds"
            <<  std::endl;

  //Pool construction time
  ts = system_clock::now();
  Pool<BigObject, NUM_ALLOCS> objPool;
  te = system_clock::now();
  std::cout << "Pool construction took "
            << duration_cast<milliseconds>(te-ts).count()
            << " milliseconds"
            <<  std::endl;

  //Time pool allocations
  ts = system_clock::now();
  for(int32_t i=0; i<NUM_ALLOCS; ++i) {
    BigObject* poolObj{objPool.acquire()};
    //Do Stuff.......
    processObj(poolObj);
  }
  te = system_clock::now();
  std::cout << "Pool Allocations took "
            << duration_cast<milliseconds>(te-ts).count()
            << " milliseconds";

}

void processObj(BigObject* obj) {

  obj->setFirstChar('a');

}

The results for a typical program run (gcc 4.9.2, cygwin/Win10, compiler) are shown as follows:

As expected, stack allocation is much faster than heap allocation (3X in this case) because there is no such thing as a “fragmented” stack. The interesting result is how much faster custom pool allocation is than either of the two “standard” out of the box methods. Since each object is 1MB in size, and all of the memory managed by the pool is allocated only once (during construction of the pool object), dynamically acquiring and releasing pointers to the pre-constructed pool objects during runtime is blazingly fast compared to creating a lumbering 1MB object every time the user code needs one.

After you peruse the Pool implementation that follows, I’d love to hear your comments on this post. I’d also appreciate it if you could try to replicate the results I got.

The Pool Class Template – std::array Implementation

Here is my implementation of the Pool class used in the above test code. Note that the Pool class does not use any dynamically resizable STL containers in its implementation. It uses the smarter C++ implementation of the plain ole statically allocated C array. Some safety-critical domains require that no dynamic memory allocation is performed post-initialization – all memory must be allocated either at compile time or upon program initialization.

//Pool.h
#ifndef POOL_H_
#define POOL_H_

#include <array>
#include <algorithm>
#include <memory>

// The OBJ template argument supplied by the user
// must have a publicly defined default ctor and a publicly defined
// OBJ::reset() member function that cleans the object innards so that it
// can be reused again (if needed).
template<typename OBJ, int32_t NUM_OBJS>
class Pool {

public:

  //RAII: Fill up the _available array with newly allocated objects.
  //Since all objects are available for the user to "acquire"
  //on initialization, fill up the _inUse array will nullptrs.
  Pool(){

    //Fill the _available array with default-constructed objects
    std::for_each(std::begin(_available), std::end(_available),
        [](OBJ*& mem){mem = new OBJ{};});

    //Since no objects are currently in use, fill the _inUse array 
    //with nullptrs
    std::for_each(std::begin(_inUse), std::end(_inUse),
        [](OBJ*& mem){mem = nullptr;});

  }

  //Allows a user to acquire an object to manipulate as she wishes
  OBJ* acquire() {

    //Try to find an available object
    auto iter = std::find_if(std::begin(_available), std::end(_available),
        [](OBJ* mem){return (mem not_eq nullptr);});

    //Is the pool exhausted?
    if(iter == std::end(_available)) {

      return nullptr; //Replace this with a "throw" if you need to.

    }
    else {

      //Whoo hoo! An object is available for usage by the user
      OBJ* mem{*iter};

      //Mark it as in-use in the _available array
      *iter = nullptr;

      //Find a spot for it in the _inUse list
      iter = std::find(std::begin(_inUse), std::end(_inUse), nullptr);
  
      //Insert it into the _inUse list. We don't
      //have to check for iter == std::end(_inUse)
      //because we are fully in control!
      *iter = mem;

      //Finally, return the object to the user
      return mem;

    }

  }

  //Returns an object to the pool
  void release(OBJ* obj) {

    //Ensure that the object is indeed in-use. If
    //the user gives us an address that we don't own,
    //we'll have a leak and bad things can happen when we
    //delete the memory during Pool destruction.
    auto iter = std::find(std::begin(_inUse), std::end(_inUse), obj);

    //Simply return control back to the user
    //if we didn't find the object in our
    //in-use list
    if(iter == std::end(_inUse)) {

        return;

    }
    else {

      //Clear out the innards of the object being returned
      //so that the next user doesn't have to "remember" to do it.
      obj->reset();

      //temporarily hold onto the object while we mark it
      //as available in the _inUse array
       OBJ* mem{*iter};

       //Mark it!
       *iter = nullptr;

       //Find a spot for it in the _available list
       iter = std::find(std::begin(_available), std::end(_available), nullptr);

       //Insert the object's address into the _available array
       *iter = mem;
    }

  }

  ~Pool() {

    //Cleanup after ourself: RAII
    std::for_each(std::begin(_available), std::end(_available),
        std::default_delete<OBJ>());

    std::for_each(std::begin(_inUse), std::end(_inUse),
        std::default_delete<OBJ>());

  }

private:

  //Keeps track of all objects available to users
  std::array<OBJ*, NUM_OBJS> _available;

  //Keeps track of all objects in-use by users
  std::array<OBJ*, NUM_OBJS> _inUse;

};

#endif /* POOL_H_ */

The Pool Class Template – std::multiset Implementation

For grins, I also implemented the Pool class below using std::multiset to keep track of the available and in-use objects. For large numbers of objects, the acquire()/release() calls will most likely be faster than their peers in the std::array implementation, but the implementation violates the “no post-intialization dynamically allocated memory allowed” requirement – if you have one.

Since the implementation does perform dynamic memory allocation during runtime, it would probably make more sense to pass the size of the pool via the constructor rather than as a template argument. If so, then functionality can be added to allow users to resize the pool during runtime if needed.

//Pool.h
#ifndef POOL_H_
#define POOL_H_

#include <algorithm>
#include <set>
#include <memory>

// The OBJ template argument supplied by the user
// must have a publicly defined default ctor and a publicly defined
// OBJ::reset() member function that cleans the object innards so that it
// can be reused again (if needed).
template<typename OBJ, int32_t NUM_OBJS>
class Pool {

public:

  //RAII: Fill up the _available set with newly allocated objects.
  //Since all objects are available for the user to "acquire"
  //on initialization, fill up the _inUse set will nullptrs.
  Pool(){

    for(int32_t i=0; i<NUM_OBJS; ++i) {

      _available.insert(new OBJ{});
      _inUse.insert(nullptr);

    }

  }

  //Allows a user to acquire an object to manipulate as she wishes
  OBJ* acquire() {

    //Try to find an available object
    auto iter = std::find_if(std::begin(_available), std::end(_available),
        [](OBJ* mem){return (mem not_eq nullptr);});

    //Is the pool exhausted?
    if(iter == std::end(_available)) {

      return nullptr; //Replace this with a "throw" if you need to.

    }
    else {

      //Whoo hoo! An object is available for usage by the user
      OBJ* mem{*iter};

      //Mark it as in-use in the _available set
      _available.erase(iter);
      _available.insert(nullptr);

      //Find a spot for it in the _inUse set
      iter = std::find(std::begin(_inUse), std::end(_inUse), nullptr);

      //Insert it into the _inUse set. We don't
      //have to check for iter == std::end(_inUse)
      //because we are fully in control!
      _inUse.erase(iter);
      _inUse.insert(mem);

      //Finally, return the object to the user
      return mem;

    }

  }

  //Returns an object to the pool
  void release(OBJ* obj) {

    //Ensure that the object is indeed in-use. If
    //the user gives us an address that we don't own,
    //we'll have a leak and bad things can happen when we
    //delete the memory during Pool destruction.
    auto iter = std::find(std::begin(_inUse), std::end(_inUse), obj);

    //Simply return control back to the user
    //if we didn't find the object in our
    //in-use set
    if(iter == std::end(_inUse)) {

        return;

    }
    else {

      //Clear out the innards of the object being returned
      //so that the next user doesn't have to "remember" to do it.
      obj->reset();

      //temporarily hold onto the object while we mark it
      //as available in the _inUse array
       OBJ* mem{*iter};

       //Mark it!
       _inUse.erase(iter);
       _inUse.insert(nullptr);

       //Find a spot for it in the _available list
       iter = std::find(std::begin(_available), std::end(_available), nullptr);

       //Insert the object's address into the _available array
       _available.erase(iter);
       _available.insert(mem);
    }

  }

  ~Pool() {

    //Cleanup after ourself: RAII
    std::for_each(std::begin(_available), std::end(_available),
        std::default_delete<OBJ>());

    std::for_each(std::begin(_inUse), std::end(_inUse),
        std::default_delete<OBJ>());

  }

private:

  //Keeps track of all objects available to users
  std::multiset<OBJ*> _available;

  //Keeps track of all objects in-use by users
  std::multiset<OBJ*> _inUse;

};

#endif /* POOL_H_ */

Unit Testing

In case you were wondering, I used Phil Nash’s lightweight Catch unit testing framework to verify/validate the behavior of both Pool class implementations:

TEST_CASE( "Allocations/Deallocations", "[Pool]" ) {

    constexpr int32_t NUM_ALLOCS{2};
    Pool<BigObject, NUM_ALLOCS> objPool;

    BigObject* obj1{objPool.acquire()};
    REQUIRE(obj1 not_eq nullptr);

    BigObject* obj2 = objPool.acquire();
    REQUIRE(obj2 not_eq nullptr);

    objPool.release(obj1);
    BigObject* obj3 = objPool.acquire();
    REQUIRE(obj3 == obj1);

    BigObject* obj4 = objPool.acquire();
    REQUIRE(obj4 == nullptr);

    objPool.release(obj2);
    BigObject* obj5 = objPool.acquire();
    REQUIRE(obj5 == obj2);

}

==============================================
All tests passed (5 assertions in 1 test case)

C++11 Features

For fun, I listed the C++11 features I used in the source code:

* auto

* std::array<T, int>

* std::begin(), std::end()

* nullptr

* lambda functions

* Brace initialization { }

* std::chrono::system_clock, std::chrono::duration_cast<T>

* std::default_delete<T>

Did I miss any features/classes? How would the C++03 implementations stack up against these C++11 implementations in terms of readability/understandability?

Categories: C++11 Tags: C++11, memory allocation

Easier To Use, And More Expressive

August 20, 2015 bulldozer00 15 comments

One of the goals for each evolutionary increment in C++ is to decrease the probability of an average programmer from making mistakes by supplanting “old style” features/idioms with new, easier to use, and more expressive alternatives. The following code sample attempts to show an example of this evolution from C++98/03 to C++11 to C++14.

In C++98/03, there were two ways of clearing out the set of inner vectors in the vector-of-vectors-of-doubles data structure encapsulated by MyClass. One could use a plain ole for-loop or the std::for_each() STL algorithm coupled with a remotely defined function object (ClearVecFunctor). I suspect that with the exception of clever language lawyers, blue collar programmers (like me) used the simple for-loop option because of its reduced verbosity and compactness of expression.

With the arrival of C++11 on the scene, two more options became available to programmers: the range-for loop, and the std::for_each() algorithm combined with an inline-defined lambda function. The range-for loop eliminated the chance of “off-by-one” errors and the lambda function eliminated the inconvenience of having to write a remotely located functor class.

The ratification of the C++14 standard brought yet another convenient option to the table: the polymorphic lambda. By using auto in the lambda argument list, the programmer is relieved of the obligation to explicitly write out the full type name of the argument.

This example is just one of many evolutionary improvements incorporated into the language. Hopefully, C++17 will introduce many more.

Note: The code compiles with no warnings under gcc 4.9.2. However, as you can see in the image from the bug placed on line 41, the Eclipse CDT indexer has not caught up yet with the C++14 specification. Because auto is used in place of the explicit type name in the lambda argument list, the indexer cannot resolve the std::vector::clear() member function.

8/26/15 Update – As a result of reader code reviews provided in the comments section of this post, I’ve updated the code as follows:

8/29/15 Update – A Fix to the Line 27 begin-begin bug:

Categories: C++, C++11, C++14 Tags: c++, C++11, programming

PeriodicFunction

July 15, 2015 bulldozer00 17 comments

In the embedded systems application domain, there is often the need to execute one or more background functions at a periodic rate. Before C++11 rolled onto the scene, a programmer had to use a third party library like ACE/Boost/Poco/Qt to incorporate that functionality into the product. However, with the inclusion of std::thread, std::bind, and std::chrono in C++11, there is no longer the need to include those well-crafted libraries into the code base to achieve that specific functionality.

The following figure shows the interface of a class template that I wrote to provide users with the capability to execute a function object (of type std::function<void()>) at a periodic rate over a fixed length of time.

Upon instantiation of a PeriodicFunction object, the class executes the Functor immediately and then re-executes it every periodMillis until expiration at durationMillis. The hasExpired(), stop(), and restart() functions provide users with convenient query/control options after construction.

The figure below depicts the usage of the PeriodicFunction class. In the top block of code, a lambda function is created and placed into execution every 200 milliseconds over a 1 second duration. After waiting for those 5 executions to complete, the code restarts the PeriodicFunction to run the lambda every 500 milliseconds for 1.5 seconds. The last code segment defines a Functor class, creates an object instance of it, and places the functor into execution at 50 millisecond intervals over a duration of 250 milliseconds.

The output of a single run of the program is shown below. Note that the actual output matches what is expected: 5 executions spaced at 200 millisecond intervals; 3 executions spaced at 500millisecond intervals; and 5 executions spaced at 50 millisecond intervals.

In case you want to use, experiment with, or enhance the code, the implementation of the PeriodicFunction class template is provided in copy & paste form as follows:


#ifndef PERIODICFUNCTION_H_
#define PERIODICFUNCTION_H_

#include <cstdint>
#include <thread>
#include <mutex>
#include <chrono>
#include <atomic>

namespace pf {

using std::chrono::steady_clock;
using std::chrono::duration;
using std::chrono::milliseconds;

template<typename Functor>
class PeriodicFunction {
public:
  //Initialize the timer state and start the timing loop
  PeriodicFunction(uint32_t periodMillis, uint32_t durationMillis,
      Functor& callback, int32_t yieldMillis = DEFAULT_YIELD_MILLIS) :
      func_(callback),
      periodMillis_(periodMillis),
      expirationTime_(steady_clock::now()
              + steady_clock::duration(milliseconds(durationMillis))),
      nextCallTimeMillis_(steady_clock::now()), yieldMillis_(yieldMillis) {
    //Start the timing loop
    t_ = std::thread { PeriodicFunction::threadLoop, this };
  }

  //Command & wait for the threadLoop to stop
  //before this object gets de-constructed
  ~PeriodicFunction() {
    stop();
  }

  bool hasExpired() const {
    return hasExpired_;
  }

  void stop() {
    isRunning_ = false;
    if (t_.joinable())
      t_.join();
  }

  void restart(uint32_t periodMillis, uint32_t durationMillis) {
    std::lock_guard<std::mutex> lg(stateMutex_);
    //Stop the current timer if needed
    stop();

    //What time is it right at this instant?
    auto now = steady_clock::now();

    //Set the state for the new timer
    expirationTime_ = now + milliseconds(durationMillis);
    nextCallTimeMillis_ = now;
    periodMillis_ = periodMillis;
    hasExpired_ = false;

    //Start the timing loop
    isRunning_ = true;
    t_ = std::thread { PeriodicFunction::threadLoop, this };
  }

  //Since we retain a reference to a Functor object, prevent copying
  //and moving
  PeriodicFunction(const PeriodicFunction& rhs) = delete;
  PeriodicFunction& operator=(const PeriodicFunction& rhs) = delete;
  PeriodicFunction(PeriodicFunction&& rhs) = delete;
  PeriodicFunction& operator=(PeriodicFunction&& rhs) = delete;

private:
  //The function to be executed periodically until we're done
  Functor& func_;

  //The period at which the function is executed
  uint32_t periodMillis_;

  //The absolute time at which we declare "done done!"
  steady_clock::time_point expirationTime_;

  //The next scheduled function execution time
  steady_clock::time_point nextCallTimeMillis_;

  //The thread sleep duration; the larger the value,
  //the more we decrease the periodic execution accuracy;
  //allows other sibling threads threads to use the cpu
  uint32_t yieldMillis_;

  //The default sleep duration of each pass thru
  //the timing loop
  static constexpr uint32_t DEFAULT_YIELD_MILLIS { 10 };

  //Indicates whether the timer has expired or not
  std::atomic<bool> hasExpired_ { false };

  //Indicates whether the monitoring loop is active
  //probably doesn't need to be atomic, but good practice
  std::atomic<bool> isRunning_ { true };

  //Our precious thread resource!
  std::thread t_ { };

  //Protects the timer state from data races
  //between our private thread and the caller's thread
  std::mutex stateMutex_ { };

  //The timing loop
  void threadLoop() {
    while (isRunning_) {
      auto now = steady_clock::now();//What time is it right at this instant?
      std::lock_guard<std::mutex> lg(stateMutex_);
      if (now >= expirationTime_) { //Has the timer expired?
        hasExpired_ = true;
        return;
      }
      else if (now > nextCallTimeMillis_) {//Time to execute function?
        nextCallTimeMillis_ = now + milliseconds(periodMillis_);
        std::bind(&Functor::operator(), std::ref(func_))(); //Execute!
        continue; //Skip the sleep
      }
      //Unselfish sharing; let other threads have the cpu for a bit
      std::this_thread::sleep_for(milliseconds(yieldMillis_));
    }
  }
};
//End of the class definition
}//namespace pf

#endif /* PERIODICFUNCTION_H_ */

One potential improvement that comes to mind is the addition of the capability to periodically execute the user-supplied functor literally “forever” – and not cheating by setting durationMillis to 0xFFFFFFFF (which is not forever). Another improvement might be to support variadic template args (like the std::thread ctor does) to allow any function type to be placed into execution – not just those of type std::function<void()>.

In case you don’t want to type in the test driver code, here it is in copy & paste form:


#include <iostream>
#include <chrono>
#include "PeriodicFunction.h"

int main() {
  //Create a lambda and plcae into execution
  //every 200 millis over a duration of 1 second
  using namespace std::chrono;
  steady_clock::time_point startTime { steady_clock::now() };
  steady_clock::time_point execTime { startTime };
  auto lambda = [&] { //We don't need the ()
    using namespace std::chrono;
    execTime = steady_clock::now();
    std::cout << "lambda executed at T="
              << duration_cast<milliseconds>(execTime - startTime).count()
              << std::endl;
    startTime = execTime;
  };
  pf::PeriodicFunction<decltype(lambda)> pf1 { 200, 1000, lambda };
  while (!pf1.hasExpired()) ;
  std::cout << std::endl;

  //Re-run the lambda every half second for a second
  startTime = steady_clock::now();
  execTime = startTime;
  pf1.restart(500, 1500);
  while (!pf1.hasExpired()) ;
  std::cout << std::endl;

  //Define a stateful Functor class
  class Functor {
  public:
    Functor() :
        startTime_ { steady_clock::now() }, execTime_ { startTime_ } {
    }
    void operator()() {
      execTime_ = steady_clock::now();
      std::cout << "Functor::operator() executed at T="
          << duration_cast<milliseconds>(execTime_ - startTime_).count()
          << std::endl;
      startTime_ = execTime_;
    }
  private:
    steady_clock::time_point startTime_;
    steady_clock::time_point execTime_;
  };

  //Create a Functor object and place into execution
  //every 50 millis over a duration of 250 millis
  Functor myFunction { };
  pf::PeriodicFunction<Functor> pf2 { 50, 250, myFunction, 0 };
  while (!pf2.hasExpired()) ;
}

Categories: C++11 Tags: programming

The Move That Wasn’t

June 30, 2015 bulldozer00 3 comments

While trying to duplicate the results I measured in my “Time To Get Moving!” post, an astute viewer posted this comment:

For your convenience, I re-list the code in question here for your inspection:


#include <iostream>
#include <vector>
#include <utility>
#include <chrono>
using namespace std;

struct Msg{
  vector<double> vDoubs;
  Msg(const int NUM_ELS) : vDoubs(NUM_ELS){
  }
};

int main() {
//Construct a big Msg object!
const int NUM_ELS = 10000000;
Msg msg1{NUM_ELS};

//reduce subsequent code verbosity
using std::chrono::steady_clock;
using std::chrono::duration_cast;
using std::chrono::microseconds;

//Measure the performance of
//the "free" copy operations
auto tStart = steady_clock::now();
Msg msg2{msg1}; //copy ctor
msg1 = msg2; //copy assignment
auto tElapsed = steady_clock::now() - tStart;
cout << "Copy ops took "
     << duration_cast<microseconds>(tElapsed).count()
     << " microseconds\n";

//Measure the performance of
//the "free" move operations
tStart = steady_clock::now();
Msg msg3{std::move(msg1)}; //move ctor
msg1 = std::move(msg3); //move assignment
tElapsed = steady_clock::now() - tStart;
cout << "Move ops took "
     << duration_cast<microseconds>(tElapsed).count()
     << " microseconds\n";

cout << "Size of moved-from object = "
     << msg3.vDoubs.size();
} //"free" dtor is executed here for msg1, msg2, msg3

Sure enough, I duplicated what my friend Gyula discovered about the “move” behavior of the VS2013 C++ compiler:

Intrigued by the finding, I dove deeper into the anomaly and dug up these statements in the fourth edition of Bjarne Stroustrup’s “The C++ Programming Language“:

Since the Msg structure in the code listing does not have any copy or move operations manually defined within its definition, the compiler is required to generate them by default. However, since Bjarne doesn’t state that a compiler is required to execute moves when the programmer explicitly directs it to with std::move() expressions, maybe the compiler isn’t required to do so. However, common sense dictates that it should. Hopefully, the next update to the VS2013 compiler will do the right thing – like GCC (and Clang?) currently does.

Categories: C++11 Tags: Bjarne Stroustrup, C++11, move semantics, std::move

Time To Get Moving!

June 15, 2015 bulldozer00 8 comments

Prior to C++11, for every user-defined type we wrote in C++03, all standards-conforming C++ compilers gave us:

a “free” copy constructor
a “free” copy assignment operator
a “free” default constructor
a “free” destructor

The caveat is that we only got them for free if we didn’t manually override the compiler and write them ourselves. And unless we defined reference or pointer members inside of our type, we didn’t have to manually write them.

Starting from C++11 on, we not only get those operations for free for our user-defined types, we also get these turbo-boosters:

a “free” move constructor
a “free” move assignment operator

In addition, all of the C++ standard library containers have been “move enabled“.

When I first learned how move semantics worked and why this new core language feature dramatically improved program performance over copying, I started wondering about user-defined types that wrapped move-enabled, standard library types. For example, check out this simple user-defined Msg structure that encapsulates a move-enabled std::vector.

Logic would dictate that since I get “move” operations from the compiler for free with the Msg type as written, if I manually “moved” a Msg object in some application code, the compiler would “move” the vDoubs member under the covers along with it – for free.

Until now, I didn’t test out that deduction because I heard my bruh Herb Sutter say in a video talk that deep moves came for free with user-defined types as long as each class member in the hierarchical composition is also move-enabled. However, in a more recent video, I saw an excellent C++ teacher explicitly write a move constructor for a class similar to the Msg struct above:

D’oh! So now I was confused – and determined to figure out was was going on. Here is the program that I wrote to not only verify that manually written “move” operations are not required for the Msg struct, but to also measure the performance difference between moving and copying:

First, the program built cleanly as expected because the compiler provided the free “move” operations for the Msg struct. Second, the following, 5-run, output results proved that the compiler did indeed perform the deep, under the covers, “move” that my man Herb promised it would do. If the deep move wasn’t executed, there would have been no noticeable difference in performance between the move and copy operations.

From the eye-popping performance difference shown in the results, we should conclude that it’s time to start replacing copy operations in our code with “move” operations wherever it makes sense. The only thing to watch out for when moving objects from one place to another is that in the scope of the code that performs the move, the internal state of the moved-from object is not needed or used by the code following the move. The following code snippet, which prints out 0, highlights this behavior.

Categories: C++11 Tags: C++11, Herb Sutter, move semantics, programming

Acting In C++

May 21, 2015 bulldozer00 7 comments

Because they’re a perfect fit for mega-core processors and they’re safer and more enjoyable to program than raw, multi-threaded, systems, I’ve been a fan of Actor-based concurrent systems ever since I experimented with Erlang and Akka (via its Java API). Thus, as a C++ programmer, I was excited to discover the “C++ Actor Framework” (CAF). Like the Akka toolkit/runtime, the CAF is based on the design of the longest living, proven, production-ready, pragmatic, actor-based, language that I know of – the venerable Erlang programming language.

Curious to check out the CAF, I downloaded, compiled, and installed the framework source code with nary a problem (on CentOS 7.0, GCC 4.8.2). I then built and ran the sample “Hello World” program presented in the user manual:

As you can see, the CAF depends on the use one of my favorite C++11 features – lambda functions – to implement actor behavior.

The logic in main() uses the CAF spawn() function to create two concurrently running, message-exchanging (messages of type std::string), actors named “mirror” and “hello_world“. During the spawning of the “hello_world” actor, the main() logic passes it a handle (via const actor&) to the previously created “mirror” actor so that “hello_world” can asynchronously communicate with “mirror” via message-passing. Then, much like the parent thread in a multi-threaded program “join()“s the threads it creates to wait for the threads to complete their work, the main() thread of control uses the CAF await_all_actors_done() function to do the same for the two actors it spawns.

When I ran the program as a local C++ application in the Eclipse IDE, here is the output that appeared in the console window:

The figure below visually illustrates the exchange of messages between the program’s actors during runtime. Upon being created, the “hello_world” actor sends a text message (via sync_send()) containing “Hello World” as its payload to the previously spawned “mirror” actor. The “hello_world” actor then immediately chains the then() function to the call to synchronously await for the response message it expects back from the “mirror” actor. Note that a lambda function that will receive and print the response message string to the console (via the framework’s thread-safe aout object) is passed as an argument into the then() function.

Looking at the code for the “mirror” actor, we can deduce that when the framework spawns an instance of it, the framework gives the actor a handle to itself and the actor returns the behavior (again, in the form of a lambda function) that it wants the framework to invoke each time a message is sent to the actor. The “mirror” actor behavior: prints the received message to the console, tells the framework that it wants to terminate itself (via the call to quit()), and returns a mirrored translation of the input message to the framework for passage back to the sending actor (in this case, the “hello_world” actor).

Note that the application contains no explicit code for threads, tasks, mutexes, locks, atomics, condition variables, promises, futures, or thread-safe containers to futz with. The CAF framework hides all the underlying synchronization and message passing minutiae under the abstractions it provides so that the programmer can concentrate on the application domain functionality. Great stuff!

I modified the code slightly so that the application actors exchange user-defined message types instead of std::strings.

Here is the modified code:

struct ReqMessage {
  int32_t anInt32_t;
  double aDouble;
  string aString;
};

struct RespMessage {
  enum class AckNack{ACK, NACK};
  AckNack response;
};

void printMsg(event_based_actor* self, const ReqMessage& msg) {
  aout(self) << "anInt32_t = " << msg.anInt32_t << endl
             << "aDouble = " << msg.aDouble << endl
             << "aString = " << msg.aString << endl
             << endl;
}

void fillReqMessage(ReqMessage& msg) {
  msg.aDouble = 55.5;
  msg.anInt32_t = 22;
  msg.aString = "Bulldozer00 Rocks!";
}

behavior responder(event_based_actor* self) {
  // return the (initial) actor behavior
  return {
  // a handler for messages of type ReqMessage
  // and replies with a RespMessage
  [=](const ReqMessage& msg) -> RespMessage {
    // prints the content of ReqMessage via aout
    // (thread-safe cout wrapper)
    printMsg(self, msg);
    // terminates this actor *after* the return statement
    // ('become' a quitter, otherwise loops forever)
    self->quit();
    // reply with an "ACK"
    return RespMessage{RespMessage::AckNack::ACK};
  }};
}

void requester(event_based_actor* self, const actor& responder) {
  // create & send a ReqMessage to our buddy ...
  ReqMessage req{};
  fillReqMessage(req);
  self->sync_send(responder, req).then(
  // ... wait for a response ...
  [=](const RespMessage& resp) {
    // ... and print it
    RespMessage::AckNack payload{resp.response};
    string txt =
      (payload == RespMessage::AckNack::ACK) ? "ACK" : "NACK";
    aout(self) << txt << endl;
  });
}

int main() {
  // create a new actor that calls 'responder()'
  auto responder_actor = spawn(responder);
  // create another actor that calls 'requester(responder_actor)';
  spawn(requester, responder_actor);
  // wait until all other actors we have spawned are done
  await_all_actors_done();
  // run cleanup code before exiting main
  shutdown();
}

And here is the Eclipse console output:

Categories: C++, C++11 Tags: Akka, CAF, Erlang, lambdas

Whence Actors?

May 18, 2015 bulldozer00 Leave a comment

While sketching out the following drawing, I was thinking about the migration from sequential to concurrent programming driven by the rise of multicore machines. Can you find anything wrong with it?

Right out of the box, C++ provided Objects (via classes) and procedures (via free standing functions). With C++11, standardized support for threads and tasks finally arrived to the language in library form. I can’t wait for Actors to appear in….. ?

Categories: C++11, C++14, C++17 Tags: actors, C++11, C++17, concurrency

Make Simple Tasks Simple

February 5, 2015 bulldozer00 2 comments

In the slide deck that Bjarne Stroustrup presented for his “Make Simple Tasks Simple” CppCon keynote speech, he walked through a progression of refactorings that reduced an (arguably) difficult-to-read, multi-line, C++98 code segment (that searches a container for an element matching a specific pattern) into an expressive, C++11 one liner:

Some people may think the time spent refactoring the original code was a pre-mature optimization and, thus, a waste of time. But that wasn’t the point. The intent of the example was to illustrate several features of C++11 that were specifically added to the language to allow sufficiently proficient programmers to “make simple tasks simple (to program)“.

Note: After the fact, I discovered that “MemVec” should be replaced everywhere with “v” (or vice versa), but I was too lazy to fix the pic :). Can you find any other bugs in the code?