Tag Archives: C++

libsigc++ 3.0: Very variadic

I have just released libsigc++ 2.99.1, the first release of the libsigc++-3.0 API, which installs in parallel with the existing libsigc++-2.0 API. This is libsigc++ using much more modern C++, specifically C++14. The API itself is almost completely unchanged, but the implementation is now clearer and easier to contribute to. I’m rather proud of my work, but I’m quite sure that there’s still room for improvement.

Unfortunately, glibmm and gtkmm can’t use libsigc++-3.0 until they have their own parallel-installable versions. That’s not likely to happen until the mythical GTK+ 4 arrives. Even though libsigc++ is mostly all templates in headers, its symbols do appear in the linker symbols for gtkmm method signatures, so changes to libsigc++ can break gtkmm ABI. I do have a sigc3 branch of glibmm in git just to show that it works.

With around 150 commits, I gradually refactored libsigc++ to use variadic templates with C++14 instead of generating code from nasty m4 files. Over the years, we have built up quite a large set of regression tests, run during “make check” and “make distcheck”, including various corner cases. Without these tests, the refactoring would have been much harder. With the tests, I could make small commits, knowing that each commit had not broken anything. When something did seem wrong, I could add a test and go back through the git history to find the problem, sometimes splitting commits into even smaller changes. I did a lot of that, rebasing several times, sometimes stopping and starting again after help from Jonathan Wakely and Marcin Kolny. Those tests give me much more confidence in the end result than I could have if I had chosen to just reimplement the entire API from scratch.

The API is almost all .h files and, according to wc, there are 24,145 lines of code in those files in libsigc++ 2.7.1 (after make) and 6,507 in libsigc++ 2.99.1. So there is now only 27% as much code.

This is possible because:

  • C++ variadic templates allow us to have one class or function where we previously had to generate multiple versions for 1 to 6 function parameters, sometimes with additional versions for const and non-const parameters or const and non-const (and volatile and non-volatile) member function pointers. (See the sketch after this list.)
  • decltype(auto) lets us avoid lots of templated type traits just to specify the correct return types for methods.
  • The standard C++ type traits, such as std::conditional<>, std::result_of<>, std::is_base_of<>, std::remove_volatile<> and std::is_const<> let us write very generic code, sometimes replacing our own type traits. I added some more to libsigc++ to get compile-time type traits for member method pointers.
  • template aliases (like typedefs for templates) avoided the need for multiple functor classes deriving from a common base, even though I ended up not needing most of these aliases either.
  • I replaced our sigc::ref() and sigc::reference_wrapper() with std::ref() and std::reference_wrapper<>. Presumably these share a common history.
  • I removed some configure checks and ifdef-ed workarounds for older MSVC++ and Sun Forte compilers. Hopefully they aren’t necessary now, but we will see.
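
Here is a very rough sketch of why the variadic templates collapse so much code. This is not the actual libsigc++ code: signal_sketch, its std::function-based storage and the member names are invented just for illustration.

#include <functional>
#include <vector>

// One variadic class template instead of generated signal0, signal1, ...
// signal6 (and slot0 ... slot6) classes:
template <typename T_return, typename... T_args>
class signal_sketch {
public:
  // Connect any callable with a compatible call signature:
  void connect(const std::function<T_return(T_args...)>& handler) {
    handlers_.push_back(handler);
  }

  // Call every connected handler with the same arguments:
  void emit(T_args... args) {
    for (const auto& handler : handlers_)
      handler(args...);
  }

private:
  std::vector<std::function<T_return(T_args...)>> handlers_;
};

// The same template works for any number of arguments:
// signal_sketch<void> takes_nothing;
// signal_sketch<void, int, std::string> takes_two;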

For some adaptors I used the tuple utilities that I’ve been working on recently, for instance in sigc::bind(). These are copied into the libsigc++ source code, and I’d particularly welcome improvements to them in the form of patches or github pull requests.

I’m still not completely happy with all the overloads we have for sigc::mem_fun(), to take member functions that are non-const, const, non-volatile and/or volatile, but I have some things still to try. We might also remove several by not allowing both mem_fun(pointer, func) and mem_fun(reference, func).

Please do suggest ways to simplify the code yet further.

Playing with C++ Variadic Templates and Dynamic Programming

Dynamic Programming C++ base classes

Over the last few months I’ve been exploring various dynamic programming (DP) algorithms, trying to get a feel for how to choose suitable sub-problems and the dimensions of those sub-problems, and when a top-down or bottom-up strategy is best. In principle, dynamic programming lets you break a problem down into suitable sub-problems, as long as they obey the “principle of optimality”, meaning you can use some broad state (result) of the sub-problem calculation, without needing the actual specifics of each solution.

I’ve also been playing lots with variadic templates in modern C++.

My little murrayc-dp-algorithms project is the result. It has base classes for both bottom-up DP and top-down DP and several examples that use them. The end result is only slightly useful, and is currently far from optimal for performance. But I still think it’s an interesting exercise that shows how these various DP algorithms really relate to each other. It should also explain how I came up with those examples in my earlier blog post.

Top-Down DP

Let’s start with top-down dynamic programming, because it’s generally simpler. Calculating Fibonacci numbers is the classic example. The naive recursive implementation below is clearly inefficient, because every call recalculates almost the same lesser Fibonacci numbers, twice. That makes it exponential, with a time complexity of roughly O(2^n).

int f(int n) {
  if (n == 0)
    return 0;
  else if (n == 1)
    return 1;

  return f(n - 1) + f(n - 2);
}

One simple way to avoid the wasteful recalculation is to cache the results and reuse them. That’s generally called Memoization. For Fibonacci number calculation, this is still not the best way, but it’s a useful example.
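
For instance, a plain memoized version of the naive function above might look like this. It’s only a sketch of the memoization idea, not the murraycdp code:

#include <unordered_map>

unsigned long f(unsigned int i) {
  // Remember every result that we have already calculated:
  static std::unordered_map<unsigned int, unsigned long> cache;

  // Base cases:
  if (i == 0)
    return 0;
  else if (i == 1)
    return 1;

  const auto iter = cache.find(i);
  if (iter != cache.end())
    return iter->second; // Already calculated.

  const auto result = f(i - 1) + f(i - 2);
  cache[i] = result;
  return result;
}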

DpTopDownBase provides a framework for dynamic programming with memoization. You provide a calc_subproblem() implementation, which should use get_subproblem(). That get_subproblem() will return a previously-calculated result if it exists, or recursively call calc_subproblem().

1-Dimensional Top-Down DP

For instance, the top-down DpFibonacci class is declared like so, deriving from DpTopDownBase:

class DpFibonacci
  : public murraycdp::DpTopDownBase<
      unsigned long, // sub-problem type
      unsigned int // i
    > {
public:
  explicit DpFibonacci(unsigned int n)
  : n_(n) {}

This specifies that its calc_subproblem() (and the base class’ get_subproblem()) will take an unsigned int and return an unsigned long:

unsigned long
calc_subproblem(
  type_base::type_level level,
  unsigned int i) const override {
  // Base cases:
  if (i == 0) {
    return 0;
  } else if (i == 1) {
    return 1;
  }

  return get_subproblem(level, i - 1)
    + get_subproblem(level, i - 2);
}

The level parameter is just for optional debug output, to show how deep we are in the recursive call stack.

It also implements get_goal_cell() so that the base class knows which value of n to use to get the result f(n). This just returns the n provided to DpFibonacci’s constructor.
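
For instance, it might look something like this, though this is only a sketch: the real signature is dictated by the base class, and here I’m just assuming one output parameter per dimension.

// A sketch only; the exact signature comes from DpTopDownBase:
void get_goal_cell(unsigned int& i) const override {
  // The result we want is f(n), so the goal cell is just n:
  i = n_;
}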

We can then instantiate the class and call calc():

DpFibonacci dp(n);
auto result = dp.calc();

The base DpTopDownBase::calc() method calls get_goal_cell() and calls calc_subproblem() for that cell. Because our calc_subproblem() implementation uses get_subproblem(), it will automatically avoid recalculating already-calculated sub-problems.

That DpTopDownBase::calc() method is a fairly simple example of the usefulness of C++ variadic template parameters to call one variadic method with the result of another variadic method, using tuples and std::experimental::apply(). And it’s type-safe at compile time all the way.
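
The general shape of that pattern looks something like this. This is a rough sketch rather than the actual murraycdp code, and get_goal_cell_tuple() and the type names here are invented just for illustration:

#include <experimental/tuple> // for std::experimental::apply()
#include <tuple>

template <typename T_subproblem, typename... T_cell>
class top_down_sketch {
public:
  T_subproblem calc() {
    // Ask the derived class which cell holds the overall result:
    const std::tuple<T_cell...> goal = get_goal_cell_tuple();

    // Call the variadic calc_subproblem() with that tuple's elements:
    return std::experimental::apply(
      [this](const T_cell&... cell) {
        return this->calc_subproblem(cell...);
      },
      goal);
  }

  virtual std::tuple<T_cell...> get_goal_cell_tuple() const = 0;
  virtual T_subproblem calc_subproblem(const T_cell&... cell) const = 0;
};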

2-Dimensional Top-Down DP

But DpTopDownBase isn’t just good for a 1-dimensional cache, based on one parameter. The wonder of C++ variadic templates lets us specify multiple parameters for calc_subproblem and get_goal_cell(). Those overridden methods take just the right parameter types and just the right number of parameters.

And this is where top-down DP can be useful, because it might not be necessary to calculate all possible sub-problems for all possible values of these parameters.

For instance, in the Knapsack Problem, we have a knapsack with a certain maximum weight capacity, and we have several items of various weights and values. We want to find the set of items to put in the knapsack that maximises the value. We give each item an index number.

In the classic Knapsack DP algorithm, we then have 2 dimensions (or parameters) for each sub-problem: A count of items and a maximum weight. So if we have 5 items and total capacity of 100Kg, we can calculate the max value, f(5, 100Kg) by using other sub-problems such as f(4, 80Kg) along the way. It’s not likely that we’ll need to calculate all 5 * 100 possible subproblems, though it is likely that we’ll reuse some sub-problem calculations.
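
The recurrence behind that is the classic one: the best value for the first i items and capacity w is either the best value without item i, or item i’s value plus the best value of the remaining capacity without item i. Here it is written naively, without any caching, and with a simplified Item struct assumed for illustration (this is not the murraycdp code):

#include <algorithm>
#include <cstddef>
#include <vector>

struct Item {
  unsigned long value = 0;
  unsigned long weight = 0;
};

unsigned long best_value(const std::vector<Item>& items,
  std::size_t items_count, unsigned long weight_capacity) {
  // Base case: no items means no value.
  if (items_count == 0)
    return 0;

  const auto& item = items[items_count - 1];

  // The best value without this item:
  const auto without_item =
    best_value(items, items_count - 1, weight_capacity);

  // If the item doesn't fit, that's the only option:
  if (item.weight > weight_capacity)
    return without_item;

  // Otherwise, it might be better to use the item:
  const auto with_item = item.value +
    best_value(items, items_count - 1, weight_capacity - item.weight);

  return std::max(without_item, with_item);
}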

For instance, the top-down DpKnapsack class derives from the base DpTopDownBase like so:

class DpKnapsack
: public murraycdp::DpTopDownBase<
    SubSolution,
    SubSolution::type_vec_items::size_type,
    Item::type_weight
  > {

Again it overrides calc_subproblem() and overrides get_goal_cell(). This time it takes 2 parameters instead of just i:

SubSolution
calc_subproblem(
  type_level level,
  type_size items_count,
  type_weight weight_capacity) const override {

And it can call get_subproblem() with those 2 parameters, letting the base class worry about how to cache the sub-problems and when to recalculate them. (This currently uses a rather inefficient generic hashing function for std::tuple<>s.)
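
A generic tuple hash along these lines would do the job: combine the std::hash<> of each tuple element, expanding the tuple via std::index_sequence<>. This is a sketch of the general technique, not the actual murraycdp code:

#include <cstddef>
#include <functional>
#include <tuple>
#include <utility>

template <typename T>
void hash_combine(std::size_t& seed, const T& value) {
  // The usual boost-style combining step:
  seed ^= std::hash<T>()(value) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

template <typename... T, std::size_t... Is>
std::size_t hash_tuple_impl(const std::tuple<T...>& t,
  std::index_sequence<Is...>) {
  std::size_t seed = 0;
  // A C++14-friendly way to expand the pack just for its side effects:
  using swallow = int[];
  (void)swallow{0, (hash_combine(seed, std::get<Is>(t)), 0)...};
  return seed;
}

template <typename... T>
std::size_t hash_tuple(const std::tuple<T...>& t) {
  return hash_tuple_impl(t, std::index_sequence_for<T...>());
}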

Notice that we return a custom SubSolution class for the sub-problem result instead of just a number. This lets us store both the maximum value for that sub-problem and the items used in that sub-problem solution. That helps us to reconstruct the complete set of items for the sub-solutions along the way. However, that is inefficient because we won’t need all sub-solution item sets. It should instead be possible to write some generic code to reconstruct the solution by working backwards from the result of the goal. I’ll try that out sometime.

Again we can just instantiate the class and call calc():

//Items with their values and weights:
DpKnapsack::type_vec_items items{
  {13, 15}, {12, 14}, {2, 13}, {10, 12},
  {1, 11}, {7, 10}, {6, 9}, {5, 8},
  {11, 7}, {14, 6}, {9, 5}, {4, 4}, {3, 3},
  {22, 2}, {8, 1}};
DpKnapsack::type_weight weight_capacity = 45;

DpKnapsack dp(items, weight_capacity);
const auto result = dp.calc();
std::cout << "solution: value: " << result.value << std::endl
  << "with items: ";
print_vec(result.solution);
std::cout << std::endl;

The top-down DpContextFreeGrammarParser, DpMakeChange, DpEditDistance and DpParenthesization classes do much the same thing. The top-down DpTsp uses 3 dimensions and overrides calc() to choose the minimum result over all the sub-problems at the top level instead of just letting the default base calc() use the result of one goal sub-problem as the overall result.

I think it’s nice to separate the real heart of each DP algorithm from the code that does the boring caching and checking.

Bottom-Up DP

In bottom-up DP, we calculate every single possible sub-problem result, starting from our base case. Even if we won’t need every single possible result, this is often more efficient because we sometimes know when we can discard whole sets of sub-problems. For instance, looking again at the classic Fibonacci Numbers example, we only really need the last 2 sub-problem results:

int f(int n) {
  if (n == 0)
    return 0;
  else if (n == 1)
    return 1;

  int result = 0;
  int prev1 = 1; // f(1) = 1;
  int prev2 = 0; // f(0) = 0;
  for (int i = 1; i < n; ++i) {
    result = (prev1 + prev2);
    prev2 = prev1;
    prev1 = result;
  }

  return result;
}

So we don’t need to cache all i results along the way. For a multi-dimensional DP algorithm, that can save us a lot of space.

1-Dimensional Bottom-Up DP

The bottom-up DpFibonacci class derives from DpBottomUpBase, specifying:

  • the number of sub-problems to keep along the way: 2 to keep f(n-1) and f(n-2).
  • the type of each sub-problem’s (and the final) calculation result: unsigned long for the result of f(5), for instance.
  • the type of the input: unsigned int for 5, for instance, in f(5).

class DpFibonacci
: public murraycdp::DpBottomUpBase<
    2, // count of sub-problems to keep
    unsigned long, // sub problem type
    unsigned int // i
  > {

DpFibonacci again overrides calc_subproblem() and get_goal_cell(), with exactly the same implementation as the top-down version.

The base class’ calc() method then just iterates from 0 to n, calling the virtual calc_subproblem() method and then returning the result for the goal cell, based on calling the virtual get_goal_cell() method.

DpBottomUpBase keeps just the right number of previously-calculated sub-problems, while reusing the discarded sub-problem storage, by using my (funky) circular_vector.
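
The basic idea of circular_vector is just to reuse a fixed-size buffer, so that writing sub-problem i overwrites the storage of a sub-problem that is no longer needed. This is only a sketch of that idea, not the real implementation:

#include <cstddef>
#include <vector>

template <typename T>
class circular_vector_sketch {
public:
  explicit circular_vector_sketch(std::size_t size)
  : items_(size) {}

  // Index as if the vector were unbounded; only the most recent
  // size() items are actually kept.
  T& operator[](std::size_t i) { return items_[i % items_.size()]; }

private:
  std::vector<T> items_;
};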

DpTripleStep is very similar, though it needs to keep the 4 previous sub-problems instead of 2, and of course it does a different calculation per sub-problem.

2-Dimensional Bottom-Up DP

The bottom-up DpKnapsack class is again much like the top-down DpKnapsack implementation, but it again specifies that it only needs the previous 2 sets of sub-problems.

class DpKnapsack
  : public murraycdp::DpBottomUpBase<
    2, // count of sub-problems to keep.
    SubSolution,
    SubSolution::type_vec_items::size_type,
    Item::type_weight
  > {

Where the bottom-up DpFibonacci passed one count to the base class’ constructor, the bottom-up DpKnapsack passes two, one for each calc_subproblem() parameter:

DpKnapsack(
  const type_vec_items& items,
  type_weight weight_capacity)
  : DpBottomUpBase(
      items.size() + 1,
      weight_capacity + 1),
    items_(items),
    weight_capacity_(weight_capacity) {}

The base DpBottomUpBase class uses these constructor parameters as ends of ranges, assuming that each range starts at 0. It will then iterate over the first range, iterating over the second range each time, like a nested for loop:

for (int items_count = 0; items_count < MAX_ITEMS; ++items_count) {
  for (int weight_capacity = 0; weight_capacity < MAX_WEIGHT_CAPACITY; ++weight_capacity) {
    ...
  }
}

There is no way in C++ to nest an arbitrary number of for loops. But we can use variadic templates to declare an arbitrarily-nested vector. And thanks to my funky variadic templates for vectors of vectors (of vectors of vectors, etc), the base class can generically call calc_subproblem() as if it had nested for loops, regardless of how much nesting there should be. My for_vector_of_vectors() takes the (nested) vector, and a series of start and end parameters for each level, and eventually uses std::experimental::apply() to call the functor (a lambda, in this case) for each combination. To get that series of start and end parameters, it uses my funky tuple_interlace() to alternate starts (0) with ends.
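
The arbitrarily-nested vector itself is the easy part. A recursive type trait along these lines (a sketch, not the actual murrayc-dp-algorithms code) nests one std::vector per dimension:

#include <cstddef>
#include <vector>

// nested_vector<T, N>::type is a vector of vectors ... of T, N levels deep.
template <typename T, std::size_t N>
struct nested_vector {
  using type = std::vector<typename nested_vector<T, N - 1>::type>;
};

template <typename T>
struct nested_vector<T, 0> {
  using type = T;
};

// For instance, nested_vector<int, 3>::type is
// std::vector<std::vector<std::vector<int>>>.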

The bottom-up DpEditDistance and bottom-up DpSubstringMatching do much the same thing.

Problems with DpBottomUpBase

The bottom-up base class has some serious limitations:

It only lets us discard older sub-problems in the first dimension, such as the Knapsack sub-problems for items less than n-2. But you might understand enough about your algorithm to know other sets of sub-problems that can be discarded at various times, based on less simple criteria. This would make the best use of space while avoiding recalculation. Maybe some recalculation would be OK as long as you could minimize it.

Also, it is limited to a simple iteration, using a range from start to end, using a simple operator++ to move to the next iteration. But sometimes you want to do something a little more complicated than an iterating for loop. For instance, my top-down DpTsp class finds a solution for the Travelling Salesman Problem by manipulating sub-sets in its calc_subproblem(). I could use Gosper’s hack to create a suitable n-choose-k subset iterator, but it still needs to iterate over a further set of subsets missing each item from the parent subset, and I suppose that would be doable with another custom iterator. But the start and end iterators at each level would then have to be instantiated based on the values of the parent level, and cannot just be supplied before the whole calc() starts. Maybe I could do something with lambda functions to let these iterators be instantiated along the way.
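
For reference, Gosper’s hack itself is tiny. Given a bitmask representing a k-element subset, it computes the next k-element subset in increasing numerical order (this is the standard trick, not code from my DpTsp class):

// Given a bitmask x with k bits set, return the next larger
// bitmask that also has exactly k bits set.
unsigned int next_k_subset(unsigned int x) {
  const unsigned int c = x & (~x + 1u); // The lowest set bit.
  const unsigned int r = x + c; // Carry propagates over the low group of 1s.
  return (((r ^ x) >> 2) / c) | r; // Move the remaining 1s back to the bottom.
}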

C++ Core Guidelines: Ownership and Parameters

I’ve been catching up on the wonderful CppCon talks from 2014 and 2015. It looks like we are getting towards a consensus about how to write modern C++, though I guess we are still early in the process, and still coming up with as many questions as answers.

When this all settles down, it might need Scott Meyers to come out of retirement from C++ to combine and update his Effective C++ books. Maybe he’d do it if a really modern C++ emerged from it all.

Talks, Guidelines, Smartpointers and Ownership

Some of the most important talks focus on when pointers, or smartpointers, have “ownership”. Three talks in particular progress towards a rather utopian vision in which (shared) object ownership is clearer in various parts of your code and in which there are no object lifetime bugs:

  • CppCon 2014: Herb Sutter “Back to the Basics! Essentials of Modern C++ Style”  (slides)
    Herb discussed some best practices, particularly around auto and smartpointers.
  • CppCon 2015: Bjarne Stroustrup: “Writing Good C++14” (slides)
    Bjarne discussed how we can agree on guidelines that static analysis tools can use, and how following the rules can let static analysis tools find more bugs. For example, by using types to indicate whether a (smart)pointer is owning. As in Herb’s 2014 talk, he suggested that we should get() the raw pointer out of a shared_ptr<> before passing it to a function that doesn’t really want to share ownership. He also introduced the GSL, which has owner<> for when you can’t use shared_ptr<> or unique_ptr<>.
  • CppCon 2015: Herb Sutter: “Better C++14 by Default” (slides)
    The next day, Herb discussed many more ways that static analysis tools could avoid several classes of bugs, and how we can break the rules down into sets called profiles. He suggested some annotations, via the C++ generalized attributes syntax, already in C++11, which could let the tools find even more problems, but showed that we can make good assumptions by default. He showed a Microsoft static analysis tool doing this already.

I like the idea of tools telling me what to do. C++ is more complicated than ever, though also better than ever, and there’s more scope than ever for people to disagree about how to use it. At some point I like to move on and get stuff done. I’m sceptical that we can ever really make our code as safe as Bjarne and Herb suggest, but I’m convinced that we could catch several new categories of problems before runtime, and I want to try.

C++ Core Guidelines, Smartpointers, and Ownership

The C++ Core Guidelines are currently a bit of a muddle. There are many rules that seem to overlap too much, often with insubstantial examples. Too often there are admonitions about what not to do rather than examples of how to do things properly. Trying to comply with one rule will sometimes violate another rule, with no clues about resolving the conflict. The github issues and commits show that conflicts are being clarified, but the whole thing seems so big and spread so thinly that I guess it will be a long time until it feels consistent and cohesive.

The advice often seems to assume that you are writing application code rather than reusable library code, though large real-world systems are rarely just one or the other. Therefore there is little consideration of ABI stability. For instance, it suggests “F.7: For general use, take T* or T& arguments rather than smart pointers” and “I.11: Never transfer ownership by a raw pointer (T*)” (and more overlapping advice), but what happens when you need to change the implementation and need to share ownership of the passed instance? You can’t just change a T* to a std::shared_ptr<T> without making existing applications crash. The document suggests GSL’s owner<> in this case, though I guess it will still affect shared library symbol names for methods, and it won’t actually do any reference counting. So in a library where you know that instances get passed around a lot, and there’s a reasonable chance that ownership will be shared, it doesn’t seem unreasonable to default to std::shared_ptr<> even if that obscures the actual ownership in some parts of the code.

Furthermore, when looking at other peoples’ code, how do we distinguish the foo(const T*) method that probably doesn’t keep a copy of the pointer, from all the foo(const T*) methods that might do anything at all because their authors never heard of the guidelines?

Raw pointers as non-owners

As a start, I’m playing with this idea of not overusing smartpointers and letting the lack of a smartpointer indicate non-involvement in the ownership. For instance:

std::shared_ptr<Something> something = get_something();

// This takes a shared_ptr:
do_something_and_keep_it(something); 

// This doesn't care about ownership:
do_something_and_dont_keep_it(something.get());

I’m ready to try this out in an application such as Glom, but fearful of doing it in a library API, such as gtkmm.

Bear in mind that, in the past,

  • I have been one of the maintainers that made glibmm and gtkmm use Glib::RefPtr<>, an intrusive reference-counting smartpointer, for all our wrapper classes of reference-counted glib/GTK+ objects (not widgets). For instance, see the doxygen graph of glibmm/giomm classes deriving from Glib::ObjectBase, and there are many in gtkmm too (not the widgets).
  • I converted most of my Glom codebase to pass all of its data structure classes around by shared_ptr (a custom shared_ptr<> before C++11) rather than trying to decide when they could be deleted.

These have worked out very well, partly because “only use this via this smartpointer” is a nice simple rule to follow. I’ve resisted letting anyone easily get the underlying pointer out of a RefPtr, to preserve this simplicity. Now that there’s this new (at least to me) consensus, I guess we’ll actually add RefPtr::get() and RefPtr::operator*() soon.

But I still hesitate, because I don’t expect the typical developer to be careful enough until there are tools to stop misuse and I don’t expect the typical developer to use the tools even when they exist. So the price of a better experience for experts may be more ways for non-experts to make mistakes. The gtkmm API has tried to be a pure and modern C++ while also being pragmatic and obvious, but it will be hard to keep the balance in this case.

Trying out raw pointers as non-owners in Glom

I tried this out in an “ownership” branch of Glom. I first looked at some static (standalone) functions that took a std::shared_ptr<> because those were unlikely to store any state between calls. Here is the fairly small patch that changed those parameters.

I then looked at some other functions that took a shared pointer to const (std::shared_ptr<const Something>), suggesting that the implementation just wanted to look at it and then forget it. Changing one function’s parameter forces you to look at the calling functions, where you might find that the shared_ptr<> is itself a parameter that should be changed to a non-owning raw pointer. I followed this logic up the call stack for a while and ended up with a larger patch. I forget which function I tried to change first.

I like that this reduces the amount of code where I need to worry about circular references, which I would break with a std::weak_ptr<>. But I still worry enough to wish there was some way to automatically identify where these might happen.

I’m still a little uncomfortable with this mix of shared_ptr<> and raw pointers, even in purely application code, so I’m not ready to put this in git master just yet. But I think I’ll come around to the idea.

std::unique_ptr parameters to take ownership

The C++ Core Guidelines suggest “R.32: Take a unique_ptr<widget> parameter to express that a function assumes ownership of a widget”, though they currently tell you to avoid std::move() (update: fixed now), which you’d need when calling such a method with a named variable. They don’t mean a gtkmm Widget, but I do.

Watch Herb Sutter’s 2014 talk for the rationale for taking std::unique_ptr<>, but not all types, by value. It’s also mentioned in the C++ Core Guidelines “F.15: Prefer simple and conventional ways of passing information” section.

For instance:

void
SomeClass::take_something(std::unique_ptr<Something> something) {
...
}

...
auto something =
  std::make_unique<Something>("something", 2, 'a');
something->set_extra_foos(foo1, foo2);
bar.take_something(std::move(something));

// Or without the intermediate variable:
bar.take_something(
  std::make_unique<Something>("something", 2, 'a'));

// Or again without the intermediate variable:
bar.take_something(
  create_something("something", 2, 'a'));

Trying out std::unique_ptr<> parameters to take ownership in gtkmm

gtkmm has Gtk::manage(), which roughly means “let the container widget worry about deleting this child widget later”. For instance, you can new a Gtk::Button and call Gtk::manage() on that button before you add() it to a Gtk::Grid. It works well and it’s very simple. Incidentally, all it really does is enable the default behaviour for GtkWidgets – the only clever stuff about Gtk::manage() is how we change the default for gtkmm.

Passing a std::unique_ptr<> to add() could have much the same meaning, and would have the advantage of using standard C++ API and a known convention. I have a patch in bugzilla that does this and there’s been some discussion on gtkmm-list.

However, the result is sometimes hard to love. You have to move the add() to later in your code, to avoid a use after std::move(). And, for some of our API, you really do need to use the pointer right after you’ve add()ed it to a container. So you need to take a temporary “observing” (non-owning) raw pointer before the add().

Currently, the worst part of the gtkmm API for this, which has never felt quite right, though we can blame GTK+, is this:

auto cell = std::make_unique<Gtk::CellRendererText>();
cell->property_style() = Pango::STYLE_ITALIC;

auto cell_nonowning = cell.get(); // yuck
auto column = std::make_unique<Gtk::TreeViewColumn>("something", std::move(cell));
column->add_attribute(cell_nonowning->property_text(), columns.title);
column->add_attribute(cell_nonowning->property_style_set(), columns.italic);

treeview.append_column(std::move(column));

I guess we’ll add the add(std::unique_ptr<>) overloads to allow this, but not deprecate Gtk::manage() until we feel more comfortable with how they make us rearrange our code.

libxml++ 3.0 soon

Kjell Ahlstedt has done some work on a new parallel-installable version of the libxml++ API, to fix some mostly-minor but annoying things that needed ABI breaks. You can see the code in libxml++’s git master.

He’s ready to release a stable .0 version soon. So now is the time to mention anything about the API that needs to be changed, please. People too often only look at these things right after the first stable release, when it’s too late.

C++ tuple utils

I’ve recently been writing lots of modern C++ code with variadic templates. For instance, I’ve been trying to make libsigc++ use variadic templates instead of being a mess of generated code.

I often find myself needing utility functions and type traits to manipulate tuples, but the C++ standard library still only offers std::tuple_cat(). Writing these is awkward and that often stops me from experimenting quickly.

So I’m gradually gathering this code together in a little murrayc-tuple-utils library. I’d gladly change the name if this gets any use. Really, I’m surprised that nothing like this seems to exist already, apart from as part of larger projects such as boost::hana and boost::fusion. But boost is a really awkward dependency and those are larger projects with much grander goals.

So far murrayc-tuple-utils has:

  • tuple_cdr(): Removes the first element.
  • tuple_start<N>(): Takes the first N elements.
  • tuple_end<N>(): Takes the last N elements.
  • tuple_subset<pos, len>: Takes len elements, starting at pos.
  • tuple_interlace<T1, T2>: Takes elements from each tuple, interlacing (or zipping) them together.

For each of them, there are also type traits, such as tuple_type_cdr<>::type, though these are not so necessary now that C++14 has decltype(auto) for return types.
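
Usage looks something like this. It’s a sketch: the exact headers and namespaces in murrayc-tuple-utils may differ from what I assume here.

#include <string>
#include <tuple>
// ... plus the murrayc-tuple-utils headers that declare these functions.

void example() {
  auto t = std::make_tuple(1, 2.5, std::string("three"), 'f');

  auto rest = tuple_cdr(t);            // (2.5, "three", 'f')
  auto first_two = tuple_start<2>(t);  // (1, 2.5)
  auto last_two = tuple_end<2>(t);     // ("three", 'f')
  auto middle = tuple_subset<1, 2>(t); // (2.5, "three")
}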

These have just enough code to work for me when I’m in a rush. I’m sure they can be improved, and maybe this is how to get those patches and pull requests.

The project has a complete autotools build structure, with “make check” tests, Doxygen documentation building, a pkg-config .pc file, etc, so you can try it out, improve it, and add to it, without having to mess around with that stuff.

Playing with xdg-app for PrefixSuffix and Glom

xdg-app lets us package applications and their dependencies together for Linux, so a user can just download the application and run it without either the developer or the user worrying about whether the correct versions of the dependencies are on the system. The various “runtimes”, such as the GNOME runtime, get you most of the way there, so you might not need to package many extra dependencies.

I put a lot of work into developing Glom, but I could never get it in front of enough non-technical users. Although it was eventually packaged for the main Linux distros, such as Ubuntu and Fedora, those packages were almost always broken or horribly outdated. I’d eagerly fix bugs reported by users, only to wait 2 years for the fix to get into a Linux distro package.

At this point, I probably couldn’t find the time to work more on Glom even if these problems went away. However, I really want something like xdg-app to succeed so the least I could do is try it out. It was pleasantly straightforward and worked very well for me. Alexander Larsson was patient and clear whenever I needed help.

xdg-app-builder

I first tried creating an xdg-app package for PrefixSuffix, because it’s a very simple app, but one that still needs dependencies that are not in the regular xdg-app GNOME runtime, such as gtkmm, glibmm and libsigc++.

I used xdg-app-builder, which reads a JSON manifest file, which lists your application and its dependencies, telling xdg-app-builder where to get the source tarballs and how to build them. Wisely, it assumes that each dependency can be built with the standard configure/make steps, but it also has support for CMake and lets you add in dummy configure and Makefile files. xdg-app-builder’s documentation is here, though I really wish the built HTML was online so I could link to it instead.

Here is the manifest.json file for PrefixSuffix. I can run xdg-app-builder with that manifest, like so:

xdg-app-builder --require-changes ../prefixsuffix-xdgapp manifest.json

xdg-app-builder then builds each dependency in the order of its appearance in the manifest file, installing the files in a prefix in that prefixsuffix-xdgapp folder.

You also need to specify contexts in the “finish-args”, though they aren’t explicitly called contexts in the manifest file. For instance, you can give your app access to the network subsystem or the host filesystem subsystem.

Creating the manifest.json file feels a lot like creating a build.gradle file for Android apps, where we would also list the base SDK version needed, along with each version of each dependency, and what permissions the app needs (though permissions are partly requested at runtime now in Android).

Here is the far larger xdg-app-builder manifest file for Glom, which I worked on after I had PrefixSuffix working. I had to provide many more build options for the dependencies and clean up many more installed files that I didn’t need for Glom. For instance, it builds both PostgreSQL and MySQL, as well as avahi, evince, libgda, gtksourceview, goocanvas, and various *mm C++ wrappers. I could have just installed everything, but that would have made the package much larger and it doesn’t generally seem safe to install lots of unnecessary binaries and files that I wouldn’t be using. I do wish that JSON allowed comments so I could explain why I’ve used various options.

You can test the app out like so:

$ xdg-app build ../prefixsuffix-xdgapp prefixsuffix
... Use the app ...
$ exit

Or you can start a shell in the xdg-app environment and then run the app, maybe via a debugger:

$ xdg-app build ../prefixsuffix-xdgapp bash
$ prefixsuffix
... Use the app ...
$ exit

Creating or updating an xdg-app repository

xdg-app can install files from online repositories. You can put your built app into a repository like so:

$ xdg-app build-export --gpg-sign="murrayc@murrayc.com" /repos/prefixsuffix ../prefixsuffix-xdgapp
$ xdg-app repo-update /repos/prefixsuffix

You can then copy that directory to a website, so it is available via http(s). You’ll want to make your GPG public key available too, so that xdg-app can check that the packages were really signed by you.

Installing with xdg-app

I uploaded the resulting xdg-app repository for PrefixSuffix to the website, so you should be able to install it like so:

$ wget https://murraycu.github.io/prefixsuffix/keys/prefixsuffix.gpg
$ xdg-app add-remote --user --gpg-import=prefixsuffix.gpg prefixsuffix https://murraycu.github.io/prefixsuffix/repo/
$ xdg-app install-app --user prefixsuffix io.github.murraycu.PrefixSuffix

I imagine that there will be a user interface for this in the future.

Then you can run it like so, though it will also be available via your desktop menus like a regular application.

$ xdg-app run io.github.murraycu.PrefixSuffix

Here are similar instructions for installing my xdg-app Glom package.

I won’t promise to keep these packages updated, but I probably will if there is demand, and I’ll try to keep up to date on developments with xdg-app.

Modern C++: Variadic template parameters and tuples

Variadic Templates

C++11 lets us define variadic templates, which take any number of parameters, of any types, instead of just a fixed number of parameters. Then, when we use the template, we can specify not just the types but an arbitrary number of different types. For instance,

template <class... T_values>
class Base {
public:
  virtual void something(T_values... values) = 0;
};

class Derived1 : public Base<int, short, double> {
public:
  void something(int a, short b, double c) override;
};

class Derived2 : public Base<std::string, char> {
public:
  void something(std::string a, char b) override;
};

This is useful in extremely generic code such as libsigc++. We are gradually using this in libsigc++ and in glibmm, along with template aliases, to replace code that was previously generated by perl and python scripts to produce multiple versions of each C++ template, with various numbers of parameters. In the meantime, I’ve been playing with variadic templates in a separate experimental project and this is what I’ve learned along the way.

Parameter Packs

The “class... T_values” there is called a template parameter pack. If you are just templating a function then it’s called a function parameter pack:

class Thing {
public:
  template <class... T_values>
  void something(T_values... values);
};

Expanding a Parameter Pack

The “T_values... values” in that method signature expands (or unpacks) the parameter pack in the function parameter list. You can use the ... in various ways in the function parameter list. For instance, to receive the types as const references:

template <class... T_values>
class Thing {
public:
  void something(const T_values&... values);
};

In your template, you can then call another method with that parameter pack, by expanding (or unpacking) it into the function argument list. For instance, with “values...”:

template <class... T_values>
class Thing {
public:
  void something(T_values... values) {
    something_else(values...);
  }
};

You can write more complex patterns to change how the parameter pack is expanded. For instance:

template <class... T_values>
class Thing {
public:
  void something(T_values... values) {
    something_else((values + 1)...);
  }

  void other_thing(T_values... values) {
    something_else(const_cast<const T_values&>(values)...);
  }
};

Storing a Parameter Pack in a std::tuple

However, if you want to keep the parameter values around and use them at some later time, you’ll need to store them in a std::tuple. I think this is why std::tuple exists. For instance:

template <class... T_values>
class Thing {
public:
  void something(T_values... values) {
    tuple_ = std::tuple<T_values...>(values...);
  }

private:
  std::tuple<T_values...> tuple_;
};

Expanding a std::tuple

Then you have another problem. You probably want to call some method with those values. But now you have a tuple instead of a parameter pack. A trick with std::index_sequence<> and a helper method lets you call a normal method (one that takes normal parameters), passing a tuple that holds those parameter values:

template <class... T_values>
class Thing {
public:
  void something(T_values... values) {
    tuple_ = std::tuple<T_values...>(values...);
  }

  void do_something_with_values() {
    call_yadda_with_tuple(tuple_,
      std::index_sequence_for<T_values...>());
  }

  void yadda(T_values... values);

private:
  //The helper method.
  template<std::size_t... Is>
  void call_yadda_with_tuple(const std::tuple<T_values...>& tuple,
    std::index_sequence<Is...>) {
    yadda(std::get<Is>(tuple)...);
  }

  std::tuple<T_values...> tuple_;
};

Unfortunately, this does clutter up your code. I haven’t yet managed to write a simple generic call_function_with_tuple(f, tuple) helper method. I hope it is possible.

Parameter packs are part of the C++ language. std::tuple<> and std::index_sequence<> are part of the C++ standard library, apparently added specifically for use with parameter packs. I can see the sense in keeping the language as simple as possible, as long as you can provide what you need via library code in that language. But the end result is not very attractive in this case. Luckily, hopefully, this isn’t something you’ll need to use much anyway.

At this point, any reader who already doesn’t like C++’s complexity will like it even less. Showing them a related thousand-line g++ compilation error should turn them away for good (clang++’s compilation errors are much clearer). But if you really like compile-time type-safety, and if you really like to avoid copy/pasted code, you might like that this is at all possible.

It would be understandable for coding guidelines to discourage the use of variadic templates except in special cases, until people are more familiar with them.

Manipulating Tuples

Of course, you might need to call methods with just some of the parameters from the parameter pack, or some combination of parameter packs. Once you have the parameters in a std::tuple, you can manipulate that tuple with some more template cleverness. For instance:

  • std::tuple_cat() concatenates two tuples into one.
  • But I recently needed a tuple_cdr() to remove the first item from the tuple, leaving me the rest.
  • A tuple_car() would just be std::get<0>(tuple) to get the first item, so nobody bothers to write one.
  • I even needed a tuple_interlace() to interlace two tuples together, turning a, b and c, d into a, c, b, d.

It all starts to feel very lispy. Luckily there are lots of people on StackOverflow who enjoy discussing how to implement these. I feel there should be more functions like std::tuple_cat() in the standard C++ library, or maybe in some open source library.

Once you have your new tuple, you’ll probably need to use std::make_index_sequence<> instead of std::index_sequence_for<>, to call your call_*_with_tuple() helper method, like so:

void do_something_more() {
  const auto combined_tuple = std::tuple_cat(tuple1_, tuple2_);
  constexpr auto tuple_size =
    std::tuple_size<decltype(combined_tuple)>::value;
  call_yadda_with_tuple(combined_tuple,
    std::make_index_sequence<tuple_size>());
}

glibmm: Deprecated Glib::Threads

As of glibmm 2.47 (currently unstable), we have deprecated the glibmm threads API, which wrapped the glib threads API. That’s because C++11 finally has its own standard threading API, and in C++14 it’s complete enough to replace all of Glib::Threads.

Here’s what replaces what:

  • Glib::Threads::Thread → std::thread
  • Glib::Threads::Mutex → std::mutex
  • Glib::Threads::RecMutex → std::recursive_mutex
  • Glib::Threads::RWLock → std::shared_timed_mutex
  • Glib::Threads::Cond → std::condition_variable
  • Glib::Threads::Private → thread_local

We’ve replaced use of Glib::Threads in our example code and tests, such as the example in our multithreading chapter in the gtkmm book.
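
For instance, a worker thread that would previously have used Glib::Threads::Thread::create() can now just be a std::thread. A minimal sketch, with a hypothetical Worker class:

#include <thread>

class Worker {
public:
  void do_work() { /* ... do something expensive ... */ }
};

void start_and_wait(Worker& worker) {
  // Previously this would have been something like
  // Glib::Threads::Thread::create(sigc::mem_fun(worker, &Worker::do_work)).
  std::thread the_thread(
    [&worker] { worker.do_work(); });

  the_thread.join();
}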

This is quite a relief because we never wanted to be in the concurrency API business anyway, and having a standard C++ concurrency API makes it far easier to write cross-platform code. Now if we could just have a standard C++ network IO API and maybe a file structure API, that would be nice.

 

Coursera/Stanford course: Algorithms: Design and Analysis, Part 2

A few weeks ago I mentioned completing Part 1 of the online Coursera/Stanford “Algorithms: Design and Analysis” course. Part 2 of Algorithms: Design and Analysis isn’t due to start again until next year, but I didn’t want to wait, so I enrolled in the archived version of the course to watch the videos and do the assignments. I should be ready to just reuse my work when Part 2 starts again for real.

Part 2 was where things got really interesting. The assignments required implementing these algorithms, though the course covered others too:

  • A Greedy Algorithm for job scheduling.
  • Prim’s and Kruskal’s minimum spanning tree algorithms. (Both O(m log n), but Prim’s does better on dense graphs, with more edges.)
  • Modifying a minimum spanning tree to identify clusters.
  • The Knapsack problem (Dynamic Programming – both bottom-up and recursive).
  • Shortest Path with the Bellman-Ford SSSP (Single-Source Shortest Path) algorithm (O(mn) and works with negative paths, but fails with negative cycles) as an alternative to Dijkstra’s Shortest Path algorithm (O(m log n) and works only with positive paths).
  • All Pairs Shortest Path with the Floyd-Warshall algorithm (dynamic programming) (O(n³) and works with negative paths, though it fails when it detects negative cycles). Best for dense graphs.
  • All Pairs Shortest Path with Johnson’s algorithm, via one call to Bellman-Ford on a modified graph and repeated calls to Dijkstra on a reweighted graph. (O(mn log n) and works with negative paths, but fails with negative cycles.) Best for sparse graphs.
  • A dynamic programming algorithm for the Traveling Salesman Problem.
  • Local search with the 2Sat problem, using Papadimitriou’s Algorithm.

I particularly enjoyed exploring “dynamic programming”, which is really just avoiding unnecessary repeated work after you’ve identified the appropriate sub-problems. It’s identifying the sub-problems that is really hard. I enjoyed playing with bottom-up dynamic programming (filling in an array as you go, often discarding the (n-2)th set of results along the way), and top-down dynamic programming, also known as memoization (usually recursing into sub-problems and ideally not doing as many sub-problems as you’d do working bottom-up).

While implementing a dynamic programming solution for the Traveling Salesman Problem, I learned about Gosper’s Hack for iterating over subsets. It’s now a personal favorite in my toolbox.

As with part 1 of the course, I am not allowed to publish the code of my homework solutions. But I did create a public version of the knapsack problem for solving the Make Change problem without a canonical currency (not a real world set of coins), using dynamic programming, though you shouldn’t use that as a first way to understand the classic knapsack problem. I also implemented a simple greedy algorithm for the Make Change problem with a canonical currency (real world set of coins).

The Make Change problem was interesting because I’ve read that people can and should learn to recognize NP-Complete problems, such as the traveling salesman problem. However, it is not obvious which sets of coins would cause the Make Change problem to be solvable with a greedy algorithm and which would need dynamic programming, though it might at first seem like a minor detail. (I haven’t actually read that paper yet.)

I’ve also been reading through Steven Skiena’s The Algorithm Design Manual book, which I can highly recommend. It’s more practical and enjoyable than the classic Introduction to Algorithms book by Cormen et al.

Update: I did part 2 for real when it started again. Here is my certificate:

 

gtkmm 3.18 and glibmm 2.46

A couple of days ago I made the usual bunch of *mm releases for GNOME 3.18, including glibmm 2.46 and gtkmm 3.18, wrapping glib and GTK+ for C++. This adds the usual collection of new API from glib and GTK+, but the big change is the use of C++11.

Until C++11 is the default for g++, you’ll probably want to use the AX_CXX_COMPILE_STDCXX_11() autoconf macro. For instance, I used this to use C++11 in PrefixSuffix. Most people won’t need to make any other changes to their code or to their build. You won’t notice any change unless you care about using C++11 features.

Using C++11 in gtkmm and friends has been the most fun I’ve had with gtkmm since I had to delve deep into the GObject lifecycle back during gtkmm 1.2. I even made actual code changes to libsigc++, which I’m usually afraid to touch even though I’m the official maintainer.

Learning in general about the deep implications of C++11’s new features reminded me how much I enjoyed learning about C++ at the beginning. It’s once again an interesting time for C++.