Modernising Rcpp and C++ Code With std::span

The programming language C++ is still widely used today, especially for high-performance computing. Out-of-date practices from the 80s and 90s should still work now because the language is designed to be highly backwards compatible. As C++ evolved, there were numerous efforts to modernise the language and update programming practices with new features to improve the C++ experience, such as memory-safety features.

As new practices are introduced, there are multiple ways to do the same thing. Entrenched programmers may stick with older practices, whereas modern programmers may keep their practices up to date. Newer programmers have a clean slate to learn best practices and make it a habit, but they may also pick up an old textbook or tutorial and learn out-of-date practices.

I’m guilty of this myself. When I wrote my R package poLCAParallel, I started off with out-of-date practices due to my past training. Over time, I’ve modernised the code with newer features and better practices. This keeps the code maintainable and helps catch commonly occurring bugs due to old practices.

In this blog post, we will look at std::span, introduced in C++20, and how it can be used to modernise C++ code. It allows the program to view existing blocks of memory without having to copy them. It also has built-in tools to avoid out-of-bounds errors. However, implementing these new features in your code too early may cause problems on systems which may not have caught up with the new C++ standards, which we will discuss. I will also suggest some practices when programming in C++ for R via Rcpp.

At the start of the blog post, we will assume the reader is using Linux with the g++ compiler and the libstdc++ standard library. We will compile and run the examples via R using Rcpp, which the reader should have installed. Later on, we will explore some differences on MacOS and how some of the modern C++20 features are not available on older MacOS systems.

Separating the Two Worlds

In hybrid programming, you program in two or more languages. For example, my package uses R for the user interface and C++ for the underlying mathematical calculations. One practice I would suggest is to separate the two ecosystems. For example, I would like my C++ code not to depend on anything related to R. By doing so, the C++ code is isolated and can be compiled and tested without any R dependencies. This is useful to pinpoint bugs should one occur; is it related to R or C++?

We will start off with a foundational piece of C++ code and build on it throughout the blog post.

// filename: adder.cc

#include <cstddef>

#include <Rcpp.h>

class Adder {
 private:
  double* vector_;
  std::size_t length_;

 public:
  Adder(double* vector, std::size_t length): vector_(vector), length_(length) {}
};

// [[Rcpp::export]]
void AdderRcpp(Rcpp::NumericVector vector) {
  Adder adder(vector.begin(), vector.size());
}

To run this in R, compile it using Rcpp::sourceCpp() and pass a vector to the exported function. Nothing should be outputted as the code only does some basic variable initialisation.

Rcpp::sourceCpp("adder.cc")
AdderRcpp(c(1, 1, 1, 1, 1))

In the example, the function AdderRcpp() is an exported function which can be called from R by passing a vector. The library Rcpp represents this vector, passed from R, as a Rcpp::NumericVector object, which contains information about the data in this vector.

To isolate the C++ code from the R ecosystem, we extracted all of the required information from the Rcpp::NumericVector object. To access the data, we need the address of the data and the length of the vector. These can be obtained by using the methods vector.begin() and vector.size() respectively. We can pass and store them in the class Adder for future use.

You can see that the class Adder uses primitive C++ types only and is isolated from anything R or Rcpp related. This allows the class to be tested by itself by providing any data through the pointer double* vector and not necessarily from R. If we had to pass a Rcpp::NumericVector object or reference instead, the class is bound to the Rcpp ecosystem through Rcpp::NumericVector. This adds an unnecessary layer when it comes to testing this class, having to strictly pass data through an Rcpp::NumericVector object. By using a primitive pointer, we can pass data generated from external sources, such as from the standard C++ library, testing suites or any other program, making the class more general and flexible.
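As a minimal sketch of what this isolation buys us, the class can be exercised from plain C++ with no R session involved. Here the data comes from a std::vector instead of R. The method Sum() and the function SumWithoutR() are hypothetical, added only so there is something to check; they are not part of the class above.

```cpp
#include <cstddef>
#include <vector>

class Adder {
 private:
  double* vector_;
  std::size_t length_;

 public:
  Adder(double* vector, std::size_t length)
      : vector_(vector), length_(length) {}

  // hypothetical method, added for illustration only
  double Sum() const {
    double total = 0;
    for (std::size_t i = 0; i < this->length_; i++) {
      total += this->vector_[i];
    }
    return total;
  }
};

// hypothetical test helper: the std::vector owns the memory, Adder only
// views it through the pointer and length
double SumWithoutR() {
  std::vector<double> data = {1, 1, 1, 1, 1};
  Adder adder(data.data(), data.size());
  return adder.Sum();
}
```

Nothing in this file includes Rcpp.h, so it can be compiled and run by a C++ testing suite on its own.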

Old Habits Die Hard

We can demonstrate some old practices, one of which is using C-style arrays. In this approach, an array is represented by a pointer and an integer. The pointer, denoted by a * symbol after the type, contains the starting address of the array and allows you to read and write its elements. The integer, usually of type std::size_t, tracks the total number of elements in that array.

One example to demonstrate C-style arrays is to increment the last two elements of the array. We can call this method IncrementLastTwo(). We use a for loop, iterating from the second to last element to the last inclusive.

// filename: adder.cc

#include <Rcpp.h>

#include <cstddef>

class Adder {
 private:
  double* vector_;
  std::size_t length_;

 public:
  Adder(double* vector, std::size_t length)
      : vector_(vector), length_(length) {}

  void IncrementLastTwo() {
    for (std::size_t i = this->length_ - 1; i <= this->length_; i++) {
      this->vector_[i]++;
    }
  }
};

// [[Rcpp::export]]
void AdderRcpp(Rcpp::NumericVector vector) {
  Adder adder(vector.begin(), vector.size());
  adder.IncrementLastTwo();
}

In the code above, we passed a pointer, using vector.begin(), and the length of the array, using vector.size(), to the constructor of Adder. By passing a pointer, the method IncrementLastTwo() can modify the contents of the array without having to return anything.

We can test this in R

Rcpp::sourceCpp("adder.cc")
x <- c(1, 1, 1, 1, 1)
AdderRcpp(x)
print(x)

which outputs

1 1 1 1 2

Uh-oh! We see that there's a mistake in our code because only the last element of the array has been modified. Furthermore, the code wrote to memory out of bounds past the end of our vector! It is concerning that C++ will just write to memory not allocated to us without warning and with little trace, not appearing in our output. This sort of behaviour can be described as non-memory-safe.

Modifying Function Arguments

Notice how the R vector x was modified in the function call AdderRcpp(x) without using a return value. This is because we are passing a reference to that vector to the function, allowing it to be modified.

This sort of practice is common in C++ but may be very unusual in R. Some programmers will treat parameters as inputs and immutable, so modifying them instead of returning an output can be quite confusing and unexpected. Plan what sort of API you would like to present for your target user.

One of the reasons that this is not memory safe is because the pointer double* vector_ is primitive. You can add or subtract any number to it and you can attempt to read or write to some arbitrary piece of memory. We could have used the length of the vector std::size_t length_ to ensure our memory access is within bounds, but as shown above, this assumes we used the variable correctly in the first place.

C with Classes

C++ started out under the name "C with Classes". We've demonstrated this literally by using a C-style array inside a class. Later on in this blog, we will show how C-style array practices can be seen as legacy, replaced with modern C++ features. Thus "C with Classes" may be seen as an outdated view of C++, which arguably has turned into its own language.

The use of classes in this example is overkill; we could have done the same thing using functions only. The reason we demonstrated classes here is to build up what you would typically see in C++ code, assigning data to a member variable and then modifying it later on.

Ownership is ambiguous with C-style arrays

When handling a pointer, also called a raw pointer, the ownership is ambiguous as the type itself does not imply who is responsible for the memory management of the array. In our example, the memory management is done by the R ecosystem through Rcpp::NumericVector.

We could have created an array using, for example, double* vector = new double[5] and passed that pointer to the class to be used for viewing and writing the array. However, by doing so, it is ambiguous who is responsible for managing the memory of the array. Is it the class or another function? The type double* on its own does not convey this information.

Creating arrays and passing raw pointers this way is considered an outdated practice.

Assertions

We can use assertions to test if variables are valid before using them. It is given that this->vector_[this->length_ - 1] accesses the last element of the vector. We can use this fact to write an assertion to ensure that we do not access data beyond the last element of the vector.

// filename: adder.cc

#undef NDEBUG

#include <Rcpp.h>

#include <cstddef>

class Adder {
 private:
  double* vector_;
  std::size_t length_;

 public:
  Adder(double* vector, std::size_t length)
      : vector_(vector), length_(length) {}

  void IncrementLastTwo() {
    for (std::size_t i = this->length_ - 1; i <= this->length_; i++) {
      // We add an assertion here
      assert(i < this->length_);
      this->vector_[i]++;
    }
  }
};

// [[Rcpp::export]]
void AdderRcpp(Rcpp::NumericVector vector) {
  Adder adder(vector.begin(), vector.size());
  adder.IncrementLastTwo();
}

When running it this time, you get an assertion error

R: adder.cc:21: void Adder::IncrementLastTwo(): Assertion `i < this->length_' failed.
Aborted (core dumped)

The assertion helped prevent out-of-bounds memory access and caught a bug.

Debug Mode

By default, R compiles C++ code with the macro NDEBUG defined, which turns off all assertions. To turn assertions back on, we have to undefine this preprocessor macro with #undef NDEBUG before the headers are included, as in the code above.

By contrast, a C++ compiler invoked on its own does not define NDEBUG by default, so all assertions are executed.

The reason assertions can be turned on and off is mainly for performance reasons. Doing many assertions and bound checks can slow down your code and is arguably unnecessary if you're certain your code works as it should. On the flip side, turning off assertions runs a risk of bugs slipping through by removing these safe-guards you've put in place. To balance the double-edged sword, I would recommend running in debug mode for your tests and turning it off for your production code.

Assertions should not be used to validate user input

Assertions should not be used to validate user input; they're for validating the machinery of your program in debug mode. In production code, you still want to validate user input.

With our design philosophy, by separating the code into an R interface and a C++ number crunching machine, we can put the user input validation step in R. This way, you can focus on testing or asserting the C++ machinery of your program under the assumption that the user input is valid.

In this example, it could be argued that the assertion is unnecessary because you can integrate the assertion into the condition component of the for loop. At every iteration, the condition component is checked before moving on, effectively acting like an assertion. However, with an assertion, you get an error message rather than business as usual in a for loop. Assertions are a useful tool to test the underlying logic of your code.
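For comparison, a sketch of folding the bound check into the loop condition might look like the function below. It is written as a free function, so IncrementLastTwo() here is a stand-in for the method above. The guard against unsigned wrap-around for arrays shorter than two elements is an addition for this sketch, not something the class above does.

```cpp
#include <cstddef>

// Increment the last two elements of a C-style array. The bound check lives
// in the loop condition, so the loop stops quietly at the end of the array
// instead of reporting a bug.
void IncrementLastTwo(double* vector, std::size_t length) {
  // std::size_t is unsigned, so length - 2 would wrap around for length < 2
  std::size_t start = length >= 2 ? length - 2 : 0;
  for (std::size_t i = start; i < length; i++) {
    vector[i]++;
  }
}
```

The trade-off is the one described above: a wrong starting index would still pass silently here, whereas an assertion would stop the program and point at the faulty line.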

Of course, it is possible to write assertions incorrectly. A false positive will prompt the programmer to fix the assertion, but a false negative could slip through unnoticed. Thus, it's important to write assertions throughout your code so that, if one assertion lets a bug through, another further down the pipeline has a chance to pick it up.

Modern Practices

With the disadvantages of using C-style arrays shown, we turn our attention to std::span, introduced in C++20. A std::span is a non-owning view of existing contiguous memory, such as the array represented by Rcpp::NumericVector. By using std::span, we tell the reader of the code that this object is not responsible for the memory management of the array, yet it can still read and write to that array without copying its contents. The responsibility of memory management remains with the Rcpp::NumericVector and the R ecosystem.

To use std::span, we can implicitly convert a Rcpp::NumericVector object into a std::span object as shown below.

// [[Rcpp::plugins(cpp20)]]
// filename: adder.cc

#undef NDEBUG

#include <Rcpp.h>

#include <cassert>
#include <cstddef>
#include <span>

class Adder {
 private:
  std::span<double> vector_;

 public:
  Adder(std::span<double> vector) : vector_(vector) {}

  void IncrementLastTwo() {
    for (std::size_t i = this->vector_.size() - 1; i <= this->vector_.size();
         i++) {
      assert(i < this->vector_.size());
      this->vector_[i]++;
    }
  }
};

// [[Rcpp::export]]
void AdderRcpp(Rcpp::NumericVector vector) {
  // implicit conversion here, we are passing a Rcpp::NumericVector to a
  // constructor which expects a std::span
  Adder adder(vector);
  adder.IncrementLastTwo();
}

The conversion is implicit because the constructor expects an argument of type std::span<double>, but we passed a Rcpp::NumericVector object. Behind the scenes, a std::span<double> object was created using the information from the Rcpp::NumericVector object.

Using C++20 Features

To use C++20 features, we had to add the comment // [[Rcpp::plugins(cpp20)]] to tell Rcpp we would like to use C++20 features. This is probably the most straightforward way to do this for this example.

More advanced C++ users may instead set CXX_STD = CXX20 in their Makevars file or use the -std=c++20 flag.

There are many more practices

Because C++ is an old language and still evolving, using std::span is one of many practices. Other options include passing iterators, pointers or references, or using a non-standard span from another library. The list goes on...

The code is similar to the previous section because a std::span<double> object can access elements of the array using square brackets [] too. One noticeable difference with using std::span<double> is that the object contains information about the length of the array as well. Hence, we can rely on std::span<double> giving us the length of the array without the programmer managing yet another variable.

We've purposefully left the error to trigger the assertion.

Further Memory-Safe Techniques

One of the problems with programming in C++ is that typing out a for loop can be quite tedious, especially for trivial cases where we use the variable i as an index only to access data.

for (std::size_t i = this->vector_.size() - 1; i <= this->vector_.size();
     i++) {}

The more code there is to write, the more likely we will make a mistake. One of the advantages of std::span is that it has built-in methods to iterate through a subspan without needing to manually manage iterative variables or the length of arrays.

The method std::span::subspan() returns a portion of the span as a std::span. To use the method, we pass the starting index and the length of the subspan we want. Furthermore, we can use a range-based for loop to iterate through the subspan for us. This is useful, for example, if we want to iterate through the last two elements.

Lastly, we can turn on a different type of debug mode, which does automatic checks by the standard library. For example, it can do bounds checking on std::span::subspan(). To invoke this, use #define _GLIBCXX_DEBUG.

// [[Rcpp::plugins(cpp20)]]
// filename: adder.cc

#define _GLIBCXX_DEBUG

#include <Rcpp.h>

#include <cstddef>
#include <span>

class Adder {
 private:
  std::span<double> vector_;

 public:
  Adder(std::span<double> vector) : vector_(vector) {}

  void IncrementLastTwo() {
    auto last_two = this->vector_.subspan(this->vector_.size() - 1, 2);
    // range-based for loop here
    for (auto& i : last_two) {
      i++;
    }
  }
};

// [[Rcpp::export]]
void AdderRcpp(Rcpp::NumericVector vector) {
  Adder adder(vector);
  adder.IncrementLastTwo();
}

The range-based for loop, in this example, looks like for (auto& i : last_two), which is simpler than the for loop we saw in the previous section. In the range-based for loop, we let the std::span do the iteration for us, without having to type out the initialiser and conditions for the for loop. By doing so, we don't have to be as verbose and the code is more compact, arguably making it easier to read.

The & symbol means we want a reference of the iterated value so that it can be modified in-place.

Using the auto type

We used the auto keyword to let the compiler work out the return type when calling std::span::subspan(). We've also used the auto keyword in the range-based for loop, so we don't have to specify, yet again, the type we are working with.

The auto keyword is useful to make code shorter and not too verbose by avoiding typing out the type we're working with multiple times, especially in cases where it's potentially trivial. However, some would argue it is better not to use the auto keyword so that it's crystal clear what we are working with. Have a think and judge if using the auto keyword is appropriate for your code.

We left the same bug, as in the previous section, to trigger the following error message

/usr/include/c++/13/span:429: constexpr std::span<_Type, 18446744073709551615> std::span<_Type, _Extent>::subspan(size_type, size_type) const [with _Type = double; long unsigned int _Extent = 18446744073709551615; size_type = long unsigned int]: Assertion '__offset + __count <= size()' failed.
Aborted (core dumped)

The error message is a bit more verbose and harder to read this time, but it at least hints that something went wrong in std::span::subspan(). The message Assertion '__offset + __count <= size()' failed. suggests we requested a subspan which goes beyond bounds. These clues should be enough for the programmer to fix the problem and correct the arguments in std::span::subspan() to be this->vector_.subspan(this->vector_.size() - 2, 2).

Assertions and standard library debug modes

Assertions are still important even if the standard library debug mode does bound checking for you. When you write an assertion, you are validating the logic behind your code and providing a more readable error message should the assertion fail. You can treat the standard library debug mode as a safety net should anything slip through.

Other memory tools

There are many more tools to help catch out-of-bounds memory access. One of them is AddressSanitizer, enabled with the compiler flag -fsanitize=address. With this tool, your software will crash with an error message should an out-of-bounds memory access occur. For further information, please refer to the documentation.

Lagging Behind

It is a good idea to use or update code with modern features. This keeps the code maintained with up-to-date practices for the benefit of the programmer and the user. However, some commonly used systems may not have caught up and implemented these modern features. Using cutting edge features runs a risk of making your code not as portable as it was. One example I saw was on MacOS using Clang 14 and 15 with the error below.

adder.cc:29:9: error: no matching constructor for initialization of 'Adder'
  Adder adder(vector);
        ^     ~~~~~~
adder.cc:11:7: note: candidate constructor (the implicit copy constructor) not viable: no known conversion from 'Rcpp::NumericVector' (aka 'Vector<14>') to 'const Adder' for 1st argument
class Adder {
      ^
adder.cc:11:7: note: candidate constructor (the implicit move constructor) not viable: no known conversion from 'Rcpp::NumericVector' (aka 'Vector<14>') to 'Adder' for 1st argument
class Adder {
      ^
adder.cc:16:3: note: candidate constructor not viable: no known conversion from 'Rcpp::NumericVector' (aka 'Vector<14>') to 'std::span<double>' for 1st argument
  Adder(std::span<double> vector) : vector_(vector) {}
  ^
1 error generated.

The error indicates that implicit conversion from Rcpp::NumericVector to std::span is not supported. To get around this, we have to explicitly do the conversion instead. To do this, we create a std::span object using its constructor and passing the pointer and size of the Rcpp::NumericVector, like in std::span<double>(vector.begin(), vector.size()).

// [[Rcpp::plugins(cpp20)]]
// filename: adder.cc

#include <Rcpp.h>

#include <cstddef>
#include <span>

class Adder {
 private:
  std::span<double> vector_;

 public:
  Adder(std::span<double> vector) : vector_(vector) {}

  void IncrementLastTwo() {
    // corrected arguments: start two elements before the end
    auto last_two = this->vector_.subspan(this->vector_.size() - 2, 2);
    for (auto& i : last_two) {
      i++;
    }
  }
};

// [[Rcpp::export]]
void AdderRcpp(Rcpp::NumericVector vector) {
  // explicit conversion here using std::span<double>()
  Adder adder(std::span<double>(vector.begin(), vector.size()));
  adder.IncrementLastTwo();
}

Clang and MacOS

The default C++ compiler on MacOS is clang++ using the libc++ standard library.

In the R ecosystem, Linux and Windows machines would have g++ as the default C++ compiler using the libstdc++ standard library.

These are different compilers and standard libraries, developed by different organisations.

Debug mode on MacOS

Because MacOS uses the libc++ standard library, unlike in Linux, #define _GLIBCXX_DEBUG won't work to enter the standard library debug mode. You will have to invoke debug mode according to the libc++ standard library, but this is beyond the scope of this blog. See the documentation for further information.

There are pros and cons of doing explicit conversion. When using the constructor, it's crystal clear what's being passed and how the conversion is done. However, it can be a bit too verbose, especially if we're dealing with multiple vectors. If we know std::span can be constructed from a contiguous memory-owning object like Rcpp::NumericVector, we shouldn't have to repeat this fact every time we want to work with a span. In addition, having to write less code should reduce the chances of making typos.

Another option to fix this problem is to simply wait for newer MacOS computers to arrive and for users to adopt Clang 16 or newer. We can also instruct MacOS users to install and use a newer version of Clang. These are all valid ways to make your code as portable as it can be, but they may cause some friction for MacOS users.

The Survey Says

In this section, we will look at a snippet of poLCAParallel and some of the practices we could implement, such as using iterators and assertions.

The package poLCAParallel involves the study of polytomous variables. They are variables which are like choosing an answer to a multiple-choice question. The choices do not need to be ordered. As a toy example, let's look at some of the surveys from YouGov UK.

Which of the following terms would you normally use to refer to an upholstered seat typically found in the main living room of a home?

Thinking about scones that come with jam and clotted cream, in which order do you think they should be added to the scone?

Do you personally consider a Jaffa Cake to be a biscuit or a cake?

One thing to note is that the number of choices for each question is not constant; one question may have three answers to choose from, another may have five. We will make the assumption of independence, for example, what you call a sofa won't affect the ordering you put cream or jam on a scone. We also assume the survey results generalise to the population for our model, treating the sampling error as negligible.

For our model, we define a vector which has the number of choices for each question n_outcomes <- c(5, 3, 3). Next, we define a vector which has the probabilities for each choice given a question probs <- c(0.12, 0.26, 0.59, 0.01, 0.02, 0.6, 0.29, 0.11, 0.51, 0.38, 0.11). They're ordered such that the probabilities for the five answers from the first question are placed first, followed by the answers from the second question and so on.

My answers to the survey questions are as follows. I use the word "sofa". I would put jam first so that it can be spread to lay a foundation for the cream. Lastly, I would consider a Jaffa Cake a biscuit because of its small shape, similar to other biscuits, and because it can be consumed like one. With that said, you can record my response as a vector, each element corresponding to my answer: response <- c(3, 1, 1).

One of the steps in poLCAParallel is to calculate the likelihood. This is the probability that someone random would answer those three questions the same as me. In this example, we multiply three numbers together, 0.59, 0.6 and 0.51, from the vector probs.
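Written out for the response c(3, 1, 1), the three factors are the third entry of the first block of probs, the first entry of the second block and the first entry of the third block. The symbol L for the likelihood is notation introduced here for illustration.

```latex
L = 0.59 \times 0.60 \times 0.51 = 0.18054
```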

Putting it into Practice

In terms of programming in C++, we will have to iterate through the three vectors response, n_outcomes and probs. The vectors response and n_outcomes are of the same length, one element for each question, whereas probs has an element for each multiple-choice answer of each question. In addition, different questions have a different number of answers to choose from.

There are quite a few options to implement this. As discussed before, we will separate the two ecosystems by converting Rcpp::NumericVector into a std::span. In terms of iterating, we will use a range-based for loop to traverse through n_outcomes and iterators, provided by std::span, for the remaining vectors.

// [[Rcpp::plugins(cpp20)]]
// filename: survey.cc

#undef NDEBUG

#include <Rcpp.h>

#include <cassert>
#include <cstddef>
#include <span>

double Survey(std::span<const int> response, std::span<const int> n_outcomes,
              std::span<const double> probs) {
  // get iterators for response and probs
  auto response_i = response.begin();
  auto probs_i = probs.begin();

  double likelihood = 1;
  // range based for loop over n_outcomes
  for (auto n_outcome_i : n_outcomes) {
    assert(response_i < response.end());
    auto prob_response = probs_i + *(response_i++) - 1;
    assert(prob_response < probs.end());
    likelihood *= *prob_response;
    // move probs_i for the next survey question
    probs_i += n_outcome_i;
  }
  return likelihood;
}

// [[Rcpp::export]]
double SurveyRcpp(Rcpp::IntegerVector response, Rcpp::IntegerVector n_outcomes,
                  Rcpp::NumericVector probs) {
  return Survey(response, n_outcomes, probs);
}

The following R code runs our toy-example

Rcpp::sourceCpp("survey.cc")

n_outcomes <- c(5, 3, 3)
probs <- c(0.12, 0.26, 0.59, 0.01, 0.02, 0.6, 0.29, 0.11, 0.51, 0.38, 0.11)

response <- c(3, 1, 1)

likelihood <- SurveyRcpp(response, n_outcomes, probs)
print(likelihood)

There are quite a few things to unpack

  • We converted a Rcpp::IntegerVector to a std::span<const int>. The const keyword in the template ensures the contents of the arrays are not modified.
  • An iterator can be obtained using the method begin(), like in auto response_i = response.begin(). This can be dereferenced into a value, using the * symbol, and incremented for the next survey question, using the ++ symbol.
  • The method end() returns an iterator pointing one past the last element of the array. We use this in our assertions to ensure our iterators do not go out of bounds.
  • Because each question has a different number of choices, we increment our iterator for the probabilities accordingly, like in probs_i += n_outcome_i.

Zero-Based and One-Based Indexing

R uses one-based indexing, but C++ uses zero-based indexing. This can cause confusion or bugs if you are passing indices from R to C++ without being careful.

In the example above, this is why we subtracted one from the response in *(response_i++) - 1.

Bleeding Edge

One limitation of the range-based for loop, in our example, is that it can only iterate over one std::span. This is why we used iterators for the arrays response and probs.

However, C++23 introduced a zip view std::views::zip. This can be used to make a range-based for loop over two std::span. We can iterate through response and n_outcomes in the same for loop.

// [[Rcpp::plugins(cpp23)]]
// filename: survey.cc

#undef NDEBUG

#include <Rcpp.h>

#include <cassert>
#include <cstddef>
#include <ranges>
#include <span>

double Survey(std::span<const int> response, std::span<const int> n_outcomes,
              std::span<const double> probs) {
  auto probs_i = probs.begin();

  double likelihood = 1;
  for (auto [response_i, n_outcome_i] : std::views::zip(response, n_outcomes)) {
    auto prob_response = probs_i + response_i - 1;
    assert(prob_response < probs.end());
    likelihood *= *prob_response;
    probs_i += n_outcome_i;
  }
  return likelihood;
}

// [[Rcpp::export]]
double SurveyRcpp(Rcpp::IntegerVector response, Rcpp::IntegerVector n_outcomes,
                  Rcpp::NumericVector probs) {
  return Survey(response, n_outcomes, probs);
}

By putting response and n_outcomes in a zip, we can rely on the zip view providing us with in-bound values, as it stops at the end of the shorter of the two spans. We no longer have to be cautious and write assertions for both of these arrays.

At the time of writing, C++23 is considered bleeding edge. Whilst it provides better practices as shown above, there could be a risk that some systems may not have implemented these new C++23 features, let alone C++20 ones. As a result, it's generally advised not to use new C++ standards too soon. But they can be pencilled in and rolled out once the compilers and standard libraries finish integrating the new C++ standards.

Conclusion

Old practices are allowed in C++ for backwards compatibility, but there are also newer ones which reduce the chances of introducing bugs and improve the C++ programming experience. However, some systems won't be compatible with these newer practices as they are too new and not yet implemented.

When I submitted my poLCAParallel package to CRAN, CRAN would do checks on my package on multiple systems, such as Windows, MacOS and Linux, and with different versions of R. We've found that a few C++20 features such as std::jthread and some implicit conversions are not available on older versions of Clang with libc++, hence the package would not compile on CRAN's older MacOS systems. Fortunately, their latest MacOS system can compile the package. We have the choice of adjusting our code to cater for older systems or instructing MacOS users to update their Clang compiler. We could even wait it out and the problem should fix itself as newer MacOS systems adopt newer Clang versions by default.

Figure: CRAN check results for poLCAParallel, showing failures on MacOS systems. CRAN checks packages using various systems and versions of R (called flavours in this table). The poLCAParallel package fails to compile on older MacOS systems.

In terms of future work and maintenance, we plan to use newer C++20 and C++23 features such as std::jthread and std::views::zip to improve the code base. However, we also have to balance portability and roll out these new practices once the majority of systems have implemented these new features. Being too early an adopter could put your software out of reach for the majority of your users.