Allocating Uninitialized Numeric Arrays

Motivation

Need

a very large contiguous memory block of fundamental-type values (ints / doubles / …)
for direct overwriting, e.g. as a storage target for computation results or (file) input
preferably with the safety & convenience of std::vector

Problem: Slow Value Initialization!

value initialization ⇒ fundamental-type objects are initialized with 0
default initialization ⇒ does nothing for fundamental types (their values stay indeterminate)
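Array `new` shows the two forms side by side: the trailing `()` in `new int[n]()` requests value initialization (every element becomes 0), while plain `new int[n]` default-initializes and leaves the elements indeterminate. A minimal sketch; `make_zeroed` and `make_iota` are illustrative names, not part of any library, and the default-initialized array is completely written before anything reads it, because reading an indeterminate `int` is undefined behavior.

```cpp
#include <cstddef>
#include <memory>

// value initialization: the trailing '()' sets all n ints to 0,
// which is the O(n) work that becomes expensive for huge arrays
std::unique_ptr<int[]> make_zeroed (std::size_t n) {
  return std::unique_ptr<int[]>{new int[n]()};
}

// default initialization: no '()', so the n ints are left indeterminate;
// every element is overwritten before anyone can read it
std::unique_ptr<int[]> make_iota (std::size_t n) {
  std::unique_ptr<int[]> p{new int[n]};
  for (std::size_t i = 0; i < n; ++i) p[i] = static_cast<int>(i);
  return p;
}
```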

Vectors of Numbers

std::vector<int> v (1'000'000'000);  // ≈4GB

vector value-initializes every element in its underlying memory block
for fundamental types that means initialization with value 0
which can take many seconds for multi-gigabyte arrays!
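A small sketch (the function names are illustrative) confirming that the zero-filling is not limited to the size constructor: every `resize()` that grows the vector value-initializes the new elements, even when the reused capacity previously held other values.

```cpp
#include <cstddef>
#include <vector>

// size constructor: all n elements are value-initialized (set to 0)
bool size_ctor_zero_fills (std::size_t n) {
  std::vector<int> v(n);
  for (int x : v)
    if (x != 0) return false;
  return true;
}

// growing resize() value-initializes the new elements again,
// even though the reused capacity previously held the value 7
int regrown_element_value () {
  std::vector<int> v(3, 7);  // {7, 7, 7}
  v.resize(1);               // shrinking keeps the capacity
  v.resize(3);               // elements 1 and 2 are value-initialized
  return v[2];
}
```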

Historical Note C++98

Before C++11, vector did not value-initialize its elements; instead it copied a prototype value (by default a value-initialized T()) into every element, which was at least as slow as value initialization.

Solutions

`vector<T,default_init_allocator<T>>` Custom Allocator

Idea: allocator whose `construct` function forces default initialization

`vector<T,default_init_allocator<T>>`

all the convenience of std::vector
allocator prevents value-initialization
vector can be accessed normally
no overhead (at least on higher optimization levels -O2 / -O3)
can be passed to functions that take span<T> parameters
.data() pointer can be passed to C-style functions that take T* parameters
can't be passed to functions that take vector<T> parameters

This makes forced default initialization a pure allocation issue and decouples it from the numeric data type. If you want to couple this property to the data type, consider the `vector<no_init<T>>` solution instead.

#include <vector>
#include <memory>       // std::allocator, std::allocator_traits
#include <type_traits>  // std::is_nothrow_default_constructible
#include <utility>      // std::forward
#include <chrono>
#include <iostream>

// NOTES ON RUNNING THIS EXAMPLE:
// - make sure to compile it with "-O2" or "-O3"
// - you may need to reduce the vector size depending on your machine

// allocator adaptor that interposes 'construct' calls
// to convert value initialization into default initialization
// by Casey Carter (@codercasey)
template< typename T,
          typename Alloc = std::allocator<T> >
class default_init_allocator : public Alloc
{
  using a_t = std::allocator_traits<Alloc>;
public:
  // obtain alloc<U> where U ≠ T
  template<typename U>
  struct rebind {
    using other = default_init_allocator<U,
        typename a_t::template rebind_alloc<U> >;
  };
  // make inherited ctors visible
  using Alloc::Alloc;
  // default-construct objects
  template<typename U>
  void construct (U* ptr)
    noexcept(std::is_nothrow_default_constructible<U>::value)
  { // 'placement new' without '()' ⇒ default initialization
    ::new(static_cast<void*>(ptr)) U;
  }
  // construct with ctor arguments
  template<typename U, typename... Args>
  void construct (U* ptr, Args&&... args) {
    a_t::construct(static_cast<Alloc&>(*this),
                   ptr, std::forward<Args>(args)...);
  }
};

void demo () {
  std::vector<int,default_init_allocator<int>> v;
  v.resize(1'000'000'000);  // fast - no init!
}

void slow_demo () {
  std::vector<int> v;
  v.resize(1'000'000'000);  // slow - every element is set to 0
}

int main () {
  namespace sc = std::chrono;
  // steady_clock is monotonic and thus suited for measuring intervals
  auto const tstart = sc::steady_clock::now();
  demo();
  // slow_demo();
  auto const tstop = sc::steady_clock::now();
  auto const elapsed = sc::duration_cast<sc::milliseconds>(tstop - tstart).count();
  std::cout << elapsed << "ms\n";
}

`vector<no_init<T>>` C++11

Idea: custom, generic zero-overhead wrapper

`vector<no_init<T>>`

all the convenience of std::vector
wrapper prevents value-initialization
vector can be accessed just like without the wrapper
no overhead (at least on higher optimization levels -O2 / -O3)
additional indirection potentially annoying for debugging
can't be passed to functions that take vector<T> or span<T> parameters
.data() pointer can't be passed to C-style functions that take T*

This couples the initialization behavior to the numeric data type and thus propagates it through interfaces (every interface has to mention the type `no_init`). If you don't want this, consider the `vector<T,default_init_allocator<T>>` solution instead.

#include <type_traits>  // std::is_fundamental
#include <vector>

template<typename T>
class no_init {
  static_assert(
    std::is_fundamental<T>::value,
    "should be a fundamental type");
public:
  // constructor without initialization: v_ stays indeterminate
  no_init () noexcept {}
  // implicit conversion T → no_init<T>
  constexpr no_init (T value) noexcept: v_{value} {}
  // implicit conversion no_init<T> → T
  constexpr operator T () const noexcept { return v_; }
private:
  T v_;
};

void demo () {
  std::vector<no_init<int>> v;
  v.resize(1'000'000'000);  // fast - no init!
  v[1024] = 47;      // int → no_init<int>
  int j = v[1024];   // no_init<int> → int
  v.push_back(23);   // int → no_init<int>
}
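Because `no_init<int>` converts implicitly in both directions, standard algorithms keep working on a `vector<no_init<int>>`: the built-in `<` and `+` are found through the conversion to `int`. A self-contained sketch under that assumption; `sorted_sum` is a hypothetical helper, and the class is repeated so the snippet compiles on its own.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <type_traits>
#include <vector>

template<typename T>
class no_init {
  static_assert(std::is_fundamental<T>::value, "should be a fundamental type");
public:
  no_init () noexcept {}                                 // leaves v_ uninitialized
  constexpr no_init (T value) noexcept: v_{value} {}    // T → no_init<T>
  constexpr operator T () const noexcept { return v_; } // no_init<T> → T
private:
  T v_;
};

// writes n descending values, sorts them, reports the smallest and the sum
long long sorted_sum (std::size_t n, int& smallest) {
  std::vector<no_init<int>> v;
  v.resize(n);                       // no zero-fill
  for (std::size_t i = 0; i < n; ++i)
    v[i] = static_cast<int>(n - i);  // int → no_init<int>
  std::sort(v.begin(), v.end());     // '<' via conversion to int
  smallest = v.front();              // no_init<int> → int
  return std::accumulate(v.begin(), v.end(), 0LL);
}
```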

`make_unique_for_overwrite<T[]>(n)` C++20

#include <memory>
auto buf = std::make_unique_for_overwrite<T[]>(n);

We can't use make_unique<T[]>(n), because that would value-initialize the allocated array.

returns a unique_ptr that does automatic cleanup
can be used with span<T> parameters and passed to C functions
need to track array size separately (e.g., with a span)
less safe & less convenient than vector<no_init<T>>

use a span to access / pass the array around

#include <memory>
#include <span>
SampleStats statistics (Samples const& in) {
  // make uninitialized array
  auto buf = std::make_unique_for_overwrite<int[]>(in.size());
  // obtain view to it:
  std::span<int> results {buf.get(), in.size()};
  // do something with it
  gpu_statistics(in, results);
  prefix_sum(results);
  …
}  // memory automatically deallocated

`unique_ptr<T[]>(new T[n])` C++11

#include <memory>
auto buf = std::unique_ptr<T[]>{new T[n]};

We can't use make_unique<T[]>(n), because that would value-initialize the allocated array.

unique_ptr does automatic cleanup
works as of C++11
need to track array size separately (e.g., with a span)
less safe & less convenient than vector<no_init<T>>

use a span to access / pass the array around

#include <memory>
#include <span>  // std::span requires C++20; in C++11 pass buf.get() and in.size() separately
SampleStats statistics (Samples const& in) {
  // make uninitialized array
  auto buf = std::unique_ptr<int[]>{new int[in.size()]};
  // obtain view to it:
  std::span<int> results {buf.get(), in.size()};
  // do something with it
  gpu_statistics(in, results);
  prefix_sum(results);
  …
}  // memory automatically deallocated
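Without `std::span` (i.e. in actual C++11), one option is to keep the size next to the owning pointer and hand both to C-style functions. A minimal sketch; `int_buffer`, `fill_even` and `make_even_buffer` are hypothetical names chosen for illustration.

```cpp
#include <cstddef>
#include <memory>

// hypothetical C-style function: fills out[0..n) with 0, 2, 4, ...
void fill_even (int* out, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) out[i] = static_cast<int>(2 * i);
}

// C++11: no std::span, so the size travels next to the owning pointer
struct int_buffer {
  std::unique_ptr<int[]> data;
  std::size_t size;
};

int_buffer make_even_buffer (std::size_t n) {
  // 'new int[n]' without '()' ⇒ default initialization, no zero-fill
  int_buffer buf { std::unique_ptr<int[]>{new int[n]}, n };
  fill_even(buf.data.get(), buf.size);
  return buf;  // moved out; the array is freed when the caller's copy dies
}
```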

Legacy Compilers: `new T[n]` C++98

T* buf = new T[n];
…
// important! delete if not needed any more
delete[] buf;

easy to forget to delete memory ⇒ leak-prone
error-prone and cumbersome separate tracking of array size
less safe & less convenient than vector<no_init<T>>

It's 2023: avoid raw operators new and delete in modern code bases! Only use them in implementations of memory managers such as allocators.

`cudaMallocHost` CUDA

creates page-locked memory ⇒ faster copy to/from device
easy to forget to free memory ⇒ leak-prone
error-prone and cumbersome separate tracking of array size

#include <cuda_runtime.h>
// (2^30 x 4B) = 4GiB
const int n = 1 << 30;
auto const size = n * sizeof(int);
int *aHost = nullptr;
if (cudaMallocHost((void**)&aHost, size) != cudaSuccess) {
  // handle allocation failure
}
…
cudaFreeHost(aHost);

Last updated: 2021-05-26
