Allocating Uninitialized Numeric Arrays
Motivation
Need
- very large contiguous memory block of fundamentals (ints / doubles / …)
- for direct overwriting, e.g. as a storage target for computation results or (file) input
- preferably with the safety & convenience of std::vector
Problem: Slow Value Initialization!
- value initialization ⇒ fundamental type objects are initialized with 0
- default initialization ⇒ does nothing for fundamental types
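The difference between the two forms is visible directly in the syntax of an array allocation. The following is a minimal sketch; the function names are hypothetical and only serve as illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// value initialization: the trailing "()" sets every element to 0
std::unique_ptr<int[]> make_zeroed (std::size_t n) {
    return std::unique_ptr<int[]>(new int[n]());
}

// default initialization: no "()", elements are left indeterminate
// and must be written before they are read
std::unique_ptr<int[]> make_uninitialized (std::size_t n) {
    return std::unique_ptr<int[]>(new int[n]);
}

// value-initialized elements are guaranteed to be 0
void check_zeroed (std::size_t n) {
    auto z = make_zeroed(n);
    for (std::size_t i = 0; i < n; ++i) { assert(z[i] == 0); }
}
```

Reading an element of the second array before writing to it is not allowed, which is why the sketch only inspects the zeroed one.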
Vectors of Numbers
std::vector<int> v (1'000'000'000); // ≈4GB
vector value-initializes its underlying memory block
- for fundamental types that means initialization with value 0
- which can take many seconds for multi-gigabyte arrays!
Historical Note C++98
Before C++11, vector did not value-initialize its elements but copied a prototype value into each of them, which was at least as slow as value initialization.
Solutions
Idea: allocator whose construct function forces default initialization
vector<T,default_init_allocator<T>>
- all the convenience of std::vector
- allocator prevents value-initialization
- vector can be accessed normally
- no overhead (at least on higher optimization levels -O2/-O3)
- can be passed to functions that take span<T> parameters
- .data() pointer can be passed to C-style functions that take T* parameters
- can't be passed to functions that take vector<T> parameters
This makes forced default initialization a pure allocation issue and decouples it from the numeric datatype. If you want to couple this property to the data type instead, consider the vector<no_init<T>> wrapper.
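#include <chrono>
#include <iostream>
#include <memory>       // std::allocator, std::allocator_traits
#include <type_traits>  // std::is_nothrow_default_constructible
#include <utility>      // std::forward
#include <vector>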
// NOTES ON RUNNING THIS EXAMPLE:
// - make sure to compile it with "-O2" or "-O3"
// - you may need to reduce the vector size depending on your machine
// allocator adaptor that interposes 'construct' calls
// to convert value initialization into default initialization
// by Casey Carter (@codercasey)
template< typename T,
          typename Alloc = std::allocator<T> >
class default_init_allocator : public Alloc
{
    using a_t = std::allocator_traits<Alloc>;
public:
    // obtain alloc<U> where U ≠ T
    template<typename U>
    struct rebind {
        using other = default_init_allocator<U,
            typename a_t::template rebind_alloc<U> >;
    };
    // make inherited ctors visible
    using Alloc::Alloc;
    // default-construct objects
    template<typename U>
    void construct (U* ptr)
        noexcept( std::is_nothrow_default_constructible<U>::value)
    {   // 'placement new':
        ::new(static_cast<void*>(ptr)) U;
    }
    // construct with ctor arguments
    template<typename U, typename... Args>
    void construct (U* ptr, Args&&... args) {
        a_t::construct(
            static_cast<Alloc&>(*this),
            ptr, std::forward<Args>(args)...);
    }
};
void demo () {
    std::vector<int,default_init_allocator<int>> v;
    v.resize(1'000'000'000);  // fast - no init!
}

void slow_demo () {
    std::vector<int> v;
    v.resize(1'000'000'000);
}

int main () {
    namespace sc = std::chrono;
    auto const tstart = sc::high_resolution_clock::now();
    demo();
    // slow_demo();
    auto const tstop = sc::high_resolution_clock::now();
    auto const elapsed = sc::duration_cast<sc::milliseconds>(tstop - tstart).count();
    std::cout << elapsed << "ms\n";
}
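The interoperability points listed above can be sketched as follows. The allocator is repeated in condensed form so that the snippet compiles on its own; raw_int_vector, fill_squares, sum and demo_interop are hypothetical names chosen for this illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>
#include <utility>
#include <vector>

// condensed copy of the allocator adaptor from the listing above
template<typename T, typename Alloc = std::allocator<T>>
class default_init_allocator : public Alloc {
    using a_t = std::allocator_traits<Alloc>;
public:
    template<typename U>
    struct rebind {
        using other = default_init_allocator<U,
            typename a_t::template rebind_alloc<U>>;
    };
    using Alloc::Alloc;

    template<typename U>
    void construct (U* ptr)
        noexcept(std::is_nothrow_default_constructible<U>::value) {
        ::new(static_cast<void*>(ptr)) U;  // default initialization
    }
    template<typename U, typename... Args>
    void construct (U* ptr, Args&&... args) {
        a_t::construct(static_cast<Alloc&>(*this),
                       ptr, std::forward<Args>(args)...);
    }
};

using raw_int_vector = std::vector<int, default_init_allocator<int>>;

// the allocator is part of the vector's type, so this is not vector<int>
static_assert(!std::is_same<raw_int_vector, std::vector<int>>::value,
              "distinct vector types");

// hypothetical C-style producer that overwrites every element
void fill_squares (int* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) { out[i] = static_cast<int>(i * i); }
}

// generic code can stay allocator-agnostic
template<typename Alloc>
long long sum (std::vector<int, Alloc> const& v) {
    long long s = 0;
    for (int x : v) { s += x; }
    return s;
}

long long demo_interop (std::size_t n) {
    raw_int_vector v;
    v.resize(n);                        // elements are not zeroed
    assert(v.size() == n);
    fill_squares(v.data(), v.size());   // overwrite before reading
    return sum(v);
}
```

Functions that should accept both kinds of vector can be templated on the allocator, as sum is here, or take a pointer and a size.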
Idea: custom, generic zero-overhead wrapper
vector<no_init<T>>
- all the convenience of std::vector
- wrapper prevents value-initialization
- vector can be accessed just like without the wrapper
- no overhead (at least on higher optimization levels -O2/-O3)
- additional indirection potentially annoying for debugging
- can't be passed to functions that take vector<T> or span<T> parameters
- .data() pointer can't be passed to C-style functions that take T*
This couples the initialization behavior to the numeric datatype and thus propagates it through interfaces (by mentioning the type no_init).
If you don't want this, consider the default_init_allocator solution.
#include <type_traits> // std::is_fundamental
#include <vector>
template<typename T>
class no_init {
    static_assert(
        std::is_fundamental<T>::value,
        "should be a fundamental type");
public:
    // constructor without initialization
    no_init () noexcept {}
    // implicit conversion T → no_init<T>
    constexpr no_init (T value) noexcept: v_{value} {}
    // implicit conversion no_init<T> → T
    constexpr operator T () const noexcept { return v_; }
private:
    T v_;
};

void demo () {
    std::vector<no_init<int>> v;
    v.resize(1'000'000'000);  // fast - no init!
    v[1024] = 47;
    int j = v[1024];
    v.push_back(23);
}
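The "no overhead" and "can't be passed as vector<T>" points can be checked at compile time. This is a minimal sketch that repeats the wrapper in condensed form; wrapper_roundtrip is a hypothetical name, and the size assertion reflects what common implementations do rather than a guarantee of the standard:

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// condensed copy of the wrapper from the listing above
template<typename T>
class no_init {
    static_assert(std::is_fundamental<T>::value,
                  "should be a fundamental type");
public:
    no_init () noexcept {}
    constexpr no_init (T value) noexcept : v_{value} {}
    constexpr operator T () const noexcept { return v_; }
private:
    T v_;
};

// no size overhead (assumption: holds on common implementations)
static_assert(sizeof(no_init<int>) == sizeof(int), "same size");
// still trivially copyable, so bulk copies remain cheap
static_assert(std::is_trivially_copyable<no_init<double>>::value,
              "trivially copyable");
// but a different element type means a different vector type
static_assert(!std::is_same<std::vector<no_init<int>>,
                            std::vector<int>>::value,
              "distinct vector types");

int wrapper_roundtrip () {
    std::vector<no_init<int>> v;
    v.resize(8);     // elements are left uninitialized
    v[3] = 47;       // T → no_init<T>
    int j = v[3];    // no_init<T> → T
    assert(j == 47);
    return j + 1;
}
```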
make_unique_for_overwrite<T[]>(n) C++20
#include <memory>
auto buf = std::make_unique_for_overwrite<T[]>(n);
We can't use make_unique<T[]>(n), because that would value-initialize the allocated array.
- returns a unique_ptr that does automatic cleanup
- can be used with span<T> parameters and passed to C functions
- need to track array size separately (e.g., with a span)
- less safe & less convenient than vector<no_init<T>>
use a span to access / pass the array around
#include <memory>
#include <span>

SampleStats statistics (Samples const& in) {
    // make uninitialized array
    auto buf = std::make_unique_for_overwrite<int[]>(in.size());
    // obtain view to it:
    std::span<int> results {buf.get(), in.size()};
    // do something with it
    gpu_statistics(in, results);
    prefix_sum(results);
    …
}  // memory automatically deallocated
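Code that also has to build with pre-C++20 toolchains could select the allocation call with the library feature-test macro __cpp_lib_smart_ptr_for_overwrite. The following is only a sketch under that assumption; make_array_for_overwrite and sum_of_first are hypothetical helper names, not standard functions:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <numeric>

// hypothetical helper: allocate n default-initialized elements
template<typename T>
std::unique_ptr<T[]> make_array_for_overwrite (std::size_t n) {
#if defined(__cpp_lib_smart_ptr_for_overwrite)
    return std::make_unique_for_overwrite<T[]>(n);  // C++20 library
#else
    return std::unique_ptr<T[]>(new T[n]);          // pre-C++20 fallback
#endif
}

long long sum_of_first (std::size_t n) {
    auto buf = make_array_for_overwrite<int>(n);
    assert(buf != nullptr);
    // overwrite every element before reading: 0, 1, 2, ...
    std::iota(buf.get(), buf.get() + n, 0);
    return std::accumulate(buf.get(), buf.get() + n, 0LL);
}
```

Both branches default-initialize the elements, so the calling code is the same either way.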
unique_ptr<T[]>(new T[n]) C++11
#include <memory>
auto buf = std::unique_ptr<T[]>{new T[n]};
We can't use make_unique<T[]>(n), because that would value-initialize the allocated array.
- unique_ptr does automatic cleanup
- works as of C++11
- need to track array size separately (e.g., with a span)
- less safe & less convenient than vector<no_init<T>>
use a span to access / pass the array around
#include <memory>
#include <span>  // C++20

SampleStats statistics (Samples const& in) {
    // make uninitialized array
    auto buf = std::unique_ptr<int[]>{new int[in.size()]};
    // obtain view to it:
    std::span<int> results {buf.get(), in.size()};
    // do something with it
    gpu_statistics(in, results);
    prefix_sum(results);
    …
}  // memory automatically deallocated
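Since std::span itself requires C++20, a code base restricted to C++11/14/17 needs another way to keep pointer and size together. One possible sketch uses a small hand-written view; array_view, this prefix_sum and last_prefix_sum are hypothetical stand-ins written for this illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// hypothetical minimal view: pointer + element count
template<typename T>
struct array_view {
    T* data;
    std::size_t size;
    T* begin () const noexcept { return data; }
    T* end ()   const noexcept { return data + size; }
    T& operator [] (std::size_t i) const noexcept { return data[i]; }
};

// hypothetical consumer: in-place inclusive prefix sum
void prefix_sum (array_view<int> a) {
    for (std::size_t i = 1; i < a.size; ++i) { a[i] += a[i-1]; }
}

int last_prefix_sum (std::size_t n) {
    std::unique_ptr<int[]> buf {new int[n]};  // default-initialized
    array_view<int> results {buf.get(), n};
    assert(results.size == n);
    for (int& x : results) { x = 1; }         // overwrite before reading
    prefix_sum(results);
    return n > 0 ? results[n-1] : 0;
}   // memory automatically deallocated
```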
T* buf = new T[n];
…
// important! delete if not needed any more
delete[] buf;
- easy to forget to delete memory ⇒ leak-prone
- error-prone and cumbersome separate tracking of array size
- less safe & less convenient than vector<no_init<T>>
It's 2023: avoid raw operators new and delete in modern code bases!
Only use them in implementations of memory managers like allocators.
- creates page-locked memory ⇒ faster copy to/from device
- easy to forget to free memory ⇒ leak-prone
- error-prone and cumbersome separate tracking of array size
// (2^30 x 4B) = 4GiB
const int n = 1 << 30;
auto const size = n * sizeof(int);
int *aHost;
cudaMallocHost( (void**)&aHost, size);
…
cudaFreeHost(aHost);