Copy Semantics
Copy semantics are the rules and mechanisms governing how objects are duplicated or cloned when they’re assigned to another object or passed as function arguments. These rules ensure that the objects are equal and independent.
- Equality: The original object (referred to as the “source”) and the new object (the “destination”) have identical content, source == destination. In other words, they are equal in terms of their values or state.
- Independent: Modifications made to one object do not affect the other. They remain separate and independent instances, even though they might contain the same initial data.
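#include <cassert>

// Minion and BigGodzilla are assumed to be defined elsewhere,
// each copyable and comparable with operator==.
int main() {
    int a = 10;
    int b = a;
    assert(a == b);

    Minion kevin {};
    Minion stuart = kevin;
    assert(kevin == stuart);

    BigGodzilla balu {};
    BigGodzilla kalu {};
    kalu = balu;
    assert(kalu == balu);
}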
So the basic concept of copying is simple: whether you have a fundamental data type like int, a small structure like Minion, or a huge structure like BigGodzilla, after the assignment operation = both objects should have the same contents.
Implicit Copy
Implicit copying is a process where a copy of an object is created automatically by the programming language or compiler without explicit instructions from the programmer. This behavior is common for objects that follow value semantics.
#include <cassert>

struct Rectangle { // plain old data type (POD)
    int length;
    int breadth;
};

int area(Rectangle r) {
    return r.length * r.breadth;
}

int main() {
    int x = 10;
    int y = x; // 1. implicit copy of the value
    assert(x == y);

    Rectangle rect {10, 20};
    assert(area(rect) == 200); // 2. implicit copy, pass by value
}
- In this example, the value of x is implicitly copied to y. If you later modify y, it does not affect the value of x.
- Similarly, when calling the function area(rect), rect is implicitly copied to r.
To facilitate implicit copying, the compiler internally generates default copy constructors and copy assignment operators, as needed. These generated functions perform a “member-wise copy”, meaning they copy each member from the source object to the destination object.
Implicit copying behaves as expected for basic data types like int and plain old data types. However, it’s crucial to be mindful of potential side effects when dealing with pointers.
Shallow Copy
#include <cassert>

int main() {
    int* srcPtr = new int(10); // 1. memory allocation and initialization
    int* destPtr = srcPtr;     // 2. copy value from srcPtr to destPtr
    assert(srcPtr == destPtr);
    assert(*srcPtr == *destPtr);

    *destPtr = 20;
    assert(*srcPtr == 20);     // 3. changes reflected in other object

    delete srcPtr;
    delete destPtr;            // 4. dangling pointer, double free
}
g++ -std=c++11 -fsanitize=address shallow_copy.cpp && ./a.out
=================================================================
==18713==ERROR: AddressSanitizer: attempting double-free on 0x602000000010 in thread T0:
#0 0x7fb4d741c650 in operator delete(void*)
In this example, we show the potential issues with implicit copying of pointers:
- srcPtr allocates memory on the heap and initializes it with the value 10.
- destPtr is assigned the memory address from srcPtr, so both pointers point to the same memory.
- No independent modifications: Modifying the value through one pointer (e.g., *destPtr = 20) affects what the other (srcPtr) sees, as they point to the same memory location.
- Double free: Deleting srcPtr frees the associated memory. However, destPtr is left as a dangling pointer because it still points to the now-deleted memory. Deleting destPtr as well is a double free, which is undefined behavior.
- Undefined behavior: Any operation that dereferences destPtr after the memory is freed is undefined behavior because the memory is no longer valid.
A similar example with a user-defined structure:
#include <cstddef>

struct DynamicArray {
    DynamicArray(size_t size)
        : m_size {size}
        , m_ptr {new int[m_size]}
    {}

    ~DynamicArray() {
        delete[] m_ptr;
    }

    size_t m_size;
    int* m_ptr;
};

int main() {
    DynamicArray arr(10);
    {
        DynamicArray arr_copy(arr); // member-by-member copy: arr_copy.m_ptr == arr.m_ptr
    } // arr_copy's destructor frees the shared buffer
} // arr's destructor frees the same buffer again: double free
g++ -std=c++11 -fsanitize=address shallow_copy_struct.cpp && ./a.out
=================================================================
==11528==ERROR: AddressSanitizer: attempting double-free on 0x604000000010 in thread T0:
#0 0x7ff4a3010780 in operator delete[](void*)
To avoid the issues of implicit copying with pointers, especially when dealing with dynamic memory allocation, it’s often necessary to implement deep copy mechanisms that duplicate not only the pointers but also the data they point to, ensuring independent and safe handling of objects.
Deep Copy
Deep copy involves recreating both the object and the data it holds, guaranteeing that modifications in one instance don’t impact others.
Deep copy is implemented explicitly by the programmer by providing a user-defined copy constructor and copy assignment operator.
As can be seen in the image, after the copy operation the members of destination and source point to different addresses but have the same contents (shapes).
Copy Constructor
#include <algorithm>
#include <iostream>

// Added inside struct DynamicArray from the previous example.
DynamicArray(const DynamicArray& o)
    : m_size {o.m_size}
    , m_ptr {new int[m_size]} // 1. memory allocation
{
    std::copy(o.m_ptr, o.m_ptr + o.m_size, m_ptr); // 2. copy values from source to dest
}

// Takes arr by value, so every call invokes the copy constructor.
void printArray(DynamicArray arr) {
    for (size_t i = 0; i < arr.m_size; ++i) {
        std::cout << i << std::endl; // prints the index; the elements were never initialized
    }
}

int main() {
    DynamicArray arr(10);
    {
        DynamicArray arr_copy(arr); // Invokes copy constructor
    }
    printArray(arr); // creates a copy of arr and passes to printArray
}
- m_ptr {new int[m_size]}: Separate memory is allocated for arr_copy.m_ptr.
- std::copy(o.m_ptr, o.m_ptr + o.m_size, m_ptr);: copies the values stored at the memory address pointed to by arr.m_ptr to the newly allocated memory at arr_copy.m_ptr.
Why Copy Constructor Takes Argument By Reference
The canonical signature of a copy constructor is DynamicArray(const DynamicArray& o).
If it received its argument by value, then invoking the copy constructor would first require a copy of the argument, which would in turn invoke the copy constructor, which would again need a copy, and so on recursively until the stack is full. In practice, C++ rejects such a declaration at compile time.
So it takes its parameter by reference.
Why Copy Constructor Takes Const Argument
Taking the argument as const prevents accidental modification of the source object. A const reference (const &) also allows the copy constructor to receive temporary objects.
Copy Assignment
A copy constructor solves only half of the problem; we can run into the same issues if a copy assignment operator is not provided.
int main() {
    DynamicArray arr(10);
    DynamicArray other(5);
    other = arr; // Invokes compiler provided copy assignment
}
Since both objects already exist, other = arr invokes the compiler-provided assignment operator, which performs a member-by-member copy and results in the same issue of two pointers pointing to the same memory.
// Added inside struct DynamicArray, next to the copy constructor.
DynamicArray& operator=(const DynamicArray& o) {
    if (this == &o) { // 1. guards against self-assignment, arr = arr
        return *this;
    }
    delete[] m_ptr; // 2. release the existing buffer
    m_size = o.m_size;
    m_ptr = new int[m_size];
    std::copy(o.m_ptr, o.m_ptr + o.m_size, m_ptr);
    return *this;
}

int main() {
    DynamicArray arr(10);
    DynamicArray other(1);
    arr = other; // Since arr already exists, invokes copy assignment
}
The implementations of the copy constructor and the copy assignment operator are almost the same, with three small but important differences.
- Self-assignment check if (this == &o): A statement such as arr = arr is a self-assignment. If the program does not check for it, the operator deletes m_ptr first and then allocates new memory on the next line, losing the original contents.
- Delete pre-allocated memory delete[] m_ptr;: In an assignment both objects already exist and m_ptr might be pointing to valid memory. Allocating new memory without deleting the old buffer results in a memory leak.
- return *this;: Returning a reference to self is not mandatory, but without it you cannot chain assignments (a = b = c).
Note
If you find the need to provide a custom implementation for any of the copy constructor, copy assignment operator, or destructor, it’s a strong indication that you should consider providing custom implementations for all three of them. This principle is commonly referred to as the Rule of Three.
Conclusion
Understanding copy semantics is crucial for managing the behavior of your C++ programs and controlling how user-defined structures are copied, especially when pointer member variables are involved.
Copy semantics is an important concept, but it has its own downsides. For small objects the cost is tolerable, but for larger ones it leads to noticeable performance degradation due to the creation of numerous temporary copies.
To address this inefficiency, C++11 introduced the concept of move semantics. If you’re looking to optimize your code and understand the inner workings of move semantics, dive into our next blog on the topic.