
Data Structure Alignment in C++ on x86 and x64 Machines

Data structure alignment is the way data is arranged and accessed in memory. It consists of two separate but related issues: data alignment and data structure padding.

In this article we will discuss memory alignment for simple structs.

All of the code was tested on 64-bit Windows 8 with the GCC compiler suite. Although my machine runs 64-bit Windows, the title covers both 32-bit and 64-bit systems, so I will give results for both.

Before we start, guess what the output of this program is. Write down your answer, but don’t compile it yet.

#include <iostream>
using namespace std;

// Typical alignment requirements on x86/x64
// (double is only 4-byte aligned on 32-bit Linux with GCC)

// char         1 byte
// short int    2 bytes
// int          4 bytes
// double       8 bytes

struct A
{
	char c;
	short s;
};

struct B
{
	short s;
	char c;
	int i;
};

struct C
{
	char c;
	double d;
	int i;
};

struct D
{
	double d;
	int i;
	char c;
};

int main()
{
	cout << "The sizeof A is: " << sizeof(A) << endl;
	cout << "The sizeof B is: " << sizeof(B) << endl;
	cout << "The sizeof C is: " << sizeof(C) << endl;
	cout << "The sizeof D is: " << sizeof(D) << endl;
	return 0;
}

Now read the rest of the article.

Definition of Data Alignment

Every data type in C/C++ has an alignment requirement. (In fact, it is dictated by the processor architecture and the platform conventions, not by the language.)


Memory is byte addressable and arranged sequentially. If the memory were arranged as a single bank one byte wide, the processor would need 4 memory read cycles to fetch an integer. It is much more efficient to read all 4 bytes of an integer in a single memory cycle. To take advantage of this, the memory is arranged as a group of 4 banks.

The memory addressing is still sequential. If bank 0 occupies address X, banks 1, 2 and 3 are at addresses (X + 1), (X + 2) and (X + 3). If a 4-byte integer is allocated at address X (where X is a multiple of 4), the processor needs only one memory cycle to read the entire integer.

By contrast, if the integer is allocated at an address that is not a multiple of 4, it spans two rows of the banks. Such an integer requires two memory read cycles to fetch.
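The bank model above can be turned into a small calculation. This is a minimal sketch under a simplifying assumption (each row of banks costs one read cycle, and real memory systems with caches are more complicated); the function name read_cycles is hypothetical. It counts how many rows an access of a given size touches, and the same formula also covers 8-byte values and 8-bank memory.

```cpp
#include <cstddef>

// Simplified model (an assumption for illustration): memory is organized in
// rows of `banks` bytes, and every row an access touches costs one read cycle.
constexpr std::size_t read_cycles(std::size_t address, std::size_t size,
                                  std::size_t banks = 4)
{
    // Row of the last byte minus row of the first byte, plus one.
    return (address + size - 1) / banks - address / banks + 1;
}

// A 4-byte int at a multiple of 4 fits in one row...
static_assert(read_cycles(0x1000, 4) == 1, "aligned int: one cycle");
// ...but starting at 0x1002 it straddles two rows.
static_assert(read_cycles(0x1002, 4) == 2, "misaligned int: two cycles");
// An 8-byte double needs two rows with 4 banks, one row with 8 banks.
static_assert(read_cycles(0x1000, 8) == 2, "double, 4 banks");
static_assert(read_cycles(0x1000, 8, 8) == 1, "aligned double, 8 banks");
static_assert(read_cycles(0x1004, 8, 8) == 2, "misaligned double, 8 banks");
```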


A variable’s data alignment deals with the way its data is stored in these banks. It is expressed as the numeric address modulo a power of 2. For example, the address 0x0001103F modulo 4 is 3; that address is said to be aligned to 4n+3, where 4 is the chosen power of 2. The alignment of an address depends on the chosen power of two: the same address modulo 8 is 7.
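Because the modulus is always a power of 2, the remainder is simply the low bits of the address. The sketch below (the function name address_mod is hypothetical) reproduces the 0x0001103F example with a bit mask, which is equivalent to % only when n is a power of 2.

```cpp
#include <cstdint>

// For a power of two n, address % n equals address & (n - 1):
// the low log2(n) bits of the address are the remainder.
constexpr std::uintptr_t address_mod(std::uintptr_t address, std::uintptr_t n)
{
    return address & (n - 1);   // valid only when n is a power of 2
}

static_assert(address_mod(0x0001103F, 4) == 3, "0x0001103F is 4n+3");
static_assert(address_mod(0x0001103F, 8) == 7, "0x0001103F is 8n+7");
static_assert(address_mod(0x0001103F, 4) == 0x0001103F % 4, "same as %");
```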

The natural alignment of int on a 32-bit machine is 4 bytes. When a data type is naturally aligned, the CPU fetches it in the minimum number of read cycles.

Similarly, the natural alignments of several data types are listed here:

For 32-bit x86:

  • A “char” (one byte) will be 1-byte aligned
  • A “short int” (two bytes) will be 2-byte aligned
  • An “int” (four bytes) will be 4-byte aligned
  • A “long” (four bytes) will be 4-byte aligned
  • A “double” (eight bytes) will be 8-byte aligned on Windows and 4-byte aligned on Linux (8-byte with the -malign-double compile-time option).
  • A “long long” (eight bytes) will be 8-byte aligned on Windows and 4-byte aligned on Linux (8-byte with -malign-double).
  • A “long double” (ten bytes with C++Builder and DMC, eight bytes with Visual C++, twelve bytes with GCC) will be 8-byte aligned with C++Builder, 2-byte aligned with DMC, 8-byte aligned with Visual C++ and 4-byte aligned with GCC.
  • Any “pointer” (four bytes) will be 4-byte aligned. (e.g.: char*, int*)

Notable differences in alignment on a 64-bit system compared to a 32-bit system:

  • A “long” (eight bytes) will be 8-byte aligned on 64-bit Linux; on 64-bit Windows a long stays four bytes and 4-byte aligned.
  • A “double” (eight bytes) will be 8-byte aligned.
  • A “long double” (eight bytes with Visual C++, sixteen bytes with GCC) will be 8-byte aligned with Visual C++ and 16-byte aligned with GCC.
  • Any “pointer” (eight bytes) will be 8-byte aligned.
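Since these values depend on the compiler and OS, it is worth checking your own toolchain. The sketch below is one way to do it; the names Probe, show and print_alignment_table are hypothetical. It reads each type's size with sizeof and measures its alignment inside a struct by placing it right after a char and taking the member's offset with offsetof, which reflects the alignment the compiler actually uses for struct members. Call print_alignment_table() from main to see the values.

```cpp
#include <cstddef>
#include <iostream>

// Probe<T> places a T right after a char; the offset of `member` is the
// alignment the compiler uses for T inside a struct.
template <typename T>
struct Probe
{
    char c;
    T member;
};

template <typename T>
void show(const char* name)
{
    std::cout << name << ": size " << sizeof(T)
              << ", alignment in a struct " << offsetof(Probe<T>, member) << '\n';
}

void print_alignment_table()
{
    show<char>("char");
    show<short>("short");
    show<int>("int");
    show<long>("long");
    show<long long>("long long");
    show<double>("double");
    show<long double>("long double");
    show<void*>("pointer");
}
```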

So a short int can be stored in the bank 0 – bank 1 pair or in the bank 2 – bank 3 pair. A double requires 8 bytes and occupies two rows of the memory banks. Any misalignment of a double forces more than two read cycles to fetch it.

As listed above, a double is allocated on an 8-byte boundary on 32-bit Windows (4-byte on 32-bit Linux), and with 4 banks it still requires two memory read cycles. On a 64-bit machine with 8 banks, a double is allocated on an 8-byte boundary and requires only one memory read cycle.

So we can state that a memory address A is N-byte aligned when A is a multiple of N (where N is a power of 2). A memory access is aligned when the datum being accessed is N bytes long and its address is N-byte aligned. When a memory access is not aligned, it is said to be misaligned. Note that, by definition, byte memory accesses are always aligned.
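The definition above translates directly into a check. This is a small sketch with hypothetical names: is_aligned tests a raw address, and is_naturally_aligned applies it to a real pointer using the type's alignof value (that one can only run at run time, because converting a pointer to an integer is not a constant expression).

```cpp
#include <cstdint>

// An address A is N-byte aligned when A is a multiple of N (N a power of 2).
constexpr bool is_aligned(std::uintptr_t address, std::uintptr_t n)
{
    return address % n == 0;
}

// Checks whether a pointer satisfies its type's alignment requirement.
template <typename T>
bool is_naturally_aligned(const T* p)
{
    return is_aligned(reinterpret_cast<std::uintptr_t>(p), alignof(T));
}

static_assert(is_aligned(0x1000, 8), "0x1000 is 8-byte aligned");
static_assert(!is_aligned(0x1002, 4), "0x1002 is not 4-byte aligned");
static_assert(is_aligned(0x1003, 1), "every address is 1-byte aligned");
```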

Structure and Padding to Align the Data

In C, structures are used as a data pack (composite data) and provide no data encapsulation or data hiding. In C++, a struct can have all the features of a class; the only difference is that its members are public by default.

As stated before, well-aligned data in memory makes fetching easier. Because of the alignment requirements of the various data types, every member of a structure should be naturally aligned. Structure members are allocated in declaration order, at increasing addresses.

Now, alignment should be used to balance the structure. Balance here means making every member naturally aligned (remember, a short int uses 2 bytes and can be placed on the byte 0-byte 1 pair or the byte 2-byte 3 pair, but not on byte 1-byte 2). Therefore something must be done to put each member in its correct (aligned) position.

The method compilers use is padding: unused bytes inserted between members. Padding is only inserted when a structure member is followed by a member with a larger alignment requirement, or at the end of the structure.

An alternative is to reorder the members. However, C and C++ compilers do not reorder structure members to save space, so this has to be done manually.
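As a sketch of manual reordering, the two hypothetical structs below hold the same three members as C and D in the quiz program. A common rule of thumb is to declare members from the largest alignment to the smallest, which leaves padding only at the end. The byte counts in the comments assume x64; the static_assert itself holds on x86 as well.

```cpp
// Hypothetical names; same members, two declaration orders.
struct Wasteful { char c; double d; int i; };  // x64: 1 + 7 pad + 8 + 4 + 4 pad = 24
struct Compact  { double d; int i; char c; };  // x64: 8 + 4 + 1 + 3 pad = 16

// Putting the most strictly aligned member first never made it bigger here.
static_assert(sizeof(Compact) <= sizeof(Wasteful), "reordering saves space");
```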

So how does this work?

Remember, the size of a struct is not simply the sum of its members’ sizes, because padding may be added. How much padding is needed depends on the CPU architecture (32-bit or 64-bit) and the OS. The structure’s own alignment equals the largest alignment requirement among its members, and its size is rounded up to a multiple of that alignment.
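Both rules can be written as one helper that rounds an offset up to the next multiple of an alignment. The sketch below (align_up and Example are hypothetical names) lays out a struct { char a; int b; short c; } by hand, assuming the usual sizes and alignments (char 1, int 4, short 2), and then checks the result against what the compiler actually produces.

```cpp
#include <cstddef>

// Rounds offset up to the next multiple of align (align must be a power of 2).
constexpr std::size_t align_up(std::size_t offset, std::size_t align)
{
    return (offset + align - 1) & ~(align - 1);
}

// Hand-computed layout of struct { char a; int b; short c; }.
constexpr std::size_t off_a = 0;
constexpr std::size_t off_b = align_up(off_a + 1, 4);   // 4: 3 padding bytes
constexpr std::size_t off_c = align_up(off_b + 4, 2);   // 8: no padding
// Size rounded up to the largest member alignment (4): 10 -> 12.
constexpr std::size_t total = align_up(off_c + 2, 4);

struct Example { char a; int b; short c; };
static_assert(offsetof(Example, b) == off_b, "matches the compiler");
static_assert(offsetof(Example, c) == off_c, "matches the compiler");
static_assert(sizeof(Example) == total, "matches the compiler");
```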

Let’s look at this little structure. If we simply add up the member sizes, we get 8 bytes:

struct Mix
{
	char Data1;
	short Data2;
	int Data3;
	char Data4;
};

After compilation, appropriate paddings will be inserted to ensure a proper alignment for each of its member:

struct Mix
{
	char Data1;          // 1 byte
	char Padding1[1];
	short Data2;         // 2 bytes
	int Data3;           // 4 bytes
	char Data4;          // 1 byte
	char Padding2[3];
};

We see two paddings there, Padding1 and Padding2. Remember that a short requires 2-byte alignment. Hence, it cannot be placed right after Data1, because it would start at offset 1 and occupy the Bank 1-Bank 2 pair. One byte of padding moves Data2 to offset 2, so it can be fetched with the minimum number of reads. Data3 then starts at offset 4, which is already 4-byte aligned.

After Data4, there are also 3 bytes of padding at the end.

The compiled size of the structure is therefore 12 bytes. Note that the last member is padded with as many bytes as needed to make the total size a multiple of the largest alignment of any structure member (alignment(int) in this case, which is 4).
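The padded layout can be confirmed without reading assembly: offsetof reports where the compiler actually placed each member. This sketch encodes the layout described above as compile-time checks, so a compiler that lays out Mix differently would reject it with an error (these values are what x86 and x64 compilers normally produce).

```cpp
#include <cstddef>

struct Mix
{
    char  Data1;
    short Data2;
    int   Data3;
    char  Data4;
};

static_assert(offsetof(Mix, Data1) == 0, "Data1 starts the struct");
static_assert(offsetof(Mix, Data2) == 2, "1 byte of padding before Data2");
static_assert(offsetof(Mix, Data3) == 4, "Data3 is 4-byte aligned");
static_assert(offsetof(Mix, Data4) == 8, "Data4 follows Data3");
static_assert(sizeof(Mix) == 12, "3 bytes of padding at the end");
```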

Let’s review the output of the earlier program. If anything is confusing, refer back to the previous section (Definition of Data Alignment).

For 64-bit OS users:

  1. The sizeof A is: 4
  2. The sizeof B is: 8
  3. The sizeof C is: 24
  4. The sizeof D is: 16

For 32-bit Windows users:

  1. The sizeof A is: 4
  2. The sizeof B is: 8
  3. The sizeof C is: 24
  4. The sizeof D is: 16

For 32-bit Linux users:

  1. The sizeof A is: 4
  2. The sizeof B is: 8
  3. The sizeof C is: 16
  4. The sizeof D is: 16

You can verify these numbers yourself by compiling and running the program.

How do we get these numbers?

Structure A

struct A
{
	char c;
	short s;
};

We have two members here: c, a char, and s, a short int. A char is 1 byte and a short is 2 bytes, so the members add up to 3 bytes, but sizeof(A) is 4.

If s were placed immediately after c, it would start at offset 1, an odd address. Therefore one byte of padding is inserted, and the structure effectively becomes:

struct A
{
	char c;
	char Padding;
	short s;
};

And the total sizeof(A) = sizeof(char) + 1 (padding) + sizeof(short) = 1 + 1 + 2 = 4 bytes.
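The same arithmetic can be checked by the compiler. In this sketch the static_asserts restate the layout just derived; alignof(A) also shows that the struct inherits the alignment of its strictest member, short.

```cpp
#include <cstddef>

struct A
{
    char  c;
    short s;
};

static_assert(offsetof(A, c) == 0, "c is first");
static_assert(offsetof(A, s) == 2, "one padding byte, then s at an even offset");
static_assert(sizeof(A) == 4, "1 + 1 (padding) + 2");
static_assert(alignof(A) == 2, "A takes the alignment of short");
```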

Structure B

struct B
{
	short s;
	char c;
	int i;
};

We have three members here: s, a short int; c, a char; and i, an int. They are 2, 1 and 4 bytes, so the members add up to 7 bytes, but sizeof(B) is 8.

The reason is the same as in the first example. If i came immediately after c, it would start at offset 3, which is not a multiple of 4. Therefore one byte of padding is inserted, and the structure becomes:

struct B
{
	short s;
	char c;
	char Padding;
	int i;
};

And the total sizeof(B) = sizeof(short) + sizeof(char) +  1 (padding) + sizeof(int) = 2 + 1 + 1  + 4 = 8 bytes.
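One way to measure padding directly is to subtract the member sizes from sizeof. In this sketch the constant padding_in_B is a hypothetical name; offsetof shows that the padding byte sits between c and i.

```cpp
#include <cstddef>

struct B
{
    short s;
    char  c;
    int   i;
};

// Padding is whatever sizeof reports beyond the sum of the member sizes.
constexpr std::size_t padding_in_B =
    sizeof(B) - (sizeof(short) + sizeof(char) + sizeof(int));

static_assert(offsetof(B, c) == 2, "c directly follows s");
static_assert(offsetof(B, i) == 4, "i moved from offset 3 to 4");
static_assert(padding_in_B == 1, "exactly one padding byte");
```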

Structure C

struct C
{
	char c;
	double d;
	int i;
};

Now this is the trickiest part.

We have three members here: c, a char; d, a double; and i, an int. A double is 8 bytes on all of these platforms; what differs is its alignment requirement: 8 bytes on x64 and on 32-bit Windows, but only 4 bytes on 32-bit Linux with GCC. A char is 1 byte and an int is 4 bytes, so the members add up to 13 bytes, yet sizeof(C) is 24 (16 on 32-bit Linux).

After compilation for x64 (and for 32-bit Windows), the layout is:

struct C
{
	char c;             // 1 byte
	char Padding1[7];
	double d;           // 8 bytes
	int i;              // 4 bytes
	char Padding2[4];
};

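You may wonder why Padding1 is 7 bytes instead of 3. The member d must start at an offset that is a multiple of its alignment, which is 8 for a double here, so 7 bytes of padding follow c. The structure as a whole takes the largest alignment of its members (8, from the double), so its size is rounded up from 20 to 24, which adds Padding2.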

So the total size would be: sizeof(C) = sizeof(char) + 7 (padding) + sizeof(double) + sizeof(int) + 4 (padding) = 1 + 7 + 8 + 4 + 4 = 24
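The x64 layout can be pinned down with offsetof. This sketch encodes the expectations above as compile-time checks; they hold on x64 (and 32-bit Windows), and on 32-bit Linux with GCC they would deliberately fail to compile, since double is only 4-byte aligned there.

```cpp
#include <cstddef>

struct C
{
    char   c;
    double d;
    int    i;
};

// Expected layout on x64 and 32-bit Windows.
static_assert(offsetof(C, d) == 8, "7 padding bytes after c");
static_assert(offsetof(C, i) == 16, "i directly after d");
static_assert(alignof(C) == 8, "C inherits double's 8-byte alignment");
static_assert(sizeof(C) == 24, "4 padding bytes round 20 up to 24");
```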

Now let’s look at the 32-bit x86 Linux (GCC) case:

struct C
{
	char c;             // 1 byte
	char Padding1[3];
	double d;           // 8 bytes, but only 4-byte aligned
	int i;              // 4 bytes
};

Here double is still 8 bytes, but it only requires 4-byte alignment. By the same argument, we insert padding between c and d, this time 3 bytes, so that d starts at offset 4. The members then end at offset 16, which is already a multiple of the structure’s alignment (4), so no padding is needed at the end.

So the total size is: sizeof(C) = sizeof(char) + 3 (padding) + sizeof(double) + sizeof(int) = 1 + 3 + 8 + 4 = 16
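If you only have a 64-bit machine, you can approximate this layout by capping member alignment at 4 bytes with #pragma pack, which GCC and Visual C++ both support. This is an emulation, not the real 32-bit ABI (the pack setting caps every member, not just double), and the name C32 is hypothetical. On Linux, compiling the original program with g++ -m32 (if 32-bit libraries are installed) shows the real result.

```cpp
#include <cstddef>

// Cap member alignment at 4 bytes to mimic how 32-bit Linux GCC treats double.
#pragma pack(push, 4)
struct C32
{
    char   c;
    double d;
    int    i;
};
#pragma pack(pop)

static_assert(offsetof(C32, d) == 4, "double only 4-byte aligned");
static_assert(offsetof(C32, i) == 12, "int right after the 8-byte double");
static_assert(sizeof(C32) == 16, "no padding needed at the end");
```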

Structure D

struct D
{
	double d;
	int i;
	char c;
};

We still have three members here: d, a double; i, an int; and c, a char. A char is 1 byte, a double is 8 bytes (8-byte or 4-byte aligned depending on your system, see the previous explanation), and an int is 4 bytes.

All three platforms end up with the same layout after compilation:

struct D
{
	double d;           // 8 bytes
	int i;              // 4 bytes
	char c;		    // 1 byte
	char Padding1[3];
};

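The 3 bytes of padding at the end bring the size up to 16, a multiple of the structure’s alignment (8, or 4 on 32-bit Linux). This matters for arrays: the elements of an array of D are laid out sizeof(D) bytes apart, and each one must start at a properly aligned address.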

So the total size would be: sizeof(D) = sizeof(double) + sizeof(int) + sizeof(char) + 3 (padding) = 8 + 4 + 1 + 3 = 16
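The reason for the tail padding shows up in arrays. In this sketch, element_d_offset is a hypothetical helper that computes the byte offset of arr[k].d from the start of an array; because sizeof(D) is a multiple of alignof(D), every element’s d lands on a properly aligned offset. Without the 3 tail bytes, arr[1] would start at offset 13 and its double would be misaligned.

```cpp
#include <cstddef>

struct D
{
    double d;
    int    i;
    char   c;
};

// Byte offset of arr[k].d from the start of an array of D.
constexpr std::size_t element_d_offset(std::size_t k)
{
    return k * sizeof(D) + offsetof(D, d);
}

static_assert(offsetof(D, i) == 8, "i directly after d");
static_assert(offsetof(D, c) == 12, "c directly after i");
static_assert(sizeof(D) % alignof(D) == 0, "size is a multiple of alignment");
static_assert(element_d_offset(1) % alignof(D) == 0, "arr[1].d stays aligned");
```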