Memory alignment in C++ and C# and probably in every other language that can integrate with C++
I've learned something new today. It all starts with an innocuous question: given the following struct, tell me what its size is:
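public struct MyStruct
{
    public int i1;
    public char c1;
    public long l1;
    public char c2;
    public short s1;
    public char c3;
}
Let's assume that this is in 32bit C++ or C#, with a char taking 1 byte and a long taking 8. (Strictly speaking, a C# char takes 2 bytes in managed memory and a long is usually 4 bytes in 32bit C++, but the reasoning stays the same.)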
The first answer is 4+1+8+1+2+1 = 17. Nope! It's 24.
The reason is called memory alignment and it has to do with the way CPUs work. They have registers of fixed size, various caches with different sizes and speeds, etc. Basically, when you ask for a 4 byte int, it needs to be "aligned" so that the CPU can get all 4 bytes from the correct position into a single register. Otherwise the CPU needs to take two registers (let's say 1 byte in one and 3 bytes in another), then mask and shift both and add them into another register. That is unbelievably expensive at that level.
So, why 24? i1 is an int, so it needs to be aligned on positions that are multiples of 4 bytes. 0 qualifies, so it takes the first 4 bytes. Then there is a char. Chars are one byte and can be put anywhere, so the size becomes 5 bytes. However, a long is 8 bytes, so it needs to be on a position that is a multiple of 8. That is why we add 3 bytes of padding, then add the long. Now the size is 16. One more char → 17. Shorts are 2 bytes and need an even position, so we add one more padding byte to get to 18, then the short is added. The size is 20. Finally the last char goes in, getting us to 21. But now the struct needs to be aligned with itself, meaning with the largest primitive used inside it, in our case the 8 byte long, so that consecutive structs in an array stay aligned. That is why we add 3 more bytes, so that the struct has a size that is a multiple of 8.
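To double-check the walk-through, here is a minimal C++ sketch of the same struct. It assumes a typical 64-bit compiler (GCC, Clang or MSVC on x64), where an 8 byte integer is aligned to 8 bytes. Fixed-width types stand in for the C# int, long and short, and check_walkthrough is a hypothetical helper name, not something from the post. On some 32-bit ABIs the 8 byte field is only aligned to 4 and the numbers differ, which is why the checks run at run time instead of being static_asserts.

```cpp
#include <cassert>
#include <cstddef>   // offsetof
#include <cstdint>   // std::int16_t, std::int32_t, std::int64_t

// C++ stand-in for the struct in the post, with 1-byte chars and an 8-byte "long".
struct MyStruct {
    std::int32_t i1;
    char         c1;
    std::int64_t l1;
    char         c2;
    std::int16_t s1;
    char         c3;
};

// Hypothetical helper: verifies each step of the walk-through.
// Assumption: the ABI aligns std::int64_t to 8 bytes (true on mainstream x64 compilers).
inline void check_walkthrough() {
    assert(offsetof(MyStruct, i1) == 0);
    assert(offsetof(MyStruct, c1) == 4);
    assert(offsetof(MyStruct, l1) == 8);   // 3 padding bytes come before it
    assert(offsetof(MyStruct, c2) == 16);
    assert(offsetof(MyStruct, s1) == 18);  // 1 padding byte comes before it
    assert(offsetof(MyStruct, c3) == 20);
    assert(alignof(MyStruct) == 8);        // alignment of the strictest member
    assert(sizeof(MyStruct) == 24);        // 21 rounded up to a multiple of 8
}
```

Calling check_walkthrough() from main does nothing when the layout matches and aborts with the failing expression otherwise.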
Note that a struct containing another struct aligns the child to the child's largest primitive element, not to the child's total size. It's a recursive process.
Can we do something about it? What if I want to trade speed for memory or disk space? We can use attributes such as StructLayout. It receives a LayoutKind - which defaults to Sequential, but can also be Auto or Explicit - and a numeric Pack parameter. Auto rearranges the order of the members of the class, so it takes the least amount of space. However, this has some side effects, like getting errors when you want to use Marshal.SizeOf. With Explicit, each field needs to be adorned with a FieldOffset attribute to determine the exact position in memory; that also means you can use several fields on the same position, like in:
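The recursive rule is easy to see with two child structs of the same size but different contents. The sketch below is in C++ and all type names in it (FourInts, TwoLongs, HoldsFourInts, HoldsTwoLongs) are made up for illustration; it again assumes a mainstream 64-bit compiler where 8 byte integers are aligned to 8 bytes.

```cpp
#include <cassert>
#include <cstdint>

// Two hypothetical 16-byte children with different largest primitives.
struct FourInts { std::int32_t a, b, c, d; };  // largest primitive: 4 bytes
struct TwoLongs { std::int64_t a, b; };        // largest primitive: 8 bytes

// Each parent adds a single byte after the 16-byte child.
struct HoldsFourInts { FourInts inner; char tail; };
struct HoldsTwoLongs { TwoLongs inner; char tail; };

// Hypothetical helper: the parent is padded to the child's largest primitive,
// not to the child's total size of 16.
inline void check_nested() {
    assert(sizeof(FourInts) == 16 && alignof(FourInts) == 4);
    assert(sizeof(TwoLongs) == 16 && alignof(TwoLongs) == 8);
    assert(sizeof(HoldsFourInts) == 20);  // 17 rounded up to a multiple of 4
    assert(sizeof(HoldsTwoLongs) == 24);  // 17 rounded up to a multiple of 8
}
```

If the parent were aligned to the child's size, both would come out as 32.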
[StructLayout(LayoutKind.Explicit)]
public struct MyStruct
{
    [FieldOffset(0)]
    public int i1;
    [FieldOffset(4)]
    public int i2;
    [FieldOffset(0)]
    public long l1;
}
The Pack parameter tells the system how to align the fields. 0 is the default, but 1 will make the size of the first struct above actually be 17.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct MyStruct
{
    public int i1;
    public char c1;
    public long l1;
    public char c2;
    public short s1;
    public char c3;
}
Other values can be 2, 4, 8, 16, 32, 64 or 128. You can test how the performance is affected by this, as an exercise.
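On the C++ side, the closest thing to Pack is the #pragma pack directive. It is a compiler extension rather than standard C++, but MSVC, GCC and Clang all accept the push/pop form used in this sketch. Packed1, Packed2 and check_packing are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Same fields as the struct in the post, with alignment capped at 1 byte.
#pragma pack(push, 1)
struct Packed1 {
    std::int32_t i1;
    char         c1;
    std::int64_t l1;
    char         c2;
    std::int16_t s1;
    char         c3;
};
#pragma pack(pop)

// Same fields again, with alignment capped at 2 bytes.
#pragma pack(push, 2)
struct Packed2 {
    std::int32_t i1;
    char         c1;
    std::int64_t l1;
    char         c2;
    std::int16_t s1;
    char         c3;
};
#pragma pack(pop)

// Hypothetical helper: checks the sizes the packing rules predict.
inline void check_packing() {
    assert(sizeof(Packed1) == 17);   // 4+1+8+1+2+1, no padding at all
    assert(alignof(Packed1) == 1);
    assert(sizeof(Packed2) == 20);   // fields start on even offsets, 19 rounded up to 20
    assert(alignof(Packed2) == 2);
}
```

The packed versions are smaller, but the long no longer sits on a multiple of 8, which is exactly the unaligned access the CPU would rather avoid.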
More information here: Advanced c# programming 6: Everything about memory allocation in .NET
Update: I've created a piece of code to actually test for this:
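using System;
using System.Runtime.InteropServices;

class Program
{
    // sizeof on a custom struct requires an unsafe context (compile with /unsafe).
    unsafe static void Main(string[] args)
    {
        var st = new MyStruct();
        Console.WriteLine($"sizeof:{sizeof(MyStruct)} Marshal.sizeof:{Marshal.SizeOf(st)} custom sizeof:{MySizeof(st)}");
        Console.ReadKey();
    }

    // Estimates the size in managed memory from how much the heap grows per array element.
    private static long MySizeof(MyStruct st)
    {
        long before = GC.GetTotalMemory(true);
        MyStruct[] array = new MyStruct[100000];
        long after = GC.GetTotalMemory(true);
        GC.KeepAlive(array); // keep the array alive until after the second measurement
        var size = (after - before) / array.Length;
        return size;
    }
}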
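The array trick in MySizeof has a direct C++ counterpart: the distance in bytes between two neighbouring elements of an array is the space each element really occupies, tail padding included. A small sketch, with measured_stride and check_stride as hypothetical helper names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// C++ stand-in for the struct in the post.
struct MyStruct {
    std::int32_t i1;
    char         c1;
    std::int64_t l1;
    char         c2;
    std::int16_t s1;
    char         c3;
};

// Hypothetical helper: bytes between the starts of two adjacent elements.
inline std::size_t measured_stride() {
    std::vector<MyStruct> items(100000);
    const auto first  = reinterpret_cast<std::uintptr_t>(&items[0]);
    const auto second = reinterpret_cast<std::uintptr_t>(&items[1]);
    return static_cast<std::size_t>(second - first);
}

// In C++ the stride equals sizeof by definition, so this always holds.
inline void check_stride() {
    assert(measured_stride() == sizeof(MyStruct));
}
```

Unlike the C# version, nothing here depends on the garbage collector, so the measurement is exact rather than an estimate.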
Considering the original MyStruct, all three C# measurements (sizeof, Marshal.SizeOf and the array-based estimate) report a size of 24. I had to test the idea that the maximum byte padding is 4, so I used this structure:
public struct MyStruct
{
    public long l;
    public byte b;
}
Since long is 8 bytes and byte is 1, I expected the size to be 16, not 12, and it was. However, I decided to also try with a decimal instead of the long. Decimal values take 16 bytes, so if my interpretation was correct, the 17 bytes should be aligned to the size of the biggest primitive field in the struct, meaning rounded up to a multiple of 16, so 32. The result was weirdly inconsistent: sizeof:20 Marshal.sizeof:24 custom sizeof:20, which suggests an alignment to 4 or 8 bytes, not 16. So I started playing with the StructLayoutAttribute:
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct MyStruct
{
    public decimal d;
    public byte b;
}
For Pack = 1, I got a consistent 17 bytes. For Pack = 4, I got a consistent 20. For Pack = 8 or higher, I got the weird 20-24-20 result again, which suggests packing works differently for decimals than for other values. I've replaced the decimal with a struct containing two long values and the consistent result was back to 24, but then again, that's expected. Funny thing is that Guid is also a 16 byte value, although it is itself a struct, and the resulting size was 20. Guid, like every struct, is a value type, but it is not one of the built-in simple types.
The only conclusion I can draw is that what I wrote in this post is true. Also, StructLayout Pack does not work as I had expected: it does not force a particular alignment, it only caps it. If the biggest element in the struct is 8 bytes, then the minimum between the Pack value and 8 will be used for alignment.
The alignment of the type is the size of its largest element (1, 2, 4, 8, etc., bytes) or the specified packing size, whichever is smaller.
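That rule is small enough to write down as code. The sketch below is a toy model in C++, not the actual CLR or compiler algorithm: Field, round_up, layout_size and check_pack_rule are hypothetical names, each field is described only by its size and its natural alignment, and a pack value of 0 is taken to mean "no cap".

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <initializer_list>

// Hypothetical description of a field: its size and its natural alignment.
struct Field {
    std::size_t size;
    std::size_t align;
};

// Rounds value up to the next multiple of `multiple`.
constexpr std::size_t round_up(std::size_t value, std::size_t multiple) {
    return (value + multiple - 1) / multiple * multiple;
}

// Toy model of sequential layout: every field is aligned to
// min(natural alignment, pack); pack == 0 is assumed to mean "no cap".
inline std::size_t layout_size(std::initializer_list<Field> fields, std::size_t pack) {
    std::size_t offset = 0;
    std::size_t struct_align = 1;
    for (const Field& f : fields) {
        const std::size_t a = (pack == 0) ? f.align : std::min(f.align, pack);
        offset = round_up(offset, a) + f.size;     // pad, then place the field
        struct_align = std::max(struct_align, a);  // strictest alignment seen so far
    }
    return round_up(offset, struct_align);         // tail padding
}

inline void check_pack_rule() {
    // int, char, long, char, short, char: the struct from the top of the post
    assert(layout_size({{4, 4}, {1, 1}, {8, 8}, {1, 1}, {2, 2}, {1, 1}}, 0) == 24);
    assert(layout_size({{4, 4}, {1, 1}, {8, 8}, {1, 1}, {2, 2}, {1, 1}}, 1) == 17);
    // 16-byte value aligned to 4 (four ints) followed by a byte
    assert(layout_size({{16, 4}, {1, 1}}, 1) == 17);
    assert(layout_size({{16, 4}, {1, 1}}, 4) == 20);
    assert(layout_size({{16, 4}, {1, 1}}, 8) == 20);
    // 16-byte value aligned to 8 (two longs) followed by a byte
    assert(layout_size({{16, 8}, {1, 1}}, 0) == 24);
}
```

Under this model a 16 byte field with 4 byte alignment followed by a byte gives 17, 20 and 20 for Pack values of 1, 4 and 8, which matches the sizeof numbers in the decimal experiment.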
All this holds if you are not using decimals... then all bets are off! From my discussions with Filip B. Vondrášek in the comments of this post, I've reached the conclusion that decimals are internally structs that are aligned to their largest element, an int, so to 4 bytes. However, it seems Marshal.SizeOf misreports the size of structs containing decimals, for some reason.
In fact, all "simple" types are structs internally, as described by the C# language specification. The Decimal struct also implements IDeserializationCallback, but I don't see how this would influence things. Certainly the compilers have optimizations for working with primitive types. This is as deep as I want to go with this, anyway.
Comments
I've updated the post with the information I could find and your corrections. I've used your name as well. Yes, it's an interesting topic, although in my entire career I didn't actually need memory compression or to care about alignment in memory. I might, though.
Siderite
Hmm, I'm not sure if it's a bug in Marshal.SizeOf. This: https://stackoverflow.com/a/49459858/1169354 says that sizeof() returns how much memory is needed in managed code in order to allocate the memory, while Marshal.SizeOf() returns how much memory is needed for non-managed code. Even though I'm not entirely sure of the difference, I would have to look into it more. This is a very interesting discussion, thanks both for that and also for your blog post!
Filip B. Vondrášek
https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/keywords/value-types declares decimals as different from structs, even if there is a Decimal struct and I am sure they have their internals identical. In my tests described in the blog post Marshal.SizeOf returns different sizes from sizeof() and actual calculation of size for decimals, but the result is consistent with any other "normal" struct. The custom size calculation is actually measuring used memory, so it makes sense that the 24 result from Marshal.SizeOf is the one that is wrong. This agrees with you, in the sense that Guid and Decimal/decimal have a 4 byte alignment, but it suggests a bug in Marshal.SizeOf, at least. More info here: https://blog.tedd.no/2018/03/18/sizeof-vs-marshal-sizeof/ where it shows that SizeOfHelper is an internal call, which is unlikely to ever get changed. I will have to read more on decimals. I always knew that they were kind of slow and inconsistent compared to other primitive types, but never cared enough to look for why.
Siderite
1) Structs *are* value types. 2) I am not sure how Guid behaves, but the implementation (https://referencesource.microsoft.com/#mscorlib/system/guid.cs) is 1 int, 2 shorts and 8 bytes. So it should be aligned to 4 bytes. Also, here is the implementation of decimal: https://referencesource.microsoft.com/#mscorlib/system/decimal.cs, you can clearly see it contains 4 ints.
Filip B. Vondrášek
I would have agreed with you if 1) decimal was not a value type 2) Guid (or any other struct) would have behaved in the same way. I still think there are some weird inconsistencies there. And yes, structs are being aligned recursively, I will have to check the post to see if I expressed this clearly. Thanks!
Siderite
Hello Siderite, you are actually right. I was just checking it myself in code. I was confused by how it handles structs in structs. :) Which is also what we're seeing with decimals! Decimal is a struct containing 4 ints. Even though decimal takes 16 bytes, the largest element in decimal itself is only 4 bytes. So, in fact, C# adds padding not for the largest element in the struct, but if it also contains a struct, it recursively gets the largest element in that. Hopefully I wrote it in a readable way.
Filip B. Vondrášek
Thanks for the comment. I've updated the post to prove you are wrong... but also that what I wrote was not exactly true in all cases. Have you any idea why decimals work differently than other types?
Siderite
[This comment was proven wrong, read the following discussion] In C# the most bytes it can add as padding is 4 bytes, not the size of the largest element.
Filip B. Vondrášek