EP75 Unions

Unions, a low-level feature inherited from the C programming language, allow more than one member to share the same memory area.

Unions are very similar to structs with the following main differences:

  • Unions are defined by the union keyword.
  • The members of a union are not independent; they share the same memory area.

Just like structs, unions can have member functions as well.
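
As a minimal sketch of a union with a member function, consider the following. The Word type and its zeroByteCount() function are hypothetical names chosen only for illustration. The test value is picked so that the result does not depend on byte order:

```d
import std.stdio;

// Hypothetical union used only for illustration
union Word {
    uint value;
    ubyte[4] bytes;

    // A member function, defined just as it would be for a struct.
    // Counts how many of the four bytes are zero.
    size_t zeroByteCount() const {
        size_t count = 0;

        foreach (b; bytes) {
            if (b == 0) {
                ++count;
            }
        }

        return count;
    }
}

void main() {
    auto w = Word(0x00FF00FF);    // initializes the first member, value
    writeln(w.zeroByteCount());   // prints 2 on either byte order
}
```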

The examples below produce different results depending on whether they are compiled in a 32-bit or a 64-bit environment. To avoid confusing results, please use the -m32 compiler switch when compiling the examples in this chapter. Otherwise, your results may differ from mine due to alignment, which we will see in a later chapter.

Naturally, struct objects are as large as necessary to accommodate all of their members:

// Note: Please compile with the -m32 compiler switch
struct S {
    int i;
    double d;
}

// ...

    writeln(S.sizeof);

Since int is 4 bytes long and double is 8 bytes long, the size of that struct is the sum of their sizes:

12

In contrast, the size of a union with the same members is only as large as its largest member:

union U {
    int i;
    double d;
}

// ...

    writeln(U.sizeof);

The 4-byte int and the 8-byte double share the same area. As a result, the size of the entire union is the same as its largest member:

8
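
The sharing can also be checked directly. The following sketch assumes the same U type as above and uses the .offsetof property and the addresses of the members to confirm that both members start at the beginning of the union:

```d
import std.stdio;

union U {
    int i;
    double d;
}

void main() {
    U u;

    // Every member starts at offset 0 of the union
    static assert(U.i.offsetof == 0);
    static assert(U.d.offsetof == 0);

    // The union is exactly as large as its largest member
    static assert(U.sizeof == double.sizeof);

    // Both members live at the same address
    assert(cast(void*)&u.i == cast(void*)&u.d);

    writeln(U.sizeof);    // prints 8
}
```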

Unions are not a memory-saving feature. It is impossible to fit multiple values into the same memory location at the same time. The purpose of a union is to use the same area for different types of data at different times. Only one of the members can be used reliably at any one time. However, although doing so may not be portable across platforms, union members can be used for accessing fragments of other members.

One of the examples below takes advantage of typeid to disallow access to members other than the one that is currently valid.

The following diagram shows how the 8 bytes of the union above are shared by its members:

       0      1      2      3      4      5      6      7
───┬──────┬──────┬──────┬──────┬──────┬──────┬──────┬──────┬───
   │<───  4 bytes for int  ───>                            │
   │<───────────────  8 bytes for double  ────────────────>│
───┴──────┴──────┴──────┴──────┴──────┴──────┴──────┴──────┴───

Either all of the 8 bytes are used for the double member, or only the first 4 bytes are used for the int member and the other 4 bytes are unused.

Unions can have as many members as needed. All of the members would share the same memory location.

The fact that the same memory location is used for all of the members can have surprising effects. For example, let's initialize a union object by its int member and then access its double member:

auto u = U(42);    // initializing the int member
writeln(u.d);      // accessing the double member

Initializing the int member by the value 42 sets just the first 4 bytes, and this affects the double member in an unpredictable way:

2.07508e-322

Depending on the endianness of the microprocessor, the 4 bytes may be arranged in memory as 0|0|0|42, 42|0|0|0, or in some other order. For that reason, the value of the double member may appear differently on different platforms.
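
To see that the members really are different views of the same bits, here is a small sketch with a hypothetical DoubleBits union whose two members have the same size. It assumes that integers and floating point values use the same byte order on the platform, in which case the result below does not depend on endianness:

```d
import std.stdio;

// Hypothetical union for illustration: both members are 8 bytes long
union DoubleBits {
    double d;
    ulong bits;
}

void main() {
    DoubleBits u;
    u.d = 1.0;

    // The ulong member exposes the IEEE 754 bit pattern of 1.0
    writefln("0x%016X", u.bits);    // prints 0x3FF0000000000000
}
```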

75.1 Anonymous unions

Anonymous unions specify which members of a user-defined type share the same area:

struct S {
    int first;

    union {
        int second;
        int third;
    }
}

// ...

    writeln(S.sizeof);

The last two members of S share the same area. So, the size of the struct is a total of two ints: 4 bytes needed for first and another 4 bytes to be shared by second and third:

8
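
A short sketch, reusing the S type above, of what that sharing means in practice: a value written through second is read back through third, while first is unaffected.

```d
import std.stdio;

struct S {
    int first;

    union {
        int second;
        int third;
    }
}

void main() {
    S s;
    s.first = 1;
    s.second = 7;

    // second and third occupy the same 4 bytes
    static assert(S.second.offsetof == S.third.offsetof);

    writeln(s.third);    // prints 7
    writeln(s.first);    // prints 1
}
```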

75.2 Dissecting other members

Unions can be used for accessing individual bytes of variables of other types. For example, they make it easy to access the 4 bytes of an IPv4 address individually.

The 32-bit value of the IPv4 address and a fixed-length array can be defined as the two members of a union:

union IpAddress {
    uint value;
    ubyte[4] bytes;
}

The members of that union would share the same memory area as in the following figure:

        0          1          2          3
───┬──────────┬──────────┬──────────┬──────────┬───
   │ <────  32 bits of the IPv4 address  ────> │
   │ bytes[0] │ bytes[1] │ bytes[2] │ bytes[3] │
───┴──────────┴──────────┴──────────┴──────────┴───

For example, when an object of this union is initialized by 0xc0a80102 (the value that corresponds to the dotted form 192.168.1.2), the elements of the bytes array would automatically have the values of the four octets:

import std.stdio;

void main() {
    auto address = IpAddress(0xc0a80102);
    writeln(address.bytes);
}

When run on a little-endian system, the octets would appear in reverse of their dotted form:

[2, 1, 168, 192]

The reversed octets are another example of how accessing different members of a union may produce platform-dependent results. This is because the behavior of a union is guaranteed only if that union is used through just one of its members. There are no guarantees on the values of the members other than the one that the union has been initialized with.

Although it is not directly related to this chapter, bswap from the core.bitop module is useful in dealing with endianness issues: it returns its parameter after swapping its bytes. Combined with the endian value from the std.system module, it allows the octets of the previous IPv4 address to be printed in the expected order:

import std.system;
import core.bitop;

// ...

    // Swap the bytes only where the least significant byte is stored first
    if (endian == Endian.littleEndian) {
        address.value = bswap(address.value);
    }

    writeln(address.bytes);

The output:

[192, 168, 1, 2]
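
When the goal is only to obtain the octets, a union is not strictly needed. As a sketch of an alternative that does not depend on byte order, the hypothetical octetsOf() helper below extracts the octets with shifts; nativeToBigEndian from the std.bitmanip module should give the same result:

```d
import std.bitmanip : nativeToBigEndian;
import std.stdio;

// Hypothetical helper: most significant octet first, on any platform
ubyte[4] octetsOf(uint value) {
    return [
        cast(ubyte)(value >> 24),
        cast(ubyte)(value >> 16),
        cast(ubyte)(value >> 8),
        cast(ubyte)value,
    ];
}

void main() {
    uint value = 0xc0a80102;

    writeln(octetsOf(value));             // prints [192, 168, 1, 2]
    writeln(nativeToBigEndian(value));    // prints [192, 168, 1, 2]
}
```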

Please take the IpAddress type as a simple example; in general, it would be better to consider a dedicated networking module for non-trivial programs.