x86-64: Everything You Should Know

x86-64 (also called x86_64, x64, or AMD64) is the 64-bit CPU architecture used in Intel and AMD processors. It is an extension of the 32-bit x86 (i386) architecture and is unrelated to the ARM family of architectures.
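Because the same architecture goes by several names, tools often need to treat them as equivalent. The following is a minimal sketch of that idea; the helper name is_x86_64 and the alias set are illustrative assumptions, not part of any standard API. Only platform.machine() comes from the Python standard library.

```python
import platform

# Names commonly used for the same 64-bit x86 architecture.
# This alias set is an assumption for illustration, not an official list.
X86_64_ALIASES = {"x86_64", "x86-64", "x64", "amd64"}

def is_x86_64(machine=None):
    """Return True if the given (or current) machine name denotes x86-64."""
    name = (machine if machine is not None else platform.machine()).lower()
    return name in X86_64_ALIASES

# Linux typically reports "x86_64", while Windows reports "AMD64".
print(platform.machine(), is_x86_64())
```

Passing a name explicitly, as in is_x86_64("AMD64"), checks a string taken from elsewhere, such as a package file name.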

A 64-bit version of the x86 instruction set was first announced in 1999. It introduced two new modes of operation: 64-bit mode and compatibility mode, along with a new 4-level paging mode.

The x86-64 architecture is used in most CPUs for home computers and servers in use today. x86-64 follows the complex instruction set computing (CISC) approach.

CISC seeks to have many different instructions and to perform complex procedures in only a single instruction.

x86-64 also expands the general-purpose registers to 64 bits, increases their number from 8 (some of which had limited or fixed functionality, e.g., for stack management) to 16 (fully general), and provides numerous other enhancements.

Floating-point arithmetic is supported via mandatory SSE2-like instructions, and x87/MMX style registers are generally not used (but still available even in 64-bit mode); instead, a set of 16 vector registers, 128 bits each, is used.

(Each register can store one or two double-precision numbers, one to four single-precision numbers, or various integer formats.)
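To make the register layouts concrete, the sketch below treats 16 raw bytes as a stand-in for one 128-bit vector register and reinterprets the same bits in several formats. It only models the data layout with Python's struct module; it does not execute any SSE instructions, and the helper names are illustrative assumptions.

```python
import struct

XMM_BYTES = 16  # a 128-bit register holds 16 bytes

def pack_doubles(a, b):
    """Pack two double-precision values into 16 little-endian bytes."""
    return struct.pack("<2d", a, b)

def view_as(fmt, raw):
    """Reinterpret 16 raw bytes using a struct format such as '<4f'."""
    if struct.calcsize(fmt) != XMM_BYTES or len(raw) != XMM_BYTES:
        raise ValueError("format and data must both be exactly 16 bytes")
    return struct.unpack(fmt, raw)

raw = pack_doubles(1.0, 2.0)
as_doubles = view_as("<2d", raw)   # two 64-bit floats
as_floats = view_as("<4f", raw)    # the same bits as four 32-bit floats
as_int32 = view_as("<4i", raw)     # the same bits as four 32-bit integers
as_bytes = view_as("<16B", raw)    # the same bits as sixteen 8-bit integers

print(as_doubles, as_floats, as_int32)
```

The point of the example is that the register itself has no type: the instruction that operates on it decides whether the 128 bits are read as two doubles, four singles, or packed integers.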

In 64-bit mode, instructions are modified to support 64-bit operands and 64-bit addressing.

A Brief History of x86-64

AMD developed the x86-64 architecture design and instruction set in 1999. The first commercially available CPU capable of using it was released in 2003. It was designed as a simple 64-bit extension to the existing x86 instruction set.

This lets it maintain full compatibility with existing operating systems and software without any changes or performance impact. Since this version was developed by AMD, it is called amd64 in some technical sources.

The compatibility mode defined in the architecture allows 16-bit and 32-bit user applications to run unmodified, coexisting with 64-bit applications if the 64-bit operating system supports them.

As the full 16-bit and 32-bit x86 instruction sets remain implemented in hardware without any intervening emulation, these older executables can run with little or no performance penalty, while newer or modified applications can take advantage of new features of the processor design to achieve performance improvements.

Also, a processor supporting x86-64 still powers on in real mode for full backward compatibility with the 8086, as x86 processors supporting protected mode have done since the 80286.

How x86-64 works

x86-64 is a 64-bit architecture. This means that the CPU’s general-purpose registers, memory addresses, and integer operands (numbers to be worked with) can all be 64 bits wide. The instructions themselves are variable in length rather than a fixed 64 bits.

A 64-bit CPU could theoretically address up to 16 exabytes of memory. Since this is far more than any current system needs, implementations limit physical memory addresses to fewer bits; 48 bits, for example, allows 256 terabytes of RAM.

The system’s virtual memory is similarly limited to 48 bits in many implementations. Newer processors (Intel Ice Lake or newer, for example) support 5-level paging, which raises this to 57 bits, or 128 petabytes.
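The sizes quoted above follow directly from powers of two. As a quick check, this short sketch computes the address space for several address widths; the function names are illustrative, and the results use binary units (1 TiB = 2^40 bytes), which is what figures such as "256 terabytes" mean in this context.

```python
UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]

def address_space_bytes(bits):
    """Number of distinct byte addresses reachable with the given width."""
    return 1 << bits

def human(n):
    """Format a power-of-two byte count using binary units."""
    i = 0
    while n >= 1024 and i < len(UNITS) - 1:
        n //= 1024
        i += 1
    return f"{n} {UNITS[i]}"

for bits in (32, 48, 57, 64):
    print(f"{bits}-bit addresses: {human(address_space_bytes(bits))}")
# 32-bit addresses: 4 GiB
# 48-bit addresses: 256 TiB
# 57-bit addresses: 128 PiB
# 64-bit addresses: 16 EiB
```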

To keep addresses well defined within this limited address space, the unused most significant 16 bits of a 64-bit address must all be copies of bit 47; that is, bits 47 through 63 must be identical.

This results in two chunks of usable memory space. The lower half goes from 0000000000000000 to 00007FFFFFFFFFFF and the upper half goes from FFFF800000000000 to FFFFFFFFFFFFFFFF.

This is known as canonical addressing. Many operating systems reserve the higher half for protected system or kernel memory and the lower half for user or program use.
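The canonical-address rule can be expressed as a small bit test. The sketch below is an illustration, not operating-system code; the function name and the va_bits parameter (48 for 4-level paging, 57 for 5-level paging) are assumptions chosen for the example.

```python
def is_canonical(addr, va_bits=48):
    """Return True if addr is a canonical 64-bit virtual address.

    An address is canonical when bits (va_bits - 1) through 63 are all
    equal, i.e. the upper bits are a sign extension of the top used bit.
    """
    if not 0 <= addr < (1 << 64):
        raise ValueError("address must fit in 64 bits")
    upper = addr >> (va_bits - 1)                 # bits 63..(va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1      # the same field, all set
    return upper == 0 or upper == all_ones

# Boundaries of the two usable halves with 48-bit virtual addresses.
print(is_canonical(0x00007FFFFFFFFFFF))  # True: top of the lower half
print(is_canonical(0x0000800000000000))  # False: inside the unusable gap
print(is_canonical(0xFFFF800000000000))  # True: bottom of the upper half
```

With va_bits=57 the same test describes the larger halves available under 5-level paging.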

Some Architectural Features of x86-64

1. 64-bit integer capability

All general-purpose registers (GPRs) are expanded from 32 bits to 64 bits, and all arithmetic and logical operations, memory-to-register and register-to-memory operations, etc., can operate directly on 64-bit integers. Pushes and pops on the stack default to 8-byte strides, and pointers are 8 bytes wide.
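As a rough model of what 64-bit registers and 8-byte stack strides mean, the sketch below emulates them with Python integers. It is an illustration only: the names add64, to_signed64, and push are hypothetical helpers, and a dictionary stands in for stack memory.

```python
MASK64 = (1 << 64) - 1  # all 64 bits set

def add64(a, b):
    """Add as a 64-bit register would: the result wraps modulo 2**64."""
    return (a + b) & MASK64

def to_signed64(x):
    """Interpret a 64-bit pattern as a two's-complement signed integer."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x

def push(rsp, stack, value):
    """Model a 64-bit push: move the stack pointer down 8 bytes, then store."""
    rsp = (rsp - 8) & MASK64
    stack[rsp] = value & MASK64
    return rsp

stack = {}
rsp = 0x00007FFFFFFFE000          # an arbitrary example stack pointer
rsp = push(rsp, stack, 42)
print(hex(rsp))                    # 0x7fffffffdff8, i.e. 8 bytes lower
print(add64(MASK64, 1))            # 0: the addition wrapped around
print(to_signed64(MASK64))         # -1
```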

2. SSE instructions

The original AMD64 architecture adopted Intel’s SSE and SSE2 as core instructions. These instruction sets provide a vector supplement to the scalar x87 FPU, for the single-precision and double-precision data types. SSE2 also offers integer vector operations, for data types ranging from 8-bit to 64-bit precision.

This puts the vector capabilities of the architecture on par with those of the most advanced x86 processors of their time.

These instructions can also be used in 32-bit mode. The spread of 64-bit processors has made these vector capabilities ubiquitous in home computers, which lets 32-bit applications rely on them as a baseline.

The 32-bit edition of Windows 8, for example, requires the presence of SSE2 instructions. SSE3 instructions and later Streaming SIMD Extensions instruction sets are not standard features of the architecture.

3. Additional registers

In addition to increasing the size of the general-purpose registers, the number of named general-purpose registers is increased from eight (i.e., eax, ecx, edx, ebx, esp, ebp, esi, edi) in x86 to 16 (i.e., rax, rcx, rdx, rbx, rsp, rbp, rsi, rdi, r8, r9, r10, r11, r12, r13, r14, r15).

It is, therefore, possible to keep more local variables in registers rather than on the stack, and to let registers hold frequently accessed constants; arguments for small and fast subroutines may also be passed in registers to a greater extent.

AMD64 still has fewer registers than many RISC instruction sets (e.g. PA-RISC, Power ISA, and MIPS have 32 GPRs; Alpha, 64-bit ARM, and SPARC have 31) or VLIW-like machines such as the IA-64 (which has 128 registers).

However, an AMD64 implementation may have far more internal registers than the number of architectural registers exposed by the instruction set. For example, AMD Zen cores have 168 64-bit integer and 160 128-bit vector floating-point physical internal registers.

4. Larger virtual address space

The AMD64 architecture defines a 64-bit virtual address format, of which the low-order 48 bits are used in current implementations.

This allows up to 256 TB (2^48 bytes) of virtual address space. The architecture definition allows this limit to be raised in future implementations to the full 64 bits, extending the virtual address space to 16 EB (2^64 bytes). This is compared to just 4 GB (2^32 bytes) for the x86.

This means that very large files can be operated on by mapping the entire file into the process’s address space (which is often much faster than working with file read/write calls), rather than having to map regions of the file into and out of the address space.
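The file-mapping approach described above can be sketched with Python's standard mmap module. This is a minimal illustration under stated assumptions: count_byte_mmap is a hypothetical helper, and the tiny temporary file stands in for a very large one.

```python
import mmap
import os
import tempfile

def count_byte_mmap(path, byte):
    """Count occurrences of a byte string by mapping the whole file.

    Length 0 asks mmap to map the entire file into the address space.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            count = 0
            pos = m.find(byte)
            while pos != -1:
                count += 1
                pos = m.find(byte, pos + 1)
            return count

# Demonstration on a small temporary file (mapping an empty file fails,
# so the file must contain at least one byte).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"first line\nsecond line\nthird line\n")
    demo_path = tmp.name

newlines = count_byte_mmap(demo_path, b"\n")
os.remove(demo_path)
print(newlines)  # 3
```

On a 32-bit system the same call would fail for files larger than the available address space, which is why such programs had to map a file window by window.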

5. Additional XMM (SSE) registers

Similarly, the number of 128-bit XMM registers (used for streaming SIMD instructions) has also increased from 8 to 16.

The traditional x87 FPU register stack was not extended in 64-bit mode, unlike the XMM registers used by SSE2.

The x87 register stack is not a simple register file, although it does allow direct access to individual registers through low-cost exchange operations.

Pros of ARM processors

1. Low Costs

ARM is a separate, RISC-based architecture family that is often compared with x86-64. ARM processors are affordable to produce, which often makes them ideal for lower-cost devices, such as mobile phones.

2. Better Battery Life

Additionally, ARM processors tend to consume less power, partly because many instructions in their reduced instruction set are designed to execute in a single cycle, so ARM-based devices often have better battery life. This is ideal for mobile devices, which are frequently used without a power connection.

3. Simple Design

Because of their RISC design, which has a less complex architecture, ARM processors are simpler and often much more compact. This allows them to be used in smaller devices, which suits the growing consumer demand for handheld and portable devices.

4. Lower Heat Generation

ARM processors also generate less heat, allowing devices like smartphones or tablets to be thinner and be constantly held by the user.

5. Low Power Requirements

ARM processors also operate with low power requirements and tend to consume less power than other processors because of their RISC design. This is partly because many of their instructions are designed to execute in a single cycle.

Cons of ARM Processors

ARM processors cannot natively run programs compiled for x86, such as x86 builds of Windows and its applications. Speeds are also limited on some processors.

The simpler instruction set may be inadequate for some heavier workloads, and some ARM processors have limited computational capacity. Performance also depends heavily on how well the software is written, which often requires highly skilled programmers.


Conclusion

In summary, the x86-64 architecture has been widely adopted for desktop and laptop personal computers and for servers, which are commonly configured with 16 GB of memory or more.

It has effectively replaced both the 32-bit x86 and the discontinued Intel Itanium architecture (formerly IA-64), which was originally intended to replace the x86 architecture. x86-64 and Itanium are not compatible on the native instruction set level, and operating systems and applications compiled for one architecture cannot be run on the other natively.