Power to the People -
the OpenJDK PowerPC/AIX Port

Steve Pool (IBM), Götz Lindenmaier & Volker Simonis (SAP)

Current Status

The C++ Interpreter and the C2 "Server" compiler have been ported
Successfully passed the Java SE 7 TCK on Linux and AIX in 64-bit mode
JEP for integration of the port into the JDK 8 main branch submitted
- JEP 175: Integrate PowerPC/AIX Port into JDK 8
Nightly builds on porting platforms and Oracle platforms
Some testing (mainly jtreg regression tests and benchmarks)

40 mailing list subscribers

Build Results

http://cr.openjdk.java.net/~simonis/ppc-aix-port/

Performance - JVM98

Performance - JVM2008

Our background - the SAP JVM

The SAP JVM supports Java 1.4, 5, 6, 7 and runs on 15 platforms:

Linux on x86, x86_64, IA64, PPC64, zSeries; Windows on x86, x86_64, IA64;
Solaris on x86_64, SPARC; HPUX on IA64, PARISC; AIX and AS400 on PPC64; MacOS X on x86_64

...and we support every SAP JVM version until the end of days :)

The SAP JVM is derived from the Sun/Oracle code base:

with custom ports to the platforms not supported by Oracle
with enhancements mainly in the supportability area
with special addons (e.g. SAP JVM Profiler)

We constantly integrate Oracle changes:

leading to an increasing code divergence between our version and Oracle's
initially we only got "source drops" from Sun/Oracle:
- i.e. the source code of every released Java version and every JDK update
after more than 7 years, merging has become a nightmare

The OpenJDK Project and how we joined

Announced at JavaOne 2006

open source implementation of Java SE
licensed under GPLv2 (with Classpath exception)

SAP can't use OpenJDK directly:

because its customers expect a commercially licensed JDK
because it also has to deliver JDKs 1.4 and 5

It took 5 years until SAP "officially" joined the OpenJDK project:

convincing SAP executives/developers to join an open source project was not easy
Oracle's Sun acquisition was not helpful either :)
we had to ensure that we get contributed code back under our commercial license

Today, the OpenJDK is a playground and collaboration space for different implementers:

IBM, RedHat, Apple, Twitter, Azul, SAP, ..

The OpenJDK Source Tree

The OpenJDK consists of two major building blocks:

the HotSpot Virtual Machine (~1.600 files, ~340.000 loc)

coded in C++ and Assembler

well organized by operating system and architecture

hotspot/src/cpu       hotspot/src/cpu/ppc      hotspot/src/os/aix        hotspot/src/os_cpu/aix_ppc
hotspot/src/os        hotspot/src/cpu/sparc    hotspot/src/os/bsd        hotspot/src/os_cpu/bsd_x86
hotspot/src/os_cpu    hotspot/src/cpu/x86      hotspot/src/os/linux      hotspot/src/os_cpu/bsd_zero
hotspot/src/share     hotspot/src/cpu/zero     hotspot/src/os/posix      hotspot/src/os_cpu/linux_ppc
                                               hotspot/src/os/solaris    hotspot/src/os_cpu/linux_sparc
                                               hotspot/src/os/windows    hotspot/src/os_cpu/linux_x86
                                                                         hotspot/src/os_cpu/linux_zero
                                                                         hotspot/src/os_cpu/solaris_sparc
                                                                         hotspot/src/os_cpu/solaris_x86
                                                                         hotspot/src/os_cpu/windows_x86

the Java class library (~1.1000 files, ~235.000 loc)
- coded in Java with a considerable amount of native parts (C and C++)
- the native parts are only divided into a *nix flavor and a Windows flavor
```
jdk/src/solaris
jdk/src/windows
```
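To make the per-OS/per-CPU layout of the HotSpot tree concrete, here is a small C++ sketch that lists which source directories would be combined for one platform. The function name is hypothetical and the real make-based build does considerably more; the sketch also assumes that every non-Windows port additionally pulls in `os/posix`.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of the HotSpot build system): list the
// HotSpot source directories that contribute to a build for one
// operating system / CPU combination, following the layout above.
std::vector<std::string> hotspotSourceDirs(const std::string& os,
                                           const std::string& cpu) {
  const std::string root = "hotspot/src/";
  std::vector<std::string> dirs;
  dirs.push_back(root + "share");                     // generic code, all platforms
  if (os != "windows") {
    dirs.push_back(root + "os/posix");                // assumed: shared by Unix-like ports
  }
  dirs.push_back(root + "os/" + os);                  // OS-specific code
  dirs.push_back(root + "cpu/" + cpu);                // architecture-specific code
  dirs.push_back(root + "os_cpu/" + os + "_" + cpu);  // OS/CPU combination
  return dirs;
}
```

For `hotspotSourceDirs("aix", "ppc")` the last entry is `hotspot/src/os_cpu/aix_ppc`, i.e. exactly the two new directories (`os/aix`, `os_cpu/aix_ppc`) plus the shared `cpu/ppc` are what an AIX/PPC port adds.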

The HotSpot VM

The HotSpot VM first appeared in 2000 with Java 1.3 and has been evolving constantly ever since:

mostly architecture dependent parts:

Bytecode Interpreters
- Template Interpreter
- C++ Interpreter
JIT Compilers
- C1 aka "Client compiler"
- C2 aka "Server compiler"

mostly OS dependent parts:

Runtime system
- Memory handling (VM Heap, Java Heap, CodeCache)
- Process/Thread/Signal handling

mostly generic parts:

Garbage collectors
Class loader/verifiers

Porting the HotSpot VM - Effort

Taking the Linux/x86_64 version as reference implementation:

hotspot/src/share            (~1100 files, ~100.000 loc)

hotspot/src/os/linux         (  ~25 files,   ~9.000 loc)
hotspot/src/os_cpu/linux_x86 (  ~20 files,   ~3.500 loc)
hotspot/src/cpu/x86          ( ~100 files,  ~90.000 loc)

these numbers include both interpreters and both JIT compilers

We started with an interpreter-only version (using the C++ Interpreter on Linux):
```
hotspot/src/os_cpu/linux_ppc (  ~10 files,   ~1.500 loc)
hotspot/src/cpu/ppc          (  ~60 files,  ~22.000 loc)
```
- successfully runs JVM98 and can be used as bootstrap JDK

We are currently working on:
the C2 JIT compiler:
```
hotspot/src/os_cpu/linux_ppc (+  ~6 files,+    ~400 loc)
hotspot/src/cpu/ppc          (+ ~20 files,+ ~25.000 loc)
```

the AIX port:
```
hotspot/src/os/aix           (  ~30 files,+ ~14.000 loc)
hotspot/src/os_cpu/aix_ppc   (  ~15 files,+  ~2.000 loc)
```

we already have this code in the SAP JVM; we just have to bring it to the OpenJDK

The C++Interpreter

consists of a huge interpreter loop written in C++
and a so called "frame manager" written in Assembler

the "frame manager" is a frameless method which handles Java method invocations
this keeps the Java frames contiguous on the mixed Java/native stack
and we only have one activation of the C++ interpreter loop on top of the stack
see hotspot/src/cpu/ppc64/vm/cppInterpreter_ppc64.cpp (~3000 lines of assembler code)

```
+------------------+
|                  |  C++ interpreter loop
+------------------+
|xxxxxxxxxxxxxxxxxx|  java frame n
+------------------+
:       ....       :
+------------------+
|xxxxxxxxxxxxxxxxxx|  java frame 0
+------------------+
|//////////////////|  vm
|//////////////////|
```
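The following C++ toy model illustrates the frame manager idea: a Java call pushes a new frame onto an explicit frame stack and the same loop simply continues, so there is only one activation of the interpreter loop however deep the Java call chain gets. It is a hypothetical sketch with a made-up four-instruction bytecode, not HotSpot's C++ interpreter; in the real port the frames live on the native stack and are built by the assembler frame manager.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy bytecode (hypothetical): push constant, add, call method, return.
enum class Op { Push, Add, Call, Ret };

struct Insn {
  Op op;
  int64_t arg;  // constant for Push, callee method index for Call
};

using Method = std::vector<Insn>;

struct Frame {
  const Method* method;
  std::size_t pc;
  std::vector<int64_t> stack;  // operand stack of this frame
};

// One single interpreter loop; calls and returns only push/pop frames.
int64_t run(const std::vector<Method>& methods, std::size_t entry) {
  std::vector<Frame> frames;  // the contiguous "Java frames"
  frames.push_back({&methods[entry], 0, {}});
  while (true) {
    Frame& f = frames.back();
    const Insn& insn = (*f.method)[f.pc++];
    switch (insn.op) {
      case Op::Push:
        f.stack.push_back(insn.arg);
        break;
      case Op::Add: {
        int64_t b = f.stack.back(); f.stack.pop_back();
        int64_t a = f.stack.back(); f.stack.pop_back();
        f.stack.push_back(a + b);
        break;
      }
      case Op::Call:
        // "Invoke": push a callee frame instead of recursing in C++.
        // (f may be invalidated here, but it is not used afterwards.)
        frames.push_back({&methods[static_cast<std::size_t>(insn.arg)], 0, {}});
        break;
      case Op::Ret: {
        int64_t result = f.stack.empty() ? 0 : f.stack.back();
        frames.pop_back();
        if (frames.empty()) return result;       // returned from entry method
        frames.back().stack.push_back(result);   // hand result to the caller
        break;
      }
    }
  }
}
```

A method `{Push 1, Call 1, Add, Ret}` calling `{Push 2, Push 3, Add, Ret}` returns 6, and the C++ stack depth of `run` stays constant during the call.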

One big challenge when porting the C++ Interpreter is that you first have to implement
a Macro Assembler for your architecture!

the OpenJDK contains macro assemblers for x86 (~12.000 loc), SPARC (~8.000 loc) and ppc64 (~9.000 loc)
- see hotspot/src/cpu/<arch>/vm/assembler_<arch>.{hpp,inline.hpp,cpp}
the Macro Assembler can be reused for the JIT compilers
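As a rough illustration of what such a macro assembler provides, the sketch below encodes a few PPC D-form instructions and builds one "macro" instruction (loading a 32-bit constant with `lis` + `ori`) on top of them. The class and method names are hypothetical and only mirror the PPC mnemonics; they are not the API of HotSpot's `assembler_ppc` files.

```cpp
#include <cstdint>
#include <vector>

// Minimal sketch of a PPC assembler layer plus one "macro" instruction.
// Names are hypothetical; they only mirror PPC mnemonics.
class MiniPPCAssembler {
 public:
  // D-form: primary opcode | RT/RS | RA | 16-bit immediate
  void addi(int rt, int ra, int16_t simm) { emitD(14, rt, ra, static_cast<uint16_t>(simm)); }
  void addis(int rt, int ra, int16_t simm) { emitD(15, rt, ra, static_cast<uint16_t>(simm)); }
  // ori RA, RS, UI: the source register RS occupies the first register field.
  void ori(int ra, int rs, uint16_t uimm) { emitD(24, rs, ra, uimm); }

  // Extended mnemonics: with RA = 0, addi/addis use the literal value 0.
  void li(int rt, int16_t simm) { addi(rt, 0, simm); }
  void lis(int rt, int16_t simm) { addis(rt, 0, simm); }

  // "Macro" instruction: materialize a 32-bit constant in two instructions.
  // In 64-bit mode lis sign-extends, so values with bit 31 set end up
  // sign-extended to 64 bits.
  void load_const32(int rt, uint32_t value) {
    lis(rt, static_cast<int16_t>(value >> 16));
    ori(rt, rt, static_cast<uint16_t>(value & 0xFFFF));
  }

  const std::vector<uint32_t>& code() const { return code_; }

 private:
  void emitD(uint32_t opcode, int r1, int r2, uint16_t imm) {
    code_.push_back(opcode << 26 | static_cast<uint32_t>(r1) << 21 |
                    static_cast<uint32_t>(r2) << 16 | imm);
  }
  std::vector<uint32_t> code_;
};
```

`load_const32(3, 0x12345678)` emits `lis r3,0x1234` (0x3C601234) followed by `ori r3,r3,0x5678` (0x60635678). The JIT compilers can reuse exactly this kind of layer, which is why the macro assembler pays off twice.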

The C2 "Server" JIT Compiler

The C2 "Server" JIT Compiler is the biggest (and most complicated) part of the HotSpot VM.
It consists of three main parts:

the generic optimizer written in C++ (under src/share/vm/opto)
an "Architecture Definition Language" (ADL) and its compiler, written in C++ (under src/share/vm/adlc)

the "Architecture Definition" file written in ADL (under src/cpu/<arch>/vm/<arch>.ad)

hotspot/src/share/vm/opto         ( ~110 files, ~128.000 loc)
hotspot/src/share/vm/adlc         (  ~23 files,  ~26.000 loc)

hotspot/src/cpu/x86/vm/x86_32.ad  (              ~14.000 loc)
hotspot/src/cpu/x86/vm/x86_64.ad  (              ~13.000 loc)
hotspot/src/cpu/sparc/vm/sparc.ad (              ~10.000 loc)
hotspot/src/cpu/ia64/vm/ia64.ad   (              ~26.000 loc)
hotspot/src/cpu/ppc/vm/ppc_64.ad  (              ~14.000 loc)

For every new architecture, the corresponding AD file has to be written, which means:

defining the different registers
defining the different calling conventions
defining concrete "encodings" (i.e. assembler instructions) for the abstract optimizer nodes
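A toy version of what an "encoding" amounts to, written in C++ rather than ADL: abstract add/subtract nodes are mapped to concrete PPC XO-form instruction words. The enum and function names are hypothetical; real AD files express this through `instruct`, `match` and `ins_encode` blocks. The `subf` case shows the kind of operand detail an AD file has to get right.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical subset of abstract optimizer nodes.
enum class IdealOp { AddI, SubI };

// XO-form: opcode 31 | RT | RA | RB | extended opcode (OE = Rc = 0)
static uint32_t xoForm(uint32_t xo, int rt, int ra, int rb) {
  return 31u << 26 | static_cast<uint32_t>(rt) << 21 |
         static_cast<uint32_t>(ra) << 16 | static_cast<uint32_t>(rb) << 11 |
         xo << 1;
}

// Encode "dst = src1 op src2" for the given abstract operation.
uint32_t encode(IdealOp op, int dst, int src1, int src2) {
  switch (op) {
    case IdealOp::AddI:
      return xoForm(266, dst, src1, src2);  // add  dst, src1, src2
    case IdealOp::SubI:
      // PPC only has "subtract from": subf RT, RA, RB computes RB - RA,
      // so the operands are swapped to obtain src1 - src2.
      return xoForm(40, dst, src2, src1);   // subf dst, src2, src1
  }
  throw std::invalid_argument("unsupported ideal op");
}
```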

Changes in shared code

Basic changes:

make
includes of platform headers where needed in shared code
C++ syntax adaptations for the compilers on PPC (xlc)
Platform dependent macros etc. implemented in shared files

Changes in shared code

Adaptations and fixes of existing features in the C2 compiler

cast 32-bit integers to 64-bit for native calls
pass arguments in registers AND on the stack
constant table: adl does not support the keyword constanttablebase in calls
we load the inline cache (IC) and the call target from the constant table →
we misuse another input edge ("in") of call nodes and attach the constant table base node there

adl: specify that a node should be / cannot be rematerialized

```
instruct loadConP(iRegPdst dst, immP src) %{
  match(Set dst src);
  ins_cannot_rematerialize(true);
  format %{ "LD $dst, offset, $constanttablebase \t// load ptr $src from table, late expanded " %}
  lateExpand( lateExpand_load_ptr_constant(dst, src, constanttablebase) );
%}
```

adl: specify fields to be added to nodes

```
ins_attrib ins_field_cbuf_insts_offset(-1);

instruct exLoadConL(iRegLdst dst, immL src, iRegLdst toc, immI isoop) %{
  effect(DEF dst, USE src, USE toc, USE isoop);
  // Needed so that CallDynamicJavaDirect can compute the address of this
  // instruction for relocation.
  ins_field_cbuf_insts_offset(int);
  format %{ "LD $dst, offset, $toc \t// load long(isoop=$isoop) $src from TOC" %}
  ins_encode( ppc_enc_load_long_constL(dst, src, toc, isoop) );
%}
```

implicit null checks: consider whether the zero page is read protected
register allocation: fix problem with rematerialization
register allocation: fix yank_node() for ppc ir graph patterns
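The first two items in the list above concern native calls. The sketch below assumes the 64-bit PowerPC ELF ABI (v1) as we understand it: every argument gets a doubleword slot in the caller's parameter save area (starting at SP+48) even when it is also passed in a register, floating-point arguments use f1..f13 and skip the corresponding GPR, and 32-bit integers must be sign-extended to 64 bits by the caller. All names are hypothetical, and varargs/unprototyped calls (where floating-point values also go into GPRs) are ignored.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Assumed PPC64 ELFv1 conventions (simplified, no varargs):
//  - argument i owns the slot at SP + 48 + 8*i in the parameter save area;
//  - the first 8 doublewords go in r3..r10;
//  - floating-point arguments go in f1..f13 and skip their GPR.
enum class ArgKind { Int, Long, Double };

struct ArgLocation {
  std::string reg;  // register name, or empty if passed on the stack only
  int stackOffset;  // slot offset in the parameter save area (always reserved)
};

std::vector<ArgLocation> assignNativeArgs(const std::vector<ArgKind>& sig) {
  std::vector<ArgLocation> locs;
  int fprsUsed = 0;
  for (std::size_t i = 0; i < sig.size(); ++i) {
    ArgLocation loc{"", 48 + 8 * static_cast<int>(i)};
    if (sig[i] == ArgKind::Double) {
      if (fprsUsed < 13) loc.reg = "f" + std::to_string(1 + fprsUsed++);
    } else if (i < 8) {
      loc.reg = "r" + std::to_string(3 + i);
    }
    locs.push_back(loc);
  }
  return locs;
}

// A Java int occupies only the low 32 bits of a 64-bit register; before a
// native call it is sign-extended (extsw on PPC64) to a full 64-bit value.
int64_t widenIntForNativeCall(int32_t v) { return static_cast<int64_t>(v); }
```

For a native signature `(int, double, long)` this yields r3 / f1 / r5 with stack slots 48 / 56 / 64: the double consumes r4's position and every argument still has its stack slot, which is why C2 had to learn that an argument can live in a register AND on the stack.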

Changes in shared code

Extended and new features: Cpp-Interpreter

To use the Cpp-Interpreter in a fully fledged VM we implemented support for:

bytecode profiling (large)
weak memory ordering (sophisticated)
on stack replacement (OSR)
G1 garbage collection
method handles
compressed Oops
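Weak memory ordering is the part that differs most from x86. The publication pattern below uses C++11 `std::atomic` release/acquire operations, which correspond to the `release_store` / `load_acquire` semantics the interpreter and the C2 Load/Store nodes need. It is a generic sketch, not VM code: on x86 ordinary loads and stores already give these guarantees, whereas on PPC compilers typically have to emit barrier instructions such as `lwsync`.

```cpp
#include <atomic>
#include <thread>

// Data written by one thread and read by another after publication.
struct Payload {
  int value = 0;
};

Payload payload;
std::atomic<bool> published{false};

void producer() {
  payload.value = 42;                                  // plain store
  published.store(true, std::memory_order_release);   // "release_store"
}

int consumer() {
  while (!published.load(std::memory_order_acquire)) {  // "load_acquire"
    std::this_thread::yield();
  }
  // The acquire load synchronizes with the release store, so the plain
  // store to payload.value is guaranteed to be visible here.
  return payload.value;
}
```

Without the release/acquire pair, a weakly ordered CPU such as POWER may let the consumer see `published == true` but a stale `payload.value`; this is the class of bug behind the taskqueue and runtime ordering fixes listed later.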

Changes in shared code

Extended and new features: C2 compiler

extend Load and Store nodes to know about memory ordering
if requested and supported in the ad file, they issue load_acquire or release_store.
issue memory barriers as needed on PPC
trampolines: we use trampolines for calls/branches whose targets are sometimes close and sometimes far.
Relocations were extended to support this.
new phase lateExpand:

expand mach nodes representing several assembler instructions
after register allocation
to get nodes matching assembler instructions to ease scheduling
required for data flow edges the register allocator cannot deal with,
e.g. values that are not spillable


```
                                                      RegN
                                                ________|__________
          RegN                                 |                   |
            |                                  |  Decode_NN_shift  |
    ________|__________                        |___________________|
   |                   |                                |
   |     Decode_NN     |       lateExpand               |  this value is neither RegN nor RegP
   |___________________|     ==============>    ________|__________
            |                                  |                   |
            |                                  |   Decode_NN_add   |
          RegP                                 |___________________|
                                                        |
                                                      RegP
```
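The trampoline item above stems from the limited reach of PPC relative branches: `b`/`bl` encode a signed 24-bit word offset, i.e. roughly ±32 MB. The sketch below decides whether a call can be a direct `bl` or needs a trampoline (a stub that loads the full target address and branches via CTR), and encodes the direct case. Helper names are hypothetical, and the real port additionally has to cope with code that is patched or moved later, which is why the relocations had to be extended.

```cpp
#include <cstdint>

// Reach of a PPC relative branch: signed 26-bit byte displacement
// (24-bit LI field shifted left by 2), word aligned.
bool fitsInRelativeBranch(int64_t from, int64_t to) {
  int64_t disp = to - from;
  const int64_t kMin = -(int64_t{1} << 25);     // -32 MB
  const int64_t kMax = (int64_t{1} << 25) - 4;  // +32 MB - 4
  return (disp & 3) == 0 && disp >= kMin && disp <= kMax;
}

enum class CallKind { Direct, ViaTrampoline };

// Hypothetical decision helper for a call site.
CallKind chooseCallKind(int64_t callSite, int64_t target) {
  return fitsInRelativeBranch(callSite, target) ? CallKind::Direct
                                                : CallKind::ViaTrampoline;
}

// Encoding of "bl target" (opcode 18, AA = 0, LK = 1); only valid if
// fitsInRelativeBranch(from, to) holds.
uint32_t encodeBl(int64_t from, int64_t to) {
  uint32_t li = static_cast<uint32_t>(to - from) & 0x03FFFFFCu;
  return 18u << 26 | li | 1u;
}
```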

Changes in shared code

Extended and new features: runtime

safefetch: use generated code instead of inline assembly (more portable)
fix memory ordering of taskqueue used by GCs
around 15 other memory ordering fixes in runtime code

Optimizations not yet contributed

shorten branches & small constant pool (ppc only)
Common subexpression elimination phase after matching (shared)
round robin register allocation (shared)
code scheduler for Power6 (ppc only)
better heap placement for compressed Oops (shared)
more aggressive matching of Decode nodes in unscaled compressed oops mode (shared)
reduce memory barriers in card marking (shared)
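The two compressed-oops items above depend on where the Java heap is placed, because that determines how a narrow (32-bit) oop is decoded. The sketch assumes 8-byte object alignment (shift 3) and simplifies the heap-based mode by using the heap start as base; the names are hypothetical. `decode` only handles values known to be non-null, which is how we read the `_NN` suffix of the `Decode_NN` node in the lateExpand diagram.

```cpp
#include <cstdint>

// Compressed-oop modes as selected by heap placement (simplified):
//   unscaled:   heap ends below 4 GB   -> oop = narrow
//   zero-based: heap ends below 32 GB  -> oop = narrow << 3
//   heap-based: anywhere else          -> oop = base + (narrow << 3)
struct NarrowOopMode {
  uint64_t base;   // 0 in unscaled and zero-based mode
  unsigned shift;  // 0 in unscaled mode, 3 otherwise (8-byte alignment)
};

NarrowOopMode chooseMode(uint64_t heapEnd, uint64_t heapBase) {
  const uint64_t kFourGB = uint64_t{1} << 32;
  if (heapEnd <= kFourGB) return {0, 0};
  if (heapEnd <= (kFourGB << 3)) return {0, 3};
  return {heapBase, 3};  // simplification: base = heap start
}

// Decode of a non-null narrow oop: shift, then add the base. The
// intermediate value (narrow << shift) is neither a narrow oop nor a
// valid pointer, matching "Decode_NN_shift" / "Decode_NN_add".
uint64_t decode(uint32_t narrow, const NarrowOopMode& m) {
  uint64_t shifted = static_cast<uint64_t>(narrow) << m.shift;
  return m.base + shifted;
}
```

Placing the heap so that the unscaled or zero-based mode applies removes the add (and possibly the shift) from every decode, which is what makes better heap placement and more aggressive Decode matching pay off.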

Porting-Lessons learned

During the last year we've ported HotSpot to quite a few new platforms, and we learned that:

HotSpot has a very steep learning curve
there is not much documentation available
defining concrete "encodings" (i.e. assembler instructions) for the abstract optimizer nodes

Questions?

Calculating machine by Philipp Matthäus Hahn (1739-1790). Source: Württembergisches Landesmuseum Stuttgart
