Power to the People -
the OpenJDK PowerPC/AIX Port
Steve Pool (IBM), Götz Lindenmaier & Volker Simonis (SAP)
Current Status
- C++-Interpreter and C2-Server compiler have been ported
- Successfully passed the Java SE 7 TCK on Linux and AIX in 64-bit mode
- JEP for integration of the port into the JDK 8 main branch submitted
- Nightly builds on porting platforms and Oracle platforms
- Some testing (mainly jtreg regression tests and benchmarks)
- 40 mailing list subscribers
Our background - the SAP JVM
The SAP JVM supports Java 1.4, 5, 6, 7 and runs on 15 platforms:
- Linux on x86, x86_64, IA64, PPC64, zSeries; Windows on x86, x86_64, IA64;
Solaris on x86_64, SPARC; HPUX on IA64, PARISC; AIX and AS400 on PPC64; MacOS X on x86_64
..and we provide support for any SAP JVM version until the end of days:)
The SAP JVM is derived from the Sun/Oracle code base:
- with custom ports to the platforms not supported by Oracle
- with enhancements mainly in the supportability area
- with special addons (e.g. SAP JVM Profiler)
We constantly integrate Oracle changes:
- leading to an increasing code divergence between our and Oracle's version
- initially we only got "source drops" from Sun/Oracle:
- i.e. the source code of every released Java version and every JDK update
- after more than 7 years, merging has become a nightmare
The OpenJDK Project and how we joined
Announced at JavaOne 2006
- open source implementation of Java SE
- licensed under GPLv2 (with Classpath exception)
SAP can't use OpenJDK directly:
- because its customers expect a commercially licensed JDK
- because it also has to deliver JDKs 1.4 and 5
It took 5 years until SAP "officially" joined the OpenJDK project:
- convincing SAP executives/developers to join an open source project was not easy
- Oracle's Sun acquisition was not helpful either :)
- we had to ensure that we get contributed code back under our commercial license
Today, the OpenJDK is a playground and collaboration space for different implementers:
- IBM, RedHat, Apple, Twitter, Azul, SAP, ..
The OpenJDK Source Tree
The OpenJDK consists of two major building blocks
- the HotSpot Virtual Machine (~1.600 files, ~340.000 loc)
- the Java class library (~11.000 files, ~235.000 loc)
The HotSpot VM
The HotSpot VM first appeared in 2000 with Java 1.3 and has been evolving constantly ever since:
- mostly architecture dependent parts:
  - Bytecode Interpreters
    - Template Interpreter
    - C++ Interpreter
  - JIT Compilers
    - C1 aka "Client compiler"
    - C2 aka "Server compiler"
- mostly OS dependent parts:
  - Runtime system
  - Memory handling (VM Heap, Java Heap, CodeCache)
  - Process/Thread/Signal handling
- mostly generic parts:
  - Garbage collectors
  - Class loader/verifiers
Porting the HotSpot VM - Effort
- Taking the Linux/x86_64 version as reference implementation:
hotspot/src/share (~1100 files, ~100.000 loc)
hotspot/src/os/linux ( ~25 files, ~9.000 loc)
hotspot/src/os_cpu/linux_x86 ( ~20 files, ~3.500 loc)
hotspot/src/cpu/x86 ( ~100 files, ~90.000 loc)
- these numbers include both interpreters and both JIT compilers
- We are currently working on:
C2 JIT compiler:
hotspot/src/os_cpu/linux_ppc (+ ~6 files, + ~400 loc)
hotspot/src/cpu/ppc (+ ~20 files, + ~25.000 loc)
AIX port:
hotspot/src/os/aix ( ~30 files, + ~14.000 loc)
hotspot/src/os_cpu/aix_ppc ( ~15 files, + ~2.000 loc)
- we already have this code in the SAP JVM - just have to bring it to the OpenJDK
The C++Interpreter
- consists of a huge interpreter loop written in C++
- and a so-called "frame manager" written in Assembler
- the "frame manager" is a frameless method which handles Java method invocations
- this keeps the Java frames contiguous on the mixed Java/Native stack
- and we only have one activation of the C++ interpreter loop on top of the stack
- see hotspot/src/cpu/ppc64/vm/cppInterpreter_ppc64.cpp (~3000 lines of assembler code)
+------------------+
| | C++ interpreter loop
+------------------+
|xxxxxxxxxxxxxxxxxx| java frame n
+------------------+
: .... :
+------------------+
|xxxxxxxxxxxxxxxxxx| java frame 0
+------------------+
|//////////////////| vm
|//////////////////|
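To illustrate what "a huge interpreter loop written in C++" means, here is a minimal sketch of a switch-based dispatch loop. It is not HotSpot code: the opcode set (PUSH, ADD, MUL, HALT) and the function name interpret are invented for this example, and real JVM bytecodes, frames and the frame manager are left out entirely.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical toy opcodes for illustration; these are not JVM bytecodes.
enum Op : std::uint8_t { PUSH, ADD, MUL, HALT };

// A single dispatch loop: fetch the next opcode, switch on it, and update
// the operand stack until HALT is reached.
inline std::int64_t interpret(const std::vector<std::uint8_t>& code) {
    std::vector<std::int64_t> stack;
    std::size_t pc = 0;

    // Pops the top operand, failing loudly on malformed code.
    auto pop = [&stack]() {
        if (stack.empty()) throw std::runtime_error("operand stack underflow");
        std::int64_t v = stack.back();
        stack.pop_back();
        return v;
    };

    for (;;) {
        switch (code.at(pc++)) {  // at() throws if the code runs off the end
        case PUSH:  // operand: one signed byte following the opcode
            stack.push_back(static_cast<std::int8_t>(code.at(pc++)));
            break;
        case ADD: {
            std::int64_t b = pop();
            std::int64_t a = pop();
            stack.push_back(a + b);
            break;
        }
        case MUL: {
            std::int64_t b = pop();
            std::int64_t a = pop();
            stack.push_back(a * b);
            break;
        }
        case HALT:
            return pop();
        default:
            throw std::runtime_error("unknown opcode");
        }
    }
}
```

Calling interpret({PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT}) evaluates (2 + 3) * 4. The loop itself is plain portable C++, which is why most of the porting effort goes into the assembler parts around it.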
One big challenge when porting the C++Interpreter is that you first have to implement
a Macro Assembler for your architecture!
- the OpenJDK contains macro assemblers for x86 (~12.000 loc), SPARC (~8.000 loc) and ppc64 (~9.000 loc)
- see hotspot/src/cpu/<arch>/vm/assembler_<arch>.{hpp,inline.hpp,cpp}
- the Macro Assembler can be reused for the JIT compilers
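As a rough idea of what a macro assembler does, the sketch below encodes a handful of PowerPC instructions into 32-bit words and builds one "macro" instruction (load a 32-bit constant) out of them. The class MiniAssembler and its method names are hypothetical and far smaller than the real assembler_<arch> files; only the instruction encodings (addi, addis, ori, blr) follow the Power ISA.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical miniature assembler; not the HotSpot Assembler/MacroAssembler.
class MiniAssembler {
public:
    // addi rd, ra, simm (primary opcode 14). ra == 0 means the value 0, not r0.
    void addi(unsigned rd, unsigned ra, int simm) {
        emit(d_form(14, rd, ra, static_cast<std::uint32_t>(simm) & 0xFFFFu));
    }
    // addis rd, ra, simm (primary opcode 15): adds simm << 16.
    void addis(unsigned rd, unsigned ra, int simm) {
        emit(d_form(15, rd, ra, static_cast<std::uint32_t>(simm) & 0xFFFFu));
    }
    // ori ra, rs, uimm (primary opcode 24): the source register comes first
    // in the encoding.
    void ori(unsigned ra, unsigned rs, unsigned uimm) {
        emit(d_form(24, rs, ra, uimm & 0xFFFFu));
    }
    // blr: return to the address in the link register.
    void blr() { emit(0x4E800020u); }

    // "Macro" instruction: picks one or two real instructions depending on
    // whether the constant fits into a signed 16-bit immediate.
    void load_const32(unsigned rd, std::int32_t value) {
        if (value >= -32768 && value <= 32767) {
            addi(rd, 0, value);  // li rd, value
        } else {
            std::uint32_t u = static_cast<std::uint32_t>(value);
            addis(rd, 0, static_cast<int>(u >> 16));  // lis rd, high half
            ori(rd, rd, u & 0xFFFFu);                 // or in the low half
        }
    }

    const std::vector<std::uint32_t>& code() const { return code_; }

private:
    // D-form layout: opcode(6) | field1(5) | field2(5) | immediate(16).
    static std::uint32_t d_form(unsigned op, unsigned f1, unsigned f2,
                                std::uint32_t imm) {
        return (op << 26) | ((f1 & 31u) << 21) | ((f2 & 31u) << 16) | imm;
    }
    void emit(std::uint32_t insn) { code_.push_back(insn); }

    std::vector<std::uint32_t> code_;
};
```

Callers only ask for load_const32 and never care how many instructions it expands to, which is the property that makes the same assembler reusable from the interpreter and the JIT compilers.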
The C2 "Server" JIT Compiler
The C2 "Server" JIT Compiler is the biggest (and most complicated) part of the HotSpot VM.
It consists of three main parts:
- the generic optimizer written in C++ (under src/share/vm/opto)
- an "Architecture Definition Language" and Compiler written in C++ (under src/share/vm/adlc)
- the "Architecture Definition" file written in ADL (under src/cpu/<arch>/vm/<arch>.ad)
hotspot/src/share/vm/opto ( ~110 files, ~128.000 loc)
hotspot/src/share/vm/adlc ( ~23 files, ~26.000 loc)
hotspot/src/cpu/x86/vm/x86_32.ad ( ~14.000 loc)
hotspot/src/cpu/x86/vm/x86_64.ad ( ~13.000 loc)
hotspot/src/cpu/sparc/vm/sparc.ad ( ~10.000 loc)
hotspot/src/cpu/ia64/vm/ia64.ad ( ~26.000 loc)
hotspot/src/cpu/ppc/vm/ppc_64.ad ( ~14.000 loc)
For every new architecture, the corresponding AD file has to be written, which means:
- defining the different registers
- defining the different calling conventions
- defining concrete "encodings" (i.e. assembler instructions) for the abstract optimizer nodes
Changes in shared code
Basic changes:
- make
- includes of platform headers where needed in shared code
- C syntax adaptions for compilers on PPC (xlc)
- Platform dependent macros etc. implemented in shared files
Changes in shared code
Adaptions and fixes of existing features in the C2 compiler
- cast 32-bit integers to 64-bit for native calls
- pass arguments in register AND stack
- constant table: adl does not support the keyword constanttablebase in calls
we load the inline cache (IC) and the call target from the constant table →
we misuse another input of the call node and add the constant table base node there
- adl: specify that a node should / cannot be rematerialized
instruct loadConP(iRegPdst dst, immP src) %{
  match(Set dst src);
  ins_cannot_rematerialize(true);
  format %{ "LD $dst, offset, $constanttablebase \t// load ptr $src from table, late expanded " %}
  lateExpand( lateExpand_load_ptr_constant(dst, src, constanttablebase) );
%}
- adl: specify fields to be added to nodes
ins_attrib ins_field_cbuf_insts_offset(-1);
instruct exLoadConL(iRegLdst dst, immL src, iRegLdst toc, immI isoop) %{
  effect(DEF dst, USE src, USE toc, USE isoop);
  // Needed so that CallDynamicJavaDirect can compute the address of this
  // instruction for relocation.
  ins_field_cbuf_insts_offset(int);
  format %{ "LD $dst, offset, $toc \t// load long(isoop=$isoop) $src from TOC" %}
  ins_encode( ppc_enc_load_long_constL(dst, src, toc, isoop) );
%}
- implicit null checks: consider whether the zero page is read protected
- register allocation: fix problem with rematerialization
- register allocation: fix yank_node() for ppc ir graph patterns
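The first item in the list above (casting 32-bit integers to 64-bit for native calls) can be pictured with a small sketch. It models a 64-bit register as a uint64_t whose upper half may contain stale bits, and assumes the native callee expects the whole register to hold the sign-extended value. The function name widen_int_argument is made up for this illustration and does not appear in HotSpot.

```cpp
#include <cstdint>

// Hypothetical helper: takes the raw contents of a 64-bit register that
// carries a 32-bit int in its lower half and returns the register contents
// after sign extension to 64 bits.
inline std::uint64_t widen_int_argument(std::uint64_t reg) {
    // Keep only the low 32 bits and reinterpret them as a signed int.
    std::int32_t low = static_cast<std::int32_t>(static_cast<std::uint32_t>(reg));
    // Widening a signed 32-bit value to 64 bits replicates the sign bit.
    return static_cast<std::uint64_t>(static_cast<std::int64_t>(low));
}
```

With stale upper bits such as 0xDEADBEEF in front of a low half of 0xFFFFFFFF (the int -1), the widened register holds all ones, which is -1 as a 64-bit value.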
Changes in shared code
Extended and new features: Cpp-Interpreter
To use the Cpp-Interpreter in a fully fledged VM
we implemented support for:
- bytecode profiling (large)
- weak memory ordering (sophisticated)
- on stack replacement (OSR)
- G1 garbage collection
- method handles
- compressed Oops
Changes in shared code
Extended and new features: C2 compiler
- extend Load and Store nodes to know about memory ordering
If requested and supported in the ad file, they issue load_acquire or release_store.
- issue memory barriers as needed on PPC
- trampolines: we use trampolines for calls/branches whose targets are sometimes close,
sometimes far.
We extended relocations to support this.
- new phase lateExpand:
- expand mach nodes representing several assembler instructions
- after register allocation
- to get nodes matching assembler instructions to ease scheduling
- required for data flow edges that register allocation cannot deal with
- e.g. values not spillable
RegN
________|__________
RegN | |
| | Decode_NN_shift |
________|__________ |___________________|
| | |
| Decode_NN | lateExpand | this value is not RegN nor RegP
|___________________| ==============> ________|__________
| | |
| | Decode_NN_add |
RegP |___________________|
|
RegP
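The diagram can be read as splitting one decode operation into two separately schedulable steps. The sketch below assumes the usual compressed-oops scheme (pointer = heap base + (narrow value << shift)) and that "NN" means the value is known to be non-null, so no null check is modelled. HeapLayout, decode_shift and decode_add are hypothetical names chosen to mirror the node names in the diagram.

```cpp
#include <cstdint>

// Hypothetical description of the compressed-oops encoding in use.
struct HeapLayout {
    std::uint64_t base;   // start address of the Java heap
    unsigned      shift;  // log2 of the object alignment
};

// Step 1 (Decode_NN_shift): the result is no longer a narrow oop, but it is
// not yet a valid pointer either.
inline std::uint64_t decode_shift(std::uint32_t narrow, const HeapLayout& h) {
    return static_cast<std::uint64_t>(narrow) << h.shift;
}

// Step 2 (Decode_NN_add): adding the heap base yields the full pointer.
inline std::uint64_t decode_add(std::uint64_t shifted, const HeapLayout& h) {
    return shifted + h.base;
}

// The unexpanded node (Decode_NN) is the composition of both steps.
inline std::uint64_t decode_not_null(std::uint32_t narrow, const HeapLayout& h) {
    return decode_add(decode_shift(narrow, h), h);
}
```

The intermediate value returned by decode_shift is the one marked in the diagram as neither RegN nor RegP, which is why the expansion is done only after register allocation.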
Changes in shared code
Extended and new features: runtime
- safefetch: use generated code instead of inline assembly, which is more portable
- fix memory ordering of taskqueue used by GCs
- around 15 other memory ordering fixes in runtime code
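The kind of bug behind these memory ordering fixes can be sketched in portable C++ with std::atomic. This is an analogy only: HotSpot uses its own ordering primitives, and the Mailbox type and the publish/consume functions below are invented for this example. On a weakly ordered CPU such as PowerPC, the release store and acquire load are what guarantee that a reader who sees the flag also sees the payload.

```cpp
#include <atomic>
#include <thread>

// Hypothetical single-slot mailbox shared between two threads.
struct Mailbox {
    int               payload = 0;
    std::atomic<bool> ready{false};
};

// Writer side: the release store keeps the payload write from being
// reordered after the flag becomes visible.
inline void publish(Mailbox& m, int value) {
    m.payload = value;
    m.ready.store(true, std::memory_order_release);
}

// Reader side: the acquire load keeps the payload read from being
// reordered before the flag check.
inline int consume(Mailbox& m) {
    while (!m.ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();  // wait until the writer has published
    }
    return m.payload;
}
```

In real use publish and consume run on different threads. With relaxed ordering instead of release/acquire, the same code could appear to work on x86 and still return a stale payload on PowerPC.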
Optimizations not yet contributed
- shorten branches & small constant pool (ppc only)
- Common subexpression elimination phase after matching (shared)
- round robin register allocation (shared)
- code scheduler for Power6 (ppc only)
- better heap placement for compressed Oops (shared)
- more aggressive matching of Decode nodes in unscaled compressed oops mode (shared)
- reduce memory barriers in card marking (shared)
Porting-Lessons learned
During the last year we've ported HotSpot to quite a few new platforms and we learned that:
- HotSpot has a very steep learning curve
- there is not much documentation available
Reactivating and supporting currently unused code:
- C++Interpreter
Changes in shared code
- building a core (interpreter-only) VM
- generating SafeFetch stubs
- extending the ADLC (LateExpand nodes)
- configurable stack growth direction