After the initial OpenCL programs
clinfo
helloworld
helloworld2
on my Nexus 4 phone, I am trying to use some existing Java-OpenCL frameworks (like JOCL, JavCL, OpenCL4Java and aparapi) for the Android based OpenCL developement.
So aftersome initial studies, fixed on AMD's aparapi and started porting the X86/X86_64 OpenCL-Java framework to any Android-based phone with OpenCL implementation (on GPU).
As part of helloworld2, I have mapped all OpenCL API methods using generic way irrespective of the underlying OpenCL SDK vendor.
Now I ventured to port aparapi from X86/X86_64 to ARM (on android ). To simplify the porting exercise, I took svn release version 68 as starting point for aparapi project.
Need to hack following of the code to accommodate Qualcomm OpenCL SDK on the Android on ARM:
So after all these major changes, the OpenCL program totally written in java worked on my Nexus 4. Once I get time, I will cleanup code and submit my code to my site
Here is the log from my Nexus 4 for this whole exercise (with generated OpenCL in logs):
I/System.out( 2379): getClassBytes Needed for :org/aopencl/aparapi/MyKernel.class , res-id:7f040000
I/System.out( 2379): getClassBytes:java.util.zip.ZipEntry, getting bytes for:org/aopencl/aparapi/MyKernel.class
D/dalvikvm( 2379): GC_CONCURRENT freed 394K, 5% free 9008K/9444K, paused 2ms+3ms, total 16ms
I/System.out( 2379): Got :org/aopencl/aparapi/MyKernel.class, size:-1
I/System.out( 2379): extractBytes:710
D/aopencl ( 2379): Trying shared libraries at following locations:
D/aopencl ( 2379): /system/lib/libOpenCL.so
D/aopencl ( 2379): /system/vendor/lib/egl/libGLES_mali.so
D/aopencl ( 2379): /system/lib/libllvm-a3xx.so
D/aopencl ( 2379): Using the Shared library:/system/lib/libOpenCL.so
E/aparapi ( 2379): platform 0 QUALCOMM
E/aparapi ( 2379): CL_DEVICE_MAX_COMPUTE_UNITS 4
E/aparapi ( 2379): CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS 3
E/aparapi ( 2379): CL_DEVICE_MAX_WORK_GROUP_SIZE 256
E/aparapi ( 2379): CL_DEVICE_GLOBAL_MEM_SIZE 1910623136
E/aparapi ( 2379): CL_DEVICE_LOCAL_MEM_SIZE 1910623168
E/aparapi ( 2379): device[0x722d6d50]: Type:
E/aparapi ( 2379): GPU
E/aparapi ( 2379):
E/aparapi ( 2379):
E/aparapi ( 2379): JNIContext 7289b008
E/aparapi ( 2379): JNIContext valid:1
W/aparapi ( 2379): getExtensions JNIContext :7289b008
W/aparapi ( 2379): getExtensions2 JNIContext :cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_qcom_default_relaxed_math cl_qcom_perf_monitor
D/KernelRunner( 2379): Capabilities initialized to :[cl_qcom_perf_monitor, cl_khr_byte_addressable_store, cl_qcom_default_relaxed_math, cl_khr_local_int32_extended_atomics, cl_khr_local_int32_base_atomics, cl_khr_global_int32_base_atomics, cl_khr_gl_sharing, cl_khr_global_int32_extended_atomics, cl_khr_fp16]
D/KernelRunner( 2379): passed requiresDoublePragma
D/KernelRunner( 2379): passed requiresByteAddressableStorePragma
D/KernelRunner( 2379): passed requiresAtomic32Pragma
D/KernelRunner( 2379): Gen-OpenCL:typedef struct This_s{
D/KernelRunner( 2379): __global int *squares;
D/KernelRunner( 2379): __global int *values;
D/KernelRunner( 2379): int passid;
D/KernelRunner( 2379): }This;
D/KernelRunner( 2379): int get_pass_id(This *this){
D/KernelRunner( 2379): return this->passid;
D/KernelRunner( 2379): }
D/KernelRunner( 2379): __kernel void run(
D/KernelRunner( 2379): __global int *squares,
D/KernelRunner( 2379): __global int *values,
D/KernelRunner( 2379): int passid
D/KernelRunner( 2379): ){
D/KernelRunner( 2379): This thisStruct;
D/KernelRunner( 2379): This* this=&thisStruct;
D/KernelRunner( 2379): this->squares = squares;
D/KernelRunner( 2379): this->values = values;
D/KernelRunner( 2379): this->passid = passid;
D/KernelRunner( 2379): {
D/KernelRunner( 2379): int gid = get_global_id(0);
D/KernelRunner( 2379): this->squares[gid] = this->values[gid] + (this->values[gid] / 2);
D/KernelRunner( 2379): return;
D/KernelRunner( 2379): }
D/KernelRunner( 2379): }
E/aparapi ( 2379): clBuildProgram start clBuildProgram-ptr:722c533d, cl-prgm-ptr:735e4b38, dev-idc:1, devs:735d6ac8
E/aparapi ( 2379): clBuildProgram end:0
D/KernelRunner( 2379): passed buildProgramJNI
D/KernelRunner( 2379): arg 0, squares, type=1688, primitiveSize=4
D/KernelRunner( 2379): arg 1, values, type=1288, primitiveSize=4
E/aparapi ( 2379): in setArgs arg 0 squares type 00001688
E/aparapi ( 2379): in setArgs arg 1 values type 00001288
D/KernelRunner( 2379): saw newArrayRef for squares = [I@4248aa40, newArrayLen = 32
D/KernelRunner( 2379): saw newArrayRef for values = [I@4248a9a8, newArrayLen = 32
E/aparapi ( 2379): for globalSize=32, stepping localSize from 32, returning localSize=32
D/KernelRunner( 2379): Need to resync arrays on org.aopencl.aparapi.MyKernel
E/aparapi ( 2379): got type for squares: 00001688
E/aparapi ( 2379): testing for Resync javaArray squares: old=0x0, new=0x2e900005
E/aparapi ( 2379): Resync javaArray for squares: 0x2e900005 0x0
E/aparapi ( 2379): NewWeakGlobalRef for squares, set to 0x1d600003
E/aparapi ( 2379): updateKernel, args[0].sizeInBytes=128
E/aparapi ( 2379): got type for values: 00001288
E/aparapi ( 2379): testing for Resync javaArray values: old=0x0, new=0x1e300009
E/aparapi ( 2379): Resync javaArray for values: 0x1e300009 0x0
E/aparapi ( 2379): NewWeakGlobalRef for values, set to 0x1d600007
E/aparapi ( 2379): updateKernel, args[1].sizeInBytes=128
E/aparapi ( 2379): back from updateKernel
E/aparapi ( 2379): got type for arg 0, squares, type=00001688
E/aparapi ( 2379): runKernel: arrayOrBuf ref 0x1d600003, oldAddr=0x0, newAddr=0x4248aa50, ref.mem=0x0, isArray=1
E/aparapi ( 2379): at memory addr 0x4248aa50, contents:
E/aparapi ( 2379): 00
E/aparapi ( 2379): 00
E/aparapi ( 2379): 00
E/aparapi ( 2379): 00
E/aparapi ( 2379): 00
E/aparapi ( 2379): 00
E/aparapi ( 2379): 00
E/aparapi ( 2379): 00
E/aparapi ( 2379):
D/aparapi ( 2379): squares 0 clCreateBuffer(context, CL_MEM_USE_HOST_PTR|CL_MEM_READ_WRITE, size=00000080 bytes, address=4248aa50, &status)
E/aparapi ( 2379): writing buffer 0 squares
E/aparapi ( 2379): got type for arg 1, values, type=00001288
E/aparapi ( 2379): runKernel: arrayOrBuf ref 0x1d600007, oldAddr=0x0, newAddr=0x4248a9b8, ref.mem=0x0, isArray=1
E/aparapi ( 2379): at memory addr 0x4248a9b8, contents:
E/aparapi ( 2379): 00
E/aparapi ( 2379): 00
E/aparapi ( 2379): 00
E/aparapi ( 2379): 00
E/aparapi ( 2379): 01
E/aparapi ( 2379): 00
E/aparapi ( 2379): 00
E/aparapi ( 2379): 00
E/aparapi ( 2379):
D/aparapi ( 2379): values 1 clCreateBuffer(context, CL_MEM_USE_HOST_PTR|CL_MEM_READ_ONLY, size=00000080 bytes, address=4248a9b8, &status)
E/aparapi ( 2379): writing buffer 1 values
E/aparapi ( 2379): reading buffer 0 squares
D/KernelRunner( 2379): executeOpenCL completed. _globalSize=32
I/System.out( 2379): Execution mode=GPU
clinfo
helloworld
helloworld2
on my Nexus 4 phone, I am trying to use some existing Java-OpenCL frameworks (like JOCL, JavCL, OpenCL4Java and aparapi) for the Android based OpenCL developement.
So aftersome initial studies, fixed on AMD's aparapi and started porting the X86/X86_64 OpenCL-Java framework to any Android-based phone with OpenCL implementation (on GPU).
As part of helloworld2, I have mapped all OpenCL API methods using generic way irrespective of the underlying OpenCL SDK vendor.
Now I ventured to port aparapi from X86/X86_64 to ARM (on android ). To simplify the porting exercise, I took svn release version 68 as starting point for aparapi project.
Need to hack following of the code to accommodate Qualcomm OpenCL SDK on the Android on ARM:
- Need to use reflection to get sun.misc.Unsafe.
- Not all Unsafe methods are available on Android version
- So no float, double, boolean and their corresponding array versions
- Apache Unsafe internal field names are different than Sun's Unsafe
- Removed some hard codings on JNI code, so that it works for QUALCOMM SDK also
- QualComm SDK crashes the Dalvik-VM, if the OpenCL-Kernel is in inner class (and field names contains '$')
- Current AMD OpenCL code generation depends on class-files availability on classpath. In Android it is dex-file, so need to package the original classpath also into apk file and access it via raw resource
So after all these major changes, the OpenCL program totally written in java worked on my Nexus 4. Once I get time, I will cleanup code and submit my code to my site
Here is the log from my Nexus 4 for this whole exercise (with generated OpenCL in logs):
I/System.out( 2379): getClassBytes Needed for :org/aopencl/aparapi/MyKernel.class , res-id:7f040000
I/System.out( 2379): getClassBytes:java.util.zip.ZipEntry, getting bytes for:org/aopencl/aparapi/MyKernel.class
D/dalvikvm( 2379): GC_CONCURRENT freed 394K, 5% free 9008K/9444K, paused 2ms+3ms, total 16ms
I/System.out( 2379): Got :org/aopencl/aparapi/MyKernel.class, size:-1
I/System.out( 2379): extractBytes:710
D/aopencl ( 2379): Trying shared libraries at following locations:
D/aopencl ( 2379): /system/lib/libOpenCL.so
D/aopencl ( 2379): /system/vendor/lib/egl/libGLES_mali.so
D/aopencl ( 2379): /system/lib/libllvm-a3xx.so
D/aopencl ( 2379): Using the Shared library:/system/lib/libOpenCL.so
E/aparapi ( 2379): platform 0 QUALCOMM
E/aparapi ( 2379): CL_DEVICE_MAX_COMPUTE_UNITS 4
E/aparapi ( 2379): CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS 3
E/aparapi ( 2379): CL_DEVICE_MAX_WORK_GROUP_SIZE 256
E/aparapi ( 2379): CL_DEVICE_GLOBAL_MEM_SIZE 1910623136
E/aparapi ( 2379): CL_DEVICE_LOCAL_MEM_SIZE 1910623168
E/aparapi ( 2379): device[0x722d6d50]: Type:
E/aparapi ( 2379): GPU
E/aparapi ( 2379):
E/aparapi ( 2379):
E/aparapi ( 2379): JNIContext 7289b008
E/aparapi ( 2379): JNIContext valid:1
W/aparapi ( 2379): getExtensions JNIContext :7289b008
W/aparapi ( 2379): getExtensions2 JNIContext :cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_qcom_default_relaxed_math cl_qcom_perf_monitor
D/KernelRunner( 2379): Capabilities initialized to :[cl_qcom_perf_monitor, cl_khr_byte_addressable_store, cl_qcom_default_relaxed_math, cl_khr_local_int32_extended_atomics, cl_khr_local_int32_base_atomics, cl_khr_global_int32_base_atomics, cl_khr_gl_sharing, cl_khr_global_int32_extended_atomics, cl_khr_fp16]
D/KernelRunner( 2379): passed requiresDoublePragma
D/KernelRunner( 2379): passed requiresByteAddressableStorePragma
D/KernelRunner( 2379): passed requiresAtomic32Pragma
D/KernelRunner( 2379): Gen-OpenCL:typedef struct This_s{
D/KernelRunner( 2379): __global int *squares;
D/KernelRunner( 2379): __global int *values;
D/KernelRunner( 2379): int passid;
D/KernelRunner( 2379): }This;
D/KernelRunner( 2379): int get_pass_id(This *this){
D/KernelRunner( 2379): return this->passid;
D/KernelRunner( 2379): }
D/KernelRunner( 2379): __kernel void run(
D/KernelRunner( 2379): __global int *squares,
D/KernelRunner( 2379): __global int *values,
D/KernelRunner( 2379): int passid
D/KernelRunner( 2379): ){
D/KernelRunner( 2379): This thisStruct;
D/KernelRunner( 2379): This* this=&thisStruct;
D/KernelRunner( 2379): this->squares = squares;
D/KernelRunner( 2379): this->values = values;
D/KernelRunner( 2379): this->passid = passid;
D/KernelRunner( 2379): {
D/KernelRunner( 2379): int gid = get_global_id(0);
D/KernelRunner( 2379): this->squares[gid] = this->values[gid] + (this->values[gid] / 2);
D/KernelRunner( 2379): return;
D/KernelRunner( 2379): }
D/KernelRunner( 2379): }
E/aparapi ( 2379): clBuildProgram start clBuildProgram-ptr:722c533d, cl-prgm-ptr:735e4b38, dev-idc:1, devs:735d6ac8
E/aparapi ( 2379): clBuildProgram end:0
D/KernelRunner( 2379): passed buildProgramJNI
D/KernelRunner( 2379): arg 0, squares, type=1688, primitiveSize=4
D/KernelRunner( 2379): arg 1, values, type=1288, primitiveSize=4
E/aparapi ( 2379): in setArgs arg 0 squares type 00001688
E/aparapi ( 2379): in setArgs arg 1 values type 00001288
D/KernelRunner( 2379): saw newArrayRef for squares = [I@4248aa40, newArrayLen = 32
D/KernelRunner( 2379): saw newArrayRef for values = [I@4248a9a8, newArrayLen = 32
E/aparapi ( 2379): for globalSize=32, stepping localSize from 32, returning localSize=32
D/KernelRunner( 2379): Need to resync arrays on org.aopencl.aparapi.MyKernel
E/aparapi ( 2379): got type for squares: 00001688
E/aparapi ( 2379): testing for Resync javaArray squares: old=0x0, new=0x2e900005
E/aparapi ( 2379): Resync javaArray for squares: 0x2e900005 0x0
E/aparapi ( 2379): NewWeakGlobalRef for squares, set to 0x1d600003
E/aparapi ( 2379): updateKernel, args[0].sizeInBytes=128
E/aparapi ( 2379): got type for values: 00001288
E/aparapi ( 2379): testing for Resync javaArray values: old=0x0, new=0x1e300009
E/aparapi ( 2379): Resync javaArray for values: 0x1e300009 0x0
E/aparapi ( 2379): NewWeakGlobalRef for values, set to 0x1d600007
E/aparapi ( 2379): updateKernel, args[1].sizeInBytes=128
E/aparapi ( 2379): back from updateKernel
E/aparapi ( 2379): got type for arg 0, squares, type=00001688
E/aparapi ( 2379): runKernel: arrayOrBuf ref 0x1d600003, oldAddr=0x0, newAddr=0x4248aa50, ref.mem=0x0, isArray=1
E/aparapi ( 2379): at memory addr 0x4248aa50, contents:
E/aparapi ( 2379): 00
E/aparapi ( 2379): 00
E/aparapi ( 2379): 00
E/aparapi ( 2379): 00
E/aparapi ( 2379): 00
E/aparapi ( 2379): 00
E/aparapi ( 2379): 00
E/aparapi ( 2379): 00
E/aparapi ( 2379):
D/aparapi ( 2379): squares 0 clCreateBuffer(context, CL_MEM_USE_HOST_PTR|CL_MEM_READ_WRITE, size=00000080 bytes, address=4248aa50, &status)
E/aparapi ( 2379): writing buffer 0 squares
E/aparapi ( 2379): got type for arg 1, values, type=00001288
E/aparapi ( 2379): runKernel: arrayOrBuf ref 0x1d600007, oldAddr=0x0, newAddr=0x4248a9b8, ref.mem=0x0, isArray=1
E/aparapi ( 2379): at memory addr 0x4248a9b8, contents:
E/aparapi ( 2379): 00
E/aparapi ( 2379): 00
E/aparapi ( 2379): 00
E/aparapi ( 2379): 00
E/aparapi ( 2379): 01
E/aparapi ( 2379): 00
E/aparapi ( 2379): 00
E/aparapi ( 2379): 00
E/aparapi ( 2379):
D/aparapi ( 2379): values 1 clCreateBuffer(context, CL_MEM_USE_HOST_PTR|CL_MEM_READ_ONLY, size=00000080 bytes, address=4248a9b8, &status)
E/aparapi ( 2379): writing buffer 1 values
E/aparapi ( 2379): reading buffer 0 squares
D/KernelRunner( 2379): executeOpenCL completed. _globalSize=32
I/System.out( 2379): Execution mode=GPU