ARM environment for Ubuntu Intel

To create executable code for a platform other than the one on which the compiler is running, we can use a cross compiler. In this case we want to create executables for ARM platforms while working on an Intel machine running Ubuntu. The only package that is necessary (at least in Ubuntu 15.04) is gcc-arm-linux-gnueabihf.

Once this is installed, an example can be compiled like this:
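On Ubuntu the package can normally be installed with `sudo apt-get install gcc-arm-linux-gnueabihf`. The following is a minimal sketch for checking that the cross compiler ended up on the PATH; `check_tool` is a hypothetical helper name, not part of Blosc or of the toolchain.

```shell
#!/bin/sh
# check_tool is a hypothetical helper: it succeeds if the given command is on PATH.
check_tool() {
    command -v "$1" >/dev/null 2>&1
}

if check_tool arm-linux-gnueabihf-gcc; then
    echo "cross compiler found"
else
    echo "cross compiler missing; try: sudo apt-get install gcc-arm-linux-gnueabihf"
fi
```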

$ arm-linux-gnueabihf-gcc -static  -mfpu=neon -flax-vector-conversions -DSHUFFLE_NEON_ENABLED -O3 simple.c ../blosc/blosc.c ../blosc/blosclz.c ../blosc/shuffle.c ../blosc/shuffle-generic.c ../blosc/bitshuffle-generic.c ../blosc/shuffle-neon.c ../blosc/bitshuffle-neon.c -I../blosc -o simple -lpthread

Another example, building the benchmark with the cross compiler:

$ arm-linux-gnueabihf-gcc -static  -mfpu=neon -flax-vector-conversions -DSHUFFLE_NEON_ENABLED -O3 bench.c  ../blosc/blosc.c ../blosc/blosclz.c ../blosc/shuffle.c ../blosc/shuffle-generic.c ../blosc/bitshuffle-generic.c ../blosc/shuffle-neon.c ../blosc/bitshuffle-neon.c -I../blosc -o bench -lpthread

In these cases the NEON flags are: -mfpu=neon -flax-vector-conversions

This is explained in detail in: http://linux-sunxi.org/Toolchain
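The two commands above differ only in the program being built, so the shared parts can be kept in one place. This is only a sketch: `CROSS_CC`, `NEON_FLAGS`, `NEON_DEFINE`, `BLOSC_SOURCES` and `cross_build_cmd` are hypothetical names chosen here, and the flags and file lists are copied from the commands above. The function prints the command instead of running it, so it can be inspected first.

```shell
#!/bin/sh
# Hypothetical variable names; the values are taken from the commands above.
CROSS_CC=arm-linux-gnueabihf-gcc
NEON_FLAGS="-mfpu=neon -flax-vector-conversions"
NEON_DEFINE="-DSHUFFLE_NEON_ENABLED"
BLOSC_SOURCES="../blosc/blosc.c ../blosc/blosclz.c ../blosc/shuffle.c ../blosc/shuffle-generic.c ../blosc/bitshuffle-generic.c ../blosc/shuffle-neon.c ../blosc/bitshuffle-neon.c"

# Print the cross-compile command for one program.
# $1 is the program name without extension, e.g. "simple" or "bench".
cross_build_cmd() {
    echo "$CROSS_CC -static $NEON_FLAGS $NEON_DEFINE -O3 $1.c $BLOSC_SOURCES -I../blosc -o $1 -lpthread"
}

cross_build_cmd simple
cross_build_cmd bench
```

To actually build, the printed line can be passed to the shell, for example `cross_build_cmd simple | sh`.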

This way you can develop and debug applications for ARM on Intel machines as if you were on an ARM platform.

Benchmark for ODROID-XU3

This benchmark compares the speed of the NEON shuffle implementation with that of the generic implementation.

CPU: ARMv7 Processor rev 3 (v7l)
Compiler: gcc, version gcc-4.8.real
Optimizations: -O3
OS: Ubuntu 14.04 trusty
  • NEON implementation results:
Blosc version: 2.0.0a1 ($Date:: 2015-07-30 #$)
List of supported compressors in this build: blosclz
Supported compression libraries:
  BloscLZ: 1.0.5
  LZ4: unknown
  Snappy: unknown
  Zlib: unknown
Using compressor: blosclz
Using shuffle type: shuffle
Running suite: suite
--> 1, 2097152, 8, 19, blosclz, shuffle
********************** Run info ******************************
Blosc version: 2.0.0a1 ($Date:: 2015-07-30 #$)
Using synthetic data with 19 significant bits (out of 32)
Dataset size: 2097152 bytes Type size: 8 bytes
Working set: 64.0 MB                Number of threads: 1
********************** Running benchmarks *********************
memcpy(write):               2265.1 us, 883.0 MB/s
memcpy(read):                1196.9 us, 1671.0 MB/s
Compression level: 0
comp(write):          984.8 us, 2030.9 MB/s   Final bytes: 2097168  Ratio: 1.00
decomp(read):        1202.8 us, 1662.8 MB/s   OK
Compression level: 1
comp(write):         5863.7 us, 341.1 MB/s    Final bytes: 584976  Ratio: 3.59
decomp(read):        1014.1 us, 1972.3 MB/s   OK
Compression level: 2
comp(write):         6229.8 us, 321.0 MB/s    Final bytes: 584976  Ratio: 3.59
decomp(read):        1013.4 us, 1973.5 MB/s   OK
Compression level: 3
comp(write):         6603.8 us, 302.9 MB/s    Final bytes: 584976  Ratio: 3.59
decomp(read):        1012.9 us, 1974.4 MB/s   OK
Compression level: 4
comp(write):         6792.6 us, 294.4 MB/s    Final bytes: 557840  Ratio: 3.76
decomp(read):         983.5 us, 2033.6 MB/s   OK
Compression level: 5
comp(write):         8598.3 us, 232.6 MB/s    Final bytes: 557840  Ratio: 3.76
decomp(read):         983.4 us, 2033.9 MB/s   OK
Compression level: 6
comp(write):         9866.2 us, 202.7 MB/s    Final bytes: 546320  Ratio: 3.84
decomp(read):        1079.1 us, 1853.3 MB/s   OK
Compression level: 7
comp(write):         9334.9 us, 214.2 MB/s    Final bytes: 216528  Ratio: 9.69
decomp(read):        1959.4 us, 1020.7 MB/s   OK
Compression level: 8
comp(write):         9221.1 us, 216.9 MB/s    Final bytes: 216528  Ratio: 9.69
decomp(read):        1972.1 us, 1014.2 MB/s   OK
Compression level: 9
comp(write):         8452.0 us, 236.6 MB/s    Final bytes: 153160  Ratio: 13.69
decomp(read):        2780.0 us, 719.4 MB/s    OK
--> 2, 2097152, 8, 19, blosclz, shuffle
********************** Run info ******************************
Blosc version: 2.0.0a1 ($Date:: 2015-07-30 #$)
Using synthetic data with 19 significant bits (out of 32)
Dataset size: 2097152 bytes Type size: 8 bytes
Working set: 64.0 MB                Number of threads: 2
********************** Running benchmarks *********************
memcpy(write):               2258.0 us, 885.8 MB/s
memcpy(read):                1194.9 us, 1673.7 MB/s
Compression level: 0
comp(write):          831.2 us, 2406.0 MB/s   Final bytes: 2097168  Ratio: 1.00
decomp(read):        1162.6 us, 1720.3 MB/s   OK
Compression level: 1
comp(write):         2975.1 us, 672.2 MB/s    Final bytes: 584976  Ratio: 3.59
decomp(read):         738.4 us, 2708.4 MB/s   OK
Compression level: 2
comp(write):         3156.8 us, 633.6 MB/s    Final bytes: 584976  Ratio: 3.59
decomp(read):         738.7 us, 2707.5 MB/s   OK
Compression level: 3
comp(write):         3347.0 us, 597.6 MB/s    Final bytes: 584976  Ratio: 3.59
decomp(read):         749.6 us, 2668.1 MB/s   OK
Compression level: 4
comp(write):         3486.2 us, 573.7 MB/s    Final bytes: 557840  Ratio: 3.76
decomp(read):         745.1 us, 2684.4 MB/s   OK
Compression level: 5
comp(write):         4488.5 us, 445.6 MB/s    Final bytes: 557840  Ratio: 3.76
decomp(read):         725.7 us, 2755.9 MB/s   OK
Compression level: 6
comp(write):         4998.5 us, 400.1 MB/s    Final bytes: 546320  Ratio: 3.84
decomp(read):         796.8 us, 2510.2 MB/s   OK
Compression level: 7
comp(write):         4780.3 us, 418.4 MB/s    Final bytes: 216528  Ratio: 9.69
decomp(read):        1383.9 us, 1445.1 MB/s   OK
Compression level: 8
comp(write):         4778.6 us, 418.5 MB/s    Final bytes: 216528  Ratio: 9.69
decomp(read):        1398.0 us, 1430.6 MB/s   OK
Compression level: 9
comp(write):         5884.6 us, 339.9 MB/s    Final bytes: 153160  Ratio: 13.69
decomp(read):        2647.7 us, 755.4 MB/s    OK

Round-trip compr/decompr on 3.8 GB
Elapsed time:          13.9 s, 609.1 MB/s
  • Generic implementation results:
Blosc version: 2.0.0a1 ($Date:: 2015-07-30 #$)
List of supported compressors in this build: blosclz
Supported compression libraries:
  BloscLZ: 1.0.5
  LZ4: unknown
  Snappy: unknown
  Zlib: unknown
Using compressor: blosclz
Using shuffle type: shuffle
Running suite: suite
--> 1, 2097152, 8, 19, blosclz, shuffle
********************** Run info ******************************
Blosc version: 2.0.0a1 ($Date:: 2015-07-30 #$)
Using synthetic data with 19 significant bits (out of 32)
Dataset size: 2097152 bytes Type size: 8 bytes
Working set: 64.0 MB                Number of threads: 1
********************** Running benchmarks *********************
memcpy(write):               2194.1 us, 911.5 MB/s
memcpy(read):                1170.8 us, 1708.2 MB/s
Compression level: 0
comp(write):          896.2 us, 2231.7 MB/s   Final bytes: 2097168  Ratio: 1.00
decomp(read):        1179.3 us, 1695.9 MB/s   OK
Compression level: 1
comp(write):         7534.4 us, 265.4 MB/s    Final bytes: 584976  Ratio: 3.59
decomp(read):        4117.1 us, 485.8 MB/s    OK
Compression level: 2
comp(write):         7895.6 us, 253.3 MB/s    Final bytes: 584976  Ratio: 3.59
decomp(read):        4106.7 us, 487.0 MB/s    OK
Compression level: 3
comp(write):         8262.2 us, 242.1 MB/s    Final bytes: 584976  Ratio: 3.59
decomp(read):        4113.9 us, 486.2 MB/s    OK
Compression level: 4
comp(write):         8495.6 us, 235.4 MB/s    Final bytes: 557840  Ratio: 3.76
decomp(read):        4042.6 us, 494.7 MB/s    OK
Compression level: 5
comp(write):         10321.0 us, 193.8 MB/s   Final bytes: 557840  Ratio: 3.76
decomp(read):        4033.9 us, 495.8 MB/s    OK
Compression level: 6
comp(write):         11675.3 us, 171.3 MB/s   Final bytes: 546320  Ratio: 3.84
decomp(read):        4096.4 us, 488.2 MB/s    OK
Compression level: 7
comp(write):         10193.0 us, 196.2 MB/s   Final bytes: 216528  Ratio: 9.69
decomp(read):        7150.9 us, 279.7 MB/s    OK
Compression level: 8
comp(write):         10192.3 us, 196.2 MB/s   Final bytes: 216528  Ratio: 9.69
decomp(read):        7167.7 us, 279.0 MB/s    OK
Compression level: 9
comp(write):         10418.7 us, 192.0 MB/s   Final bytes: 153160  Ratio: 13.69
decomp(read):        7870.8 us, 254.1 MB/s    OK
--> 2, 2097152, 8, 19, blosclz, shuffle
********************** Run info ******************************
Blosc version: 2.0.0a1 ($Date:: 2015-07-30 #$)
Using synthetic data with 19 significant bits (out of 32)
Dataset size: 2097152 bytes Type size: 8 bytes
Working set: 64.0 MB                Number of threads: 2
********************** Running benchmarks *********************
memcpy(write):               2243.1 us, 891.6 MB/s
memcpy(read):                1219.9 us, 1639.4 MB/s
Compression level: 0
comp(write):          846.8 us, 2361.9 MB/s   Final bytes: 2097168  Ratio: 1.00
decomp(read):        1182.1 us, 1691.9 MB/s   OK
Compression level: 1
comp(write):         3867.6 us, 517.1 MB/s    Final bytes: 584976  Ratio: 3.59
decomp(read):        2162.8 us, 924.7 MB/s    OK
Compression level: 2
comp(write):         4054.2 us, 493.3 MB/s    Final bytes: 584976  Ratio: 3.59
decomp(read):        2156.6 us, 927.4 MB/s    OK
Compression level: 3
comp(write):         4241.2 us, 471.6 MB/s    Final bytes: 584976  Ratio: 3.59
decomp(read):        2169.6 us, 921.8 MB/s    OK
Compression level: 4
comp(write):         4377.1 us, 456.9 MB/s    Final bytes: 557840  Ratio: 3.76
decomp(read):        7556.9 us, 264.7 MB/s    OK
Compression level: 5
comp(write):         5276.3 us, 379.1 MB/s    Final bytes: 557840  Ratio: 3.76
decomp(read):        7556.7 us, 264.7 MB/s    OK
Compression level: 6
comp(write):         6026.8 us, 331.9 MB/s    Final bytes: 546320  Ratio: 3.84
decomp(read):        3108.2 us, 643.5 MB/s    OK
Compression level: 7
comp(write):         5877.0 us, 340.3 MB/s    Final bytes: 216528  Ratio: 9.69
decomp(read):        3668.5 us, 545.2 MB/s    OK
Compression level: 8
comp(write):         5882.0 us, 340.0 MB/s    Final bytes: 216528  Ratio: 9.69
decomp(read):        3531.2 us, 566.4 MB/s    OK
Compression level: 9
comp(write):         7621.3 us, 262.4 MB/s    Final bytes: 153160  Ratio: 13.69
decomp(read):        4978.3 us, 401.7 MB/s    OK

Round-trip compr/decompr on 3.8 GB
Elapsed time:          21.9 s, 385.0 MB/s

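We have implemented the shuffle with NEON instructions for ARM. In the runs above, the overall round trip is about 1.6 times faster than with the generic implementation (609.1 MB/s versus 385.0 MB/s), and single-threaded decompression is up to about four times faster.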
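The ratios behind this comparison can be recomputed from the figures reported above. The sketch below assumes the bench output format shown in this document; `roundtrip_mbs` and `speedup` are hypothetical helper names, and the numbers passed in are copied from the NEON and generic runs.

```shell
#!/bin/sh
# roundtrip_mbs and speedup are hypothetical helper names.

# Print the round-trip throughput (MB/s) found in bench output read on stdin.
# It relies on the "Elapsed time: <s> s, <MB/s> MB/s" line shown above.
roundtrip_mbs() {
    awk '/^Elapsed time:/ { print $5 }'
}

# Print the ratio $1 / $2 with two decimals.
speedup() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

neon=$(echo "Elapsed time:          13.9 s, 609.1 MB/s" | roundtrip_mbs)
generic=$(echo "Elapsed time:          21.9 s, 385.0 MB/s" | roundtrip_mbs)

echo "round trip:               $(speedup "$neon" "$generic")x"
echo "decomp level 1, 1 thread: $(speedup 1972.3 485.8)x"
echo "comp level 1, 1 thread:   $(speedup 341.1 265.4)x"
```

With a saved log, the same helper could be used as `roundtrip_mbs < neon.log`, where `neon.log` is a hypothetical file holding the bench output.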

Enjoy developing for ARM!

Lucian Marc