Product Documentation
Abstract
This document provides performance data for the MASS scalar, SIMD, and vector libraries for Linux (Little Endian), in comparison with the standard system math library.
Content
The following tables provide approximate performance data for the MASS scalar, vector, and SIMD libraries running under Linux (Little Endian) on IBM System p® and Power Architecture® machines.
The columns labelled libm and mass list the results obtained with the libm.a system library and the libmass.a MASS scalar library, respectively (the POWER10 tables label the MASS scalar column scalar). The columns labelled simdgen, simdp8, simdp9, and simdp10 list the results obtained with the libmass_simd.a, libmass_simdp8.a, libmass_simdp9.a, and libmass_simdp10.a MASS SIMD libraries, respectively. This data was obtained by timing many repetitions of a loop over 1024 random arguments, and includes all overheads.
The columns labelled vgen, vp8, vp9, and vp10 list the results obtained with the libmassv.a, libmassvp8.a, libmassvp9.a, and libmassvp10.a MASS vector libraries, respectively. This data was obtained by timing many repetitions of a vector function call on a 1024-element vector of random arguments, and includes all overheads.
The Function column shows the name of the basic mathematical function. The names of the corresponding libm and MASS functions can be determined from it. For example, for double precision, acos corresponds to the libm and scalar MASS function acos, the SIMD MASS function acosd2, and the vector MASS function vacos. For single precision, acos corresponds to the libm and scalar MASS function acosf, the SIMD MASS function acosf4, and the vector MASS function vsacos.
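The naming pattern in the acos example can be written down as a small helper. This is only a sketch: the function name mass_function_names is hypothetical, and the suffix and prefix rule is inferred from the single acos example above, so it may not hold for every function in the tables (some entries, such as div, have no libm or scalar counterpart).

```python
def mass_function_names(base: str, precision: str) -> dict:
    """Derive libm/scalar, SIMD, and vector names from a basic function name.

    The rule is taken from the acos example only; it is an assumption
    that other functions follow the same pattern.
    """
    if precision == "double":
        return {"libm_scalar": base, "simd": base + "d2", "vector": "v" + base}
    if precision == "single":
        return {"libm_scalar": base + "f", "simd": base + "f4", "vector": "vs" + base}
    raise ValueError("precision must be 'double' or 'single'")


print(mass_function_names("acos", "double"))
print(mass_function_names("acos", "single"))
```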
The timing method used brings the input and output vectors into the on-chip cache (because the loop or vector length is short enough for the vectors to fit in the cache). Performance can deteriorate significantly when the input and output vectors are not in the cache. Performance can also deteriorate for arguments at or near the end-points of the valid argument ranges. The input and output vectors were aligned on 16-byte boundaries. Reduced performance can occur for other alignments.
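The shape of such a measurement can be sketched as follows. This is an illustration of the method only, not the harness used for the tables: it times Python's math.acos rather than a MASS function, reports wall-clock nanoseconds rather than processor cycles, and the repetition count REPS is a hypothetical value (the document says only "many repetitions"). Interpreter overhead dominates the result, so the numbers are not comparable with the tables.

```python
import math
import random
import time

N = 1024      # elements per pass, as in the tables
REPS = 100    # hypothetical repetition count


def time_per_element(func, args, reps=REPS):
    """Return average wall-clock nanoseconds per element over reps passes."""
    out = [0.0] * len(args)   # output vector, reused on every pass
    start = time.perf_counter_ns()
    for _ in range(reps):
        for i, x in enumerate(args):
            out[i] = func(x)
    elapsed = time.perf_counter_ns() - start
    return elapsed / (reps * len(args))


random.seed(0)
args = [random.uniform(-1.0, 1.0) for _ in range(N)]   # range key B
ns = time_per_element(math.acos, args)
print(f"acos: {ns:.1f} ns per element (includes all loop overhead)")
```

Because the same short input and output vectors are reused on every pass, they stay cache-resident after the first pass, which is the effect the paragraph above describes.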
The system library measurements were made with the versions of the library available on the test systems. These may differ from the library versions timed for previous releases of MASS. Users may experience performance that differs from that found in these tables.
Results will vary with vector length. Entries in the tables where the library function does not exist, or the measurement was not done, are left blank or marked with a hyphen (-).
Timings are given in processor cycles per element, where an element refers to an individual single- or double-precision value (except for cosisin and sincos, where it refers to a pair of values).
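The tables also carry speedup columns headed libm/, which are the libm timing divided by the timing of the named MASS library. As a worked check, the sketch below recomputes the speedups for the POWER9 double-precision acos row from its cycle counts. The helper name speedup is hypothetical; the numbers are taken from that row.

```python
def speedup(libm_cycles: float, other_cycles: float) -> float:
    """Ratio of libm cycles per element to another library's cycles per element."""
    return libm_cycles / other_cycles


# POWER9 double-precision acos row: libm, mass, simdp9, vp9 (cycles per element)
libm, mass, simdp9, vp9 = 94.98, 64.23, 41.54, 26.61
ratios = [round(speedup(libm, c), 2) for c in (mass, simdp9, vp9)]
print(ratios)  # [1.48, 2.29, 3.57], the row's libm/mass, libm/simdp9, libm/vp9 entries

# Cycles for one pass over a 1024-element vector with the vp9 library
total_cycles = vp9 * 1024
print(total_cycles)
```

A speedup below 1 means the libm function was faster than the MASS function for that measurement. Ratios recomputed from the rounded cycle counts can differ from the printed speedups in the last digit.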
The following range keys give the argument ranges over which the performance was measured. The range [0,1], for example, indicates that arguments in the range 0 <= x <= 1 were used. (Note that these are only the ranges over which performance was measured, and not the maximal valid ranges for the functions.)
Range Key
A [0,1]
B [-1,1]
C [0,100]
D [-100,100]
E [-10,10]
F [-20,20]
G [1,100]
H [-1,100]
I [0,10]
J [0,100] (1st argument), [-100,100] (2nd argument)
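The range key can be captured as a lookup table for generating test arguments. This is a sketch under assumptions: the names RANGE_KEY, J_RANGES, and random_arguments are hypothetical, and a uniform distribution is assumed because the document says only that the arguments were random.

```python
import random

# Closed intervals [low, high] from the range key above
RANGE_KEY = {
    "A": (0.0, 1.0), "B": (-1.0, 1.0), "C": (0.0, 100.0),
    "D": (-100.0, 100.0), "E": (-10.0, 10.0), "F": (-20.0, 20.0),
    "G": (1.0, 100.0), "H": (-1.0, 100.0), "I": (0.0, 10.0),
}
# Key J applies to two-argument functions: (1st argument), (2nd argument)
J_RANGES = ((0.0, 100.0), (-100.0, 100.0))


def random_arguments(key, n=1024, seed=0):
    """Return n random arguments (pairs for key J) drawn uniformly from the range."""
    rng = random.Random(seed)
    if key == "J":
        (lo1, hi1), (lo2, hi2) = J_RANGES
        return [(rng.uniform(lo1, hi1), rng.uniform(lo2, hi2)) for _ in range(n)]
    lo, hi = RANGE_KEY[key]
    return [rng.uniform(lo, hi) for _ in range(n)]


xs = random_arguments("G")
print(len(xs), min(xs) >= 1.0, max(xs) <= 100.0)
```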
POWER10 Performance
POWER10 Performance -- double-precision functions
(cycles per element, vector length 1024)
================== cycles ================== libm/ libm/ libm/ libm/ libm/
Func Range libm scalar simdgen simdp10 vgen vp10 scalar simdgen simdp10 vgen vp10
acosh G 145.33 35.93 24.40 23.58 6.87 6.87 4.04 5.96 6.16 21.15 21.15
asin B 58.88 28.78 17.68 16.60 13.50 12.10 2.05 3.33 3.55 4.36 4.87
asinh D 135.50 38.30 28.52 28.30 8.21 8.21 3.54 4.75 4.79 16.50 16.50
atan D 59.33 24.68 18.10 16.80 15.00 13.80 2.40 3.28 3.53 3.96 4.30
atan2 D 104.77 35.95 25.90 24.45 12.20 10.90 2.91 4.05 4.29 8.59 9.61
atanh B 96.13 35.40 18.48 17.93 7.81 7.40 2.72 5.20 5.36 12.31 12.99
cbrt D 70.98 21.55 15.10 14.80 7.28 6.46 3.29 4.70 4.80 9.75 10.99
cos D 104.80 23.10 17.23 17.50 5.41 5.22 4.54 6.08 5.99 19.37 20.08
cosh D 69.57 19.80 13.70 13.60 10.00 9.11 3.51 5.08 5.12 6.96 7.64
div D 5.30 5.34 1.88 1.84
erf C 25.62 9.97 13.92 10.45 6.09 5.06 2.57 1.84 2.45 4.21 5.06
erfc C 50.55 19.92 23.10 23.50 33.60 26.30 2.54 2.19 2.15 1.50 1.92
exp D 53.88 12.90 11.73 11.75 3.74 3.54 4.18 4.59 4.59 14.41 15.22
exp2 D 54.17 10.30 9.44 5.16 4.23 5.26 5.74 10.50 12.81
exp2m1 D 16.20 14.45 9.85 8.12
expm1 D 46.03 22.70 13.90 13.62 5.99 5.11 2.03 3.31 3.38 7.68 9.01
hypot D 37.93 13.15 10.60 10.60 3.40 3.40 2.88 3.58 3.58 11.16 11.16
lgamma H 103.23 49.77 33.58 29.65 20.90 18.20 2.07 3.07 3.48 4.94 5.67
log C 56.15 16.20 10.30 11.35 4.94 4.16 3.47 5.45 4.95 11.37 13.50
log10 C 87.18 16.23 10.28 10.10 4.95 4.16 5.37 8.48 8.63 17.61 20.96
log1p H 58.60 19.60 12.70 12.28 7.45 6.15 2.99 4.61 4.77 7.87 9.53
log2 C 40.17 14.52 13.50 8.72 7.73 2.77 2.98 4.61 5.20
log21p H 11.98 11.80 6.48 5.35
pow J 162.00 48.67 28.08 27.22 19.40 15.60 3.33 5.77 5.95 8.35 10.38
qdrt C 9.78 9.83 9.42 9.31
rcbrt D 15.92 14.82 12.20 12.00
rec D 1.16 1.20
rqdrt C 5.74 6.06 2.56 2.60
rsqrt C 14.93 4.21 4.34 1.50 1.50
sin D 93.22 24.10 16.38 16.67 5.50 5.32 3.87 5.69 5.59 16.95 17.52
sinh D 79.13 18.30 12.70 12.10 10.50 10.50 4.32 6.23 6.54 7.54 7.54
sqrt C 22.22 14.60 4.87 4.66 1.80 1.80 1.52 4.56 4.77 12.34 12.34
tan B 124.17 27.62 14.82 14.75 7.05 7.05 4.50 8.38 8.42 17.61 17.61
tanh F 76.82 26.23 19.20 19.20 9.62 9.10 2.93 4.00 4.00 7.99 8.44
geomean 3.06 4.39 4.54 8.65 9.55
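The geomean row summarizes each speedup column. A geometric mean can be computed as sketched below; the document does not state which rows contribute to the row above, so the three sample values (the libm/vp10 speedups for asin, atan, and erfc) only illustrate the calculation and are not expected to reproduce the table's figure.

```python
import math


def geomean(values):
    """Geometric mean: exp of the arithmetic mean of the logarithms."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


# libm/vp10 speedups for asin, atan, and erfc from the table above
sample = [4.87, 4.30, 1.92]
print(round(geomean(sample), 2))
```

The geometric mean suits ratios such as speedups because a factor-of-two gain and a factor-of-two loss average to 1.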
POWER10 Performance -- single-precision functions
(cycles per element, vector length 1024)
================== cycles ================== libm/ libm/ libm/ libm/ libm/
Func Range libm scalar simdgen simdp10 vgen vp10 scalar simdgen simdp10 vgen vp10
acosh G 54.77 27.30 8.55 7.94 2.85 2.79 2.01 6.41 6.90 19.22 19.63
asin B 41.60 19.20 7.76 7.44 2.61 2.60 2.17 5.36 5.59 15.94 16.00
asinh D 51.57 32.43 11.30 10.80 3.60 3.65 1.59 4.56 4.78 14.33 14.13
atan D 42.90 20.80 5.16 4.85 1.73 1.61 2.06 8.31 8.85 24.80 26.65
atan2 D 121.50 35.95 7.21 6.54 2.51 2.43 3.38 16.85 18.58 48.41 50.00
atanh B 100.25 30.00 8.36 8.41 3.90 3.79 3.34 11.99 11.92 25.71 26.45
cbrt D 73.77 20.60 6.13 6.14 2.32 2.36 3.58 12.03 12.01 31.80 31.26
cos D 40.17 20.58 9.80 9.51 2.22 2.16 1.95 4.10 4.22 18.09 18.60
cosh D 48.08 19.05 7.56 6.82 2.31 2.28 2.52 6.36 7.05 20.81 21.09
div D 2.62 2.65 0.96 0.93
erf C 24.63 10.20 7.54 7.34 2.81 2.89 2.41 3.27 3.36 8.77 8.52
erfc C 49.53 13.95 20.73 17.75 8.84 7.50 3.55 2.39 2.79 5.60 6.60
exp D 36.72 14.25 6.49 5.72 1.96 2.00 2.58 5.66 6.42 18.73 18.36
exp2 D 15.90 14.03 5.90 5.34 1.80 1.85 1.13 2.69 2.98 8.83 8.59
exp2m1 D 5.41 4.61 1.68 1.66
expm1 D 42.82 13.75 5.98 5.08 1.79 1.82 3.11 7.16 8.43 23.92 23.53
hypot D 21.75 11.90 4.80 4.59 1.82 1.82 1.83 4.53 4.74 11.95 11.95
lgamma H 61.80 53.37 18.20 15.07 7.41 6.95 1.16 3.40 4.10 8.34 8.89
log C 17.25 16.15 7.13 6.63 2.34 2.36 1.07 2.42 2.60 7.37 7.31
log10 C 43.35 16.20 7.13 6.62 2.34 2.36 2.68 6.08 6.55 18.53 18.37
log1p H 65.42 19.20 6.62 5.98 2.08 2.10 3.41 9.88 10.94 31.45 31.15
log2 C 19.50 17.23 6.59 6.20 2.25 2.18 1.13 2.96 3.15 8.67 8.94
log21p H 6.12 5.88 2.05 1.94
pow J 45.72 34.90 15.80 15.60 18.80 18.00 1.31 2.89 2.93 2.43 2.54
qdrt C 2.32 2.31 1.01 1.03
rcbrt D 7.73 7.61 2.67 2.65
rec D 0.43 0.45
rqdrt C 2.15 2.15 0.94 0.93
rsqrt C 1.91 1.91 0.59 0.59
sin D 33.45 26.90 10.00 9.52 2.07 2.03 1.24 3.35 3.51 16.16 16.48
sinh D 62.07 19.10 8.69 8.89 3.74 3.71 3.25 7.14 6.98 16.60 16.73
sqrt C 14.20 13.30 1.86 1.75 0.74 0.74 1.07 7.63 8.11 19.14 19.19
tan B 107.67 27.55 12.10 11.90 4.95 4.85 3.91 8.90 9.05 21.75 22.20
tanh F 84.58 14.87 7.26 7.20 2.45 2.44 5.69 11.65 11.75 34.52 34.66
geomean 2.19 5.59 5.98 15.72 15.94
POWER9 Performance
POWER9 Performance -- double-precision functions
(cycles per element, vector length 1024)
====== speedup =====
========== cycles ========== libm/ libm/ libm/
Function Range libm mass simdp9 vp9 mass simdp9 vp9
===================================================================
acos B 94.98 64.23 41.54 26.61 1.48 2.29 3.57
acosh G 183.53 53.40 50.85 22.39 3.44 3.61 8.20
anint A - 17.72 - - - - -
asin B 97.63 62.97 44.93 26.63 1.55 2.17 3.67
asinh D 178.04 61.69 68.78 25.00 2.89 2.59 7.12
atan2 D 238.75 76.64 61.39 40.94 3.12 3.89 5.83
atan D 137.11 42.50 45.36 32.57 3.23 3.02 4.21
atanh B 128.56 55.86 38.17 24.14 2.30 3.37 5.33
cbrt D 103.65 37.25 29.54 15.79 2.78 3.51 6.56
copysign D 32.41 8.07 - - 4.02 - -
cos D 181.04 50.68 46.44 14.01 3.57 3.90 12.93
cos B 80.12 19.58 18.46 14.04 4.09 4.34 5.71
cosh D 86.66 49.09 30.49 22.41 1.77 2.84 3.87
cosisin D - 70.99 49.19 17.19 - - -
cosisin B - 33.44 23.37 17.21 - - -
div D - - 16.89 8.82 - - -
erfc C 70.40 37.54 56.48 65.61 1.88 1.25 1.07
erf C 34.63 20.62 38.53 17.06 1.68 0.90 2.03
exp2 D 97.96 - 22.84 13.95 - 4.29 7.02
exp2m1 D - - 29.60 20.58 - - -
exp D 83.09 21.68 26.00 12.18 3.83 3.20 6.82
expm1 D 81.26 48.60 29.85 16.53 1.67 2.72 4.91
hypot D 55.97 36.68 23.86 10.46 1.53 2.35 5.35
lgamma H 301.19 100.23 61.53 41.88 3.00 4.90 7.19
log10 C 136.15 38.21 28.82 16.52 3.56 4.72 8.24
log1p H 73.66 41.88 27.83 16.58 1.76 2.65 4.44
log21p H - - 24.96 14.49 - - -
log2 C 71.91 - 31.93 21.24 - 2.25 3.39
log C 113.75 38.20 28.45 16.53 2.98 4.00 6.88
pow J 189.41 133.48 85.54 63.04 1.42 2.21 3.00
qdrt C - - 26.67 19.53 - - -
rcbrt D - - 30.34 24.14 - - -
recip D - - 8.95 3.33 - - -
rqdrt C - - 16.30 7.14 - - -
rsqrt C - 29.13 11.70 4.64 - - -
sincos D - 66.60 50.62 15.69 - - -
sincos B - 28.73 22.99 15.73 - - -
sin D 181.58 51.73 43.72 14.80 3.51 4.15 12.27
sin B 97.68 21.11 18.88 14.84 4.63 5.17 6.58
sinh D 103.05 44.71 30.02 21.27 2.31 3.43 4.85
sqrt C 18.87 29.36 13.33 5.69 0.64 1.42 3.31
tan D 235.22 44.34 35.96 20.20 5.30 6.54 11.64
tan B 189.96 27.24 30.71 17.21 6.97 6.19 11.04
tanh F 118.09 60.80 42.17 23.59 1.94 2.80 5.01
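Rows in this layout can be read programmatically by splitting on whitespace and treating a hyphen as a missing entry. The sketch below assumes the seven-column POWER9 layout shown above; the names COLUMNS and parse_row are hypothetical. It does not apply to the POWER10 tables as printed here, where missing entries are blank and a positional split would assign values to the wrong columns.

```python
COLUMNS = ("libm", "mass", "simdp9", "vp9", "libm/mass", "libm/simdp9", "libm/vp9")


def parse_row(line):
    """Split a table row into (function, range key, {column: value or None})."""
    fields = line.split()
    name, range_key, values = fields[0], fields[1], fields[2:]
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} values, got {len(values)}")
    data = {c: (None if v == "-" else float(v)) for c, v in zip(COLUMNS, values)}
    return name, range_key, data


name, key, row = parse_row("copysign D 32.41 8.07 - - 4.02 - -")
print(name, key, row["libm"], row["mass"], row["simdp9"])

# Cross-check the printed speedup against the cycle counts
print(round(row["libm"] / row["mass"], 2) == row["libm/mass"])
```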
POWER9 Performance -- single-precision functions
(cycles per element, vector length 1024)
====== speedup =====
========== cycles ========== libm/ libm/ libm/
Function Range libm mass simdp9 vp9 mass simdp9 vp9
===================================================================
acos B 76.26 45.30 14.00 6.75 1.68 5.45 11.30
acosh G 129.07 47.70 19.75 7.34 2.71 6.54 17.58
asin B 69.61 43.50 14.29 6.72 1.60 4.87 10.36
asinh D 122.57 53.69 25.94 8.79 2.28 4.73 13.95
atan2 D 161.18 90.47 12.94 6.33 1.78 12.46 25.47
atan D 59.65 50.35 10.02 4.89 1.18 5.95 12.21
atanh B 154.96 49.95 17.93 10.99 3.10 8.64 14.10
cbrt D 105.58 38.56 12.58 6.08 2.74 8.40 17.36
copysign D 34.98 13.17 - - 2.66 - -
cos D 139.51 60.90 14.16 5.84 2.29 9.85 23.88
cos B 71.18 16.67 14.21 5.90 4.27 5.01 12.06
cosh D 71.72 37.32 14.30 6.76 1.92 5.01 10.61
cosisin D - - 13.22 6.78 - - -
cosisin B - - 12.29 6.78 - - -
div D - - 6.51 2.70 - - -
erfc C 73.37 27.39 33.05 17.70 2.68 2.22 4.15
erf C 37.37 21.95 15.41 7.48 1.70 2.43 5.00
exp2 D 97.70 - 11.91 5.49 - 8.20 17.79
exp2m1 D - - 11.39 4.99 - - -
exp D 59.22 30.26 12.34 5.80 1.96 4.80 10.20
expm1 D 89.37 41.47 11.58 5.33 2.16 7.72 16.77
hypot D 32.32 24.04 10.82 4.88 1.34 2.99 6.63
lgamma H 253.97 156.32 28.36 15.78 1.62 8.95 16.10
log10 C 119.85 31.39 13.55 6.14 3.82 8.85 19.51
log1p H 96.14 72.11 12.91 5.91 1.33 7.44 16.27
log21p H - - 12.39 5.83 - - -
log2 C 76.49 - 13.03 5.81 - 5.87 13.16
log C 78.59 31.38 13.54 6.15 2.50 5.80 12.78
pow J 271.29 109.89 31.61 44.20 2.47 8.58 6.14
qdrt C - - 6.49 2.61 - - -
rcbrt D - - 14.30 6.82 - - -
recip D - - 2.97 1.28 - - -
rqdrt C - - 5.94 2.43 - - -
rsqrt C - - 3.94 1.66 - - -
sincos D - - 12.67 6.27 - - -
sincos B - - 12.73 6.26 - - -
sin D 59.91 61.45 13.42 5.47 0.97 4.46 10.95
sin B 37.53 21.93 13.48 5.53 1.71 2.78 6.79
sinh D 94.85 39.89 18.41 10.65 2.38 5.15 8.91
sqrt C 15.16 - 5.13 2.12 - 2.95 7.14
tan D 171.98 50.79 23.21 12.83 3.39 7.41 13.41
tan B 107.45 20.05 23.30 12.90 5.36 4.61 8.33
tanh F 142.69 48.97 14.28 6.75 2.91 9.99 21.13
POWER8 Performance
POWER8 Performance -- double-precision functions
(cycles per element, vector length 1024)
====== speedup =====
========== cycles ========== libm/ libm/ libm/
Function Range libm mass simdp8 vp8 mass simdp8 vp8
===================================================================
acos B 111.91 56.54 36.05 27.56 1.98 3.10 4.06
acosh G 325.25 94.24 46.24 18.63 3.45 7.03 17.46
anint A - 19.49 - - - - -
asin B 114.50 55.49 39.69 27.59 2.06 2.89 4.15
asinh D 294.40 95.35 56.00 20.73 3.09 5.26 14.20
atan2 D 259.30 81.63 65.25 38.15 3.18 3.97 6.80
atan D 108.14 65.89 37.12 30.36 1.64 2.91 3.56
atanh B 173.98 77.77 35.18 20.30 2.24 4.95 8.57
cbrt D 225.12 48.32 32.50 14.72 4.66 6.93 15.29
copysign D 42.71 11.11 - - 3.84 - -
cos D 380.42 49.69 37.32 10.77 7.66 10.19 35.34
cos B 317.25 20.36 15.04 10.70 15.58 21.09 29.65
cosh D 342.38 44.62 26.13 20.82 7.67 13.10 16.44
cosisin D - 63.03 33.72 11.91 - - -
cosisin B - 33.84 20.41 11.98 - - -
div D - - 14.30 7.45 - - -
erfc C 220.13 45.18 51.91 61.19 4.87 4.24 3.60
erf C 69.03 23.85 32.63 16.77 2.89 2.12 4.12
exp2 D 279.70 - 23.85 13.84 - 11.73 20.20
exp2m1 D - - 35.10 19.89 - - -
exp D 334.48 30.62 23.78 11.12 10.92 14.06 30.07
expm1 D 138.01 49.05 27.45 15.57 2.81 5.03 8.87
hypot D 220.40 43.78 22.94 7.79 5.03 9.61 28.30
lgamma H 633.50 105.72 74.14 40.34 5.99 8.55 15.71
log10 C 290.30 36.76 27.05 14.28 7.90 10.73 20.32
log1p H 119.58 43.83 25.26 15.76 2.73 4.73 7.59
log21p H - - 24.68 13.62 - - -
log2 C - - 31.86 18.36 - - -
log C 228.86 36.89 26.86 14.27 6.20 8.52 16.04
pow J 389.52 118.37 77.52 56.18 3.29 5.02 6.93
qdrt C - - 21.41 20.36 - - -
rcbrt D - - 32.99 27.69 - - -
recip D - - 7.90 2.47 - - -
rqdrt C - - 12.91 5.26 - - -
rsqrt C - 37.23 8.98 3.17 - - -
sincos D - 57.95 39.08 12.06 - - -
sincos B - 29.24 19.85 12.01 - - -
sin D 388.78 51.80 36.29 11.36 7.50 10.71 34.22
sin B 302.70 21.25 15.78 11.30 14.24 19.18 26.79
sinh D 332.38 41.27 26.03 18.05 8.05 12.77 18.42
sqrt C 28.13 37.10 10.30 3.87 0.76 2.73 7.27
tan D 472.61 46.19 34.28 15.45 10.23 13.79 30.58
tan B 400.18 43.70 32.69 15.30 9.16 12.24 26.16
tanh F 224.74 55.07 38.48 20.89 4.08 5.84 10.76
POWER8 Performance -- single-precision functions
(cycles per element, vector length 1024)
====== speedup =====
========== cycles ========== libm/ libm/ libm/
Function Range libm mass simdp8 vp8 mass simdp8 vp8
===================================================================
acos B 95.16 40.58 16.50 6.61 2.35 5.77 14.4
acosh G 164.97 52.98 18.25 6.19 3.11 9.04 26.65
asin B 94.47 40.38 16.25 6.33 2.34 5.81 14.93
asinh D 158.47 55.66 22.33 7.29 2.85 7.09 21.73
atan2 D 191.34 85.53 14.46 5.78 2.24 13.24 33.11
atan D 72.61 47.25 10.22 3.84 1.54 7.11 18.9
atanh B 179.43 56.24 17.38 8.99 3.19 10.33 19.95
cbrt D 178.71 45.32 11.76 5.54 3.94 15.19 32.27
copysign D 42.26 13.10 - - 3.23 - -
cos D 152.64 50.16 13.00 5.27 3.04 11.74 28.95
cos B 70.22 17.22 13.00 5.27 4.08 5.4 13.32
cosh D 289.26 40.45 16.50 5.10 7.15 17.54 56.72
cosisin D - - 13.90 5.72 - - -
cosisin B - - 13.84 5.68 - - -
div D - - 5.30 1.95 - - -
erfc C 153.28 29.65 45.63 18.75 5.17 3.36 8.18
erf C 61.76 22.96 15.31 7.21 2.69 4.03 8.56
exp2 D 271.28 - 12.25 4.34 - 22.14 62.52
exp2m1 D - - 10.96 3.92 - - -
exp D 312.53 30.69 13.84 4.53 10.18 22.57 68.94
expm1 D 130.19 37.03 12.69 4.38 3.52 10.26 29.72
hypot D 83.99 29.5 9.96 3.61 2.85 8.43 23.28
lgamma H 332.99 146.75 32.13 16.31 2.27 10.36 20.42
log10 C 157.29 33.9 14.49 5.76 4.64 10.85 27.30
log1p H 117.1 69.99 13.38 4.77 1.67 8.75 24.55
log21p H - - 12.26 5.02 - - -
log2 C 115.33 - 13.49 5.76 - 8.55 20.02
log C 103.72 33.82 14.4 5.71 3.07 7.2 18.17
pow J 418.22 101.55 27.27 43.29 4.12 15.33 9.66
qdrt C - - 4.86 1.92 - - -
rcbrt D - - 15.06 6.19 - - -
recip D - - 2.80 0.92 - - -
rqdrt C - - 4.48 1.77 - - -
rsqrt C - - 2.93 1.16 - - -
sincos D - - 13.62 5.67 - - -
sincos B - - 13.48 5.64 - - -
sin D 151.57 51.93 12.04 4.69 2.92 12.59 32.32
sin B 61.35 23.57 12.02 4.68 2.6 5.10 13.11
sinh D 290.02 44.54 17.25 8.65 6.51 16.82 33.51
sqrt C 19.61 - 3.75 1.55 - 5.23 12.62
tan D 207.41 49.28 25.45 10.68 4.21 8.15 19.43
tan B 128.40 27.44 25.44 10.67 4.68 5.05 12.03
tanh F 236.01 40.82 16.11 5.22 5.78 14.65 45.25
Document Information
Modified date: 27 July 2022
UID: swg27049073