Using the SIMD libraries

The MASS SIMD library contains a set of frequently used math intrinsic functions that provide improved performance over the corresponding standard system library functions.

The SIMD libraries shipped with IBM® Open XL C/C++ are listed below:
libmass_simd.a
The generic SIMD library that runs on any supported POWER® processor. It provides balanced performance tuning across the range of processors while favoring POWER9 and Power10. Unless your application requires this portability, use the appropriate architecture-specific library below for maximum performance.
libmass_simdp7.a
Contains functions that are tuned for the POWER7 architecture.
libmass_simdp8.a
Contains functions that are tuned for the POWER8 architecture.
libmass_simdp9.a
Contains functions that are tuned for the POWER9 architecture.
libmass_simdp10.a
Contains functions that are tuned for the Power10 architecture.
libmass_simdp11.a
Contains functions that are tuned for the Power11 architecture.
To use the MASS SIMD functions, follow this procedure:
  1. Provide the prototypes for the functions by including mass_simd.h in your source files.
  2. Choose the appropriate MASS SIMD library above and link it with your application. For instructions, see Compiling and linking a program with MASS.
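
As a rough illustration of the two steps above, the following C sketch divides two arrays element-wise with divd2, processing two doubles per call. It is not taken from the product documentation: the helper name div_array and the HAVE_MASS_SIMD macro are hypothetical, and the sketch assumes a compiler that supports __has_include and GCC-style vector extensions. Where mass_simd.h cannot be found, a stand-in divd2 built on plain vector division is used so that the example still compiles; the stand-in only models the interface, not the MASS implementation or its accuracy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Use the real header only when it can be found on a VSX-capable target.
   HAVE_MASS_SIMD is a hypothetical macro that is local to this sketch. */
#ifdef __has_include
#  if __has_include(<mass_simd.h>) && defined(__VSX__)
#    define HAVE_MASS_SIMD 1
#  endif
#endif

#ifdef HAVE_MASS_SIMD
#  include <altivec.h>     /* assumption: provides the `vector` keyword in C */
#  include <mass_simd.h>   /* step 1: prototypes such as divd2 */
typedef vector double v2d;
#else
/* Stand-in so the sketch builds without MASS; NOT the MASS implementation. */
typedef double v2d __attribute__((vector_size(16)));
static v2d divd2(v2d vx, v2d vy) { return vx / vy; }
#endif

/* Hypothetical helper: q[i] = a[i] / b[i] for n doubles, two lanes per call. */
void div_array(const double *a, const double *b, double *q, size_t n)
{
    size_t i = 0;

    assert(a != NULL && b != NULL && q != NULL);

    for (; i + 2 <= n; i += 2) {
        v2d va, vb, vq;
        memcpy(&va, a + i, sizeof va);   /* load without alignment assumptions */
        memcpy(&vb, b + i, sizeof vb);
        vq = divd2(va, vb);              /* one call handles two doubles */
        memcpy(q + i, &vq, sizeof vq);
    }
    if (i < n) {
        q[i] = a[i] / b[i];              /* scalar tail when n is odd */
    }
}
```

When the real header is used, step 2 still applies: the chosen MASS SIMD library must be linked as described in Compiling and linking a program with MASS.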

The single-precision MASS SIMD functions accept single-precision arguments and return single-precision results. Likewise, the double-precision MASS SIMD functions accept double-precision arguments and return double-precision results. Both sets of functions are summarized in Table 1.
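
To make the single-precision and double-precision pairing concrete, the sketch below models one such pair, recipd2 and recipf4, with GCC-style 128-bit vector types. It assumes that a 128-bit vector holds two doubles or four floats, which appears to be what the d2 and f4 suffixes reflect. All names starting with model_, the LANES_ constants, and recip_precision_gap are hypothetical; the model functions use plain vector division and do not call MASS.

```c
#include <assert.h>

/* 128-bit vector types standing in for `vector double` and `vector float`. */
typedef double model_v2d __attribute__((vector_size(16)));
typedef float  model_v4f __attribute__((vector_size(16)));

/* Number of elements handled per call at each precision. */
enum {
    LANES_D2 = sizeof(model_v2d) / sizeof(double),
    LANES_F4 = sizeof(model_v4f) / sizeof(float)
};

/* Models of the recipd2/recipf4 interfaces; NOT the MASS implementations. */
static model_v2d model_recipd2(model_v2d vx)
{
    model_v2d one = {1.0, 1.0};
    return one / vx;
}

static model_v4f model_recipf4(model_v4f vx)
{
    model_v4f one = {1.0f, 1.0f, 1.0f, 1.0f};
    return one / vx;
}

/* Difference between the single-precision and double-precision results
   for 1/x, to show that each function works in its own precision. */
double recip_precision_gap(double x)
{
    assert(x != 0.0);

    model_v2d vd = {x, x};
    model_v4f vf = {(float)x, (float)x, (float)x, (float)x};
    model_v2d rd = model_recipd2(vd);
    model_v4f rf = model_recipf4(vf);

    return (double)rf[0] - rd[0];
}
```

For an input such as 2.0 the two results agree exactly, while for 3.0 the single-precision result differs from the double-precision one, because 1/3 is rounded to fewer bits in a float.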

Table 1. MASS SIMD functions
Columns, in order: Double-precision function; Single-precision function; Description; Double-precision function prototype; Single-precision function prototype
acosd2 acosf4 Computes the arc cosine of each element of vx. vector double acosd2 (vector double vx); vector float acosf4 (vector float vx);
acoshd2 acoshf4 Computes the arc hyperbolic cosine of each element of vx. vector double acoshd2 (vector double vx); vector float acoshf4 (vector float vx);
asind2 asinf4 Computes the arc sine of each element of vx. vector double asind2 (vector double vx); vector float asinf4 (vector float vx);
asinhd2 asinhf4 Computes the arc hyperbolic sine of each element of vx. vector double asinhd2 (vector double vx); vector float asinhf4 (vector float vx);
atand2 atanf4 Computes the arc tangent of each element of vx. vector double atand2 (vector double vx); vector float atanf4 (vector float vx);
atan2d2 atan2f4 Computes the arc tangent of vx/vy for each element of vx and the corresponding element of vy. vector double atan2d2 (vector double vx, vector double vy); vector float atan2f4 (vector float vx, vector float vy);
atanhd2 atanhf4 Computes the arc hyperbolic tangent of each element of vx. vector double atanhd2 (vector double vx); vector float atanhf4 (vector float vx);
cbrtd2 cbrtf4 Computes the cube root of each element of vx. vector double cbrtd2 (vector double vx); vector float cbrtf4 (vector float vx);
cosd2 cosf4 Computes the cosine of each element of vx. vector double cosd2 (vector double vx); vector float cosf4 (vector float vx);
coshd2 coshf4 Computes the hyperbolic cosine of each element of vx. vector double coshd2 (vector double vx); vector float coshf4 (vector float vx);
cosisind2 cosisinf4 Computes the cosine and sine of each element of x, and stores the results in y and z as follows: cosisind2 (x,y,z) sets y and z to {cos(x1), sin(x1)} and {cos(x2), sin(x2)} where x={x1,x2}; cosisinf4 (x,y,z) sets y and z to {cos(x1), sin(x1), cos(x2), sin(x2)} and {cos(x3), sin(x3), cos(x4), sin(x4)} where x={x1,x2,x3,x4}. void cosisind2 (vector double x, vector double *y, vector double *z); void cosisinf4 (vector float x, vector float *y, vector float *z);
divd2 divf4 Computes the quotient vx/vy for each element of vx and the corresponding element of vy. vector double divd2 (vector double vx, vector double vy); vector float divf4 (vector float vx, vector float vy);
erfcd2 erfcf4 Computes the complementary error function of each element of vx. vector double erfcd2 (vector double vx); vector float erfcf4 (vector float vx);
erfd2 erff4 Computes the error function of each element of vx. vector double erfd2 (vector double vx); vector float erff4 (vector float vx);
expd2 expf4 Computes the exponential function of each element of vx. vector double expd2 (vector double vx); vector float expf4 (vector float vx);
exp2d2 exp2f4 Computes 2 raised to the power of each element of vx. vector double exp2d2 (vector double vx); vector float exp2f4 (vector float vx);
expm1d2 expm1f4 Computes (the exponential function of each element of vx) - 1. vector double expm1d2 (vector double vx); vector float expm1f4 (vector float vx);
exp2m1d2 exp2m1f4 Computes (2 raised to the power of each element of vx) - 1. vector double exp2m1d2 (vector double vx); vector float exp2m1f4 (vector float vx);
hypotd2 hypotf4 For each element x of vx and the corresponding element y of vy, computes sqrt(x*x+y*y). vector double hypotd2 (vector double vx, vector double vy); vector float hypotf4 (vector float vx, vector float vy);
lgammad2 lgammaf4 Computes the natural logarithm of the absolute value of the Gamma function of each element of vx. vector double lgammad2 (vector double vx); vector float lgammaf4 (vector float vx);
logd2 logf4 Computes the natural logarithm of each element of vx. vector double logd2 (vector double vx); vector float logf4 (vector float vx);
log2d2 log2f4 Computes the base-2 logarithm of each element of vx. vector double log2d2 (vector double vx); vector float log2f4 (vector float vx);
log10d2 log10f4 Computes the base-10 logarithm of each element of vx. vector double log10d2 (vector double vx); vector float log10f4 (vector float vx);
log1pd2 log1pf4 Computes the natural logarithm of each element of (vx + 1). vector double log1pd2 (vector double vx); vector float log1pf4 (vector float vx);
log21pd2 log21pf4 Computes the base-2 logarithm of each element of (vx + 1). vector double log21pd2 (vector double vx); vector float log21pf4 (vector float vx);
powd2 powf4 Computes each element of vx raised to the power of the corresponding element of vy. vector double powd2 (vector double vx, vector double vy); vector float powf4 (vector float vx, vector float vy);
qdrtd2 qdrtf4 Computes the quad (fourth) root of each element of vx. vector double qdrtd2 (vector double vx); vector float qdrtf4 (vector float vx);
rcbrtd2 rcbrtf4 Computes the reciprocal of the cube root of each element of vx. vector double rcbrtd2 (vector double vx); vector float rcbrtf4 (vector float vx);
recipd2 recipf4 Computes the reciprocal of each element of vx. vector double recipd2 (vector double vx); vector float recipf4 (vector float vx);
rqdrtd2 rqdrtf4 Computes the reciprocal of the quad (fourth) root of each element of vx. vector double rqdrtd2 (vector double vx); vector float rqdrtf4 (vector float vx);
rsqrtd2 rsqrtf4 Computes the reciprocal of the square root of each element of vx. vector double rsqrtd2 (vector double vx); vector float rsqrtf4 (vector float vx);
sincosd2 sincosf4 Computes the sine and cosine of each element of vx, and stores the sines in *vs and the cosines in *vc. void sincosd2 (vector double vx, vector double *vs, vector double *vc); void sincosf4 (vector float vx, vector float *vs, vector float *vc);
sind2 sinf4 Computes the sine of each element of vx. vector double sind2 (vector double vx); vector float sinf4 (vector float vx);
sinhd2 sinhf4 Computes the hyperbolic sine of each element of vx. vector double sinhd2 (vector double vx); vector float sinhf4 (vector float vx);
sqrtd2 sqrtf4 Computes the square root of each element of vx. vector double sqrtd2 (vector double vx); vector float sqrtf4 (vector float vx);
tand2 tanf4 Computes the tangent of each element of vx. vector double tand2 (vector double vx); vector float tanf4 (vector float vx);
tanhd2 tanhf4 Computes the hyperbolic tangent of each element of vx. vector double tanhd2 (vector double vx); vector float tanhf4 (vector float vx);
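
The interleaved output described for cosisind2 and cosisinf4 in Table 1 is easy to misread, so the following C sketch models only the layout. It takes already-computed cosines c[i] and sines s[i] of the input elements and places them in y and z in the documented order. The function names are hypothetical, plain arrays stand in for the vector types, and nothing here calls MASS or computes any trigonometric value.

```c
#include <assert.h>
#include <stddef.h>

/* Layout model for cosisinf4: c[i] and s[i] stand for cos(x[i]) and
   sin(x[i]) of the four input elements (x1..x4 in Table 1).
   y receives {c1, s1, c2, s2}; z receives {c3, s3, c4, s4}. */
void model_cosisinf4_layout(const float c[4], const float s[4],
                            float y[4], float z[4])
{
    assert(c != NULL && s != NULL && y != NULL && z != NULL);

    for (size_t i = 0; i < 2; ++i) {
        y[2 * i]     = c[i];       /* cosine of element i   */
        y[2 * i + 1] = s[i];       /* sine of element i     */
        z[2 * i]     = c[i + 2];   /* cosine of element i+2 */
        z[2 * i + 1] = s[i + 2];   /* sine of element i+2   */
    }
}

/* Layout model for cosisind2: two input elements (x1, x2 in Table 1).
   y receives {c1, s1}; z receives {c2, s2}. */
void model_cosisind2_layout(const double c[2], const double s[2],
                            double y[2], double z[2])
{
    assert(c != NULL && s != NULL && y != NULL && z != NULL);

    y[0] = c[0];
    y[1] = s[0];
    z[0] = c[1];
    z[1] = s[1];
}
```

In both models each output holds (cosine, sine) pairs, so the results for one input element always sit next to each other. This differs from sincosd2 and sincosf4, whose prototypes return the sines and cosines through two separate pointers.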