Using the SIMD libraries

The MASS SIMD library contains a set of frequently used math intrinsic functions that provide improved performance over the corresponding standard system library functions.

The SIMD libraries shipped with IBM® Open XL C/C++ are listed below:

libmass_simd.a: The generic SIMD library that runs on any supported POWER® processor. It provides balanced performance tuning across the range of processors while favoring POWER9 and Power10. Unless your application requires this portability, use the appropriate architecture-specific library below for maximum performance.
libmass_simdp7.a: Contains functions that are tuned for the POWER7 architecture.
libmass_simdp8.a: Contains functions that are tuned for the POWER8 architecture.
libmass_simdp9.a: Contains functions that are tuned for the POWER9 architecture.
libmass_simdp10.a: Contains functions that are tuned for the Power10 architecture.
libmass_simdp11.a: Contains functions that are tuned for the Power11 architecture.

To use the MASS SIMD functions, follow these steps:

1. Provide the prototypes for the functions by including mass_simd.h in your source files.
2. Choose the appropriate MASS SIMD library from the list above and link it with your application. For instructions, see Compiling and linking a program with MASS.
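
The two steps above can be sketched in C as follows. This is a minimal illustration, not a definitive build recipe: `recip_pair` and `HAVE_MASS_SIMD` are hypothetical names introduced here, the `__VSX__` and `__has_include` checks are an assumed way to detect a suitable target, and the `-lmass_simd` link flag is inferred from the library file name by the usual linker convention. When the header is not available, the sketch falls back to plain division so that it still builds.

```c
#include <assert.h>
#include <stddef.h>

/* Enable the MASS path only when compiling for a VSX-capable POWER target
   and mass_simd.h is actually installed. HAVE_MASS_SIMD is a hypothetical
   macro introduced for this sketch. */
#if defined(__VSX__) && defined(__has_include)
#  if __has_include(<mass_simd.h>)
#    include <mass_simd.h>   /* step 1: prototypes such as recipd2 */
#    define HAVE_MASS_SIMD 1
#  endif
#endif

/* Hypothetical helper: out[k] = 1 / in[k] for k = 0, 1. */
void recip_pair(const double in[2], double out[2])
{
    assert(in != NULL && out != NULL);
#ifdef HAVE_MASS_SIMD
    vector double vx = { in[0], in[1] };
    /* step 2: recipd2 is resolved at link time, for example with
       -lmass_simd (assumed flag, derived from libmass_simd.a). */
    vector double vr = recipd2(vx);
    out[0] = vr[0];
    out[1] = vr[1];
#else
    /* Portable fallback so the sketch also builds without MASS. */
    out[0] = 1.0 / in[0];
    out[1] = 1.0 / in[1];
#endif
}
```

Calling `recip_pair` with `{2.0, 4.0}` should yield values close to `0.5` and `0.25`. Results from the MASS path may differ from the fallback in the last bits.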

The single-precision MASS SIMD functions accept single-precision arguments and return single-precision results. Likewise, the double-precision MASS SIMD functions accept double-precision arguments and return double-precision results. They are summarized in Table 1.
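
As the `x={x1,x2,x3,x4}` notation in the table suggests, a `vector float` argument carries four single-precision elements, while a `vector double` carries two. The sketch below shows one possible way to apply `divf4` to an array whose length is not a multiple of four. `div_floats` and `HAVE_MASS_SIMD` are hypothetical names, and the target detection is an assumption. Leftover elements, and all elements when MASS is unavailable, are handled by a scalar loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical guard: use MASS only on a VSX-capable POWER target
   where mass_simd.h is installed. */
#if defined(__VSX__) && defined(__has_include)
#  if __has_include(<mass_simd.h>)
#    include <mass_simd.h>
#    define HAVE_MASS_SIMD 1
#  endif
#endif

/* Hypothetical helper: q[i] = a[i] / b[i] for i in [0, n). */
void div_floats(const float *a, const float *b, float *q, size_t n)
{
    assert(a != NULL && b != NULL && q != NULL);
    size_t i = 0;
#ifdef HAVE_MASS_SIMD
    /* Process full groups of four floats with the single-precision function. */
    for (; i + 4 <= n; i += 4) {
        vector float va = { a[i], a[i + 1], a[i + 2], a[i + 3] };
        vector float vb = { b[i], b[i + 1], b[i + 2], b[i + 3] };
        vector float vq = divf4(va, vb);
        q[i]     = vq[0];
        q[i + 1] = vq[1];
        q[i + 2] = vq[2];
        q[i + 3] = vq[3];
    }
#endif
    /* Scalar loop: the tail after the vector loop, or the whole array
       when the MASS path is not compiled in. */
    for (; i < n; ++i) {
        q[i] = a[i] / b[i];
    }
}
```

The same pattern would apply to the double-precision functions with a group size of two.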

Table 1. MASS SIMD functions
Double-precision function	Single-precision function	Description	Double-precision function prototype	Single-precision function prototype
acosd2	acosf4	Computes the arc cosine of each element of `vx`.	vector double acosd2 (vector double vx);	vector float acosf4 (vector float vx);
acoshd2	acoshf4	Computes the arc hyperbolic cosine of each element of `vx`.	vector double acoshd2 (vector double vx);	vector float acoshf4 (vector float vx);
asind2	asinf4	Computes the arc sine of each element of `vx`.	vector double asind2 (vector double vx);	vector float asinf4 (vector float vx);
asinhd2	asinhf4	Computes the arc hyperbolic sine of each element of `vx`.	vector double asinhd2 (vector double vx);	vector float asinhf4 (vector float vx);
atand2	atanf4	Computes the arc tangent of each element of `vx`.	vector double atand2 (vector double vx);	vector float atanf4 (vector float vx);
atan2d2	atan2f4	Computes the arc tangent of each element of `vx/vy`.	vector double atan2d2 (vector double vx, vector double vy);	vector float atan2f4 (vector float vx, vector float vy);
atanhd2	atanhf4	Computes the arc hyperbolic tangent of each element of `vx`.	vector double atanhd2 (vector double vx);	vector float atanhf4 (vector float vx);
cbrtd2	cbrtf4	Computes the cube root of each element of `vx`.	vector double cbrtd2 (vector double vx);	vector float cbrtf4 (vector float vx);
cosd2	cosf4	Computes the cosine of each element of `vx`.	vector double cosd2 (vector double vx);	vector float cosf4 (vector float vx);
coshd2	coshf4	Computes the hyperbolic cosine of each element of `vx`.	vector double coshd2 (vector double vx);	vector float coshf4 (vector float vx);
cosisind2	cosisinf4	Computes the cosine and sine of each element of `x` and stores the interleaved results in `*y` and `*z`. `cosisind2 (x,y,z)` sets `*y` and `*z` to `{cos(x1), sin(x1)}` and `{cos(x2), sin(x2)}`, where `x={x1,x2}`. `cosisinf4 (x,y,z)` sets `*y` and `*z` to `{cos(x1), sin(x1), cos(x2), sin(x2)}` and `{cos(x3), sin(x3), cos(x4), sin(x4)}`, where `x={x1,x2,x3,x4}`.	void cosisind2 (vector double x, vector double *y, vector double *z);	void cosisinf4 (vector float x, vector float *y, vector float *z);
divd2	divf4	Computes the quotient `vx/vy`.	vector double divd2 (vector double vx, vector double vy);	vector float divf4 (vector float vx, vector float vy);
erfcd2	erfcf4	Computes the complementary error function of each element of `vx`.	vector double erfcd2 (vector double vx);	vector float erfcf4 (vector float vx);
erfd2	erff4	Computes the error function of each element of `vx`.	vector double erfd2 (vector double vx);	vector float erff4 (vector float vx);
expd2	expf4	Computes the exponential function of each element of `vx`.	vector double expd2 (vector double vx);	vector float expf4 (vector float vx);
exp2d2	exp2f4	Computes 2 raised to the power of each element of `vx`.	vector double exp2d2 (vector double vx);	vector float exp2f4 (vector float vx);
expm1d2	expm1f4	Computes (the exponential function of each element of `vx`) - 1.	vector double expm1d2 (vector double vx);	vector float expm1f4 (vector float vx);
exp2m1d2	exp2m1f4	Computes (2 raised to the power of each element of `vx`) - 1.	vector double exp2m1d2 (vector double vx);	vector float exp2m1f4 (vector float vx);
hypotd2	hypotf4	For each element `x` of `vx` and the corresponding element `y` of `vy`, computes `sqrt(x*x+y*y)`.	vector double hypotd2 (vector double vx, vector double vy);	vector float hypotf4 (vector float vx, vector float vy);
lgammad2	lgammaf4	Computes the natural logarithm of the absolute value of the Gamma function of each element of `vx`.	vector double lgammad2 (vector double vx);	vector float lgammaf4 (vector float vx);
logd2	logf4	Computes the natural logarithm of each element of `vx`.	vector double logd2 (vector double vx);	vector float logf4 (vector float vx);
log2d2	log2f4	Computes the base-2 logarithm of each element of `vx`.	vector double log2d2 (vector double vx);	vector float log2f4 (vector float vx);
log10d2	log10f4	Computes the base-10 logarithm of each element of `vx`.	vector double log10d2 (vector double vx);	vector float log10f4 (vector float vx);
log1pd2	log1pf4	Computes the natural logarithm of each element of `(vx + 1)`.	vector double log1pd2 (vector double vx);	vector float log1pf4 (vector float vx);
log21pd2	log21pf4	Computes the base-2 logarithm of each element of `(vx + 1)`.	vector double log21pd2 (vector double vx);	vector float log21pf4 (vector float vx);
powd2	powf4	Computes each element of `vx` raised to the power of the corresponding element of `vy`.	vector double powd2 (vector double vx, vector double vy);	vector float powf4 (vector float vx, vector float vy);
qdrtd2	qdrtf4	Computes the fourth root of each element of `vx`.	vector double qdrtd2 (vector double vx);	vector float qdrtf4 (vector float vx);
rcbrtd2	rcbrtf4	Computes the reciprocal of the cube root of each element of `vx`.	vector double rcbrtd2 (vector double vx);	vector float rcbrtf4 (vector float vx);
recipd2	recipf4	Computes the reciprocal of each element of `vx`.	vector double recipd2 (vector double vx);	vector float recipf4 (vector float vx);
rqdrtd2	rqdrtf4	Computes the reciprocal of the fourth root of each element of `vx`.	vector double rqdrtd2 (vector double vx);	vector float rqdrtf4 (vector float vx);
rsqrtd2	rsqrtf4	Computes the reciprocal of the square root of each element of `vx`.	vector double rsqrtd2 (vector double vx);	vector float rsqrtf4 (vector float vx);
sincosd2	sincosf4	Computes the sine and cosine of each element of `vx`, storing the sines in `*vs` and the cosines in `*vc`.	void sincosd2 (vector double vx, vector double *vs, vector double *vc);	void sincosf4 (vector float vx, vector float *vs, vector float *vc);
sind2	sinf4	Computes the sine of each element of `vx`.	vector double sind2 (vector double vx);	vector float sinf4 (vector float vx);
sinhd2	sinhf4	Computes the hyperbolic sine of each element of `vx`.	vector double sinhd2 (vector double vx);	vector float sinhf4 (vector float vx);
sqrtd2	sqrtf4	Computes the square root of each element of `vx`.	vector double sqrtd2 (vector double vx);	vector float sqrtf4 (vector float vx);
tand2	tanf4	Computes the tangent of each element of `vx`.	vector double tand2 (vector double vx);	vector float tanf4 (vector float vx);
tanhd2	tanhf4	Computes the hyperbolic tangent of each element of `vx`.	vector double tanhd2 (vector double vx);	vector float tanhf4 (vector float vx);
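
The interleaved result layout described for `cosisinf4` in Table 1 can be easy to misread. The following sketch models only that element ordering in portable C. It does not call MASS, and it takes the cosine and sine values as inputs so that the focus stays on where each value lands. The function name `cosisin_layout_f4` and the use of plain arrays in place of vector types are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the cosisinf4 result layout.
   c[k] and s[k] stand for cos(x[k]) and sin(x[k]), computed elsewhere.
   y receives the pairs for elements 0 and 1, z the pairs for 2 and 3. */
void cosisin_layout_f4(const float c[4], const float s[4],
                       float y[4], float z[4])
{
    assert(c != NULL && s != NULL && y != NULL && z != NULL);
    for (size_t k = 0; k < 2; ++k) {
        y[2 * k]     = c[k];       /* cos of element k     */
        y[2 * k + 1] = s[k];       /* sin of element k     */
        z[2 * k]     = c[k + 2];   /* cos of element k + 2 */
        z[2 * k + 1] = s[k + 2];   /* sin of element k + 2 */
    }
}
```

With placeholder inputs `c = {1, 2, 3, 4}` and `s = {10, 20, 30, 40}`, the model fills `y` with `{1, 10, 2, 20}` and `z` with `{3, 30, 4, 40}`, matching the `{cos(x1), sin(x1), cos(x2), sin(x2)}` and `{cos(x3), sin(x3), cos(x4), sin(x4)}` ordering.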