What's faster for repacking a mono4 stream?

Post by Nubeat7 »

Quick asm question again.

When optimizing my schematics I repack mono4 streams a lot. Most of the time I only use two channels (stereo), so I often pack two stereo signals (from two mono4 nodes) into one mono4. Instead of unpacking and packing again, I have always used this:

Code:

fld in1[0];   fstp out1n2[0];
fld in1[1];   fstp out1n2[1];
fld in2[0];   fstp out1n2[2];
fld in2[1];   fstp out1n2[3];

but i also could use this:

Code:

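movaps xmm0,in1;        // xmm0 = in1[0..3]
movaps xmm1,in2;        // xmm1 = in2[0..3]
shufps xmm0,xmm1,68;    // xmm0 = in1[0], in1[1], in2[0], in2[1]
movaps out1n2,xmm0;     // same output as in the fld/fstp version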

which I think should be faster. Am I right that the shufps version is faster?
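
For anyone unsure how the immediate 68 selects lanes, here is a minimal Python sketch of what `shufps` does. It is only a model for checking immediates, not FlowStone code: the function name `shufps` and the lane labels are made up for illustration.

```python
def shufps(dst, src, imm8):
    """Model of SSE `shufps dst, src, imm8` on 4-element lists.

    Result lanes 0 and 1 are picked from dst, lanes 2 and 3 from src.
    Each 2-bit field of imm8 is the index of the lane to pick.
    """
    return [
        dst[imm8 & 3],         # bits 0-1 select from dst
        dst[(imm8 >> 2) & 3],  # bits 2-3 select from dst
        src[(imm8 >> 4) & 3],  # bits 4-5 select from src
        src[(imm8 >> 6) & 3],  # bits 6-7 select from src
    ]


# Two mono4 streams that each carry a stereo pair in lanes 0 and 1
# (hypothetical labels; "x" marks unused lanes).
in1 = ["L1", "R1", "x", "x"]
in2 = ["L2", "R2", "x", "x"]

# 68 == 0b01_00_01_00: lanes 0,1 of in1 followed by lanes 0,1 of in2
packed = shufps(in1, in2, 68)
print(packed)
```

The same helper can be used to try out other immediates before putting them into the asm component.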

Post by KG_is_back »

The shufps takes only one cycle on most CPUs. In the first example you read from memory four times and write to memory four times, while in the second you read twice and write once, so it's definitely faster, as far as I can tell.

Have a look at the Opcode reference I made recently. You can also use the Code Speed tester to inspect the actual CPU load.

Post by martinvicanek »

Yes, shufps is much faster. Also avoid the stock Pack and Unpack modules, as they essentially use fld and fstp. The worst example of "Verschlimmbesserung" (German for an "improvement" that makes things worse) is the stock Stereo Clipper, where the overhead of the Pack/Unpack modules far outweighs any potential CPU savings.

Post by Nubeat7 »

Thanks Martin for the confirmation :)

But how do I do it the other way around without fld / fstp?

Say I have one mono4 input (2 x stereo) and want to route it into two mono4 streams again:

Code:

fld in[0]; fstp out1[0];
fld in[1]; fstp out1[1];
fld in[2]; fstp out2[0];
fld in[3]; fstp out2[1];


I couldn't figure out a way to do it with shufps.

Post by martinvicanek »

Like this?

Code:

streamin pack;
streamout out0;
streamout out1;
int true=-1;   // binary 11111111111111111111111111111111
float mask01=0;

stage0;
fld true[0]; fst mask01[0]; fstp mask01[1];   // copy the all-ones pattern into lanes 0 and 1 of mask01

stage2;
movaps xmm0,pack;
movaps xmm1,xmm0;
shufps xmm1,xmm1,78;   // 0123 -> 2301 (23 are first)
andps xmm0,mask01;
movaps out0,xmm0;
andps xmm1,mask01;
movaps out1,xmm1;

Or, depending on what you do with the two outputs further on, you might even drop the masking: ;)

Code:

streamin pack;
streamout out0;
streamout out1;

movaps xmm0,pack;
movaps out0,xmm0;
shufps xmm0,xmm0,78;   // 0123 -> 2301 (23 are first)
movaps out1,xmm0;
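
To see what each output contains in the two variants, here is a rough Python sketch that models `shufps` with immediate 78 and `andps` with a mask whose lanes 0 and 1 are all ones. It is an illustration under assumptions, not FlowStone code: the helper names (`shufps`, `andps`, `split_masked`, `split_unmasked`) and the sample values are hypothetical.

```python
import struct


def shufps(dst, src, imm8):
    """Model of SSE `shufps`: lanes 0,1 come from dst, lanes 2,3 from src."""
    return [
        dst[imm8 & 3],
        dst[(imm8 >> 2) & 3],
        src[(imm8 >> 4) & 3],
        src[(imm8 >> 6) & 3],
    ]


def andps(lanes, mask_bits):
    """Bitwise AND of each 32-bit float lane with an integer bit mask."""
    out = []
    for value, mask in zip(lanes, mask_bits):
        bits = struct.unpack("<I", struct.pack("<f", value))[0] & mask
        out.append(struct.unpack("<f", struct.pack("<I", bits))[0])
    return out


ALL_ONES = 0xFFFFFFFF
MASK01 = [ALL_ONES, ALL_ONES, 0, 0]  # keeps lanes 0 and 1, clears 2 and 3


def split_masked(pack):
    """First variant: both outputs have lanes 2 and 3 cleared."""
    swapped = shufps(pack, pack, 78)  # lanes 0123 -> 2301
    return andps(pack, MASK01), andps(swapped, MASK01)


def split_unmasked(pack):
    """Second variant: lanes 2 and 3 still carry the other stereo pair."""
    return list(pack), shufps(pack, pack, 78)


pack = [0.5, -0.5, 0.25, -0.25]  # L1, R1, L2, R2 (exactly representable values)
print(split_masked(pack))
print(split_unmasked(pack))
```

In the unmasked variant the upper two lanes of each output are not silent, so it is only safe when whatever follows ignores lanes 2 and 3.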