Fast Stream Array Access

martinvicanek
Posts: 1334
Joined: Sat Jun 22, 2013 8:28 pm

Post by martinvicanek »

Following KG's excellent ASM posts over at FS Guru, I stumbled upon a way to cut the CPU load of stream array access considerably. As an example I am attaching a low-CPU delay (integer and interpolated variants). The design borrows from Trogz Toolz, which has some smart and highly optimized stuff. Hard to believe there was still a factor of 3(!) left to gain.

Boy, this opens up possibilities: fast lookup tables, fast wavetable oscillators, you name it.
Attachments
fastDelay.fsm
(15.65 KiB) Downloaded 1277 times
tester
Posts: 1786
Joined: Wed Jan 18, 2012 10:52 pm
Location: Poland, internet

Post by tester »

That's great news, Martin. Another set of problems that have been "impossible" since the SM age will be solved. I can't wait to see it. :-)
Exo
Posts: 426
Joined: Wed Aug 04, 2010 8:58 pm
Location: UK

Post by Exo »

Excellent, nice work. Yes, stream arrays have always been very slow because of the need to unpack the channels.
This really is a game changer because a huge bottleneck has been removed :)

Gonna have a look and see if I can optimize a few other things with this :)
Exo
Posts: 426
Joined: Wed Aug 04, 2010 8:58 pm
Location: UK

Post by Exo »

Hi Martin, do you think it is possible to do this trick with this code?

Code:

polyintin addr;
polyintin max;
streamin index;
streamout out;

int zero = 0;
int temp = 0;
stage2;
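mov eax,addr[0];
cmp eax,0;
jz bypass; //skip the read if no mem address has been set

  cvtps2dq xmm0,index; //convert the four float indices to integers
  maxps xmm0,zero;     //clamp below at 0
  minps xmm0,max;      //clamp above at max
  pslld xmm0,2;        //index*4 = byte offset (4 bytes per float)
  paddd xmm0,addr;     //add the base address of the mem
  movaps temp,xmm0;    //store the four addresses for per-channel access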
 
  //Read
  mov eax,temp[0];
  fld [eax] ; fstp out[0];

  mov eax,temp[1];
  fld [eax] ; fstp out[1];
 
  mov eax,temp[2];
  fld [eax] ; fstp out[2];

  mov eax,temp[3];
  fld [eax] ; fstp out[3];
   
bypass:


This reads directly from the address of a mem, instead of from the mem input or an array. Here eax holds the actual memory address, and we read the value stored there with [eax]. I know it can work easily with the mem input, because that is copied into a standard code array.
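For readers who do not speak ASM, here is a rough C sketch of what the code above does for each of the four channels: clamp the index, turn it into a byte offset, add the base address, and read the float stored there. The function names are made up for illustration, a 4-byte float is assumed, and the rounding only approximates cvtps2dq (which follows the current SSE rounding mode), so treat it as a sketch rather than an exact translation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Round to the nearest int. Ties go away from zero here, whereas
   cvtps2dq follows the current SSE rounding mode (simplification). */
static int round_index(float x)
{
    return (int)(x < 0.0f ? x - 0.5f : x + 0.5f);
}

/* Hypothetical helper: read one float at base + clamped_index * 4 bytes. */
static float read_at(const float *base, int max_index, float index)
{
    assert(max_index >= 0);
    int i = round_index(index);
    if (i < 0) i = 0;                 /* maxps against zero */
    if (i > max_index) i = max_index; /* minps against max  */

    /* pslld 2 + paddd: byte offset added to the base address */
    const unsigned char *addr = (const unsigned char *)base + ((size_t)i << 2);

    float value;
    memcpy(&value, addr, sizeof value); /* the "[eax]" read */
    return value;
}

/* One read per channel, like the four fld/fstp pairs. */
static void read_four(const float *base, int max_index,
                      const float index[4], float out[4])
{
    if (base == NULL) return; /* mirrors "jz bypass" */
    for (int ch = 0; ch < 4; ++ch)
        out[ch] = read_at(base, max_index, index[ch]);
}
```

Each channel costs one scalar read here, which is exactly the per-channel unpacking the thread is trying to get rid of.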
KG_is_back
Posts: 1196
Joined: Tue Oct 22, 2013 5:43 pm
Location: Slovakia

Post by KG_is_back »

It should be possible, as I posted on FS Guru (http://flowstone.guru/blog/how-to-use-assembler-part-3-alu-fpu-and-array-management/) just after Martin's example post. I haven't tested it, though. In this particular case the problem is a bit more complicated: you need to read values that sit in different channels and move each one into the desired channel. The only way to do that is code branching to pick the right shufps action.
Another concern is what happens when the array size is not a multiple of 4 (in samples), because for the last values movaps (which works on 16-byte aligned data) would also read data outside the mem. That may or may not crash. Further testing is needed...
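The two points above (which channel a sample lands in, and the array length not being a multiple of 4) come down to simple index arithmetic. A minimal C sketch, with hypothetical helper names, assuming the buffer has been padded to a multiple of 4 samples so that a whole-block read never leaves it:

```c
#include <assert.h>
#include <stddef.h>

/* First sample of the 4-sample block that contains `index`
   (clears the lowest two bits of the index). */
static size_t block_start(size_t index) { return index & ~(size_t)3; }

/* Position of the sample inside its block, i.e. index % 4.
   This is the channel the value ends up in after an aligned read. */
static size_t lane_of(size_t index) { return index & 3; }

/* Smallest multiple of 4 that is >= n: the length a buffer needs
   so that every block read stays inside it. */
static size_t padded_length(size_t n) { return (n + 3) & ~(size_t)3; }

/* Read one sample by way of its whole block, as an aligned
   4-float load followed by a shuffle would. */
static float read_via_block(const float *buf, size_t padded_len, size_t index)
{
    assert(padded_len % 4 == 0 && index < padded_len);
    const float *block = buf + block_start(index);
    return block[lane_of(index)];
}
```

For example, index 6 lives in the block starting at sample 4 and sits in lane 2, and a 5-sample table would need to be padded to 8 samples.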
martinvicanek
Posts: 1334
Joined: Sat Jun 22, 2013 8:28 pm

Post by martinvicanek »

Exo wrote:Hi Martin, do you think it is possible to do this trick with this code? [...]

Hehe, that's what I am after as well. So far I have only been able to do this with arrays declared in the same ASM module, though. KG has me lost; I'm curious what he will pull out of his sleeve next.
Last edited by martinvicanek on Sun Oct 19, 2014 9:46 pm, edited 1 time in total.
KG_is_back
Posts: 1196
Joined: Tue Oct 22, 2013 5:43 pm
Location: Slovakia

Post by KG_is_back »

Nope... it seems movaps works only with data that was declared as an SSE array, which is not the case for mems.
martinvicanek
Posts: 1334
Joined: Sat Jun 22, 2013 8:28 pm

Post by martinvicanek »

Okay, that explains it. So could we declare an SSE array and copy the external mem to it in stage0 (basically what the mem input in 3.0.5 does)? Then we'd have fast movaps/shufps access in stage2.
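The copy-once idea can be sketched in C as follows. This is only an analogy to the stage0 copy, not FlowStone code: the struct and function names are invented, and it relies on C11 aligned_alloc. The external data is copied a single time into storage that is 16-byte aligned and zero-padded to a multiple of 4 samples, so that later block reads are both aligned and in bounds.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the local, aligned copy. */
typedef struct {
    float *data;     /* 16-byte aligned storage */
    size_t length;   /* number of valid samples */
    size_t capacity; /* length rounded up to a multiple of 4 */
} AlignedTable;

/* Copy n samples from src into aligned, zero-padded storage.
   Returns 0 on success, -1 if the allocation fails. */
static int aligned_table_init(AlignedTable *t, const float *src, size_t n)
{
    size_t cap = (n + 3) & ~(size_t)3;
    if (cap == 0) cap = 4; /* keep at least one full block */

    /* cap is a multiple of 4, so the byte size is a multiple of 16,
       as aligned_alloc requires. */
    float *p = aligned_alloc(16, cap * sizeof(float));
    if (p == NULL) return -1;
    assert(((uintptr_t)p & 15) == 0);

    if (n > 0) memcpy(p, src, n * sizeof(float));
    memset(p + n, 0, (cap - n) * sizeof(float)); /* zero the padding */

    t->data = p;
    t->length = n;
    t->capacity = cap;
    return 0;
}

static void aligned_table_free(AlignedTable *t)
{
    free(t->data);
    t->data = NULL;
    t->length = 0;
    t->capacity = 0;
}
```

The copy costs time and memory up front, which is the trade-off for cheap aligned access afterwards.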
KG_is_back
Posts: 1196
Joined: Tue Oct 22, 2013 5:43 pm
Location: Slovakia

Post by KG_is_back »

That should do the trick.

BTW here is the code I came up with:

Code:

streamin addr;
streamin max;
streamin index;
streamout out;

int zero = 0;
int temp = 0;
int temp2=0;
int I0=0;
int I1=1;
int I2=2;
int I3=3;
int In4=-4; //binary mask that clears the lowest two bits,
           //i.e. it rounds down to the nearest multiple of 4
           //I3 (declared above) keeps only the lowest two bits, i.e. N%4
float array[4];
stage2;
mov eax,addr[0];
cmp eax,0;
jz bypass;

  cvtps2dq xmm0,index;
  maxps xmm0,zero;
  minps xmm0,max;
  movaps xmm1,xmm0;
  andps xmm0,In4;
  pslld xmm0,2;
  paddd xmm0,addr; //this is the address for the 16-byte aligned read
  movaps temp,xmm0;
  andps xmm1,I3; //this will be used to shuffle the right sample into output
  movaps temp2,xmm1;
  pslld xmm1,4;
  //read for channel1 and store into array
  mov eax,temp[0];
  movaps xmm2,[eax];
  movd eax,xmm1;
  movaps array[eax],xmm2;
 
  //extract values from array and shuffle each value into index[0]
  mov eax,0;
  movaps xmm0,array[eax]; //xmm0 may contain desired value in ch(0) - no shuffling needed
  movaps xmm4,I0;
  cmpps xmm4,temp2,0; //true if index%4==0
  andps xmm0,xmm4; //mask xmm0 (not xmm1, which is reloaded below)
 
  add eax,16;
  movaps xmm1,array[eax]; //xmm1 may contain desired value in ch(1) - shuffle it to 0
  shufps xmm1,xmm1,1;
  movaps xmm4,I1;
  cmpps xmm4,temp2,0; //true if index%4==1
  andps xmm1,xmm4;
 
  add eax,16;
  movaps xmm2,array[eax]; //...
  shufps xmm2,xmm2,2;
  movaps xmm4,I2;
  cmpps xmm4,temp2,0; //true if index%4==2
  andps xmm2,xmm4;
 
  add eax,16;
  movaps xmm3,array[eax];
  shufps xmm3,xmm3,3;
  movaps xmm4,I3;
  cmpps xmm4,temp2,0; //true if index%4==3
  andps xmm3,xmm4;
 
  orps xmm0,xmm1;
  orps xmm0,xmm2;
  orps xmm0,xmm3;
  movaps out,xmm0;
 
bypass:


It does not work because of the movaps xmm0,[eax], but replacing that read with one from a declared array should fix it.
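The selection step in the code above (compare, AND, OR) avoids a jump by building a mask for each candidate. A scalar C sketch of the same idea, with invented helper names: the mask is all ones for the lane that matches and all zeros otherwise, so OR-ing the masked candidates leaves only the wanted value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its 32-bit pattern and back. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

static float bits_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Return block[lane] without an if on the lane:
   mask each candidate, then OR the results together. */
static float select_lane(const float block[4], uint32_t lane)
{
    assert(lane < 4);
    uint32_t result = 0;
    for (uint32_t k = 0; k < 4; ++k) {
        /* 0xFFFFFFFF when k == lane, 0 otherwise (like cmpps) */
        uint32_t mask = (uint32_t)0 - (uint32_t)(k == lane);
        result |= float_bits(block[k]) & mask; /* andps, then orps */
    }
    return bits_float(result);
}
```

The loop here is only for brevity; the ASM unrolls the four cases and does the masking for all four channels at once.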
Exo
Posts: 426
Joined: Wed Aug 04, 2010 8:58 pm
Location: UK

Post by Exo »

KG_is_back wrote:it does not work because of the movaps xmm0,[eax] but replacing that with array should fix it.


Yes, movaps xmm0,[eax]; is the first thing I tried. Shame really. Should it work?

I was going to ask you guys: are there any opcodes you really want/need? If you can give clear examples of the benefits of certain opcodes, I could get on to Malc to add them (I'm usually quite good at getting him to add little things if I give him a clear example and make it simple for him).

Maybe a topic for another thread?