time-domain FIR filter (Convolver)
Posted: Tue Oct 22, 2013 5:48 pm
REPOSTED FROM SYNTHMAKER FORUM
This is a very simple time-domain FIR filter. It loads the coefficients circularly and calculates the output as follows (pseudocode):
Code:
out:=0;
for i:=0 to 255 do
  out:=out+coefficientMemory[i]*inputMemory[i];
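For anyone who wants to try the loop outside the schematic, here is a minimal Python sketch of the same multiply-accumulate over a circular input buffer. The class name CircularFir, its fields, and the choice to read the history backwards from the newest sample are assumptions made for illustration; the schematic itself may order things differently.

```python
class CircularFir:
    """Time-domain FIR filter with a circular input buffer (illustrative sketch)."""

    def __init__(self, coefficients):
        self.coefficients = list(coefficients)   # impulse response, tap 0 first
        self.size = len(self.coefficients)       # 256 in the posted schematic
        self.input_memory = [0.0] * self.size    # circular history of input samples
        self.write_pos = 0                       # integer index of the newest sample

    def process(self, sample):
        # Overwrite the oldest stored sample with the newest one.
        self.input_memory[self.write_pos] = sample
        out = 0.0
        for tap in range(self.size):
            # Tap k is multiplied by the sample that arrived k steps ago.
            read_pos = (self.write_pos - tap) % self.size
            out += self.coefficients[tap] * self.input_memory[read_pos]
        # Move on to the next slot, wrapping around at the end of the buffer.
        self.write_pos = (self.write_pos + 1) % self.size
        return out
```

Feeding a single 1.0 followed by zeros should return the coefficients one by one, which is a quick way to check that the circular indexing is right.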
However, this can be implemented in many different ways in assembly.
The "medium optimized" FIR filter uses the following code to load four packed values from memory:
Code:
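cvtps2dq xmm1,Index; //convert the 4 packed float indexes to integers
movaps smIntVarArrayIndex,xmm1; //store the 4 integer indexes
push eax; //save eax
mov eax,smIntVarArrayIndex[0]; //channel 0: integer index
shl eax,4; //index*16 = byte offset of the group of 4 floats
fld memory[eax]; //load the channel 0 value onto the x87 stack
fstp smIntVarTemp[0]; //store it in channel 0 of the temporary
mov eax,smIntVarArrayIndex[1]; //channel 1: same steps
shl eax,4;
add eax,4; //plus 4 bytes to reach channel 1
fld memory[eax];
fstp smIntVarTemp[1];
mov eax,smIntVarArrayIndex[2]; //channel 2
shl eax,4;
add eax,8; //plus 8 bytes to reach channel 2
fld memory[eax];
fstp smIntVarTemp[2];
mov eax,smIntVarArrayIndex[3]; //channel 3
shl eax,4;
add eax,12; //plus 12 bytes to reach channel 3
fld memory[eax];
fstp smIntVarTemp[3];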
pop eax;
That requires cvtps2dq (conversion of packed single-precision floats to packed doubleword integers), which is quite a CPU-hungry instruction, and also the x87 st() registers to fetch the individual values of the 4-packed data. If you put this in a loop (the FIR filter has to run it several hundred times), the polar bear isn't very happy with you buying a quad-core instead of a dual-core.
To save CPU you can calculate the indexes as integers (using the eax register with add, sub and cmp instructions and conditional jumps) and leave out the conversion. Also, in this case the index is always the same for all 4 channels, so you don't save CPU by using SSE instructions for the index arithmetic (you actually waste CPU by using them).
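The integer bookkeeping described above can be sketched in Python as follows. The function names advance_index and retreat_index are hypothetical, and the instruction names in the comments are only a rough analogy to what the assembly would do, not a transcription of the schematic's code.

```python
def advance_index(index, size):
    """Step a circular index forward using only an integer add and a compare."""
    index += 1              # roughly: add eax,1
    if index >= size:       # roughly: cmp eax,size plus a conditional jump
        index -= size       # roughly: sub eax,size
    return index


def retreat_index(index, size):
    """Step a circular index backward, wrapping from 0 to size-1."""
    index -= 1
    if index < 0:
        index += size
    return index
```

Since the index never leaves the integer domain, there is nothing to convert before it is used as a memory offset.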
Because the index is the same in all 4 channels, instead of fetching individual values from memory through the st() registers, you can simply use this:
Code:
mov eax,index[0]; //assuming the index is already integer
shl eax,4;
movaps xmm0,memory[eax]; //loads all four values from memory at that index
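To make the address arithmetic concrete, here is a small Python sketch of the memory layout that both snippets appear to rely on: each index holds 4 floats of 4 bytes, so index i starts at byte i*16 (the shift left by 4), and channel c sits 4*c bytes further on. The layout is inferred from the shl and add instructions above, and the function names are hypothetical.

```python
import struct

CHANNELS = 4
BYTES_PER_FLOAT = 4
FRAME_BYTES = CHANNELS * BYTES_PER_FLOAT   # 16 bytes, hence the shift left by 4


def pack_frames(frames):
    """Lay out groups of 4 floats back to back in one flat byte buffer."""
    buf = bytearray()
    for frame in frames:
        buf += struct.pack("<4f", *frame)
    return bytes(buf)


def load_per_channel(memory, indexes):
    """One scalar load per channel, like the fld/fstp sequence."""
    values = []
    for channel, index in enumerate(indexes):
        offset = (index << 4) + channel * BYTES_PER_FLOAT
        values.append(struct.unpack_from("<f", memory, offset)[0])
    return tuple(values)


def load_packed(memory, index):
    """One 16-byte load of all four channels, like the single movaps."""
    return struct.unpack_from("<4f", memory, index << 4)
```

When all four indexes are equal, both functions return the same four values, which is why the single packed load can replace the four scalar loads.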
On my computer the filter uses 1/6 of the CPU % of the medium optimized version. The only limitation is that it has a fixed impulse size of 256. However, this can easily be changed within the schematic, and possibly even made variable.
The schematic is an .osm file; I hope there will not be any compatibility problems.
http://www.mediafire.com/?xdwa7ivycqaywtk