x86 SSE cvtdq2ps instruction
Posted: Sat Mar 07, 2020 5:38 pm
Does Flowstone 3.0.4 support the cvtdq2ps x86 SSE2 instruction?
I am talking about the ConvoRev7fixed.frm that Martin Vicanek posted on Wed Jun 05, 2019 11:00 pm.
http://www.dsprobotics.com/support/viewtopic.php?f=4&t=3879&start=50.
The Frequency Domain Convolution (FDM) routine appears to rely on this instruction, which converts four packed signed doubleword integers from xmm2/mem to four packed single-precision floating-point values in xmm1. Is it used there to speed up the processing of the real/imaginary frequency-domain data?
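For reference, here is a minimal C sketch of what the instruction does, outside of Flowstone. The function name convert4_i32_to_f32 is my own (hypothetical), and this says nothing about how the schematic actually uses the instruction. On SSE2 targets the `_mm_cvtepi32_ps` intrinsic normally compiles to a single cvtdq2ps; the scalar loop is only a fallback for other targets.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Hypothetical helper: convert four signed 32-bit integers to four floats. */
void convert4_i32_to_f32(const int32_t in[4], float out[4])
{
#if defined(__SSE2__)
    /* Unaligned load of four packed doublewords, then the packed conversion
       that cvtdq2ps performs, then an unaligned store of four floats. */
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_ps(out, _mm_cvtepi32_ps(v));
#else
    /* Scalar fallback with the same result for exactly representable values. */
    for (int i = 0; i < 4; ++i)
        out[i] = (float)in[i];
#endif
}
```

Integers with a magnitude above 2^24 cannot all be represented exactly in single precision, so the result is rounded for those.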
By the way, ConvoRev7fixed.frm also embeds a Direct Convolution (DC) routine that operates purely in the time domain and does not use the cvtdq2ps x86 SSE2 instruction. Has anyone tried to use this DC routine as a 32-tap FIR filter? Could we generalize it into a freely (statically) configurable N-tap FIR filter, where N is not necessarily a power of two and can take any value between 2 and 256, and offer it as a "ready made" Flowstone component? We can deal with the impulse response generator later, as a companion. That way, within a few weeks, Flowstone could include a standardized "DC FIR Filter" module, along with a "DC FIR Filter Controller" that relies on splines for drawing an arbitrary frequency response curve, and that may feature a "linear phase / minimum phase" selector and a few windowing options.
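To make the idea concrete, here is a rough C sketch of a direct-convolution FIR filter with a configurable number of taps N (2 to 256, not necessarily a power of two). It is not the routine from ConvoRev7fixed.frm and not Flowstone code; the names FirFilter, fir_init, fir_process and FIR_MAX_TAPS are hypothetical, and a real Flowstone module would likely process four samples at a time with SSE.

```c
#include <assert.h>
#include <stddef.h>

#define FIR_MAX_TAPS 256  /* upper bound on N, as proposed above */

typedef struct {
    float  coeffs[FIR_MAX_TAPS];   /* impulse response h[0..N-1] */
    float  history[FIR_MAX_TAPS];  /* circular buffer of past inputs */
    size_t n_taps;                 /* N, any value from 2 to 256 */
    size_t pos;                    /* index where the next input is written */
} FirFilter;

/* Copy the impulse response and clear the input history. */
void fir_init(FirFilter *f, const float *coeffs, size_t n_taps)
{
    assert(n_taps >= 2 && n_taps <= FIR_MAX_TAPS);
    f->n_taps = n_taps;
    f->pos = 0;
    for (size_t i = 0; i < n_taps; ++i) {
        f->coeffs[i] = coeffs[i];
        f->history[i] = 0.0f;
    }
}

/* Process one sample: y[n] = sum over k of h[k] * x[n-k]. */
float fir_process(FirFilter *f, float x)
{
    f->history[f->pos] = x;
    float acc = 0.0f;
    size_t idx = f->pos;  /* starts at x[n], then walks back in time */
    for (size_t k = 0; k < f->n_taps; ++k) {
        acc += f->coeffs[k] * f->history[idx];
        idx = (idx == 0) ? f->n_taps - 1 : idx - 1;
    }
    f->pos = (f->pos + 1 == f->n_taps) ? 0 : f->pos + 1;
    return acc;
}
```

The circular buffer wraps with a compare instead of a bit mask, which is what allows N to be any value rather than a power of two. The cost is N multiply-adds per sample, so 256 taps means 256 multiply-adds per output sample.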
Have a nice day