## JOSÉ EDUARDO FORNARI *

CCRMA, Department of Music, Stanford University, Stanford, California 94305. E-mail: fornari@ccrma.stanford.edu

## FURIO DAMIANI *

DSIF, Electrical Engineering Faculty, Campinas University, Campinas, S.P., Brazil. E-mail: Furio@dsif.fee.unicamp.br

*Nucleus for Interdisciplinary Sound Studies - NICS/Unicamp

## Abstract

This paper proposes a method for real-time timbre modeling. Timbre manipulation is performed by spectral operators, which are combined into algorithmic structures; each structure performs a real-time timbre operation. The fundamental theory of the method is presented here, along with a graphical example of a sound analysis that illustrates an algorithmic structure for a timbre-modeling performance. The conclusion discusses the advantages and constraints observed so far.

## Introduction

In recent years, sound synthesis modeling techniques have grouped into two main streams: Spectral Modeling and Physical Modeling. Spectral modeling techniques synthesize a sound by manipulating its spectrum of frequencies. Spectral modeling makes it possible to generate and control new sounds (e.g., additive synthesis). However, the emulation of a natural sound, such as a musical instrument, has not been good enough, due in part to the burden of dynamically controlling the sound's time-varying partials (its frequency components). Therefore, spectral modeling can easily create new timbres but cannot emulate a natural one. Physical modeling techniques break a natural sound source down into its fundamental components, attributing a physical model to each one. For instance, in a piano hammer the mass-and-spring group may be mathematically represented by a second-order resonator (Smith 1995). All these blocks are bound together into one physical model of the whole sound source, which can handle dynamic, time-varying changes. However, since these techniques are meant to emulate one specific sound source, they do not accommodate structural changes on the fly: each physical model is a rigid structure made to emulate only one particular source. Timbre modeling belongs primarily to the spectral modeling realm, since it is supposed to manipulate the timbre of a sound while leaving its other features, such as loudness and pitch, unchanged. The technique proposed here, however, tries to combine features of both approaches: the flexibility of spectrum manipulation, from spectral modeling, with the possibility of time-varying control, from physical modeling. Timbre modeling in this work is performed by a structured grouping of real-time operations on the instantaneous spectrum of an incoming sound. These operations are carried out by spectral operators, or SOs (Fornari 1995).
The system proposed here shares with spectral modeling the idea that the sound spectrum is the material to be manipulated. From physical modeling theory, it takes the notion of a dynamically controlled structure of SOs that performs a timbre operation, or TO. The time-varying controller is the instantaneous spectrum of the incoming sound, i.e. its timbre. Therefore, the timbre modeling proposed here is done by a real-time system that transforms the timbre of an incoming sound by assembling and applying to it an algorithmic structure of SOs, called a TO, which can handle changes on the fly.

## Method Description

The system proposed here has to be fast enough to achieve real-time timbre modeling of a sound. It was therefore chosen to use DSP techniques in parallel processing, applied to discrete-time audio signals. The system is divided into three sections: Analysis, Timbre Operation, and Synthesis. Sound is processed through them sequentially, but each of the three sections has internal blocks running in parallel. The system is based on the principle of deterministic/stochastic decomposition (Serra 1989), adapted to real-time performance. In a few words, this method splits a sound spectrum into two parts: a deterministic part, composed of its sinusoidal (periodic-in-time) components, represented by peaks in the instantaneous spectrum; and a stochastic part, everything besides the deterministic one, i.e. the noisy, or non-periodic, part of the sound. The stochastic spectrum is obtained by subtracting the peaks (the deterministic part) from the original spectrum.
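The decomposition principle can be sketched in a few lines. The following is a toy, pure-Python illustration; the function names and the simple thresholding rule are ours, not the paper's (Serra's method uses considerably more refined peak criteria):

```python
# Toy deterministic/stochastic decomposition of one magnitude spectrum.
# Illustrative only: names and the threshold rule are assumptions.

def detect_peaks(mag, threshold):
    """Return indices of local maxima above `threshold` (deterministic part)."""
    peaks = []
    for k in range(1, len(mag) - 1):
        if mag[k] > threshold and mag[k] >= mag[k - 1] and mag[k] > mag[k + 1]:
            peaks.append(k)
    return peaks

def decompose(mag, threshold):
    """Split a magnitude spectrum into deterministic peaks and a
    stochastic residual (here, the spectrum with the peak bins zeroed)."""
    peaks = detect_peaks(mag, threshold)
    deterministic = [(k, mag[k]) for k in peaks]
    stochastic = list(mag)
    for k in peaks:
        stochastic[k] = 0.0   # crude subtraction of the peak
    return deterministic, stochastic

# Toy spectrum: two sinusoidal peaks riding on low-level noise.
spectrum = [0.1, 0.2, 5.0, 0.2, 0.1, 0.3, 7.0, 0.2, 0.1]
det, sto = decompose(spectrum, threshold=1.0)
# det holds the two peaks; sto keeps the residual noise floor
```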

### a) Analysis

The analysis is done by a parallel-processing structure. The main branch of this structure performs deterministic/stochastic decomposition, DSD. The other branches perform only deterministic decompositions, DD. All branches receive the same input: a sequence of Fs samples/sec representing the incoming quantized sound. The DSD produces two arrays in the frequency domain: a stochastic array, holding the stochastic part of the spectrum; and a deterministic array, with the peaks of the first part of the spectrum, the lower frequencies. The other branches, DDs, fill the same deterministic array in successive order, from the lowest to the highest frequency. The DSD branch is broken down into three blocks: Short-Time Fourier Transform, STFT; Peak detection; and Valley detection. The short-time Fourier transform is the adaptation of the Fourier transform to handle time-varying signals (Allen 1977). It receives an array with a small segment of sound in the time domain and outputs one array with the corresponding spectrum. Peak detection points out the psychoacoustically most important peaks in this spectrum and assembles the deterministic array (Serra 1989). Valley detection points out the psychoacoustic valleys of this spectrum and interpolates them into one envelope, which is the stochastic array. The DD branches have the same structure as the DSD, except for the Valley detection block. Periodically, for each frame of sound analyzed, the entire analysis structure provides one set of stochastic/deterministic arrays. This represents, like a sequence of pictures on a movie reel, the time-varying behavior of the sound spectrum.
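The STFT block above can be sketched directly from its definition: window a short segment of the signal and take its discrete Fourier transform. A minimal pure-Python version (Hamming window and frame layout are our assumptions; a real implementation would use an FFT):

```python
import cmath
import math

def stft_frame(x, n0, N):
    """Windowed DFT of one N-sample frame of x starting at sample n0:
    the instantaneous spectrum used by the analysis branch."""
    frame = [x[n0 + n] * (0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)))
             for n in range(N)]                      # Hamming window
    return [sum(frame[n] * cmath.exp(-2j * math.pi * k * n / N)
                for n in range(N))
            for k in range(N // 2)]                  # keep bins 0 .. N/2-1

# A 100 Hz sinusoid sampled at Fs = 800 Hz, analyzed with N = 32:
Fs, f0, N = 800, 100, 32
x = [math.sin(2 * math.pi * f0 * n / Fs) for n in range(N)]
X = stft_frame(x, 0, N)
peak_bin = max(range(N // 2), key=lambda k: abs(X[k]))
# bin k corresponds to frequency k * Fs / N, so the peak lands in bin 4
```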

### b) Timbre Operation

The ideal spectral operator performs a basic spectral operation instantaneously. It can be connected with other SOs in algorithmic structures (TOs), and each of them, as well as the whole algorithm, can be parametrized/modified on the fly (Fornari 1995). Truly instantaneous spectral operations are, of course, impossible. However, considering the limits of human auditory perception, in frequency and in time delay, it is possible nowadays to build a set of SOs fast enough that their spectral operations are perceptually equivalent to the ideal ones. The SO structure is based on the following properties:

1) SO: T[R^N] -> R^N is a transformation within the same vectorial space.
2) SO1[SO2[SO3[...SOn[A]...]]] can be replaced by a single equivalent operator, SOequivalent[A].
3) SO(A, p1, p2, ..., pn) = B, where A and B are arrays of the same length and the p are parameters, which can also be arrays (not necessarily of the same size as A or B).
4) For each frame analyzed, the psychoacoustic processing time is: Time{SOn + SOn+1 + SOn+2 + ...} = Time{SOn} = 0.
5) An SO handles three types of arrays: Input (I), Output (O), and Memory (M). Input and output have the same size and can only be deterministic or stochastic arrays. There is no SO[M] -> I, nor SO[O] -> I.
6) For a time sequence of frames, each SOt must be causal: SOt[At-n] exists, but SOt[At+n] (prediction) does not.

The algorithmic structuring of SOs is organized in three states:

1) SO[I] -> O, or SO[I] -> M;
2) SO[M] -> M;
3) SO[M] -> O.

As mentioned above, each of these structures is called a timbre operation, TO, applied to real-time timbre modeling. The Experimental Results section gives some examples of TOs.
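Property 2 above (a chain of SOs collapses into one equivalent SO) follows from property 1: since every SO maps same-length arrays to same-length arrays, SOs compose freely. A minimal sketch, with illustrative operators of our own choosing (a bin shift and a gain), not taken from the paper:

```python
# Sketch of the SO algebra: each operator maps an N-array to an N-array,
# so operators compose into one equivalent operator. Names are illustrative.

def so_shift(A, s):
    """Shift the frame s bins to the right, zero-filling (a crude pitch shift)."""
    return [0.0] * s + A[:len(A) - s]

def so_scale(A, g):
    """Scale all magnitudes by a gain g."""
    return [g * a for a in A]

def compose(*sos):
    """Collapse SO1[SO2[...[A]]] into one equivalent SO (property 2)."""
    def so_equivalent(A):
        for so in reversed(sos):   # innermost operator is applied first
            A = so(A)
        return A
    return so_equivalent

A = [1.0, 2.0, 3.0, 0.0]
TO = compose(lambda A: so_scale(A, 2.0), lambda A: so_shift(A, 1))
B = TO(A)   # shift first, then scale: [1,2,3,0] -> [0,1,2,3] -> [0,2,4,6]
```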

### c) Synthesis

The synthesis section periodically receives the stochastic/deterministic pair of arrays. The synthesis of the stochastic array is performed by the stochastic composition block, SC, which applies the inverse STFT to obtain the time-varying noise of the new timbre. The deterministic array is treated by the deterministic composition block, DC, which performs an additive synthesis of the peaks. The magnitude and phase of the peaks, tracked along the deterministic frames, are interpolated in time and feed a bank of oscillators: d(m) = Σ_r A_r(m) cos[θ_r(m)], where A_r(m) and θ_r(m) are the interpolated magnitude and phase of the r-th peak at sample m. The outputs of SC and DC are summed and given as the new timbre's sound.
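The DC block's oscillator bank can be sketched as follows. This is a minimal illustration of d(m) = Σ_r A_r(m) cos[θ_r(m)] with linear magnitude interpolation and a per-sample phase integral; the track data layout is a hypothetical stand-in for the paper's deterministic frames:

```python
import math

def synthesize_deterministic(tracks, M):
    """Bank-of-oscillators resynthesis over one M-sample frame.
    `tracks` is a list of (A0, A1, f) tuples: start and end amplitude,
    and frequency in cycles/sample, for one partial (assumed layout)."""
    out = [0.0] * M
    for A0, A1, f in tracks:
        phase = 0.0
        for m in range(M):
            A = A0 + (A1 - A0) * m / M      # linear magnitude interpolation
            out[m] += A * math.cos(phase)   # one oscillator of the bank
            phase += 2 * math.pi * f        # running phase integral
    return out

# A single partial at f = 0.25 cycles/sample: the phase advances by
# pi/2 each sample, so the output steps through approx. 1, 0, -1, 0, ...
d = synthesize_deterministic([(1.0, 1.0, 0.25)], 8)
```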

## Experimental Results

The system has been simulated with Simulink, the object-oriented mathematical programming tool of Matlab. Simulink allows each block of the system above to be written as a function and their parallel processing to be simulated. The functions can be written in Matlab code or in C (MEX-files). Each function is represented by an icon, and these icons are bound into structures representing the three stages of the system: analysis, timbre operation, and synthesis. Since the results are audio signals, this paper presents the graphical results of the analysis stage, commenting on its possibilities for timbre modeling. The analysis stage is simulated by four functions: frame, stft, pkdt, and vldt. The FRAME function picks one segment of the sound file based on the parameters: N = 2^v, the frame length, which determines the lowest frequency represented in the frame, fl = 1/N (as a fraction of the sampling rate Fs); and H, the hop size, which determines the overlap between consecutive frames. The factor H/Fs has to be below 0.03 to simulate a real-time sound transformation (Dyer 1991). The STFT function performs the short-time Fourier transform of the frame, i.e. the time-varying Fourier transform of the sound. The output of stft is a complex array of length N/2 where the real part is the spectrum magnitude (in dB) and the imaginary part holds the corresponding frequencies (in Hz). The PKDT function performs the spectrum peak detection, building an array containing only the peaks of the framed sound spectrum; this is considered the deterministic part of the sound. The VLDT function performs the valley detection of the same spectrum; it is used to determine the envelope of the noisy part of the spectrum, considered the stochastic part of the sound. Figures with the result of each of these functions are presented below. The spectral operators, SOs, manipulate the deterministic/stochastic arrays, or frames.
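The frame parameters above interact in a simple way, and the H/Fs < 0.03 real-time bound is easy to check numerically. A quick sketch (the particular values of Fs, v, and the 75% overlap are our example choices, not the paper's):

```python
# Checking the real-time constraint H / Fs < 0.03 s (Dyer 1991)
# for example analysis settings.

def max_hop(Fs, latency=0.03):
    """Largest hop size (in samples) keeping frame latency under `latency` sec."""
    return int(latency * Fs)

Fs = 44100             # samples/second (example value)
v = 10
N = 2 ** v             # frame length: 1024 samples
f_low = Fs / N         # lowest frequency resolved by the frame, ~43 Hz
H = N // 4             # hop size for 75% overlap between frames
ok = H / Fs < 0.03     # 256 / 44100 is about 5.8 ms: well within the bound
```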
To simulate them, some Matlab functions were developed that transform the frame of the deterministic and/or stochastic spectrum, such as:

- sf(A, s, d) = B: shift the frame by s% to the left (d = 0) or right (d = 1);
- iv(A, m) = B: inversion of the frame in magnitude (m = 1) or frequency (m = 0);
- ls(A, l, t) = B: load (l = 1) or save (l = 0) a frame at the past time t;
- ts(A, B, s) = (0, 1): test the similarity between two frame patterns with accuracy s in (0, ..., 1); if similar, the output is "1", otherwise "0";
- ei(SO(A, ...), i) = B: enable a generic SO if i = 1, otherwise B = A.

With these SOs it is possible to come up with a simple timbre operation structure. Suppose we want to shift the sound of an acoustic instrument one octave up every time the instrumentalist performs a particular effect. Normally such effects can be related to a particular noisy, or stochastic, envelope. Once we have the approximate shape expected, it is possible to construct a structure to do it, as shown below:
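In Python pseudo-Matlab, the octave-up TO just described might look like the following sketch. The names mirror the SO functions listed above, but the similarity rule, the octave-shift rule, and the frame contents are toy stand-ins of our own, not the paper's implementations:

```python
# Toy version of the octave-up timbre operation: when the stochastic
# envelope of the incoming frame matches a stored "effect" pattern (ts),
# enable (ei) a one-octave shift of the deterministic frame.

def ts(A, B, s):
    """Similarity test: 1 if the mean absolute difference between two
    frame patterns is within (1 - s), else 0 (assumed matching rule)."""
    d = sum(abs(a - b) for a, b in zip(A, B)) / len(A)
    return 1 if d <= (1.0 - s) else 0

def sf_octave_up(A):
    """One octave up, roughly: move each deterministic bin to twice its index."""
    B = [0.0] * len(A)
    for k in range(len(A) // 2):
        B[2 * k] = A[k]
    return B

def ei(so, A, i):
    """Enable a generic SO if i == 1, otherwise pass the frame through."""
    return so(A) if i == 1 else A

stored_envelope = [0.2, 0.5, 0.3, 0.1]     # expected "effect" noise shape
stochastic      = [0.2, 0.5, 0.3, 0.1]     # current stochastic frame
deterministic   = [0.0, 9.0, 0.0, 0.0]     # one partial in bin 1

i = ts(stochastic, stored_envelope, s=0.9) # pattern matched -> i = 1
out = ei(sf_octave_up, deterministic, i)   # partial moved from bin 1 to bin 2
```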

## Conclusions

The theoretical study and experimental simulations done so far have shown that the method presented in this work can be useful for real-time timbre modeling. The actual system, however, will have to be a firmware system: the SOs will necessarily have to be implemented in hardware to overcome the processing-time constraint, while the parametrization of the SOs and the modeling of the structures should be done in software, to remain flexible and easy to handle. Some SOs can be designed to be controlled by virtual reality interfaces; in that case the user can see, listen to, and interact with the sound surface constructed by the successive spectrum frames in time. The sound input is then not only timbre-modeled but is also the real-time controller of the TO, the structure of SOs. From this point of view, a static TO structure can be compared to a musical instrument whose controller, instead of being a keyboard, a bowed or plucked string, a reed, or any mechanical device, is the timbre of a sound. This concept suggests a new approach to the development of a generation of timbre-controlled electronic musical instruments. It can mean one more step toward freedom in musical creativity, which is frequently stuck behind the technical controller-virtuosity barrier.

## References

Allen, Jont B. (1977). "A Unified Approach to Short-Time Fourier Analysis and Synthesis." Proceedings of the IEEE, vol. 65, no. 11, November.

Dyer, Lounette M. (1991). An Object-Oriented Real-Time Simulation of Music Performance Using Interactive Control. Ph.D. Dissertation, Stanford University, Stanford, California 94305.

Fornari, J.E. (1995). Transformacoes Sonoras Atraves de Operacoes Timbrais. Master's Thesis, Campinas University, Sao Paulo, Brazil.

Lo, Yee On. (1987). Toward a Theory of Timbre. Research sponsored by The System Development Foundation. CCRMA, Stanford University, Stanford, California 94305.

Serra, Xavier. (1989). A System for Sound Analysis/Transformation/Synthesis Based on a Deterministic plus Stochastic Decomposition. Ph.D. Dissertation, Stanford University, Stanford, California 94305.

Smith, Julius O. (1995). Physical Model Synthesis Update. Draft version. CCRMA, Stanford University, Stanford, California 94305.


This paper was presented as part of the Third Symposium of Computer Music in Brazil.