I have been given a wav file of train locomotive noise - literally something you can play back and hear. Using the audio package and the load.wave function I have got a 1.5 million element vector which visually at least has some periodicity in certain parts and does not seem to be completely random. Most elements (99%) are within a range of about -0.14 to +0.14 with occasional outliers. Beneath is a typical short segment.

This is the head:

sample rate: 16000Hz, mono, 16-bits
[1] -3.051851e-05 6.103516e-05 -6.103702e-05 3.051758e-05 3.051758e-05 -1.220740e-04

This is the same kind of output as is illustrated in the documentation:

https://cran.r-project.org/web/packages/seewave/vignettes/seewave_IO.pdf

What I am not sure about, and I can't find any clear explanation, is what these elements actually stand for? I would have thought that one needed as a minimum both volume and frequency, i.e. a two dimensional vector, but as far as I can tell there is only one single vector. I'm aware that this question is pushing the envelope of R help but...

Thanks, Nick Wray
You aren't pushing any envelope... you slit it open and fell out somewhere on the sidewalk. I tossed your question into Google and it came back with [1] and [2]. Please do that yourself instead whenever you are tempted to go off topic.

[1] https://stackoverflow.com/questions/25940376/whats-the-actual-data-in-a-wav-file
[2] https://en.m.wikipedia.org/wiki/Digital_audio
Hello Nick Wray,

Let me offer a simplified explanation of what's going on. Sorry if it's unnecessary.

Sound is waves of pressure in the air. Devices like microphones can measure the changing pressure by converting it into voltage. Voltage can then be sampled by an analog-to-digital converter inside a sound card and stored as numbers in computer memory.

On Fri, 1 Feb 2019 10:20:57 +0000 (GMT) Nick Wray via R-help <r-help at r-project.org> wrote:

> What I am not sure about, and I can't find any clear explanation, is
> what these elements actually stand for?

Digital sound works by measuring "pressure" a few tens of thousands of times per second and then recreating the corresponding signal elsewhere. According to the sampling theorem, sound sampled N times per second would be losslessly reproduced if it didn't contain frequencies above N/2 Hz.

To reiterate, these numbers are just audio samples. Feed them to the sound card at the original sample rate, and you hear the same sound that had been recorded.

This part is explained well in two 30-minute video lectures here:

https://xiph.org/video/vid1.shtml
https://xiph.org/video/vid2.shtml

(I wouldn't normally recommend video lectures, but these are really good.)

> I would have thought that one needed as a minimum both volume and
> frequency ie a two dimensional vector but as far as I can tell there
> is only one single vector.

You are describing a spectrogram: a surface showing the "volume" of each individual frequency in the sound recording, over time.

How to get it? If you run a Fourier transform over the original vector, you will get only one vector showing the magnitudes and phases of all frequencies through the whole length of the clip. To get a two-dimensional spectrogram, you should take overlapping parts of the original vector of samples, multiply them by a special window function, then take a Fourier transform over that and combine resulting vectors into a matrix.
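The windowing steps just described can be sketched roughly as follows. This is only an illustration under stated assumptions, not a definitive implementation: it is written in plain Python with a naive O(n^2) discrete Fourier transform (in R the built-in fft() would do the transform), the Hann window, the 64-sample window size and the 32-sample step are arbitrary choices, the function names are hypothetical, and the input is a synthetic 2000 Hz tone rather than the locomotive recording.

```python
import cmath
import math


def hann(n):
    """Hann window of length n (one common choice of window function)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def dft_magnitudes(frame):
    """Magnitudes of DFT bins 0..n/2 of a real-valued frame (naive DFT)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        mags.append(abs(acc))
    return mags


def spectrogram(samples, window_size=64, step=32):
    """Return a matrix: one row per time frame, one column per frequency bin."""
    window = hann(window_size)
    rows = []
    # Slide over the samples; consecutive frames overlap when step < window_size.
    for start in range(0, len(samples) - window_size + 1, step):
        chunk = samples[start:start + window_size]
        frame = [s * w for s, w in zip(chunk, window)]
        rows.append(dft_magnitudes(frame))
    return rows


# Synthetic stand-in for the recording (an assumption for this example):
# a 2000 Hz sine tone sampled at 16000 Hz, amplitude 0.1.
sample_rate = 16000
tone_hz = 2000
window_size = 64
samples = [0.1 * math.sin(2 * math.pi * tone_hz * t / sample_rate)
           for t in range(1024)]

spec = spectrogram(samples, window_size=window_size, step=32)

# Bin k corresponds to the frequency k * sample_rate / window_size Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / window_size
print(len(spec), "frames x", len(spec[0]), "bins; peak at", peak_hz, "Hz")
```

Each row of the resulting matrix is the "volume per frequency" for one short stretch of time, so the single vector of samples does contain both kinds of information; for the pure tone used here the strongest bin in every frame should sit at the tone's own frequency.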
Computing a spectrogram involves choosing a lot of parameters: size of the overlapping window, step between overlapping windows, the window function itself and its own parameters. Problems like these should be described in books about digital signal processing.

Jeff Newmiller sent more useful links while I was typing this, and I guess I should stop posting off-topic.

--
Best regards,
Ivan