大工至善|大学至真 (shared blog) http://blog.sciencenet.cn/u/lcj2212916


[Repost] [Information Technology] [2003] Analysis and Coding of High Quality Audio Signals

1768 views | 2021-9-2 20:45 | Category: Research notes | Source: Repost


This is a doctoral thesis from Queensland University of Technology, Australia (author: Daryl Ning), 179 pages in total.

 

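Digital audio is becoming an ever larger part of daily life. Unfortunately, the excessive bitrate of the raw digital signal makes it an extremely expensive representation. Applications such as digital audio broadcasting, high-definition television, and internet audio require high-quality audio at low bitrates. The field of audio coding addresses this problem of reducing the bitrate of digital audio while maintaining high perceptual quality. Developing an efficient audio coder requires a detailed analysis of the audio signals themselves, and it is important to find a representation that can concisely model any general audio signal.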
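To make the "excessive bitrate" concrete, here is a small worked calculation. The CD-style parameters (44.1 kHz, 16 bits, stereo) and the 64 kbps coded rate are illustrative assumptions, not figures taken from this paragraph of the thesis.

```python
def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels):
    """Bitrate of uncompressed PCM audio in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0


# Assumed CD-style source: 44.1 kHz sampling, 16 bits per sample, 2 channels.
raw_kbps = pcm_bitrate_kbps(44100, 16, 2)

# Assumed coded rate of 64 kbps, used here only to illustrate the scale.
compression_ratio = raw_kbps / 64.0

print(f"raw PCM: {raw_kbps:.1f} kbps, ratio at 64 kbps: {compression_ratio:.2f}:1")
```

Under these assumptions the coder has to discard or re-express more than 95% of the raw bits, which is why the choice of signal representation matters so much.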

 

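This thesis proposes two new high-quality audio coders based on two different audio representations: the sinusoidal-wavelet representation and the warped linear predictive coding (WLPC)-wavelet representation. Beyond high-quality coding, it is also important for audio coders to be flexible in their application. With the growing popularity of internet audio, it is advantageous for audio coders to address issues related to real-time audio delivery. The thesis targets bitstream scalability in particular and therefore also proposes a third audio coder capable of bitstream scalability. The performance of each proposed coder was evaluated by comparison with the MPEG layer III coder.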

 

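The first coder is based on a hybrid sinusoidal-wavelet representation, which assumes that each frame of audio can be modelled as a sum of sinusoids plus a noisy residual. The discrete wavelet transform (DWT) decomposes the residual into subbands that approximate the critical bands of human hearing, and a perceptually derived bit allocation algorithm then minimises the audible distortion introduced by quantising the DWT coefficients. Listening tests showed that the coder delivers near-transparent quality for a range of critical audio signals at 64 kbps. It also outperforms the MPEG layer III coder operating at the same bitrate. This coder, however, is only useful for high-quality coding and is difficult to scale to lower rates.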
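The sinusoids-plus-residual idea can be sketched in a few lines. This is a minimal illustration and not the thesis's analysis method: it assumes the sinusoids fall exactly on DFT bins, picks the strongest bins instead of doing proper peak tracking, and uses a plain Haar octave-band DWT, which only loosely resembles a critical-band decomposition. The function names and the test frame are hypothetical.

```python
import cmath
import math
import random


def dft(frame):
    """Naive DFT of a real frame; returns bins 0..N/2."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def split_sinusoids(frame, num_sinusoids):
    """Model the strongest DFT bins as sinusoids and return (bins, residual)."""
    n = len(frame)
    spectrum = dft(frame)
    ranked = sorted(range(1, n // 2), key=lambda k: abs(spectrum[k]), reverse=True)
    peaks = sorted(ranked[:num_sinusoids])
    model = [0.0] * n
    for k in peaks:
        amplitude = 2.0 * abs(spectrum[k]) / n
        phase = cmath.phase(spectrum[k])
        for t in range(n):
            model[t] += amplitude * math.cos(2.0 * math.pi * k * t / n + phase)
    residual = [x - m for x, m in zip(frame, model)]
    return peaks, residual


def haar_dwt(signal, levels):
    """Octave-band Haar DWT: [detail_1, ..., detail_L, approx_L]."""
    bands = []
    approx = list(signal)
    for _ in range(levels):
        half = len(approx) // 2
        detail = [(approx[2 * i] - approx[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        approx = [(approx[2 * i] + approx[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        bands.append(detail)
    bands.append(approx)
    return bands


def energy(values):
    return sum(v * v for v in values)


# Hypothetical test frame: two on-bin sinusoids plus a little noise.
random.seed(0)
N = 256
frame = [
    math.cos(2.0 * math.pi * 10 * t / N)
    + 0.5 * math.cos(2.0 * math.pi * 37 * t / N + 0.3)
    + random.uniform(-0.01, 0.01)
    for t in range(N)
]
peaks, residual = split_sinusoids(frame, num_sinusoids=2)
subbands = haar_dwt(residual, levels=4)
print("sinusoid bins:", peaks)
print("residual/frame energy:", energy(residual) / energy(frame))
print("subband energies:", [energy(b) for b in subbands])
```

After the sinusoids are removed, almost all of the frame energy is gone, and the per-subband energies of the residual are the quantities a perceptual bit allocation would then weigh against masking thresholds.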

 

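The second coder is based on a hybrid WLPC-wavelet representation. In this approach, the spectrum of the audio signal is estimated by an all-pole filter using warped linear prediction (WLP). WLP operates on a warped frequency domain, where the resolution can be adjusted to approximate that of the human auditory system. This makes the inherent noise shaping of the synthesis filter even better suited to audio coding. The excitation to this filter is transformed using the DWT and perceptually encoded. Listening tests showed that near-transparent coding is achieved at 64 kbps. The coder was also found to be slightly superior to the MPEG layer III coder operating at the same bitrate.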
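A common textbook way to realise warped linear prediction is to replace each unit delay with a first-order all-pass section and then solve the usual normal equations on the resulting "warped" autocorrelation. The sketch below follows that recipe under stated assumptions: the warping coefficient 0.756 is a value often quoted for approximating the Bark scale at 44.1 kHz, the test signal is hypothetical, and none of this is claimed to match the thesis's implementation.

```python
def allpass(x, lam):
    """First-order all-pass D(z) = (z^-1 - lam) / (1 - lam z^-1)."""
    y = []
    prev_x = 0.0
    prev_y = 0.0
    for sample in x:
        out = -lam * sample + prev_x + lam * prev_y
        y.append(out)
        prev_x, prev_y = sample, out
    return y


def warped_autocorrelation(x, order, lam):
    """r[k] = sum_n x[n] * (D^k x)[n]; lam = 0 gives ordinary autocorrelation."""
    r = []
    warped = list(x)
    for _ in range(order + 1):
        r.append(sum(a * b for a, b in zip(x, warped)))
        warped = allpass(warped, lam)
    return r


def levinson_durbin(r):
    """Solve the normal equations; returns ([1, a1, ..., ap], prediction error)."""
    order = len(r) - 1
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


# Hypothetical test signal: impulse response of a one-pole filter.
x = [0.9 ** n for n in range(200)]

# With lam = 0 the all-pass is a plain delay, so this is ordinary LPC.
a_plain, err_plain = levinson_durbin(warped_autocorrelation(x, 2, 0.0))

# Assumed warping coefficient (roughly Bark-like at 44.1 kHz).
r_warped = warped_autocorrelation(x, 4, 0.756)
a_warped, err_warped = levinson_durbin(r_warped)
print("plain LPC coefficients:", a_plain)
print("warped LPC coefficients:", a_warped)
```

Setting the coefficient to zero recovers conventional linear prediction, which is a convenient sanity check; positive values spend more of the model's resolution on low frequencies.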

 

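The third proposed coder is similar to the previous WLPC-wavelet coder but is modified to achieve bitstream scalability. A noise model for high-frequency components is included to keep the overall bitrate low, and a two-stage quantisation scheme for the DWT coefficients is implemented. The first stage uses fixed-rate scalar and vector quantisation to provide a coarse approximation of the coefficients, which allows low-bitrate, low-quality versions of the input signal to be embedded in the overall bitstream. The second stage of quantisation adds detail to the coefficients and hence enhances the quality of the output signal. Listening tests showed that signal quality improves gracefully as the bitrate increases from 16 kbps to 80 kbps. The performance of this coder is comparable to that of the MPEG layer III coder operating at a similar (but fixed) bitrate.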
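The two-stage idea can be illustrated with a toy embedded quantiser: a coarse base layer that decodes on its own, plus an enhancement layer that refines it. This is a simplified sketch that uses uniform scalar quantisation in both stages (the thesis's first stage also uses vector quantisation); the step sizes and coefficient values are made up for illustration.

```python
def quantise(values, step):
    """Uniform scalar quantiser: map each value to an integer index."""
    return [round(v / step) for v in values]


def dequantise(indices, step):
    return [i * step for i in indices]


def encode_two_stage(coeffs, coarse_step, fine_step):
    """Return (base layer indices, enhancement layer indices)."""
    base = quantise(coeffs, coarse_step)
    coarse = dequantise(base, coarse_step)
    # Stage 2 quantises what stage 1 left behind.
    leftover = [c - q for c, q in zip(coeffs, coarse)]
    enhancement = quantise(leftover, fine_step)
    return base, enhancement


def decode(base, coarse_step, enhancement=None, fine_step=None):
    """Decode the base layer alone, or base plus enhancement if provided."""
    out = dequantise(base, coarse_step)
    if enhancement is not None:
        out = [o + e * fine_step for o, e in zip(out, enhancement)]
    return out


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Hypothetical DWT coefficients and step sizes.
coeffs = [0.93, -2.41, 0.07, 1.68, -0.55]
COARSE_STEP = 1.0
FINE_STEP = 0.125

base, enhancement = encode_two_stage(coeffs, COARSE_STEP, FINE_STEP)
low_quality = decode(base, COARSE_STEP)
high_quality = decode(base, COARSE_STEP, enhancement, FINE_STEP)
print("base-only MSE:", mse(coeffs, low_quality))
print("base + enhancement MSE:", mse(coeffs, high_quality))
```

A receiver on a slow link can stop after the base layer and still reconstruct a coarse signal, while one with more bandwidth adds the enhancement indices to the same base. That is the sense in which the low-rate version is embedded in the full bitstream.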

 

 

Digital audio is increasingly becoming a part of our daily lives. Unfortunately, the excessive bitrate associated with the raw digital signal makes it an extremely expensive representation. Applications such as digital audio broadcasting, high definition television, and internet audio require high quality audio at low bitrates. The field of audio coding addresses this important issue of reducing the bitrate of digital audio while maintaining a high perceptual quality. Developing an efficient audio coder requires a detailed analysis of the audio signals themselves. It is important to find a representation that can concisely model any general audio signal.

In this thesis, we propose two new high quality audio coders based on two different audio representations: the sinusoidal-wavelet representation, and the warped linear predictive coding (WLPC)-wavelet representation. In addition to high quality coding, it is also important for audio coders to be flexible in their application. With the increasing popularity of internet audio, it is advantageous for audio coders to address issues related to real-time audio delivery. The issue of bitstream scalability has been targeted in this thesis, and therefore, a third audio coder capable of bitstream scalability is also proposed. The performance of each of the proposed coders was evaluated by comparisons with the MPEG layer III coder.

The first coder proposed is based on a hybrid sinusoidal-wavelet representation. This assumes that each frame of audio can be modelled as a sum of sinusoids plus a noisy residual. The discrete wavelet transform (DWT) is used to decompose the residual into subbands that approximate the critical bands of human hearing. A perceptually derived bit allocation algorithm is then used to minimise the audible distortions introduced from quantising the DWT coefficients. Listening tests showed that the coder delivers near transparent quality for a range of critical audio signals at 64 kbps. It also outperforms the MPEG layer III coder operating at this same bitrate. This coder, however, is only useful for high quality coding, and is difficult to scale to operate at lower rates.

The second coder proposed is based on a hybrid WLPC-wavelet representation. In this approach, the spectrum of the audio signal is estimated by an all pole filter using warped linear prediction (WLP). WLP operates on a warped frequency domain, where the resolution can be adjusted to approximate that of the human auditory system. This makes the inherent noise shaping of the synthesis filter even more suited to audio coding. The excitation to this filter is transformed using the DWT and perceptually encoded. Listening tests showed that near transparent coding is achieved at 64 kbps. The coder was also found to be slightly superior to the MPEG layer III coder operating at this same bitrate.

The third proposed coder is similar to the previous WLPC-wavelet coder, but modified to achieve bitstream scalability. A noise model for high frequency components is included to keep the overall bitrate low, and a two stage quantisation scheme for the DWT coefficients is implemented. The first stage uses fixed rate scalar and vector quantisation to provide a coarse approximation of the coefficients. This allows for low bitrate, low quality versions of the input signal to be embedded in the overall bitstream. The second stage of quantisation adds detail to the coefficients, and hence, enhances the quality of the output signal. Listening tests showed that signal quality gracefully improves as the bitrate increases from 16 kbps to 80 kbps. This coder has a performance that is comparable to the MPEG layer III coder operating at a similar (but fixed) bitrate.

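1. Introduction

2. Fundamentals of Audio Coding

3. Perceptual Audio Coding Schemes

4. Hybrid Sinusoidal-Wavelet Audio Coding

5. Hybrid WLPC-Wavelet Audio Coding

6. Bitstream Scalable WLPC-Wavelet Audio Coding

7. Conclusions and Future Work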






https://m.sciencenet.cn/blog-69686-1302590.html

