ABSTRACT
To reduce tampering artifacts and/or enhance audio quality, audio processing operations are often applied to tampered audio. As in image forensics, detecting such post-processing operations has become very important for audio authentication. In this paper, we propose a convolutional neural network (CNN) to detect audio processing operations. We carefully design the network architecture, with particular attention to the frequency representation of the audio input, the activation function, and the depth of the network. In our experiments, we evaluate the proposed method on audio clips processed with 12 commonly used audio processing operations, at three different small clip sizes. The experimental results show that our method significantly outperforms related methods based on hand-crafted features and other CNN architectures, and achieves state-of-the-art results for both binary and multi-class classification.
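The abstract does not say which frequency representation the network takes as input. As an illustration only, the sketch below computes a log-magnitude short-time Fourier spectrogram, one common way to turn a short audio clip into a 2-D array that a CNN could consume. The function name, the frame length of 512, the hop of 256, the Hann window, and the 16 kHz test tone are all assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def log_magnitude_spectrogram(signal, frame_len=512, hop=256):
    """Return a (frames, frame_len // 2 + 1) log-magnitude spectrogram.

    Hypothetical helper: frame_len, hop, and the Hann window are
    illustrative choices, not the paper's settings.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (signal.size - frame_len) // hop
    # Slice the clip into overlapping frames and apply the window to each.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # One-sided FFT per frame; log1p compresses the dynamic range.
    return np.log1p(np.abs(np.fft.rfft(frames, axis=1)))

# Hypothetical 1-second clip sampled at 16 kHz containing a 1 kHz tone.
sr = 16000
t = np.arange(sr) / sr
clip = np.sin(2 * np.pi * 1000 * t)

spec = log_magnitude_spectrogram(clip)
peak_bin = int(spec.mean(axis=0).argmax())
print(spec.shape, peak_bin * sr / 512)  # array shape and peak frequency in Hz
```

Each row of the result is one time frame and each column one frequency bin, so the array can be treated like a single-channel image when it is fed to convolutional layers.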
Index Terms
- Identification of Audio Processing Operations Based on Convolutional Neural Network