The document describes the adaptive arithmetic coder FArCo (Fast Arithmetic Coder), developed for use in the video codec WMC (Wavelet Media Codec). The main goal is the performance optimization, which has been successfully achieved. There is no division operations in the encoder and decoder, mostly 16-bit integer calculations are used (with the exception of a few 32-bit additions and subtractions), and the vast merit for achieving high performance belongs to the parallelism and the use of integer SIMD instructions. This document presents the characteristics of a draft implementation of the encoder, and soon they will improve.

Introduction

        The basic motivation for the implementation of this version of the arithmetic coder was a need for faster and more efficient coding algorithm for statistical data to be used in the codec WMC (Wavelet Media Codec) for encoding video sequences. The first version of the video codec there were used the Exp-Golomb codes, which are difficult to adapt to changes in statistical data and their efficiency is quite low compared with arithmetic coding. The high performance of the arithmetic coding is due to the algorithmic and hardware (SIMD) optimization, through some algorithmic simplifications has resulted in a slight deterioration of compression efficiency.

Features of implementation

        The presented implementation of the arithmetic coder has the following features and limitations:

  •  input alphabet size is 4 bits [0 .. 15] or less which if motivated by the usage in the video codec
  • adaptive model is periodic
  •  calculations are performed mostly using 16-bit integer arithmetic, except for a few additions and subtractions with 32-bit arguments
  •  divisions eliminated
  •  the minimum number of conditional jumps
  •  simplified search on the decoder’s side (practically none)
  •  x86-based platform requires SSE2 instructions (future implementations will use SSSE3 instructions).

        The algorithm is not restricted to specific hardware resources (such as, for example, x86 platform and SIMD instructions set), it can be easily optimized for many different platforms utilizing each platform’s advantages. Coder is based on the algorithm that can be easily adapted to the vast majority of platforms, and allows efficient use of hardware parallelism, if any. Besides, the decoder’s use of integer 16-bit arithmetic, and the absence of division makes it easy to port algorithm to mobile platforms. Hardware implementation of the algorithm would achieve pretty high coding performance with very low energy consumption, which is important for the use in mobile devices.

Achieved performance

        The goal of the optimization, which consisting in the implementation of the algorithm with the efficiency of the arithmetic coder while maintaining high coding performance (as for Exp- Golomb codes) has been successfully achieved. In addition, there’re still some reserves remaining which enable achieving even higher performance, as well as there’re opportunities to increase coding efficiency. In the figures below the results of comparison of FArCo with a well- known fast implementation of the arithmetic encoder FastAC and with the entropy coder Exp- Golomb currently used in video codec. Performances of the decoders is presented in the figure by the dotted lines. All calculations were performed on the single-core x86 Intel Pentium 4 2.8 GHz CPU.

grafik.png

          As one can easily see from the figures the presented codec demonstrates more than threefold performance advantage compared to one of the fastest implementations of the arithmetic coder and almost achieves the speed of simple and fast Exp-Golomb encoder. The second figure shows the coding efficiency (calculated as the difference between the resultant number of bits per symbol and the source entropy, the lower value correspond to the better efficiency).

grafik2.png

Further Plans

         This document contains only the draft performance of the FArCo arithmetic coder. Its further development and optimization is carried out in several directions:

  •  Integration into WMC video codec
  • Optimization using SSSE3 instructions (x86 platform)
  •  Optimization for multi-core platforms
  •  Reducing the number of simplifications in coding algorithm to improve the efficiency of

compression

  •  Optimize the decoder
  •  Porting to other platforms.