Consider the range [B,T) of storage immediately before a further letter is added in, and let s be the upper bound of T-B. Observe that we can identify three (possibly empty) zones within the digits that compose any number in the range; for example, if s = 1000 then [B,T) might be [1319314,1320105):

          zone 1   zone 2   zone 3
B            13       19      314
T-1          13       20      104

Remember that T-1, not T, is the highest number in the range.

Zone 1 consists of digits that are common to every number in the range, and thus are unaffected by the choice of remainder. These digits may be committed to the transmitter or to storage.

Zone 2 consists of n digits forming a number $d\,b^{n-1} - 1$ or $d\,b^{n-1}$, where d is a single digit and b is the base of the encoding. In our example n=2 and d=2. Zone 2 consists of the digits that may be affected by the choice of remainder, but which are not required in order to distinguish between two numbers in the range. We shall call these the delayed digits, and (d,n) identifies the possible values of the delayed digits. By convention, if n=0 then d=0.

Zone 3 consists of the rightmost w digits, and is sufficient to distinguish between any two numbers from the range.

Consider the range [B',T'), with committed digits c, and delayed digits represented by (d,n). Let x be the committed digits after resolving the delay high, i.e.

$$x = c\,b^{n} + d\,b^{n-1}$$
then we shall express [B',T') as c,(d,n),[B,T), where B = B'-xs and T = T'-xs. For example, [1319314,1320105) becomes 13,(2,2),[-686,105). The remaining width is T-B, and if we combine c,(d,n),[B,T) with the partial remainder [i,j) then we create the range c,(d,n),[B+i,B+j). If B+j≤0 then we may resolve the delay low; if B+i≥0 then we may resolve the delay high. Figure 3 shows all the interesting possibilities that can arise.

We have now reduced the ranges to a form that we can implement easily, since if the range is c,(d,n),[B,T) then:

    -s < B < T ≤ +s
    d is a single digit
    n is a small integer
    c need not be held in the encoder/decoder.

We have one further refinement before our algorithm is complete. It is most unlikely that the number of delayed digits will ever grow very large, but we may wish to impose an upper limit. One way in which we may force resolution of the delay is to reduce the top of the range, or to increase the bottom of the range. Thus, for example,

    13,(2,3),[-660,140) => 13,(2,3),[-660,000) => 13199,(0,0),[340,1000)

This wastes at most one bit of storage.

Figure 3. Illustrating c,(d,n),[B,T) + [i,j)

This shows the effect of encoding a letter as partial remainder to the range 13,(2,2),[-686,105), and adjusting the resulting range so that the remaining width is as high as possible without exceeding 1000.

Case 1. The letter encodes in storage width 791 as [000,080)

    13,(2,2),[-686,105) + [000,080) => 13,(2,2),[-686,-606)

Case 2.
The letter encodes in storage width 791 as [620,700)

    13,(2,2),[-686,105) + [620,700) => 13,(2,2),[-066,014)

Case 3. The letter encodes in storage width 791 as [700,791)

    13,(2,2),[-686,105) + [700,791) => 13,(2,2),[014,105)

-----------------------------------------------------------------------------

Observations

Sort order: The sort order of encoded messages is the same as the sort order implied for uncoded messages by the alphabetic order chosen in the implementation of the frequency algorithms. In [2] this is called the strong alphabetic property.

Prefix codes: Prefix encoding (e.g. Huffman encoding) is the most popular encoding for removing alphabetic redundancy, so it is pleasing to find that any prefix encoding can be generated or read using the range encoding algorithm that we have developed.

Consider a message encoded using a prefix encoding, where any letter 'a' encodes to a string of digits of length $u_a$ and numerical value $v_a$. The same message will encode to the same encoding using the range encoding algorithm if we define $f_a = b^{-u_a}$ and $F_a = v_a\,b^{-u_a}$ for all 'a', where b is the base of both encodings.

The corollary is that any message encoded in a single context will form an encoding that can be treated as a prefix encoding if, for all 'a', $f_a$ is a power of b and $F_a/f_a$ is an integer.

Recognising end of message: The decoder is driven by whatever wants the message, and it is the responsibility of the driver to recognise the end of a message. If the driver continues to ask for letters after the end of a message, it will get spurious letters. If the message is not self-delimiting, we must add a letter 'end-of-message' to the alphabet.
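The c,(d,n),[B,T) notation and the three cases of Figure 3 can be checked numerically. The following Python sketch is illustrative only. The function names (split_range, add_letter, resolution, resolve_low) are hypothetical, s is assumed to be a power of the base b, and x is taken to be c·b^n + d·b^(n-1), which agrees with the worked example 13,(2,2),[-686,105) but is an assumption here rather than a statement of the algorithm itself.

```python
def split_range(lo_abs, hi_abs, s, b=10):
    """Express the absolute range [lo_abs, hi_abs) as c, (d, n), (B, T).

    Assumes s is a power of b and 0 < hi_abs - lo_abs <= s.
    """
    if not 0 < hi_abs - lo_abs <= s:
        raise ValueError("range width must lie in (0, s]")
    top = (hi_abs - 1) // s   # digits above zone 3 of the highest number, T'-1
    bottom = lo_abs // s      # digits above zone 3 of the lowest number, B'
    n = 0                     # number of delayed digits (zone 2)
    while top // b**n != bottom // b**n:
        n += 1
    d = (top // b**(n - 1)) % b if n else 0      # convention: d = 0 when n = 0
    c = top // b**n                              # committed digits (zone 1)
    x = c * b**n + (d * b**(n - 1) if n else 0)  # digits with the delay resolved high
    return c, (d, n), (lo_abs - x * s, hi_abs - x * s)


def add_letter(B, i, j):
    """Combine a range whose bottom is B with the partial remainder [i, j)."""
    return B + i, B + j


def resolution(B, T):
    """Say how the delay may be resolved for the relative range [B, T)."""
    if T <= 0:
        return "low"
    if B >= 0:
        return "high"
    return "delayed"


def resolve_low(c, delay, rng, s, b=10):
    """Commit the delayed digits at their lower value and shift [B, T) up by s."""
    d, n = delay
    B, T = rng
    if n == 0:
        raise ValueError("no delayed digits to resolve")
    x = c * b**n + d * b**(n - 1)
    return x - 1, (0, 0), (B + s, T + s)
```

With s = 1000, split_range(1319314, 1320105, 1000) gives 13,(2,2),[-686,105), adding the three partial remainders of Figure 3 gives [-686,-606), [-66,14) and [14,105), and resolve_low applied to 13,(2,3),[-660,000) gives 13199,(0,0),[340,1000).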
Context

Since f and F map letters in context to probabilities, we should properly talk about $f_{ca}$, $F_{ca}$ and $L_{ca}$, where $f_{ca}$ is the probability of encountering the letter 'a' in context c, and similarly for F and L. In our example up till now there has been only one context; we shall now derive F and L for an example involving several contexts.

In 1952 Oliver modelled [5] a typical television signal as drawn from an alphabet of m levels, where each letter had probability $p\,k^{n}$ of differing from the previous letter by n levels in either direction, where k<1, and p is a function of the previous letter.

Each level is encoded in the context of the preceding level, and it can be shown that:
This can easily be implemented if the encoder holds a list of the values of $k^{i}$ for $0 \le i \le m$.

$L_{cj}$ is the highest letter 'a' for which $F_{ca} < j$, i.e. the highest such that:
Thus L too can easily be implemented given a list of the values of $k^{i}$ for $0 \le i \le m$.

The context of improbable letters

s reflects the largest integer that our encoder is built to handle, and until now we have assumed that a frequency algorithm f can only be used with an encoder parameterised by s if, for all contexts c and letters 'a', $s/b \ge 1/f_{ca}$ or $f_{ca} = 0$. By $f_{ca} = 0$ we mean that letter 'a' is truly impossible in context c. We shall now consider how we can simply transform any f, F and L so that they meet this constraint.

Consider a context x where r is the width in which we must encode the next letter. The range of the letter is […]. If this range is null, i.e. […], then we cannot encode the letter 'a'. When we encounter such a range, we steal one value from the next non-null range above, namely […], to represent the context marker $C_y$, which marks the fact that the next letter is coded in the context y. The range of $C_y$ is […].

Now all letters e such that […] will result in the generation of $C_y$, except perhaps the highest such letter. We shall identify the range of letters that do as [α,β), where α is the lowest such letter and β is the next letter above the highest such letter.

Let us consider a letter 'a' for which the range is not null, i.e. […]. If the next possible letter below 'a' causes the generation of any context marker $C_z$, then the range of 'a' is reduced to […], since the value […] is stolen to represent $C_z$. If this reduced range is null, i.e. […], then letter 'a' must also generate the context marker $C_z$.
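To make the notion of a null range concrete, the sketch below assumes that a letter with cumulative probability F and probability f is given the range [⌊rF⌋, ⌊r(F+f)⌋) in width r. That expression, the function names and the example frequencies are all assumptions made for illustration, and the sketch only detects null ranges; it does not implement the stealing of values for context markers. Exact fractions are used so that rounding does not obscure the point.

```python
from fractions import Fraction
from math import floor

# Hypothetical alphabet: (letter, probability) pairs in alphabetic order.
EXAMPLE = [
    ("a", Fraction(1, 2)),
    ("b", Fraction(1, 2000)),
    ("c", Fraction(1, 2000)),
    ("d", Fraction(499, 1000)),
]


def letter_ranges(freqs, r):
    """Return {letter: (lo, hi)}, the assumed range of each letter in width r."""
    ranges = {}
    F = Fraction(0)  # cumulative probability of all lower letters
    for letter, f in freqs:
        ranges[letter] = (floor(r * F), floor(r * (F + f)))
        F += f
    return ranges


def null_letters(freqs, r):
    """Letters whose range is empty in width r and so cannot be encoded directly."""
    return [a for a, (lo, hi) in letter_ranges(freqs, r).items() if lo == hi]
```

In this example 'b' has a null range at width 791, both 'b' and 'c' have null ranges at width 100, and no letter has a null range at width 2000, where r·f ≥ 1 for every letter.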
Thus the range of letters [α,β) that generate the context marker $C_y$ consists of all those letters whose range is included in the range of $C_y$.
The context y is a context of improbable letters in which we encode the letter that caused the generation of the context marker $C_y$.

F and f are defined in the context y by:
If we can calculate $F_{xa} - F_{xe}$ directly as a floating point number, where 'a' and e are any two letters, then we do not have to work in double precision even when encoding improbable letters. This process may be repeated to any depth, and thus we may (for example) perform any encoding on an eight-bit microprocessor.

Note that the algorithm still generates prefix codes if, for all 'a', $f_a$ is a power of the base and $F_a/f_a$ is an integer.

Conclusion

We are now able to separate the task of describing redundancy from the task of removing it. If we can describe it concisely, we can remove it cheaply.

For the sake of brevity, we merely state that messages encoded using range encoding will have an average length little more than […] digits longer than the theoretical optimum. This paper will also be published as a University of Warwick Theory of Computation report, where we shall justify that statement, and include an APL model of a range encoder and decoder.

Acknowledgements

I am grateful to my employers, IBM, for a valuable education award that has enabled me to attend Warwick University to write up this and other ideas. I am particularly grateful to my supervisor, Dr. M. S. Paterson, for his help in the preparation and presentation of this algorithm.

Post script

Since writing this report, two papers by J. J. Rissanen have been brought to my notice [6,7]. The ideas in those papers and in this one appear to be closely related, and it will be interesting to compare them in detail.

References

[1] A method for the construction of minimum redundancy codes.
[2] Variable length binary encodings.
[3] Adaptive data compression.
[4] Data compression and adaptive telemetry.
[5] Efficient coding.
[6] Generalised Kraft inequality and arithmetic coding.
[7] Arithmetic coding.