|
ABSTRACT
This paper describes experiments that apply machine learning to compress computer programs, formalizing and automating decisions about instruction encoding that have traditionally been made by humans in a more ad hoc manner. A program accepts a large training set of program material in a conventional compiler intermediate representation (IR) and automatically infers a decision tree that separates IR code into streams that compress much better than the undifferentiated whole. Driving a conventional arithmetic compressor with this model yields code 30% smaller than the previous record for IR code compression, and 24% smaller than an ambitious optimizing compiler feeding an ambitious general-purpose data compressor.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
M. Burrows and D. J. Wheeler. A block-sorting lossless data compression algorithm. Digital SRC research report 124, 5/10/94.
|
| |
3
|
D. Chickering, D. Heckerman, and C. Meek. A Bayesian approach to learning Bayesian networks with local structure. Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence, Morgan Kaufman, 8/97.
|
 |
4
|
Jens Ernst , William Evans , Christopher W. Fraser , Todd A. Proebsting , Steven Lucco, Code compression, Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation, p.358-365, June 16-18, 1997, Las Vegas, Nevada, United States
|
| |
5
|
M. Franz and T. Kistler. Slim binaries. TR 96-24, Dept of Information and Computer Science, University of California, irvine, 6/96.
|
| |
6
|
M. Franz. Adaptive compression of syntax trees and iterative dynamic code optimization: Two basic technologies for mobile-object systems. TR 97-04, Dept of Information and Computer Science, University of California, Irvine, 2/97.
|
| |
7
|
|
| |
8
|
Christopher W. Fraser and Todd A. Proebsting. Custom Instruction Sets For Code Compression. Unpublished manuscript, http://research'micr~s~ft'c~rn/~t~ddpr~/papers/ pldi2.ps, 10/95.
|
| |
9
|
Free Software Foundation. GCC - The GNU C Compiler. http://www.gnu.org/software/gcc, 8/13/98.
|
| |
10
|
Jean-Loup Gailly and Mark Adler. The gzip home page. http://w3.gzip.org. 1/21/99.
|
| |
11
|
|
| |
12
|
|
| |
13
|
A. Lempel and J. Ziv. On the complexity of finite sequences. IEEE Transactions on Information Theory 22(1):75-81, 1/76.
|
 |
14
|
|
| |
15
|
Julian Seward. The bzip2 and libbzip2 home page. http://www.muraroa.demon.co.uk, 2/11/99.
|
| |
16
|
C. E. Shannon. Prediction and entropy of printed English. Bell System Technical Journal 30:50-64, 1/51.
|
 |
17
|
Richard E. Sweet , James G. Sandman, Jr., Empirical analysis of the mesa instruction set, Proceedings of the first international symposium on Architectural support for programming languages and operating systems, p.158-166, March 01-03, 1982, Palo Alto, California, United States
|
| |
18
|
Rik van de Wiel. Code compaction bibliography. http://www, win.tue.nl/csdpa/rikvdw/bibl.html, 2/3/99.
|
| |
19
|
|
| |
20
|
J. Ziv and A. Lempel. Compression of individual sequences via variable-rate coding. IEEE Transactions on Information Theory 24(5):530-536, 9/78.
|
INDEX TERMS
Primary Classification:
E.
Data
E.4
CODING AND INFORMATION THEORY
Subjects:
Data compaction and compression
Additional Classification:
D.
Software
D.3
PROGRAMMING LANGUAGES
D.3.4
Processors
Subjects:
Compilers
I.
Computing Methodologies
I.2
ARTIFICIAL INTELLIGENCE
I.2.3
Deduction and Theorem Proving
Subjects:
Inference engines
General Terms:
Design,
Experimentation,
Languages,
Measurement,
Performance,
Theory
Keywords:
abstract machines,
code compaction,
code compression,
compiler intermediate languages and representations,
data compression,
decision trees,
machine learning,
statistical models,
virtual machines
|