Lossless Data Compression with Neural Networks
==============================================

1) Overview
-----------

NNCP is an experimental lossless data compressor using Neural
Networks. Two models are available: Transformer (slower but best
ratio) and LSTM (faster). A text preprocessor and tokenizer can
optionally be enabled. More information is available at
https://bellard.org/nncp .

Thanks to LibNC, it supports both the GPU (NVIDIA CUDA version 11.x or
12.x required with an Ampere, Ada or Hopper GPU) and the CPU. A GPU is
required for acceptable speed with large models.

2) Compilation
--------------

A Linux system is assumed. Just type 'make' to compile the program. A
binary DLL of the LibNC library is included in the archive. When using
CUDA, cuBLAS must be installed too.

Windows cross-compilation from Linux is supported provided the
libnc*.dll files are copied from the Windows version.

5) Current best models for enwik8/enwik9
----------------------------------------

enwik8:

  ./nncp --cuda --profile enwik8 --preprocess 4096,512 c enwik8 out.bin
  
  Result: 14915298 bytes (13.2 hours)

enwik9:

  ./nncp --cuda --profile enwik9 --preprocess 16384,512 c enwik9 out.bin

  Result: 106632363 bytes (2.8 days)

Decompression:

  ./nncp --cuda d out.bin out.txt

  Decompression runs at about the same speed as compression.
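
Since compression and decompression both take hours on large inputs, it
can be worth checking the round trip on a small file first. The following
is a minimal sketch, not part of NNCP: the helper name `roundtrip`, its
parameters and the temporary file names are hypothetical, and only the
`c` and `d` commands and the `--cuda` option shown above are assumed.

```python
import filecmp
import subprocess
from pathlib import Path


def roundtrip(nncp_cmd, src, workdir, compress_opts=()):
    """Compress src, decompress it again, and compare with the original.

    nncp_cmd      -- base command as a list, e.g. ["./nncp", "--cuda"]
    compress_opts -- extra options for compression only (hypothetical
                     parameter), e.g. ["--profile", "enwik8"]
    Returns True if the restored file is byte-identical to src.
    """
    src = Path(src)
    packed = Path(workdir) / (src.name + ".packed")      # illustrative name
    restored = Path(workdir) / (src.name + ".restored")  # illustrative name

    # Same argument order as in the commands above: options, then c/d,
    # then input file, then output file.
    subprocess.run([*nncp_cmd, *compress_opts, "c", str(src), str(packed)],
                   check=True)
    subprocess.run([*nncp_cmd, "d", str(packed), str(restored)],
                   check=True)

    # shallow=False forces a byte-by-byte comparison.
    return filecmp.cmp(src, restored, shallow=False)
```

A call such as `roundtrip(["./nncp", "--cuda"], "sample.txt", "/tmp")`
would then return True when the restored file matches the input.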

Test system: AMD Ryzen 9 3900 + RTX 4090 NVIDIA GPU

Memory usage: CPU: 1 GB, GPU: 6.6 GB
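
To compare these results with other compressors, the output sizes can be
converted to bits per byte and to an approximate throughput. The sketch
below assumes the usual input sizes (enwik8 is 10^8 bytes, enwik9 is 10^9
bytes), which are not stated in this file; the name `summarize` is
illustrative.

```python
# Output sizes and times are the results listed above.
# Input sizes are an assumption: enwik8 = 10**8 bytes, enwik9 = 10**9 bytes.
RESULTS = {
    "enwik8": {"in_bytes": 10**8, "out_bytes": 14915298,
               "seconds": 13.2 * 3600},
    "enwik9": {"in_bytes": 10**9, "out_bytes": 106632363,
               "seconds": 2.8 * 86400},
}


def summarize(in_bytes, out_bytes, seconds):
    """Return (bits per input byte, compression ratio, input bytes/second)."""
    bits_per_byte = 8 * out_bytes / in_bytes
    ratio = in_bytes / out_bytes
    throughput = in_bytes / seconds
    return bits_per_byte, ratio, throughput


for name, r in RESULTS.items():
    bpb, ratio, speed = summarize(**r)
    print(f"{name}: {bpb:.3f} bits/byte, ratio {ratio:.2f}, "
          f"{speed:.0f} bytes/s")
```

Under these assumptions the results correspond to roughly 1.19 bits per
byte for enwik8 and 0.85 bits per byte for enwik9. The throughput figures
are only indicative because the reported times are rounded.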

6) Licence
----------

The source code is released under the MIT licence.

The LibNC library is provided in binary form and can be freely redistributed.
