Add buffer system
The buffer system lets the CPU memcpy (or maybe DMA) data to the accelerator so the accelerator can access it quickly. The accelerator can then operate on the entire chunk of data and write the result out to another buffer.
I think I still need 2 buffers because the output can be larger than the input. That is obviously the case for decompression, but even in compression, if every alpha value is different, the output will be 4/3 the size of the input.
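
A rough sketch of the two-buffer idea in C, to make the sizing argument concrete. This is not the actual implementation: `accel_buffers`, `accel_buffers_init`, `accel_buffers_free`, and `worst_case_out_size` are hypothetical names, plain `malloc` stands in for whatever accelerator-visible memory ends up being used, and the 4/3 bound only covers the compression case described above (decompression would need its own bound).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical worst case for compression: 4/3 of the input, rounded up. */
static size_t worst_case_out_size(size_t in_size) {
    return (in_size * 4 + 2) / 3;
}

/* One input buffer the CPU fills, one output buffer the accelerator fills. */
typedef struct {
    uint8_t *in;
    size_t   in_size;
    uint8_t *out;
    size_t   out_cap;
} accel_buffers;

/* Allocate both buffers and memcpy the source data into the input buffer.
 * Returns 0 on success, -1 on allocation failure. */
static int accel_buffers_init(accel_buffers *b, const uint8_t *src, size_t n) {
    assert(b != NULL);
    b->in_size = n;
    b->out_cap = worst_case_out_size(n);
    b->in  = malloc(n ? n : 1);
    b->out = malloc(b->out_cap ? b->out_cap : 1);
    if (b->in == NULL || b->out == NULL) {
        free(b->in);
        free(b->out);
        b->in = NULL;
        b->out = NULL;
        return -1;
    }
    if (n > 0) {
        memcpy(b->in, src, n); /* a DMA transfer could replace this copy */
    }
    return 0;
}

/* Release both buffers. */
static void accel_buffers_free(accel_buffers *b) {
    free(b->in);
    free(b->out);
    b->in = NULL;
    b->out = NULL;
}
```

With a 6-byte input this gives an 8-byte output buffer, so the accelerator never has to write past the end even in the worst compression case.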