
On improving the efficiency of CLP compression and decompression #501

Open
lihaoZhang1234 opened this issue Jul 30, 2024 · 2 comments
Labels: enhancement (New feature or request)

@lihaoZhang1234

Request

Why can't we fix the templates during compression and then compress according to those templates? This would improve the efficiency of CLP's compression and decompression. For example, when an enterprise compresses the daily logs of the same application, the templates are essentially fixed, so there is no need to extract them repeatedly; the logs could be compressed and decompressed directly against the known templates, which would save compression time and computational resources.

Possible implementation

Extract templates from an application's logs once, prior to compression; afterwards, compress the application's daily logs directly against that fixed template set.
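
A minimal sketch of the proposed workflow, assuming a hypothetical template format in which "<*>" marks a variable position; the template strings and helper functions below are invented for illustration and are not CLP's internal representation:

```python
import re
import json

# Hypothetical template set, extracted offline from historical logs of the
# same application. "<*>" marks a variable position.
TEMPLATES = [
    "Connected to <*> on port <*>",
    "Request <*> completed in <*> ms",
]

# Pre-compile each template into a regex that captures the variable values.
_COMPILED = [
    (tid, re.compile("^" + re.escape(t).replace(re.escape("<*>"), "(.+?)") + "$"))
    for tid, t in enumerate(TEMPLATES)
]

def encode_line(line: str):
    """Encode a log line as (template id, variable values); None if no template matches."""
    for tid, pattern in _COMPILED:
        m = pattern.match(line)
        if m:
            return {"id": tid, "vars": list(m.groups())}
    return None  # unseen message type; would need some fallback handling

def decode_line(record) -> str:
    """Reconstruct the original line from a template id and its variables."""
    text = TEMPLATES[record["id"]]
    for value in record["vars"]:
        text = text.replace("<*>", value, 1)
    return text

if __name__ == "__main__":
    line = "Request 42 completed in 317 ms"
    enc = encode_line(line)
    print(json.dumps(enc))          # {"id": 1, "vars": ["42", "317"]}
    assert decode_line(enc) == line
```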

@jackluo923 (Member) commented Jul 30, 2024

It is technically possible to use a fixed "template" during compression, but we don't have much motivation to do so. Generating "templates" on the fly in CLP is extremely cheap. Sharing a "template" introduces additional complexity, such as managing shared templates and handling new templates discovered in the logs, which may outweigh the benefits of using a fixed template. Keeping everything self-contained, with an independent template for each archive, keeps the design simple and makes compression and search embarrassingly parallelizable, with no data dependencies across archives.
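
For illustration, here is a rough sketch of what "no data dependencies across archives" means in practice: each batch of logs is compressed into its own archive, with its own templates, so batches can be processed in any order and in parallel. The directory names are made up, and the `clp c <archives-dir> <input-path>` invocation should be checked against your CLP build.

```python
from concurrent.futures import ThreadPoolExecutor
import subprocess

# Hypothetical daily log batches; each one becomes an independent archive.
DAILY_BATCHES = ["logs/2024-07-28", "logs/2024-07-29", "logs/2024-07-30"]

def compress_batch(path: str) -> str:
    # Each run builds its own template/variable dictionaries from scratch,
    # so there is no shared state between archives and no required ordering.
    archive_dir = f"{path}-archive"
    subprocess.run(["clp", "c", archive_dir, path], check=True)
    return archive_dir

with ThreadPoolExecutor() as pool:
    for archive in pool.map(compress_batch, DAILY_BATCHES):
        print("wrote", archive)
```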

That said, we have internally experimented with dictionary pre-training (an improved version of the "fixed template" idea). In this approach, compression dictionaries are pre-trained on one dataset and then reused across multiple datasets; only the "delta" (new dictionary entries) needs to be saved in each archive. IIRC, as expected, there was no noticeable performance gain, only a compression-ratio gain. If customers have a strong need to achieve maximum compression ratio (we have many options and tuning knobs for achieving a higher compression ratio) and the additional complexity involved is justifiable, then we can consider moving this experimental feature into production code.
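
For concreteness, here is a toy illustration of the "pre-trained dictionary plus delta" idea described above; the data structures and function names are invented for the example and are not CLP's actual dictionary format:

```python
def compress_templates(templates_in_logs, pretrained):
    """Encode templates against a shared pre-trained dictionary.

    Returns (encoded_ids, delta): the archive only needs to persist `delta`,
    i.e. templates that were not already in the pre-trained dictionary.
    """
    table = dict(pretrained)   # shared entries, shipped once, reused by many archives
    delta = {}                 # new entries discovered in this batch of logs
    ids = []
    for t in templates_in_logs:
        if t not in table:
            table[t] = len(table)
            delta[t] = table[t]
        ids.append(table[t])
    return ids, delta

# Example: two templates already known, one new one ends up in the delta.
pretrained = {"Connected to <*>": 0, "Request <*> completed in <*> ms": 1}
ids, delta = compress_templates(
    ["Connected to <*>", "Disk <*> is full", "Request <*> completed in <*> ms"],
    pretrained,
)
print(ids)    # [0, 2, 1]
print(delta)  # {"Disk <*> is full": 2}
```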

@lihaoZhang1234 (Author)

Will this improved fixed-template method be open-sourced in the future? And does the fixed-template method increase compression and decompression time as the compression ratio goes up?
