Why can't we fix the template during compression and then compress according to that template? This would improve the efficiency of CLP's compression and decompression. For example, when an enterprise compresses daily logs from the same application, a fixed template removes the need to repeatedly extract templates: logs can be compressed and decompressed directly against the template, which greatly saves compression time and computational resources.
Possible implementation
Extract templates from an application's logs prior to compression; afterwards, the application's daily logs are compressed directly according to those templates.
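To make the idea concrete, here is a minimal, hypothetical sketch of template extraction: splitting a log line into a static template and its variable values. This is only an illustration of the concept; CLP's actual tokenizer uses delimiters and variable schemas, not the toy regex below.

```python
import re

def extract_template(line):
    """Split a log line into a (template, variables) pair.

    Hypothetical illustration: treat integers, decimals, and hex
    strings as variables and replace each with a placeholder byte.
    """
    var_pattern = re.compile(r"0x[0-9a-fA-F]+|\d+\.\d+|\d+")
    variables = var_pattern.findall(line)
    template = var_pattern.sub("\x11", line)  # placeholder marker
    return template, variables

# Two log lines from the same print statement share one template;
# only the extracted variable values differ.
t1, v1 = extract_template("connected to 10.0.0.1 in 25 ms")
t2, v2 = extract_template("connected to 10.0.0.2 in 31 ms")
assert t1 == t2
```

Under the proposal above, `t1` would be computed once and reused for every subsequent day's logs instead of being re-derived per compression run.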
It is technically possible to use a fixed "template" during compression, but we don't have much motivation to do so. Generating "templates" on the fly in CLP is extremely cheap. Sharing a "template" introduces additional complexities, such as managing shared templates, handling new templates discovered in the logs, etc., which may outweigh the benefits of a fixed template. Keeping everything self-contained, with an independent template for each archive, keeps the design simple and makes compression and search embarrassingly parallelizable, with no data dependencies across archives.
That said, we have internally experimented with dictionary pre-training (an improved version of the "fixed template"). In this approach, compression dictionaries are pre-trained on one dataset and reused across multiple datasets; only the "delta" (new dictionary entries) needs to be saved in each archive. IIRC, as expected, there was no noticeable performance gain, only a compression ratio gain. If customers have a strong need for maximum compression ratio (we have many options and tunings to achieve higher compression ratios) and the additional complexity involved is justifiable, then we can consider moving this experimental feature into production code.
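The delta idea described above can be sketched as follows. This is a hypothetical simplification, not CLP's actual archive format: the archive stores only the templates that are absent from the shared, pre-trained dictionary.

```python
def archive_delta(templates_in_logs, pretrained):
    """Return the per-archive 'delta': templates seen in the logs
    that are not already in the shared, pre-trained dictionary.

    Hypothetical sketch of the experiment described above; names
    and structures are illustrative only.
    """
    # Entries already in the pre-trained dictionary are referenced,
    # not stored; only new entries go into the archive.
    return [t for t in templates_in_logs if t not in pretrained]

pretrained = {"connected to <v> in <v> ms", "user <v> logged in"}
seen = ["connected to <v> in <v> ms", "disk <v> full"]
delta = archive_delta(seen, pretrained)
assert delta == ["disk <v> full"]
```

The trade-off the reply describes follows directly: the delta shrinks the archive (compression ratio gain), but every archive now depends on the shared dictionary, so archives are no longer self-contained or independently searchable.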
Will this improved fixed-template method be open-sourced in the future? And does the fixed-template method increase compression and decompression time as the compression ratio goes up?