IR stream reader API redesign #539

Open
LinZhihao-723 opened this issue Sep 17, 2024 · 0 comments
Labels
enhancement New feature or request

Request

As the IR format has evolved, an IR stream (ignoring the preamble and end-of-stream byte) is no longer just a sequence of serialized unstructured log events. In addition to log events, we’ve introduced other concepts that may change the stream's state without producing a log event. For clarity, we’ll refer to these concepts, including log events, as IR units. For example, to support loggers that change time zones, we’ve added an IR unit that indicates a UTC offset change. These new IR units may appear between log-event IR units. Moreover, their position in the stream is unpredictable, so we cannot say, for instance, that one will appear after every three log events. The IR units in the latest IR format are:

  • Log event: user-decided information stored as key-value pairs
  • UTC offset change: change of UTC offset indicating the time zone information
  • Schema tree node insertion: insert new nodes to grow the schema tree
  • End of stream indicator: end of stream

Note that although our existing IR streams are already stateful, that state has so far only been updated as part of deserializing a log event. For instance, the four-byte-integer encoding IR stream stores the timestamp in each log event as a timestamp delta; thus, an IR stream reader needs to keep track of the absolute timestamp of the last log event so that it can calculate the absolute timestamp of the next log event as last_log_event_abs_timestamp + next_log_event_timestamp_delta. This stream state is updated after deserializing each log event. However, as mentioned before, a UTC-offset-change IR unit may appear between any two log events, and it then affects every log event deserialized afterward. (Although we discuss reading/deserializing IR streams above, the process is similar for writing/serializing.)
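To make the two kinds of state concrete, here is a minimal sketch in Python. The class and function names (`LogEventUnit`, `UtcOffsetChangeUnit`, `replay`) are hypothetical and are not part of the existing reader; the units are modelled as already-decoded objects rather than bytes.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class LogEventUnit:
    """Hypothetical decoded log-event IR unit carrying a timestamp delta."""
    timestamp_delta: int
    message: str


@dataclass
class UtcOffsetChangeUnit:
    """Hypothetical decoded UTC-offset-change IR unit."""
    utc_offset: int


IrUnit = Union[LogEventUnit, UtcOffsetChangeUnit]


def replay(units: List[IrUnit], reference_timestamp: int) -> List[Tuple[int, int, str]]:
    """Return (absolute timestamp, UTC offset in effect, message) per log event."""
    last_log_event_abs_timestamp = reference_timestamp
    utc_offset = 0
    events = []
    for unit in units:
        if isinstance(unit, UtcOffsetChangeUnit):
            # State change that produces no log event.
            utc_offset = unit.utc_offset
        else:
            # State change tied to a log event: resolve the delta.
            last_log_event_abs_timestamp += unit.timestamp_delta
            events.append((last_log_event_abs_timestamp, utc_offset, unit.message))
    return events
```

Note that when two UTC offset changes appear back to back, only the second one is visible in the returned log events, which is the kind of information loss discussed in this issue.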

The current IR stream reader APIs make it easy to read log events, but have several limitations for IR formats that include additional IR units like UTC offset changes. Currently, when the caller calls the API to read a log event, the reader will read all IR units up to and including the next log event. For instance, if there are one or more UTC offset changes before the next log event, each would be read—updating the reader’s state—and then the log event would be read and returned. The limitations of this design are as follows:

  • When the IR stream reader encounters an error, the reader needs to revert any state changes it made and reset the read head (i.e., each read needs to operate like a transaction). This is for two reasons:
    • A common error scenario is a truncated read in our FFI libraries—the caller passes in a buffer of IR stream data to be read, and that buffer may not contain enough content to completely read up to and including the next log event.
    • We want to leave the error handling decision to the caller rather than swallowing it.
  • If the caller cares about any state changes, they need to manually compare the reader’s state before and after reading a log event in order to determine what state has been updated.
    • Callers also can’t determine if there were consecutive updates to a piece of state (e.g., a UTC offset change).
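The limitations above can be illustrated with a toy reader. This is only a sketch under stated assumptions: the two-byte encoding (one tag byte, one payload byte), the tag values, and the names `ToyReader`, `feed`, and `TruncatedRead` are all invented for illustration and do not reflect the real IR format or FFI API.

```python
TAG_LOG_EVENT = 1          # hypothetical tag: payload is a timestamp delta
TAG_UTC_OFFSET_CHANGE = 2  # hypothetical tag: payload is the new UTC offset


class TruncatedRead(Exception):
    """The buffer ended before the next log event was fully read."""


class ToyReader:
    def __init__(self, buf: bytes):
        self.buf = buf
        self.pos = 0
        self.utc_offset = 0
        self.last_abs_timestamp = 0

    def feed(self, more: bytes) -> None:
        """Append more stream data, as a caller would after a truncated read."""
        self.buf += more

    def _read_byte(self) -> int:
        if self.pos >= len(self.buf):
            raise TruncatedRead()
        value = self.buf[self.pos]
        self.pos += 1
        return value

    def read_next_log_event(self) -> int:
        """Read all IR units up to and including the next log event."""
        # Each call must behave like a transaction, so snapshot everything.
        snapshot = (self.pos, self.utc_offset, self.last_abs_timestamp)
        try:
            while True:
                tag = self._read_byte()
                payload = self._read_byte()
                if tag == TAG_UTC_OFFSET_CHANGE:
                    self.utc_offset = payload  # silently updates reader state
                elif tag == TAG_LOG_EVENT:
                    self.last_abs_timestamp += payload
                    return self.last_abs_timestamp
                else:
                    raise ValueError(f"unknown tag {tag}")
        except Exception:
            # Roll back the read head and all state, then let the caller decide.
            self.pos, self.utc_offset, self.last_abs_timestamp = snapshot
            raise
```

With the buffer `bytes([2, 8, 2, 9, 1])`, the first call raises `TruncatedRead` and has to undo both offset updates. After `feed(bytes([5]))`, the retry succeeds, but the caller can only discover the final offset by comparing `utc_offset` before and after the call, and never learns that an intermediate offset existed.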

Thus, we propose redesigning the reader’s APIs to solve these issues.

Possible implementation

To read IR streams, we propose a class structure that consists of a deserializer class and optional user-defined IR unit handlers. Intuitively, the deserializer will be responsible for deserializing IR units from the stream. Users of the deserializer can pass in IR unit handlers for the IR units they are interested in. When the deserializer deserializes one of these IR units, it will call the relevant IR unit handler, allowing the user to perform any additional handling for the IR unit.
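A rough sketch of this structure, written in Python for brevity, might look as follows. Everything here is an assumption made for illustration: the names `Deserializer`, `IrUnitType`, `deserialize_next_ir_unit`, and `IncompleteStream`, the toy encoding (one tag byte plus one payload byte, with no payload for end of stream), and the choice to pass handlers as a dictionary of callables.

```python
from enum import Enum
from typing import Callable, Dict, List, Optional


class IrUnitType(Enum):
    LOG_EVENT = 1
    UTC_OFFSET_CHANGE = 2
    SCHEMA_TREE_NODE_INSERTION = 3
    END_OF_STREAM = 4


class IncompleteStream(Exception):
    """The buffer does not yet contain one complete IR unit."""


Handler = Callable[[Optional[int]], None]


class Deserializer:
    def __init__(self, buf: bytes, handlers: Optional[Dict[IrUnitType, Handler]] = None):
        self._buf = buf
        self._pos = 0
        self._handlers = dict(handlers or {})  # handlers are optional per unit type

    def deserialize_next_ir_unit(self) -> IrUnitType:
        """Deserialize exactly one IR unit and invoke its handler, if any."""
        if self._pos >= len(self._buf):
            raise IncompleteStream()
        unit_type = IrUnitType(self._buf[self._pos])  # ValueError on unknown tag
        size = 1 if unit_type is IrUnitType.END_OF_STREAM else 2
        if self._pos + size > len(self._buf):
            # Nothing has been consumed yet, so there is no state to roll back.
            raise IncompleteStream()
        payload = self._buf[self._pos + 1] if size == 2 else None
        self._pos += size
        handler = self._handlers.get(unit_type)
        if handler is not None:
            handler(payload)
        return unit_type


def read_all(deserializer: Deserializer) -> List[IrUnitType]:
    """Drive the deserializer until the end-of-stream IR unit is seen."""
    seen = []
    while True:
        unit_type = deserializer.deserialize_next_ir_unit()
        seen.append(unit_type)
        if unit_type is IrUnitType.END_OF_STREAM:
            return seen
```

Because each call covers a single IR unit, a truncated buffer can be rejected before any state is touched, and a caller that registers a handler for `IrUnitType.UTC_OFFSET_CHANGE` (for example, `offsets.append`) observes every offset change, including consecutive ones.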
