-
Notifications
You must be signed in to change notification settings - Fork 62
Unify OTLP/OTAP gRPC Configuration and Introduce Experimental Non-Tonic Receiver #1382
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #1382 +/- ##
==========================================
- Coverage 83.91% 83.91% -0.01%
==========================================
Files 398 409 +11
Lines 108993 112861 +3868
==========================================
+ Hits 91466 94708 +3242
- Misses 16993 17619 +626
Partials 534 534
🚀 New features to boost your workflow:
|
Co-authored-by: Utkarsh Umesan Pillai <66651184+utpilla@users.noreply.github.com>
| let mut deduped = Vec::with_capacity(methods.len()); | ||
| for method in methods { | ||
| if !deduped.contains(&method) { | ||
| deduped.push(method); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic is only needed for CompressionConfigValue::List. We could move it inside the match statement to make the code simpler.
| // Note: without status set, the OTAP encoder fails at runtime | ||
| .status(otap_df_pdata::proto::opentelemetry::trace::v1::Status::new( | ||
| StatusCode::Ok, | ||
| "ok", | ||
| )) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // Note: without status set, the OTAP encoder fails at runtime | |
| .status(otap_df_pdata::proto::opentelemetry::trace::v1::Status::new( | |
| StatusCode::Ok, | |
| "ok", | |
| )) | |
| .status(otap_df_pdata::proto::opentelemetry::trace::v1::Status::new( | |
| StatusCode::Ok, | |
| "ok", | |
| )) |
Maybe fixed in #1436
| use otap_df_pdata::proto::opentelemetry::collector::logs::v1::ExportLogsServiceResponse; | ||
| use otap_df_pdata::proto::opentelemetry::collector::metrics::v1::ExportMetricsServiceResponse; | ||
| use otap_df_pdata::proto::opentelemetry::collector::trace::v1::ExportTraceServiceResponse; | ||
| use parking_lot::Mutex; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 TIL
| } | ||
| } | ||
|
|
||
| /// Applies the shared server tuning options to a tonic server builder. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| /// Applies the shared server tuning options to a tonic server builder. | |
| /// Applies the shared server tuning options to a server builder. |
| struct AckRegistryInner { | ||
| slots: Box<[AckSlot]>, | ||
| free_stack: Vec<usize>, | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a tiny bit confused. Is this the new experimental code intended to replace something else, or have you already replaced something with this? This looks like the data type in crates/otap/src/accessory which use slotmap, which I see still referenced from otap_grpc/otlp/server.rs. This looks like an alternative to slotmap?
| } | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can write your own gRPC implementation, but it better be 1000 lines or less!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lol yeah we gotta delete all the comments to get under the line limit
| F: Fn(T) -> OtapArrowRecords + Send + Copy + 'static, | ||
| { | ||
| stream::unfold(state, |mut state| async move { | ||
| match state.next_item().await { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems the same problem persists. The receiver requires the client to poll the output stream. Even though fill_inflight() reads data eagerly from the input stream, it won’t run until the client polls the output stream, right?
client polls the output stream -> `ArrowBatchStreamState::next_item()` -> `ArrowBatchStreamState::fill_inflight()`
| } | ||
|
|
||
| #[test] | ||
| #[ignore = "temporarily disabled while investigating produce_bar failure"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utpilla
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the effort on this! I wanted to share a few thoughts and concerns. To be upfront, I don’t have deep expertise in HTTP/2 or gRPC internals, so I can’t fully evaluate this implementation and that’s part of why I’m raising these points.
I understand the motivation behind avoiding Send bounds for a thread-per-core design, and that seems like a valid goal. My concern is more about long-term maintainability than the immediate code. HTTP/2 and gRPC have subtle requirements around flow control, connection state, error handling, deadline propagation, and header compression. I guess my question is less about whether this works now but whether it will still work correctly down the road when someone encounters a weird client, a specific network condition, or a new gRPC feature. Tonic has years of production exposure and an active community finding and fixing edge cases. With a custom implementation, we'd be building up that production experience from scratch. I’m not sure how many of us (myself included) feel comfortable debugging HTTP/2 frame-level issues if they arise later. Just something to consider on the maintenance side.
Maybe we could feature flag this implementation rather than trying to make it default? This would let teams who need the thread-per-core performance benefit opt in, while keeping Tonic as the battle-tested default. It would also give time to harden the h2 implementation with early adopters before broader rollout.
|
@utpilla We are on the same page. I plan to keep both the OTLP and OTAP receivers for quite some time, in addition to this experimental receiver. |
Align OTLP/OTAP receivers on shared gRPC config
This PR is a subset of #1357 and focuses on the OTLP and OTAP receivers.
Key changes for the OTAP receiver
New experimental receiver (non-Tonic)
This PR also introduces an experimental OTAP receiver (otel_receiver) that does not rely on Tonic. The intent is to:
This experimental receiver does not support OTLP yet, but that is coming soon.
The following diagram describes the overall design to ease the review process.