@lquerel lquerel commented Nov 4, 2025

Align OTLP/OTAP receivers on shared gRPC config

This PR is a subset of #1357 and focuses on the OTLP and OTAP receivers.

  • Both gRPC-based receivers now share the same configuration structure (GrpcServerSettings) for greater consistency. The configuration has also been significantly extended to allow more precise tuning; parameters representing byte quantities can now be expressed either as raw byte counts or with units such as MB, MiB, etc. (see the sketch after this list).
  • Adds flexible compression deserialization (single value, list, or none) and reuses it across request/response compression settings. By default, response compression is no longer enabled since responses are typically very small. Both zstd and gzip are enabled for requests.
  • Pushes max-concurrent-request tuning into both receivers (and exposes a helper) so that downstream capacity directly drives the gRPC concurrency clamp.
  • Updates the OTLP/OTAP servers to honor the new settings: compression preferences, adaptive windows, and message-size limits are applied once when the services are built.
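
As an illustration only (field names, the Single variant, and the defaults shown below are assumptions, not the actual GrpcServerSettings API; the document only confirms a CompressionConfigValue type with a List variant), a shared config along these lines could deserialize a single/list/absent compression setting roughly like this:

use serde::Deserialize;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Deserialize)]
#[serde(rename_all = "lowercase")]
enum CompressionMethod {
    Gzip,
    Zstd,
}

// Accept either a single method or a list of methods; an absent field
// (Option::None at the call site) means compression is disabled for that direction.
#[derive(Debug, Deserialize)]
#[serde(untagged)]
enum CompressionConfigValue {
    Single(CompressionMethod),
    List(Vec<CompressionMethod>),
}

#[derive(Debug, Deserialize)]
struct GrpcServerSettingsSketch {
    // In the real settings this accepts either a raw byte count or a string with
    // units such as "4MiB"; kept as a plain string here for brevity.
    max_recv_msg_size: Option<String>,
    request_compression: Option<CompressionConfigValue>,
    // Disabled by default, since responses are typically very small.
    response_compression: Option<CompressionConfigValue>,
    max_concurrent_requests: Option<usize>,
}

With such a shape, a fragment like request_compression: [zstd, gzip] with response_compression omitted would match the defaults described above.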

Key changes for the OTAP receiver

  • Response streaming is now driven by an async state machine instead of spawning a new task per request. This removes the extra mpsc hop and preserves backpressure.
  • Ack/Nack correlation slots are protected by a parking_lot::Mutex, providing fast, non-poisoning locking in async contexts where a poisoned std::sync::Mutex would otherwise stall the Tokio worker (a sketch of the idea follows this list).
  • Compression preferences, concurrency limits, and middleware (such as zstd header handling) are applied once per service construction so that hot-path processing remains lean.
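
For reviewers unfamiliar with the pattern, here is a rough sketch of the slot idea (the AckSlot contents, method names, and error type are assumptions; only the slots/free_stack shape appears in this PR):

use parking_lot::Mutex;
use tokio::sync::oneshot;

#[derive(Default)]
struct AckSlot {
    waiter: Option<oneshot::Sender<Result<(), String>>>,
}

struct AckRegistry {
    inner: Mutex<AckRegistryInner>,
}

struct AckRegistryInner {
    slots: Box<[AckSlot]>,
    free_stack: Vec<usize>,
}

impl AckRegistry {
    fn new(capacity: usize) -> Self {
        Self {
            inner: Mutex::new(AckRegistryInner {
                slots: (0..capacity).map(|_| AckSlot::default()).collect(),
                free_stack: (0..capacity).rev().collect(),
            }),
        }
    }

    // Reserve a slot for an in-flight batch; the returned receiver resolves when
    // the batch is acked or nacked downstream.
    fn register(&self) -> Option<(usize, oneshot::Receiver<Result<(), String>>)> {
        let (tx, rx) = oneshot::channel();
        let mut inner = self.inner.lock(); // parking_lot: no poisoning to unwrap
        let idx = inner.free_stack.pop()?;
        inner.slots[idx].waiter = Some(tx);
        Some((idx, rx))
    }

    // Resolve a slot and return it to the free stack.
    fn complete(&self, idx: usize, result: Result<(), String>) {
        let mut inner = self.inner.lock();
        if let Some(tx) = inner.slots[idx].waiter.take() {
            let _ = tx.send(result);
        }
        inner.free_stack.push(idx);
    }
}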

New experimental receiver (non-Tonic)

This PR also introduces an experimental OTAP receiver (otel_receiver) that does not rely on Tonic. The intent is to:

  • Better align our gRPC-based receivers with the thread-per-core design of the OTAP pipeline engine
  • Improve control over the internal mechanics of these receivers. A first version of the admission controller is included; a future PR will extend it to protect the engine and keep CPU/memory usage under control
  • Improve overall performance
  • Support OTAP and OTLP on the same port

This experimental receiver does not support OTLP yet, but that is coming soon.

The following diagram describes the overall design to ease the review process.

[Design diagram of the experimental receiver]

@lquerel lquerel self-assigned this Nov 4, 2025
@github-actions github-actions bot added the rust label (Pull requests that update Rust code) Nov 4, 2025

codecov bot commented Nov 4, 2025

Codecov Report

❌ Patch coverage is 82.09203% with 755 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.91%. Comparing base (e771ca9) to head (8664c2f).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1382      +/-   ##
==========================================
- Coverage   83.91%   83.91%   -0.01%     
==========================================
  Files         398      409      +11     
  Lines      108993   112861    +3868     
==========================================
+ Hits        91466    94708    +3242     
- Misses      16993    17619     +626     
  Partials      534      534              
Components Coverage Δ
otap-dataflow 85.62% <82.09%> (-0.12%) ⬇️
query_abstraction 80.61% <ø> (ø)
query_engine 90.61% <ø> (ø)
syslog_cef_receivers ∅ <ø> (∅)
otel-arrow-go 53.50% <ø> (ø)

@lquerel lquerel marked this pull request as ready for review November 4, 2025 18:51
@lquerel lquerel requested a review from a team as a code owner November 4, 2025 18:51
Co-authored-by: Utkarsh Umesan Pillai <66651184+utpilla@users.noreply.github.com>
Comment on lines +75 to +80
let mut deduped = Vec::with_capacity(methods.len());
for method in methods {
if !deduped.contains(&method) {
deduped.push(method);
}
}
Contributor

This logic is only needed for CompressionConfigValue::List. We could move it inside the match statement to make the code simpler.
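
For illustration, the refactor being suggested might look roughly like this (assuming the CompressionConfigValue shape sketched earlier in the PR description, with absence handled as an Option at the call site):

let methods: Vec<CompressionMethod> = match value {
    // Only the List arm can contain duplicates, so the dedup lives here.
    CompressionConfigValue::List(methods) => {
        let mut deduped = Vec::with_capacity(methods.len());
        for method in methods {
            if !deduped.contains(&method) {
                deduped.push(method);
            }
        }
        deduped
    }
    CompressionConfigValue::Single(method) => vec![method],
};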

Comment on lines +140 to +144
// Note: without status set, the OTAP encoder fails at runtime
.status(otap_df_pdata::proto::opentelemetry::trace::v1::Status::new(
StatusCode::Ok,
"ok",
))
Contributor

Suggested change
-// Note: without status set, the OTAP encoder fails at runtime
-.status(otap_df_pdata::proto::opentelemetry::trace::v1::Status::new(
-    StatusCode::Ok,
-    "ok",
-))
+.status(otap_df_pdata::proto::opentelemetry::trace::v1::Status::new(
+    StatusCode::Ok,
+    "ok",
+))

Maybe fixed in #1436

use otap_df_pdata::proto::opentelemetry::collector::logs::v1::ExportLogsServiceResponse;
use otap_df_pdata::proto::opentelemetry::collector::metrics::v1::ExportMetricsServiceResponse;
use otap_df_pdata::proto::opentelemetry::collector::trace::v1::ExportTraceServiceResponse;
use parking_lot::Mutex;
Contributor

👍 TIL

}
}

/// Applies the shared server tuning options to a tonic server builder.
Contributor

Suggested change
-/// Applies the shared server tuning options to a tonic server builder.
+/// Applies the shared server tuning options to a server builder.

Comment on lines 57 to 60
struct AckRegistryInner {
slots: Box<[AckSlot]>,
free_stack: Vec<usize>,
}
Contributor

I'm a tiny bit confused. Is this new experimental code intended to replace something else, or have you already replaced something with it? It looks like the data type in crates/otap/src/accessory, which uses slotmap and which I see is still referenced from otap_grpc/otlp/server.rs. Is this an alternative to slotmap?

}
}
}
}
Contributor

You can write your own gRPC implementation, but it better be 1000 lines or less!

Member

lol yeah we gotta delete all the comments to get under the line limit

F: Fn(T) -> OtapArrowRecords + Send + Copy + 'static,
{
stream::unfold(state, |mut state| async move {
match state.next_item().await {
Contributor

It seems the same problem persists. The receiver requires the client to poll the output stream. Even though fill_inflight() reads data eagerly from the input stream, it won’t run until the client polls the output stream, right?

client polls the output stream -> `ArrowBatchStreamState::next_item()` -> `ArrowBatchStreamState::fill_inflight()`
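
For illustration only (not code from this PR), the laziness being described is inherent to stream::unfold: the producer closure runs only when the consumer polls the resulting stream.

use futures::{stream, StreamExt};

#[tokio::main]
async fn main() {
    let s = stream::unfold(0u32, |n| async move {
        // Runs only when the consumer polls the stream, just as a
        // fill_inflight-style read inside next_item() would.
        println!("producer side ran for item {n}");
        if n < 3 { Some((n, n + 1)) } else { None }
    });
    tokio::pin!(s);
    // Nothing above has executed yet; each producer step runs during this poll:
    while let Some(item) = s.next().await {
        println!("consumer received {item}");
    }
}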

}

#[test]
#[ignore = "temporarily disabled while investigating produce_bar failure"]
Member

@lquerel since #1447 and #1446 are merged this should pass now. I tested locally by merging main into your branch and it seems to work

@utpilla utpilla left a comment (Contributor)

Thanks for the effort on this! I wanted to share a few thoughts and concerns. To be upfront, I don’t have deep expertise in HTTP/2 or gRPC internals, so I can’t fully evaluate this implementation and that’s part of why I’m raising these points.

I understand the motivation behind avoiding Send bounds for a thread-per-core design, and that seems like a valid goal. My concern is more about long-term maintainability than the immediate code. HTTP/2 and gRPC have subtle requirements around flow control, connection state, error handling, deadline propagation, and header compression. My question is less about whether this works now than whether it will still work correctly down the road when someone encounters a weird client, a specific network condition, or a new gRPC feature. Tonic has years of production exposure and an active community finding and fixing edge cases. With a custom implementation, we'd be building up that production experience from scratch. I'm not sure how many of us (myself included) feel comfortable debugging HTTP/2 frame-level issues if they arise later. Just something to consider on the maintenance side.

Maybe we could feature flag this implementation rather than trying to make it default? This would let teams who need the thread-per-core performance benefit opt in, while keeping Tonic as the battle-tested default. It would also give time to harden the h2 implementation with early adopters before broader rollout.

Contributor Author

lquerel commented Nov 22, 2025

@utpilla We are on the same page. I plan to keep both the OTLP and OTAP receivers for quite some time, in addition to this experimental receiver.

@jmacd jmacd marked this pull request as draft December 3, 2025 17:46