* Fix UI stuck on Disconnected during network-change engine restart
When EngineRestarter stopped and restarted the Go engine after a
network type change, the UI only saw the engine's onDisconnected
callback and had no visibility into the reconnect attempt. If the
restart stalled (e.g. on a stale management RPC), the UI stayed on
Disconnected for the full stall window, making it look like the
client never reconnected.
Emit onConnecting() from EngineRestarter at stop and at re-launch to
keep the UI in the Connecting state throughout the restart, and emit
onDisconnected() on error or the 30s safety timeout so a truly failed
restart doesn't leave the UI stuck on Connecting.
* Bind process to default network and ignore initial callback burst
Pin the process's outgoing sockets to the current default Android
Network via ConnectivityManager.bindProcessToNetwork so fresh dials
after a WiFi/cellular switch do not stall on TCP SYN retransmits
through the departing interface.
Skip the initial onAvailable burst fired right after registering the
NetworkCallback. That burst reflects current state, not a transition,
and was triggering a spurious EngineRestarter restart that cancelled
the in-flight login on cold start.
* Bump netbird submodule to test branch
* Gate network change notifications on engine running
Replace the time-based grace window with an isEngineRunning predicate.
The initial onAvailable burst that Android fires right after
registerNetworkCallback cannot trigger an EngineRestarter run because
the engine is not up yet at that point.
Update the tests accordingly and add coverage for the
engine-not-running path.
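A minimal sketch of the predicate-based gate, assuming the detector is
handed a boolean supplier and a restart hook. NetworkChangeGate and its
method names are hypothetical and only illustrate the idea; they are not
the project's actual classes.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical gate: forwards network-change signals only while the engine runs. */
final class NetworkChangeGate {
    private final BooleanSupplier isEngineRunning; // assumed predicate supplied by the service
    private final Runnable onNetworkChanged;       // assumed hook, e.g. schedules a restart

    NetworkChangeGate(BooleanSupplier isEngineRunning, Runnable onNetworkChanged) {
        this.isEngineRunning = isEngineRunning;
        this.onNetworkChanged = onNetworkChanged;
    }

    /** Returns true if the notification was forwarded, false if it was dropped. */
    boolean notifyNetworkChanged() {
        if (!isEngineRunning.getAsBoolean()) {
            // Engine not up yet (e.g. the initial callback burst): nothing to restart.
            return false;
        }
        onNetworkChanged.run();
        return true;
    }
}
```

Unlike a time-based grace window, the gate has no tunable duration: the
burst is dropped simply because there is no running engine to restart.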
* Update submodule
* Silence foreground service notification
Use IMPORTANCE_LOW and explicitly clear sound/vibration on the channel
so the persistent VPN notification does not play a sound or vibrate on
creation or each connection state update.
* Guard default network callback against stale events
Track the currently bound default network and an active flag so late
onLost callbacks cannot clear a newer binding and post-unregister
onAvailable callbacks cannot rebind after shutdown.
* Serialize default network callback state changes
Add a dedicated lock and wrap the default network callback's onAvailable
and onLost bodies, plus the unregister teardown, in synchronized blocks
to close the TOCTOU race where a stale callback could re-bind the
process after unregisterNetworkCallback had cleared the binding.
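The stale-event guard and the locking described above can be sketched as
follows. This is a simplified model under stated assumptions: a String
stands in for android.net.Network, and DefaultNetworkTracker is a
hypothetical name, not the real callback class.

```java
/** Hypothetical model of a default-network callback whose state is guarded by one lock. */
final class DefaultNetworkTracker {
    private final Object lock = new Object();
    private boolean active;      // true between register() and unregister()
    private String boundNetwork; // stand-in for android.net.Network; null when unbound

    void register() {
        synchronized (lock) {
            active = true;
        }
    }

    void unregister() {
        synchronized (lock) {
            active = false;
            boundNetwork = null;
        }
    }

    void onAvailable(String network) {
        synchronized (lock) {
            if (!active) {
                return; // callback delivered after unregister: do not rebind
            }
            boundNetwork = network;
        }
    }

    void onLost(String network) {
        synchronized (lock) {
            // Only clear the binding if the lost network is the one currently bound;
            // a late onLost for an older network must not clear a newer binding.
            if (active && network.equals(boundNetwork)) {
                boundNetwork = null;
            }
        }
    }

    String boundNetwork() {
        synchronized (lock) {
            return boundNetwork;
        }
    }
}
```

Because the check of the active flag and the update of the binding happen
inside the same synchronized block, a callback cannot observe the flag
before teardown and write the binding after it.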
* Warn if default network is a VPN
If registerDefaultNetworkCallback ever delivers our own TUN as the
default network, binding the process to it would create a routing loop.
Log a warning to surface that case if it happens on any device.
* Serialize default network callback registration
Mirror the unregister teardown's locking by wrapping the active flag and
registerDefaultNetworkCallback in the same synchronized block. Closes the
asymmetry between register and unregister so concurrent calls cannot leak
the default callback or leave the active flag inconsistent.
* Suppress old engine state events during restart
Detach the ConnectionListener before stopping the engine so the old
engine's Disconnecting/Disconnected teardown events do not reach the UI
and cause a brief visible Disconnected flash before the restart kicks
in. The listener is re-attached after the new engine starts; the Go
notifier delivers the current state on attach so the UI converges
without our help.
While the engine is detached, the EngineRestarter drives the UI itself
via notifyConnecting on stop and notifyDisconnected on timeout/error.
* Detect network handover from default-network signal
Replace the per-network onAvailable/onLost pairing with a default-network
type observation. Android sometimes skips onLost on seamless WiFi
handovers, leaving the previous mechanism unable to detect the
transition. The default-network callback delivers the authoritative
current transport, so any change of type triggers an engine restart.
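A minimal sketch of type-based handover detection, assuming the first
observation after registration counts as the baseline rather than a
change. LinkType and TransportChangeDetector are hypothetical names used
only for illustration.

```java
/** Hypothetical transport categories derived from the default network. */
enum LinkType { WIFI, CELLULAR, ETHERNET, OTHER }

/** Reports a handover whenever the default network's transport type changes. */
final class TransportChangeDetector {
    private LinkType last; // null until the first observation

    /** Returns true when the observed type differs from the previous observation. */
    synchronized boolean observe(LinkType current) {
        LinkType previous = last;
        last = current;
        // The first observation only establishes the baseline (assumption).
        return previous != null && previous != current;
    }
}
```

This does not depend on a matching onLost ever arriving, which is the
case the per-network pairing could not handle.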
* Skip restart when engine reconnects on its own
Two related changes to avoid disrupting a working connection during a
network handover:
- Filter Disconnecting/Disconnected events from the old engine teardown
via a wrapper around ConnectionListener, and suppress the
ServiceStateListener.onStopped/onStarted notifications on a per-listener
basis so the UI does not flash through Disconnected during the restart
window.
- Subscribe to OnConnected events from the engine. If the Go core
reconnects autonomously while the 2s restart debounce is still
pending, cancel the restart instead of tearing down the working
connection.
* Bump netbird submodule to fix/job-stream-state-leak
Picks up the fix that prevents transient JOB stream errors from being
reported as a management disconnect, which would otherwise stick the
UI on Connecting after the JOB stream silently reconnects.
* Skip bindProcessToNetwork when default network is a VPN
On some devices the default network callback delivers our own TUN as
the default within a VpnService process. Binding the process to that
risks a routing loop. The Android default-network signal is replayed
seconds later with the underlying physical network, so skipping the
bind on a VPN result waits for that follow-up signal instead.
* Skip bindProcessToNetwork when network capabilities are unknown
Treat a null NetworkCapabilities as unsafe and skip the bind. The
previous null-tolerant check would have bypassed the VPN routing-loop
guard if Android happened to return null in a race between the
default-network callback firing and getNetworkCapabilities.
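The null-safe guard can be sketched as below. This is an illustrative
model only: a Set of a hypothetical Capability enum stands in for
android.net.NetworkCapabilities, and BindGuard is not a class from the
project.

```java
import java.util.Set;

/** Hypothetical stand-in for the transport flags of NetworkCapabilities. */
enum Capability { TRANSPORT_WIFI, TRANSPORT_CELLULAR, TRANSPORT_VPN }

final class BindGuard {
    private BindGuard() {}

    /**
     * Returns true only when the capabilities are known and do not include VPN.
     * A null argument models getNetworkCapabilities returning null.
     */
    static boolean isSafeToBind(Set<Capability> caps) {
        if (caps == null) {
            return false; // unknown capabilities: treat as unsafe
        }
        return !caps.contains(Capability.TRANSPORT_VPN);
    }
}
```

A null-tolerant variant such as `caps == null || !isVpn(caps)` would
return true for null, which is the bypass this change closes.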
* Cancel pending restart on user-driven engine actions
A debounced restart scheduled in response to a network change can fire
after the user has manually started or stopped the engine, killing the
user's action mid-flight (auth context canceled, restart fails, UI
stays Disconnected).
Cancel any pending restart before the user-facing entry points run:
binder runEngine/stopEngine, broadcast stop, always-on start, and VPN
permission revoke. The EngineRestarter's own internal stop+restart
remains unaffected.
* Remove bindProcessToNetwork from default network callback
Reverts the bindProcessToNetwork side of f0df3f5. Pinning the process
to the current default network helps when the kernel routing table
lags the actual network change, but hurts when Android keeps a
departing network as the default for tens of seconds: every fresh
socket gets stuck on a dying interface.
The default-network callback now only feeds the type-change signal
used for engine restart decisions; the kernel decides which interface
new sockets actually use.
* Fix wrapper stacking and stale listener on restart timeout
EngineRunner.setConnectionListener stacked a fresh ObservingConnectionListener
around whatever it received, so EngineRestarter snapshotting the current
listener and re-installing a FilteringConnectionListener around it grew the
chain by one level on every restart cycle. Unwrap any prior
ObservingConnectionListener inside setConnectionListener and any prior
FilteringConnectionListener when EngineRestarter snapshots, so the chain
stays at most two layers deep across repeated restarts.
Also unregister the restart's ServiceStateListener (and clear
currentListener) on the 30s timeout path so a late onStopped cannot fire
runWithoutAuth against a stale listener and silently restart the engine
after the timeout already gave up. Mirror the cleanup in onStarted and
onError for consistency.
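The unwrap-before-wrap idea can be sketched as follows. ConnListener and
ObservingListener are simplified, hypothetical stand-ins for
ConnectionListener and ObservingConnectionListener; the real interfaces
have more methods.

```java
/** Simplified, hypothetical stand-in for ConnectionListener. */
interface ConnListener {
    void onEvent(String event);
}

/** Decorator that forwards events; wrap() prevents decorators from stacking. */
final class ObservingListener implements ConnListener {
    private final ConnListener delegate;

    private ObservingListener(ConnListener delegate) {
        this.delegate = delegate;
    }

    /** Wraps target after stripping any ObservingListener layers it already has. */
    static ObservingListener wrap(ConnListener target) {
        while (target instanceof ObservingListener) {
            target = ((ObservingListener) target).delegate;
        }
        return new ObservingListener(target);
    }

    /** Counts how many ObservingListener layers surround the innermost listener. */
    static int depth(ConnListener listener) {
        int layers = 0;
        while (listener instanceof ObservingListener) {
            layers++;
            listener = ((ObservingListener) listener).delegate;
        }
        return layers;
    }

    @Override
    public void onEvent(String event) {
        delegate.onEvent(event);
    }
}
```

With the unwrap step, re-installing the current listener on every restart
cycle leaves the depth constant instead of growing by one each time.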
* Address CodeRabbit review nits and bump submodule
- Restore the original ConnectionListener on successful restart so
FilteringConnectionListener wrappers do not accumulate across cycles.
Drop the now-unused allowAfterFirstConnectingOrConnected hook; only
one engine ever runs at a time on Android, so there is no source of
late events that would justify keeping the filter past onStarted.
- NetworkChangeDetector: replace AtomicBoolean with a plain boolean now
that all access is guarded by networkCallbackLock.
- VPNService: drop the redundant engineRestarter null-checks in the
stop-broadcast receiver and onRevoke; engineRestarter is initialized
in onCreate before any of these paths are reachable.
- Bump netbird submodule to current main HEAD (ed828b7a).
* Move peer-list refreshes off the UI thread
PeersFragmentViewModel.onPeersChanged and HomeFragment.onPeersListChanged
called serviceAccessor.getPeersList() synchronously on whatever thread
invoked the listener. When MainActivity.registerServiceStateListener
replays cached state during fragment view creation, that thread is the
UI thread, so the JNI call into the Go engine happened on the main
looper. During engine bootstrap or teardown the call could block long
enough to trigger a 5+ s ANR.
Dispatch the JNI call to a single-thread executor in each consumer and
post results back via LiveData.postValue / View.post, both of which are
already UI-thread safe. The executor lifetime matches the listener's
(ViewModel.onCleared / Fragment.onDestroyView).
* Fix races and listener-suppression leak in EngineRestarter
Schedule/cancel of the debounced restart was non-atomic across the
connectivity callback thread, the Go reconnect notifier, and the main
thread, so a pending runnable could outlive a self-reconnect or be
cancelled while restartScheduled stayed true. Guard restartScheduled
together with handler.postDelayed/removeCallbacks under a single lock.
The suppressed-listener list was a local in restartEngine(), so a
cleanup() during an in-flight restart would not unsuppress external
ServiceStateListeners. The subsequent engineRunner.stop() then filtered
them out of notifyServiceStateListeners(false) and the UI never saw the
final onStopped. Hoist the holder to a field and unsuppress in cleanup()
and on every completion path via getAndSet(null).
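A minimal sketch of keeping the scheduled state and the timer operation
under one lock. It swaps Android's Handler.postDelayed/removeCallbacks for
a ScheduledExecutorService so it runs on a plain JVM; DebouncedRestart and
its members are hypothetical names, not the project's code.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical debouncer: state and timer are only touched while holding lock. */
final class DebouncedRestart {
    private final Object lock = new Object();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "restart-debounce");
                t.setDaemon(true); // do not keep the JVM alive for a pending restart
                return t;
            });
    private final Runnable restart;
    private final long delayMs;
    private ScheduledFuture<?> pending; // guarded by lock; non-null means "scheduled"
    private long generation;            // guarded by lock; invalidates superseded timers

    DebouncedRestart(Runnable restart, long delayMs) {
        this.restart = restart;
        this.delayMs = delayMs;
    }

    /** Schedules (or re-schedules) the restart; flag and timer change atomically. */
    void schedule() {
        synchronized (lock) {
            if (pending != null) {
                pending.cancel(false);
            }
            final long gen = ++generation;
            pending = scheduler.schedule(() -> fire(gen), delayMs, TimeUnit.MILLISECONDS);
        }
    }

    /** Cancels a pending restart; returns false if nothing was pending. */
    boolean cancel() {
        synchronized (lock) {
            if (pending == null) {
                return false;
            }
            pending.cancel(false);
            pending = null;
            return true;
        }
    }

    boolean isScheduled() {
        synchronized (lock) {
            return pending != null;
        }
    }

    private void fire(long gen) {
        synchronized (lock) {
            if (pending == null || gen != generation) {
                return; // cancelled or superseded after the timer expired
            }
            pending = null;
        }
        restart.run(); // run outside the lock so the restart cannot block schedulers
    }

    void shutdown() {
        scheduler.shutdownNow();
    }
}
```

Callers on any thread (connectivity callback, reconnect notifier, user
actions) would go through schedule() and cancel(), so the "scheduled"
state can never disagree with whether a timer is actually queued.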
* Guard PeersFragmentViewModel against teardown race
onPeersChanged is invoked from a gomobile callback thread while
onCleared runs on the main thread. An in-flight callback that already
passed the listener null-check in PeersStateListenerAdapter could call
refreshExecutor.execute after shutdown and propagate a
RejectedExecutionException up through gomobile.
Set an isCleared flag before clearing the listener and shutting down
the executor, skip submission when set, and swallow
RejectedExecutionException to make a concurrent event a no-op.
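The flag-plus-catch pattern can be sketched as below, assuming a
single-thread executor as described for the peer-list refresh.
PeerRefresher is a hypothetical stand-in for the ViewModel; only the
java.util.concurrent calls are real APIs.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

/** Hypothetical stand-in for a ViewModel that refreshes peers off the caller's thread. */
final class PeerRefresher {
    private final ExecutorService refreshExecutor = Executors.newSingleThreadExecutor();
    private final Runnable refresh;   // assumed to perform the blocking fetch and post the result
    private volatile boolean cleared; // set first during teardown

    PeerRefresher(Runnable refresh) {
        this.refresh = refresh;
    }

    /** Called from a callback thread; returns true if a refresh was submitted. */
    boolean onPeersChanged() {
        if (cleared) {
            return false; // teardown already started: skip submission
        }
        try {
            refreshExecutor.execute(refresh);
            return true;
        } catch (RejectedExecutionException e) {
            // Lost the race with onCleared(): treat the event as a no-op
            // instead of letting the exception reach the callback thread.
            return false;
        }
    }

    /** Called on teardown; sets the flag before shutting the executor down. */
    void onCleared() {
        cleared = true;
        refreshExecutor.shutdown();
    }
}
```

The flag alone is not sufficient, because a callback can pass the check
just before shutdown(); the catch covers that remaining window.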
* Pass app cache directory to Go for debug bundle temp files
Pass context.getCacheDir() through AndroidPlatformFiles so the Go
debug bundle generator can create temporary zip files in a writable
directory instead of /data/local/tmp/.
* Update netbird submodule with logcat debug bundle support
* Add Troubleshoot fragment with debug bundle upload
Move trace log toggle and log sharing from Advanced to a new
Troubleshoot fragment accessible via the drawer menu. Replace
the old logcat share with debug bundle generation and upload
that copies the upload key to clipboard. Add anonymize toggle.
Works with or without a running engine.
* Replace drawer menu PNG icons with Material outlined vectors
Remove density-specific PNG icons and replace with vector drawables
using Material Design outlined style for consistent appearance.
* Add error logging to EngineRunner.debugBundle
Wrap the goClient call in a try-catch to log errors, matching the
pattern used by selectRoute and deselectRoute.
* Guard TroubleshootFragment UI callbacks against destroyed view
Check binding and isAdded() before accessing UI in background
thread callbacks to prevent NPE if the fragment is destroyed
while the debug bundle upload is in progress.
* Replace Quick Settings tile PNG icon with a vector drawable
Replace the low-resolution 42x42 PNG with a vector drawable for the
Quick Settings tile icon. The new drawable uses the NetBird logo SVG
paths, rendering crisply at any screen density.
* Add network connectivity stress test for VPN resilience
Instrumented test that simulates real-world network disruptions (WiFi/mobile
switching, airplane mode, flapping, long outages) and verifies VPN recovery
via ping. Auto-detects real device vs emulator and uses appropriate network
control strategy for each.
* Fix review findings: resource leak, version bump, markdown lint
- Use try-with-resources in sendEmulatorConsoleCommand to prevent socket leak
- Bump testRules version from 1.6.1 to 1.7.0
- Add language specifier to fenced code block in README
* Fix review findings: restore in finally, label mismatch, readUntilOK error handling
- Always call disruption.restore in finally block to prevent stale network state
- Fix EMU latency label from 5s to 10s to match actual delay value
- Make readUntilOK propagate IOException and detect KO/EOF failures
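A hedged sketch of a reply reader with the behaviour described in the last
item, assuming the emulator console terminates a successful reply with a
line reading OK and reports failure with a line starting with KO. The
class name ConsoleReplies and the exact error messages are illustrative
and not taken from the test code.

```java
import java.io.BufferedReader;
import java.io.IOException;

final class ConsoleReplies {
    private ConsoleReplies() {}

    /**
     * Reads lines until a terminating "OK" line and returns everything before it.
     * Throws IOException on a "KO" reply or if the stream ends first; read errors
     * from the underlying reader propagate unchanged.
     */
    static String readUntilOK(BufferedReader in) throws IOException {
        StringBuilder body = new StringBuilder();
        String line;
        while ((line = in.readLine()) != null) {
            if (line.equals("OK")) {
                return body.toString();
            }
            if (line.startsWith("KO")) {
                throw new IOException("console command failed: " + line);
            }
            body.append(line).append('\n');
        }
        throw new IOException("connection closed before OK");
    }
}
```

A caller would typically open the socket and its reader in a
try-with-resources statement so that both are closed even when
readUntilOK throws.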
* Exclude NetworkConnectivityStressTest from CI
This test requires a configured VPN and real network interfaces,
so it cannot run on the CI emulator. Run it manually on a real device.
* Follow up changes for the Go ConnStatus enum type
* Update app/src/main/java/io/netbird/client/ui/home/PeersFragmentViewModel.java
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* Fix error handling
* Add Quick Settings Tile service for VPN control
* Fix binding state management in NetbirdTileService
* Handle bindService failure in NetbirdTileService and reset pendingClick flag
* Enhance VPN service intent handling in NetbirdTileService
* fix: Ensure VPN service starts as foreground and adapt activity launch for Android 14+.
* Fix: Ensure `updateTile` calls are executed on the main thread using a Handler.
* feat: Explicitly start VPN service as foreground before running the engine.
* Fix highlights being cut off on long text inputs
edit_text_white_focusable.xml combines the
background with the focus highlight using a
selector.
* Prevent preshared_key input from getting autofocus
When opening the advanced screen, if the input
has focus automatically, toggling light and dark
themes will make the screen scroll up to the
input. This will still happen if the input
ever gets focus, though.
* Change windowSoftInputMode on MainActivity
So that when the keyboard is opened in a fragment
and the fragment is dismissed while the keyboard
is still open it won't resize the MainActivity's
views.
* Fix SSO login on ChromeOS by using Device Code Flow
ChromeOS runs Android apps in a container with a separate network namespace,
preventing the browser from reaching the localhost callback server used by
the PKCE flow. This causes "service not available" errors after authentication.
Use Device Code Flow on ChromeOS (as on Android TV), which polls instead of
relying on a localhost callback. On ChromeOS, also auto-open the browser
for a UX similar to PKCE, while showing the QR dialog as a fallback.
* update submodule
* Add max width constraint to fragment_server
* Make fragment_server scrollable if necessary
* Make dialog_confirm_change_server scrollable
* Set max width on ConfirmChangeServerDialog
* Set rounded corner radius for dialogs to 16dp
* Set rounded corner radius for dialogs to 28dp
To make it consistent across the app
* Use AlertDialogTheme in ChangeServerFragment dialogs
* Center text used in dialog_simple_alert_message
* Add default margin values for dialogs
Smaller screens use 16dp margin; anything
larger will use 56dp.
* Add max width and center_horizontal to Advanced fragment
* Add max width and center_horizontal to Profiles fragment
* Use wrap_content on ScrollView's layout_height
* Remove unused method to calculate max width on dialogs
* Add fragment_max_width to dimens.xml
Use its value in the advanced, profile and
server fragments.
This also fixes a bug where fragment_advanced did
not respect the containing layout's max width
value (which made switches disappear in landscape).
* Add missing contentDescription to Use Netbird server button
* Add dialog max width to dimensions file
* Change ConstraintLayout in dialog files to be contained in LinearLayout
In order to apply layout_gravity to its children
* Change ConstraintLayout in fragment_advanced to be contained in LinearLayout
In order to apply layout_gravity to its children
* Add abstract implementation of StateListener
* Add adapter to implement only StateListener
* Add ViewModel usage to PeersFragment
Updates the UI when the peers list changes via
StateListener
* Remove unused code and references
* Add null check for model before unregistering service state listener
* Make UI state immutable
* Change exception message in Status.fromString
This code is throwing "Unknown status: Idle"
without reason
* Add Locale.ROOT when doing status.toLowerCase()
Some locales (e.g. Turkish) lowercase I to
dotless i (ı) instead of i
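The locale dependence is easy to reproduce on a plain JVM. The helper
below is a hypothetical illustration, not the project's Status class; only
String.toLowerCase(Locale) and Locale.ROOT are the real APIs involved.

```java
import java.util.Locale;

final class StatusCase {
    private StatusCase() {}

    /** Lowercases a status string independently of the device's default locale. */
    static String normalize(String status) {
        return status.toLowerCase(Locale.ROOT);
    }

    /** Lowercases using Turkish casing rules, to show what a tr default locale would do. */
    static String lowerTurkish(String status) {
        return status.toLowerCase(Locale.forLanguageTag("tr"));
    }
}
```

With Turkish rules, "IDLE" becomes "ıdle" (leading U+0131), which no longer
matches a comparison against "idle"; with Locale.ROOT it is always "idle".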
* Skip null peer info or connStatus
When parsing PeerInfoArray to List<Peer>
* Move cleaning up stateListenerRegistry and serviceAccessor to onDetach event
* Reverse cleanup order
* Add Locale.ROOT usage to exception message
* Keep fragments from inflating repeatedly
When tapping a given option in the Navigation
Drawer multiple times
* Return boolean onNavigationItemSelected
Indicating if the NavController handled
navigation to the fragment successfully