* Huge stability push. All (known) paths have been tested heavily on Linux with TSAN and every race condition it reported has been fixed. Lots of locks have been added/moved/changed, and some objects are leaked on purpose to prevent TSAN reports during shutdown
* More efficient storage proxy implementation which immediately forwards segments to clients once they are available in the proxy
* Added UbaAgent -command=x which can be used to send commands to the host. Supported commands are "status", which prints the status of all remote sessions, "abort/abortproxy/abortnonproxy", which can be used to abort remote sessions, and "disableremote", which makes the host stop accepting more helpers
* Fixed scheduler::stop bug if remotes were still requesting processes
* Added support for process reuse on linux/macos
* Added support for Coordinator interface and dynamic loading of the coordinator dll in UbaCli
* Restructured code a little bit to be able to queue up all writes in parallel
* Added Show create/write colors to visualizer (defaults to on)
* Fixed so write file times are visualized in visualizer
* Improved socket path for visualizer
* Improved a lot of error messages
* Fixed double close of memory handle in StorageServer
* Changed some ScopedWriteLock to SCOPED_WRITE_LOCK (same for read locks)
* Fixed some missing cleanup of trace view when starting a new trace view in visualizer
[CL 32137083 by henrik karlsson in ue5-main branch]
* Fixed race condition in UbaNetworkClient related to message ids and returning message ids after use
* Implemented network backend that is using memory for communication (will be used between client and proxy server when running inproc)
* Added connection uid provided by network backend to be able to improve error messages
* Added NetworkBackend GetTotalSendAndRecv that can be used to fetch all traffic on backend
[CL 31981782 by henrik karlsson in ue5-main branch]
* Elevate thread priority for the threads doing all the networking to make sure proxy and host are prioritizing receiving/sending network data
[CL 31796023 by henrik karlsson in ue5-main branch]
* Changed so all ScopedReadLock and ScopedWriteLock are macros (SCOPED_READ_LOCK and SCOPED_WRITE_LOCK). This makes it possible to add code around the lock to measure contention. Contention testing code can be enabled by setting UBA_TRACK_CONTENTION to 1
[CL 31795889 by henrik karlsson in ue5-main branch]
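As a rough illustration of why a macro helps here, the sketch below wraps a lock in a timing guard and stores the wait time per call site. Every name in it (DEMO_TRACK_CONTENTION, DEMO_SCOPED_LOCK, DemoContentionSite, DemoTimedLock, DemoIncrement) is hypothetical and only mirrors the idea; it is not the actual UBA implementation, and it uses std::mutex rather than UBA's own lock types.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <mutex>

// Hypothetical switch, standing in for something like UBA_TRACK_CONTENTION.
#ifndef DEMO_TRACK_CONTENTION
#define DEMO_TRACK_CONTENTION 1
#endif

// Hypothetical per-call-site accumulator for time spent waiting on a lock.
struct DemoContentionSite {
    const char* file;
    int line;
    std::atomic<uint64_t> waitNs;
    std::atomic<uint64_t> count;
    DemoContentionSite(const char* f, int l) : file(f), line(l), waitNs(0), count(0) {}
};

// Scoped lock that measures how long acquiring the mutex took.
class DemoTimedLock {
public:
    DemoTimedLock(std::mutex& m, DemoContentionSite& site) : m_lock(m, std::defer_lock) {
        auto start = std::chrono::steady_clock::now();
        m_lock.lock();
        auto waited = std::chrono::steady_clock::now() - start;
        site.waitNs += static_cast<uint64_t>(
            std::chrono::duration_cast<std::chrono::nanoseconds>(waited).count());
        ++site.count;
    }
private:
    std::unique_lock<std::mutex> m_lock;
};

#if DEMO_TRACK_CONTENTION
// The macro can declare a static per-site record next to the lock,
// which a plain class type at the call site could not do.
#define DEMO_SCOPED_LOCK(mutex, name) \
    static DemoContentionSite name##_site(__FILE__, __LINE__); \
    DemoTimedLock name(mutex, name##_site)

// Example call site: returns how many times this site has taken the lock so far.
uint64_t DemoIncrement(std::mutex& m, int& value) {
    DEMO_SCOPED_LOCK(m, lock);
    ++value;
    return lock_site.count.load();
}
#else
#define DEMO_SCOPED_LOCK(mutex, name) std::lock_guard<std::mutex> name(mutex)
#endif
```

With the switch set to 0 the macro collapses to a plain scoped lock, so call sites stay unchanged whether tracking is on or off.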
* Changed AWS code to use a token for metadata queries. Some of our AWS machines require this and it doesn't hurt to use it everywhere
[CL 31757296 by henrik karlsson in ue5-main branch]
* NetworkBackendTcp - Changed so that when a connection has failed we wait for the socket thread before taking the lock to delete it.
* NetworkBackendTcp - Removed check for 0 connections. That was a hack and now all code should properly wait
* Changed infinite wait of event to 60 seconds and output an error if it times out.
* Removed NetworkBackend::Close. This should never be done on the outside. Outside should always call shutdown and then wait for disconnect callback
* Moved shutdown of socket to outside lock to prevent deadlock
[CL 31731469 by henrik karlsson in ue5-main branch]
* Fixed shutdown issues where client and server did not wait for connection to be closed before closing down
* Added error handling for trying to decompress bad cas files
[CL 31715626 by henrik karlsson in ue5-main branch]
* Added more description when failing to drop cas db entry
* Added cas entry lock around code that copies or links a file out
* Added more fixes for things reported by tsan/asan
[CL 31708309 by henrik karlsson in ue5-main branch]
* Fixed a bug in UbaWorkManager shutdown where the intrusive linked list contained bad instances while deleting workers
* Fixed more tsan errors when doing ctrl-c on host
[CL 31676297 by henrik karlsson in ue5-main branch]
* Fixed read-after-free bug found by TSAN in StorageServer.cpp
* Fixed write-after-free bug found in UbaProcess.cpp (m_cancelEvent.Set() could be called after memory was freed)
* Lots of minor fixes found using TSAN. Most are harmless but still nice to clean up
* Disabled mimalloc for Linux again. It seems TSAN does not like it, so maybe there are bugs in it
[CL 31676014 by henrik karlsson in ue5-main branch]
* Added code to validate that connections from server to client are all from the same server. Solved by giving servers a unique id and making sure it is part of the connection handshake with the client. The client will just discard connections that provide a different server uid than the first connection
This will hopefully fix the bug seen on our farm a couple of days ago. The theory is that two servers managed to connect to the same helper. The first server is just about to start connecting to a helper when the helper is brought down and restarted to help another server. Some sort of stall happens on the first server, and by the time it starts the actual tcp connection the helper has been brought up to help the other server. The helper allows more than one tcp connection for performance reasons, so it will accept tcp connections from both the old and the new server. In this messed up scenario some messages would go to one server and some to the other, causing all kinds of weird things. It is critical that all connections go to the same server since all messages are simply sent round robin across the tcp connections.
[CL 31460950 by henrik karlsson in ue5-main branch]
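The server uid check described above can be sketched roughly as follows. This is only an illustration under assumptions: DemoServerUid and DemoClientHandshake are hypothetical names, the uid is modelled as two 64-bit integers, and the real handshake carries more than this.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical 128-bit server id that the server sends during the handshake.
struct DemoServerUid {
    uint64_t a;
    uint64_t b;
};

inline bool operator==(const DemoServerUid& l, const DemoServerUid& r) {
    return l.a == r.a && l.b == r.b;
}

// Hypothetical client-side state: remembers the first server and rejects all others.
class DemoClientHandshake {
public:
    // Returns true if the connection should be kept, false if it should be discarded.
    bool AcceptConnection(const DemoServerUid& uid) {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (!m_hasServer) {
            // The first connection decides which server this client belongs to.
            m_server = uid;
            m_hasServer = true;
            return true;
        }
        // Later connections must come from the same server.
        return m_server == uid;
    }

private:
    std::mutex m_mutex;
    bool m_hasServer = false;
    DemoServerUid m_server = {0, 0};
};
```

Because messages are spread across all accepted connections, rejecting a foreign uid at handshake time is enough to keep every message going to one server.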
* Added hint to unmapfileview to be able to add useful information to asserts on fails
* Added error handling for vsnprintf returning error
* Fixed memory stomp in unit test
* Removed usage of %hs on non-windows since asan complains about it
* Changed UBA_USE_MIMALLOC to always be defined and be 0 or 1 instead of using ifdef
* Enabled mimalloc on linux
[CL 31355077 by henrik karlsson in ue5-main branch]
* Added lock around socket send for windows to see if it will resolve some of the crazy socket send times on the farm
[CL 30976001 by henrik karlsson in ue5-main branch]
* Added so in-process proxy can use storage of client
* Fixed reset of trace reader
* Fixed annoying warnings when closing invalid sockets
[CL 30667316 by henrik karlsson in ue5-main branch]
* Added GetNextProcess as a real message type (not using Custom) to be able to stop fetching work if helper is being terminated by aws or if we want to scale down number of workers
* Did some cleanup in the Tcp backend code and made sure not to call close socket multiple times (since it can cause the code to close a different socket than what is owned)
[CL 30584707 by henrik karlsson in ue5-main branch]
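The double-close hazard mentioned above comes from descriptor reuse: once a socket is closed the OS may hand the same number to a newly opened socket, so a second close can hit a descriptor owned by someone else. A minimal POSIX sketch of one way to guard against it is shown below; DemoSocketHandle is a hypothetical wrapper, not the actual Tcp backend code.

```cpp
#include <atomic>
#include <cassert>
#include <unistd.h>

// Hypothetical owner of a single descriptor that can be closed at most once.
class DemoSocketHandle {
public:
    explicit DemoSocketHandle(int fd) : m_fd(fd) {}
    ~DemoSocketHandle() { Close(); }

    // Copying would create two owners of the same descriptor.
    DemoSocketHandle(const DemoSocketHandle&) = delete;
    DemoSocketHandle& operator=(const DemoSocketHandle&) = delete;

    // Returns true only for the call that actually closed the descriptor.
    bool Close() {
        // Swap in -1 first so concurrent or repeated calls see "already closed".
        int fd = m_fd.exchange(-1);
        if (fd == -1)
            return false;
        ::close(fd);
        return true;
    }

    bool IsOpen() const { return m_fd.load() != -1; }

private:
    std::atomic<int> m_fd;
};
```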
* Downgraded tcp bind error to info since it could happen if people are running compile and cook at the same time.
[CL 30574550 by henrik karlsson in ue5-main branch]
* Fixed so paths returned from SearchPathForFile are cleaned up (not having .. etc)
* Fixed so UbaAgent handles WSAPoll POLLERR as timeout so it retries when failing (this made the process exit on wine)
* Fixed so returned processes that had been reused returned the reused process to the scheduler
* Fixed so EnsureBinaryFile is using correct fileNameKeys
* Removed assert in detoured GetFullPathNameA since it is always calling GetFullPathNameW
[CL 30465533 by henrik karlsson in ue5-main branch]
* Fixed path validation code
* Added null guard in NetworkBackendTcp::Connect and collapsed two early outs into one
[CL 30353516 by henrik karlsson in ue5-main branch]
* Removed usage of "select" in tcp socket connect code because it can't handle file descriptors over 1024 and segfaults (phew, hard one to find)
* Added option to have clients send back log to server (SessionServerCreateInfo.remoteLogEnabled). This forced a network protocol bump
* Improved asserts for when network messages are corrupt
* Some small fixes of asserts in debug in win32 detoured code
[CL 30343891 by henrik karlsson in ue5-main branch]
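For context on the "select" removal above: an fd_set is a fixed-size bitmap of FD_SETSIZE entries (commonly 1024), and using FD_SET with a larger descriptor is undefined behavior. poll takes an array of descriptors and has no such limit. The POSIX sketch below shows one possible way to wait for a non-blocking connect with poll; DemoWaitForConnect is a hypothetical helper and not necessarily what the backend does.

```cpp
#include <cassert>
#include <cerrno>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

// Waits until a non-blocking connect() on fd has finished.
// Returns true if the socket is connected, false on timeout or error.
bool DemoWaitForConnect(int fd, int timeoutMs) {
    pollfd p;
    p.fd = fd;
    p.events = POLLOUT; // a socket becomes writable once connect completes
    p.revents = 0;

    int res;
    do {
        res = poll(&p, 1, timeoutMs);
    } while (res == -1 && errno == EINTR); // retry if interrupted by a signal

    if (res <= 0)
        return false; // 0 means timeout, -1 means error

    // Writable does not imply success; the pending error must be checked.
    int err = 0;
    socklen_t len = sizeof(err);
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) != 0)
        return false;
    return err == 0;
}
```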
* Added delete to copy-ctor and assign operator for Thread and event and fixed related code
* Some minor cleanups in thread class
* Fixed so UbaAgent no longer reports a timeout waiting for connection if the server was connected but quickly disconnected
[CL 30325575 by henrik karlsson in ue5-main branch]
* Fixed so storage client keeps track of last tested proxy to prevent testing of same proxy over and over again
* Removed Sleep to emulate timeout if timeout variable is not set
[CL 30315232 by henrik karlsson in ue5-main branch]
* Desperate changes trying to figure out the segfault happening rarely on the farm which contains _zero_ information. It seems to happen directly after a remote is connecting but it never reaches the log entry that writes "Connected to ".
[CL 30314956 by henrik karlsson in ue5-main branch]
* Added guards for null function pointers in tcp backend
* Changed signal handler for SIGSEGV to use sigaction
* Added time to log entry when populating cas client side
[CL 30303417 by henrik karlsson in ue5-main branch]
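For readers unfamiliar with the sigaction change above, a minimal POSIX sketch of installing a SIGSEGV handler that way follows. The handler body and the names DemoSegvHandler, DemoInstallSegvHandler and DemoSegvHandlerInstalled are assumptions for illustration; what the real handler logs is not shown in this changelog.

```cpp
#include <cassert>
#include <cstring>
#include <signal.h>
#include <unistd.h>

// Hypothetical handler: only async-signal-safe calls (write, _exit) are used.
static void DemoSegvHandler(int sig, siginfo_t* info, void* context) {
    (void)sig;
    (void)info; // info->si_addr would hold the faulting address
    (void)context;
    static const char msg[] = "Segmentation fault\n";
    ssize_t ignored = write(STDERR_FILENO, msg, sizeof(msg) - 1);
    (void)ignored;
    _exit(128 + SIGSEGV);
}

// Installs the handler with sigaction; returns true on success.
bool DemoInstallSegvHandler() {
    struct sigaction sa;
    std::memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = DemoSegvHandler;
    sa.sa_flags = SA_SIGINFO; // request the three-argument handler form
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, nullptr) == 0;
}

// Queries the current SIGSEGV disposition without changing it.
bool DemoSegvHandlerInstalled() {
    struct sigaction current;
    if (sigaction(SIGSEGV, nullptr, &current) != 0)
        return false;
    return (current.sa_flags & SA_SIGINFO) != 0 && current.sa_sigaction == DemoSegvHandler;
}
```

Compared with signal(), sigaction gives well-defined semantics across platforms and, with SA_SIGINFO, passes a siginfo_t with extra detail about the fault.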