37 Commits

Author SHA1 Message Date
henrik karlsson
5dc54a8104 [UBA]
* Huge stability push... all (known) paths have been tested heavily on linux with tsan and every single found race condition report has been fixed. Lots of locks have been added/moved/changed and some instances of things have been leaked on purpose to prevent tsan reports during shutdown
* More efficient storage proxy implementation which immediately forwards segments to clients once they are available in proxy
* Added UbaAgent -command=x which can be used to send commands to host. Supported commands are "status", which prints out status of all remote sessions, "abort/abortproxy/abortnonproxy", which can be used to abort remote sessions, and "disableremote", which has the host stop accepting more helpers
* Fixed scheduler::stop bug if remotes were still requesting processes
* Added support for process reuse on linux/macos
* Added support for Coordinator interface and dynamically load coordinator dll in UbaCli
* Restructured code a little bit to be able to queue up all writes in parallel
* Added Show create/write colors to visualizer (defaults to on)
* Fixed so write file times are visualized in visualizer
* Improved socket path for visualizer
* Improved a lot of error messages
* Fixed double close of memory handle in StorageServer
* Changed some ScopedWriteLock to SCOPED_WRITE_LOCK (same for read locks)
* Fixed some missing cleanup of trace view when starting a new trace view in visualizer

[CL 32137083 by henrik karlsson in ue5-main branch]
2024-03-08 18:31:48 -05:00
henrik karlsson
d394b4b515 [UBA]
* Added session notification message to be able to see why client is about to go away.. hopefully this will help us identify if there are crashes happening on remotes that we don't track
* Added ApplicationRules::AllowStorageProxy and changed so .obj files are never going through the storage proxy on fetch (they are only read by one process so no point feeding them through the proxy)
* Fixed the file write-through code and gave the #if 0 a name (#if UBA_USE_WRITE_THROUGH).. still disabled since it didn't give any benefits enabling it

[CL 32025069 by henrik karlsson in ue5-main branch]
2024-03-05 12:41:52 -05:00
henrik karlsson
bbf8e06ccb [UBA]
* Added support for handling server side messages and send response later
* Changed so storage proxy errors are muted when disconnected
* Improved error messages

[CL 31981923 by henrik karlsson in ue5-main branch]
2024-03-03 21:30:10 -05:00
henrik karlsson
ff59263a86 [UBA]
* Added support for sending .uba trace files from helpers to server

[CL 31981882 by henrik karlsson in ue5-main branch]
2024-03-03 21:18:10 -05:00
henrik karlsson
0b3f5e6f95 [UBA]
* Extracted work tracker out from work manager so it can be used for multiple work managers when running with proxies

[CL 31981685 by henrik karlsson in ue5-main branch]
2024-03-03 20:51:45 -05:00
henrik karlsson
361532ab08 [UBA]
* Reduced contention in SessionClient when testing if processes are done and spawning new processes
* Changed so FileMappingBuffer is lazily closing mapped memory. For some reason these calls end up being very expensive and we don't have to close them straight away

[CL 31796195 by henrik karlsson in ue5-main branch]
2024-02-26 01:50:10 -05:00
henrik karlsson
0b12758059 [UBA]
* Changed so all ScopedReadLock and ScopedWriteLock are macros (SCOPED_READ_LOCK and SCOPED_WRITE_LOCK).. this to be able to add code around the lock to measure contention. Contention testing code can be enabled by setting UBA_TRACK_CONTENTION to 1

[CL 31795889 by henrik karlsson in ue5-main branch]
2024-02-26 01:14:42 -05:00
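
The macro-based locking described in 0b12758059 above can be sketched roughly as follows. This is a minimal illustration, not UBA's actual code: SKETCH_SCOPED_WRITE_LOCK, SKETCH_TRACK_CONTENTION, ContentionStats and TimedWriteLock are hypothetical names, and the real SCOPED_WRITE_LOCK macro may take different arguments. The idea shown is only that a macro can wrap an RAII lock with timing code that a compile-time switch turns on or off.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <mutex>
#include <shared_mutex>

// Hypothetical switch, modeled on the UBA_TRACK_CONTENTION define named in the commit.
#ifndef SKETCH_TRACK_CONTENTION
#define SKETCH_TRACK_CONTENTION 1
#endif

// Hypothetical counters; the real bookkeeping in UBA is not shown in the log.
struct ContentionStats {
    std::atomic<std::uint64_t> acquisitions{0};
    std::atomic<std::uint64_t> waitNanoseconds{0};
};

// RAII write lock that records how long it waited to acquire the mutex.
class TimedWriteLock {
public:
    TimedWriteLock(std::shared_mutex& mutex, ContentionStats& stats)
        : m_lock(mutex, std::defer_lock) {
        const auto start = std::chrono::steady_clock::now();
        m_lock.lock();
        const auto waited = std::chrono::steady_clock::now() - start;
        stats.acquisitions.fetch_add(1, std::memory_order_relaxed);
        stats.waitNanoseconds.fetch_add(
            static_cast<std::uint64_t>(
                std::chrono::duration_cast<std::chrono::nanoseconds>(waited).count()),
            std::memory_order_relaxed);
    }

private:
    std::unique_lock<std::shared_mutex> m_lock;
};

// With tracking off, the macro collapses to a plain scoped lock with no overhead.
#if SKETCH_TRACK_CONTENTION
#define SKETCH_SCOPED_WRITE_LOCK(mutex, stats, name) TimedWriteLock name(mutex, stats)
#else
#define SKETCH_SCOPED_WRITE_LOCK(mutex, stats, name) \
    std::unique_lock<std::shared_mutex> name(mutex)
#endif

// Example call site guarded by the macro.
std::shared_mutex g_counterMutex;
ContentionStats g_counterStats;
int g_counter = 0;

int IncrementCounter() {
    SKETCH_SCOPED_WRITE_LOCK(g_counterMutex, g_counterStats, lock);
    return ++g_counter;
}
```

Because call sites only ever see the macro, the measuring code can be added or removed without touching them, which is presumably why the commit converted every lock to macro form.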
henrik karlsson
18005e4601 [UBA]
* Disabled custom signal handler in UbaHost in case this is the reason we get crashes in ubt
* Changed some logging from Info to Detail

[CL 31742437 by henrik karlsson in ue5-main branch]
2024-02-22 18:46:44 -05:00
henrik karlsson
8255fb00b5 [Uba]
* Added more description when failing to drop cas db entry
* Added cas entry lock around code that copies or links files out
* Added more fixes for things reported by tsan/asan

[CL 31708309 by henrik karlsson in ue5-main branch]
2024-02-21 20:09:27 -05:00
henrik karlsson
5adb64e10e [UBA]
* Fixed read-after-free bug found by TSAN in StorageServer.cpp
* Fixed write-after-free bug found in UbaProcess.cpp (m_cancelEvent.Set() could be called after memory was freed)
* Lots of minor fixes found using TSAN. Most are harmless but still nice to clean up
* Disabled mimalloc for linux again.. seems like tsan does not like it so maybe there are bugs in it

[CL 31676014 by henrik karlsson in ue5-main branch]
2024-02-21 01:43:26 -05:00
henrik karlsson
529009dcd9 [Uba]
* Changed so ClientSession struct is allocated using aligned_alloc instead of new.. we can't figure out why this is failing with new on the farm and no one can repro so this is a very nasty workaround until we do understand how it can go wrong.
* Fixed potential race condition in scheduler
* Reduced lock scope in StorageClient to remove chance of deadlock
* Enabled mimalloc on linux

[CL 31660233 by henrik karlsson in ue5-main branch]
2024-02-20 18:03:50 -05:00
henrik karlsson
a01cb16b24 [UBA]
* Removed force drop of cas file for remote log file since it caused problems if two log files were the same

[CL 31607371 by henrik karlsson in ue5-main branch]
2024-02-19 01:59:50 -05:00
henrik karlsson
5673117e65 [UBA]
* Added code to handle uba::ReadFile reading fewer bytes than requested. It will try for 3 seconds and then fail with a message saying it read 0 bytes for 3 seconds
* Added more information to error messages

[CL 31461069 by henrik karlsson in ue5-main branch]
2024-02-14 01:37:37 -05:00
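
The short-read handling described in 5673117e65 above could look something like the sketch below. It is an assumption-laden illustration, not uba::ReadFile itself: ReadSome, ReadFully and MakeChunkedReader are hypothetical names, the reader is a callback rather than a file handle, and the timeout is interpreted here as "no bytes received for the stall limit", which defaults to the 3 seconds mentioned in the commit.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <functional>
#include <string>
#include <thread>
#include <utility>

// Hypothetical reader callback: returns the number of bytes read
// (0 if nothing is available yet) or -1 on a hard error.
using ReadSome = std::function<long(char* dest, std::size_t capacity)>;

// Keeps reading until `size` bytes have arrived. Fails on a hard error or
// when the reader has delivered 0 bytes for `stallLimit`.
bool ReadFully(const ReadSome& readSome, char* dest, std::size_t size,
               std::chrono::milliseconds stallLimit = std::chrono::seconds(3)) {
    using Clock = std::chrono::steady_clock;
    std::size_t total = 0;
    auto lastProgress = Clock::now();
    while (total < size) {
        const long n = readSome(dest + total, size - total);
        if (n < 0)
            return false;  // hard error, no point retrying
        if (n > 0) {
            total += static_cast<std::size_t>(n);
            lastProgress = Clock::now();
            continue;
        }
        if (Clock::now() - lastProgress >= stallLimit)
            return false;  // read 0 bytes for the whole stall window
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    return true;
}

// Test helper (hypothetical): hands out `data` at most `chunk` bytes at a
// time, then returns 0 forever, simulating a file shorter than requested.
ReadSome MakeChunkedReader(std::string data, std::size_t chunk) {
    std::size_t offset = 0;
    return [data = std::move(data), chunk, offset](char* dest,
                                                   std::size_t capacity) mutable -> long {
        const std::size_t n = std::min({chunk, capacity, data.size() - offset});
        std::memcpy(dest, data.data() + offset, n);
        offset += n;
        return static_cast<long>(n);
    };
}
```

A caller would pass a shorter stall limit in tests so that the failure path does not take the full 3 seconds.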
henrik karlsson
eda007f51f [UBA]
* Added support for most features in UbaScheduler. Dependencies, weights etc etc.
* Added support for loading yaml file with processes for UbaScheduler.
* Added so UbaCli interprets yaml file as a file with processes and uses it to populate a scheduler which is then executed
* Fixed so all threads inside same process spawned by uba end up in the same thread group.
* Fixed linux crash where process communication memory was deleted when cancel event was called (added lock around code)
* Fixed deadlock that could happen if flush dead processes was called after the lock but before processhandle dtor in Session::ProcessExited
* Changed new[]/delete[] to aligned_alloc/free because for some reason new/delete trigger asan on linux and we don't know why.

[CL 31372220 by henrik karlsson in ue5-main branch]
2024-02-11 04:00:51 -05:00
henrik karlsson
59cd3b1849 [UBA]
* Added support for custom text in trace/visualizer... can be used to report things like horde status etc

[CL 31118070 by henrik karlsson in ue5-main branch]
2024-02-02 00:30:28 -05:00
henrik karlsson
11ed67aeed [UBA]
* Added log entries to trace file

[CL 30669670 by henrik karlsson in ue5-main branch]
2024-01-17 16:49:50 -05:00
henrik karlsson
ade0725e14 [UBA]
* Fixed so imagehlp.dll and dbghelp.dll are detoured when loaded. Detour ImageGetDigestStream and SymLoadModuleExW because both of them cause trouble on wine
* Fixed so reuse of processes also honors log files so a new log file is created when reuse happens
* Improved log file naming so host can set name and log files are sent back properly with the right name

[CL 30638193 by henrik karlsson in ue5-main branch]
2024-01-16 13:24:28 -05:00
henrik karlsson
4b1c00af7d [UBA]
* Fixed stats reporting bug for process reuse

[CL 30588189 by henrik karlsson in ue5-main branch]
2024-01-12 01:51:57 -05:00
henrik karlsson
22cf61f84b [UBA]
* Added GetNextProcess as a real message type (not using Custom) to be able to stop fetching work if helper is being terminated by aws or if we want to scale down number of workers
* Did some cleanup in the Tcp backend code and made sure not to call close socket multiple times (since it can cause the code to close a different socket than what is owned)

[CL 30584707 by henrik karlsson in ue5-main branch]
2024-01-11 21:01:56 -05:00
henrik karlsson
9f12e202b4 [UBA]
* Added ProcessStartInfo::writeOutputFilesOnFail that can be set to true if we want the output files to be written/sent back even though process exited with errors
* Changed so session client uses ProcessStartInfoHolder and moved serialization code to that class

[CL 30576299 by henrik karlsson in ue5-main branch]
2024-01-11 15:42:50 -05:00
henrik karlsson
e0e929073b [Uba]
* Added RegisterDeleteFile to external api that should be used when file is deleted outside of uba but uba needs to know about it
* Reduced lock scope around FlushDeadProcesses
* Fixed disconnect issue crash in ubaagent that could cause access violation in UbaStorage. Fix was to cleanup when leaving StorageClient::SendAllSegments
* Changed so UbaRequestNextProcess is doing a full environment update (not resetting stats) if no next process is found
* Fixed so Scheduler enableProcessReuse=false works properly
* Improved some assert descriptions
* Fixed so files in m_outputFiles in session client are erased when flushed to server
* Fixed so Rpc_GetFullName cleans up .. in paths
* Fixed so files that existed but have been deleted by an external process are seen as not existing by remote detoured process.

[CL 30507663 by henrik karlsson in ue5-main branch]
2024-01-09 12:04:16 -05:00
henrik karlsson
9b1e22319d [UBA]
* Fixed so paths returned from SearchPathForFile are cleaned up (not having .. etc)
* Fixed so UbaAgent handles WSAPoll POLLERR as timeout so it retries when failing (this made the process exit on wine)
* Fixed so returned processes that had been reused returned the reused process to the scheduler
* Fixed so EnsureBinaryFile is using correct fileNameKeys
* Removed assert in detoured GetFullPathNameA since it is always calling GetFullPathNameW

[CL 30465533 by henrik karlsson in ue5-main branch]
2024-01-03 16:23:58 -05:00
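
The path cleanup mentioned in 9b1e22319d above (and for Rpc_GetFullName in e0e929073b) amounts to collapsing "." and ".." components without touching the file system. The sketch below shows that idea using the C++17 standard library; it is not how UBA implements it, and CleanPath is a hypothetical helper name.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

// Hypothetical helper: collapses "." and ".." components purely lexically,
// without checking whether the path exists on disk.
std::string CleanPath(const std::string& raw) {
    return std::filesystem::path(raw).lexically_normal().generic_string();
}
```

For example, CleanPath("a/b/../c") yields "a/c". A lexical cleanup like this does not resolve symlinks, so it is only a stand-in for whatever rules the detoured functions actually apply.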
henrik karlsson
ea673427b2 [UBA]
* Implemented process reuse logic in UbaScheduler. It is now possible for running processes to fetch more work
* Fixed bugs in custom message path
* Fixed bugs in FlushWrittenFiles message
* Changed so UpdateEnvironment is resetting stats
* Fixed potential race condition related to process reuse
* Added special rule for ShaderCompileWorker.exe which detours ImageGetDigestStream in Imagehlp.dll (because wine implementation does not match windows implementation)
* Renamed "exitedUserData" to just userData since it is used for more than when exiting process
* Added a couple more dlls to "known system dlls"
* Fixed so visualizer can visualize process reuse properly

[CL 30462059 by henrik karlsson in ue5-main branch]
2024-01-02 17:42:42 -05:00
henrik karlsson
32f666ea42 [UBA]
* Added Scheduler class that is a very simple scheduler that handles processes that have no intra dependencies. It handles scheduling remotely and also reschedules processes returned from remote machines.
* Changed so application dependencies are retrieved in parallel by clients
* Added so custom assert handler can be set from the outside
* Added traceEnabled to SessionCreateInfo so trace shared mem can be created without needing to launch visualizer or write to file
* Added so userData can be provided in RemoteProcessAvailable and RemoteProcessReturned callbacks
* Added more files to known system files (based on what exists in wine)
* Improved FindImports code and made it available in export. (it is now automatically filtering out known system files)
* Moved ProcessHandle to its own file
* Added ProcessStartInfoHolder which is a class that can wrap a ProcessStartInfo and make sure all strings are allocated

[CL 30437951 by henrik karlsson in ue5-main branch]
2023-12-22 02:41:43 -05:00
henrik karlsson
0c68ced47b [UBA]
* Fixed a bug where uba failed to download dlls to remote if it was in the system32 folder but did not exist on the remote machine's system32 and was not part of known system modules. (this solves the problem with ShaderCompileWorker not downloading opengl32.dll to barebone machines)
* Fixed bug where relative path of dll was sent into RetriveCasFile which in turn failed because it couldn't find the file
* Fixed so code could handle loading libraries with circular dependencies
* Removed assert for now (related to GetThreadPreferredUILanguages and asking for language name instead of id for remote machines)

[CL 30407004 by henrik karlsson in ue5-main branch]
2023-12-19 19:53:32 -05:00