This change serializes the GPU breadcrumb data into the shared context so that it can then be retrieved by the crash report client and included in the final payload.
#jira UE-214270
#tests tested locally
#rb mihnea.balta
[CL 34237925 by daniele vettorel in ue5-main branch]
The module name is ok but the CrashReportClientEditor target is causing a warning that can be safely ignored.
#jira UE-215245
#rb Sebastian.Thomeczek, Patrick.Laflamme
[CL 33941604 by martin sevigny in ue5-main branch]
IsStuck can be used to categorize abnormal terminations. This can catch reports where the player terminates the program before hang detection kicks in (the hang detection threshold is normally quite high, 45 to 60 seconds).
#rb daniele.vettorel
#tests multiple crash reports using debug commands
#rnx
[CL 32428340 by nicolas mercier in ue5-main branch]
Change PlatformName to be stored as TCHAR array like all other properties.
#rb daniele.vettorel, elizabeth.bunner, zach.harris
#tests locally in FrontEnd
[CL 31938694 by nicolas mercier in ue5-main branch]
Introduces a few changes that enable CrashReportClient to read its settings from the project configuration.
* Added compile time fallback of data router to CrashReportClientEditor.
* Refactored how some global defines are set in CrashReportClient/CrashReportClientEditor target to allow different settings for development time vs retail time.
* Added code that allows project configuration files to override any CRC setting. Previously it would only apply to some properties.
* Refactored CrashReportCoreConfig.
* Added compile time defines for company name. UX changes in other CL.
* Limit core usage for CRC to 5 cores. Significantly reduces memory usage for machines with many cores.
#rb Patrick.Laflamme
#jira UE-114670
[CL 30879797 by johan berg in ue5-main branch]
- use BaseDir() instead of ProjectDir(), as ProjectDir could return a relative path (unsuitable for drive information)
- Added missing fields in the crash reporter context that is sent on abnormal termination. Some fields were added recently but were never synced/sent by the crash reporter client.
- Fix an issue that prevented listing of partitions. The Wbem Next() method can return a value that is not 0 but also not a failure.
#rb daniele.vettorel, zach.harris
#rnx
[CL 30330133 by nicolas mercier in ue5-main branch]
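The partition-listing fix in the entry above comes down to how the result of the Wbem Next() call is tested. A minimal sketch of the idea, assuming the enumerator is `IEnumWbemClassObject` and using stand-in definitions instead of the Windows headers so the snippet builds anywhere. `HResult`, `Failed`, `IsErrorNaive` and `ShouldProcessBatch` are hypothetical names for illustration, not the engine's code:

```cpp
#include <cstdint>

// Stand-in for the Windows HRESULT type (a signed 32-bit value).
using HResult = std::int32_t;

// Values mirroring the WBEM status codes; redefined here for portability.
constexpr HResult kWbemSNoError = 0;  // all requested objects returned
constexpr HResult kWbemSFalse   = 1;  // fewer objects than requested: non-zero, but not a failure
constexpr HResult kWbemEFailed  = static_cast<HResult>(0x80041001u);  // a real failure

// Same rule as the Windows FAILED() macro: only negative values are errors.
constexpr bool Failed(HResult hr) { return hr < 0; }

// The pattern that breaks enumeration: any non-zero value is treated as an error.
constexpr bool IsErrorNaive(HResult hr) { return hr != 0; }

// Hypothetical helper: process a batch when the call did not fail and
// it actually returned at least one object.
constexpr bool ShouldProcessBatch(HResult hr, unsigned returnedCount)
{
    return !Failed(hr) && returnedCount > 0;
}
```

With this check, a success code such as `kWbemSFalse` no longer aborts the enumeration as long as an object was returned.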
Cuts time to record a crash context by as much as 30s in the presence of some security software.
#rb johan.berg
[CL 27322947 by robert millar in ue5-main branch]
#rb Chris.Gagnon, Johan.Berg
#preflight 629f7017617cbe81d32add99
#ushell-cherrypick of 20541559 by Patrick.Laflamme
#preflight 629fbb4b521254896f6c2c43
#ROBOMERGE-AUTHOR: patrick.laflamme
#ROBOMERGE-SOURCE: CL 20546605 via CL 20546619 via CL 20546630 via CL 20546644 via CL 20546648
#ROBOMERGE-BOT: UE5 (Release-Engine-Staging -> Main) (v954-20466795)
[CL 20552343 by patrick laflamme in ue5-main branch]
### Features
This change enables the StallDetector watchdog in Editor to submit reports to crashreporter about threads violating instrumented deadlines in the source code. This feature was previously available on Windows, and this change adds Linux support.
### Notes
New APIs:
ReportStall()
CaptureThreadPortableCallStack()
Many APIs are updated from purely "Ensure" naming to more general naming. Stalls are more like Ensures than crashes, so the appropriate renames have been made to keep the code readable and clear. In some places, Ensure is replaced with the clearer "Continuable Event" nomenclature.
### Testing
I synthesized an ensure on Linux, and did the same for a stall. I then compared the crash report XML files to make sure they contain accurate data in the callstack, portable callstack, and other fields in the report. I also noted that the stall information was showing as expected in the crash reporter.
#rb brandon.schaefer, francis.hurteau
#jira UETOOL-3336
#preflight 625e20d2804460ab0fea3277
[CL 19911608 by geoff evans in ue5-main branch]
- The data show no evidence that CRC is crashing there. Capturing this state is I/O expensive and not required moving forward.
#jira UETOOL-4042 Inspect UE5/Main analytics for CRC crashes
#rb Jamie.Dale
[CL 17116844 by Patrick Laflamme in ue5-main branch]
- Added code to run a cleanup on UECrashContext-{pid}.xml files that are 30 days old where the process ID (pid in the name) is not running anymore.
#rb Jamie.Dale
[CL 16522984 by Patrick Laflamme in ue5-main branch]
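The cleanup rule in the entry above can be sketched as a pure decision function. This is an illustration only, under stated assumptions: the age test is taken to mean "at least 30 days", the process check is passed in as a callback rather than calling a platform API, and `ParsePidFromContextFileName` and `ShouldDeleteContextFile` are hypothetical names, not the engine's functions:

```cpp
#include <charconv>
#include <chrono>
#include <cstdint>
#include <functional>
#include <optional>
#include <string>
#include <system_error>

// Extracts the pid from a name of the form "UECrashContext-<pid>.xml".
// Returns std::nullopt when the name does not match that pattern.
std::optional<std::uint32_t> ParsePidFromContextFileName(const std::string& name)
{
    static const std::string prefix = "UECrashContext-";
    static const std::string suffix = ".xml";
    if (name.size() <= prefix.size() + suffix.size()) return std::nullopt;
    if (name.compare(0, prefix.size(), prefix) != 0) return std::nullopt;
    if (name.compare(name.size() - suffix.size(), suffix.size(), suffix) != 0) return std::nullopt;

    const char* first = name.data() + prefix.size();
    const char* last = name.data() + name.size() - suffix.size();
    std::uint32_t pid = 0;
    const auto [ptr, ec] = std::from_chars(first, last, pid);
    // Require the whole middle part to be a valid number.
    if (ec != std::errc() || ptr != last) return std::nullopt;
    return pid;
}

// A file is deleted only when it matches the pattern, is old enough
// (assumed: at least 30 days), and its owning process is no longer running.
bool ShouldDeleteContextFile(const std::string& name,
                             std::chrono::hours age,
                             const std::function<bool(std::uint32_t)>& isPidRunning)
{
    constexpr std::chrono::hours kMaxAge{24 * 30};
    const auto pid = ParsePidFromContextFileName(name);
    return pid.has_value() && age >= kMaxAge && !isPidRunning(*pid);
}
```

Keeping the process check behind a callback keeps the rule testable without touching real processes or files.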
Engine/Editor changes:
- Split the Editor summary session in two, one summary for the Engine properties and one for the Editor specific properties. Made it easy to extend the Engine summary to create other summaries.
- Made the summary sender as agnostic as possible to the keys it sends.
- Fixed the system-wide lock contention between processes when persisting a session. (One problem caused by the lock is UE-114315.)
- Fixed a concurrency issue when saving the summary sessions on Linux/Mac.
- Fixed a performance issue when saving the summary session on Linux/Mac. This enables saving at a higher frequency.
- Fixed cases where the same session summary is sent more than once.
- Fixed a Windows registry key overflow that could happen if we accumulated too many sessions (in theory, this can happen).
- Made adding new properties to the summary easy and private to the implementation.
- Brought the Linux/Mac implementation closer to Windows implementation.
- Reduced memory allocation, especially when the session records a crash.
- Improved chances to send the summary non-delayed by allowing the Editor to send the reports if CRC died unexpectedly.
- Generalized the support to collect and aggregate analytics from helper processes. For example, CRC already collects analytics that are merged with the Editor summary as supplementary information.
- Reserved the disk space required to store the summary ahead of time to prevent failing later.
- Increased the frequency at which the summary is persisted because saving the summary is more efficient (about every 10 seconds rather than every minute).
- Added unit tests
CrashReportClient changes:
- Created a 'session summary' from the CRC point of view to merge with the Editor summary.
- Moved analytics collection into a separate class to make the crash reporting code leaner and less noisy with all the analytics.
- Merged the CRC diagnostic logger into the class collecting the CRC analytics summary and made the diagnostic log a property in the summary.
- Collected analytics (on behalf of the Editor) in a background thread because the CRC main thread can be blocked collecting a crash and would then not pay attention to anything else.
- Added MonitorBatteryLevel and MonitorOnACPower summary properties on Windows. Collected on the CRC background thread (never blocked, so we reduce the chances of missing the battery running out).
- Added MonitorSessionDuration summary property to track how long CRC ran.
- Added MonitorQuitSignalRecv summary property to detect when CRC is soft killed like: taskkill /PID 1234
- Added MonitorIsReportingCrash summary property to track when CRC dies reporting a crash.
- Added MonitorIsCollectingCrash summary property to track when CRC dies collecting crash artifacts.
- Added IsProcessingCrash summary property to track when CRC dies processing a crash.
- Added MonitorCrashed summary property to track when CRC exception handler was triggered.
- Added MonitorWasShutdown summary property to track when the CRC summary was shut down.
- Added MonitorLoggingOut summary property to track when CRC died because the user was logging out (or as result of shutting down or restarting the computer).
- More accurate value for DeathTimestamp summary property because this is now captured in CRC background thread (which cannot be busy handling a crash)
- Added crash processing timing to CRC diagnostic logs (how long it takes to collect/process a crash).
#rb Jamie.Dale, Wes.Hunt, Johan.Berg
#jira UETOOL-3500
#jira UE-114315
[CL 16324612 by Patrick Laflamme in ue5-main branch]
- Bumped the limit from 256 to 512
- Always reserve one spot for the crashing thread in the list transmitted to CRC, possibly dropping some other threads.
- Added diagnostic logs in CRC to capture cases where the number of threads reaches the new limit of 512 or the crashing thread is 0.
#jira UE-114291 - Fail to capture some Editor PCallstack because a hard limit in GenericCrashContext
#rb Johan.Berg
[CL 16123400 by Patrick Laflamme in ue5-main branch]
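The reserved-slot rule in the entry above can be sketched as follows. This is a minimal illustration, not the GenericCrashContext code: thread IDs are modeled as plain integers, placing the crashing thread first is an assumption made for simplicity, and `SelectThreadsToTransmit` is a hypothetical name:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds the list of thread IDs to transmit, capped at maxThreads entries.
// One slot is always taken by the crashing thread, so when the process has
// more threads than the cap, some other threads are dropped instead.
std::vector<std::uint32_t> SelectThreadsToTransmit(
    const std::vector<std::uint32_t>& allThreadIds,
    std::uint32_t crashingThreadId,
    std::size_t maxThreads = 512)
{
    assert(maxThreads >= 1);  // the reserved slot needs room for at least one entry

    std::vector<std::uint32_t> selected;
    selected.reserve(std::min(allThreadIds.size() + 1, maxThreads));

    // Reserved slot: the crashing thread goes in before anything else.
    selected.push_back(crashingThreadId);

    for (const std::uint32_t id : allThreadIds)
    {
        if (selected.size() == maxThreads) break;  // cap reached, drop the rest
        if (id != crashingThreadId) selected.push_back(id);
    }
    return selected;
}
```

Without the reserved slot, a crashing thread that appears late in a long thread list would be cut off by the cap, which is the failure described in UE-114291.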