116 Commits

Author SHA1 Message Date
Marc Audy
11f5b21210 Merging //UE5/Release-Engine-Staging @ 13752110 to Main (//UE5/Main)
#rnx

[CL 13753156 by Marc Audy in ue5-main branch]
2020-06-23 18:40:00 -04:00
Patrick Laflamme
667697eac6 #jira UE-94499 - Crash Reporter application name shows Unreal Engine 4 Crash Reporter
- Fixed the application title, getting it from the engine version rather than hardcoding it.

#rb Francis.Hurteau

[CL 13747857 by Patrick Laflamme in ue5-main branch]
2020-06-23 09:54:23 -04:00
ryan durand
6e84939c40 Wrapping CRC API URL in quotes to treat it as a string.
#rnx
#rb none
#jira none

#ROBOMERGE-OWNER: ryan.durand
#ROBOMERGE-AUTHOR: ryan.durand
#ROBOMERGE-SOURCE: CL 13072767 via CL 13072777 via CL 13072783 via CL 13072864
#ROBOMERGE-BOT: RELEASE (Release-Engine-Staging -> Main) (v686-13045012)

[CL 13072869 by ryan durand in Main branch]
2020-04-29 15:03:00 -04:00
patrick laflamme
30017ee6f0 #jira UE-92231 - Editor session summary event doesn't have any link to CrashGUID
- Implemented a special logger inside CrashReportClientEditor to capture and save important events such as crash reporting (along with the CrashGUID)
  - When CrashReportClientEditor sends all the Editor summary events, if an error was detected in the session being sent, the mini-log for that session is attached to the analytic event.

#rb Chris.Gagnon, Jamie.Dale
#lockdown cristina.riverun

#ROBOMERGE-SOURCE: CL 12935952 in //UE4/Release-4.25/... via CL 12935970 via CL 12935996
#ROBOMERGE-BOT: RELEASE (Release-Engine-Staging -> Main) (v682-12900288)

[CL 12936020 by patrick laflamme in Main branch]
2020-04-20 15:38:55 -04:00
patrick laflamme
a348d2b8d1 #jira UE-92094 - CrashReportClientEditor exits before the Editor
- Added code to the Editor to detect and report when CrashReportClientEditor exited unexpectedly. (MonitorExceptCode 777005 is set in the Editor session summary event)
  - Added a retry loop to CrashReportClientApp to retry opening the handle on the Editor process if the first attempt fails.
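
A minimal sketch of such a retry loop, not the actual CrashReportClientApp code. The helper name `OpenWithRetry`, the attempt count, and the delay are all hypothetical; the real change opens a handle on the Editor process, which is abstracted here as a callable.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <optional>
#include <thread>

// Hypothetical helper: calls TryOpen up to MaxAttempts times, sleeping for
// Delay between failed attempts. Returns the first successful result, or
// std::nullopt if every attempt fails.
template <typename HandleT>
std::optional<HandleT> OpenWithRetry(
    const std::function<std::optional<HandleT>()>& TryOpen,
    int MaxAttempts,
    std::chrono::milliseconds Delay)
{
    assert(MaxAttempts > 0);
    for (int Attempt = 1; Attempt <= MaxAttempts; ++Attempt)
    {
        if (std::optional<HandleT> Handle = TryOpen())
        {
            return Handle;
        }
        // Do not sleep after the last failed attempt.
        if (Attempt < MaxAttempts)
        {
            std::this_thread::sleep_for(Delay);
        }
    }
    return std::nullopt;
}
```

The caller decides what a failed attempt means by returning `std::nullopt` from `TryOpen`, so the loop itself stays independent of any platform API.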

#rb Jamie.Dale
#lockdown cristina.riverun

#ROBOMERGE-SOURCE: CL 12878012 in //UE4/Release-4.25/... via CL 12878014 via CL 12878016
#ROBOMERGE-BOT: RELEASE (Release-Engine-Staging -> Main) (v681-12776863)

[CL 12878017 by patrick laflamme in Main branch]
2020-04-17 07:28:54 -04:00
patrick laflamme
ff329cbe1d Fixed CIS reporting fatal error C1083: Cannot open include file: 'EditorAnalyticsSession.h': No such file or directory
#rnx
#rb none
#jira none.

#ROBOMERGE-SOURCE: CL 12778165 in //UE4/Release-4.25/... via CL 12778172 via CL 12784352
#ROBOMERGE-BOT: RELEASE (Release-Engine-Staging -> Main) (v681-12776863)

[CL 12786616 by patrick laflamme in Main branch]
2020-04-14 16:56:34 -04:00
patrick laflamme
62d60da7d8 Fix CIS reporting 'CRASH_REPORT_WITH_MTBF' is not defined as a preprocessor macro, replacing with '0' for '#if/#elif'
#rb trivial
#rnx
#jira none

#ROBOMERGE-SOURCE: CL 12762568 in //UE4/Release-4.25/... via CL 12762575 via CL 12784299
#ROBOMERGE-BOT: RELEASE (Release-Engine-Staging -> Main) (v681-12776863)

[CL 12786526 by patrick laflamme in Main branch]
2020-04-14 16:55:01 -04:00
patrick laflamme
a90826ca91 Fixed CIS error C4459: declaration of 'MonitorPid' hides global declaration
#rb Trivial
#jira none.

#ROBOMERGE-SOURCE: CL 12754568 in //UE4/Release-4.25/... via CL 12754572 via CL 12783851
#ROBOMERGE-BOT: RELEASE (Release-Engine-Staging -> Main) (v681-12776863)

[CL 12786382 by patrick laflamme in Main branch]
2020-04-14 16:53:34 -04:00
patrick laflamme
847fb54e89 Detected and reported if CrashReportClientEditor itself crashed and improved how Editor analytics summarize a few stats:
- Computed a more accurate 'idle' time based on user inputs.
  - Experimented with measuring Editor 'idle' time based on Editor process CPU usage.
  - Recorded entering/exiting PIE right away rather than waiting up to 60 seconds for the next 'heartbeat'.
  - If the session creation is delayed (because of contention on the session lock), don't wait up to 60 seconds to retry. Retry immediately at the next tick.
  - Increased the update rate of the session in the first minute to once per second rather than once per minute because a lot of crashes occur within the first minute.
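
The update-rate policy from the last bullet can be sketched as below. This only illustrates the rule stated in the message (every second during the first minute, every minute afterwards); the function name `NextUpdateInterval` is hypothetical and the Editor's real implementation may differ.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical policy function: given how long the session has existed,
// return how long to wait before the next session update.
constexpr std::chrono::seconds NextUpdateInterval(std::chrono::seconds SessionAge)
{
    return SessionAge < std::chrono::seconds(60)
        ? std::chrono::seconds(1)    // first minute: update every second
        : std::chrono::seconds(60);  // afterwards: update every minute
}
```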

#jira UE-91890 - Detect and report if CrashReportClientEditor is crashing
#rb Jamie.Dale
#lockdown cristina.riverun

#ROBOMERGE-SOURCE: CL 12751397 in //UE4/Release-4.25/... via CL 12751399 via CL 12783803
#ROBOMERGE-BOT: RELEASE (Release-Engine-Staging -> Main) (v681-12776863)

[CL 12786319 by patrick laflamme in Main branch]
2020-04-14 16:52:27 -04:00
ben marsh
2f31a08a8c Remove unnecessary whitelist from CrashReportClient module.
#jira

#ROBOMERGE-SOURCE: CL 12737887 via CL 12737889 via CL 12737893
#ROBOMERGE-BOT: RELEASE (Release-Engine-Staging -> Main) (v676-12543919)

[CL 12737900 by ben marsh in Main branch]
2020-04-10 20:04:27 -04:00
patrick laflamme
097792ed22 On Windows, exit the application with exit code 777003 if the crash reporter crashes while reporting or code 777004 if the crash handler (the __except clause) crashes after reporting an error.
Details:

The 4.24.3 analytics show many unexplained exit codes, 23,647 at the moment. Normally, the Editor exits with code 0 if everything went well, 3 or 1 if it gracefully handled a crash, and 255 if it was aborted. But we also see many others, such as the predominant cases below:

-1073741819 => STATUS_ACCESS_VIOLATION => 8081 cases
-1073740791 => STATUS_STACK_BUFFER_OVERRUN  => 7581 cases
-1073740771 => STATUS_FATAL_USER_CALLBACK_EXCEPTION => 5357 cases

On Windows, the crash reporting system should catch and report STATUS_ACCESS_VIOLATION and then exit with code 3 (as the error was handled). For example, if you add a null pointer dereference (STATUS_ACCESS_VIOLATION) in the code, the crash reporter handles it and the Editor exits with code 3. Likewise, if you enter the 'debug crash' console command, the Editor gracefully handles the error and exits with code 3. But if you move the null pointer dereference into the crash handler thread itself, the error is not handled and the Editor exits with code STATUS_ACCESS_VIOLATION. This hints that our crash reporting thread is likely crashing in the wild. It would be useful to isolate those cases from the others and keep count of how many times this happens.
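
The negative numbers above are the signed 32-bit readings of the corresponding NTSTATUS values (for example, 0xC0000005 for STATUS_ACCESS_VIOLATION). The sketch below shows that conversion and a simple classification of the exit codes mentioned in this message. The function names and the description strings are hypothetical and not taken from the engine.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// NTSTATUS values in their usual unsigned hexadecimal form.
constexpr std::uint32_t StatusAccessViolation = 0xC0000005u;
constexpr std::uint32_t StatusStackBufferOverrun = 0xC0000409u;
constexpr std::uint32_t StatusFatalUserCallbackException = 0xC000041Du;

// Reinterprets an unsigned 32-bit status as the signed 32-bit value that
// shows up as a process exit code in the analytics.
constexpr std::int32_t ToSignedExitCode(std::uint32_t Status)
{
    return Status <= 0x7FFFFFFFu
        ? static_cast<std::int32_t>(Status)
        : static_cast<std::int32_t>(static_cast<std::int64_t>(Status) - 4294967296LL);
}

// Hypothetical classifier for the exit codes discussed in this message.
inline std::string DescribeExitCode(std::int32_t Code)
{
    switch (Code)
    {
    case 0:      return "clean exit";
    case 1:
    case 3:      return "crash handled gracefully";
    case 255:    return "aborted";
    case 777003: return "crash reporter crashed while reporting";
    case 777004: return "crash handler crashed after reporting";
    default:     break;
    }
    if (Code == ToSignedExitCode(StatusAccessViolation)) return "STATUS_ACCESS_VIOLATION";
    if (Code == ToSignedExitCode(StatusStackBufferOverrun)) return "STATUS_STACK_BUFFER_OVERRUN";
    if (Code == ToSignedExitCode(StatusFatalUserCallbackException)) return "STATUS_FATAL_USER_CALLBACK_EXCEPTION";
    return "unexplained";
}
```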

#jira UE-91803 - Analytics hints that crash reporting and crash handling crashes themselves.
#rb Jamie.Dale
#lockdown cristina.riverun

#ROBOMERGE-SOURCE: CL 12695027 in //UE4/Release-4.25/... via CL 12695062 via CL 12695098
#ROBOMERGE-BOT: RELEASE (Release-Engine-Staging -> Main) (v676-12543919)

[CL 12695136 by patrick laflamme in Main branch]
2020-04-09 15:50:15 -04:00
ben marsh
69b89d7869 Allow CRC analytics settings to be compiled into the executable, to remove NotForLicensees folder within engine code.
#jira
#rb none

#ROBOMERGE-OWNER: ben.marsh
#ROBOMERGE-AUTHOR: ben.marsh
#ROBOMERGE-SOURCE: CL 12681294 via CL 12681304 via CL 12681357
#ROBOMERGE-BOT: RELEASE (Release-Engine-Staging -> Main) (v675-12543919)

[CL 12681363 by ben marsh in Main branch]
2020-04-08 19:18:46 -04:00
patrick laflamme
675cc37038 Prevented one CrashReportClientEditor from sending the report owned by another concurrent instance and losing the exit code.
#jira UE-91493 - CrashReportClientEditor may send a report owned by another concurrent instance losing the exit code in the process
#rb Jamie.Dale

#ROBOMERGE-SOURCE: CL 12598059 in //UE4/Release-4.25/... via CL 12598060 via CL 12598062
#ROBOMERGE-BOT: RELEASE (Release-Engine-Staging -> Main) (v675-12543919)

[CL 12598063 by patrick laflamme in Main branch]
2020-04-03 16:23:51 -04:00
patrick laflamme
45a36adfa1 Also prevent CrashReportClientEditor from sending abnormal termination logs if bSendUnattendedBugReports is not set.
#jira UE-91318 - Events were sent from Crash Reporter after Editor Usage Data is disabled.
#rb none
#lockdown cristina.riveron

#ROBOMERGE-SOURCE: CL 12489133 in //UE4/Release-4.25/... via CL 12489135 via CL 12489146
#ROBOMERGE-BOT: RELEASE (Release-Engine-Staging -> Main) (v673-12478461)

[CL 12489149 by patrick laflamme in Main branch]
2020-03-30 15:05:00 -04:00
patrick laflamme
475b34c0ec #jira UE-91318 - Events were sent from Crash Reporter after Editor Usage Data is disabled.
- Fixed crash report client editor to prevent sending any usage data.

#rb Jamie.Dale
#lockdown cristina.riveron

#ROBOMERGE-SOURCE: CL 12489019 in //UE4/Release-4.25/... via CL 12489022 via CL 12489029
#ROBOMERGE-BOT: RELEASE (Release-Engine-Staging -> Main) (v673-12478461)

[CL 12489031 by patrick laflamme in Main branch]
2020-03-30 14:46:31 -04:00
patrick laflamme
7e1f5f4a6b Ensure a crash report is created every time the Editor thinks this is an abnormal termination, even if the Editor exit code is zero, as it was in 4.24.3
#jira none
#rb trivial

#ROBOMERGE-SOURCE: CL 12380556 in //UE4/Release-4.25/... via CL 12380573
#ROBOMERGE-BOT: RELEASE (Release-4.25Plus -> Main) (v671-12333473)

[CL 12381193 by patrick laflamme in Main branch]
2020-03-23 14:47:08 -04:00
patrick laflamme
5e5233c055 Improved reliability of the Editor analytics.
- Modified the Editor session to support lockless logging and corresponding analysis for concurrent events.
  - Added a null check in EditorSessionSummaryWriter::LowDriveSpaceDetected() to prevent accessing a null session.
  - Fixed CrashReportClient that could use the analytic provider outside of init/shutdown
  - Fixed IdleTime reported that could be buggy if Slate did not register any interaction before a crash occurs.

#jira UE-90719 - FPlatformMisc::RequestExit() can corrupt the EditorSessionSummaryWriter
#rb Jamie.Dale

#ROBOMERGE-SOURCE: CL 12350899 in //UE4/Release-4.25/... via CL 12350910
#ROBOMERGE-BOT: RELEASE (Release-4.25Plus -> Main) (v671-12333473)

[CL 12350942 by patrick laflamme in Main branch]
2020-03-20 17:10:14 -04:00
patrick laflamme
71396d8aea Tried to get more reliable Editor analytic data out of the crash reporter app:
- Ensured the data is fully written to and read from the pipe between the Editor and CrashReporter.
  - Refactored the code sending MTBF to send it as soon as possible
  - Added some special exit codes to detect when the Editor is still running or the exit code is unknown when the summary event is sent.

#rb Sebastian.Nordgren, Johan.Berg
#jira UE-90733 - Editor summary event may not be sent as expected because CrashReportClient may not consume the pipe data

#ROBOMERGE-SOURCE: CL 12216180 in //UE4/Release-4.25/... via CL 12216189
#ROBOMERGE-BOT: RELEASE (Release-4.25Plus -> Main) (v662-12191386)

[CL 12216227 by patrick laflamme in Main branch]
2020-03-16 14:44:54 -04:00
patrick laflamme
94b4e6d5ad Improved scalability of Disaster Recovery
- Converted the Concert API for transferring package data from an in-memory-only model to a streaming model to support packages bigger than 2 GB (TNumericLimits<int32>::Max()).
  - Added the IConcertFileSharing interface to share large files between the client and the server. This is used as a side channel to the Concert request/response and event protocol.
  - Fixed the ConcertClientPackageManager to prevent sending the package data on every 'pre-save' when 'live sync' is off. It only emits it once.
  - Fixed UI to correctly report pre-save vs save vs auto-save for package as well as when a package is discarded.

#jira UE-85652 - Crash when importing large FBX with Morph Targets and Disaster Recovery enabled
#jira UE-78722 - Potential Memory Leak with Disaster Recovery Plugin

#rb Francis.Hurteau, Jamie.Dale

#ROBOMERGE-SOURCE: CL 12113821 in //UE4/Release-4.25/... via CL 12113828
#ROBOMERGE-BOT: RELEASE (Release-4.25Plus -> Main) (v657-12064184)

[CL 12113837 by patrick laflamme in Main branch]
2020-03-10 14:25:48 -04:00
johan berg
3d69cf35d5 Crash reporter was stripping callstack frames from the wrong direction.
When creating a portable callstack for the crash reports, a number of stack frames should be skipped, corresponding to the crash collecting code itself. This value was applied to the wrong end of the callstack.
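
A minimal sketch of the intended behaviour, under the assumption that the callstack is stored innermost frame first, so the crash collecting code sits at the front. The function name and the use of strings for frames are hypothetical; the message does not say how the engine stores portable callstacks.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Assumes Frames[0] is the innermost (most recent) frame. The frames that
// belong to the crash collecting code are therefore the first NumToSkip
// entries, and they are dropped from the front rather than from the back.
inline std::vector<std::string> StripCollectorFrames(
    const std::vector<std::string>& Frames,
    std::size_t NumToSkip)
{
    if (NumToSkip >= Frames.size())
    {
        return {};
    }
    return std::vector<std::string>(
        Frames.begin() + static_cast<std::ptrdiff_t>(NumToSkip),
        Frames.end());
}
```

Skipping from the other end would instead remove the outermost frames (such as the program entry point) and leave the collector's own frames at the top of the report.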

#jira UE-89885
#rb stefan.boberg

#ROBOMERGE-SOURCE: CL 12000209 in //UE4/Release-4.25/... via CL 12000773
#ROBOMERGE-BOT: RELEASE (Release-4.25Plus -> Main) (v656-11643781)

[CL 12000982 by johan berg in Main branch]
2020-03-06 12:23:26 -05:00
patrick laflamme
58925c4e1d #jira UE-87927 - Disaster Recovery doesn't restore a crash from a restored session
- Added the ability to copy and restore a live session, removing the need to archive it in the first place and making the server exit fast (releasing the session lock very quickly) before showing the crash UI and before the next Editor instance can start.

Details:

This bug could manifest in various ways. An issue causing this bug was fixed in 11252374. This bug can also be observed if the crash reporting process doesn't release its lock on the crashed session quickly. Archiving a session may take several minutes (depending on the session size), and while a session is being archived, its database is locked and cannot be restored until the archiving process completes. When the Editor reboots after a crash, it searches for a session to recover but skips over any session that is mounted/locked, assuming the session is in use by a concurrent Editor process, which can prevent it from restoring. The optimal way to work around this problem is to skip the archiving step. Instead, the live session is never archived (which saves a copy), which allows the recovery service to shut down and release the session lock very quickly, ensuring that the session will be unlocked when the Editor restarts. On Editor start, if a crashed session is found and the user decides to restore it, the live session is copied into a new live session.

This changelist also affects these other Jira issues in the following ways:

#jira UE-87899 - Disaster recovery prevents showing the crash reporting UI in a timely manner if the session is large
  - This CL changes the execution order to shut down the recovery service ASAP to release the lock, but the optimization above makes it super fast, so the UI should always be shown in a timely manner.

#jira UE-87927 - Disaster Recovery doesn't restore a crash from a restored session
  - This CL ensures the recovery service releases the session lock faster than the next instance of the Editor can start.

#jira UE-87900 - Disaster Recovery stops recording transactions if the UDP transport layer restarts or auto-repair
#jira UE-88517 - Concert Log Spam - (ConcertKeepAlive) discarded
  - This CL fixes an issue with the endpoint timeout logic.

#jira UE-81049 - Clean up the DisasterRecovery Intermediate directory
  - This CL adds code to clean up the intermediate directory left over by crashed clients.

#rb Francis.Hurteau

#ROBOMERGE-SOURCE: CL 11632069 in //UE4/Release-4.25/... via CL 11632084
#ROBOMERGE-BOT: RELEASE (Release-4.25Plus -> Main) (v655-11596533)

[CL 11632094 by patrick laflamme in Main branch]
2020-02-26 11:18:30 -05:00
cristina riveron
48ff95252f CrashReportClientEditor crashes if the bug report window is closed before the callstack appears
- Fixed CrashReportClient using the analytics provider without checking if it is available first.

#jira UE-89414
#rb Chris.Gagnon
#lockdown cristina.riveron

#ROBOMERGE-SOURCE: CL 11590231 in //UE4/Release-4.24/... via CL 11590232 via CL 11590234
#ROBOMERGE-BOT: RELEASE (Release-4.25Plus -> Main) (v654-11333218)

[CL 11590237 by cristina riveron in Main branch]
2020-02-24 15:20:17 -05:00
patrick laflamme
dc11bc7204 #jira UE-89180 - Crash Report Client does not display callstack for debug commands
- Changed the execution flow to ensure the callstack was displayed while the window was still on screen (it was updated just after the window was closed).

#rb Johan.Berg

#ROBOMERGE-SOURCE: CL 11565572 in //UE4/Release-4.25/... via CL 11565573
#ROBOMERGE-BOT: RELEASE (Release-4.25Plus -> Main) (v654-11333218)

[CL 11581626 by patrick laflamme in Main branch]
2020-02-21 19:31:35 -05:00
patrick laflamme
017559ab07 #jira UE-87900 - Disaster Recovery stops recording transactions if the UDP transport layer restarts or auto-repair
- Fixed the disaster recovery remote endpoint timeout being set to zero, which prevented it from re-registering with MessageBus when an error occurred (such as a socket disconnect).

#jira UE-87899 - Disaster recovery prevents showing the crash reporting UI in a timely manner if the session is large
  - Fixed the crash reporter app to display the UI (asking the user to send the bug report) before shutting down the recovery service.

- Renamed the field FDisasterRecoveryInfo::Version to FDisasterRecoveryInfo::Revision because 'revision' describes the field more accurately.

#rb Jamie.Dale

Edigrated 11250824 from Dev-VirtualProduction.

#ROBOMERGE-SOURCE: CL 11515425 in //UE4/Release-4.25/... via CL 11515515
#ROBOMERGE-BOT: RELEASE (Release-4.25Plus -> Main) (v654-11333218)

[CL 11577491 by patrick laflamme in Main branch]
2020-02-21 14:03:11 -05:00
patrick laflamme
b8e8523cc8 #jira UE-87900 - Disaster Recovery stops recording transactions if the UDP transport layer restarts or auto-repair
- Fixed the disaster recovery remote endpoint timeout being set to zero, which prevented it from re-registering with MessageBus when an error occurred (such as a socket disconnect).

#jira UE-87899 - Disaster recovery prevents showing the crash reporting UI in a timely manner if the session is large
  - Fixed the crash reporter app to display the UI (asking the user to send the bug report) before shutting down the recovery service.

- Renamed the field FDisasterRecoveryInfo::Version to FDisasterRecoveryInfo::Revision because 'revision' describes the field more accurately.

#rb Jamie.Dale

Edigrated 11250824 from Dev-VirtualProduction.

#ROBOMERGE-SOURCE: CL 11515425 in //UE4/Release-4.25/...
#ROBOMERGE-BOT: RELEASE (Release-4.25 -> Release-4.25Plus) (v654-11333218)

[CL 11515515 by patrick laflamme in 4.25-Plus branch]
2020-02-18 15:50:31 -05:00