Commit Graph

171 Commits

Author SHA1 Message Date
patrick laflamme
58925c4e1d #jira UE-87927 - Disaster Recovery doesn't restore a crash from a restored session
- Added the ability to copy and restore a live session, removing the need to archive it in the first place, so the server exits fast (releasing the session lock very quickly) before the crash UI is shown and before the next Editor instance can start.

Details:

This bug could manifest in various ways. One issue causing it was fixed in 11252374. It can also be observed if the crash reporting process doesn't release its lock on the crashed session quickly. Archiving a session may take several minutes (depending on the session size), and while a session is archiving, its database is locked and cannot be restored until the archiving process completes. When the Editor reboots after a crash, it searches for a session to recover but skips over any session that is mounted/locked, assuming the session is in use by a concurrent Editor process, which can prevent the crashed session from being restored. The best way to work around this problem is to skip the archiving step. Instead, the live session is never archived (saving a copy), which allows the recovery service to shut down and release the session lock very quickly, ensuring that the session is unlocked when the Editor restarts. On Editor start, if a crashed session is found and the user decides to restore it, the live session is copied into a new live session.
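
As an illustration of the skip-if-locked search described above, here is a minimal C++ sketch. SessionInfo, its fields, and FindRecoverableSession are hypothetical names invented for this example; they are not the Concert or Disaster Recovery API.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified view of a recovery session on disk.
struct SessionInfo
{
    std::string Name;
    bool bCrashed; // true if the owning Editor did not exit cleanly
    bool bLocked;  // true while some process still has the session mounted
};

// Returns the first crashed session that is not locked, or nullptr if none.
// Locked sessions are skipped because they are assumed to belong to a
// concurrent Editor process.
const SessionInfo* FindRecoverableSession(const std::vector<SessionInfo>& Sessions)
{
    for (const SessionInfo& Session : Sessions)
    {
        if (Session.bCrashed && !Session.bLocked)
        {
            return &Session;
        }
    }
    return nullptr;
}
```

With this shape, a crashed session that is still locked (for example, while it is being archived) is invisible to the search, which is why releasing the lock before the next Editor starts matters.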

This changelist also affects these other Jira issues in the following ways:

#jira UE-87899 - Disaster recovery prevents showing the crash reporting UI in a timely manner if the session is large
  - This CL changes the execution order to shut down the recovery service ASAP to release the lock, but the optimization above makes it very fast, so the UI should always be shown in a timely manner.

#jira UE-87927 - Disaster Recovery doesn't restore a crash from a restored session
  - This CL ensures the recovery service releases the session lock faster than the next instance of the Editor can start.

#jira UE-87900 - Disaster Recovery stops recording transactions if the UDP transport layer restarts or auto-repairs
#jira UE-88517 - Concert Log Spam - (ConcertKeepAlive) discarded
  - This CL fixes an issue with the endpoint timeout logic.

#jira UE-81049 - Clean up the DisasterRecovery Intermediate directory
  - This CL adds code to clean up the intermediate directory left over by a crashed client.

#rb Francis.Hurteau

#ROBOMERGE-SOURCE: CL 11632069 in //UE4/Release-4.25/... via CL 11632084
#ROBOMERGE-BOT: RELEASE (Release-4.25Plus -> Main) (v655-11596533)

[CL 11632094 by patrick laflamme in Main branch]
2020-02-26 11:18:30 -05:00
cristina riveron
48ff95252f CrashReportClientEditor crashes if the bug report window is closed before the callstack appears
- Fixed CrashReportClient using the analytics provider without checking if it is available first.

#jira UE-89414
#rb Chris.Gagnon
#lockdown cristina.riveron

#ROBOMERGE-SOURCE: CL 11590231 in //UE4/Release-4.24/... via CL 11590232 via CL 11590234
#ROBOMERGE-BOT: RELEASE (Release-4.25Plus -> Main) (v654-11333218)

[CL 11590237 by cristina riveron in Main branch]
2020-02-24 15:20:17 -05:00
patrick laflamme
dc11bc7204 #jira UE-89180 - Crash Report Client does not display callstack for debug commands
- Changed the execution flow to ensure the callstack is displayed while the window is still on screen (it was being updated just after the window was closed).

#rb Johan.Berg

#ROBOMERGE-SOURCE: CL 11565572 in //UE4/Release-4.25/... via CL 11565573
#ROBOMERGE-BOT: RELEASE (Release-4.25Plus -> Main) (v654-11333218)

[CL 11581626 by patrick laflamme in Main branch]
2020-02-21 19:31:35 -05:00
patrick laflamme
017559ab07 #jira UE-87900 - Disaster Recovery stops recording transactions if the UDP transport layer restarts or auto-repairs
- Fixed the disaster recovery remote endpoint timeout being set to zero, which prevented it from re-registering with MessageBus when an error occurred (such as a socket disconnect).

#jira UE-87899 - Disaster recovery prevents showing the crash reporting UI in a timely manner if the session is large
  - Fixed the crash reporter app to display the UI (asking the user to send the bug report) before shutting down the recovery service.

- Renamed the field FDisasterRecoveryInfo::Version to FDisasterRecoveryInfo::Revision because "revision" describes the field more accurately.

#rb Jamie.Dale

Edigrated 11250824 from Dev-VirtualProduction.

#ROBOMERGE-SOURCE: CL 11515425 in //UE4/Release-4.25/... via CL 11515515
#ROBOMERGE-BOT: RELEASE (Release-4.25Plus -> Main) (v654-11333218)

[CL 11577491 by patrick laflamme in Main branch]
2020-02-21 14:03:11 -05:00
patrick laflamme
b8e8523cc8 #jira UE-87900 - Disaster Recovery stops recording transactions if the UDP transport layer restarts or auto-repairs
- Fixed the disaster recovery remote endpoint timeout being set to zero, which prevented it from re-registering with MessageBus when an error occurred (such as a socket disconnect).

#jira UE-87899 - Disaster recovery prevents showing the crash reporting UI in a timely manner if the session is large
  - Fixed the crash reporter app to display the UI (asking the user to send the bug report) before shutting down the recovery service.

- Renamed the field FDisasterRecoveryInfo::Version to FDisasterRecoveryInfo::Revision because "revision" describes the field more accurately.

#rb Jamie.Dale

Edigrated 11250824 from Dev-VirtualProduction.

#ROBOMERGE-SOURCE: CL 11515425 in //UE4/Release-4.25/...
#ROBOMERGE-BOT: RELEASE (Release-4.25 -> Release-4.25Plus) (v654-11333218)

[CL 11515515 by patrick laflamme in 4.25-Plus branch]
2020-02-18 15:50:31 -05:00
patrick laflamme
6454b8bd00 #jira UE-88801 - CrashReportClientEditor crashes if the bug report window is closed before the callstack appears
- Prevented the CrashReportClient::FinalizeDiagnoseReportWorker() function from being called after the CrashReportClient instance was deleted.

Details:

The FDiagnoseReportWorker asynchronous task created an extra asynchronous task (calling back into the CrashReportClient instance) that could fire after the targeted CrashReportClient instance was deleted. The solution moves the logic to 'finalize' the report into the tick function, which is expected to run in the game thread, and makes 'Close Without Sending' flow like 'Send and Close'/'Send and Restart' but without sending anything.
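
The general pattern (the worker only publishes a completion flag, and the owner finalizes from its own tick) can be sketched as follows. This is a simplified illustration under assumed names: FDiagnoseState and FClientSketch are hypothetical and are not the actual CrashReportClient code.

```cpp
#include <atomic>
#include <memory>

// State shared between the background worker and the client. It is
// reference-counted, so it stays valid even if the client is destroyed first.
struct FDiagnoseState
{
    std::atomic<bool> bDone{false};
};

class FClientSketch
{
public:
    FClientSketch() : State(std::make_shared<FDiagnoseState>()) {}

    // Handed to the worker. The worker never receives a pointer to the client,
    // so it cannot call back into a deleted instance.
    std::shared_ptr<FDiagnoseState> GetWorkerState() const { return State; }

    // Called periodically on the owning thread. Finalizes at most once, and
    // only while the client is alive.
    void Tick()
    {
        if (!bFinalized && State->bDone.load())
        {
            bFinalized = true; // the real 'finalize' work would happen here
        }
    }

    bool IsFinalized() const { return bFinalized; }

private:
    std::shared_ptr<FDiagnoseState> State;
    bool bFinalized = false;
};
```

Because finalization is pulled by Tick() rather than pushed by a callback, a worker that finishes after the client is gone only writes to the shared state and touches nothing else.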

#rb Francis.Hurteau
#lockdown cristina.riveron

Edigrated 11462889 from 4.24

#ROBOMERGE-SOURCE: CL 11512537 in //UE4/Release-4.25/...
#ROBOMERGE-BOT: RELEASE (Release-4.25 -> Release-4.25Plus) (v654-11333218)

[CL 11512549 by patrick laflamme in 4.25-Plus branch]
2020-02-18 15:30:39 -05:00
patrick laflamme
b307af769f #jira UE-88968 - CrashReportClientEditor with MTBF does not send the analytics report if the user dismisses the crash report UI with 'Close Without Sending' very quickly.
- Added a 60-second grace period for the Editor process to exit so that we can read its exit code.
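
A grace period of this kind can be sketched as a bounded polling loop. The function below is an illustrative assumption, not the CrashReportClient implementation: WaitForExitCode and the TryGetExitCode callback are hypothetical names, and the poll interval is arbitrary.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Polls TryGetExitCode until it succeeds or the grace period elapses.
// TryGetExitCode is a hypothetical callback that returns true and fills the
// exit code once the monitored process has exited.
bool WaitForExitCode(const std::function<bool(int&)>& TryGetExitCode,
                     std::chrono::milliseconds GracePeriod,
                     std::chrono::milliseconds PollInterval,
                     int& OutCode)
{
    const auto Deadline = std::chrono::steady_clock::now() + GracePeriod;
    for (;;)
    {
        if (TryGetExitCode(OutCode))
        {
            return true; // process exited, OutCode is valid
        }
        if (std::chrono::steady_clock::now() >= Deadline)
        {
            return false; // grace period elapsed, exit code unknown
        }
        std::this_thread::sleep_for(PollInterval);
    }
}
```

A caller would pass std::chrono::seconds(60) as the grace period and treat a false return as "exit code unavailable" rather than reporting a default value.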

#rb Francis.Hurteau
#lockdown cristina.riveron

#ROBOMERGE-OWNER: patrick.laflamme
#ROBOMERGE-AUTHOR: patrick.laflamme
#ROBOMERGE-SOURCE: CL 11507818 in //UE4/Release-4.24/... via CL 11508156 via CL 11508181
#ROBOMERGE-BOT: RELEASE (Release-4.25Plus -> Main) (v654-11333218)

[CL 11508901 by patrick laflamme in Main branch]
2020-02-18 14:14:57 -05:00
patrick laflamme
cf408bd077 #jira UE-88968 - CrashReportClientEditor with MTBF does not send the analytics report if the user dismisses the crash report UI with 'Close Without Sending' very quickly.
- Added a 60-second grace period for the Editor process to exit so that we can read its exit code.

#rb Francis.Hurteau
#lockdown cristina.riveron

#ROBOMERGE-OWNER: patrick.laflamme
#ROBOMERGE-AUTHOR: patrick.laflamme
#ROBOMERGE-SOURCE: CL 11507818 in //UE4/Release-4.24/... via CL 11508156
#ROBOMERGE-BOT: RELEASE (Release-4.25 -> Release-4.25Plus) (v654-11333218)

[CL 11508181 by patrick laflamme in 4.25-Plus branch]
2020-02-18 13:51:21 -05:00
patrick laflamme
5a60a93830 #jira UE-88801 - CrashReportClientEditor crashes if the bug report window is closed before the callstack appears
- Prevented the CrashReportClient::FinalizeDiagnoseReportWorker() function from being called after the CrashReportClient instance was deleted.

Details:

The FDiagnoseReportWorker asynchronous task created an extra asynchronous task (calling back into the CrashReportClient instance) that could fire after the targeted CrashReportClient instance was deleted. The solution moves the logic to 'finalize' the report into the tick function, which is expected to run in the game thread, and makes 'Close Without Sending' flow like 'Send and Close'/'Send and Restart' but without sending anything.

#rb Francis.Hurteau
#lockdown cristina.riveron

#ROBOMERGE-SOURCE: CL 11462889 in //UE4/Release-4.24/...
#ROBOMERGE-BOT: RELEASE (Release-4.24 -> Main) (v654-11333218)

[CL 11462901 by patrick laflamme in Main branch]
2020-02-17 14:41:35 -05:00
johan berg
514feddf21 Crash report client doesn't need full access handle to runtime when monitoring.
While monitoring the parent process, CRC doesn't need a full-access process handle on Windows. Open the handle using limited access flags instead.

#rb stefan.boberg
#jira UE-88601
#lockdown stefan.boberg

#ushell-cherrypick of 11458913 by Johan.Berg

#ROBOMERGE-SOURCE: CL 11458942 in //UE4/Release-4.24/...
#ROBOMERGE-BOT: RELEASE (Release-4.24 -> Main) (v654-11333218)

[CL 11458943 by johan berg in Main branch]
2020-02-17 07:02:19 -05:00
patrick laflamme
3cfe7e366c #jira UE-88555 - CrashReportClientEditor failed to write the proper application exit code in the analytic summary event
- Fixed CrashReportApp not reading the return code once the monitored process (the Editor) exited, because the logical && would short-circuit and skip the read (leaving the code at 0).
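
The short-circuit pitfall can be reproduced in a few lines. The exact expression in CrashReportApp is not shown in this changelist, so the code below is only an assumed shape of the bug; FProcessSketch and both functions are hypothetical.

```cpp
// Hypothetical stand-in for the monitored process.
struct FProcessSketch
{
    bool bRunning;
    int ExitCode;

    bool IsRunning() const { return bRunning; }
    bool ReadExitCode(int& OutCode) const { OutCode = ExitCode; return true; }
};

// Buggy shape: once IsRunning() returns false, && short-circuits and
// ReadExitCode() is never evaluated, so Code keeps its initial value of 0.
int ExitCodeBuggy(const FProcessSketch& Proc)
{
    int Code = 0;
    if (Proc.IsRunning() && Proc.ReadExitCode(Code))
    {
        // still monitoring
    }
    return Code;
}

// Fixed shape: the read is its own statement and runs once the process exited.
int ExitCodeFixed(const FProcessSketch& Proc)
{
    int Code = 0;
    if (!Proc.IsRunning())
    {
        Proc.ReadExitCode(Code);
    }
    return Code;
}
```

The general lesson is to avoid placing a call with a needed side effect on the right-hand side of && when the left-hand side can be false in exactly the case that matters.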

Edigrated CL 11445717

#rb Jamie.Dale
#lockdown cristina.riveron

#ROBOMERGE-SOURCE: CL 11445866 in //UE4/Release-4.24/...
#ROBOMERGE-BOT: RELEASE (Release-4.24 -> Main) (v654-11333218)

[CL 11445875 by patrick laflamme in Main branch]
2020-02-14 13:15:30 -05:00
Patrick Boutot
b67ff68e04 Copying //UE4/Dev-VirtualProduction to //UE4/Dev-Tools-Staging @ 11168401
#rb none
#rnx

[CL 11170710 by Patrick Boutot in Dev-Tools-Staging branch]
2020-01-29 18:45:15 -05:00
Patrick Boutot
410c720ac7 Merging //UE4/Dev-Main @ 10886849 to Dev-Tools-Staging (//UE4/Dev-Tools-Staging)
#rb none
#rnx
#author jeanmichel.dignard

[CL 10992634 by Patrick Boutot in Dev-VirtualProduction branch]
2020-01-15 09:39:21 -05:00
JeanMichel Dignard
70d074639f Merging //UE4/Dev-Main @ 10886849 to Dev-Tools-Staging (//UE4/Dev-Tools-Staging)
#rb none
#rnx

[CL 10906274 by JeanMichel Dignard in Dev-Tools-Staging branch]
2020-01-08 13:26:18 -05:00
Ryan Durand
9ef3748747 Updating copyrights for Engine Programs.
#rnx
#rb none
#jira none

#ROBOMERGE-OWNER: ryan.durand
#ROBOMERGE-AUTHOR: ryan.durand
#ROBOMERGE-SOURCE: CL 10869242 in //Fortnite/Release-12.00/... via CL 10869536
#ROBOMERGE-BOT: FORTNITE (Main -> Dev-EngineMerge) (v613-10869866)

[CL 10870955 by Ryan Durand in Main branch]
2019-12-26 23:01:54 -05:00
johan berg
de0a5270ed Add "implicit send" functionality to crash report client monitor path.
Add functionality for bImplicit send configuration variable. This allows a game to automatically send crash reports without user interaction, displaying a native OS dialog when completed.

#rb jamie.dale, patrick.laflamme


#ROBOMERGE-SOURCE: CL 10808278 via CL 10808279
#ROBOMERGE-BOT: (v610-10636431)

[CL 10808280 by johan berg in Main branch]
2019-12-19 04:31:25 -05:00
stefan boberg
1147b2f670 Copy-up from Dev-Core
#rb none

#ROBOMERGE-OWNER: patrick.boutot
#ROBOMERGE-AUTHOR: stefan.boberg
#ROBOMERGE-SOURCE: CL 10419044 in //UE4/Main/... via CL 10442942
#ROBOMERGE-BOT: TOOLS (Dev-Tools-Staging -> Dev-VirtualProduction) (v606-10482310)

[CL 10488881 by stefan boberg in Dev-VirtualProduction branch]
2019-12-02 15:34:47 -05:00
Patrick Laflamme
b6c6bd4be0 #jira UE-82767 - Multi-User log spam in editor
- Fixed the verbosity of Multi-User/Disaster Recovery endpoint discovery by adding a new log category "LogConcertDebug" that defaults to "Warning", except for the Multi-User server, which defaults to the "Log" level. The category verbosity can be adjusted from the command line with -LogCmds="LogConcertDebug Verbose" or from a console command with: log LogConcertDebug Verbose.
  - Prevented Disaster Recovery client from discovering (and logging) all recovery services running.

#rb Francis.Hurteau

[CL 10467794 by Patrick Laflamme in Dev-VirtualProduction branch]
2019-11-27 09:00:18 -05:00
Stefan Boberg
d2f9a61b06 Copy-up from Dev-Core
#rb none

[CL 10419044 by Stefan Boberg in Main branch]
2019-11-25 12:03:09 -05:00
patrick laflamme
93ab27052b #jira UE-83339 - Disaster Recovery can fail to recover its session when the project is opened from the Project Browser
- Fixed a disaster recovery bug preventing the Editor from recovering a session because another instance of the Editor on another project already locked all the sessions.

Problem:

On Windows, the CrashReportClientEditor (hosting the disaster recovery service) is started during static initialization, before the engine is initialized, which leaves little room for command-line configuration. The Editor project browser would start a first CrashReportClientEditor instance, which would load and lock all the available sessions (unless another CrashReportClientEditor was running). When the user selected a project, a new Editor and CrashReportClientEditor were launched before the first one was closed. The second instance could not access the existing sessions because they were still locked by the first instance.

Solution:

Because CrashReportClientEditor is launched before the engine is initialized, we don't have any context at launch time. The best option was to delay the moment when the server reloads the existing sessions and to let each client store its sessions in a different folder (repository) mounted on demand by the server.

Implementation details:
  - Implemented new RPC API to allow the client to list/create/load/drop specific repositories containing its own sessions on demand.
  - Updated the Concert server to manage multiple directories where sessions can be stored/found (session repositories) rather than just one.
  - Added a setting to allow the user to specify where the disaster recovery sessions should be stored on disk. It now defaults to the current project folder.
  - Added a setting to prevent the Concert server from scanning the sessions in the default location.
  - Updated disaster recovery to start without any session repository and let the client decide if a new one needs to be created or an existing one be mounted to restore a previous session.
  - Changed the code to let the disaster recovery client manage its session history rather than letting the server rotate the old sessions. Defaulted the history to 0, since the user has no flow to visualize and pick from the history.

#rb Jamie.Dale

#ROBOMERGE-SOURCE: CL 10260823 in //UE4/Release-4.24/...
#ROBOMERGE-BOT: RELEASE (Release-4.24 -> Main) (v591-10236483)

[CL 10260830 by patrick laflamme in Main branch]
2019-11-15 12:55:57 -05:00
jamie dale
9457015249 Initialize and Shutdown analytics for each crash reported by the monitor
This ensures that we honor the user-settings for reporting analytics correctly if they change while the editor is running.

#jira UE-82764
[FYI] Johan.Berg
#rb Sebastian.Nordgren
#rnx

#ROBOMERGE-SOURCE: CL 9902945 in //UE4/Release-4.24/...
#ROBOMERGE-BOT: RELEASE (Release-4.24 -> Main) (v558-9892490)

[CL 9902964 by jamie dale in Main branch]
2019-10-31 12:39:36 -04:00
johan berg
bd51031375 Prevent analytics being initialized twice
When the user has allowed usage data to be sent, we initialize the analytics backend in the crash report client. If the user has also enabled sending unattended reports and an ensure is encountered followed by a crash, the crash reporter would assert because the analytics backend was being initialized twice.
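
One common way to make initialization safe to request twice is an idempotent guard. The sketch below shows that general technique only; it is not necessarily how this changelist fixed the problem, and FAnalyticsSketch is a hypothetical class.

```cpp
// Hypothetical analytics wrapper with an idempotent Initialize().
class FAnalyticsSketch
{
public:
    // Returns true only when this call actually performed the initialization.
    bool Initialize()
    {
        if (bInitialized)
        {
            return false; // already initialized, nothing to do
        }
        bInitialized = true;
        ++InitCount;
        return true;
    }

    // Allows a later Initialize() to run again, e.g. for the next report.
    void Shutdown() { bInitialized = false; }

    int GetInitCount() const { return InitCount; }

private:
    bool bInitialized = false;
    int InitCount = 0;
};
```

With this guard, an ensure followed by a crash would request initialization twice but only the first request would do any work.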

#rb sebastian.nordgren
#jira UE-82764

#ROBOMERGE-SOURCE: CL 9899678 in //UE4/Release-4.24/...
#ROBOMERGE-BOT: RELEASE (Release-4.24 -> Main) (v558-9892490)

[CL 9899681 by johan berg in Main branch]
2019-10-31 08:19:10 -04:00
johan berg
b70955bdea Allow game/editor to continue execution earlier after a crash.
A previous change moved the signal telling the game/editor that it's okay to continue to after the crash report client was completely done with sending and resolving callstacks, because it was assumed that there was a synchronization problem. However, that proved to be another issue, so this moves the signal back to where it was originally. This should make the editor "freeze" only for a short time, while necessary data is collected.

#jira UE-82333
#rb pj.kack

(ushell-p4-cherrypick of 9868282 by Johan.Berg)

#ROBOMERGE-SOURCE: CL 9868804 in //UE4/Release-4.24/...
#ROBOMERGE-BOT: RELEASE (Release-4.24 -> Main) (v548-9842178)

[CL 9868810 by johan berg in Main branch]
2019-10-28 09:56:39 -04:00
ben marsh
4e13df3463 Fix assert in CrashReportClient when shutting down without sending a crash report.
#rb none
[FYI] sebastian.nordgren
#jira UE-82436

#ROBOMERGE-SOURCE: CL 9838412 in //UE4/Release-4.24/...
#ROBOMERGE-BOT: RELEASE (Release-4.24 -> Main) (v546-9757112)

[CL 9838415 by ben marsh in Main branch]
2019-10-25 08:54:17 -04:00
sebastian nordgren
d7fbf013c6 We now spoof a crash report and send it from the CrashReportClientApp when it detects that an abnormal shutdown has occurred. That is to say, a shutdown where, as far as we can tell, no known exit path was followed.
Reverted change to where FCrashReportAnalytics was initialized now that we get those settings from the UECrashContext file.

Added DelayedSend analytics attribute that determines whether or not the process that is sending an analytics event was the same one that created it.

#rb jamie.dale

#jira UETOOL-1826

#ROBOMERGE-SOURCE: CL 9731024 in //UE4/Release-4.24/...
#ROBOMERGE-BOT: RELEASE (Release-4.24 -> Main) (v539-9700858)

[CL 9731027 by sebastian nordgren in Main branch]
2019-10-21 08:17:44 -04:00