Files
UnrealEngineUWP/Engine/Source/Developer/Virtualization/Private/PackageSubmissionChecks.cpp

453 lines
17 KiB
C++
Raw Normal View History

// Copyright Epic Games, Inc. All Rights Reserved.
#include "PackageSubmissionChecks.h"
#include "Containers/UnrealString.h"
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous. #rb PJ.Kack #rnx #preflight 61a4b235405273b2c3daa7c7 ### Virtualization - The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted. - The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again. - This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc. - In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead. - In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm) - For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk. ### Current Issues - The system does not work with the editor domain and has minor issues with text based assets - The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional. ### PackageTrailer - The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer - FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use. - When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire. - the header file contains a breakdown of the new format. ### VirtualizedBulkData - FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file. - FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code. - Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads. - We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave. - Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object. - Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use. - ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs. - We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package. - We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled. - We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code. -- Once the FPackageTrailer feature is no longer optional we can consider changing this. ### SavePackage/SavePackage2 - Removed the functions that would create the FPayloadToc - Added new functions to help create the trailer. - Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer. - We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems. ### LinkerLoad - After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package -- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead. - By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file. ### LinkerSave - We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed. - We now gather data to be appended to the package trailer via TrailerBuilder. ### DumpPayloadToc - This command now dumps info based on the package trailer if one can be found. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
#include "HAL/FileManager.h"
#include "HAL/PlatformFileManager.h"
#include "HAL/PlatformTime.h"
#include "Internationalization/Internationalization.h"
#include "Misc/PackageName.h"
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous. #rb PJ.Kack #rnx #preflight 61a4b235405273b2c3daa7c7 ### Virtualization - The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted. - The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again. - This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc. - In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead. - In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm) - For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk. ### Current Issues - The system does not work with the editor domain and has minor issues with text based assets - The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional. ### PackageTrailer - The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer - FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use. - When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire. - the header file contains a breakdown of the new format. ### VirtualizedBulkData - FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file. - FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code. - Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads. - We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave. - Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object. - Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use. - ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs. - We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package. - We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled. - We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code. -- Once the FPackageTrailer feature is no longer optional we can consider changing this. ### SavePackage/SavePackage2 - Removed the functions that would create the FPayloadToc - Added new functions to help create the trailer. - Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer. - We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems. ### LinkerLoad - After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package -- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead. - By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file. ### LinkerSave - We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed. - We now gather data to be appended to the package trailer via TrailerBuilder. ### DumpPayloadToc - This command now dumps info based on the package trailer if one can be found. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
#include "Misc/Paths.h"
#include "Serialization/VirtualizedBulkData.h"
#include "UObject/Linker.h"
#include "UObject/Package.h"
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous. #rb PJ.Kack #rnx #preflight 61a4b235405273b2c3daa7c7 ### Virtualization - The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted. - The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again. - This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc. - In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead. - In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm) - For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk. ### Current Issues - The system does not work with the editor domain and has minor issues with text based assets - The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional. ### PackageTrailer - The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer - FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use. - When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire. - the header file contains a breakdown of the new format. ### VirtualizedBulkData - FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file. - FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code. - Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads. - We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave. - Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object. - Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use. - ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs. - We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package. - We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled. - We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code. -- Once the FPackageTrailer feature is no longer optional we can consider changing this. ### SavePackage/SavePackage2 - Removed the functions that would create the FPayloadToc - Added new functions to help create the trailer. - Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer. - We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems. ### LinkerLoad - After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package -- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead. - By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file. ### LinkerSave - We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed. - We now gather data to be appended to the package trailer via TrailerBuilder. ### DumpPayloadToc - This command now dumps info based on the package trailer if one can be found. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
#include "UObject/PackageResourceManager.h"
#include "UObject/PackageTrailer.h"
#include "UObject/UObjectGlobals.h"
#include "Virtualization/VirtualizationSystem.h"
#define LOCTEXT_NAMESPACE "Virtualization"
Submitting packages on projects with virtualization enabled is much faster when none of the payloads actually needs to be virtualized. #rb PJ.Kack #rnx #preflight 61a795773c29b3cf13cd8250 ### PackageSubmissionChecks - Under the old model submitting a large number of packages could be very slow as each package would check each payload that it owns and is currently stored locally one at a time. For the source control backend this created quite a large overhead even when all of the payloads were already virtualized. - Now we do a pass over the submitted files to find all valid packages that have package trailers with locally stored payloads, gather the payloads into a single list and then query that in one large batch. - Once we find which payloads are not in permanent storage (note that in the case where a project is using multiple permanent storage solutions, if a payload is missing in one backend it counts as not being in permanent storage) we then attempt to virtualized them. - Only after all of this is done will we create the truncated copy of each package and then append the updated trailer to each one. In theory doing it in this order this might slightly increase the change of submit failures that occur after virtualization that result in a package never being submitted and orphaned payloads being added to permanent storage, but this will always be a risk. - Added an assert to fire if we detect a trailer with some virtualized and some local payloads. This should be a supported feature but needs proper testing first before we can allow it. With out current project settings no project should actually encounter this scenario. - To make the code easier to follow we now early out of the entire check when errors are encountered. - Added logging at various stages in the process to help show the user that something is happening and make problems easier to identify in the future. - Notes -- There is a lot of handling of invalid FPayloads. This is because it is currently possible to add empty payloads to the trailer which is inefficient and wastes space. The trailer will be modified to reject empty payloads in a future update at which point a lot of this handling can be removed. -- This could've also been solved by not fully rehydrating a package on save by the end user, which will be added as a project setting in a future piece of work, but this approach will solve the edge case when the user does have a large amount of hydrated packages which contain payloads that are already virtualized so it was better to fix that now while we have good test cases for it. -- We still have scaling problems with large number of package being submitted that do have payloads that need to be virtualized, this will be fixed by extending IVirtualizationSystem::Push to also accept batches of payloads in future work. -- OnPrePackageSubmission could be broken up into smaller chunks to make the code easier to follow. This will be done after the batch payload submission work is done. ### VirtualizationSystem - EStorageType has been promoted to enum class. - Added a new enum FPayloadStatus to be used when querying if a payload exists in a backend storage system or not. - Add a new method ::DoPayloadsExist which allows the caller to query if one or more payloads exists in the given backend storage system. ### VirtualizationManager - Implemented ::DoPayloadsExist. First we get the results from each backend in the storage system (which return as true or false from each backend) then total how many backends found the payload in order to set the correct status. ### IVirtualizationBackend - ::DoesPayloadExist which queries the existence of a single payload has been added to the interface. Most backends already implemented this for private use and if so have had their implementation renamed to match this. - Also added ::DoPayloadsExist which takes a batch of FpayloadIdsto query. Some backends can deal with a batch of payload ids much more efficiently than one at a time, although the default implementation does call ::DoesPayloadExist for each requested payload. -- The default implementation prevents every backend from needing to implement the same for loop but does allow backends that can gain from batching to override it. ### VirtualizationSourceControlBackend - This backend does override ::DoPayloadsExist and implements it's own version as it tends to perform very poorly when not operating on larger batches. - In this case ::DoesPayloadExist calls back to ::DoPayloadsExist to check each payload rather than implement as specific version. ### PackageTrailer - The trailer can now be queries to request how many payloads of a given type it contains #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18339847 in //UE5/Release-5.0/... via CL 18339852 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18339859 by paul chipchase in ue5-release-engine-test branch]
2021-12-01 11:13:31 -05:00
// When enabled we will validate truncated packages right after the truncation process to
// make sure that the package format is still correct once the package trailer has been
// removed.
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous. #rb PJ.Kack #rnx #preflight 61a4b235405273b2c3daa7c7 ### Virtualization - The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted. - The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again. - This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc. - In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead. - In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm) - For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk. ### Current Issues - The system does not work with the editor domain and has minor issues with text based assets - The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional. ### PackageTrailer - The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer - FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use. - When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire. - the header file contains a breakdown of the new format. ### VirtualizedBulkData - FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file. - FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code. - Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads. - We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave. - Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object. - Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use. - ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs. - We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package. - We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled. - We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code. -- Once the FPackageTrailer feature is no longer optional we can consider changing this. ### SavePackage/SavePackage2 - Removed the functions that would create the FPayloadToc - Added new functions to help create the trailer. - Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer. - We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems. ### LinkerLoad - After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package -- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead. - By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file. ### LinkerSave - We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed. - We now gather data to be appended to the package trailer via TrailerBuilder. ### DumpPayloadToc - This command now dumps info based on the package trailer if one can be found. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
#define UE_VALIDATE_TRUNCATED_PACKAGE 1
Packages submitted from the editor together will now virtualize their payloads in a single batch rather than one at a time. #rb PJ.Kack #jira UE-136126 #rnx #preflight ### VirtualizationSystem - Added a new overload for Push to VirtualizationSystem that takes an array of FPushRequest, which is a new structure representing a single payload request. - Filtering by package name is currently disabled, this is because the API has been forced into changing and passing the package name in via a FString rather than FPackagePath which means we would need to be more careful. This will be done in a future submit. - The backend interface has been extended to also have a batch version of PushData, by default this will attempt to submit each request one at a time so payloads don't have to try and implement a batched version if there is no need. - The context being passed with a payload when being pushed has been changed from FPackagePath to FString due to include order issues, as the FPackagePath lives in CoreUObject and the API for virtualization lives in Core. Additionally in the future the payloads might not be owned by a package (there is nothing specifically enforcing this) so the context being a string makes more sense. - NOTE: Due to the context change we currently no longer support the filtering feature, which allows for payloads belonging to packages under specific directories to be excluded from virtualization. This is something that will be solved in a future submit. ### SourceControlBackend - Now that we can submit multiple payloads in the same submit, the CL description has been changed slightly. We will now print a list of payload identifiers -> the package trying to submit that payload. This will only tell the users which package originally caused the payload to submit. If a user submits a new package at a later date that contains the same payload we will not be updating the description. ### PackageSubmissionChecks - Converted the submission process to use the new batch push operation in VirtualizationSystem. -- This means that we do a single push and then have to update the package trailers to convert the now pushed payloads from local to virtualized. - Added new define UE_PRECHECK_PAYLOAD_STATUS that makes it easy to toggle off the checks to see which payloads need to be submitted to the persistent backend. This is useful to test if it actually helps speed up the overall operations or if it is faster to just perform the batch push operations on all payloads and check the return values. -- The hope is that over time the submission processes will become fast enough that we can remove the precheck. - Fixed up logging to not always assume more than one package or payload. ### General Notes - Errors and logging is now a bit more vague as we often not just report that X payloads failed etc rather than specific payload identifiers. This probably doesn't affect the user too much since those identifiers as fairly meaningless to them anyway. - The source control submission could be further optimized by first checking the status of the files in thge depot and only then creating/switching workspace etc. - As currently written, we need to load all of the payloads into memory, then the backends will do what they need (in the case of source control this results in the payloads being written to disk then submitted) which could create quite a large memory spike when submitting a large number of packages. -- One solution would be to change the batch push API to take a "payload provider" interface and have the payloads requested as needed rather than passing in the FCompressedBuffer directly. This would let us immediately write the payload to disk for submission then discard it from memory, preventing larger spikes. Although it could cause overhead if there are multiple backends being submitted to. Internally we are unlikely to have more than one backend per storage solution so maybe we should just make it a config option? #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18403735 in //UE5/Release-5.0/... via CL 18403737 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v896-18170469) [CL 18403738 by paul chipchase in ue5-release-engine-test branch]
2021-12-08 02:19:42 -05:00
// When enabled we will check the payloads to see if they already exist in the persistent storage
// backends before trying to push them.
#define UE_PRECHECK_PAYLOAD_STATUS 1
namespace UE::Virtualization
{
Submitting packages on projects with virtualization enabled is much faster when none of the payloads actually needs to be virtualized. #rb PJ.Kack #rnx #preflight 61a795773c29b3cf13cd8250 ### PackageSubmissionChecks - Under the old model submitting a large number of packages could be very slow as each package would check each payload that it owns and is currently stored locally one at a time. For the source control backend this created quite a large overhead even when all of the payloads were already virtualized. - Now we do a pass over the submitted files to find all valid packages that have package trailers with locally stored payloads, gather the payloads into a single list and then query that in one large batch. - Once we find which payloads are not in permanent storage (note that in the case where a project is using multiple permanent storage solutions, if a payload is missing in one backend it counts as not being in permanent storage) we then attempt to virtualized them. - Only after all of this is done will we create the truncated copy of each package and then append the updated trailer to each one. In theory doing it in this order this might slightly increase the change of submit failures that occur after virtualization that result in a package never being submitted and orphaned payloads being added to permanent storage, but this will always be a risk. - Added an assert to fire if we detect a trailer with some virtualized and some local payloads. This should be a supported feature but needs proper testing first before we can allow it. With out current project settings no project should actually encounter this scenario. - To make the code easier to follow we now early out of the entire check when errors are encountered. - Added logging at various stages in the process to help show the user that something is happening and make problems easier to identify in the future. - Notes -- There is a lot of handling of invalid FPayloads. This is because it is currently possible to add empty payloads to the trailer which is inefficient and wastes space. The trailer will be modified to reject empty payloads in a future update at which point a lot of this handling can be removed. -- This could've also been solved by not fully rehydrating a package on save by the end user, which will be added as a project setting in a future piece of work, but this approach will solve the edge case when the user does have a large amount of hydrated packages which contain payloads that are already virtualized so it was better to fix that now while we have good test cases for it. -- We still have scaling problems with large number of package being submitted that do have payloads that need to be virtualized, this will be fixed by extending IVirtualizationSystem::Push to also accept batches of payloads in future work. -- OnPrePackageSubmission could be broken up into smaller chunks to make the code easier to follow. This will be done after the batch payload submission work is done. ### VirtualizationSystem - EStorageType has been promoted to enum class. - Added a new enum FPayloadStatus to be used when querying if a payload exists in a backend storage system or not. - Add a new method ::DoPayloadsExist which allows the caller to query if one or more payloads exists in the given backend storage system. ### VirtualizationManager - Implemented ::DoPayloadsExist. First we get the results from each backend in the storage system (which return as true or false from each backend) then total how many backends found the payload in order to set the correct status. ### IVirtualizationBackend - ::DoesPayloadExist which queries the existence of a single payload has been added to the interface. Most backends already implemented this for private use and if so have had their implementation renamed to match this. - Also added ::DoPayloadsExist which takes a batch of FpayloadIdsto query. Some backends can deal with a batch of payload ids much more efficiently than one at a time, although the default implementation does call ::DoesPayloadExist for each requested payload. -- The default implementation prevents every backend from needing to implement the same for loop but does allow backends that can gain from batching to override it. ### VirtualizationSourceControlBackend - This backend does override ::DoPayloadsExist and implements it's own version as it tends to perform very poorly when not operating on larger batches. - In this case ::DoesPayloadExist calls back to ::DoPayloadsExist to check each payload rather than implement as specific version. ### PackageTrailer - The trailer can now be queries to request how many payloads of a given type it contains #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18339847 in //UE5/Release-5.0/... via CL 18339852 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18339859 by paul chipchase in ue5-release-engine-test branch]
2021-12-01 11:13:31 -05:00
/**
* Check that the given package ends with PACKAGE_FILE_TAG. Intended to be used to make sure that
* we have truncated a package correctly when removing the trailers.
*
* @param PackagePath The path of the package that should be checked
* @param Errors [out] Errors created by the function will be added here
*
* @return True if the package is correctly terminated with a PACKAGE_FILE_TAG, false if the tag
* was not found or if we were unable to read the file's contents.
*/
bool ValidatePackage(const FString& PackagePath, TArray<FText>& Errors)
{
TUniquePtr<IFileHandle> TempFileHandle(FPlatformFileManager::Get().GetPlatformFile().OpenRead(*PackagePath));
if (!TempFileHandle.IsValid())
{
FText ErrorMsg = FText::Format(LOCTEXT("Virtualization_OpenValidationFailed", "Unable to open '{0}' so that it can be validated"),
FText::FromString(PackagePath));
Errors.Add(ErrorMsg);
return false;
}
TempFileHandle->SeekFromEnd(-4);
uint32 PackageTag = INDEX_NONE;
if (!TempFileHandle->Read((uint8*)&PackageTag, 4) || PackageTag != PACKAGE_FILE_TAG)
{
FText ErrorMsg = FText::Format(LOCTEXT("Virtualization_ValidationFailed", "The package '{0}' does not end with a valid tag, the file is considered corrupt"),
FText::FromString(PackagePath));
Errors.Add(ErrorMsg);
return false;
}
return true;
}
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous. #rb PJ.Kack #rnx #preflight 61a4b235405273b2c3daa7c7 ### Virtualization - The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted. - The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again. - This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc. - In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead. - In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm) - For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk. ### Current Issues - The system does not work with the editor domain and has minor issues with text based assets - The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional. ### PackageTrailer - The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer - FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use. - When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire. - the header file contains a breakdown of the new format. ### VirtualizedBulkData - FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file. - FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code. - Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads. - We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave. - Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object. - Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use. - ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs. - We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package. - We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled. - We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code. -- Once the FPackageTrailer feature is no longer optional we can consider changing this. ### SavePackage/SavePackage2 - Removed the functions that would create the FPayloadToc - Added new functions to help create the trailer. - Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer. - We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems. ### LinkerLoad - After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package -- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead. - By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file. ### LinkerSave - We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed. - We now gather data to be appended to the package trailer via TrailerBuilder. ### DumpPayloadToc - This command now dumps info based on the package trailer if one can be found. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
/**
* Creates a copy of the given package but the copy will not include the FPackageTrailer.
*
Submitting packages on projects with virtualization enabled is much faster when none of the payloads actually needs to be virtualized. #rb PJ.Kack #rnx #preflight 61a795773c29b3cf13cd8250 ### PackageSubmissionChecks - Under the old model submitting a large number of packages could be very slow as each package would check each payload that it owns and is currently stored locally one at a time. For the source control backend this created quite a large overhead even when all of the payloads were already virtualized. - Now we do a pass over the submitted files to find all valid packages that have package trailers with locally stored payloads, gather the payloads into a single list and then query that in one large batch. - Once we find which payloads are not in permanent storage (note that in the case where a project is using multiple permanent storage solutions, if a payload is missing in one backend it counts as not being in permanent storage) we then attempt to virtualized them. - Only after all of this is done will we create the truncated copy of each package and then append the updated trailer to each one. In theory doing it in this order this might slightly increase the change of submit failures that occur after virtualization that result in a package never being submitted and orphaned payloads being added to permanent storage, but this will always be a risk. - Added an assert to fire if we detect a trailer with some virtualized and some local payloads. This should be a supported feature but needs proper testing first before we can allow it. With out current project settings no project should actually encounter this scenario. - To make the code easier to follow we now early out of the entire check when errors are encountered. - Added logging at various stages in the process to help show the user that something is happening and make problems easier to identify in the future. - Notes -- There is a lot of handling of invalid FPayloads. This is because it is currently possible to add empty payloads to the trailer which is inefficient and wastes space. The trailer will be modified to reject empty payloads in a future update at which point a lot of this handling can be removed. -- This could've also been solved by not fully rehydrating a package on save by the end user, which will be added as a project setting in a future piece of work, but this approach will solve the edge case when the user does have a large amount of hydrated packages which contain payloads that are already virtualized so it was better to fix that now while we have good test cases for it. -- We still have scaling problems with large number of package being submitted that do have payloads that need to be virtualized, this will be fixed by extending IVirtualizationSystem::Push to also accept batches of payloads in future work. -- OnPrePackageSubmission could be broken up into smaller chunks to make the code easier to follow. This will be done after the batch payload submission work is done. ### VirtualizationSystem - EStorageType has been promoted to enum class. - Added a new enum FPayloadStatus to be used when querying if a payload exists in a backend storage system or not. - Add a new method ::DoPayloadsExist which allows the caller to query if one or more payloads exists in the given backend storage system. ### VirtualizationManager - Implemented ::DoPayloadsExist. First we get the results from each backend in the storage system (which return as true or false from each backend) then total how many backends found the payload in order to set the correct status. ### IVirtualizationBackend - ::DoesPayloadExist which queries the existence of a single payload has been added to the interface. Most backends already implemented this for private use and if so have had their implementation renamed to match this. - Also added ::DoPayloadsExist which takes a batch of FpayloadIdsto query. Some backends can deal with a batch of payload ids much more efficiently than one at a time, although the default implementation does call ::DoesPayloadExist for each requested payload. -- The default implementation prevents every backend from needing to implement the same for loop but does allow backends that can gain from batching to override it. ### VirtualizationSourceControlBackend - This backend does override ::DoPayloadsExist and implements it's own version as it tends to perform very poorly when not operating on larger batches. - In this case ::DoesPayloadExist calls back to ::DoPayloadsExist to check each payload rather than implement as specific version. ### PackageTrailer - The trailer can now be queries to request how many payloads of a given type it contains #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18339847 in //UE5/Release-5.0/... via CL 18339852 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18339859 by paul chipchase in ue5-release-engine-test branch]
2021-12-01 11:13:31 -05:00
* @param PackagePath The path of the package to copy
* @param CopyPath The path where the copy should be created
* @param Trailer The trailer found in 'PackagePath' that is already loaded
* @param Errors [out] Errors created by the function will be added here
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous. #rb PJ.Kack #rnx #preflight 61a4b235405273b2c3daa7c7 ### Virtualization - The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted. - The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again. - This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc. - In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead. - In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm) - For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk. ### Current Issues - The system does not work with the editor domain and has minor issues with text based assets - The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional. ### PackageTrailer - The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer - FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use. - When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire. - the header file contains a breakdown of the new format. ### VirtualizedBulkData - FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file. - FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code. - Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads. - We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave. - Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object. - Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use. - ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs. - We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package. - We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled. - We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code. -- Once the FPackageTrailer feature is no longer optional we can consider changing this. ### SavePackage/SavePackage2 - Removed the functions that would create the FPayloadToc - Added new functions to help create the trailer. - Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer. - We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems. ### LinkerLoad - After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package -- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead. - By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file. ### LinkerSave - We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed. - We now gather data to be appended to the package trailer via TrailerBuilder. ### DumpPayloadToc - This command now dumps info based on the package trailer if one can be found. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
*
* @return Returns true if the package was copied correctly, false otherwise. Note even when returning false a file might have been created at 'CopyPath'
*/
bool TryCopyPackageWithoutTrailer(const FPackagePath PackagePath, const FString& CopyPath, const FPackageTrailer& Trailer, TArray<FText>& Errors)
{
// TODO: Consider adding a custom copy routine to only copy the data we want, rather than copying the full file then truncating
const FString PackageFilePath = PackagePath.GetLocalFullPath();
if (IFileManager::Get().Copy(*CopyPath, *PackageFilePath) != ECopyResult::COPY_OK)
{
FText Message = FText::Format( LOCTEXT("Virtualization_CopyFailed", "Unable to copy package file '{0}' for virtualization"),
FText::FromString(PackagePath.GetDebugName()));
Errors.Add(Message);
return false;
}
const int64 PackageSizeWithoutTrailer = IFileManager::Get().FileSize(*PackageFilePath) - Trailer.GetTrailerLength();
{
Submitting packages on projects with virtualization enabled is much faster when none of the payloads actually needs to be virtualized. #rb PJ.Kack #rnx #preflight 61a795773c29b3cf13cd8250 ### PackageSubmissionChecks - Under the old model submitting a large number of packages could be very slow as each package would check each payload that it owns and is currently stored locally one at a time. For the source control backend this created quite a large overhead even when all of the payloads were already virtualized. - Now we do a pass over the submitted files to find all valid packages that have package trailers with locally stored payloads, gather the payloads into a single list and then query that in one large batch. - Once we find which payloads are not in permanent storage (note that in the case where a project is using multiple permanent storage solutions, if a payload is missing in one backend it counts as not being in permanent storage) we then attempt to virtualized them. - Only after all of this is done will we create the truncated copy of each package and then append the updated trailer to each one. In theory doing it in this order this might slightly increase the change of submit failures that occur after virtualization that result in a package never being submitted and orphaned payloads being added to permanent storage, but this will always be a risk. - Added an assert to fire if we detect a trailer with some virtualized and some local payloads. This should be a supported feature but needs proper testing first before we can allow it. With out current project settings no project should actually encounter this scenario. - To make the code easier to follow we now early out of the entire check when errors are encountered. - Added logging at various stages in the process to help show the user that something is happening and make problems easier to identify in the future. - Notes -- There is a lot of handling of invalid FPayloads. This is because it is currently possible to add empty payloads to the trailer which is inefficient and wastes space. The trailer will be modified to reject empty payloads in a future update at which point a lot of this handling can be removed. -- This could've also been solved by not fully rehydrating a package on save by the end user, which will be added as a project setting in a future piece of work, but this approach will solve the edge case when the user does have a large amount of hydrated packages which contain payloads that are already virtualized so it was better to fix that now while we have good test cases for it. -- We still have scaling problems with large number of package being submitted that do have payloads that need to be virtualized, this will be fixed by extending IVirtualizationSystem::Push to also accept batches of payloads in future work. -- OnPrePackageSubmission could be broken up into smaller chunks to make the code easier to follow. This will be done after the batch payload submission work is done. ### VirtualizationSystem - EStorageType has been promoted to enum class. - Added a new enum FPayloadStatus to be used when querying if a payload exists in a backend storage system or not. - Add a new method ::DoPayloadsExist which allows the caller to query if one or more payloads exists in the given backend storage system. ### VirtualizationManager - Implemented ::DoPayloadsExist. First we get the results from each backend in the storage system (which return as true or false from each backend) then total how many backends found the payload in order to set the correct status. ### IVirtualizationBackend - ::DoesPayloadExist which queries the existence of a single payload has been added to the interface. Most backends already implemented this for private use and if so have had their implementation renamed to match this. - Also added ::DoPayloadsExist which takes a batch of FpayloadIdsto query. Some backends can deal with a batch of payload ids much more efficiently than one at a time, although the default implementation does call ::DoesPayloadExist for each requested payload. -- The default implementation prevents every backend from needing to implement the same for loop but does allow backends that can gain from batching to override it. ### VirtualizationSourceControlBackend - This backend does override ::DoPayloadsExist and implements it's own version as it tends to perform very poorly when not operating on larger batches. - In this case ::DoesPayloadExist calls back to ::DoPayloadsExist to check each payload rather than implement as specific version. ### PackageTrailer - The trailer can now be queries to request how many payloads of a given type it contains #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18339847 in //UE5/Release-5.0/... via CL 18339852 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18339859 by paul chipchase in ue5-release-engine-test branch]
2021-12-01 11:13:31 -05:00
TUniquePtr<IFileHandle> TempFileHandle(FPlatformFileManager::Get().GetPlatformFile().OpenWrite(*CopyPath, true));
if (!TempFileHandle.IsValid())
{
FText Message = FText::Format(LOCTEXT("Virtualization_TruncOpenFailed", "Failed to open package file for truncation'{0}' when virtualizing"),
FText::FromString(CopyPath));
Errors.Add(Message);
return false;
}
if (!TempFileHandle->Truncate(PackageSizeWithoutTrailer))
{
FText Message = FText::Format(LOCTEXT("Virtualization_TruncFailed", "Failed to truncate '{0}' when virtualizing"),
FText::FromString(CopyPath));
Errors.Add(Message);
return false;
}
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous. #rb PJ.Kack #rnx #preflight 61a4b235405273b2c3daa7c7 ### Virtualization - The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted. - The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again. - This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc. - In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead. - In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm) - For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk. ### Current Issues - The system does not work with the editor domain and has minor issues with text based assets - The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional. ### PackageTrailer - The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer - FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use. - When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire. - the header file contains a breakdown of the new format. ### VirtualizedBulkData - FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file. - FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code. - Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads. - We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave. - Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object. - Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use. - ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs. - We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package. - We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled. - We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code. -- Once the FPackageTrailer feature is no longer optional we can consider changing this. ### SavePackage/SavePackage2 - Removed the functions that would create the FPayloadToc - Added new functions to help create the trailer. - Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer. - We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems. ### LinkerLoad - After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package -- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead. - By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file. ### LinkerSave - We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed. - We now gather data to be appended to the package trailer via TrailerBuilder. ### DumpPayloadToc - This command now dumps info based on the package trailer if one can be found. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
}
Submitting packages on projects with virtualization enabled is much faster when none of the payloads actually needs to be virtualized. #rb PJ.Kack #rnx #preflight 61a795773c29b3cf13cd8250 ### PackageSubmissionChecks - Under the old model submitting a large number of packages could be very slow as each package would check each payload that it owns and is currently stored locally one at a time. For the source control backend this created quite a large overhead even when all of the payloads were already virtualized. - Now we do a pass over the submitted files to find all valid packages that have package trailers with locally stored payloads, gather the payloads into a single list and then query that in one large batch. - Once we find which payloads are not in permanent storage (note that in the case where a project is using multiple permanent storage solutions, if a payload is missing in one backend it counts as not being in permanent storage) we then attempt to virtualized them. - Only after all of this is done will we create the truncated copy of each package and then append the updated trailer to each one. In theory doing it in this order this might slightly increase the change of submit failures that occur after virtualization that result in a package never being submitted and orphaned payloads being added to permanent storage, but this will always be a risk. - Added an assert to fire if we detect a trailer with some virtualized and some local payloads. This should be a supported feature but needs proper testing first before we can allow it. With out current project settings no project should actually encounter this scenario. - To make the code easier to follow we now early out of the entire check when errors are encountered. - Added logging at various stages in the process to help show the user that something is happening and make problems easier to identify in the future. - Notes -- There is a lot of handling of invalid FPayloads. This is because it is currently possible to add empty payloads to the trailer which is inefficient and wastes space. The trailer will be modified to reject empty payloads in a future update at which point a lot of this handling can be removed. -- This could've also been solved by not fully rehydrating a package on save by the end user, which will be added as a project setting in a future piece of work, but this approach will solve the edge case when the user does have a large amount of hydrated packages which contain payloads that are already virtualized so it was better to fix that now while we have good test cases for it. -- We still have scaling problems with large number of package being submitted that do have payloads that need to be virtualized, this will be fixed by extending IVirtualizationSystem::Push to also accept batches of payloads in future work. -- OnPrePackageSubmission could be broken up into smaller chunks to make the code easier to follow. This will be done after the batch payload submission work is done. ### VirtualizationSystem - EStorageType has been promoted to enum class. - Added a new enum FPayloadStatus to be used when querying if a payload exists in a backend storage system or not. - Add a new method ::DoPayloadsExist which allows the caller to query if one or more payloads exists in the given backend storage system. ### VirtualizationManager - Implemented ::DoPayloadsExist. First we get the results from each backend in the storage system (which return as true or false from each backend) then total how many backends found the payload in order to set the correct status. ### IVirtualizationBackend - ::DoesPayloadExist which queries the existence of a single payload has been added to the interface. Most backends already implemented this for private use and if so have had their implementation renamed to match this. - Also added ::DoPayloadsExist which takes a batch of FpayloadIdsto query. Some backends can deal with a batch of payload ids much more efficiently than one at a time, although the default implementation does call ::DoesPayloadExist for each requested payload. -- The default implementation prevents every backend from needing to implement the same for loop but does allow backends that can gain from batching to override it. ### VirtualizationSourceControlBackend - This backend does override ::DoPayloadsExist and implements it's own version as it tends to perform very poorly when not operating on larger batches. - In this case ::DoesPayloadExist calls back to ::DoPayloadsExist to check each payload rather than implement as specific version. ### PackageTrailer - The trailer can now be queries to request how many payloads of a given type it contains #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18339847 in //UE5/Release-5.0/... via CL 18339852 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18339859 by paul chipchase in ue5-release-engine-test branch]
2021-12-01 11:13:31 -05:00
#if UE_VALIDATE_TRUNCATED_PACKAGE
// Validate we didn't break the package
if (!ValidatePackage(CopyPath, Errors))
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous. #rb PJ.Kack #rnx #preflight 61a4b235405273b2c3daa7c7 ### Virtualization - The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted. - The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again. - This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc. - In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead. - In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm) - For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk. ### Current Issues - The system does not work with the editor domain and has minor issues with text based assets - The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional. ### PackageTrailer - The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer - FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use. - When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire. - the header file contains a breakdown of the new format. ### VirtualizedBulkData - FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file. - FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code. - Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads. - We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave. - Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object. - Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use. - ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs. - We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package. - We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled. - We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code. -- Once the FPackageTrailer feature is no longer optional we can consider changing this. ### SavePackage/SavePackage2 - Removed the functions that would create the FPayloadToc - Added new functions to help create the trailer. - Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer. - We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems. ### LinkerLoad - After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package -- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead. - By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file. ### LinkerSave - We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed. - We now gather data to be appended to the package trailer via TrailerBuilder. ### DumpPayloadToc - This command now dumps info based on the package trailer if one can be found. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
{
return false;
}
Submitting packages on projects with virtualization enabled is much faster when none of the payloads actually needs to be virtualized. #rb PJ.Kack #rnx #preflight 61a795773c29b3cf13cd8250 ### PackageSubmissionChecks - Under the old model submitting a large number of packages could be very slow as each package would check each payload that it owns and is currently stored locally one at a time. For the source control backend this created quite a large overhead even when all of the payloads were already virtualized. - Now we do a pass over the submitted files to find all valid packages that have package trailers with locally stored payloads, gather the payloads into a single list and then query that in one large batch. - Once we find which payloads are not in permanent storage (note that in the case where a project is using multiple permanent storage solutions, if a payload is missing in one backend it counts as not being in permanent storage) we then attempt to virtualized them. - Only after all of this is done will we create the truncated copy of each package and then append the updated trailer to each one. In theory doing it in this order this might slightly increase the change of submit failures that occur after virtualization that result in a package never being submitted and orphaned payloads being added to permanent storage, but this will always be a risk. - Added an assert to fire if we detect a trailer with some virtualized and some local payloads. This should be a supported feature but needs proper testing first before we can allow it. With out current project settings no project should actually encounter this scenario. - To make the code easier to follow we now early out of the entire check when errors are encountered. - Added logging at various stages in the process to help show the user that something is happening and make problems easier to identify in the future. - Notes -- There is a lot of handling of invalid FPayloads. This is because it is currently possible to add empty payloads to the trailer which is inefficient and wastes space. The trailer will be modified to reject empty payloads in a future update at which point a lot of this handling can be removed. -- This could've also been solved by not fully rehydrating a package on save by the end user, which will be added as a project setting in a future piece of work, but this approach will solve the edge case when the user does have a large amount of hydrated packages which contain payloads that are already virtualized so it was better to fix that now while we have good test cases for it. -- We still have scaling problems with large number of package being submitted that do have payloads that need to be virtualized, this will be fixed by extending IVirtualizationSystem::Push to also accept batches of payloads in future work. -- OnPrePackageSubmission could be broken up into smaller chunks to make the code easier to follow. This will be done after the batch payload submission work is done. ### VirtualizationSystem - EStorageType has been promoted to enum class. - Added a new enum FPayloadStatus to be used when querying if a payload exists in a backend storage system or not. - Add a new method ::DoPayloadsExist which allows the caller to query if one or more payloads exists in the given backend storage system. ### VirtualizationManager - Implemented ::DoPayloadsExist. First we get the results from each backend in the storage system (which return as true or false from each backend) then total how many backends found the payload in order to set the correct status. ### IVirtualizationBackend - ::DoesPayloadExist which queries the existence of a single payload has been added to the interface. Most backends already implemented this for private use and if so have had their implementation renamed to match this. - Also added ::DoPayloadsExist which takes a batch of FpayloadIdsto query. Some backends can deal with a batch of payload ids much more efficiently than one at a time, although the default implementation does call ::DoesPayloadExist for each requested payload. -- The default implementation prevents every backend from needing to implement the same for loop but does allow backends that can gain from batching to override it. ### VirtualizationSourceControlBackend - This backend does override ::DoPayloadsExist and implements it's own version as it tends to perform very poorly when not operating on larger batches. - In this case ::DoesPayloadExist calls back to ::DoPayloadsExist to check each payload rather than implement as specific version. ### PackageTrailer - The trailer can now be queries to request how many payloads of a given type it contains #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18339847 in //UE5/Release-5.0/... via CL 18339852 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18339859 by paul chipchase in ue5-release-engine-test branch]
2021-12-01 11:13:31 -05:00
#endif //UE_VALIDATE_TRUNCATED_PACKAGE
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous. #rb PJ.Kack #rnx #preflight 61a4b235405273b2c3daa7c7 ### Virtualization - The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted. - The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again. - This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc. - In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead. - In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm) - For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk. ### Current Issues - The system does not work with the editor domain and has minor issues with text based assets - The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional. ### PackageTrailer - The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer - FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use. - When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire. - the header file contains a breakdown of the new format. ### VirtualizedBulkData - FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file. - FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code. - Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads. - We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave. - Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object. - Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use. - ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs. - We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package. - We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled. - We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code. -- Once the FPackageTrailer feature is no longer optional we can consider changing this. ### SavePackage/SavePackage2 - Removed the functions that would create the FPayloadToc - Added new functions to help create the trailer. - Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer. - We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems. ### LinkerLoad - After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package -- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead. - By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file. ### LinkerSave - We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed. - We now gather data to be appended to the package trailer via TrailerBuilder. ### DumpPayloadToc - This command now dumps info based on the package trailer if one can be found. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
return true;
}
void OnPrePackageSubmission(const TArray<FString>& FilesToSubmit, TArray<FText>& DescriptionTags, TArray<FText>& Errors)
{
TRACE_CPUPROFILER_EVENT_SCOPE(UE::Virtualization::OnPrePackageSubmission);
IVirtualizationSystem& System = IVirtualizationSystem::Get();
// TODO: We could check to see if the package is virtualized even if it is disabled for the project
// as a safety feature?
if (!System.IsEnabled())
{
return;
}
// Can't virtualize if the payload trailer system is disabled
if (!FPackageTrailer::IsEnabled())
{
return;
}
const double StartTime = FPlatformTime::Seconds();
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous. #rb PJ.Kack #rnx #preflight 61a4b235405273b2c3daa7c7 ### Virtualization - The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted. - The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again. - This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc. - In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead. - In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm) - For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk. ### Current Issues - The system does not work with the editor domain and has minor issues with text based assets - The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional. ### PackageTrailer - The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer - FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use. - When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire. - the header file contains a breakdown of the new format. ### VirtualizedBulkData - FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file. - FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code. - Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads. - We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave. - Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object. - Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use. - ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs. - We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package. - We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled. - We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code. -- Once the FPackageTrailer feature is no longer optional we can consider changing this. ### SavePackage/SavePackage2 - Removed the functions that would create the FPayloadToc - Added new functions to help create the trailer. - Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer. - We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems. ### LinkerLoad - After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package -- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead. - By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file. ### LinkerSave - We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed. - We now gather data to be appended to the package trailer via TrailerBuilder. ### DumpPayloadToc - This command now dumps info based on the package trailer if one can be found. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
// Other systems may have added errors to this array, we need to check so later we can determine if this function added any additional errors.
const int32 NumErrors = Errors.Num();
Submitting packages on projects with virtualization enabled is much faster when none of the payloads actually needs to be virtualized. #rb PJ.Kack #rnx #preflight 61a795773c29b3cf13cd8250 ### PackageSubmissionChecks - Under the old model submitting a large number of packages could be very slow as each package would check each payload that it owns and is currently stored locally one at a time. For the source control backend this created quite a large overhead even when all of the payloads were already virtualized. - Now we do a pass over the submitted files to find all valid packages that have package trailers with locally stored payloads, gather the payloads into a single list and then query that in one large batch. - Once we find which payloads are not in permanent storage (note that in the case where a project is using multiple permanent storage solutions, if a payload is missing in one backend it counts as not being in permanent storage) we then attempt to virtualized them. - Only after all of this is done will we create the truncated copy of each package and then append the updated trailer to each one. In theory doing it in this order this might slightly increase the change of submit failures that occur after virtualization that result in a package never being submitted and orphaned payloads being added to permanent storage, but this will always be a risk. - Added an assert to fire if we detect a trailer with some virtualized and some local payloads. This should be a supported feature but needs proper testing first before we can allow it. With out current project settings no project should actually encounter this scenario. - To make the code easier to follow we now early out of the entire check when errors are encountered. - Added logging at various stages in the process to help show the user that something is happening and make problems easier to identify in the future. - Notes -- There is a lot of handling of invalid FPayloads. This is because it is currently possible to add empty payloads to the trailer which is inefficient and wastes space. The trailer will be modified to reject empty payloads in a future update at which point a lot of this handling can be removed. -- This could've also been solved by not fully rehydrating a package on save by the end user, which will be added as a project setting in a future piece of work, but this approach will solve the edge case when the user does have a large amount of hydrated packages which contain payloads that are already virtualized so it was better to fix that now while we have good test cases for it. -- We still have scaling problems with large number of package being submitted that do have payloads that need to be virtualized, this will be fixed by extending IVirtualizationSystem::Push to also accept batches of payloads in future work. -- OnPrePackageSubmission could be broken up into smaller chunks to make the code easier to follow. This will be done after the batch payload submission work is done. ### VirtualizationSystem - EStorageType has been promoted to enum class. - Added a new enum FPayloadStatus to be used when querying if a payload exists in a backend storage system or not. - Add a new method ::DoPayloadsExist which allows the caller to query if one or more payloads exists in the given backend storage system. ### VirtualizationManager - Implemented ::DoPayloadsExist. First we get the results from each backend in the storage system (which return as true or false from each backend) then total how many backends found the payload in order to set the correct status. ### IVirtualizationBackend - ::DoesPayloadExist which queries the existence of a single payload has been added to the interface. Most backends already implemented this for private use and if so have had their implementation renamed to match this. - Also added ::DoPayloadsExist which takes a batch of FpayloadIdsto query. Some backends can deal with a batch of payload ids much more efficiently than one at a time, although the default implementation does call ::DoesPayloadExist for each requested payload. -- The default implementation prevents every backend from needing to implement the same for loop but does allow backends that can gain from batching to override it. ### VirtualizationSourceControlBackend - This backend does override ::DoPayloadsExist and implements it's own version as it tends to perform very poorly when not operating on larger batches. - In this case ::DoesPayloadExist calls back to ::DoPayloadsExist to check each payload rather than implement as specific version. ### PackageTrailer - The trailer can now be queries to request how many payloads of a given type it contains #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18339847 in //UE5/Release-5.0/... via CL 18339852 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18339859 by paul chipchase in ue5-release-engine-test branch]
2021-12-01 11:13:31 -05:00
struct FPackageInfo
{
FPackagePath Path;
FPackageTrailer Trailer;
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous. #rb PJ.Kack #rnx #preflight 61a4b235405273b2c3daa7c7 ### Virtualization - The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted. - The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again. - This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc. - In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead. - In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm) - For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk. ### Current Issues - The system does not work with the editor domain and has minor issues with text based assets - The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional. ### PackageTrailer - The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer - FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use. - When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire. - the header file contains a breakdown of the new format. ### VirtualizedBulkData - FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file. - FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code. - Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads. - We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave. - Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object. - Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use. - ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs. - We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package. - We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled. - We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code. -- Once the FPackageTrailer feature is no longer optional we can consider changing this. ### SavePackage/SavePackage2 - Removed the functions that would create the FPayloadToc - Added new functions to help create the trailer. - Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer. - We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems. ### LinkerLoad - After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package -- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead. - By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file. ### LinkerSave - We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed. - We now gather data to be appended to the package trailer via TrailerBuilder. ### DumpPayloadToc - This command now dumps info based on the package trailer if one can be found. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
Submitting packages on projects with virtualization enabled is much faster when none of the payloads actually needs to be virtualized. #rb PJ.Kack #rnx #preflight 61a795773c29b3cf13cd8250 ### PackageSubmissionChecks - Under the old model submitting a large number of packages could be very slow as each package would check each payload that it owns and is currently stored locally one at a time. For the source control backend this created quite a large overhead even when all of the payloads were already virtualized. - Now we do a pass over the submitted files to find all valid packages that have package trailers with locally stored payloads, gather the payloads into a single list and then query that in one large batch. - Once we find which payloads are not in permanent storage (note that in the case where a project is using multiple permanent storage solutions, if a payload is missing in one backend it counts as not being in permanent storage) we then attempt to virtualized them. - Only after all of this is done will we create the truncated copy of each package and then append the updated trailer to each one. In theory doing it in this order this might slightly increase the change of submit failures that occur after virtualization that result in a package never being submitted and orphaned payloads being added to permanent storage, but this will always be a risk. - Added an assert to fire if we detect a trailer with some virtualized and some local payloads. This should be a supported feature but needs proper testing first before we can allow it. With out current project settings no project should actually encounter this scenario. - To make the code easier to follow we now early out of the entire check when errors are encountered. - Added logging at various stages in the process to help show the user that something is happening and make problems easier to identify in the future. - Notes -- There is a lot of handling of invalid FPayloads. This is because it is currently possible to add empty payloads to the trailer which is inefficient and wastes space. The trailer will be modified to reject empty payloads in a future update at which point a lot of this handling can be removed. -- This could've also been solved by not fully rehydrating a package on save by the end user, which will be added as a project setting in a future piece of work, but this approach will solve the edge case when the user does have a large amount of hydrated packages which contain payloads that are already virtualized so it was better to fix that now while we have good test cases for it. -- We still have scaling problems with large number of package being submitted that do have payloads that need to be virtualized, this will be fixed by extending IVirtualizationSystem::Push to also accept batches of payloads in future work. -- OnPrePackageSubmission could be broken up into smaller chunks to make the code easier to follow. This will be done after the batch payload submission work is done. ### VirtualizationSystem - EStorageType has been promoted to enum class. - Added a new enum FPayloadStatus to be used when querying if a payload exists in a backend storage system or not. - Add a new method ::DoPayloadsExist which allows the caller to query if one or more payloads exists in the given backend storage system. ### VirtualizationManager - Implemented ::DoPayloadsExist. First we get the results from each backend in the storage system (which return as true or false from each backend) then total how many backends found the payload in order to set the correct status. ### IVirtualizationBackend - ::DoesPayloadExist which queries the existence of a single payload has been added to the interface. Most backends already implemented this for private use and if so have had their implementation renamed to match this. - Also added ::DoPayloadsExist which takes a batch of FpayloadIdsto query. Some backends can deal with a batch of payload ids much more efficiently than one at a time, although the default implementation does call ::DoesPayloadExist for each requested payload. -- The default implementation prevents every backend from needing to implement the same for loop but does allow backends that can gain from batching to override it. ### VirtualizationSourceControlBackend - This backend does override ::DoPayloadsExist and implements it's own version as it tends to perform very poorly when not operating on larger batches. - In this case ::DoesPayloadExist calls back to ::DoPayloadsExist to check each payload rather than implement as specific version. ### PackageTrailer - The trailer can now be queries to request how many payloads of a given type it contains #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18339847 in //UE5/Release-5.0/... via CL 18339852 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18339859 by paul chipchase in ue5-release-engine-test branch]
2021-12-01 11:13:31 -05:00
TArray<FPayloadId> LocalPayloads;
int32 PayloadIndex = INDEX_NONE;
bool bWasTrailerUpdated = false;
};
Packages submitted from the editor together will now virtualize their payloads in a single batch rather than one at a time. #rb PJ.Kack #jira UE-136126 #rnx #preflight ### VirtualizationSystem - Added a new overload for Push to VirtualizationSystem that takes an array of FPushRequest, which is a new structure representing a single payload request. - Filtering by package name is currently disabled, this is because the API has been forced into changing and passing the package name in via a FString rather than FPackagePath which means we would need to be more careful. This will be done in a future submit. - The backend interface has been extended to also have a batch version of PushData, by default this will attempt to submit each request one at a time so payloads don't have to try and implement a batched version if there is no need. - The context being passed with a payload when being pushed has been changed from FPackagePath to FString due to include order issues, as the FPackagePath lives in CoreUObject and the API for virtualization lives in Core. Additionally in the future the payloads might not be owned by a package (there is nothing specifically enforcing this) so the context being a string makes more sense. - NOTE: Due to the context change we currently no longer support the filtering feature, which allows for payloads belonging to packages under specific directories to be excluded from virtualization. This is something that will be solved in a future submit. ### SourceControlBackend - Now that we can submit multiple payloads in the same submit, the CL description has been changed slightly. We will now print a list of payload identifiers -> the package trying to submit that payload. This will only tell the users which package originally caused the payload to submit. If a user submits a new package at a later date that contains the same payload we will not be updating the description. ### PackageSubmissionChecks - Converted the submission process to use the new batch push operation in VirtualizationSystem. -- This means that we do a single push and then have to update the package trailers to convert the now pushed payloads from local to virtualized. - Added new define UE_PRECHECK_PAYLOAD_STATUS that makes it easy to toggle off the checks to see which payloads need to be submitted to the persistent backend. This is useful to test if it actually helps speed up the overall operations or if it is faster to just perform the batch push operations on all payloads and check the return values. -- The hope is that over time the submission processes will become fast enough that we can remove the precheck. - Fixed up logging to not always assume more than one package or payload. ### General Notes - Errors and logging is now a bit more vague as we often not just report that X payloads failed etc rather than specific payload identifiers. This probably doesn't affect the user too much since those identifiers as fairly meaningless to them anyway. - The source control submission could be further optimized by first checking the status of the files in thge depot and only then creating/switching workspace etc. - As currently written, we need to load all of the payloads into memory, then the backends will do what they need (in the case of source control this results in the payloads being written to disk then submitted) which could create quite a large memory spike when submitting a large number of packages. -- One solution would be to change the batch push API to take a "payload provider" interface and have the payloads requested as needed rather than passing in the FCompressedBuffer directly. This would let us immediately write the payload to disk for submission then discard it from memory, preventing larger spikes. Although it could cause overhead if there are multiple backends being submitted to. Internally we are unlikely to have more than one backend per storage solution so maybe we should just make it a config option? #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18403735 in //UE5/Release-5.0/... via CL 18403737 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v896-18170469) [CL 18403738 by paul chipchase in ue5-release-engine-test branch]
2021-12-08 02:19:42 -05:00
UE_LOG(LogVirtualization, Display, TEXT("Considering %d file(s) for virtualization"), FilesToSubmit.Num());
Submitting packages on projects with virtualization enabled is much faster when none of the payloads actually needs to be virtualized. #rb PJ.Kack #rnx #preflight 61a795773c29b3cf13cd8250 ### PackageSubmissionChecks - Under the old model submitting a large number of packages could be very slow as each package would check each payload that it owns and is currently stored locally one at a time. For the source control backend this created quite a large overhead even when all of the payloads were already virtualized. - Now we do a pass over the submitted files to find all valid packages that have package trailers with locally stored payloads, gather the payloads into a single list and then query that in one large batch. - Once we find which payloads are not in permanent storage (note that in the case where a project is using multiple permanent storage solutions, if a payload is missing in one backend it counts as not being in permanent storage) we then attempt to virtualized them. - Only after all of this is done will we create the truncated copy of each package and then append the updated trailer to each one. In theory doing it in this order this might slightly increase the change of submit failures that occur after virtualization that result in a package never being submitted and orphaned payloads being added to permanent storage, but this will always be a risk. - Added an assert to fire if we detect a trailer with some virtualized and some local payloads. This should be a supported feature but needs proper testing first before we can allow it. With out current project settings no project should actually encounter this scenario. - To make the code easier to follow we now early out of the entire check when errors are encountered. - Added logging at various stages in the process to help show the user that something is happening and make problems easier to identify in the future. - Notes -- There is a lot of handling of invalid FPayloads. This is because it is currently possible to add empty payloads to the trailer which is inefficient and wastes space. The trailer will be modified to reject empty payloads in a future update at which point a lot of this handling can be removed. -- This could've also been solved by not fully rehydrating a package on save by the end user, which will be added as a project setting in a future piece of work, but this approach will solve the edge case when the user does have a large amount of hydrated packages which contain payloads that are already virtualized so it was better to fix that now while we have good test cases for it. -- We still have scaling problems with large number of package being submitted that do have payloads that need to be virtualized, this will be fixed by extending IVirtualizationSystem::Push to also accept batches of payloads in future work. -- OnPrePackageSubmission could be broken up into smaller chunks to make the code easier to follow. This will be done after the batch payload submission work is done. ### VirtualizationSystem - EStorageType has been promoted to enum class. - Added a new enum FPayloadStatus to be used when querying if a payload exists in a backend storage system or not. - Add a new method ::DoPayloadsExist which allows the caller to query if one or more payloads exists in the given backend storage system. ### VirtualizationManager - Implemented ::DoPayloadsExist. First we get the results from each backend in the storage system (which return as true or false from each backend) then total how many backends found the payload in order to set the correct status. ### IVirtualizationBackend - ::DoesPayloadExist which queries the existence of a single payload has been added to the interface. Most backends already implemented this for private use and if so have had their implementation renamed to match this. - Also added ::DoPayloadsExist which takes a batch of FpayloadIdsto query. Some backends can deal with a batch of payload ids much more efficiently than one at a time, although the default implementation does call ::DoesPayloadExist for each requested payload. -- The default implementation prevents every backend from needing to implement the same for loop but does allow backends that can gain from batching to override it. ### VirtualizationSourceControlBackend - This backend does override ::DoPayloadsExist and implements it's own version as it tends to perform very poorly when not operating on larger batches. - In this case ::DoesPayloadExist calls back to ::DoPayloadsExist to check each payload rather than implement as specific version. ### PackageTrailer - The trailer can now be queries to request how many payloads of a given type it contains #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18339847 in //UE5/Release-5.0/... via CL 18339852 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18339859 by paul chipchase in ue5-release-engine-test branch]
2021-12-01 11:13:31 -05:00
TArray<FPackageInfo> Packages;
Packages.Reserve(FilesToSubmit.Num());
TArray<FPayloadId> AllLocalPayloads;
AllLocalPayloads.Reserve(FilesToSubmit.Num());
// From the list of files to submit we need to find all of the valid packages that contain
// local payloads that need to be virtualized.
int64 TotalPayloadsToCheck = 0;
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous. #rb PJ.Kack #rnx #preflight 61a4b235405273b2c3daa7c7 ### Virtualization - The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted. - The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again. - This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc. - In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead. - In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm) - For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk. ### Current Issues - The system does not work with the editor domain and has minor issues with text based assets - The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional. ### PackageTrailer - The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer - FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use. - When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire. - the header file contains a breakdown of the new format. ### VirtualizedBulkData - FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file. - FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code. - Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads. - We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave. - Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object. - Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use. - ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs. - We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package. - We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled. - We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code. -- Once the FPackageTrailer feature is no longer optional we can consider changing this. ### SavePackage/SavePackage2 - Removed the functions that would create the FPayloadToc - Added new functions to help create the trailer. - Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer. - We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems. ### LinkerLoad - After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package -- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead. - By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file. ### LinkerSave - We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed. - We now gather data to be appended to the package trailer via TrailerBuilder. ### DumpPayloadToc - This command now dumps info based on the package trailer if one can be found. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
for (const FString& AbsoluteFilePath : FilesToSubmit)
{
Submitting packages on projects with virtualization enabled is much faster when none of the payloads actually needs to be virtualized. #rb PJ.Kack #rnx #preflight 61a795773c29b3cf13cd8250 ### PackageSubmissionChecks - Under the old model submitting a large number of packages could be very slow as each package would check each payload that it owns and is currently stored locally one at a time. For the source control backend this created quite a large overhead even when all of the payloads were already virtualized. - Now we do a pass over the submitted files to find all valid packages that have package trailers with locally stored payloads, gather the payloads into a single list and then query that in one large batch. - Once we find which payloads are not in permanent storage (note that in the case where a project is using multiple permanent storage solutions, if a payload is missing in one backend it counts as not being in permanent storage) we then attempt to virtualized them. - Only after all of this is done will we create the truncated copy of each package and then append the updated trailer to each one. In theory doing it in this order this might slightly increase the change of submit failures that occur after virtualization that result in a package never being submitted and orphaned payloads being added to permanent storage, but this will always be a risk. - Added an assert to fire if we detect a trailer with some virtualized and some local payloads. This should be a supported feature but needs proper testing first before we can allow it. With out current project settings no project should actually encounter this scenario. - To make the code easier to follow we now early out of the entire check when errors are encountered. - Added logging at various stages in the process to help show the user that something is happening and make problems easier to identify in the future. - Notes -- There is a lot of handling of invalid FPayloads. This is because it is currently possible to add empty payloads to the trailer which is inefficient and wastes space. The trailer will be modified to reject empty payloads in a future update at which point a lot of this handling can be removed. -- This could've also been solved by not fully rehydrating a package on save by the end user, which will be added as a project setting in a future piece of work, but this approach will solve the edge case when the user does have a large amount of hydrated packages which contain payloads that are already virtualized so it was better to fix that now while we have good test cases for it. -- We still have scaling problems with large number of package being submitted that do have payloads that need to be virtualized, this will be fixed by extending IVirtualizationSystem::Push to also accept batches of payloads in future work. -- OnPrePackageSubmission could be broken up into smaller chunks to make the code easier to follow. This will be done after the batch payload submission work is done. ### VirtualizationSystem - EStorageType has been promoted to enum class. - Added a new enum FPayloadStatus to be used when querying if a payload exists in a backend storage system or not. - Add a new method ::DoPayloadsExist which allows the caller to query if one or more payloads exists in the given backend storage system. ### VirtualizationManager - Implemented ::DoPayloadsExist. First we get the results from each backend in the storage system (which return as true or false from each backend) then total how many backends found the payload in order to set the correct status. ### IVirtualizationBackend - ::DoesPayloadExist which queries the existence of a single payload has been added to the interface. Most backends already implemented this for private use and if so have had their implementation renamed to match this. - Also added ::DoPayloadsExist which takes a batch of FpayloadIdsto query. Some backends can deal with a batch of payload ids much more efficiently than one at a time, although the default implementation does call ::DoesPayloadExist for each requested payload. -- The default implementation prevents every backend from needing to implement the same for loop but does allow backends that can gain from batching to override it. ### VirtualizationSourceControlBackend - This backend does override ::DoPayloadsExist and implements it's own version as it tends to perform very poorly when not operating on larger batches. - In this case ::DoesPayloadExist calls back to ::DoPayloadsExist to check each payload rather than implement as specific version. ### PackageTrailer - The trailer can now be queries to request how many payloads of a given type it contains #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18339847 in //UE5/Release-5.0/... via CL 18339852 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18339859 by paul chipchase in ue5-release-engine-test branch]
2021-12-01 11:13:31 -05:00
FPackagePath PackagePath = FPackagePath::FromLocalPath(AbsoluteFilePath);
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous. #rb PJ.Kack #rnx #preflight 61a4b235405273b2c3daa7c7 ### Virtualization - The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted. - The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again. - This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc. - In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead. - In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm) - For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk. ### Current Issues - The system does not work with the editor domain and has minor issues with text based assets - The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional. ### PackageTrailer - The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer - FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use. - When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire. - the header file contains a breakdown of the new format. ### VirtualizedBulkData - FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file. - FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code. - Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads. - We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave. - Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object. - Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use. - ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs. - We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package. - We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled. - We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code. -- Once the FPackageTrailer feature is no longer optional we can consider changing this. ### SavePackage/SavePackage2 - Removed the functions that would create the FPayloadToc - Added new functions to help create the trailer. - Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer. - We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems. ### LinkerLoad - After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package -- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead. - By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file. ### LinkerSave - We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed. - We now gather data to be appended to the package trailer via TrailerBuilder. ### DumpPayloadToc - This command now dumps info based on the package trailer if one can be found. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
Submitting packages on projects with virtualization enabled is much faster when none of the payloads actually needs to be virtualized. #rb PJ.Kack #rnx #preflight 61a795773c29b3cf13cd8250 ### PackageSubmissionChecks - Under the old model submitting a large number of packages could be very slow as each package would check each payload that it owns and is currently stored locally one at a time. For the source control backend this created quite a large overhead even when all of the payloads were already virtualized. - Now we do a pass over the submitted files to find all valid packages that have package trailers with locally stored payloads, gather the payloads into a single list and then query that in one large batch. - Once we find which payloads are not in permanent storage (note that in the case where a project is using multiple permanent storage solutions, if a payload is missing in one backend it counts as not being in permanent storage) we then attempt to virtualized them. - Only after all of this is done will we create the truncated copy of each package and then append the updated trailer to each one. In theory doing it in this order this might slightly increase the change of submit failures that occur after virtualization that result in a package never being submitted and orphaned payloads being added to permanent storage, but this will always be a risk. - Added an assert to fire if we detect a trailer with some virtualized and some local payloads. This should be a supported feature but needs proper testing first before we can allow it. With out current project settings no project should actually encounter this scenario. - To make the code easier to follow we now early out of the entire check when errors are encountered. - Added logging at various stages in the process to help show the user that something is happening and make problems easier to identify in the future. - Notes -- There is a lot of handling of invalid FPayloads. This is because it is currently possible to add empty payloads to the trailer which is inefficient and wastes space. The trailer will be modified to reject empty payloads in a future update at which point a lot of this handling can be removed. -- This could've also been solved by not fully rehydrating a package on save by the end user, which will be added as a project setting in a future piece of work, but this approach will solve the edge case when the user does have a large amount of hydrated packages which contain payloads that are already virtualized so it was better to fix that now while we have good test cases for it. -- We still have scaling problems with large number of package being submitted that do have payloads that need to be virtualized, this will be fixed by extending IVirtualizationSystem::Push to also accept batches of payloads in future work. -- OnPrePackageSubmission could be broken up into smaller chunks to make the code easier to follow. This will be done after the batch payload submission work is done. ### VirtualizationSystem - EStorageType has been promoted to enum class. - Added a new enum FPayloadStatus to be used when querying if a payload exists in a backend storage system or not. - Add a new method ::DoPayloadsExist which allows the caller to query if one or more payloads exists in the given backend storage system. ### VirtualizationManager - Implemented ::DoPayloadsExist. First we get the results from each backend in the storage system (which return as true or false from each backend) then total how many backends found the payload in order to set the correct status. ### IVirtualizationBackend - ::DoesPayloadExist which queries the existence of a single payload has been added to the interface. Most backends already implemented this for private use and if so have had their implementation renamed to match this. - Also added ::DoPayloadsExist which takes a batch of FpayloadIdsto query. Some backends can deal with a batch of payload ids much more efficiently than one at a time, although the default implementation does call ::DoesPayloadExist for each requested payload. -- The default implementation prevents every backend from needing to implement the same for loop but does allow backends that can gain from batching to override it. ### VirtualizationSourceControlBackend - This backend does override ::DoPayloadsExist and implements it's own version as it tends to perform very poorly when not operating on larger batches. - In this case ::DoesPayloadExist calls back to ::DoPayloadsExist to check each payload rather than implement as specific version. ### PackageTrailer - The trailer can now be queries to request how many payloads of a given type it contains #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18339847 in //UE5/Release-5.0/... via CL 18339852 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18339859 by paul chipchase in ue5-release-engine-test branch]
2021-12-01 11:13:31 -05:00
// TODO: How to handle text packages
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous. #rb PJ.Kack #rnx #preflight 61a4b235405273b2c3daa7c7 ### Virtualization - The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted. - The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again. - This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc. - In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead. - In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm) - For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk. ### Current Issues - The system does not work with the editor domain and has minor issues with text based assets - The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional. ### PackageTrailer - The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer - FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use. - When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire. - the header file contains a breakdown of the new format. ### VirtualizedBulkData - FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file. - FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code. - Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads. - We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave. - Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object. - Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use. - ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs. - We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package. - We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled. - We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code. -- Once the FPackageTrailer feature is no longer optional we can consider changing this. ### SavePackage/SavePackage2 - Removed the functions that would create the FPayloadToc - Added new functions to help create the trailer. - Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer. - We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems. ### LinkerLoad - After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package -- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead. - By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file. ### LinkerSave - We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed. - We now gather data to be appended to the package trailer via TrailerBuilder. ### DumpPayloadToc - This command now dumps info based on the package trailer if one can be found. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
if (FPackageName::IsPackageExtension(PackagePath.GetHeaderExtension()) || FPackageName::IsTextPackageExtension(PackagePath.GetHeaderExtension()))
{
FPackageTrailer Trailer;
if (FPackageTrailer::TryLoadFromPackage(PackagePath, Trailer))
{
Submitting packages on projects with virtualization enabled is much faster when none of the payloads actually needs to be virtualized. #rb PJ.Kack #rnx #preflight 61a795773c29b3cf13cd8250 ### PackageSubmissionChecks - Under the old model submitting a large number of packages could be very slow as each package would check each payload that it owns and is currently stored locally one at a time. For the source control backend this created quite a large overhead even when all of the payloads were already virtualized. - Now we do a pass over the submitted files to find all valid packages that have package trailers with locally stored payloads, gather the payloads into a single list and then query that in one large batch. - Once we find which payloads are not in permanent storage (note that in the case where a project is using multiple permanent storage solutions, if a payload is missing in one backend it counts as not being in permanent storage) we then attempt to virtualized them. - Only after all of this is done will we create the truncated copy of each package and then append the updated trailer to each one. In theory doing it in this order this might slightly increase the change of submit failures that occur after virtualization that result in a package never being submitted and orphaned payloads being added to permanent storage, but this will always be a risk. - Added an assert to fire if we detect a trailer with some virtualized and some local payloads. This should be a supported feature but needs proper testing first before we can allow it. With out current project settings no project should actually encounter this scenario. - To make the code easier to follow we now early out of the entire check when errors are encountered. - Added logging at various stages in the process to help show the user that something is happening and make problems easier to identify in the future. - Notes -- There is a lot of handling of invalid FPayloads. This is because it is currently possible to add empty payloads to the trailer which is inefficient and wastes space. The trailer will be modified to reject empty payloads in a future update at which point a lot of this handling can be removed. -- This could've also been solved by not fully rehydrating a package on save by the end user, which will be added as a project setting in a future piece of work, but this approach will solve the edge case when the user does have a large amount of hydrated packages which contain payloads that are already virtualized so it was better to fix that now while we have good test cases for it. -- We still have scaling problems with large number of package being submitted that do have payloads that need to be virtualized, this will be fixed by extending IVirtualizationSystem::Push to also accept batches of payloads in future work. -- OnPrePackageSubmission could be broken up into smaller chunks to make the code easier to follow. This will be done after the batch payload submission work is done. ### VirtualizationSystem - EStorageType has been promoted to enum class. - Added a new enum FPayloadStatus to be used when querying if a payload exists in a backend storage system or not. - Add a new method ::DoPayloadsExist which allows the caller to query if one or more payloads exists in the given backend storage system. ### VirtualizationManager - Implemented ::DoPayloadsExist. First we get the results from each backend in the storage system (which return as true or false from each backend) then total how many backends found the payload in order to set the correct status. ### IVirtualizationBackend - ::DoesPayloadExist which queries the existence of a single payload has been added to the interface. Most backends already implemented this for private use and if so have had their implementation renamed to match this. - Also added ::DoPayloadsExist which takes a batch of FpayloadIdsto query. Some backends can deal with a batch of payload ids much more efficiently than one at a time, although the default implementation does call ::DoesPayloadExist for each requested payload. -- The default implementation prevents every backend from needing to implement the same for loop but does allow backends that can gain from batching to override it. ### VirtualizationSourceControlBackend - This backend does override ::DoPayloadsExist and implements it's own version as it tends to perform very poorly when not operating on larger batches. - In this case ::DoesPayloadExist calls back to ::DoPayloadsExist to check each payload rather than implement as specific version. ### PackageTrailer - The trailer can now be queries to request how many payloads of a given type it contains #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18339847 in //UE5/Release-5.0/... via CL 18339852 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18339859 by paul chipchase in ue5-release-engine-test branch]
2021-12-01 11:13:31 -05:00
FPackageInfo PkgInfo;
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous. #rb PJ.Kack #rnx #preflight 61a4b235405273b2c3daa7c7 ### Virtualization - The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted. - The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again. - This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc. - In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead. - In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm) - For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk. ### Current Issues - The system does not work with the editor domain and has minor issues with text based assets - The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional. ### PackageTrailer - The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer - FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use. - When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire. - the header file contains a breakdown of the new format. ### VirtualizedBulkData - FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file. - FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code. - Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads. - We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave. - Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object. - Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use. - ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs. - We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package. - We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled. - We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code. -- Once the FPackageTrailer feature is no longer optional we can consider changing this. ### SavePackage/SavePackage2 - Removed the functions that would create the FPayloadToc - Added new functions to help create the trailer. - Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer. - We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems. ### LinkerLoad - After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package -- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead. - By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file. ### LinkerSave - We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed. - We now gather data to be appended to the package trailer via TrailerBuilder. ### DumpPayloadToc - This command now dumps info based on the package trailer if one can be found. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
Submitting packages on projects with virtualization enabled is much faster when none of the payloads actually needs to be virtualized. #rb PJ.Kack #rnx #preflight 61a795773c29b3cf13cd8250 ### PackageSubmissionChecks - Under the old model submitting a large number of packages could be very slow as each package would check each payload that it owns and is currently stored locally one at a time. For the source control backend this created quite a large overhead even when all of the payloads were already virtualized. - Now we do a pass over the submitted files to find all valid packages that have package trailers with locally stored payloads, gather the payloads into a single list and then query that in one large batch. - Once we find which payloads are not in permanent storage (note that in the case where a project is using multiple permanent storage solutions, if a payload is missing in one backend it counts as not being in permanent storage) we then attempt to virtualized them. - Only after all of this is done will we create the truncated copy of each package and then append the updated trailer to each one. In theory doing it in this order this might slightly increase the change of submit failures that occur after virtualization that result in a package never being submitted and orphaned payloads being added to permanent storage, but this will always be a risk. - Added an assert to fire if we detect a trailer with some virtualized and some local payloads. This should be a supported feature but needs proper testing first before we can allow it. With out current project settings no project should actually encounter this scenario. - To make the code easier to follow we now early out of the entire check when errors are encountered. - Added logging at various stages in the process to help show the user that something is happening and make problems easier to identify in the future. - Notes -- There is a lot of handling of invalid FPayloads. This is because it is currently possible to add empty payloads to the trailer which is inefficient and wastes space. The trailer will be modified to reject empty payloads in a future update at which point a lot of this handling can be removed. -- This could've also been solved by not fully rehydrating a package on save by the end user, which will be added as a project setting in a future piece of work, but this approach will solve the edge case when the user does have a large amount of hydrated packages which contain payloads that are already virtualized so it was better to fix that now while we have good test cases for it. -- We still have scaling problems with large number of package being submitted that do have payloads that need to be virtualized, this will be fixed by extending IVirtualizationSystem::Push to also accept batches of payloads in future work. -- OnPrePackageSubmission could be broken up into smaller chunks to make the code easier to follow. This will be done after the batch payload submission work is done. ### VirtualizationSystem - EStorageType has been promoted to enum class. - Added a new enum FPayloadStatus to be used when querying if a payload exists in a backend storage system or not. - Add a new method ::DoPayloadsExist which allows the caller to query if one or more payloads exists in the given backend storage system. ### VirtualizationManager - Implemented ::DoPayloadsExist. First we get the results from each backend in the storage system (which return as true or false from each backend) then total how many backends found the payload in order to set the correct status. ### IVirtualizationBackend - ::DoesPayloadExist which queries the existence of a single payload has been added to the interface. Most backends already implemented this for private use and if so have had their implementation renamed to match this. - Also added ::DoPayloadsExist which takes a batch of FpayloadIdsto query. Some backends can deal with a batch of payload ids much more efficiently than one at a time, although the default implementation does call ::DoesPayloadExist for each requested payload. -- The default implementation prevents every backend from needing to implement the same for loop but does allow backends that can gain from batching to override it. ### VirtualizationSourceControlBackend - This backend does override ::DoPayloadsExist and implements it's own version as it tends to perform very poorly when not operating on larger batches. - In this case ::DoesPayloadExist calls back to ::DoPayloadsExist to check each payload rather than implement as specific version. ### PackageTrailer - The trailer can now be queries to request how many payloads of a given type it contains #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18339847 in //UE5/Release-5.0/... via CL 18339852 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18339859 by paul chipchase in ue5-release-engine-test branch]
2021-12-01 11:13:31 -05:00
PkgInfo.Path = MoveTemp(PackagePath);
PkgInfo.Trailer = MoveTemp(Trailer);
PkgInfo.LocalPayloads = PkgInfo.Trailer.GetPayloads(EPayloadFilter::Local);
TotalPayloadsToCheck += PkgInfo.LocalPayloads.Num();
if (!PkgInfo.LocalPayloads.IsEmpty())
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous. #rb PJ.Kack #rnx #preflight 61a4b235405273b2c3daa7c7 ### Virtualization - The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted. - The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again. - This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc. - In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead. - In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm) - For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk. ### Current Issues - The system does not work with the editor domain and has minor issues with text based assets - The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional. ### PackageTrailer - The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer - FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use. - When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire. - the header file contains a breakdown of the new format. ### VirtualizedBulkData - FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file. - FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code. - Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads. - We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave. - Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object. - Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use. - ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs. - We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package. - We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled. - We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code. -- Once the FPackageTrailer feature is no longer optional we can consider changing this. ### SavePackage/SavePackage2 - Removed the functions that would create the FPayloadToc - Added new functions to help create the trailer. - Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer. - We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems. ### LinkerLoad - After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package -- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead. - By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file. ### LinkerSave - We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed. - We now gather data to be appended to the package trailer via TrailerBuilder. ### DumpPayloadToc - This command now dumps info based on the package trailer if one can be found. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
{
Submitting packages on projects with virtualization enabled is much faster when none of the payloads actually needs to be virtualized. #rb PJ.Kack #rnx #preflight 61a795773c29b3cf13cd8250 ### PackageSubmissionChecks - Under the old model submitting a large number of packages could be very slow as each package would check each payload that it owns and is currently stored locally one at a time. For the source control backend this created quite a large overhead even when all of the payloads were already virtualized. - Now we do a pass over the submitted files to find all valid packages that have package trailers with locally stored payloads, gather the payloads into a single list and then query that in one large batch. - Once we find which payloads are not in permanent storage (note that in the case where a project is using multiple permanent storage solutions, if a payload is missing in one backend it counts as not being in permanent storage) we then attempt to virtualized them. - Only after all of this is done will we create the truncated copy of each package and then append the updated trailer to each one. In theory doing it in this order this might slightly increase the change of submit failures that occur after virtualization that result in a package never being submitted and orphaned payloads being added to permanent storage, but this will always be a risk. - Added an assert to fire if we detect a trailer with some virtualized and some local payloads. This should be a supported feature but needs proper testing first before we can allow it. With out current project settings no project should actually encounter this scenario. - To make the code easier to follow we now early out of the entire check when errors are encountered. - Added logging at various stages in the process to help show the user that something is happening and make problems easier to identify in the future. - Notes -- There is a lot of handling of invalid FPayloads. This is because it is currently possible to add empty payloads to the trailer which is inefficient and wastes space. The trailer will be modified to reject empty payloads in a future update at which point a lot of this handling can be removed. -- This could've also been solved by not fully rehydrating a package on save by the end user, which will be added as a project setting in a future piece of work, but this approach will solve the edge case when the user does have a large amount of hydrated packages which contain payloads that are already virtualized so it was better to fix that now while we have good test cases for it. -- We still have scaling problems with large number of package being submitted that do have payloads that need to be virtualized, this will be fixed by extending IVirtualizationSystem::Push to also accept batches of payloads in future work. -- OnPrePackageSubmission could be broken up into smaller chunks to make the code easier to follow. This will be done after the batch payload submission work is done. ### VirtualizationSystem - EStorageType has been promoted to enum class. - Added a new enum FPayloadStatus to be used when querying if a payload exists in a backend storage system or not. - Add a new method ::DoPayloadsExist which allows the caller to query if one or more payloads exists in the given backend storage system. ### VirtualizationManager - Implemented ::DoPayloadsExist. First we get the results from each backend in the storage system (which return as true or false from each backend) then total how many backends found the payload in order to set the correct status. ### IVirtualizationBackend - ::DoesPayloadExist which queries the existence of a single payload has been added to the interface. Most backends already implemented this for private use and if so have had their implementation renamed to match this. - Also added ::DoPayloadsExist which takes a batch of FpayloadIdsto query. Some backends can deal with a batch of payload ids much more efficiently than one at a time, although the default implementation does call ::DoesPayloadExist for each requested payload. -- The default implementation prevents every backend from needing to implement the same for loop but does allow backends that can gain from batching to override it. ### VirtualizationSourceControlBackend - This backend does override ::DoPayloadsExist and implements it's own version as it tends to perform very poorly when not operating on larger batches. - In this case ::DoesPayloadExist calls back to ::DoPayloadsExist to check each payload rather than implement as specific version. ### PackageTrailer - The trailer can now be queries to request how many payloads of a given type it contains #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18339847 in //UE5/Release-5.0/... via CL 18339852 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18339859 by paul chipchase in ue5-release-engine-test branch]
2021-12-01 11:13:31 -05:00
PkgInfo.PayloadIndex = AllLocalPayloads.Num();
AllLocalPayloads.Append(PkgInfo.LocalPayloads);
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous. #rb PJ.Kack #rnx #preflight 61a4b235405273b2c3daa7c7 ### Virtualization - The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted. - The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again. - This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc. - In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead. - In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm) - For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk. ### Current Issues - The system does not work with the editor domain and has minor issues with text based assets - The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional. ### PackageTrailer - The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer - FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use. - When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire. - the header file contains a breakdown of the new format. ### VirtualizedBulkData - FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file. - FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code. - Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads. - We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave. - Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object. - Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use. - ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs. - We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package. - We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled. - We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code. -- Once the FPackageTrailer feature is no longer optional we can consider changing this. ### SavePackage/SavePackage2 - Removed the functions that would create the FPayloadToc - Added new functions to help create the trailer. - Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer. - We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems. ### LinkerLoad - After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package -- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead. - By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file. ### LinkerSave - We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed. - We now gather data to be appended to the package trailer via TrailerBuilder. ### DumpPayloadToc - This command now dumps info based on the package trailer if one can be found. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
Submitting packages on projects with virtualization enabled is much faster when none of the payloads actually needs to be virtualized. #rb PJ.Kack #rnx #preflight 61a795773c29b3cf13cd8250 ### PackageSubmissionChecks - Under the old model submitting a large number of packages could be very slow as each package would check each payload that it owns and is currently stored locally one at a time. For the source control backend this created quite a large overhead even when all of the payloads were already virtualized. - Now we do a pass over the submitted files to find all valid packages that have package trailers with locally stored payloads, gather the payloads into a single list and then query that in one large batch. - Once we find which payloads are not in permanent storage (note that in the case where a project is using multiple permanent storage solutions, if a payload is missing in one backend it counts as not being in permanent storage) we then attempt to virtualized them. - Only after all of this is done will we create the truncated copy of each package and then append the updated trailer to each one. In theory doing it in this order this might slightly increase the change of submit failures that occur after virtualization that result in a package never being submitted and orphaned payloads being added to permanent storage, but this will always be a risk. - Added an assert to fire if we detect a trailer with some virtualized and some local payloads. This should be a supported feature but needs proper testing first before we can allow it. With out current project settings no project should actually encounter this scenario. - To make the code easier to follow we now early out of the entire check when errors are encountered. - Added logging at various stages in the process to help show the user that something is happening and make problems easier to identify in the future. - Notes -- There is a lot of handling of invalid FPayloads. This is because it is currently possible to add empty payloads to the trailer which is inefficient and wastes space. The trailer will be modified to reject empty payloads in a future update at which point a lot of this handling can be removed. -- This could've also been solved by not fully rehydrating a package on save by the end user, which will be added as a project setting in a future piece of work, but this approach will solve the edge case when the user does have a large amount of hydrated packages which contain payloads that are already virtualized so it was better to fix that now while we have good test cases for it. -- We still have scaling problems with large number of package being submitted that do have payloads that need to be virtualized, this will be fixed by extending IVirtualizationSystem::Push to also accept batches of payloads in future work. -- OnPrePackageSubmission could be broken up into smaller chunks to make the code easier to follow. This will be done after the batch payload submission work is done. ### VirtualizationSystem - EStorageType has been promoted to enum class. - Added a new enum FPayloadStatus to be used when querying if a payload exists in a backend storage system or not. - Add a new method ::DoPayloadsExist which allows the caller to query if one or more payloads exists in the given backend storage system. ### VirtualizationManager - Implemented ::DoPayloadsExist. First we get the results from each backend in the storage system (which return as true or false from each backend) then total how many backends found the payload in order to set the correct status. ### IVirtualizationBackend - ::DoesPayloadExist which queries the existence of a single payload has been added to the interface. Most backends already implemented this for private use and if so have had their implementation renamed to match this. - Also added ::DoPayloadsExist which takes a batch of FpayloadIdsto query. Some backends can deal with a batch of payload ids much more efficiently than one at a time, although the default implementation does call ::DoesPayloadExist for each requested payload. -- The default implementation prevents every backend from needing to implement the same for loop but does allow backends that can gain from batching to override it. ### VirtualizationSourceControlBackend - This backend does override ::DoPayloadsExist and implements it's own version as it tends to perform very poorly when not operating on larger batches. - In this case ::DoesPayloadExist calls back to ::DoPayloadsExist to check each payload rather than implement as specific version. ### PackageTrailer - The trailer can now be queries to request how many payloads of a given type it contains #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18339847 in //UE5/Release-5.0/... via CL 18339852 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18339859 by paul chipchase in ue5-release-engine-test branch]
2021-12-01 11:13:31 -05:00
Packages.Emplace(MoveTemp(PkgInfo));
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous. #rb PJ.Kack #rnx #preflight 61a4b235405273b2c3daa7c7 ### Virtualization - The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted. - The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again. - This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc. - In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead. - In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm) - For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk. ### Current Issues - The system does not work with the editor domain and has minor issues with text based assets - The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional. ### PackageTrailer - The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer - FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use. - When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire. - the header file contains a breakdown of the new format. ### VirtualizedBulkData - FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file. - FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code. - Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads. - We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave. - Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object. - Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use. - ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs. - We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package. - We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled. - We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code. -- Once the FPackageTrailer feature is no longer optional we can consider changing this. ### SavePackage/SavePackage2 - Removed the functions that would create the FPayloadToc - Added new functions to help create the trailer. - Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer. - We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems. ### LinkerLoad - After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package -- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead. - By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file. ### LinkerSave - We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed. - We now gather data to be appended to the package trailer via TrailerBuilder. ### DumpPayloadToc - This command now dumps info based on the package trailer if one can be found. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
}
}
}
}
Packages submitted from the editor together will now virtualize their payloads in a single batch rather than one at a time. #rb PJ.Kack #jira UE-136126 #rnx #preflight ### VirtualizationSystem - Added a new overload for Push to VirtualizationSystem that takes an array of FPushRequest, which is a new structure representing a single payload request. - Filtering by package name is currently disabled, this is because the API has been forced into changing and passing the package name in via a FString rather than FPackagePath which means we would need to be more careful. This will be done in a future submit. - The backend interface has been extended to also have a batch version of PushData, by default this will attempt to submit each request one at a time so payloads don't have to try and implement a batched version if there is no need. - The context being passed with a payload when being pushed has been changed from FPackagePath to FString due to include order issues, as the FPackagePath lives in CoreUObject and the API for virtualization lives in Core. Additionally in the future the payloads might not be owned by a package (there is nothing specifically enforcing this) so the context being a string makes more sense. - NOTE: Due to the context change we currently no longer support the filtering feature, which allows for payloads belonging to packages under specific directories to be excluded from virtualization. This is something that will be solved in a future submit. ### SourceControlBackend - Now that we can submit multiple payloads in the same submit, the CL description has been changed slightly. We will now print a list of payload identifiers -> the package trying to submit that payload. This will only tell the users which package originally caused the payload to submit. If a user submits a new package at a later date that contains the same payload we will not be updating the description. ### PackageSubmissionChecks - Converted the submission process to use the new batch push operation in VirtualizationSystem. -- This means that we do a single push and then have to update the package trailers to convert the now pushed payloads from local to virtualized. - Added new define UE_PRECHECK_PAYLOAD_STATUS that makes it easy to toggle off the checks to see which payloads need to be submitted to the persistent backend. This is useful to test if it actually helps speed up the overall operations or if it is faster to just perform the batch push operations on all payloads and check the return values. -- The hope is that over time the submission processes will become fast enough that we can remove the precheck. - Fixed up logging to not always assume more than one package or payload. ### General Notes - Errors and logging is now a bit more vague as we often not just report that X payloads failed etc rather than specific payload identifiers. This probably doesn't affect the user too much since those identifiers as fairly meaningless to them anyway. - The source control submission could be further optimized by first checking the status of the files in thge depot and only then creating/switching workspace etc. - As currently written, we need to load all of the payloads into memory, then the backends will do what they need (in the case of source control this results in the payloads being written to disk then submitted) which could create quite a large memory spike when submitting a large number of packages. -- One solution would be to change the batch push API to take a "payload provider" interface and have the payloads requested as needed rather than passing in the FCompressedBuffer directly. This would let us immediately write the payload to disk for submission then discard it from memory, preventing larger spikes. Although it could cause overhead if there are multiple backends being submitted to. Internally we are unlikely to have more than one backend per storage solution so maybe we should just make it a config option? #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18403735 in //UE5/Release-5.0/... via CL 18403737 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v896-18170469) [CL 18403738 by paul chipchase in ue5-release-engine-test branch]
2021-12-08 02:19:42 -05:00
UE_LOG(LogVirtualization, Display, TEXT("Found %" INT64_FMT " payload(s) in %d package(s) that need to be examined for virtualization"), TotalPayloadsToCheck, Packages.Num());
Submitting packages on projects with virtualization enabled is much faster when none of the payloads actually needs to be virtualized. #rb PJ.Kack #rnx #preflight 61a795773c29b3cf13cd8250 ### PackageSubmissionChecks - Under the old model submitting a large number of packages could be very slow as each package would check each payload that it owns and is currently stored locally one at a time. For the source control backend this created quite a large overhead even when all of the payloads were already virtualized. - Now we do a pass over the submitted files to find all valid packages that have package trailers with locally stored payloads, gather the payloads into a single list and then query that in one large batch. - Once we find which payloads are not in permanent storage (note that in the case where a project is using multiple permanent storage solutions, if a payload is missing in one backend it counts as not being in permanent storage) we then attempt to virtualized them. - Only after all of this is done will we create the truncated copy of each package and then append the updated trailer to each one. In theory doing it in this order this might slightly increase the change of submit failures that occur after virtualization that result in a package never being submitted and orphaned payloads being added to permanent storage, but this will always be a risk. - Added an assert to fire if we detect a trailer with some virtualized and some local payloads. This should be a supported feature but needs proper testing first before we can allow it. With out current project settings no project should actually encounter this scenario. - To make the code easier to follow we now early out of the entire check when errors are encountered. - Added logging at various stages in the process to help show the user that something is happening and make problems easier to identify in the future. - Notes -- There is a lot of handling of invalid FPayloads. This is because it is currently possible to add empty payloads to the trailer which is inefficient and wastes space. The trailer will be modified to reject empty payloads in a future update at which point a lot of this handling can be removed. -- This could've also been solved by not fully rehydrating a package on save by the end user, which will be added as a project setting in a future piece of work, but this approach will solve the edge case when the user does have a large amount of hydrated packages which contain payloads that are already virtualized so it was better to fix that now while we have good test cases for it. -- We still have scaling problems with large number of package being submitted that do have payloads that need to be virtualized, this will be fixed by extending IVirtualizationSystem::Push to also accept batches of payloads in future work. -- OnPrePackageSubmission could be broken up into smaller chunks to make the code easier to follow. This will be done after the batch payload submission work is done. ### VirtualizationSystem - EStorageType has been promoted to enum class. - Added a new enum FPayloadStatus to be used when querying if a payload exists in a backend storage system or not. - Add a new method ::DoPayloadsExist which allows the caller to query if one or more payloads exists in the given backend storage system. ### VirtualizationManager - Implemented ::DoPayloadsExist. First we get the results from each backend in the storage system (which return as true or false from each backend) then total how many backends found the payload in order to set the correct status. ### IVirtualizationBackend - ::DoesPayloadExist which queries the existence of a single payload has been added to the interface. Most backends already implemented this for private use and if so have had their implementation renamed to match this. - Also added ::DoPayloadsExist which takes a batch of FpayloadIdsto query. Some backends can deal with a batch of payload ids much more efficiently than one at a time, although the default implementation does call ::DoesPayloadExist for each requested payload. -- The default implementation prevents every backend from needing to implement the same for loop but does allow backends that can gain from batching to override it. ### VirtualizationSourceControlBackend - This backend does override ::DoPayloadsExist and implements it's own version as it tends to perform very poorly when not operating on larger batches. - In this case ::DoesPayloadExist calls back to ::DoPayloadsExist to check each payload rather than implement as specific version. ### PackageTrailer - The trailer can now be queries to request how many payloads of a given type it contains #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18339847 in //UE5/Release-5.0/... via CL 18339852 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18339859 by paul chipchase in ue5-release-engine-test branch]
2021-12-01 11:13:31 -05:00
TArray<FPayloadStatus> PayloadStatuses;
if (!System.DoPayloadsExist(AllLocalPayloads, EStorageType::Persistent, PayloadStatuses))
{
FText Message = LOCTEXT("Virtualization_DoesExistFail", "Failed to find the status of the payloads in the packages being submitted");
Errors.Add(Message);
return;
}
// Update payloads that are already in persistent storage and don't need to be pushed
int64 TotalPayloadsToVirtualize = 0;
for (FPackageInfo& PackageInfo : Packages)
{
check(PackageInfo.LocalPayloads.IsEmpty() || PackageInfo.PayloadIndex != INDEX_NONE); // If we have payloads we should have an index
Packages submitted from the editor together will now virtualize their payloads in a single batch rather than one at a time. #rb PJ.Kack #jira UE-136126 #rnx #preflight ### VirtualizationSystem - Added a new overload for Push to VirtualizationSystem that takes an array of FPushRequest, which is a new structure representing a single payload request. - Filtering by package name is currently disabled, this is because the API has been forced into changing and passing the package name in via a FString rather than FPackagePath which means we would need to be more careful. This will be done in a future submit. - The backend interface has been extended to also have a batch version of PushData, by default this will attempt to submit each request one at a time so payloads don't have to try and implement a batched version if there is no need. - The context being passed with a payload when being pushed has been changed from FPackagePath to FString due to include order issues, as the FPackagePath lives in CoreUObject and the API for virtualization lives in Core. Additionally in the future the payloads might not be owned by a package (there is nothing specifically enforcing this) so the context being a string makes more sense. - NOTE: Due to the context change we currently no longer support the filtering feature, which allows for payloads belonging to packages under specific directories to be excluded from virtualization. This is something that will be solved in a future submit. ### SourceControlBackend - Now that we can submit multiple payloads in the same submit, the CL description has been changed slightly. We will now print a list of payload identifiers -> the package trying to submit that payload. This will only tell the users which package originally caused the payload to submit. If a user submits a new package at a later date that contains the same payload we will not be updating the description. ### PackageSubmissionChecks - Converted the submission process to use the new batch push operation in VirtualizationSystem. -- This means that we do a single push and then have to update the package trailers to convert the now pushed payloads from local to virtualized. - Added new define UE_PRECHECK_PAYLOAD_STATUS that makes it easy to toggle off the checks to see which payloads need to be submitted to the persistent backend. This is useful to test if it actually helps speed up the overall operations or if it is faster to just perform the batch push operations on all payloads and check the return values. -- The hope is that over time the submission processes will become fast enough that we can remove the precheck. - Fixed up logging to not always assume more than one package or payload. ### General Notes - Errors and logging is now a bit more vague as we often not just report that X payloads failed etc rather than specific payload identifiers. This probably doesn't affect the user too much since those identifiers as fairly meaningless to them anyway. - The source control submission could be further optimized by first checking the status of the files in thge depot and only then creating/switching workspace etc. - As currently written, we need to load all of the payloads into memory, then the backends will do what they need (in the case of source control this results in the payloads being written to disk then submitted) which could create quite a large memory spike when submitting a large number of packages. -- One solution would be to change the batch push API to take a "payload provider" interface and have the payloads requested as needed rather than passing in the FCompressedBuffer directly. This would let us immediately write the payload to disk for submission then discard it from memory, preventing larger spikes. Although it could cause overhead if there are multiple backends being submitted to. Internally we are unlikely to have more than one backend per storage solution so maybe we should just make it a config option? #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18403735 in //UE5/Release-5.0/... via CL 18403737 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v896-18170469) [CL 18403738 by paul chipchase in ue5-release-engine-test branch]
2021-12-08 02:19:42 -05:00
#if UE_PRECHECK_PAYLOAD_STATUS
Submitting packages on projects with virtualization enabled is much faster when none of the payloads actually needs to be virtualized. #rb PJ.Kack #rnx #preflight 61a795773c29b3cf13cd8250 ### PackageSubmissionChecks - Under the old model submitting a large number of packages could be very slow as each package would check each payload that it owns and is currently stored locally one at a time. For the source control backend this created quite a large overhead even when all of the payloads were already virtualized. - Now we do a pass over the submitted files to find all valid packages that have package trailers with locally stored payloads, gather the payloads into a single list and then query that in one large batch. - Once we find which payloads are not in permanent storage (note that in the case where a project is using multiple permanent storage solutions, if a payload is missing in one backend it counts as not being in permanent storage) we then attempt to virtualized them. - Only after all of this is done will we create the truncated copy of each package and then append the updated trailer to each one. In theory doing it in this order this might slightly increase the change of submit failures that occur after virtualization that result in a package never being submitted and orphaned payloads being added to permanent storage, but this will always be a risk. - Added an assert to fire if we detect a trailer with some virtualized and some local payloads. This should be a supported feature but needs proper testing first before we can allow it. With out current project settings no project should actually encounter this scenario. - To make the code easier to follow we now early out of the entire check when errors are encountered. - Added logging at various stages in the process to help show the user that something is happening and make problems easier to identify in the future. - Notes -- There is a lot of handling of invalid FPayloads. This is because it is currently possible to add empty payloads to the trailer which is inefficient and wastes space. The trailer will be modified to reject empty payloads in a future update at which point a lot of this handling can be removed. -- This could've also been solved by not fully rehydrating a package on save by the end user, which will be added as a project setting in a future piece of work, but this approach will solve the edge case when the user does have a large amount of hydrated packages which contain payloads that are already virtualized so it was better to fix that now while we have good test cases for it. -- We still have scaling problems with large number of package being submitted that do have payloads that need to be virtualized, this will be fixed by extending IVirtualizationSystem::Push to also accept batches of payloads in future work. -- OnPrePackageSubmission could be broken up into smaller chunks to make the code easier to follow. This will be done after the batch payload submission work is done. ### VirtualizationSystem - EStorageType has been promoted to enum class. - Added a new enum FPayloadStatus to be used when querying if a payload exists in a backend storage system or not. - Add a new method ::DoPayloadsExist which allows the caller to query if one or more payloads exists in the given backend storage system. ### VirtualizationManager - Implemented ::DoPayloadsExist. First we get the results from each backend in the storage system (which return as true or false from each backend) then total how many backends found the payload in order to set the correct status. ### IVirtualizationBackend - ::DoesPayloadExist which queries the existence of a single payload has been added to the interface. Most backends already implemented this for private use and if so have had their implementation renamed to match this. - Also added ::DoPayloadsExist which takes a batch of FpayloadIdsto query. Some backends can deal with a batch of payload ids much more efficiently than one at a time, although the default implementation does call ::DoesPayloadExist for each requested payload. -- The default implementation prevents every backend from needing to implement the same for loop but does allow backends that can gain from batching to override it. ### VirtualizationSourceControlBackend - This backend does override ::DoPayloadsExist and implements it's own version as it tends to perform very poorly when not operating on larger batches. - In this case ::DoesPayloadExist calls back to ::DoPayloadsExist to check each payload rather than implement as specific version. ### PackageTrailer - The trailer can now be queries to request how many payloads of a given type it contains #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18339847 in //UE5/Release-5.0/... via CL 18339852 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18339859 by paul chipchase in ue5-release-engine-test branch]
2021-12-01 11:13:31 -05:00
for (int32 Index = 0; Index < PackageInfo.LocalPayloads.Num(); ++Index)
{
if (PayloadStatuses[PackageInfo.PayloadIndex + Index] == FPayloadStatus::FoundAll)
Submitting packages on projects with virtualization enabled is much faster when none of the payloads actually needs to be virtualized. #rb PJ.Kack #rnx #preflight 61a795773c29b3cf13cd8250 ### PackageSubmissionChecks - Under the old model submitting a large number of packages could be very slow as each package would check each payload that it owns and is currently stored locally one at a time. For the source control backend this created quite a large overhead even when all of the payloads were already virtualized. - Now we do a pass over the submitted files to find all valid packages that have package trailers with locally stored payloads, gather the payloads into a single list and then query that in one large batch. - Once we find which payloads are not in permanent storage (note that in the case where a project is using multiple permanent storage solutions, if a payload is missing in one backend it counts as not being in permanent storage) we then attempt to virtualized them. - Only after all of this is done will we create the truncated copy of each package and then append the updated trailer to each one. In theory doing it in this order this might slightly increase the change of submit failures that occur after virtualization that result in a package never being submitted and orphaned payloads being added to permanent storage, but this will always be a risk. - Added an assert to fire if we detect a trailer with some virtualized and some local payloads. This should be a supported feature but needs proper testing first before we can allow it. With out current project settings no project should actually encounter this scenario. - To make the code easier to follow we now early out of the entire check when errors are encountered. - Added logging at various stages in the process to help show the user that something is happening and make problems easier to identify in the future. - Notes -- There is a lot of handling of invalid FPayloads. This is because it is currently possible to add empty payloads to the trailer which is inefficient and wastes space. The trailer will be modified to reject empty payloads in a future update at which point a lot of this handling can be removed. -- This could've also been solved by not fully rehydrating a package on save by the end user, which will be added as a project setting in a future piece of work, but this approach will solve the edge case when the user does have a large amount of hydrated packages which contain payloads that are already virtualized so it was better to fix that now while we have good test cases for it. -- We still have scaling problems with large number of package being submitted that do have payloads that need to be virtualized, this will be fixed by extending IVirtualizationSystem::Push to also accept batches of payloads in future work. -- OnPrePackageSubmission could be broken up into smaller chunks to make the code easier to follow. This will be done after the batch payload submission work is done. ### VirtualizationSystem - EStorageType has been promoted to enum class. - Added a new enum FPayloadStatus to be used when querying if a payload exists in a backend storage system or not. - Add a new method ::DoPayloadsExist which allows the caller to query if one or more payloads exists in the given backend storage system. ### VirtualizationManager - Implemented ::DoPayloadsExist. First we get the results from each backend in the storage system (which return as true or false from each backend) then total how many backends found the payload in order to set the correct status. ### IVirtualizationBackend - ::DoesPayloadExist which queries the existence of a single payload has been added to the interface. Most backends already implemented this for private use and if so have had their implementation renamed to match this. - Also added ::DoPayloadsExist which takes a batch of FpayloadIdsto query. Some backends can deal with a batch of payload ids much more efficiently than one at a time, although the default implementation does call ::DoesPayloadExist for each requested payload. -- The default implementation prevents every backend from needing to implement the same for loop but does allow backends that can gain from batching to override it. ### VirtualizationSourceControlBackend - This backend does override ::DoPayloadsExist and implements it's own version as it tends to perform very poorly when not operating on larger batches. - In this case ::DoesPayloadExist calls back to ::DoPayloadsExist to check each payload rather than implement as specific version. ### PackageTrailer - The trailer can now be queries to request how many payloads of a given type it contains #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18339847 in //UE5/Release-5.0/... via CL 18339852 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18339859 by paul chipchase in ue5-release-engine-test branch]
2021-12-01 11:13:31 -05:00
{
if (PackageInfo.Trailer.UpdatePayloadAsVirtualized(PackageInfo.LocalPayloads[Index]))
{
PackageInfo.bWasTrailerUpdated = true;
}
else
Packages submitted from the editor together will now virtualize their payloads in a single batch rather than one at a time. #rb PJ.Kack #jira UE-136126 #rnx #preflight ### VirtualizationSystem - Added a new overload for Push to VirtualizationSystem that takes an array of FPushRequest, which is a new structure representing a single payload request. - Filtering by package name is currently disabled, this is because the API has been forced into changing and passing the package name in via a FString rather than FPackagePath which means we would need to be more careful. This will be done in a future submit. - The backend interface has been extended to also have a batch version of PushData, by default this will attempt to submit each request one at a time so payloads don't have to try and implement a batched version if there is no need. - The context being passed with a payload when being pushed has been changed from FPackagePath to FString due to include order issues, as the FPackagePath lives in CoreUObject and the API for virtualization lives in Core. Additionally in the future the payloads might not be owned by a package (there is nothing specifically enforcing this) so the context being a string makes more sense. - NOTE: Due to the context change we currently no longer support the filtering feature, which allows for payloads belonging to packages under specific directories to be excluded from virtualization. This is something that will be solved in a future submit. ### SourceControlBackend - Now that we can submit multiple payloads in the same submit, the CL description has been changed slightly. We will now print a list of payload identifiers -> the package trying to submit that payload. This will only tell the users which package originally caused the payload to submit. If a user submits a new package at a later date that contains the same payload we will not be updating the description. ### PackageSubmissionChecks - Converted the submission process to use the new batch push operation in VirtualizationSystem. -- This means that we do a single push and then have to update the package trailers to convert the now pushed payloads from local to virtualized. - Added new define UE_PRECHECK_PAYLOAD_STATUS that makes it easy to toggle off the checks to see which payloads need to be submitted to the persistent backend. This is useful to test if it actually helps speed up the overall operations or if it is faster to just perform the batch push operations on all payloads and check the return values. -- The hope is that over time the submission processes will become fast enough that we can remove the precheck. - Fixed up logging to not always assume more than one package or payload. ### General Notes - Errors and logging is now a bit more vague as we often not just report that X payloads failed etc rather than specific payload identifiers. This probably doesn't affect the user too much since those identifiers as fairly meaningless to them anyway. - The source control submission could be further optimized by first checking the status of the files in thge depot and only then creating/switching workspace etc. - As currently written, we need to load all of the payloads into memory, then the backends will do what they need (in the case of source control this results in the payloads being written to disk then submitted) which could create quite a large memory spike when submitting a large number of packages. -- One solution would be to change the batch push API to take a "payload provider" interface and have the payloads requested as needed rather than passing in the FCompressedBuffer directly. This would let us immediately write the payload to disk for submission then discard it from memory, preventing larger spikes. Although it could cause overhead if there are multiple backends being submitted to. Internally we are unlikely to have more than one backend per storage solution so maybe we should just make it a config option? #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18403735 in //UE5/Release-5.0/... via CL 18403737 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v896-18170469) [CL 18403738 by paul chipchase in ue5-release-engine-test branch]
2021-12-08 02:19:42 -05:00
{
Submitting packages on projects with virtualization enabled is much faster when none of the payloads actually needs to be virtualized. #rb PJ.Kack #rnx #preflight 61a795773c29b3cf13cd8250 ### PackageSubmissionChecks - Under the old model submitting a large number of packages could be very slow as each package would check each payload that it owns and is currently stored locally one at a time. For the source control backend this created quite a large overhead even when all of the payloads were already virtualized. - Now we do a pass over the submitted files to find all valid packages that have package trailers with locally stored payloads, gather the payloads into a single list and then query that in one large batch. - Once we find which payloads are not in permanent storage (note that in the case where a project is using multiple permanent storage solutions, if a payload is missing in one backend it counts as not being in permanent storage) we then attempt to virtualized them. - Only after all of this is done will we create the truncated copy of each package and then append the updated trailer to each one. In theory doing it in this order this might slightly increase the change of submit failures that occur after virtualization that result in a package never being submitted and orphaned payloads being added to permanent storage, but this will always be a risk. - Added an assert to fire if we detect a trailer with some virtualized and some local payloads. This should be a supported feature but needs proper testing first before we can allow it. With out current project settings no project should actually encounter this scenario. - To make the code easier to follow we now early out of the entire check when errors are encountered. - Added logging at various stages in the process to help show the user that something is happening and make problems easier to identify in the future. - Notes -- There is a lot of handling of invalid FPayloads. This is because it is currently possible to add empty payloads to the trailer which is inefficient and wastes space. The trailer will be modified to reject empty payloads in a future update at which point a lot of this handling can be removed. -- This could've also been solved by not fully rehydrating a package on save by the end user, which will be added as a project setting in a future piece of work, but this approach will solve the edge case when the user does have a large amount of hydrated packages which contain payloads that are already virtualized so it was better to fix that now while we have good test cases for it. -- We still have scaling problems with large number of package being submitted that do have payloads that need to be virtualized, this will be fixed by extending IVirtualizationSystem::Push to also accept batches of payloads in future work. -- OnPrePackageSubmission could be broken up into smaller chunks to make the code easier to follow. This will be done after the batch payload submission work is done. ### VirtualizationSystem - EStorageType has been promoted to enum class. - Added a new enum FPayloadStatus to be used when querying if a payload exists in a backend storage system or not. - Add a new method ::DoPayloadsExist which allows the caller to query if one or more payloads exists in the given backend storage system. ### VirtualizationManager - Implemented ::DoPayloadsExist. First we get the results from each backend in the storage system (which return as true or false from each backend) then total how many backends found the payload in order to set the correct status. ### IVirtualizationBackend - ::DoesPayloadExist which queries the existence of a single payload has been added to the interface. Most backends already implemented this for private use and if so have had their implementation renamed to match this. - Also added ::DoPayloadsExist which takes a batch of FpayloadIdsto query. Some backends can deal with a batch of payload ids much more efficiently than one at a time, although the default implementation does call ::DoesPayloadExist for each requested payload. -- The default implementation prevents every backend from needing to implement the same for loop but does allow backends that can gain from batching to override it. ### VirtualizationSourceControlBackend - This backend does override ::DoPayloadsExist and implements it's own version as it tends to perform very poorly when not operating on larger batches. - In this case ::DoesPayloadExist calls back to ::DoPayloadsExist to check each payload rather than implement as specific version. ### PackageTrailer - The trailer can now be queries to request how many payloads of a given type it contains #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18339847 in //UE5/Release-5.0/... via CL 18339852 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18339859 by paul chipchase in ue5-release-engine-test branch]
2021-12-01 11:13:31 -05:00
FText Message = FText::Format( LOCTEXT("Virtualization_UpdateStatusFailed", "Unable to update the status for the payload '{0}' in the package '{1}'"),
FText::FromString(PackageInfo.LocalPayloads[Index].ToString()),
FText::FromString(PackageInfo.Path.GetDebugName()));
Errors.Add(Message);
return;
}
}
}
// If we made changes we should recalculate the local payloads left
if (PackageInfo.bWasTrailerUpdated)
{
PackageInfo.LocalPayloads = PackageInfo.Trailer.GetPayloads(EPayloadFilter::Local);
}
Packages submitted from the editor together will now virtualize their payloads in a single batch rather than one at a time. #rb PJ.Kack #jira UE-136126 #rnx #preflight ### VirtualizationSystem - Added a new overload for Push to VirtualizationSystem that takes an array of FPushRequest, which is a new structure representing a single payload request. - Filtering by package name is currently disabled, this is because the API has been forced into changing and passing the package name in via a FString rather than FPackagePath which means we would need to be more careful. This will be done in a future submit. - The backend interface has been extended to also have a batch version of PushData, by default this will attempt to submit each request one at a time so payloads don't have to try and implement a batched version if there is no need. - The context being passed with a payload when being pushed has been changed from FPackagePath to FString due to include order issues, as the FPackagePath lives in CoreUObject and the API for virtualization lives in Core. Additionally in the future the payloads might not be owned by a package (there is nothing specifically enforcing this) so the context being a string makes more sense. - NOTE: Due to the context change we currently no longer support the filtering feature, which allows for payloads belonging to packages under specific directories to be excluded from virtualization. This is something that will be solved in a future submit. ### SourceControlBackend - Now that we can submit multiple payloads in the same submit, the CL description has been changed slightly. We will now print a list of payload identifiers -> the package trying to submit that payload. This will only tell the users which package originally caused the payload to submit. If a user submits a new package at a later date that contains the same payload we will not be updating the description. ### PackageSubmissionChecks - Converted the submission process to use the new batch push operation in VirtualizationSystem. -- This means that we do a single push and then have to update the package trailers to convert the now pushed payloads from local to virtualized. - Added new define UE_PRECHECK_PAYLOAD_STATUS that makes it easy to toggle off the checks to see which payloads need to be submitted to the persistent backend. This is useful to test if it actually helps speed up the overall operations or if it is faster to just perform the batch push operations on all payloads and check the return values. -- The hope is that over time the submission processes will become fast enough that we can remove the precheck. - Fixed up logging to not always assume more than one package or payload. ### General Notes - Errors and logging is now a bit more vague as we often not just report that X payloads failed etc rather than specific payload identifiers. This probably doesn't affect the user too much since those identifiers as fairly meaningless to them anyway. - The source control submission could be further optimized by first checking the status of the files in thge depot and only then creating/switching workspace etc. - As currently written, we need to load all of the payloads into memory, then the backends will do what they need (in the case of source control this results in the payloads being written to disk then submitted) which could create quite a large memory spike when submitting a large number of packages. -- One solution would be to change the batch push API to take a "payload provider" interface and have the payloads requested as needed rather than passing in the FCompressedBuffer directly. This would let us immediately write the payload to disk for submission then discard it from memory, preventing larger spikes. Although it could cause overhead if there are multiple backends being submitted to. Internally we are unlikely to have more than one backend per storage solution so maybe we should just make it a config option? #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18403735 in //UE5/Release-5.0/... via CL 18403737 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v896-18170469) [CL 18403738 by paul chipchase in ue5-release-engine-test branch]
2021-12-08 02:19:42 -05:00
#endif
Submitting packages on projects with virtualization enabled is much faster when none of the payloads actually needs to be virtualized. #rb PJ.Kack #rnx #preflight 61a795773c29b3cf13cd8250 ### PackageSubmissionChecks - Under the old model submitting a large number of packages could be very slow as each package would check each payload that it owns and is currently stored locally one at a time. For the source control backend this created quite a large overhead even when all of the payloads were already virtualized. - Now we do a pass over the submitted files to find all valid packages that have package trailers with locally stored payloads, gather the payloads into a single list and then query that in one large batch. - Once we find which payloads are not in permanent storage (note that in the case where a project is using multiple permanent storage solutions, if a payload is missing in one backend it counts as not being in permanent storage) we then attempt to virtualized them. - Only after all of this is done will we create the truncated copy of each package and then append the updated trailer to each one. In theory doing it in this order this might slightly increase the change of submit failures that occur after virtualization that result in a package never being submitted and orphaned payloads being added to permanent storage, but this will always be a risk. - Added an assert to fire if we detect a trailer with some virtualized and some local payloads. This should be a supported feature but needs proper testing first before we can allow it. With out current project settings no project should actually encounter this scenario. - To make the code easier to follow we now early out of the entire check when errors are encountered. - Added logging at various stages in the process to help show the user that something is happening and make problems easier to identify in the future. - Notes -- There is a lot of handling of invalid FPayloads. This is because it is currently possible to add empty payloads to the trailer which is inefficient and wastes space. The trailer will be modified to reject empty payloads in a future update at which point a lot of this handling can be removed. -- This could've also been solved by not fully rehydrating a package on save by the end user, which will be added as a project setting in a future piece of work, but this approach will solve the edge case when the user does have a large amount of hydrated packages which contain payloads that are already virtualized so it was better to fix that now while we have good test cases for it. -- We still have scaling problems with large number of package being submitted that do have payloads that need to be virtualized, this will be fixed by extending IVirtualizationSystem::Push to also accept batches of payloads in future work. -- OnPrePackageSubmission could be broken up into smaller chunks to make the code easier to follow. This will be done after the batch payload submission work is done. ### VirtualizationSystem - EStorageType has been promoted to enum class. - Added a new enum FPayloadStatus to be used when querying if a payload exists in a backend storage system or not. - Add a new method ::DoPayloadsExist which allows the caller to query if one or more payloads exists in the given backend storage system. ### VirtualizationManager - Implemented ::DoPayloadsExist. First we get the results from each backend in the storage system (which return as true or false from each backend) then total how many backends found the payload in order to set the correct status. ### IVirtualizationBackend - ::DoesPayloadExist which queries the existence of a single payload has been added to the interface. Most backends already implemented this for private use and if so have had their implementation renamed to match this. - Also added ::DoPayloadsExist which takes a batch of FpayloadIdsto query. Some backends can deal with a batch of payload ids much more efficiently than one at a time, although the default implementation does call ::DoesPayloadExist for each requested payload. -- The default implementation prevents every backend from needing to implement the same for loop but does allow backends that can gain from batching to override it. ### VirtualizationSourceControlBackend - This backend does override ::DoPayloadsExist and implements it's own version as it tends to perform very poorly when not operating on larger batches. - In this case ::DoesPayloadExist calls back to ::DoPayloadsExist to check each payload rather than implement as specific version. ### PackageTrailer - The trailer can now be queries to request how many payloads of a given type it contains #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18339847 in //UE5/Release-5.0/... via CL 18339852 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18339859 by paul chipchase in ue5-release-engine-test branch]
2021-12-01 11:13:31 -05:00
Packages submitted from the editor together will now virtualize their payloads in a single batch rather than one at a time. #rb PJ.Kack #jira UE-136126 #rnx #preflight ### VirtualizationSystem - Added a new overload for Push to VirtualizationSystem that takes an array of FPushRequest, which is a new structure representing a single payload request. - Filtering by package name is currently disabled, this is because the API has been forced into changing and passing the package name in via a FString rather than FPackagePath which means we would need to be more careful. This will be done in a future submit. - The backend interface has been extended to also have a batch version of PushData, by default this will attempt to submit each request one at a time so payloads don't have to try and implement a batched version if there is no need. - The context being passed with a payload when being pushed has been changed from FPackagePath to FString due to include order issues, as the FPackagePath lives in CoreUObject and the API for virtualization lives in Core. Additionally in the future the payloads might not be owned by a package (there is nothing specifically enforcing this) so the context being a string makes more sense. - NOTE: Due to the context change we currently no longer support the filtering feature, which allows for payloads belonging to packages under specific directories to be excluded from virtualization. This is something that will be solved in a future submit. ### SourceControlBackend - Now that we can submit multiple payloads in the same submit, the CL description has been changed slightly. We will now print a list of payload identifiers -> the package trying to submit that payload. This will only tell the users which package originally caused the payload to submit. If a user submits a new package at a later date that contains the same payload we will not be updating the description. ### PackageSubmissionChecks - Converted the submission process to use the new batch push operation in VirtualizationSystem. -- This means that we do a single push and then have to update the package trailers to convert the now pushed payloads from local to virtualized. - Added new define UE_PRECHECK_PAYLOAD_STATUS that makes it easy to toggle off the checks to see which payloads need to be submitted to the persistent backend. This is useful to test if it actually helps speed up the overall operations or if it is faster to just perform the batch push operations on all payloads and check the return values. -- The hope is that over time the submission processes will become fast enough that we can remove the precheck. - Fixed up logging to not always assume more than one package or payload. ### General Notes - Errors and logging is now a bit more vague as we often not just report that X payloads failed etc rather than specific payload identifiers. This probably doesn't affect the user too much since those identifiers as fairly meaningless to them anyway. - The source control submission could be further optimized by first checking the status of the files in thge depot and only then creating/switching workspace etc. - As currently written, we need to load all of the payloads into memory, then the backends will do what they need (in the case of source control this results in the payloads being written to disk then submitted) which could create quite a large memory spike when submitting a large number of packages. -- One solution would be to change the batch push API to take a "payload provider" interface and have the payloads requested as needed rather than passing in the FCompressedBuffer directly. This would let us immediately write the payload to disk for submission then discard it from memory, preventing larger spikes. Although it could cause overhead if there are multiple backends being submitted to. Internally we are unlikely to have more than one backend per storage solution so maybe we should just make it a config option? #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18403735 in //UE5/Release-5.0/... via CL 18403737 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v896-18170469) [CL 18403738 by paul chipchase in ue5-release-engine-test branch]
2021-12-08 02:19:42 -05:00
PackageInfo.PayloadIndex = INDEX_NONE;
Submitting packages on projects with virtualization enabled is much faster when none of the payloads actually needs to be virtualized. #rb PJ.Kack #rnx #preflight 61a795773c29b3cf13cd8250 ### PackageSubmissionChecks - Under the old model submitting a large number of packages could be very slow as each package would check each payload that it owns and is currently stored locally one at a time. For the source control backend this created quite a large overhead even when all of the payloads were already virtualized. - Now we do a pass over the submitted files to find all valid packages that have package trailers with locally stored payloads, gather the payloads into a single list and then query that in one large batch. - Once we find which payloads are not in permanent storage (note that in the case where a project is using multiple permanent storage solutions, if a payload is missing in one backend it counts as not being in permanent storage) we then attempt to virtualized them. - Only after all of this is done will we create the truncated copy of each package and then append the updated trailer to each one. In theory doing it in this order this might slightly increase the change of submit failures that occur after virtualization that result in a package never being submitted and orphaned payloads being added to permanent storage, but this will always be a risk. - Added an assert to fire if we detect a trailer with some virtualized and some local payloads. This should be a supported feature but needs proper testing first before we can allow it. With out current project settings no project should actually encounter this scenario. - To make the code easier to follow we now early out of the entire check when errors are encountered. - Added logging at various stages in the process to help show the user that something is happening and make problems easier to identify in the future. - Notes -- There is a lot of handling of invalid FPayloads. This is because it is currently possible to add empty payloads to the trailer which is inefficient and wastes space. The trailer will be modified to reject empty payloads in a future update at which point a lot of this handling can be removed. -- This could've also been solved by not fully rehydrating a package on save by the end user, which will be added as a project setting in a future piece of work, but this approach will solve the edge case when the user does have a large amount of hydrated packages which contain payloads that are already virtualized so it was better to fix that now while we have good test cases for it. -- We still have scaling problems with large number of package being submitted that do have payloads that need to be virtualized, this will be fixed by extending IVirtualizationSystem::Push to also accept batches of payloads in future work. -- OnPrePackageSubmission could be broken up into smaller chunks to make the code easier to follow. This will be done after the batch payload submission work is done. ### VirtualizationSystem - EStorageType has been promoted to enum class. - Added a new enum FPayloadStatus to be used when querying if a payload exists in a backend storage system or not. - Add a new method ::DoPayloadsExist which allows the caller to query if one or more payloads exists in the given backend storage system. ### VirtualizationManager - Implemented ::DoPayloadsExist. First we get the results from each backend in the storage system (which return as true or false from each backend) then total how many backends found the payload in order to set the correct status. ### IVirtualizationBackend - ::DoesPayloadExist which queries the existence of a single payload has been added to the interface. Most backends already implemented this for private use and if so have had their implementation renamed to match this. - Also added ::DoPayloadsExist which takes a batch of FpayloadIdsto query. Some backends can deal with a batch of payload ids much more efficiently than one at a time, although the default implementation does call ::DoesPayloadExist for each requested payload. -- The default implementation prevents every backend from needing to implement the same for loop but does allow backends that can gain from batching to override it. ### VirtualizationSourceControlBackend - This backend does override ::DoPayloadsExist and implements it's own version as it tends to perform very poorly when not operating on larger batches. - In this case ::DoesPayloadExist calls back to ::DoPayloadsExist to check each payload rather than implement as specific version. ### PackageTrailer - The trailer can now be queries to request how many payloads of a given type it contains #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18339847 in //UE5/Release-5.0/... via CL 18339852 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18339859 by paul chipchase in ue5-release-engine-test branch]
2021-12-01 11:13:31 -05:00
TotalPayloadsToVirtualize += PackageInfo.LocalPayloads.Num();
}
Packages submitted from the editor together will now virtualize their payloads in a single batch rather than one at a time. #rb PJ.Kack #jira UE-136126 #rnx #preflight ### VirtualizationSystem - Added a new overload for Push to VirtualizationSystem that takes an array of FPushRequest, which is a new structure representing a single payload request. - Filtering by package name is currently disabled, this is because the API has been forced into changing and passing the package name in via a FString rather than FPackagePath which means we would need to be more careful. This will be done in a future submit. - The backend interface has been extended to also have a batch version of PushData, by default this will attempt to submit each request one at a time so payloads don't have to try and implement a batched version if there is no need. - The context being passed with a payload when being pushed has been changed from FPackagePath to FString due to include order issues, as the FPackagePath lives in CoreUObject and the API for virtualization lives in Core. Additionally in the future the payloads might not be owned by a package (there is nothing specifically enforcing this) so the context being a string makes more sense. - NOTE: Due to the context change we currently no longer support the filtering feature, which allows for payloads belonging to packages under specific directories to be excluded from virtualization. This is something that will be solved in a future submit. ### SourceControlBackend - Now that we can submit multiple payloads in the same submit, the CL description has been changed slightly. We will now print a list of payload identifiers -> the package trying to submit that payload. This will only tell the users which package originally caused the payload to submit. If a user submits a new package at a later date that contains the same payload we will not be updating the description. ### PackageSubmissionChecks - Converted the submission process to use the new batch push operation in VirtualizationSystem. -- This means that we do a single push and then have to update the package trailers to convert the now pushed payloads from local to virtualized. - Added new define UE_PRECHECK_PAYLOAD_STATUS that makes it easy to toggle off the checks to see which payloads need to be submitted to the persistent backend. This is useful to test if it actually helps speed up the overall operations or if it is faster to just perform the batch push operations on all payloads and check the return values. -- The hope is that over time the submission processes will become fast enough that we can remove the precheck. - Fixed up logging to not always assume more than one package or payload. ### General Notes - Errors and logging is now a bit more vague as we often not just report that X payloads failed etc rather than specific payload identifiers. This probably doesn't affect the user too much since those identifiers as fairly meaningless to them anyway. - The source control submission could be further optimized by first checking the status of the files in thge depot and only then creating/switching workspace etc. - As currently written, we need to load all of the payloads into memory, then the backends will do what they need (in the case of source control this results in the payloads being written to disk then submitted) which could create quite a large memory spike when submitting a large number of packages. -- One solution would be to change the batch push API to take a "payload provider" interface and have the payloads requested as needed rather than passing in the FCompressedBuffer directly. This would let us immediately write the payload to disk for submission then discard it from memory, preventing larger spikes. Although it could cause overhead if there are multiple backends being submitted to. Internally we are unlikely to have more than one backend per storage solution so maybe we should just make it a config option? #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18403735 in //UE5/Release-5.0/... via CL 18403737 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v896-18170469) [CL 18403738 by paul chipchase in ue5-release-engine-test branch]
2021-12-08 02:19:42 -05:00
UE_LOG(LogVirtualization, Display, TEXT("Found %" INT64_FMT " payload(s) that need to be pushed to persistent virtualized storage"), TotalPayloadsToVirtualize);
Submitting packages on projects with virtualization enabled is much faster when none of the payloads actually needs to be virtualized. #rb PJ.Kack #rnx #preflight 61a795773c29b3cf13cd8250 ### PackageSubmissionChecks - Under the old model submitting a large number of packages could be very slow as each package would check each payload that it owns and is currently stored locally one at a time. For the source control backend this created quite a large overhead even when all of the payloads were already virtualized. - Now we do a pass over the submitted files to find all valid packages that have package trailers with locally stored payloads, gather the payloads into a single list and then query that in one large batch. - Once we find which payloads are not in permanent storage (note that in the case where a project is using multiple permanent storage solutions, if a payload is missing in one backend it counts as not being in permanent storage) we then attempt to virtualized them. - Only after all of this is done will we create the truncated copy of each package and then append the updated trailer to each one. In theory doing it in this order this might slightly increase the change of submit failures that occur after virtualization that result in a package never being submitted and orphaned payloads being added to permanent storage, but this will always be a risk. - Added an assert to fire if we detect a trailer with some virtualized and some local payloads. This should be a supported feature but needs proper testing first before we can allow it. With out current project settings no project should actually encounter this scenario. - To make the code easier to follow we now early out of the entire check when errors are encountered. - Added logging at various stages in the process to help show the user that something is happening and make problems easier to identify in the future. - Notes -- There is a lot of handling of invalid FPayloads. This is because it is currently possible to add empty payloads to the trailer which is inefficient and wastes space. The trailer will be modified to reject empty payloads in a future update at which point a lot of this handling can be removed. -- This could've also been solved by not fully rehydrating a package on save by the end user, which will be added as a project setting in a future piece of work, but this approach will solve the edge case when the user does have a large amount of hydrated packages which contain payloads that are already virtualized so it was better to fix that now while we have good test cases for it. -- We still have scaling problems with large number of package being submitted that do have payloads that need to be virtualized, this will be fixed by extending IVirtualizationSystem::Push to also accept batches of payloads in future work. -- OnPrePackageSubmission could be broken up into smaller chunks to make the code easier to follow. This will be done after the batch payload submission work is done. ### VirtualizationSystem - EStorageType has been promoted to enum class. - Added a new enum FPayloadStatus to be used when querying if a payload exists in a backend storage system or not. - Add a new method ::DoPayloadsExist which allows the caller to query if one or more payloads exists in the given backend storage system. ### VirtualizationManager - Implemented ::DoPayloadsExist. First we get the results from each backend in the storage system (which return as true or false from each backend) then total how many backends found the payload in order to set the correct status. ### IVirtualizationBackend - ::DoesPayloadExist which queries the existence of a single payload has been added to the interface. Most backends already implemented this for private use and if so have had their implementation renamed to match this. - Also added ::DoPayloadsExist which takes a batch of FpayloadIdsto query. Some backends can deal with a batch of payload ids much more efficiently than one at a time, although the default implementation does call ::DoesPayloadExist for each requested payload. -- The default implementation prevents every backend from needing to implement the same for loop but does allow backends that can gain from batching to override it. ### VirtualizationSourceControlBackend - This backend does override ::DoPayloadsExist and implements it's own version as it tends to perform very poorly when not operating on larger batches. - In this case ::DoesPayloadExist calls back to ::DoPayloadsExist to check each payload rather than implement as specific version. ### PackageTrailer - The trailer can now be queries to request how many payloads of a given type it contains #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18339847 in //UE5/Release-5.0/... via CL 18339852 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18339859 by paul chipchase in ue5-release-engine-test branch]
2021-12-01 11:13:31 -05:00
// TODO Optimization: In theory we could have many packages sharing the same payload and we only need to push once
Packages submitted from the editor together will now virtualize their payloads in a single batch rather than one at a time. #rb PJ.Kack #jira UE-136126 #rnx #preflight ### VirtualizationSystem - Added a new overload for Push to VirtualizationSystem that takes an array of FPushRequest, which is a new structure representing a single payload request. - Filtering by package name is currently disabled, this is because the API has been forced into changing and passing the package name in via a FString rather than FPackagePath which means we would need to be more careful. This will be done in a future submit. - The backend interface has been extended to also have a batch version of PushData, by default this will attempt to submit each request one at a time so payloads don't have to try and implement a batched version if there is no need. - The context being passed with a payload when being pushed has been changed from FPackagePath to FString due to include order issues, as the FPackagePath lives in CoreUObject and the API for virtualization lives in Core. Additionally in the future the payloads might not be owned by a package (there is nothing specifically enforcing this) so the context being a string makes more sense. - NOTE: Due to the context change we currently no longer support the filtering feature, which allows for payloads belonging to packages under specific directories to be excluded from virtualization. This is something that will be solved in a future submit. ### SourceControlBackend - Now that we can submit multiple payloads in the same submit, the CL description has been changed slightly. We will now print a list of payload identifiers -> the package trying to submit that payload. This will only tell the users which package originally caused the payload to submit. If a user submits a new package at a later date that contains the same payload we will not be updating the description. ### PackageSubmissionChecks - Converted the submission process to use the new batch push operation in VirtualizationSystem. -- This means that we do a single push and then have to update the package trailers to convert the now pushed payloads from local to virtualized. - Added new define UE_PRECHECK_PAYLOAD_STATUS that makes it easy to toggle off the checks to see which payloads need to be submitted to the persistent backend. This is useful to test if it actually helps speed up the overall operations or if it is faster to just perform the batch push operations on all payloads and check the return values. -- The hope is that over time the submission processes will become fast enough that we can remove the precheck. - Fixed up logging to not always assume more than one package or payload. ### General Notes - Errors and logging is now a bit more vague as we often not just report that X payloads failed etc rather than specific payload identifiers. This probably doesn't affect the user too much since those identifiers as fairly meaningless to them anyway. - The source control submission could be further optimized by first checking the status of the files in thge depot and only then creating/switching workspace etc. - As currently written, we need to load all of the payloads into memory, then the backends will do what they need (in the case of source control this results in the payloads being written to disk then submitted) which could create quite a large memory spike when submitting a large number of packages. -- One solution would be to change the batch push API to take a "payload provider" interface and have the payloads requested as needed rather than passing in the FCompressedBuffer directly. This would let us immediately write the payload to disk for submission then discard it from memory, preventing larger spikes. Although it could cause overhead if there are multiple backends being submitted to. Internally we are unlikely to have more than one backend per storage solution so maybe we should just make it a config option? #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18403735 in //UE5/Release-5.0/... via CL 18403737 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v896-18170469) [CL 18403738 by paul chipchase in ue5-release-engine-test branch]
2021-12-08 02:19:42 -05:00
TArray<Virtualization::FPushRequest> PayloadsToSubmit;
PayloadsToSubmit.Reserve(TotalPayloadsToVirtualize);
Submitting packages on projects with virtualization enabled is much faster when none of the payloads actually needs to be virtualized. #rb PJ.Kack #rnx #preflight 61a795773c29b3cf13cd8250 ### PackageSubmissionChecks - Under the old model submitting a large number of packages could be very slow as each package would check each payload that it owns and is currently stored locally one at a time. For the source control backend this created quite a large overhead even when all of the payloads were already virtualized. - Now we do a pass over the submitted files to find all valid packages that have package trailers with locally stored payloads, gather the payloads into a single list and then query that in one large batch. - Once we find which payloads are not in permanent storage (note that in the case where a project is using multiple permanent storage solutions, if a payload is missing in one backend it counts as not being in permanent storage) we then attempt to virtualized them. - Only after all of this is done will we create the truncated copy of each package and then append the updated trailer to each one. In theory doing it in this order this might slightly increase the change of submit failures that occur after virtualization that result in a package never being submitted and orphaned payloads being added to permanent storage, but this will always be a risk. - Added an assert to fire if we detect a trailer with some virtualized and some local payloads. This should be a supported feature but needs proper testing first before we can allow it. With out current project settings no project should actually encounter this scenario. - To make the code easier to follow we now early out of the entire check when errors are encountered. - Added logging at various stages in the process to help show the user that something is happening and make problems easier to identify in the future. - Notes -- There is a lot of handling of invalid FPayloads. This is because it is currently possible to add empty payloads to the trailer which is inefficient and wastes space. The trailer will be modified to reject empty payloads in a future update at which point a lot of this handling can be removed. -- This could've also been solved by not fully rehydrating a package on save by the end user, which will be added as a project setting in a future piece of work, but this approach will solve the edge case when the user does have a large amount of hydrated packages which contain payloads that are already virtualized so it was better to fix that now while we have good test cases for it. -- We still have scaling problems with large number of package being submitted that do have payloads that need to be virtualized, this will be fixed by extending IVirtualizationSystem::Push to also accept batches of payloads in future work. -- OnPrePackageSubmission could be broken up into smaller chunks to make the code easier to follow. This will be done after the batch payload submission work is done. ### VirtualizationSystem - EStorageType has been promoted to enum class. - Added a new enum FPayloadStatus to be used when querying if a payload exists in a backend storage system or not. - Add a new method ::DoPayloadsExist which allows the caller to query if one or more payloads exists in the given backend storage system. ### VirtualizationManager - Implemented ::DoPayloadsExist. First we get the results from each backend in the storage system (which return as true or false from each backend) then total how many backends found the payload in order to set the correct status. ### IVirtualizationBackend - ::DoesPayloadExist which queries the existence of a single payload has been added to the interface. Most backends already implemented this for private use and if so have had their implementation renamed to match this. - Also added ::DoPayloadsExist which takes a batch of FpayloadIdsto query. Some backends can deal with a batch of payload ids much more efficiently than one at a time, although the default implementation does call ::DoesPayloadExist for each requested payload. -- The default implementation prevents every backend from needing to implement the same for loop but does allow backends that can gain from batching to override it. ### VirtualizationSourceControlBackend - This backend does override ::DoPayloadsExist and implements it's own version as it tends to perform very poorly when not operating on larger batches. - In this case ::DoesPayloadExist calls back to ::DoPayloadsExist to check each payload rather than implement as specific version. ### PackageTrailer - The trailer can now be queries to request how many payloads of a given type it contains #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18339847 in //UE5/Release-5.0/... via CL 18339852 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18339859 by paul chipchase in ue5-release-engine-test branch]
2021-12-01 11:13:31 -05:00
// Push any remaining local payload to the persistent backends
for (FPackageInfo& PackageInfo : Packages)
{
if (PackageInfo.LocalPayloads.IsEmpty())
{
continue;
}
TUniquePtr<FArchive> PackageAr = IPackageResourceManager::Get().OpenReadExternalResource(EPackageExternalResource::WorkspaceDomainFile, PackageInfo.Path.GetPackageName());
if (!PackageAr.IsValid())
{
FText Message = FText::Format( LOCTEXT("Virtualization_PkgOpen", "Failed to open the package '{1}' for reading"),
FText::FromString(PackageInfo.Path.GetDebugName()));
Errors.Add(Message);
return;
}
Packages submitted from the editor together will now virtualize their payloads in a single batch rather than one at a time. #rb PJ.Kack #jira UE-136126 #rnx #preflight ### VirtualizationSystem - Added a new overload for Push to VirtualizationSystem that takes an array of FPushRequest, which is a new structure representing a single payload request. - Filtering by package name is currently disabled, this is because the API has been forced into changing and passing the package name in via a FString rather than FPackagePath which means we would need to be more careful. This will be done in a future submit. - The backend interface has been extended to also have a batch version of PushData, by default this will attempt to submit each request one at a time so payloads don't have to try and implement a batched version if there is no need. - The context being passed with a payload when being pushed has been changed from FPackagePath to FString due to include order issues, as the FPackagePath lives in CoreUObject and the API for virtualization lives in Core. Additionally in the future the payloads might not be owned by a package (there is nothing specifically enforcing this) so the context being a string makes more sense. - NOTE: Due to the context change we currently no longer support the filtering feature, which allows for payloads belonging to packages under specific directories to be excluded from virtualization. This is something that will be solved in a future submit. ### SourceControlBackend - Now that we can submit multiple payloads in the same submit, the CL description has been changed slightly. We will now print a list of payload identifiers -> the package trying to submit that payload. This will only tell the users which package originally caused the payload to submit. If a user submits a new package at a later date that contains the same payload we will not be updating the description. ### PackageSubmissionChecks - Converted the submission process to use the new batch push operation in VirtualizationSystem. -- This means that we do a single push and then have to update the package trailers to convert the now pushed payloads from local to virtualized. - Added new define UE_PRECHECK_PAYLOAD_STATUS that makes it easy to toggle off the checks to see which payloads need to be submitted to the persistent backend. This is useful to test if it actually helps speed up the overall operations or if it is faster to just perform the batch push operations on all payloads and check the return values. -- The hope is that over time the submission processes will become fast enough that we can remove the precheck. - Fixed up logging to not always assume more than one package or payload. ### General Notes - Errors and logging is now a bit more vague as we often not just report that X payloads failed etc rather than specific payload identifiers. This probably doesn't affect the user too much since those identifiers as fairly meaningless to them anyway. - The source control submission could be further optimized by first checking the status of the files in thge depot and only then creating/switching workspace etc. - As currently written, we need to load all of the payloads into memory, then the backends will do what they need (in the case of source control this results in the payloads being written to disk then submitted) which could create quite a large memory spike when submitting a large number of packages. -- One solution would be to change the batch push API to take a "payload provider" interface and have the payloads requested as needed rather than passing in the FCompressedBuffer directly. This would let us immediately write the payload to disk for submission then discard it from memory, preventing larger spikes. Although it could cause overhead if there are multiple backends being submitted to. Internally we are unlikely to have more than one backend per storage solution so maybe we should just make it a config option? #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18403735 in //UE5/Release-5.0/... via CL 18403737 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v896-18170469) [CL 18403738 by paul chipchase in ue5-release-engine-test branch]
2021-12-08 02:19:42 -05:00
PackageInfo.PayloadIndex = PayloadsToSubmit.Num();
Submitting packages on projects with virtualization enabled is much faster when none of the payloads actually needs to be virtualized. #rb PJ.Kack #rnx #preflight 61a795773c29b3cf13cd8250 ### PackageSubmissionChecks - Under the old model submitting a large number of packages could be very slow as each package would check each payload that it owns and is currently stored locally one at a time. For the source control backend this created quite a large overhead even when all of the payloads were already virtualized. - Now we do a pass over the submitted files to find all valid packages that have package trailers with locally stored payloads, gather the payloads into a single list and then query that in one large batch. - Once we find which payloads are not in permanent storage (note that in the case where a project is using multiple permanent storage solutions, if a payload is missing in one backend it counts as not being in permanent storage) we then attempt to virtualized them. - Only after all of this is done will we create the truncated copy of each package and then append the updated trailer to each one. In theory doing it in this order this might slightly increase the change of submit failures that occur after virtualization that result in a package never being submitted and orphaned payloads being added to permanent storage, but this will always be a risk. - Added an assert to fire if we detect a trailer with some virtualized and some local payloads. This should be a supported feature but needs proper testing first before we can allow it. With out current project settings no project should actually encounter this scenario. - To make the code easier to follow we now early out of the entire check when errors are encountered. - Added logging at various stages in the process to help show the user that something is happening and make problems easier to identify in the future. - Notes -- There is a lot of handling of invalid FPayloads. This is because it is currently possible to add empty payloads to the trailer which is inefficient and wastes space. The trailer will be modified to reject empty payloads in a future update at which point a lot of this handling can be removed. -- This could've also been solved by not fully rehydrating a package on save by the end user, which will be added as a project setting in a future piece of work, but this approach will solve the edge case when the user does have a large amount of hydrated packages which contain payloads that are already virtualized so it was better to fix that now while we have good test cases for it. -- We still have scaling problems with large number of package being submitted that do have payloads that need to be virtualized, this will be fixed by extending IVirtualizationSystem::Push to also accept batches of payloads in future work. -- OnPrePackageSubmission could be broken up into smaller chunks to make the code easier to follow. This will be done after the batch payload submission work is done. ### VirtualizationSystem - EStorageType has been promoted to enum class. - Added a new enum FPayloadStatus to be used when querying if a payload exists in a backend storage system or not. - Add a new method ::DoPayloadsExist which allows the caller to query if one or more payloads exists in the given backend storage system. ### VirtualizationManager - Implemented ::DoPayloadsExist. First we get the results from each backend in the storage system (which return as true or false from each backend) then total how many backends found the payload in order to set the correct status. ### IVirtualizationBackend - ::DoesPayloadExist which queries the existence of a single payload has been added to the interface. Most backends already implemented this for private use and if so have had their implementation renamed to match this. - Also added ::DoPayloadsExist which takes a batch of FpayloadIdsto query. Some backends can deal with a batch of payload ids much more efficiently than one at a time, although the default implementation does call ::DoesPayloadExist for each requested payload. -- The default implementation prevents every backend from needing to implement the same for loop but does allow backends that can gain from batching to override it. ### VirtualizationSourceControlBackend - This backend does override ::DoPayloadsExist and implements it's own version as it tends to perform very poorly when not operating on larger batches. - In this case ::DoesPayloadExist calls back to ::DoPayloadsExist to check each payload rather than implement as specific version. ### PackageTrailer - The trailer can now be queries to request how many payloads of a given type it contains #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18339847 in //UE5/Release-5.0/... via CL 18339852 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18339859 by paul chipchase in ue5-release-engine-test branch]
2021-12-01 11:13:31 -05:00
for (const FPayloadId& PayloadId : PackageInfo.LocalPayloads)
{
checkf(PayloadId.IsValid(), TEXT("PackageTrailer for package '%s' should not contain invalid FPayloadIds"), *PackageInfo.Path.GetDebugName());
Submitting packages on projects with virtualization enabled is much faster when none of the payloads actually needs to be virtualized. #rb PJ.Kack #rnx #preflight 61a795773c29b3cf13cd8250 ### PackageSubmissionChecks - Under the old model submitting a large number of packages could be very slow as each package would check each payload that it owns and is currently stored locally one at a time. For the source control backend this created quite a large overhead even when all of the payloads were already virtualized. - Now we do a pass over the submitted files to find all valid packages that have package trailers with locally stored payloads, gather the payloads into a single list and then query that in one large batch. - Once we find which payloads are not in permanent storage (note that in the case where a project is using multiple permanent storage solutions, if a payload is missing in one backend it counts as not being in permanent storage) we then attempt to virtualized them. - Only after all of this is done will we create the truncated copy of each package and then append the updated trailer to each one. In theory doing it in this order this might slightly increase the change of submit failures that occur after virtualization that result in a package never being submitted and orphaned payloads being added to permanent storage, but this will always be a risk. - Added an assert to fire if we detect a trailer with some virtualized and some local payloads. This should be a supported feature but needs proper testing first before we can allow it. With out current project settings no project should actually encounter this scenario. - To make the code easier to follow we now early out of the entire check when errors are encountered. - Added logging at various stages in the process to help show the user that something is happening and make problems easier to identify in the future. - Notes -- There is a lot of handling of invalid FPayloads. This is because it is currently possible to add empty payloads to the trailer which is inefficient and wastes space. The trailer will be modified to reject empty payloads in a future update at which point a lot of this handling can be removed. -- This could've also been solved by not fully rehydrating a package on save by the end user, which will be added as a project setting in a future piece of work, but this approach will solve the edge case when the user does have a large amount of hydrated packages which contain payloads that are already virtualized so it was better to fix that now while we have good test cases for it. -- We still have scaling problems with large number of package being submitted that do have payloads that need to be virtualized, this will be fixed by extending IVirtualizationSystem::Push to also accept batches of payloads in future work. -- OnPrePackageSubmission could be broken up into smaller chunks to make the code easier to follow. This will be done after the batch payload submission work is done. ### VirtualizationSystem - EStorageType has been promoted to enum class. - Added a new enum FPayloadStatus to be used when querying if a payload exists in a backend storage system or not. - Add a new method ::DoPayloadsExist which allows the caller to query if one or more payloads exists in the given backend storage system. ### VirtualizationManager - Implemented ::DoPayloadsExist. First we get the results from each backend in the storage system (which return as true or false from each backend) then total how many backends found the payload in order to set the correct status. ### IVirtualizationBackend - ::DoesPayloadExist which queries the existence of a single payload has been added to the interface. Most backends already implemented this for private use and if so have had their implementation renamed to match this. - Also added ::DoPayloadsExist which takes a batch of FpayloadIdsto query. Some backends can deal with a batch of payload ids much more efficiently than one at a time, although the default implementation does call ::DoesPayloadExist for each requested payload. -- The default implementation prevents every backend from needing to implement the same for loop but does allow backends that can gain from batching to override it. ### VirtualizationSourceControlBackend - This backend does override ::DoPayloadsExist and implements it's own version as it tends to perform very poorly when not operating on larger batches. - In this case ::DoesPayloadExist calls back to ::DoPayloadsExist to check each payload rather than implement as specific version. ### PackageTrailer - The trailer can now be queries to request how many payloads of a given type it contains #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18339847 in //UE5/Release-5.0/... via CL 18339852 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18339859 by paul chipchase in ue5-release-engine-test branch]
2021-12-01 11:13:31 -05:00
FCompressedBuffer Payload = PackageInfo.Trailer.LoadPayload(PayloadId, *PackageAr);
if (PayloadId != FIoHash(Payload.GetRawHash()))
{
FText Message = FText::Format( LOCTEXT("Virtualization_WrongPayload", "Package {0} loaded an incorrect payload from the trailer. Expected '{1}' Loaded '{2}'"),
FText::FromString(PackageInfo.Path.GetDebugName()),
FText::FromString(PayloadId.ToString()),
FText::FromString(LexToString(Payload.GetRawHash())));
Errors.Add(Message);
return;
}
Packages submitted from the editor together will now virtualize their payloads in a single batch rather than one at a time. #rb PJ.Kack #jira UE-136126 #rnx #preflight ### VirtualizationSystem - Added a new overload for Push to VirtualizationSystem that takes an array of FPushRequest, which is a new structure representing a single payload request. - Filtering by package name is currently disabled, this is because the API has been forced into changing and passing the package name in via a FString rather than FPackagePath which means we would need to be more careful. This will be done in a future submit. - The backend interface has been extended to also have a batch version of PushData, by default this will attempt to submit each request one at a time so payloads don't have to try and implement a batched version if there is no need. - The context being passed with a payload when being pushed has been changed from FPackagePath to FString due to include order issues, as the FPackagePath lives in CoreUObject and the API for virtualization lives in Core. Additionally in the future the payloads might not be owned by a package (there is nothing specifically enforcing this) so the context being a string makes more sense. - NOTE: Due to the context change we currently no longer support the filtering feature, which allows for payloads belonging to packages under specific directories to be excluded from virtualization. This is something that will be solved in a future submit. ### SourceControlBackend - Now that we can submit multiple payloads in the same submit, the CL description has been changed slightly. We will now print a list of payload identifiers -> the package trying to submit that payload. This will only tell the users which package originally caused the payload to submit. If a user submits a new package at a later date that contains the same payload we will not be updating the description. ### PackageSubmissionChecks - Converted the submission process to use the new batch push operation in VirtualizationSystem. -- This means that we do a single push and then have to update the package trailers to convert the now pushed payloads from local to virtualized. - Added new define UE_PRECHECK_PAYLOAD_STATUS that makes it easy to toggle off the checks to see which payloads need to be submitted to the persistent backend. This is useful to test if it actually helps speed up the overall operations or if it is faster to just perform the batch push operations on all payloads and check the return values. -- The hope is that over time the submission processes will become fast enough that we can remove the precheck. - Fixed up logging to not always assume more than one package or payload. ### General Notes - Errors and logging is now a bit more vague as we often not just report that X payloads failed etc rather than specific payload identifiers. This probably doesn't affect the user too much since those identifiers as fairly meaningless to them anyway. - The source control submission could be further optimized by first checking the status of the files in thge depot and only then creating/switching workspace etc. - As currently written, we need to load all of the payloads into memory, then the backends will do what they need (in the case of source control this results in the payloads being written to disk then submitted) which could create quite a large memory spike when submitting a large number of packages. -- One solution would be to change the batch push API to take a "payload provider" interface and have the payloads requested as needed rather than passing in the FCompressedBuffer directly. This would let us immediately write the payload to disk for submission then discard it from memory, preventing larger spikes. Although it could cause overhead if there are multiple backends being submitted to. Internally we are unlikely to have more than one backend per storage solution so maybe we should just make it a config option? #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18403735 in //UE5/Release-5.0/... via CL 18403737 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v896-18170469) [CL 18403738 by paul chipchase in ue5-release-engine-test branch]
2021-12-08 02:19:42 -05:00
if (!Payload)
Submitting packages on projects with virtualization enabled is much faster when none of the payloads actually needs to be virtualized. #rb PJ.Kack #rnx #preflight 61a795773c29b3cf13cd8250 ### PackageSubmissionChecks - Under the old model submitting a large number of packages could be very slow as each package would check each payload that it owns and is currently stored locally one at a time. For the source control backend this created quite a large overhead even when all of the payloads were already virtualized. - Now we do a pass over the submitted files to find all valid packages that have package trailers with locally stored payloads, gather the payloads into a single list and then query that in one large batch. - Once we find which payloads are not in permanent storage (note that in the case where a project is using multiple permanent storage solutions, if a payload is missing in one backend it counts as not being in permanent storage) we then attempt to virtualized them. - Only after all of this is done will we create the truncated copy of each package and then append the updated trailer to each one. In theory doing it in this order this might slightly increase the change of submit failures that occur after virtualization that result in a package never being submitted and orphaned payloads being added to permanent storage, but this will always be a risk. - Added an assert to fire if we detect a trailer with some virtualized and some local payloads. This should be a supported feature but needs proper testing first before we can allow it. With out current project settings no project should actually encounter this scenario. - To make the code easier to follow we now early out of the entire check when errors are encountered. - Added logging at various stages in the process to help show the user that something is happening and make problems easier to identify in the future. - Notes -- There is a lot of handling of invalid FPayloads. This is because it is currently possible to add empty payloads to the trailer which is inefficient and wastes space. The trailer will be modified to reject empty payloads in a future update at which point a lot of this handling can be removed. -- This could've also been solved by not fully rehydrating a package on save by the end user, which will be added as a project setting in a future piece of work, but this approach will solve the edge case when the user does have a large amount of hydrated packages which contain payloads that are already virtualized so it was better to fix that now while we have good test cases for it. -- We still have scaling problems with large number of package being submitted that do have payloads that need to be virtualized, this will be fixed by extending IVirtualizationSystem::Push to also accept batches of payloads in future work. -- OnPrePackageSubmission could be broken up into smaller chunks to make the code easier to follow. This will be done after the batch payload submission work is done. ### VirtualizationSystem - EStorageType has been promoted to enum class. - Added a new enum FPayloadStatus to be used when querying if a payload exists in a backend storage system or not. - Add a new method ::DoPayloadsExist which allows the caller to query if one or more payloads exists in the given backend storage system. ### VirtualizationManager - Implemented ::DoPayloadsExist. First we get the results from each backend in the storage system (which return as true or false from each backend) then total how many backends found the payload in order to set the correct status. ### IVirtualizationBackend - ::DoesPayloadExist which queries the existence of a single payload has been added to the interface. Most backends already implemented this for private use and if so have had their implementation renamed to match this. - Also added ::DoPayloadsExist which takes a batch of FpayloadIdsto query. Some backends can deal with a batch of payload ids much more efficiently than one at a time, although the default implementation does call ::DoesPayloadExist for each requested payload. -- The default implementation prevents every backend from needing to implement the same for loop but does allow backends that can gain from batching to override it. ### VirtualizationSourceControlBackend - This backend does override ::DoPayloadsExist and implements it's own version as it tends to perform very poorly when not operating on larger batches. - In this case ::DoesPayloadExist calls back to ::DoPayloadsExist to check each payload rather than implement as specific version. ### PackageTrailer - The trailer can now be queries to request how many payloads of a given type it contains #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18339847 in //UE5/Release-5.0/... via CL 18339852 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18339859 by paul chipchase in ue5-release-engine-test branch]
2021-12-01 11:13:31 -05:00
{
FText Message = FText::Format( LOCTEXT("Virtualization_MissingPayload", "Unable to find the payload '{0}' in the local storage of package '{1}'"),
FText::FromString(PayloadId.ToString()),
FText::FromString(PackageInfo.Path.GetDebugName()));
Errors.Add(Message);
return;
Packages submitted from the editor together will now virtualize their payloads in a single batch rather than one at a time. #rb PJ.Kack #jira UE-136126 #rnx #preflight ### VirtualizationSystem - Added a new overload for Push to VirtualizationSystem that takes an array of FPushRequest, which is a new structure representing a single payload request. - Filtering by package name is currently disabled, this is because the API has been forced into changing and passing the package name in via a FString rather than FPackagePath which means we would need to be more careful. This will be done in a future submit. - The backend interface has been extended to also have a batch version of PushData, by default this will attempt to submit each request one at a time so payloads don't have to try and implement a batched version if there is no need. - The context being passed with a payload when being pushed has been changed from FPackagePath to FString due to include order issues, as the FPackagePath lives in CoreUObject and the API for virtualization lives in Core. Additionally in the future the payloads might not be owned by a package (there is nothing specifically enforcing this) so the context being a string makes more sense. - NOTE: Due to the context change we currently no longer support the filtering feature, which allows for payloads belonging to packages under specific directories to be excluded from virtualization. This is something that will be solved in a future submit. ### SourceControlBackend - Now that we can submit multiple payloads in the same submit, the CL description has been changed slightly. We will now print a list of payload identifiers -> the package trying to submit that payload. This will only tell the users which package originally caused the payload to submit. If a user submits a new package at a later date that contains the same payload we will not be updating the description. ### PackageSubmissionChecks - Converted the submission process to use the new batch push operation in VirtualizationSystem. -- This means that we do a single push and then have to update the package trailers to convert the now pushed payloads from local to virtualized. - Added new define UE_PRECHECK_PAYLOAD_STATUS that makes it easy to toggle off the checks to see which payloads need to be submitted to the persistent backend. This is useful to test if it actually helps speed up the overall operations or if it is faster to just perform the batch push operations on all payloads and check the return values. -- The hope is that over time the submission processes will become fast enough that we can remove the precheck. - Fixed up logging to not always assume more than one package or payload. ### General Notes - Errors and logging is now a bit more vague as we often not just report that X payloads failed etc rather than specific payload identifiers. This probably doesn't affect the user too much since those identifiers as fairly meaningless to them anyway. - The source control submission could be further optimized by first checking the status of the files in thge depot and only then creating/switching workspace etc. - As currently written, we need to load all of the payloads into memory, then the backends will do what they need (in the case of source control this results in the payloads being written to disk then submitted) which could create quite a large memory spike when submitting a large number of packages. -- One solution would be to change the batch push API to take a "payload provider" interface and have the payloads requested as needed rather than passing in the FCompressedBuffer directly. This would let us immediately write the payload to disk for submission then discard it from memory, preventing larger spikes. Although it could cause overhead if there are multiple backends being submitted to. Internally we are unlikely to have more than one backend per storage solution so maybe we should just make it a config option? #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18403735 in //UE5/Release-5.0/... via CL 18403737 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v896-18170469) [CL 18403738 by paul chipchase in ue5-release-engine-test branch]
2021-12-08 02:19:42 -05:00
Submitting packages on projects with virtualization enabled is much faster when none of the payloads actually needs to be virtualized. #rb PJ.Kack #rnx #preflight 61a795773c29b3cf13cd8250 ### PackageSubmissionChecks - Under the old model submitting a large number of packages could be very slow as each package would check each payload that it owns and is currently stored locally one at a time. For the source control backend this created quite a large overhead even when all of the payloads were already virtualized. - Now we do a pass over the submitted files to find all valid packages that have package trailers with locally stored payloads, gather the payloads into a single list and then query that in one large batch. - Once we find which payloads are not in permanent storage (note that in the case where a project is using multiple permanent storage solutions, if a payload is missing in one backend it counts as not being in permanent storage) we then attempt to virtualized them. - Only after all of this is done will we create the truncated copy of each package and then append the updated trailer to each one. In theory doing it in this order this might slightly increase the change of submit failures that occur after virtualization that result in a package never being submitted and orphaned payloads being added to permanent storage, but this will always be a risk. - Added an assert to fire if we detect a trailer with some virtualized and some local payloads. This should be a supported feature but needs proper testing first before we can allow it. With out current project settings no project should actually encounter this scenario. - To make the code easier to follow we now early out of the entire check when errors are encountered. - Added logging at various stages in the process to help show the user that something is happening and make problems easier to identify in the future. - Notes -- There is a lot of handling of invalid FPayloads. This is because it is currently possible to add empty payloads to the trailer which is inefficient and wastes space. The trailer will be modified to reject empty payloads in a future update at which point a lot of this handling can be removed. -- This could've also been solved by not fully rehydrating a package on save by the end user, which will be added as a project setting in a future piece of work, but this approach will solve the edge case when the user does have a large amount of hydrated packages which contain payloads that are already virtualized so it was better to fix that now while we have good test cases for it. -- We still have scaling problems with large number of package being submitted that do have payloads that need to be virtualized, this will be fixed by extending IVirtualizationSystem::Push to also accept batches of payloads in future work. -- OnPrePackageSubmission could be broken up into smaller chunks to make the code easier to follow. This will be done after the batch payload submission work is done. ### VirtualizationSystem - EStorageType has been promoted to enum class. - Added a new enum FPayloadStatus to be used when querying if a payload exists in a backend storage system or not. - Add a new method ::DoPayloadsExist which allows the caller to query if one or more payloads exists in the given backend storage system. ### VirtualizationManager - Implemented ::DoPayloadsExist. First we get the results from each backend in the storage system (which return as true or false from each backend) then total how many backends found the payload in order to set the correct status. ### IVirtualizationBackend - ::DoesPayloadExist which queries the existence of a single payload has been added to the interface. Most backends already implemented this for private use and if so have had their implementation renamed to match this. - Also added ::DoPayloadsExist which takes a batch of FpayloadIdsto query. Some backends can deal with a batch of payload ids much more efficiently than one at a time, although the default implementation does call ::DoesPayloadExist for each requested payload. -- The default implementation prevents every backend from needing to implement the same for loop but does allow backends that can gain from batching to override it. ### VirtualizationSourceControlBackend - This backend does override ::DoPayloadsExist and implements it's own version as it tends to perform very poorly when not operating on larger batches. - In this case ::DoesPayloadExist calls back to ::DoPayloadsExist to check each payload rather than implement as specific version. ### PackageTrailer - The trailer can now be queries to request how many payloads of a given type it contains #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18339847 in //UE5/Release-5.0/... via CL 18339852 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18339859 by paul chipchase in ue5-release-engine-test branch]
2021-12-01 11:13:31 -05:00
}
Packages submitted from the editor together will now virtualize their payloads in a single batch rather than one at a time. #rb PJ.Kack #jira UE-136126 #rnx #preflight ### VirtualizationSystem - Added a new overload for Push to VirtualizationSystem that takes an array of FPushRequest, which is a new structure representing a single payload request. - Filtering by package name is currently disabled, this is because the API has been forced into changing and passing the package name in via a FString rather than FPackagePath which means we would need to be more careful. This will be done in a future submit. - The backend interface has been extended to also have a batch version of PushData, by default this will attempt to submit each request one at a time so payloads don't have to try and implement a batched version if there is no need. - The context being passed with a payload when being pushed has been changed from FPackagePath to FString due to include order issues, as the FPackagePath lives in CoreUObject and the API for virtualization lives in Core. Additionally in the future the payloads might not be owned by a package (there is nothing specifically enforcing this) so the context being a string makes more sense. - NOTE: Due to the context change we currently no longer support the filtering feature, which allows for payloads belonging to packages under specific directories to be excluded from virtualization. This is something that will be solved in a future submit. ### SourceControlBackend - Now that we can submit multiple payloads in the same submit, the CL description has been changed slightly. We will now print a list of payload identifiers -> the package trying to submit that payload. This will only tell the users which package originally caused the payload to submit. If a user submits a new package at a later date that contains the same payload we will not be updating the description. ### PackageSubmissionChecks - Converted the submission process to use the new batch push operation in VirtualizationSystem. -- This means that we do a single push and then have to update the package trailers to convert the now pushed payloads from local to virtualized. - Added new define UE_PRECHECK_PAYLOAD_STATUS that makes it easy to toggle off the checks to see which payloads need to be submitted to the persistent backend. This is useful to test if it actually helps speed up the overall operations or if it is faster to just perform the batch push operations on all payloads and check the return values. -- The hope is that over time the submission processes will become fast enough that we can remove the precheck. - Fixed up logging to not always assume more than one package or payload. ### General Notes - Errors and logging is now a bit more vague as we often not just report that X payloads failed etc rather than specific payload identifiers. This probably doesn't affect the user too much since those identifiers as fairly meaningless to them anyway. - The source control submission could be further optimized by first checking the status of the files in thge depot and only then creating/switching workspace etc. - As currently written, we need to load all of the payloads into memory, then the backends will do what they need (in the case of source control this results in the payloads being written to disk then submitted) which could create quite a large memory spike when submitting a large number of packages. -- One solution would be to change the batch push API to take a "payload provider" interface and have the payloads requested as needed rather than passing in the FCompressedBuffer directly. This would let us immediately write the payload to disk for submission then discard it from memory, preventing larger spikes. Although it could cause overhead if there are multiple backends being submitted to. Internally we are unlikely to have more than one backend per storage solution so maybe we should just make it a config option? #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18403735 in //UE5/Release-5.0/... via CL 18403737 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v896-18170469) [CL 18403738 by paul chipchase in ue5-release-engine-test branch]
2021-12-08 02:19:42 -05:00
PayloadsToSubmit.Emplace(PayloadId, MoveTemp(Payload), PackageInfo.Path.GetDebugName());
}
}
if (!System.PushData(PayloadsToSubmit, EStorageType::Persistent))
{
FText Message = LOCTEXT("Virtualization_PushFailure", "Failed to push payloads");
Errors.Add(Message);
return;
}
// Update the package info for the submitted payloads
for (FPackageInfo& PackageInfo : Packages)
{
for (int32 Index = 0; Index < PackageInfo.LocalPayloads.Num(); ++Index)
{
const Virtualization::FPushRequest& Request = PayloadsToSubmit[PackageInfo.PayloadIndex + Index];
check(Request.Identifier == PackageInfo.LocalPayloads[Index]);
if (Request.Status == Virtualization::FPushRequest::EStatus::Success)
Submitting packages on projects with virtualization enabled is much faster when none of the payloads actually needs to be virtualized. #rb PJ.Kack #rnx #preflight 61a795773c29b3cf13cd8250 ### PackageSubmissionChecks - Under the old model submitting a large number of packages could be very slow as each package would check each payload that it owns and is currently stored locally one at a time. For the source control backend this created quite a large overhead even when all of the payloads were already virtualized. - Now we do a pass over the submitted files to find all valid packages that have package trailers with locally stored payloads, gather the payloads into a single list and then query that in one large batch. - Once we find which payloads are not in permanent storage (note that in the case where a project is using multiple permanent storage solutions, if a payload is missing in one backend it counts as not being in permanent storage) we then attempt to virtualized them. - Only after all of this is done will we create the truncated copy of each package and then append the updated trailer to each one. In theory doing it in this order this might slightly increase the change of submit failures that occur after virtualization that result in a package never being submitted and orphaned payloads being added to permanent storage, but this will always be a risk. - Added an assert to fire if we detect a trailer with some virtualized and some local payloads. This should be a supported feature but needs proper testing first before we can allow it. With out current project settings no project should actually encounter this scenario. - To make the code easier to follow we now early out of the entire check when errors are encountered. - Added logging at various stages in the process to help show the user that something is happening and make problems easier to identify in the future. - Notes -- There is a lot of handling of invalid FPayloads. This is because it is currently possible to add empty payloads to the trailer which is inefficient and wastes space. The trailer will be modified to reject empty payloads in a future update at which point a lot of this handling can be removed. -- This could've also been solved by not fully rehydrating a package on save by the end user, which will be added as a project setting in a future piece of work, but this approach will solve the edge case when the user does have a large amount of hydrated packages which contain payloads that are already virtualized so it was better to fix that now while we have good test cases for it. -- We still have scaling problems with large number of package being submitted that do have payloads that need to be virtualized, this will be fixed by extending IVirtualizationSystem::Push to also accept batches of payloads in future work. -- OnPrePackageSubmission could be broken up into smaller chunks to make the code easier to follow. This will be done after the batch payload submission work is done. ### VirtualizationSystem - EStorageType has been promoted to enum class. - Added a new enum FPayloadStatus to be used when querying if a payload exists in a backend storage system or not. - Add a new method ::DoPayloadsExist which allows the caller to query if one or more payloads exists in the given backend storage system. ### VirtualizationManager - Implemented ::DoPayloadsExist. First we get the results from each backend in the storage system (which return as true or false from each backend) then total how many backends found the payload in order to set the correct status. ### IVirtualizationBackend - ::DoesPayloadExist which queries the existence of a single payload has been added to the interface. Most backends already implemented this for private use and if so have had their implementation renamed to match this. - Also added ::DoPayloadsExist which takes a batch of FpayloadIdsto query. Some backends can deal with a batch of payload ids much more efficiently than one at a time, although the default implementation does call ::DoesPayloadExist for each requested payload. -- The default implementation prevents every backend from needing to implement the same for loop but does allow backends that can gain from batching to override it. ### VirtualizationSourceControlBackend - This backend does override ::DoPayloadsExist and implements it's own version as it tends to perform very poorly when not operating on larger batches. - In this case ::DoesPayloadExist calls back to ::DoPayloadsExist to check each payload rather than implement as specific version. ### PackageTrailer - The trailer can now be queries to request how many payloads of a given type it contains #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18339847 in //UE5/Release-5.0/... via CL 18339852 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18339859 by paul chipchase in ue5-release-engine-test branch]
2021-12-01 11:13:31 -05:00
{
Packages submitted from the editor together will now virtualize their payloads in a single batch rather than one at a time. #rb PJ.Kack #jira UE-136126 #rnx #preflight ### VirtualizationSystem - Added a new overload for Push to VirtualizationSystem that takes an array of FPushRequest, which is a new structure representing a single payload request. - Filtering by package name is currently disabled, this is because the API has been forced into changing and passing the package name in via a FString rather than FPackagePath which means we would need to be more careful. This will be done in a future submit. - The backend interface has been extended to also have a batch version of PushData, by default this will attempt to submit each request one at a time so payloads don't have to try and implement a batched version if there is no need. - The context being passed with a payload when being pushed has been changed from FPackagePath to FString due to include order issues, as the FPackagePath lives in CoreUObject and the API for virtualization lives in Core. Additionally in the future the payloads might not be owned by a package (there is nothing specifically enforcing this) so the context being a string makes more sense. - NOTE: Due to the context change we currently no longer support the filtering feature, which allows for payloads belonging to packages under specific directories to be excluded from virtualization. This is something that will be solved in a future submit. ### SourceControlBackend - Now that we can submit multiple payloads in the same submit, the CL description has been changed slightly. We will now print a list of payload identifiers -> the package trying to submit that payload. This will only tell the users which package originally caused the payload to submit. If a user submits a new package at a later date that contains the same payload we will not be updating the description. ### PackageSubmissionChecks - Converted the submission process to use the new batch push operation in VirtualizationSystem. -- This means that we do a single push and then have to update the package trailers to convert the now pushed payloads from local to virtualized. - Added new define UE_PRECHECK_PAYLOAD_STATUS that makes it easy to toggle off the checks to see which payloads need to be submitted to the persistent backend. This is useful to test if it actually helps speed up the overall operations or if it is faster to just perform the batch push operations on all payloads and check the return values. -- The hope is that over time the submission processes will become fast enough that we can remove the precheck. - Fixed up logging to not always assume more than one package or payload. ### General Notes - Errors and logging is now a bit more vague as we often not just report that X payloads failed etc rather than specific payload identifiers. This probably doesn't affect the user too much since those identifiers as fairly meaningless to them anyway. - The source control submission could be further optimized by first checking the status of the files in thge depot and only then creating/switching workspace etc. - As currently written, we need to load all of the payloads into memory, then the backends will do what they need (in the case of source control this results in the payloads being written to disk then submitted) which could create quite a large memory spike when submitting a large number of packages. -- One solution would be to change the batch push API to take a "payload provider" interface and have the payloads requested as needed rather than passing in the FCompressedBuffer directly. This would let us immediately write the payload to disk for submission then discard it from memory, preventing larger spikes. Although it could cause overhead if there are multiple backends being submitted to. Internally we are unlikely to have more than one backend per storage solution so maybe we should just make it a config option? #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18403735 in //UE5/Release-5.0/... via CL 18403737 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v896-18170469) [CL 18403738 by paul chipchase in ue5-release-engine-test branch]
2021-12-08 02:19:42 -05:00
if (PackageInfo.Trailer.UpdatePayloadAsVirtualized(Request.Identifier))
{
PackageInfo.bWasTrailerUpdated = true;
}
else
{
FText Message = FText::Format( LOCTEXT("Virtualization_UpdateStatusFailed", "Unable to update the status for the payload '{0}' in the package '{1}'"),
FText::FromString(Request.Identifier.ToString()),
FText::FromString(PackageInfo.Path.GetDebugName()));
Errors.Add(Message);
return;
}
Submitting packages on projects with virtualization enabled is much faster when none of the payloads actually needs to be virtualized. #rb PJ.Kack #rnx #preflight 61a795773c29b3cf13cd8250 ### PackageSubmissionChecks - Under the old model submitting a large number of packages could be very slow as each package would check each payload that it owns and is currently stored locally one at a time. For the source control backend this created quite a large overhead even when all of the payloads were already virtualized. - Now we do a pass over the submitted files to find all valid packages that have package trailers with locally stored payloads, gather the payloads into a single list and then query that in one large batch. - Once we find which payloads are not in permanent storage (note that in the case where a project is using multiple permanent storage solutions, if a payload is missing in one backend it counts as not being in permanent storage) we then attempt to virtualized them. - Only after all of this is done will we create the truncated copy of each package and then append the updated trailer to each one. In theory doing it in this order this might slightly increase the change of submit failures that occur after virtualization that result in a package never being submitted and orphaned payloads being added to permanent storage, but this will always be a risk. - Added an assert to fire if we detect a trailer with some virtualized and some local payloads. This should be a supported feature but needs proper testing first before we can allow it. With out current project settings no project should actually encounter this scenario. - To make the code easier to follow we now early out of the entire check when errors are encountered. - Added logging at various stages in the process to help show the user that something is happening and make problems easier to identify in the future. - Notes -- There is a lot of handling of invalid FPayloads. This is because it is currently possible to add empty payloads to the trailer which is inefficient and wastes space. The trailer will be modified to reject empty payloads in a future update at which point a lot of this handling can be removed. -- This could've also been solved by not fully rehydrating a package on save by the end user, which will be added as a project setting in a future piece of work, but this approach will solve the edge case when the user does have a large amount of hydrated packages which contain payloads that are already virtualized so it was better to fix that now while we have good test cases for it. -- We still have scaling problems with large number of package being submitted that do have payloads that need to be virtualized, this will be fixed by extending IVirtualizationSystem::Push to also accept batches of payloads in future work. -- OnPrePackageSubmission could be broken up into smaller chunks to make the code easier to follow. This will be done after the batch payload submission work is done. ### VirtualizationSystem - EStorageType has been promoted to enum class. - Added a new enum FPayloadStatus to be used when querying if a payload exists in a backend storage system or not. - Add a new method ::DoPayloadsExist which allows the caller to query if one or more payloads exists in the given backend storage system. ### VirtualizationManager - Implemented ::DoPayloadsExist. First we get the results from each backend in the storage system (which return as true or false from each backend) then total how many backends found the payload in order to set the correct status. ### IVirtualizationBackend - ::DoesPayloadExist which queries the existence of a single payload has been added to the interface. Most backends already implemented this for private use and if so have had their implementation renamed to match this. - Also added ::DoPayloadsExist which takes a batch of FpayloadIdsto query. Some backends can deal with a batch of payload ids much more efficiently than one at a time, although the default implementation does call ::DoesPayloadExist for each requested payload. -- The default implementation prevents every backend from needing to implement the same for loop but does allow backends that can gain from batching to override it. ### VirtualizationSourceControlBackend - This backend does override ::DoPayloadsExist and implements it's own version as it tends to perform very poorly when not operating on larger batches. - In this case ::DoesPayloadExist calls back to ::DoPayloadsExist to check each payload rather than implement as specific version. ### PackageTrailer - The trailer can now be queries to request how many payloads of a given type it contains #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18339847 in //UE5/Release-5.0/... via CL 18339852 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18339859 by paul chipchase in ue5-release-engine-test branch]
2021-12-01 11:13:31 -05:00
}
}
}
TArray<TPair<FPackagePath, FString>> PackagesToReplace;
// Any package with an updated trailer needs to be copied and an updated trailer appended
for (FPackageInfo& PackageInfo : Packages)
{
if (!PackageInfo.bWasTrailerUpdated)
{
continue;
}
// TODO: Implement/test mixed package trailer support
// At the moment we assume that all the payloads in a package will virtualized or none of them will. If none of the payloads have virtualized then
// the package won't have made it this far so we should check that no package at this point has a local payload.
// We should be able to support mixed packages (to support more advanced filtering and min payload sizes etc) but it hasn't been tested yet so I'd
// rather prevent this condition from occurring until it is properly tested.
check(PackageInfo.Trailer.GetNumPayloads(EPayloadFilter::Local) == 0);
const FPackagePath& PackagePath = PackageInfo.Path; // No need to validate path, we checked this earlier
const FString PackageFilePath = PackagePath.GetLocalFullPath();
const FString BaseName = FPaths::GetBaseFilename(PackagePath.GetPackageName());
const FString TempFilePath = FPaths::CreateTempFilename(*FPaths::ProjectSavedDir(), *BaseName.Left(32));
// TODO Optimization: Combine TryCopyPackageWithoutTrailer with the appending of the new trailer to avoid opening multiple handles
// Create copy of package minus the trailer the trailer
if (!TryCopyPackageWithoutTrailer(PackagePath, TempFilePath, PackageInfo.Trailer, Errors))
{
return;
}
TUniquePtr<FArchive> PackageAr = IPackageResourceManager::Get().OpenReadExternalResource(EPackageExternalResource::WorkspaceDomainFile, PackagePath.GetPackageName());
if (!PackageAr.IsValid())
{
FText Message = FText::Format( LOCTEXT("Virtualization_PkgOpen", "Failed to open the package '{1}' for reading"),
FText::FromString(PackagePath.GetDebugName()));
Errors.Add(Message);
return;
}
TUniquePtr<FArchive> CopyAr(IFileManager::Get().CreateFileWriter(*TempFilePath, EFileWrite::FILEWRITE_Append));
if (!CopyAr.IsValid())
{
FText Message = FText::Format( LOCTEXT("Virtualization_TrailerAppendOpen", "Unable to open '{0}' to append the trailer'"),
FText::FromString(TempFilePath));
Errors.Add(Message);
return;
}
FPackageTrailerBuilder TrailerBuilder = FPackageTrailerBuilder::Create(PackageInfo.Trailer, *PackageAr, PackagePath);
Submitting packages on projects with virtualization enabled is much faster when none of the payloads actually needs to be virtualized. #rb PJ.Kack #rnx #preflight 61a795773c29b3cf13cd8250 ### PackageSubmissionChecks - Under the old model submitting a large number of packages could be very slow as each package would check each payload that it owns and is currently stored locally one at a time. For the source control backend this created quite a large overhead even when all of the payloads were already virtualized. - Now we do a pass over the submitted files to find all valid packages that have package trailers with locally stored payloads, gather the payloads into a single list and then query that in one large batch. - Once we find which payloads are not in permanent storage (note that in the case where a project is using multiple permanent storage solutions, if a payload is missing in one backend it counts as not being in permanent storage) we then attempt to virtualized them. - Only after all of this is done will we create the truncated copy of each package and then append the updated trailer to each one. In theory doing it in this order this might slightly increase the change of submit failures that occur after virtualization that result in a package never being submitted and orphaned payloads being added to permanent storage, but this will always be a risk. - Added an assert to fire if we detect a trailer with some virtualized and some local payloads. This should be a supported feature but needs proper testing first before we can allow it. With out current project settings no project should actually encounter this scenario. - To make the code easier to follow we now early out of the entire check when errors are encountered. - Added logging at various stages in the process to help show the user that something is happening and make problems easier to identify in the future. - Notes -- There is a lot of handling of invalid FPayloads. This is because it is currently possible to add empty payloads to the trailer which is inefficient and wastes space. The trailer will be modified to reject empty payloads in a future update at which point a lot of this handling can be removed. -- This could've also been solved by not fully rehydrating a package on save by the end user, which will be added as a project setting in a future piece of work, but this approach will solve the edge case when the user does have a large amount of hydrated packages which contain payloads that are already virtualized so it was better to fix that now while we have good test cases for it. -- We still have scaling problems with large number of package being submitted that do have payloads that need to be virtualized, this will be fixed by extending IVirtualizationSystem::Push to also accept batches of payloads in future work. -- OnPrePackageSubmission could be broken up into smaller chunks to make the code easier to follow. This will be done after the batch payload submission work is done. ### VirtualizationSystem - EStorageType has been promoted to enum class. - Added a new enum FPayloadStatus to be used when querying if a payload exists in a backend storage system or not. - Add a new method ::DoPayloadsExist which allows the caller to query if one or more payloads exists in the given backend storage system. ### VirtualizationManager - Implemented ::DoPayloadsExist. First we get the results from each backend in the storage system (which return as true or false from each backend) then total how many backends found the payload in order to set the correct status. ### IVirtualizationBackend - ::DoesPayloadExist which queries the existence of a single payload has been added to the interface. Most backends already implemented this for private use and if so have had their implementation renamed to match this. - Also added ::DoPayloadsExist which takes a batch of FpayloadIdsto query. Some backends can deal with a batch of payload ids much more efficiently than one at a time, although the default implementation does call ::DoesPayloadExist for each requested payload. -- The default implementation prevents every backend from needing to implement the same for loop but does allow backends that can gain from batching to override it. ### VirtualizationSourceControlBackend - This backend does override ::DoPayloadsExist and implements it's own version as it tends to perform very poorly when not operating on larger batches. - In this case ::DoesPayloadExist calls back to ::DoPayloadsExist to check each payload rather than implement as specific version. ### PackageTrailer - The trailer can now be queries to request how many payloads of a given type it contains #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18339847 in //UE5/Release-5.0/... via CL 18339852 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18339859 by paul chipchase in ue5-release-engine-test branch]
2021-12-01 11:13:31 -05:00
if (!TrailerBuilder.BuildAndAppendTrailer(nullptr, *CopyAr))
{
FText Message = FText::Format( LOCTEXT("Virtualization_TrailerAppend", "Failed to append the trailer to '{0}'"),
FText::FromString(TempFilePath));
Errors.Add(Message);
return;
}
// Now that we have successfully created a new version of the package with an updated trailer
// we need to mark that it should replace the original package.
PackagesToReplace.Emplace(PackagePath, TempFilePath);
}
Packages submitted from the editor together will now virtualize their payloads in a single batch rather than one at a time. #rb PJ.Kack #jira UE-136126 #rnx #preflight ### VirtualizationSystem - Added a new overload for Push to VirtualizationSystem that takes an array of FPushRequest, which is a new structure representing a single payload request. - Filtering by package name is currently disabled, this is because the API has been forced into changing and passing the package name in via a FString rather than FPackagePath which means we would need to be more careful. This will be done in a future submit. - The backend interface has been extended to also have a batch version of PushData, by default this will attempt to submit each request one at a time so payloads don't have to try and implement a batched version if there is no need. - The context being passed with a payload when being pushed has been changed from FPackagePath to FString due to include order issues, as the FPackagePath lives in CoreUObject and the API for virtualization lives in Core. Additionally in the future the payloads might not be owned by a package (there is nothing specifically enforcing this) so the context being a string makes more sense. - NOTE: Due to the context change we currently no longer support the filtering feature, which allows for payloads belonging to packages under specific directories to be excluded from virtualization. This is something that will be solved in a future submit. ### SourceControlBackend - Now that we can submit multiple payloads in the same submit, the CL description has been changed slightly. We will now print a list of payload identifiers -> the package trying to submit that payload. This will only tell the users which package originally caused the payload to submit. If a user submits a new package at a later date that contains the same payload we will not be updating the description. ### PackageSubmissionChecks - Converted the submission process to use the new batch push operation in VirtualizationSystem. -- This means that we do a single push and then have to update the package trailers to convert the now pushed payloads from local to virtualized. - Added new define UE_PRECHECK_PAYLOAD_STATUS that makes it easy to toggle off the checks to see which payloads need to be submitted to the persistent backend. This is useful to test if it actually helps speed up the overall operations or if it is faster to just perform the batch push operations on all payloads and check the return values. -- The hope is that over time the submission processes will become fast enough that we can remove the precheck. - Fixed up logging to not always assume more than one package or payload. ### General Notes - Errors and logging is now a bit more vague as we often not just report that X payloads failed etc rather than specific payload identifiers. This probably doesn't affect the user too much since those identifiers as fairly meaningless to them anyway. - The source control submission could be further optimized by first checking the status of the files in thge depot and only then creating/switching workspace etc. - As currently written, we need to load all of the payloads into memory, then the backends will do what they need (in the case of source control this results in the payloads being written to disk then submitted) which could create quite a large memory spike when submitting a large number of packages. -- One solution would be to change the batch push API to take a "payload provider" interface and have the payloads requested as needed rather than passing in the FCompressedBuffer directly. This would let us immediately write the payload to disk for submission then discard it from memory, preventing larger spikes. Although it could cause overhead if there are multiple backends being submitted to. Internally we are unlikely to have more than one backend per storage solution so maybe we should just make it a config option? #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18403735 in //UE5/Release-5.0/... via CL 18403737 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v896-18170469) [CL 18403738 by paul chipchase in ue5-release-engine-test branch]
2021-12-08 02:19:42 -05:00
UE_LOG(LogVirtualization, Display, TEXT("%d package(s) had their trailer container modified and need to be updated"), PackagesToReplace.Num());
Submitting packages on projects with virtualization enabled is much faster when none of the payloads actually needs to be virtualized. #rb PJ.Kack #rnx #preflight 61a795773c29b3cf13cd8250 ### PackageSubmissionChecks - Under the old model submitting a large number of packages could be very slow as each package would check each payload that it owns and is currently stored locally one at a time. For the source control backend this created quite a large overhead even when all of the payloads were already virtualized. - Now we do a pass over the submitted files to find all valid packages that have package trailers with locally stored payloads, gather the payloads into a single list and then query that in one large batch. - Once we find which payloads are not in permanent storage (note that in the case where a project is using multiple permanent storage solutions, if a payload is missing in one backend it counts as not being in permanent storage) we then attempt to virtualized them. - Only after all of this is done will we create the truncated copy of each package and then append the updated trailer to each one. In theory doing it in this order this might slightly increase the change of submit failures that occur after virtualization that result in a package never being submitted and orphaned payloads being added to permanent storage, but this will always be a risk. - Added an assert to fire if we detect a trailer with some virtualized and some local payloads. This should be a supported feature but needs proper testing first before we can allow it. With out current project settings no project should actually encounter this scenario. - To make the code easier to follow we now early out of the entire check when errors are encountered. - Added logging at various stages in the process to help show the user that something is happening and make problems easier to identify in the future. - Notes -- There is a lot of handling of invalid FPayloads. This is because it is currently possible to add empty payloads to the trailer which is inefficient and wastes space. The trailer will be modified to reject empty payloads in a future update at which point a lot of this handling can be removed. -- This could've also been solved by not fully rehydrating a package on save by the end user, which will be added as a project setting in a future piece of work, but this approach will solve the edge case when the user does have a large amount of hydrated packages which contain payloads that are already virtualized so it was better to fix that now while we have good test cases for it. -- We still have scaling problems with large number of package being submitted that do have payloads that need to be virtualized, this will be fixed by extending IVirtualizationSystem::Push to also accept batches of payloads in future work. -- OnPrePackageSubmission could be broken up into smaller chunks to make the code easier to follow. This will be done after the batch payload submission work is done. ### VirtualizationSystem - EStorageType has been promoted to enum class. - Added a new enum FPayloadStatus to be used when querying if a payload exists in a backend storage system or not. - Add a new method ::DoPayloadsExist which allows the caller to query if one or more payloads exists in the given backend storage system. ### VirtualizationManager - Implemented ::DoPayloadsExist. First we get the results from each backend in the storage system (which return as true or false from each backend) then total how many backends found the payload in order to set the correct status. ### IVirtualizationBackend - ::DoesPayloadExist which queries the existence of a single payload has been added to the interface. Most backends already implemented this for private use and if so have had their implementation renamed to match this. - Also added ::DoPayloadsExist which takes a batch of FpayloadIdsto query. Some backends can deal with a batch of payload ids much more efficiently than one at a time, although the default implementation does call ::DoesPayloadExist for each requested payload. -- The default implementation prevents every backend from needing to implement the same for loop but does allow backends that can gain from batching to override it. ### VirtualizationSourceControlBackend - This backend does override ::DoPayloadsExist and implements it's own version as it tends to perform very poorly when not operating on larger batches. - In this case ::DoesPayloadExist calls back to ::DoPayloadsExist to check each payload rather than implement as specific version. ### PackageTrailer - The trailer can now be queries to request how many payloads of a given type it contains #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18339847 in //UE5/Release-5.0/... via CL 18339852 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18339859 by paul chipchase in ue5-release-engine-test branch]
2021-12-01 11:13:31 -05:00
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous. #rb PJ.Kack #rnx #preflight 61a4b235405273b2c3daa7c7 ### Virtualization - The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted. - The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again. - This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc. - In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead. - In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm) - For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk. ### Current Issues - The system does not work with the editor domain and has minor issues with text based assets - The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional. ### PackageTrailer - The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer - FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use. - When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire. - the header file contains a breakdown of the new format. ### VirtualizedBulkData - FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file. - FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code. - Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads. - We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave. - Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object. - Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use. - ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs. - We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package. - We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled. - We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code. -- Once the FPackageTrailer feature is no longer optional we can consider changing this. ### SavePackage/SavePackage2 - Removed the functions that would create the FPayloadToc - Added new functions to help create the trailer. - Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer. - We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems. ### LinkerLoad - After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package -- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead. - By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file. ### LinkerSave - We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed. - We now gather data to be appended to the package trailer via TrailerBuilder. ### DumpPayloadToc - This command now dumps info based on the package trailer if one can be found. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
if (NumErrors == Errors.Num())
{
// TODO: Consider using the SavePackage model (move the original, then replace, so we can restore all of the original packages if needed)
// having said that, once a package is in PackagesToReplace it should still be safe to submit so maybe we don't need this level of protection?
// We need to reset the loader of any package that we want to re-save over
for (const TPair<FPackagePath, FString>& Iterator : PackagesToReplace)
{
UPackage* Package = FindObjectFast<UPackage>(nullptr, Iterator.Key.GetPackageFName());
if (Package != nullptr)
{
ResetLoadersForSave(Package, *Iterator.Key.GetLocalFullPath());
}
}
// Since we had no errors we can now replace all of the packages that were virtualized data with the virtualized replacement file.
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous. #rb PJ.Kack #rnx #preflight 61a4b235405273b2c3daa7c7 ### Virtualization - The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted. - The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again. - This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc. - In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead. - In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm) - For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk. ### Current Issues - The system does not work with the editor domain and has minor issues with text based assets - The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional. ### PackageTrailer - The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer - FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use. - When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire. - the header file contains a breakdown of the new format. ### VirtualizedBulkData - FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file. - FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code. - Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads. - We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave. - Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object. - Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use. - ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs. - We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package. - We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled. - We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code. -- Once the FPackageTrailer feature is no longer optional we can consider changing this. ### SavePackage/SavePackage2 - Removed the functions that would create the FPayloadToc - Added new functions to help create the trailer. - Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer. - We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems. ### LinkerLoad - After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package -- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead. - By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file. ### LinkerSave - We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed. - We now gather data to be appended to the package trailer via TrailerBuilder. ### DumpPayloadToc - This command now dumps info based on the package trailer if one can be found. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
for(const TPair<FPackagePath,FString>& Iterator : PackagesToReplace)
{
const FString OriginalPackagePath = Iterator.Key.GetLocalFullPath();
const FString& NewPackagePath = Iterator.Value;
if (!IFileManager::Get().Move(*OriginalPackagePath, *NewPackagePath))
{
FText Message = FText::Format( LOCTEXT("Virtualization_MoveFailed", "Unable to replace the package '{0}' with the virtualized version"),
FText::FromString(Iterator.Key.GetDebugName()));
Errors.Add(Message);
continue;
}
}
}
// If we had no new errors add the validation tag to indicate that the packages are safe for submission.
// TODO: Currently this is a simple tag to make it easier for us to track which assets were submitted via the
// virtualization process in a test project. This should be expanded when we add proper p4 server triggers.
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous. #rb PJ.Kack #rnx #preflight 61a4b235405273b2c3daa7c7 ### Virtualization - The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted. - The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again. - This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc. - In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead. - In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm) - For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk. ### Current Issues - The system does not work with the editor domain and has minor issues with text based assets - The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional. ### PackageTrailer - The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer - FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use. - When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire. - the header file contains a breakdown of the new format. ### VirtualizedBulkData - FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file. - FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code. - Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads. - We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave. - Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object. - Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use. - ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs. - We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package. - We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled. - We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code. -- Once the FPackageTrailer feature is no longer optional we can consider changing this. ### SavePackage/SavePackage2 - Removed the functions that would create the FPayloadToc - Added new functions to help create the trailer. - Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer. - We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems. ### LinkerLoad - After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package -- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead. - By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file. ### LinkerSave - We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed. - We now gather data to be appended to the package trailer via TrailerBuilder. ### DumpPayloadToc - This command now dumps info based on the package trailer if one can be found. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
if (NumErrors == Errors.Num())
{
FText Tag = FText::FromString(TEXT("#virtualized"));
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous. #rb PJ.Kack #rnx #preflight 61a4b235405273b2c3daa7c7 ### Virtualization - The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted. - The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again. - This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc. - In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead. - In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm) - For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk. ### Current Issues - The system does not work with the editor domain and has minor issues with text based assets - The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional. ### PackageTrailer - The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer - FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use. - When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire. - the header file contains a breakdown of the new format. ### VirtualizedBulkData - FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file. - FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code. - Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads. - We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave. - Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object. - Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use. - ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs. - We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package. - We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled. - We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code. -- Once the FPackageTrailer feature is no longer optional we can consider changing this. ### SavePackage/SavePackage2 - Removed the functions that would create the FPayloadToc - Added new functions to help create the trailer. - Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer. - We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems. ### LinkerLoad - After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package -- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead. - By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file. ### LinkerSave - We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed. - We now gather data to be appended to the package trailer via TrailerBuilder. ### DumpPayloadToc - This command now dumps info based on the package trailer if one can be found. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
DescriptionTags.Add(Tag);
}
Packages submitted from the editor together will now virtualize their payloads in a single batch rather than one at a time. #rb PJ.Kack #jira UE-136126 #rnx #preflight ### VirtualizationSystem - Added a new overload for Push to VirtualizationSystem that takes an array of FPushRequest, which is a new structure representing a single payload request. - Filtering by package name is currently disabled, this is because the API has been forced into changing and passing the package name in via a FString rather than FPackagePath which means we would need to be more careful. This will be done in a future submit. - The backend interface has been extended to also have a batch version of PushData, by default this will attempt to submit each request one at a time so payloads don't have to try and implement a batched version if there is no need. - The context being passed with a payload when being pushed has been changed from FPackagePath to FString due to include order issues, as the FPackagePath lives in CoreUObject and the API for virtualization lives in Core. Additionally in the future the payloads might not be owned by a package (there is nothing specifically enforcing this) so the context being a string makes more sense. - NOTE: Due to the context change we currently no longer support the filtering feature, which allows for payloads belonging to packages under specific directories to be excluded from virtualization. This is something that will be solved in a future submit. ### SourceControlBackend - Now that we can submit multiple payloads in the same submit, the CL description has been changed slightly. We will now print a list of payload identifiers -> the package trying to submit that payload. This will only tell the users which package originally caused the payload to submit. If a user submits a new package at a later date that contains the same payload we will not be updating the description. ### PackageSubmissionChecks - Converted the submission process to use the new batch push operation in VirtualizationSystem. -- This means that we do a single push and then have to update the package trailers to convert the now pushed payloads from local to virtualized. - Added new define UE_PRECHECK_PAYLOAD_STATUS that makes it easy to toggle off the checks to see which payloads need to be submitted to the persistent backend. This is useful to test if it actually helps speed up the overall operations or if it is faster to just perform the batch push operations on all payloads and check the return values. -- The hope is that over time the submission processes will become fast enough that we can remove the precheck. - Fixed up logging to not always assume more than one package or payload. ### General Notes - Errors and logging is now a bit more vague as we often not just report that X payloads failed etc rather than specific payload identifiers. This probably doesn't affect the user too much since those identifiers as fairly meaningless to them anyway. - The source control submission could be further optimized by first checking the status of the files in thge depot and only then creating/switching workspace etc. - As currently written, we need to load all of the payloads into memory, then the backends will do what they need (in the case of source control this results in the payloads being written to disk then submitted) which could create quite a large memory spike when submitting a large number of packages. -- One solution would be to change the batch push API to take a "payload provider" interface and have the payloads requested as needed rather than passing in the FCompressedBuffer directly. This would let us immediately write the payload to disk for submission then discard it from memory, preventing larger spikes. Although it could cause overhead if there are multiple backends being submitted to. Internally we are unlikely to have more than one backend per storage solution so maybe we should just make it a config option? #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18403735 in //UE5/Release-5.0/... via CL 18403737 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v896-18170469) [CL 18403738 by paul chipchase in ue5-release-engine-test branch]
2021-12-08 02:19:42 -05:00
const double TimeInSeconds = FPlatformTime::Seconds() - StartTime;
UE_LOG(LogVirtualization, Verbose, TEXT("Virtualization pre submit check took %.3f(s)"), TimeInSeconds);
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous. #rb PJ.Kack #rnx #preflight 61a4b235405273b2c3daa7c7 ### Virtualization - The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted. - The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again. - This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc. - In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead. - In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm) - For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk. ### Current Issues - The system does not work with the editor domain and has minor issues with text based assets - The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional. ### PackageTrailer - The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer - FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use. - When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire. - the header file contains a breakdown of the new format. ### VirtualizedBulkData - FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file. - FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code. - Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads. - We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave. - Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object. - Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use. - ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs. - We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package. - We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled. - We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code. -- Once the FPackageTrailer feature is no longer optional we can consider changing this. ### SavePackage/SavePackage2 - Removed the functions that would create the FPayloadToc - Added new functions to help create the trailer. - Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer. - We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems. ### LinkerLoad - After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package -- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead. - By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file. ### LinkerSave - We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed. - We now gather data to be appended to the package trailer via TrailerBuilder. ### DumpPayloadToc - This command now dumps info based on the package trailer if one can be found. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
}
} // namespace UE::Virtualization
#undef LOCTEXT_NAMESPACE
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous. #rb PJ.Kack #rnx #preflight 61a4b235405273b2c3daa7c7 ### Virtualization - The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted. - The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again. - This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc. - In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead. - In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm) - For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk. ### Current Issues - The system does not work with the editor domain and has minor issues with text based assets - The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional. ### PackageTrailer - The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer - FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use. - When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire. - the header file contains a breakdown of the new format. ### VirtualizedBulkData - FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file. - FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code. - Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads. - We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave. - Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object. - Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use. - ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs. - We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package. - We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled. - We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code. -- Once the FPackageTrailer feature is no longer optional we can consider changing this. ### SavePackage/SavePackage2 - Removed the functions that would create the FPayloadToc - Added new functions to help create the trailer. - Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer. - We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems. ### LinkerLoad - After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package -- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead. - By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file. ### LinkerSave - We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed. - We now gather data to be appended to the package trailer via TrailerBuilder. ### DumpPayloadToc - This command now dumps info based on the package trailer if one can be found. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00