Files
UnrealEngineUWP/Engine/Source/Developer/Virtualization/Private/PackageVirtualizationProcess.cpp

537 lines
19 KiB
C++
Raw Normal View History

// Copyright Epic Games, Inc. All Rights Reserved.
#include "PackageVirtualizationProcess.h"
#include "Containers/UnrealString.h"
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous. #rb PJ.Kack #rnx #preflight 61a4b235405273b2c3daa7c7 ### Virtualization - The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted. - The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again. - This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc. - In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead. - In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm) - For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk. ### Current Issues - The system does not work with the editor domain and has minor issues with text based assets - The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional. ### PackageTrailer - The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer - FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use. - When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire. - the header file contains a breakdown of the new format. ### VirtualizedBulkData - FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file. - FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code. - Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads. - We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave. - Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object. - Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use. - ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs. - We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package. - We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled. - We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code. -- Once the FPackageTrailer feature is no longer optional we can consider changing this. ### SavePackage/SavePackage2 - Removed the functions that would create the FPayloadToc - Added new functions to help create the trailer. - Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer. - We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems. ### LinkerLoad - After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package -- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead. - By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file. ### LinkerSave - We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed. - We now gather data to be appended to the package trailer via TrailerBuilder. ### DumpPayloadToc - This command now dumps info based on the package trailer if one can be found. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
#include "HAL/FileManager.h"
#include "HAL/PlatformFileManager.h"
#include "HAL/PlatformTime.h"
#include "Internationalization/Internationalization.h"
#include "Misc/PackageName.h"
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous. #rb PJ.Kack #rnx #preflight 61a4b235405273b2c3daa7c7 ### Virtualization - The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted. - The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again. - This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc. - In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead. - In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm) - For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk. ### Current Issues - The system does not work with the editor domain and has minor issues with text based assets - The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional. ### PackageTrailer - The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer - FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use. - When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire. - the header file contains a breakdown of the new format. ### VirtualizedBulkData - FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file. - FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code. - Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads. - We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave. - Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object. - Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use. - ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs. - We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package. - We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled. - We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code. -- Once the FPackageTrailer feature is no longer optional we can consider changing this. ### SavePackage/SavePackage2 - Removed the functions that would create the FPayloadToc - Added new functions to help create the trailer. - Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer. - We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems. ### LinkerLoad - After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package -- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead. - By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file. ### LinkerSave - We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed. - We now gather data to be appended to the package trailer via TrailerBuilder. ### DumpPayloadToc - This command now dumps info based on the package trailer if one can be found. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
#include "Misc/Paths.h"
#include "Misc/ScopeExit.h"
#include "Misc/ScopedSlowTask.h"
Add a rehydration command to the stand alone virtualization tool, making it easier to reverse the effects of asset virtualization. Unlike previous processes, this one does not require that we load the package and will just manipulate the data storaged in the package trailer. #rb Sebastian.Nordgren #rnx #jira UE-156436 #preflight 62c287f9a3568e30664eb94f ### VA Standalone Tool - We now plan to add much more functionality to the tool than just virtualizing and submitting changelists, so to make this easier I am moving the tool towards a design where it should be fairly easy to add new functionality. - Added FCommand, which is a base class for adding new functionality, simple derive from FCommand and hook it up at the appropriate locations. -- In the future it should be possible for new command types to automatically register themselves to be initiated from the command line. There should be no need to edit UnrealVirtualizationToolApp to add a new command but this will be done as an additional work item. -- At the moment FCommand comes with a number of utility methods to call that cover some common source control commands. -- The original functionality has not yet been moved to the command system and so the code is a little bit weird at the moment. Updating older code to the new system will be done as an additional work item. - FProject/FPlugin have been moved to their own code files. ### Rehydrate Command - The rehydrate command will take a number of packages, check them out of source control and then attempt to virtualize them. - At the moment the chekout logic is fairly basic, we just check out every package supplied, we don't check if the package is virtualized or not yet. This can be improved in additional work items. Ideally by the end of command the only packages that we have checked out should also be rehydrated. - At the moment the command can either take a path of a specific package, a path of a directory to find packages in, or a changelist containing packages that should be rehydrated. - A cleint spec (workspace) can optionally be provided, but if not supplied we will attempt to find a client spec for which to check out the packages. - Currently we will check out the packages to the default change list. ### Rehydrate process - Added the rehydration process in it's own code files in the virtualization module. Like the virtualization process this is exposed in a public header file and no via the Core interface which means it is very specific to our module/implementation. - The process expects that the caller will have checked out any required packages from source control. It will treat being unable to update a package file as an error. - Added PackageUtils.h/.cpp and moved some of the generic code from the virtualization process code there so that it can be shared by the rehydration process. ### Misc Moving away from the using things like FPackagePath as that requires that the correct mount points have been registered for a project and at the moment (with the flakiness of FConfig*) it seems that the best idea would be to prefer absolute file paths where possible. [CL 20982284 by paul chipchase in ue5-main branch]
2022-07-07 06:54:33 -04:00
#include "PackageUtils.h"
#include "Serialization/EditorBulkData.h"
#include "UObject/Linker.h"
#include "UObject/Package.h"
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous. #rb PJ.Kack #rnx #preflight 61a4b235405273b2c3daa7c7 ### Virtualization - The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted. - The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again. - This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc. - In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead. - In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm) - For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk. ### Current Issues - The system does not work with the editor domain and has minor issues with text based assets - The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional. ### PackageTrailer - The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer - FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use. - When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire. - the header file contains a breakdown of the new format. ### VirtualizedBulkData - FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file. - FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code. - Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads. - We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave. - Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object. - Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use. - ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs. - We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package. - We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled. - We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code. -- Once the FPackageTrailer feature is no longer optional we can consider changing this. ### SavePackage/SavePackage2 - Removed the functions that would create the FPayloadToc - Added new functions to help create the trailer. - Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer. - We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems. ### LinkerLoad - After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package -- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead. - By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file. ### LinkerSave - We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed. - We now gather data to be appended to the package trailer via TrailerBuilder. ### DumpPayloadToc - This command now dumps info based on the package trailer if one can be found. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
#include "UObject/PackageResourceManager.h"
#include "UObject/PackageTrailer.h"
#include "UObject/UObjectGlobals.h"
#include "Virtualization/VirtualizationSystem.h"
Temp fix to prevent payloads that should be filtered out from virtualizing from being virtualized if they are already in persistent storage, which was causing unexpected packages to become virtualized. #rb Per.Larsson #rnx #jira UE-156312 #preflight 62a99553943e7bb256fe174c ### Problem - Firstly this fix is a hack and not intended for shipping 5.1, but the real fix is already planned work, which involves moving the filtering from the push process to being applied to each payload when the package is saved, just as the asset filtering is done at the moment. -- Doing the proper fix will take a bit of time and testing so we need a quicker fix so that I can re-enable the virtualization process to unblock some workflows. - All new code is wrapped in a single preprocessor define 'ENABLE_FILTERING_HACK' making the code easy to identify and remove. - Submitting this hack will cause the priority of the real fix to be bumped in the backlog so that we can remove the hack asap. - The problem code comes from a pass we do on the payloads to check if they are already in persistent storage or not. If the payload is then we just change it to be virtualized and don't attempt to push. In addition to fixing the filtering there is also a work item to investigate if we really want to keep this logic or if we want to remove it anyway. This might be looked into before I look into the filtering fix as an alternative way to remove the hack. ### Fix - After we have queried the system to check which of the local payloads are already in persistent storage, BUT before we change them over to be virtualized we now run a new pass, checking which payloads should be filtered. -- This is a bespoke code path expected only to be used here, so in this case it is not in the IVirtualizationSystem API but requires us to cast to FVirtualizationManager to gain access. This should stop anyone else from using the hack before it is removed. - The filtering check mimics the filtering that would be run on a normal push operation. - Once done we can check for any payload that would be filtered, and if so, set the status to not found. This prevents the payload from being changed to be virtualized and means it will be included in the normal push operation where it will be correctly filtered. - None of this is the best way to do it, but it did seem like the easiest way to add the hack in isolation as no existing code was changed, so it should be simple to just 'delete' the hack when ready. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 20667712 via CL 20667730 via CL 20667733 #ROBOMERGE-BOT: UE5 (Release-Engine-Staging -> Main) (v955-20579017) [CL 20668639 by paul chipchase in ue5-main branch]
2022-06-15 08:56:57 -04:00
#include "VirtualizationManager.h"
Add a new overload to IVirtualizationSystem::TryVirtualizePackages, which takes additional options (via a bitfield enum) and returns more info about the resulting process. The original version has been drepcated. #rb Per.Larsson #jira UE-169626 #rnx #preflight 639c4112012902cb8db43e13 - This allows us to provide the user with more ways to customize the virtualization and return more detailed info about it if the calling code wishes to log additional info. In both cases we can extend the options and the data returned without changing the API. - Previously if we virtualized a package that was not checked out in revision control we would warn the user and then skip updating the package file on disk. This means the payloads would be uploaded but the user would be left with no local changes. Since sometimes we know we don't need to check out any package (virtualizing the packages in a change list for example) we don't want to always incur the cost of polling reivision control to see which packages do need checking out. This is why we now allow the caller to request package files be checked out via the new options enum EVirtualizationOptions. -- If the EVirtualizationOptions::Checkout flag is provided we will poll the revision control status of all package files and then check out those which need it. -- We still check if packages can be modified and warn the user if they can't, as package files could be locked in other ways. - Added a new utility function to SourceControlUtilties to make it easier to check out packages. There is similar functionality elsewhere in the code base but the virtualization module is too low level to make use of it. - Updated existing code that calls ::TryVirtualizePackages and add cases of ''using namespace UE::Virtualization' where required to improve readability. - The UnrealVirtualizationTool now supports a new cmdline option "-checkout" that can be used when virtualizing packages. This will checkout any package that was actually virtualized so the result can be saved back out to the workspace domain. This means we no longer require the caller to have checked out the packages before running the tool. [CL 23536832 by paul chipchase in ue5-main branch]
2022-12-16 06:25:07 -05:00
#include "VirtualizationSourceControlUtilities.h"
#include "VirtualizationUtilities.h"
Add a rehydration command to the stand alone virtualization tool, making it easier to reverse the effects of asset virtualization. Unlike previous processes, this one does not require that we load the package and will just manipulate the data storaged in the package trailer. #rb Sebastian.Nordgren #rnx #jira UE-156436 #preflight 62c287f9a3568e30664eb94f ### VA Standalone Tool - We now plan to add much more functionality to the tool than just virtualizing and submitting changelists, so to make this easier I am moving the tool towards a design where it should be fairly easy to add new functionality. - Added FCommand, which is a base class for adding new functionality, simple derive from FCommand and hook it up at the appropriate locations. -- In the future it should be possible for new command types to automatically register themselves to be initiated from the command line. There should be no need to edit UnrealVirtualizationToolApp to add a new command but this will be done as an additional work item. -- At the moment FCommand comes with a number of utility methods to call that cover some common source control commands. -- The original functionality has not yet been moved to the command system and so the code is a little bit weird at the moment. Updating older code to the new system will be done as an additional work item. - FProject/FPlugin have been moved to their own code files. ### Rehydrate Command - The rehydrate command will take a number of packages, check them out of source control and then attempt to virtualize them. - At the moment the chekout logic is fairly basic, we just check out every package supplied, we don't check if the package is virtualized or not yet. This can be improved in additional work items. Ideally by the end of command the only packages that we have checked out should also be rehydrated. - At the moment the command can either take a path of a specific package, a path of a directory to find packages in, or a changelist containing packages that should be rehydrated. - A cleint spec (workspace) can optionally be provided, but if not supplied we will attempt to find a client spec for which to check out the packages. - Currently we will check out the packages to the default change list. ### Rehydrate process - Added the rehydration process in it's own code files in the virtualization module. Like the virtualization process this is exposed in a public header file and no via the Core interface which means it is very specific to our module/implementation. - The process expects that the caller will have checked out any required packages from source control. It will treat being unable to update a package file as an error. - Added PackageUtils.h/.cpp and moved some of the generic code from the virtualization process code there so that it can be shared by the rehydration process. ### Misc Moving away from the using things like FPackagePath as that requires that the correct mount points have been registered for a project and at the moment (with the flakiness of FConfig*) it seems that the best idea would be to prefer absolute file paths where possible. [CL 20982284 by paul chipchase in ue5-main branch]
2022-07-07 06:54:33 -04:00
#define LOCTEXT_NAMESPACE "Virtualization"
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous. #rb PJ.Kack #rnx #preflight 61a4b235405273b2c3daa7c7 ### Virtualization - The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted. - The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again. - This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc. - In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead. - In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm) - For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk. ### Current Issues - The system does not work with the editor domain and has minor issues with text based assets - The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional. ### PackageTrailer - The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer - FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use. - When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire. - the header file contains a breakdown of the new format. ### VirtualizedBulkData - FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file. - FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code. - Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads. - We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave. - Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object. - Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use. - ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs. - We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package. - We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled. - We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code. -- Once the FPackageTrailer feature is no longer optional we can consider changing this. ### SavePackage/SavePackage2 - Removed the functions that would create the FPayloadToc - Added new functions to help create the trailer. - Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer. - We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems. ### LinkerLoad - After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package -- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead. - By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file. ### LinkerSave - We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed. - We now gather data to be appended to the package trailer via TrailerBuilder. ### DumpPayloadToc - This command now dumps info based on the package trailer if one can be found. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
namespace UE::Virtualization
{
When virtualizing payloads we now load the payloads from disk on demand rather than all at once. This allows us to avoid memory spikes when virtualizing large amounts of data. #rb Per.Larsson #rnx #jira UE-151000 #preflight 627271c83b2ed6f85363f53a - In the old code when submitting packages for virtualization we would load all local payloads into memory then pass them to the virtualization system to be pushed to persistent storage. This works fine for an average users submits but when resaving large projects we can have submits with thousands of packages containing gigabytes of payloads. - FPushRequest has been changed to accept either a payload directly (in the form of FCompressedBuffer) or accept a reference to a IPayloadProvider which will return the payload when asked. - Most of the existing backends make no use of the batched pushing feature and only push one payload at a time. The default implementation of this will be to ask each request one at a time for the payload and then pass it to the backend for a single push. If the IPayloadProvider is being used we will therefor only ever have one payload loaded at a time avoiding memory spikes. - The source control backend currently does implement the batched pushing feature. This backend writes each payload to disk before submitting them to perforce, so all we have to do is make sure that request one payload, write it to disk, then discard the payload to avoid memory spikes. - This change required that FPushRequest make it's members private and provide accessors to them, as the caller shouldn't need to care if the payload is loaded or comes via the provider. - NOTE that this implementation is less efficient if we have packages containing many different payloads as we will end up opening the package, parsing the trailer then load a payload many times over, where as the original code would load all payloads from the file in one go. In practice, given the overhead of the source control backend this didn't make a huge difference and is probably not worth the effort to optimize at this point. [CL 20040609 by paul chipchase in ue5-main branch]
2022-05-04 08:58:50 -04:00
/**
* Implementation of the IPayloadProvider interface so that payloads can be requested on demand
* when they are being virtualized.
*
* This implementation is not optimized. If a package holds many payloads that are all virtualized
* we will end up loading the same trailer over and over, as well as opening the same package file
* for read many times.
*
* So far this has shown to be a rounding error compared to the actual cost of virtualization
* and so implementing any level of caching has been left as a future task.
Virtualizing payloads when submitting via the editor will now correctly push the payloads to cache storage backends as well as persistent backends #rb Per.Larsson #jira UE-160943, UE-151671 #rnx #preflight 62fa3334153b17e7462954b3 ### Problem - The virtualization process first checks to see if any of the local payloads are already stored in persistent storage and only try to submit those which aren't, which would mean if we are submitting a package with a local payload that is already in the persistent storage system but not in cached storage we have no way to cache it. - This logic also introduced problems where filtered out payloads were being virtualized which was fixed with the addition of ENABLE_FILTERING_HACK which was not a robust long term solution. - Checking if the payloads need to be pushed or not complicated the logic of the virtualization process and makes it harder to understand and make fixes too. ### Fix - Removed code for ENABLE_FILTERING_HACK and UE_PRECHECK_PAYLOAD_STATUS from the virtualization process - Removed some code loops from the virtualization process but we still end up with one loop finding all of the package trailers and a second loop setting up the FWorkspaceDomainPayloadProvider -- This second loop will be removed in a future work item to avoid trying to push duplicate payloads (unless I end up doing that work at the virtualization manager level instead) - We make one push for cached storage and another for persistent -- Cached storage push failure will only result in a warning, failing the persistent push will error out as before -- We need to reset the request states after the cache push. In a future work item we will likely allow a single push to specific both storage solutions, so I am leaving this code using the same set of requests for both pushes. #robomerge FNMain [CL 21386503 by paul chipchase in ue5-main branch]
2022-08-15 10:17:46 -04:00
*
* TODO: Implement a MRU cache for payloads to prevent loading the same payload off disk many
* times for different backends if it will not cause a huge memory spike.
When virtualizing payloads we now load the payloads from disk on demand rather than all at once. This allows us to avoid memory spikes when virtualizing large amounts of data. #rb Per.Larsson #rnx #jira UE-151000 #preflight 627271c83b2ed6f85363f53a - In the old code when submitting packages for virtualization we would load all local payloads into memory then pass them to the virtualization system to be pushed to persistent storage. This works fine for an average users submits but when resaving large projects we can have submits with thousands of packages containing gigabytes of payloads. - FPushRequest has been changed to accept either a payload directly (in the form of FCompressedBuffer) or accept a reference to a IPayloadProvider which will return the payload when asked. - Most of the existing backends make no use of the batched pushing feature and only push one payload at a time. The default implementation of this will be to ask each request one at a time for the payload and then pass it to the backend for a single push. If the IPayloadProvider is being used we will therefor only ever have one payload loaded at a time avoiding memory spikes. - The source control backend currently does implement the batched pushing feature. This backend writes each payload to disk before submitting them to perforce, so all we have to do is make sure that request one payload, write it to disk, then discard the payload to avoid memory spikes. - This change required that FPushRequest make it's members private and provide accessors to them, as the caller shouldn't need to care if the payload is loaded or comes via the provider. - NOTE that this implementation is less efficient if we have packages containing many different payloads as we will end up opening the package, parsing the trailer then load a payload many times over, where as the original code would load all payloads from the file in one go. In practice, given the overhead of the source control backend this didn't make a huge difference and is probably not worth the effort to optimize at this point. [CL 20040609 by paul chipchase in ue5-main branch]
2022-05-04 08:58:50 -04:00
*/
class FWorkspaceDomainPayloadProvider final : public IPayloadProvider
{
public:
FWorkspaceDomainPayloadProvider() = default;
virtual ~FWorkspaceDomainPayloadProvider() = default;
/** Register the payload with it's trailer and package name so that we can access it later as needed */
void RegisterPayload(const FIoHash& PayloadId, uint64 SizeOnDisk, const FString& PackageName)
{
if (!PayloadId.IsZero())
{
PayloadLookupTable.Emplace(PayloadId, FPayloadData(SizeOnDisk, PackageName));
}
}
private:
virtual FCompressedBuffer RequestPayload(const FIoHash& Identifier) override
{
if (Identifier.IsZero())
{
return FCompressedBuffer();
}
const FPayloadData* Data = PayloadLookupTable.Find(Identifier);
if (Data == nullptr)
{
Temp fix to prevent payloads that should be filtered out from virtualizing from being virtualized if they are already in persistent storage, which was causing unexpected packages to become virtualized. #rb Per.Larsson #rnx #jira UE-156312 #preflight 62a99553943e7bb256fe174c ### Problem - Firstly this fix is a hack and not intended for shipping 5.1, but the real fix is already planned work, which involves moving the filtering from the push process to being applied to each payload when the package is saved, just as the asset filtering is done at the moment. -- Doing the proper fix will take a bit of time and testing so we need a quicker fix so that I can re-enable the virtualization process to unblock some workflows. - All new code is wrapped in a single preprocessor define 'ENABLE_FILTERING_HACK' making the code easy to identify and remove. - Submitting this hack will cause the priority of the real fix to be bumped in the backlog so that we can remove the hack asap. - The problem code comes from a pass we do on the payloads to check if they are already in persistent storage or not. If the payload is then we just change it to be virtualized and don't attempt to push. In addition to fixing the filtering there is also a work item to investigate if we really want to keep this logic or if we want to remove it anyway. This might be looked into before I look into the filtering fix as an alternative way to remove the hack. ### Fix - After we have queried the system to check which of the local payloads are already in persistent storage, BUT before we change them over to be virtualized we now run a new pass, checking which payloads should be filtered. -- This is a bespoke code path expected only to be used here, so in this case it is not in the IVirtualizationSystem API but requires us to cast to FVirtualizationManager to gain access. This should stop anyone else from using the hack before it is removed. - The filtering check mimics the filtering that would be run on a normal push operation. - Once done we can check for any payload that would be filtered, and if so, set the status to not found. This prevents the payload from being changed to be virtualized and means it will be included in the normal push operation where it will be correctly filtered. - None of this is the best way to do it, but it did seem like the easiest way to add the hack in isolation as no existing code was changed, so it should be simple to just 'delete' the hack when ready. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 20667712 via CL 20667730 via CL 20667733 #ROBOMERGE-BOT: UE5 (Release-Engine-Staging -> Main) (v955-20579017) [CL 20668639 by paul chipchase in ue5-main branch]
2022-06-15 08:56:57 -04:00
UE_LOG(LogVirtualization, Error, TEXT("FWorkspaceDomainPayloadProvider was unable to find a payload with the identifier '%s'"),
When virtualizing payloads we now load the payloads from disk on demand rather than all at once. This allows us to avoid memory spikes when virtualizing large amounts of data. #rb Per.Larsson #rnx #jira UE-151000 #preflight 627271c83b2ed6f85363f53a - In the old code when submitting packages for virtualization we would load all local payloads into memory then pass them to the virtualization system to be pushed to persistent storage. This works fine for an average users submits but when resaving large projects we can have submits with thousands of packages containing gigabytes of payloads. - FPushRequest has been changed to accept either a payload directly (in the form of FCompressedBuffer) or accept a reference to a IPayloadProvider which will return the payload when asked. - Most of the existing backends make no use of the batched pushing feature and only push one payload at a time. The default implementation of this will be to ask each request one at a time for the payload and then pass it to the backend for a single push. If the IPayloadProvider is being used we will therefor only ever have one payload loaded at a time avoiding memory spikes. - The source control backend currently does implement the batched pushing feature. This backend writes each payload to disk before submitting them to perforce, so all we have to do is make sure that request one payload, write it to disk, then discard the payload to avoid memory spikes. - This change required that FPushRequest make it's members private and provide accessors to them, as the caller shouldn't need to care if the payload is loaded or comes via the provider. - NOTE that this implementation is less efficient if we have packages containing many different payloads as we will end up opening the package, parsing the trailer then load a payload many times over, where as the original code would load all payloads from the file in one go. In practice, given the overhead of the source control backend this didn't make a huge difference and is probably not worth the effort to optimize at this point. [CL 20040609 by paul chipchase in ue5-main branch]
2022-05-04 08:58:50 -04:00
*LexToString(Identifier));
return FCompressedBuffer();
}
TUniquePtr<FArchive> PackageAr = IPackageResourceManager::Get().OpenReadExternalResource(EPackageExternalResource::WorkspaceDomainFile, *Data->PackageName);
if (!PackageAr.IsValid())
{
Temp fix to prevent payloads that should be filtered out from virtualizing from being virtualized if they are already in persistent storage, which was causing unexpected packages to become virtualized. #rb Per.Larsson #rnx #jira UE-156312 #preflight 62a99553943e7bb256fe174c ### Problem - Firstly this fix is a hack and not intended for shipping 5.1, but the real fix is already planned work, which involves moving the filtering from the push process to being applied to each payload when the package is saved, just as the asset filtering is done at the moment. -- Doing the proper fix will take a bit of time and testing so we need a quicker fix so that I can re-enable the virtualization process to unblock some workflows. - All new code is wrapped in a single preprocessor define 'ENABLE_FILTERING_HACK' making the code easy to identify and remove. - Submitting this hack will cause the priority of the real fix to be bumped in the backlog so that we can remove the hack asap. - The problem code comes from a pass we do on the payloads to check if they are already in persistent storage or not. If the payload is then we just change it to be virtualized and don't attempt to push. In addition to fixing the filtering there is also a work item to investigate if we really want to keep this logic or if we want to remove it anyway. This might be looked into before I look into the filtering fix as an alternative way to remove the hack. ### Fix - After we have queried the system to check which of the local payloads are already in persistent storage, BUT before we change them over to be virtualized we now run a new pass, checking which payloads should be filtered. -- This is a bespoke code path expected only to be used here, so in this case it is not in the IVirtualizationSystem API but requires us to cast to FVirtualizationManager to gain access. This should stop anyone else from using the hack before it is removed. - The filtering check mimics the filtering that would be run on a normal push operation. - Once done we can check for any payload that would be filtered, and if so, set the status to not found. This prevents the payload from being changed to be virtualized and means it will be included in the normal push operation where it will be correctly filtered. - None of this is the best way to do it, but it did seem like the easiest way to add the hack in isolation as no existing code was changed, so it should be simple to just 'delete' the hack when ready. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 20667712 via CL 20667730 via CL 20667733 #ROBOMERGE-BOT: UE5 (Release-Engine-Staging -> Main) (v955-20579017) [CL 20668639 by paul chipchase in ue5-main branch]
2022-06-15 08:56:57 -04:00
UE_LOG(LogVirtualization, Error, TEXT("FWorkspaceDomainPayloadProvider was unable to open the package '%s' for reading"),
When virtualizing payloads we now load the payloads from disk on demand rather than all at once. This allows us to avoid memory spikes when virtualizing large amounts of data. #rb Per.Larsson #rnx #jira UE-151000 #preflight 627271c83b2ed6f85363f53a - In the old code when submitting packages for virtualization we would load all local payloads into memory then pass them to the virtualization system to be pushed to persistent storage. This works fine for an average users submits but when resaving large projects we can have submits with thousands of packages containing gigabytes of payloads. - FPushRequest has been changed to accept either a payload directly (in the form of FCompressedBuffer) or accept a reference to a IPayloadProvider which will return the payload when asked. - Most of the existing backends make no use of the batched pushing feature and only push one payload at a time. The default implementation of this will be to ask each request one at a time for the payload and then pass it to the backend for a single push. If the IPayloadProvider is being used we will therefor only ever have one payload loaded at a time avoiding memory spikes. - The source control backend currently does implement the batched pushing feature. This backend writes each payload to disk before submitting them to perforce, so all we have to do is make sure that request one payload, write it to disk, then discard the payload to avoid memory spikes. - This change required that FPushRequest make it's members private and provide accessors to them, as the caller shouldn't need to care if the payload is loaded or comes via the provider. - NOTE that this implementation is less efficient if we have packages containing many different payloads as we will end up opening the package, parsing the trailer then load a payload many times over, where as the original code would load all payloads from the file in one go. In practice, given the overhead of the source control backend this didn't make a huge difference and is probably not worth the effort to optimize at this point. [CL 20040609 by paul chipchase in ue5-main branch]
2022-05-04 08:58:50 -04:00
*Data->PackageName);
return FCompressedBuffer();
}
PackageAr->Seek(PackageAr->TotalSize());
FPackageTrailer Trailer;
if (!Trailer.TryLoadBackwards(*PackageAr))
{
Temp fix to prevent payloads that should be filtered out from virtualizing from being virtualized if they are already in persistent storage, which was causing unexpected packages to become virtualized. #rb Per.Larsson #rnx #jira UE-156312 #preflight 62a99553943e7bb256fe174c ### Problem - Firstly this fix is a hack and not intended for shipping 5.1, but the real fix is already planned work, which involves moving the filtering from the push process to being applied to each payload when the package is saved, just as the asset filtering is done at the moment. -- Doing the proper fix will take a bit of time and testing so we need a quicker fix so that I can re-enable the virtualization process to unblock some workflows. - All new code is wrapped in a single preprocessor define 'ENABLE_FILTERING_HACK' making the code easy to identify and remove. - Submitting this hack will cause the priority of the real fix to be bumped in the backlog so that we can remove the hack asap. - The problem code comes from a pass we do on the payloads to check if they are already in persistent storage or not. If the payload is then we just change it to be virtualized and don't attempt to push. In addition to fixing the filtering there is also a work item to investigate if we really want to keep this logic or if we want to remove it anyway. This might be looked into before I look into the filtering fix as an alternative way to remove the hack. ### Fix - After we have queried the system to check which of the local payloads are already in persistent storage, BUT before we change them over to be virtualized we now run a new pass, checking which payloads should be filtered. -- This is a bespoke code path expected only to be used here, so in this case it is not in the IVirtualizationSystem API but requires us to cast to FVirtualizationManager to gain access. This should stop anyone else from using the hack before it is removed. - The filtering check mimics the filtering that would be run on a normal push operation. - Once done we can check for any payload that would be filtered, and if so, set the status to not found. This prevents the payload from being changed to be virtualized and means it will be included in the normal push operation where it will be correctly filtered. - None of this is the best way to do it, but it did seem like the easiest way to add the hack in isolation as no existing code was changed, so it should be simple to just 'delete' the hack when ready. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 20667712 via CL 20667730 via CL 20667733 #ROBOMERGE-BOT: UE5 (Release-Engine-Staging -> Main) (v955-20579017) [CL 20668639 by paul chipchase in ue5-main branch]
2022-06-15 08:56:57 -04:00
UE_LOG(LogVirtualization, Error, TEXT("FWorkspaceDomainPayloadProvider failed to load the package trailer from the package '%s'"),
When virtualizing payloads we now load the payloads from disk on demand rather than all at once. This allows us to avoid memory spikes when virtualizing large amounts of data. #rb Per.Larsson #rnx #jira UE-151000 #preflight 627271c83b2ed6f85363f53a - In the old code when submitting packages for virtualization we would load all local payloads into memory then pass them to the virtualization system to be pushed to persistent storage. This works fine for an average users submits but when resaving large projects we can have submits with thousands of packages containing gigabytes of payloads. - FPushRequest has been changed to accept either a payload directly (in the form of FCompressedBuffer) or accept a reference to a IPayloadProvider which will return the payload when asked. - Most of the existing backends make no use of the batched pushing feature and only push one payload at a time. The default implementation of this will be to ask each request one at a time for the payload and then pass it to the backend for a single push. If the IPayloadProvider is being used we will therefor only ever have one payload loaded at a time avoiding memory spikes. - The source control backend currently does implement the batched pushing feature. This backend writes each payload to disk before submitting them to perforce, so all we have to do is make sure that request one payload, write it to disk, then discard the payload to avoid memory spikes. - This change required that FPushRequest make it's members private and provide accessors to them, as the caller shouldn't need to care if the payload is loaded or comes via the provider. - NOTE that this implementation is less efficient if we have packages containing many different payloads as we will end up opening the package, parsing the trailer then load a payload many times over, where as the original code would load all payloads from the file in one go. In practice, given the overhead of the source control backend this didn't make a huge difference and is probably not worth the effort to optimize at this point. [CL 20040609 by paul chipchase in ue5-main branch]
2022-05-04 08:58:50 -04:00
*Data->PackageName);
return FCompressedBuffer();
}
FCompressedBuffer Payload = Trailer.LoadLocalPayload(Identifier, *PackageAr);
if (!Payload)
{
Temp fix to prevent payloads that should be filtered out from virtualizing from being virtualized if they are already in persistent storage, which was causing unexpected packages to become virtualized. #rb Per.Larsson #rnx #jira UE-156312 #preflight 62a99553943e7bb256fe174c ### Problem - Firstly this fix is a hack and not intended for shipping 5.1, but the real fix is already planned work, which involves moving the filtering from the push process to being applied to each payload when the package is saved, just as the asset filtering is done at the moment. -- Doing the proper fix will take a bit of time and testing so we need a quicker fix so that I can re-enable the virtualization process to unblock some workflows. - All new code is wrapped in a single preprocessor define 'ENABLE_FILTERING_HACK' making the code easy to identify and remove. - Submitting this hack will cause the priority of the real fix to be bumped in the backlog so that we can remove the hack asap. - The problem code comes from a pass we do on the payloads to check if they are already in persistent storage or not. If the payload is then we just change it to be virtualized and don't attempt to push. In addition to fixing the filtering there is also a work item to investigate if we really want to keep this logic or if we want to remove it anyway. This might be looked into before I look into the filtering fix as an alternative way to remove the hack. ### Fix - After we have queried the system to check which of the local payloads are already in persistent storage, BUT before we change them over to be virtualized we now run a new pass, checking which payloads should be filtered. -- This is a bespoke code path expected only to be used here, so in this case it is not in the IVirtualizationSystem API but requires us to cast to FVirtualizationManager to gain access. This should stop anyone else from using the hack before it is removed. - The filtering check mimics the filtering that would be run on a normal push operation. - Once done we can check for any payload that would be filtered, and if so, set the status to not found. This prevents the payload from being changed to be virtualized and means it will be included in the normal push operation where it will be correctly filtered. - None of this is the best way to do it, but it did seem like the easiest way to add the hack in isolation as no existing code was changed, so it should be simple to just 'delete' the hack when ready. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 20667712 via CL 20667730 via CL 20667733 #ROBOMERGE-BOT: UE5 (Release-Engine-Staging -> Main) (v955-20579017) [CL 20668639 by paul chipchase in ue5-main branch]
2022-06-15 08:56:57 -04:00
UE_LOG(LogVirtualization, Error, TEXT("FWorkspaceDomainPayloadProvider was uanble to load the payload '%s' from the package '%s'"),
When virtualizing payloads we now load the payloads from disk on demand rather than all at once. This allows us to avoid memory spikes when virtualizing large amounts of data. #rb Per.Larsson #rnx #jira UE-151000 #preflight 627271c83b2ed6f85363f53a - In the old code when submitting packages for virtualization we would load all local payloads into memory then pass them to the virtualization system to be pushed to persistent storage. This works fine for an average users submits but when resaving large projects we can have submits with thousands of packages containing gigabytes of payloads. - FPushRequest has been changed to accept either a payload directly (in the form of FCompressedBuffer) or accept a reference to a IPayloadProvider which will return the payload when asked. - Most of the existing backends make no use of the batched pushing feature and only push one payload at a time. The default implementation of this will be to ask each request one at a time for the payload and then pass it to the backend for a single push. If the IPayloadProvider is being used we will therefor only ever have one payload loaded at a time avoiding memory spikes. - The source control backend currently does implement the batched pushing feature. This backend writes each payload to disk before submitting them to perforce, so all we have to do is make sure that request one payload, write it to disk, then discard the payload to avoid memory spikes. - This change required that FPushRequest make it's members private and provide accessors to them, as the caller shouldn't need to care if the payload is loaded or comes via the provider. - NOTE that this implementation is less efficient if we have packages containing many different payloads as we will end up opening the package, parsing the trailer then load a payload many times over, where as the original code would load all payloads from the file in one go. In practice, given the overhead of the source control backend this didn't make a huge difference and is probably not worth the effort to optimize at this point. [CL 20040609 by paul chipchase in ue5-main branch]
2022-05-04 08:58:50 -04:00
*LexToString(Identifier),
*Data->PackageName);
return FCompressedBuffer();
}
if (Identifier != FIoHash(Payload.GetRawHash()))
{
Temp fix to prevent payloads that should be filtered out from virtualizing from being virtualized if they are already in persistent storage, which was causing unexpected packages to become virtualized. #rb Per.Larsson #rnx #jira UE-156312 #preflight 62a99553943e7bb256fe174c ### Problem - Firstly this fix is a hack and not intended for shipping 5.1, but the real fix is already planned work, which involves moving the filtering from the push process to being applied to each payload when the package is saved, just as the asset filtering is done at the moment. -- Doing the proper fix will take a bit of time and testing so we need a quicker fix so that I can re-enable the virtualization process to unblock some workflows. - All new code is wrapped in a single preprocessor define 'ENABLE_FILTERING_HACK' making the code easy to identify and remove. - Submitting this hack will cause the priority of the real fix to be bumped in the backlog so that we can remove the hack asap. - The problem code comes from a pass we do on the payloads to check if they are already in persistent storage or not. If the payload is then we just change it to be virtualized and don't attempt to push. In addition to fixing the filtering there is also a work item to investigate if we really want to keep this logic or if we want to remove it anyway. This might be looked into before I look into the filtering fix as an alternative way to remove the hack. ### Fix - After we have queried the system to check which of the local payloads are already in persistent storage, BUT before we change them over to be virtualized we now run a new pass, checking which payloads should be filtered. -- This is a bespoke code path expected only to be used here, so in this case it is not in the IVirtualizationSystem API but requires us to cast to FVirtualizationManager to gain access. This should stop anyone else from using the hack before it is removed. - The filtering check mimics the filtering that would be run on a normal push operation. - Once done we can check for any payload that would be filtered, and if so, set the status to not found. This prevents the payload from being changed to be virtualized and means it will be included in the normal push operation where it will be correctly filtered. - None of this is the best way to do it, but it did seem like the easiest way to add the hack in isolation as no existing code was changed, so it should be simple to just 'delete' the hack when ready. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 20667712 via CL 20667730 via CL 20667733 #ROBOMERGE-BOT: UE5 (Release-Engine-Staging -> Main) (v955-20579017) [CL 20668639 by paul chipchase in ue5-main branch]
2022-06-15 08:56:57 -04:00
UE_LOG(LogVirtualization, Error, TEXT("FWorkspaceDomainPayloadProvider loaded an incorrect payload from the package '%s'. Expected '%s' Loaded '%s'"),
When virtualizing payloads we now load the payloads from disk on demand rather than all at once. This allows us to avoid memory spikes when virtualizing large amounts of data. #rb Per.Larsson #rnx #jira UE-151000 #preflight 627271c83b2ed6f85363f53a - In the old code when submitting packages for virtualization we would load all local payloads into memory then pass them to the virtualization system to be pushed to persistent storage. This works fine for an average users submits but when resaving large projects we can have submits with thousands of packages containing gigabytes of payloads. - FPushRequest has been changed to accept either a payload directly (in the form of FCompressedBuffer) or accept a reference to a IPayloadProvider which will return the payload when asked. - Most of the existing backends make no use of the batched pushing feature and only push one payload at a time. The default implementation of this will be to ask each request one at a time for the payload and then pass it to the backend for a single push. If the IPayloadProvider is being used we will therefor only ever have one payload loaded at a time avoiding memory spikes. - The source control backend currently does implement the batched pushing feature. This backend writes each payload to disk before submitting them to perforce, so all we have to do is make sure that request one payload, write it to disk, then discard the payload to avoid memory spikes. - This change required that FPushRequest make it's members private and provide accessors to them, as the caller shouldn't need to care if the payload is loaded or comes via the provider. - NOTE that this implementation is less efficient if we have packages containing many different payloads as we will end up opening the package, parsing the trailer then load a payload many times over, where as the original code would load all payloads from the file in one go. In practice, given the overhead of the source control backend this didn't make a huge difference and is probably not worth the effort to optimize at this point. [CL 20040609 by paul chipchase in ue5-main branch]
2022-05-04 08:58:50 -04:00
*Data->PackageName,
*LexToString(Identifier),
*LexToString(Payload.GetRawHash()));
return FCompressedBuffer();
}
return Payload;
}
virtual uint64 GetPayloadSize(const FIoHash& Identifier) override
{
if (Identifier.IsZero())
{
return 0;
}
const FPayloadData* Data = PayloadLookupTable.Find(Identifier);
if (Data != nullptr)
{
return Data->SizeOnDisk;
}
else
{
Temp fix to prevent payloads that should be filtered out from virtualizing from being virtualized if they are already in persistent storage, which was causing unexpected packages to become virtualized. #rb Per.Larsson #rnx #jira UE-156312 #preflight 62a99553943e7bb256fe174c ### Problem - Firstly this fix is a hack and not intended for shipping 5.1, but the real fix is already planned work, which involves moving the filtering from the push process to being applied to each payload when the package is saved, just as the asset filtering is done at the moment. -- Doing the proper fix will take a bit of time and testing so we need a quicker fix so that I can re-enable the virtualization process to unblock some workflows. - All new code is wrapped in a single preprocessor define 'ENABLE_FILTERING_HACK' making the code easy to identify and remove. - Submitting this hack will cause the priority of the real fix to be bumped in the backlog so that we can remove the hack asap. - The problem code comes from a pass we do on the payloads to check if they are already in persistent storage or not. If the payload is then we just change it to be virtualized and don't attempt to push. In addition to fixing the filtering there is also a work item to investigate if we really want to keep this logic or if we want to remove it anyway. This might be looked into before I look into the filtering fix as an alternative way to remove the hack. ### Fix - After we have queried the system to check which of the local payloads are already in persistent storage, BUT before we change them over to be virtualized we now run a new pass, checking which payloads should be filtered. -- This is a bespoke code path expected only to be used here, so in this case it is not in the IVirtualizationSystem API but requires us to cast to FVirtualizationManager to gain access. This should stop anyone else from using the hack before it is removed. - The filtering check mimics the filtering that would be run on a normal push operation. - Once done we can check for any payload that would be filtered, and if so, set the status to not found. This prevents the payload from being changed to be virtualized and means it will be included in the normal push operation where it will be correctly filtered. - None of this is the best way to do it, but it did seem like the easiest way to add the hack in isolation as no existing code was changed, so it should be simple to just 'delete' the hack when ready. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 20667712 via CL 20667730 via CL 20667733 #ROBOMERGE-BOT: UE5 (Release-Engine-Staging -> Main) (v955-20579017) [CL 20668639 by paul chipchase in ue5-main branch]
2022-06-15 08:56:57 -04:00
UE_LOG(LogVirtualization, Error, TEXT("FWorkspaceDomainPayloadProvider was unable to find a payload with the identifier '%s'"),
When virtualizing payloads we now load the payloads from disk on demand rather than all at once. This allows us to avoid memory spikes when virtualizing large amounts of data. #rb Per.Larsson #rnx #jira UE-151000 #preflight 627271c83b2ed6f85363f53a - In the old code when submitting packages for virtualization we would load all local payloads into memory then pass them to the virtualization system to be pushed to persistent storage. This works fine for an average users submits but when resaving large projects we can have submits with thousands of packages containing gigabytes of payloads. - FPushRequest has been changed to accept either a payload directly (in the form of FCompressedBuffer) or accept a reference to a IPayloadProvider which will return the payload when asked. - Most of the existing backends make no use of the batched pushing feature and only push one payload at a time. The default implementation of this will be to ask each request one at a time for the payload and then pass it to the backend for a single push. If the IPayloadProvider is being used we will therefor only ever have one payload loaded at a time avoiding memory spikes. - The source control backend currently does implement the batched pushing feature. This backend writes each payload to disk before submitting them to perforce, so all we have to do is make sure that request one payload, write it to disk, then discard the payload to avoid memory spikes. - This change required that FPushRequest make it's members private and provide accessors to them, as the caller shouldn't need to care if the payload is loaded or comes via the provider. - NOTE that this implementation is less efficient if we have packages containing many different payloads as we will end up opening the package, parsing the trailer then load a payload many times over, where as the original code would load all payloads from the file in one go. In practice, given the overhead of the source control backend this didn't make a huge difference and is probably not worth the effort to optimize at this point. [CL 20040609 by paul chipchase in ue5-main branch]
2022-05-04 08:58:50 -04:00
*LexToString(Identifier));
return 0;
}
}
/* This structure holds additional info about the payload that we might need later */
struct FPayloadData
{
FPayloadData(uint64 InSizeOnDisk, const FString& InPackageName)
: SizeOnDisk(InSizeOnDisk)
, PackageName(InPackageName)
{
}
uint64 SizeOnDisk;
FString PackageName;
};
TMap<FIoHash, FPayloadData> PayloadLookupTable;
};
Add a new overload to IVirtualizationSystem::TryVirtualizePackages, which takes additional options (via a bitfield enum) and returns more info about the resulting process. The original version has been drepcated. #rb Per.Larsson #jira UE-169626 #rnx #preflight 639c4112012902cb8db43e13 - This allows us to provide the user with more ways to customize the virtualization and return more detailed info about it if the calling code wishes to log additional info. In both cases we can extend the options and the data returned without changing the API. - Previously if we virtualized a package that was not checked out in revision control we would warn the user and then skip updating the package file on disk. This means the payloads would be uploaded but the user would be left with no local changes. Since sometimes we know we don't need to check out any package (virtualizing the packages in a change list for example) we don't want to always incur the cost of polling reivision control to see which packages do need checking out. This is why we now allow the caller to request package files be checked out via the new options enum EVirtualizationOptions. -- If the EVirtualizationOptions::Checkout flag is provided we will poll the revision control status of all package files and then check out those which need it. -- We still check if packages can be modified and warn the user if they can't, as package files could be locked in other ways. - Added a new utility function to SourceControlUtilties to make it easier to check out packages. There is similar functionality elsewhere in the code base but the virtualization module is too low level to make use of it. - Updated existing code that calls ::TryVirtualizePackages and add cases of ''using namespace UE::Virtualization' where required to improve readability. - The UnrealVirtualizationTool now supports a new cmdline option "-checkout" that can be used when virtualizing packages. This will checkout any package that was actually virtualized so the result can be saved back out to the workspace domain. This means we no longer require the caller to have checked out the packages before running the tool. [CL 23536832 by paul chipchase in ue5-main branch]
2022-12-16 06:25:07 -05:00
void VirtualizePackages(TConstArrayView<FString> PackagePaths, EVirtualizationOptions Options, FVirtualizationResult& OutResultInfo)
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous. #rb PJ.Kack #rnx #preflight 61a4b235405273b2c3daa7c7 ### Virtualization - The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted. - The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again. - This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc. - In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead. - In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm) - For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk. ### Current Issues - The system does not work with the editor domain and has minor issues with text based assets - The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional. ### PackageTrailer - The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer - FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use. - When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire. - the header file contains a breakdown of the new format. ### VirtualizedBulkData - FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file. - FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code. - Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads. - We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave. - Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object. - Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use. - ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs. - We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package. - We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled. - We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code. -- Once the FPackageTrailer feature is no longer optional we can consider changing this. ### SavePackage/SavePackage2 - Removed the functions that would create the FPayloadToc - Added new functions to help create the trailer. - Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer. - We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems. ### LinkerLoad - After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package -- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead. - By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file. ### LinkerSave - We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed. - We now gather data to be appended to the package trailer via TrailerBuilder. ### DumpPayloadToc - This command now dumps info based on the package trailer if one can be found. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
{
TRACE_CPUPROFILER_EVENT_SCOPE(UE::Virtualization::VirtualizePackages);
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous. #rb PJ.Kack #rnx #preflight 61a4b235405273b2c3daa7c7 ### Virtualization - The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted. - The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again. - This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc. - In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead. - In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm) - For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk. ### Current Issues - The system does not work with the editor domain and has minor issues with text based assets - The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional. ### PackageTrailer - The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer - FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use. - When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire. - the header file contains a breakdown of the new format. ### VirtualizedBulkData - FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file. - FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code. - Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads. - We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave. - Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object. - Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use. - ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs. - We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package. - We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled. - We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code. -- Once the FPackageTrailer feature is no longer optional we can consider changing this. ### SavePackage/SavePackage2 - Removed the functions that would create the FPayloadToc - Added new functions to help create the trailer. - Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer. - We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems. ### LinkerLoad - After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package -- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead. - By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file. ### LinkerSave - We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed. - We now gather data to be appended to the package trailer via TrailerBuilder. ### DumpPayloadToc - This command now dumps info based on the package trailer if one can be found. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
IVirtualizationSystem& System = IVirtualizationSystem::Get();
const double StartTime = FPlatformTime::Seconds();
ON_SCOPE_EXIT
{
OutResultInfo.TimeTaken = FPlatformTime::Seconds() - StartTime;
UE_LOG(LogVirtualization, Display, TEXT("Virtualization process complete"));
UE_LOG(LogVirtualization, Verbose, TEXT("Virtualization process took %.3f(s)"), OutResultInfo.TimeTaken);
};
FScopedSlowTask Progress(5.0f, LOCTEXT("Virtualization_Task", "Virtualizing Assets..."));
Virtualizing an asset will no longer display an error message if the project has no caching storage backends set up. #rb Per.Larsson #jira UE-171492 #rnx #preflight 6391f02d67018b14b581c0b0 - It is possible that a project might want to have persistent storage enabled but no caching storage, it is probably a bad idea but it should work. However before this fix doing so would cause an error to be logged during the virtualization of a package and fail the process. - During the virtualization process we now check if pushing is enabled for cache storage before trying to push there, if not then we log that the phase is being skipped. - Even though the above fixes the problem we would still get odd results when pushing payloads to cached storage without any backends enabled. -- We now check if the pushing process is disabled before processing requests, so we can fail faster. There is now also a proper error value for this that can be stored in each request (FPushRequest) where as previously they'd all still display "pending" -- We also check to see if the requested push type has any associated backend before processing requests and have a specific error value for this. Note that we consider - We also now force the visibility of the slow task progress bar, as otherwise we might not end up displaying the virtualization progress to the progress dialog (this is a work around to a long standing problem with the slow task system due to the time limit associated with updating the dialog via EnterProgressFrame, this needs to be fixed elsewhere) #ushell-cherrypick of 23445439 by paul.chipchase [CL 23461405 by paul chipchase in ue5-main branch]
2022-12-09 03:39:20 -05:00
// Force the task to be visible otherwise it might not be shown if the initial progress frames are too fast
Progress.Visibility = ESlowTaskVisibility::ForceVisible;
Progress.MakeDialog();
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous. #rb PJ.Kack #rnx #preflight 61a4b235405273b2c3daa7c7 ### Virtualization - The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted. - The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again. - This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc. - In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead. - In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm) - For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk. ### Current Issues - The system does not work with the editor domain and has minor issues with text based assets - The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional. ### PackageTrailer - The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer - FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use. - When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire. - the header file contains a breakdown of the new format. ### VirtualizedBulkData - FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file. - FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code. - Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads. - We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave. - Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object. - Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use. - ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs. - We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package. - We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled. - We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code. -- Once the FPackageTrailer feature is no longer optional we can consider changing this. ### SavePackage/SavePackage2 - Removed the functions that would create the FPayloadToc - Added new functions to help create the trailer. - Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer. - We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems. ### LinkerLoad - After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package -- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead. - By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file. ### LinkerSave - We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed. - We now gather data to be appended to the package trailer via TrailerBuilder. ### DumpPayloadToc - This command now dumps info based on the package trailer if one can be found. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
// Other systems may have added errors to this array, we need to check so later we can determine if this function added any additional errors.
Add a new overload to IVirtualizationSystem::TryVirtualizePackages, which takes additional options (via a bitfield enum) and returns more info about the resulting process. The original version has been drepcated. #rb Per.Larsson #jira UE-169626 #rnx #preflight 639c4112012902cb8db43e13 - This allows us to provide the user with more ways to customize the virtualization and return more detailed info about it if the calling code wishes to log additional info. In both cases we can extend the options and the data returned without changing the API. - Previously if we virtualized a package that was not checked out in revision control we would warn the user and then skip updating the package file on disk. This means the payloads would be uploaded but the user would be left with no local changes. Since sometimes we know we don't need to check out any package (virtualizing the packages in a change list for example) we don't want to always incur the cost of polling reivision control to see which packages do need checking out. This is why we now allow the caller to request package files be checked out via the new options enum EVirtualizationOptions. -- If the EVirtualizationOptions::Checkout flag is provided we will poll the revision control status of all package files and then check out those which need it. -- We still check if packages can be modified and warn the user if they can't, as package files could be locked in other ways. - Added a new utility function to SourceControlUtilties to make it easier to check out packages. There is similar functionality elsewhere in the code base but the virtualization module is too low level to make use of it. - Updated existing code that calls ::TryVirtualizePackages and add cases of ''using namespace UE::Virtualization' where required to improve readability. - The UnrealVirtualizationTool now supports a new cmdline option "-checkout" that can be used when virtualizing packages. This will checkout any package that was actually virtualized so the result can be saved back out to the workspace domain. This means we no longer require the caller to have checked out the packages before running the tool. [CL 23536832 by paul chipchase in ue5-main branch]
2022-12-16 06:25:07 -05:00
const int32 NumErrors = OutResultInfo.GetNumErrors();
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous. #rb PJ.Kack #rnx #preflight 61a4b235405273b2c3daa7c7 ### Virtualization - The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted. - The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again. - This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc. - In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead. - In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm) - For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk. ### Current Issues - The system does not work with the editor domain and has minor issues with text based assets - The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional. ### PackageTrailer - The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer - FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use. - When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire. - the header file contains a breakdown of the new format. ### VirtualizedBulkData - FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file. - FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code. - Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads. - We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave. - Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object. - Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use. - ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs. - We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package. - We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled. - We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code. -- Once the FPackageTrailer feature is no longer optional we can consider changing this. ### SavePackage/SavePackage2 - Removed the functions that would create the FPayloadToc - Added new functions to help create the trailer. - Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer. - We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems. ### LinkerLoad - After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package -- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead. - By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file. ### LinkerSave - We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed. - We now gather data to be appended to the package trailer via TrailerBuilder. ### DumpPayloadToc - This command now dumps info based on the package trailer if one can be found. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
Submitting packages on projects with virtualization enabled is much faster when none of the payloads actually needs to be virtualized. #rb PJ.Kack #rnx #preflight 61a795773c29b3cf13cd8250 ### PackageSubmissionChecks - Under the old model submitting a large number of packages could be very slow as each package would check each payload that it owns and is currently stored locally one at a time. For the source control backend this created quite a large overhead even when all of the payloads were already virtualized. - Now we do a pass over the submitted files to find all valid packages that have package trailers with locally stored payloads, gather the payloads into a single list and then query that in one large batch. - Once we find which payloads are not in permanent storage (note that in the case where a project is using multiple permanent storage solutions, if a payload is missing in one backend it counts as not being in permanent storage) we then attempt to virtualized them. - Only after all of this is done will we create the truncated copy of each package and then append the updated trailer to each one. In theory doing it in this order this might slightly increase the change of submit failures that occur after virtualization that result in a package never being submitted and orphaned payloads being added to permanent storage, but this will always be a risk. - Added an assert to fire if we detect a trailer with some virtualized and some local payloads. This should be a supported feature but needs proper testing first before we can allow it. With out current project settings no project should actually encounter this scenario. - To make the code easier to follow we now early out of the entire check when errors are encountered. - Added logging at various stages in the process to help show the user that something is happening and make problems easier to identify in the future. - Notes -- There is a lot of handling of invalid FPayloads. This is because it is currently possible to add empty payloads to the trailer which is inefficient and wastes space. The trailer will be modified to reject empty payloads in a future update at which point a lot of this handling can be removed. -- This could've also been solved by not fully rehydrating a package on save by the end user, which will be added as a project setting in a future piece of work, but this approach will solve the edge case when the user does have a large amount of hydrated packages which contain payloads that are already virtualized so it was better to fix that now while we have good test cases for it. -- We still have scaling problems with large number of package being submitted that do have payloads that need to be virtualized, this will be fixed by extending IVirtualizationSystem::Push to also accept batches of payloads in future work. -- OnPrePackageSubmission could be broken up into smaller chunks to make the code easier to follow. This will be done after the batch payload submission work is done. ### VirtualizationSystem - EStorageType has been promoted to enum class. - Added a new enum FPayloadStatus to be used when querying if a payload exists in a backend storage system or not. - Add a new method ::DoPayloadsExist which allows the caller to query if one or more payloads exists in the given backend storage system. ### VirtualizationManager - Implemented ::DoPayloadsExist. First we get the results from each backend in the storage system (which return as true or false from each backend) then total how many backends found the payload in order to set the correct status. ### IVirtualizationBackend - ::DoesPayloadExist which queries the existence of a single payload has been added to the interface. Most backends already implemented this for private use and if so have had their implementation renamed to match this. - Also added ::DoPayloadsExist which takes a batch of FpayloadIdsto query. Some backends can deal with a batch of payload ids much more efficiently than one at a time, although the default implementation does call ::DoesPayloadExist for each requested payload. -- The default implementation prevents every backend from needing to implement the same for loop but does allow backends that can gain from batching to override it. ### VirtualizationSourceControlBackend - This backend does override ::DoPayloadsExist and implements it's own version as it tends to perform very poorly when not operating on larger batches. - In this case ::DoesPayloadExist calls back to ::DoPayloadsExist to check each payload rather than implement as specific version. ### PackageTrailer - The trailer can now be queries to request how many payloads of a given type it contains #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18339847 in //UE5/Release-5.0/... via CL 18339852 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18339859 by paul chipchase in ue5-release-engine-test branch]
2021-12-01 11:13:31 -05:00
struct FPackageInfo
{
FPackagePath Path;
FPackageTrailer Trailer;
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous. #rb PJ.Kack #rnx #preflight 61a4b235405273b2c3daa7c7 ### Virtualization - The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted. - The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again. - This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc. - In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead. - In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm) - For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk. ### Current Issues - The system does not work with the editor domain and has minor issues with text based assets - The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional. ### PackageTrailer - The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer - FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use. - When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire. - the header file contains a breakdown of the new format. ### VirtualizedBulkData - FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file. - FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code. - Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads. - We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave. - Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object. - Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use. - ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs. - We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package. - We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled. - We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code. -- Once the FPackageTrailer feature is no longer optional we can consider changing this. ### SavePackage/SavePackage2 - Removed the functions that would create the FPayloadToc - Added new functions to help create the trailer. - Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer. - We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems. ### LinkerLoad - After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package -- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead. - By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file. ### LinkerSave - We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed. - We now gather data to be appended to the package trailer via TrailerBuilder. ### DumpPayloadToc - This command now dumps info based on the package trailer if one can be found. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
TArray<FIoHash> LocalPayloads;
Virtualizing payloads when submitting via the editor will now correctly push the payloads to cache storage backends as well as persistent backends #rb Per.Larsson #jira UE-160943, UE-151671 #rnx #preflight 62fa3334153b17e7462954b3 ### Problem - The virtualization process first checks to see if any of the local payloads are already stored in persistent storage and only try to submit those which aren't, which would mean if we are submitting a package with a local payload that is already in the persistent storage system but not in cached storage we have no way to cache it. - This logic also introduced problems where filtered out payloads were being virtualized which was fixed with the addition of ENABLE_FILTERING_HACK which was not a robust long term solution. - Checking if the payloads need to be pushed or not complicated the logic of the virtualization process and makes it harder to understand and make fixes too. ### Fix - Removed code for ENABLE_FILTERING_HACK and UE_PRECHECK_PAYLOAD_STATUS from the virtualization process - Removed some code loops from the virtualization process but we still end up with one loop finding all of the package trailers and a second loop setting up the FWorkspaceDomainPayloadProvider -- This second loop will be removed in a future work item to avoid trying to push duplicate payloads (unless I end up doing that work at the virtualization manager level instead) - We make one push for cached storage and another for persistent -- Cached storage push failure will only result in a warning, failing the persistent push will error out as before -- We need to reset the request states after the cache push. In a future work item we will likely allow a single push to specific both storage solutions, so I am leaving this code using the same set of requests for both pushes. #robomerge FNMain [CL 21386503 by paul chipchase in ue5-main branch]
2022-08-15 10:17:46 -04:00
/** Index where the FPushRequest for this package can be found */
Submitting packages on projects with virtualization enabled is much faster when none of the payloads actually needs to be virtualized. #rb PJ.Kack #rnx #preflight 61a795773c29b3cf13cd8250 ### PackageSubmissionChecks - Under the old model submitting a large number of packages could be very slow as each package would check each payload that it owns and is currently stored locally one at a time. For the source control backend this created quite a large overhead even when all of the payloads were already virtualized. - Now we do a pass over the submitted files to find all valid packages that have package trailers with locally stored payloads, gather the payloads into a single list and then query that in one large batch. - Once we find which payloads are not in permanent storage (note that in the case where a project is using multiple permanent storage solutions, if a payload is missing in one backend it counts as not being in permanent storage) we then attempt to virtualized them. - Only after all of this is done will we create the truncated copy of each package and then append the updated trailer to each one. In theory doing it in this order this might slightly increase the change of submit failures that occur after virtualization that result in a package never being submitted and orphaned payloads being added to permanent storage, but this will always be a risk. - Added an assert to fire if we detect a trailer with some virtualized and some local payloads. This should be a supported feature but needs proper testing first before we can allow it. With out current project settings no project should actually encounter this scenario. - To make the code easier to follow we now early out of the entire check when errors are encountered. - Added logging at various stages in the process to help show the user that something is happening and make problems easier to identify in the future. - Notes -- There is a lot of handling of invalid FPayloads. This is because it is currently possible to add empty payloads to the trailer which is inefficient and wastes space. The trailer will be modified to reject empty payloads in a future update at which point a lot of this handling can be removed. -- This could've also been solved by not fully rehydrating a package on save by the end user, which will be added as a project setting in a future piece of work, but this approach will solve the edge case when the user does have a large amount of hydrated packages which contain payloads that are already virtualized so it was better to fix that now while we have good test cases for it. -- We still have scaling problems with large number of package being submitted that do have payloads that need to be virtualized, this will be fixed by extending IVirtualizationSystem::Push to also accept batches of payloads in future work. -- OnPrePackageSubmission could be broken up into smaller chunks to make the code easier to follow. This will be done after the batch payload submission work is done. ### VirtualizationSystem - EStorageType has been promoted to enum class. - Added a new enum FPayloadStatus to be used when querying if a payload exists in a backend storage system or not. - Add a new method ::DoPayloadsExist which allows the caller to query if one or more payloads exists in the given backend storage system. ### VirtualizationManager - Implemented ::DoPayloadsExist. First we get the results from each backend in the storage system (which return as true or false from each backend) then total how many backends found the payload in order to set the correct status. ### IVirtualizationBackend - ::DoesPayloadExist which queries the existence of a single payload has been added to the interface. Most backends already implemented this for private use and if so have had their implementation renamed to match this. - Also added ::DoPayloadsExist which takes a batch of FpayloadIdsto query. Some backends can deal with a batch of payload ids much more efficiently than one at a time, although the default implementation does call ::DoesPayloadExist for each requested payload. -- The default implementation prevents every backend from needing to implement the same for loop but does allow backends that can gain from batching to override it. ### VirtualizationSourceControlBackend - This backend does override ::DoPayloadsExist and implements it's own version as it tends to perform very poorly when not operating on larger batches. - In this case ::DoesPayloadExist calls back to ::DoPayloadsExist to check each payload rather than implement as specific version. ### PackageTrailer - The trailer can now be queries to request how many payloads of a given type it contains #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18339847 in //UE5/Release-5.0/... via CL 18339852 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18339859 by paul chipchase in ue5-release-engine-test branch]
2021-12-01 11:13:31 -05:00
int32 PayloadIndex = INDEX_NONE;
bool bWasTrailerUpdated = false;
};
UE_LOG(LogVirtualization, Display, TEXT("Considering %d file(s) for virtualization"), PackagePaths.Num());
Submitting packages on projects with virtualization enabled is much faster when none of the payloads actually needs to be virtualized. #rb PJ.Kack #rnx #preflight 61a795773c29b3cf13cd8250 ### PackageSubmissionChecks - Under the old model submitting a large number of packages could be very slow as each package would check each payload that it owns and is currently stored locally one at a time. For the source control backend this created quite a large overhead even when all of the payloads were already virtualized. - Now we do a pass over the submitted files to find all valid packages that have package trailers with locally stored payloads, gather the payloads into a single list and then query that in one large batch. - Once we find which payloads are not in permanent storage (note that in the case where a project is using multiple permanent storage solutions, if a payload is missing in one backend it counts as not being in permanent storage) we then attempt to virtualized them. - Only after all of this is done will we create the truncated copy of each package and then append the updated trailer to each one. In theory doing it in this order this might slightly increase the change of submit failures that occur after virtualization that result in a package never being submitted and orphaned payloads being added to permanent storage, but this will always be a risk. - Added an assert to fire if we detect a trailer with some virtualized and some local payloads. This should be a supported feature but needs proper testing first before we can allow it. With out current project settings no project should actually encounter this scenario. - To make the code easier to follow we now early out of the entire check when errors are encountered. - Added logging at various stages in the process to help show the user that something is happening and make problems easier to identify in the future. - Notes -- There is a lot of handling of invalid FPayloads. This is because it is currently possible to add empty payloads to the trailer which is inefficient and wastes space. The trailer will be modified to reject empty payloads in a future update at which point a lot of this handling can be removed. -- This could've also been solved by not fully rehydrating a package on save by the end user, which will be added as a project setting in a future piece of work, but this approach will solve the edge case when the user does have a large amount of hydrated packages which contain payloads that are already virtualized so it was better to fix that now while we have good test cases for it. -- We still have scaling problems with large number of package being submitted that do have payloads that need to be virtualized, this will be fixed by extending IVirtualizationSystem::Push to also accept batches of payloads in future work. -- OnPrePackageSubmission could be broken up into smaller chunks to make the code easier to follow. This will be done after the batch payload submission work is done. ### VirtualizationSystem - EStorageType has been promoted to enum class. - Added a new enum FPayloadStatus to be used when querying if a payload exists in a backend storage system or not. - Add a new method ::DoPayloadsExist which allows the caller to query if one or more payloads exists in the given backend storage system. ### VirtualizationManager - Implemented ::DoPayloadsExist. First we get the results from each backend in the storage system (which return as true or false from each backend) then total how many backends found the payload in order to set the correct status. ### IVirtualizationBackend - ::DoesPayloadExist which queries the existence of a single payload has been added to the interface. Most backends already implemented this for private use and if so have had their implementation renamed to match this. - Also added ::DoPayloadsExist which takes a batch of FpayloadIdsto query. Some backends can deal with a batch of payload ids much more efficiently than one at a time, although the default implementation does call ::DoesPayloadExist for each requested payload. -- The default implementation prevents every backend from needing to implement the same for loop but does allow backends that can gain from batching to override it. ### VirtualizationSourceControlBackend - This backend does override ::DoPayloadsExist and implements it's own version as it tends to perform very poorly when not operating on larger batches. - In this case ::DoesPayloadExist calls back to ::DoPayloadsExist to check each payload rather than implement as specific version. ### PackageTrailer - The trailer can now be queries to request how many payloads of a given type it contains #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18339847 in //UE5/Release-5.0/... via CL 18339852 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18339859 by paul chipchase in ue5-release-engine-test branch]
2021-12-01 11:13:31 -05:00
TArray<FPackageInfo> Packages;
Packages.Reserve(PackagePaths.Num());
Submitting packages on projects with virtualization enabled is much faster when none of the payloads actually needs to be virtualized. #rb PJ.Kack #rnx #preflight 61a795773c29b3cf13cd8250 ### PackageSubmissionChecks - Under the old model submitting a large number of packages could be very slow as each package would check each payload that it owns and is currently stored locally one at a time. For the source control backend this created quite a large overhead even when all of the payloads were already virtualized. - Now we do a pass over the submitted files to find all valid packages that have package trailers with locally stored payloads, gather the payloads into a single list and then query that in one large batch. - Once we find which payloads are not in permanent storage (note that in the case where a project is using multiple permanent storage solutions, if a payload is missing in one backend it counts as not being in permanent storage) we then attempt to virtualized them. - Only after all of this is done will we create the truncated copy of each package and then append the updated trailer to each one. In theory doing it in this order this might slightly increase the change of submit failures that occur after virtualization that result in a package never being submitted and orphaned payloads being added to permanent storage, but this will always be a risk. - Added an assert to fire if we detect a trailer with some virtualized and some local payloads. This should be a supported feature but needs proper testing first before we can allow it. With out current project settings no project should actually encounter this scenario. - To make the code easier to follow we now early out of the entire check when errors are encountered. - Added logging at various stages in the process to help show the user that something is happening and make problems easier to identify in the future. - Notes -- There is a lot of handling of invalid FPayloads. This is because it is currently possible to add empty payloads to the trailer which is inefficient and wastes space. The trailer will be modified to reject empty payloads in a future update at which point a lot of this handling can be removed. -- This could've also been solved by not fully rehydrating a package on save by the end user, which will be added as a project setting in a future piece of work, but this approach will solve the edge case when the user does have a large amount of hydrated packages which contain payloads that are already virtualized so it was better to fix that now while we have good test cases for it. -- We still have scaling problems with large number of package being submitted that do have payloads that need to be virtualized, this will be fixed by extending IVirtualizationSystem::Push to also accept batches of payloads in future work. -- OnPrePackageSubmission could be broken up into smaller chunks to make the code easier to follow. This will be done after the batch payload submission work is done. ### VirtualizationSystem - EStorageType has been promoted to enum class. - Added a new enum FPayloadStatus to be used when querying if a payload exists in a backend storage system or not. - Add a new method ::DoPayloadsExist which allows the caller to query if one or more payloads exists in the given backend storage system. ### VirtualizationManager - Implemented ::DoPayloadsExist. First we get the results from each backend in the storage system (which return as true or false from each backend) then total how many backends found the payload in order to set the correct status. ### IVirtualizationBackend - ::DoesPayloadExist which queries the existence of a single payload has been added to the interface. Most backends already implemented this for private use and if so have had their implementation renamed to match this. - Also added ::DoPayloadsExist which takes a batch of FpayloadIdsto query. Some backends can deal with a batch of payload ids much more efficiently than one at a time, although the default implementation does call ::DoesPayloadExist for each requested payload. -- The default implementation prevents every backend from needing to implement the same for loop but does allow backends that can gain from batching to override it. ### VirtualizationSourceControlBackend - This backend does override ::DoPayloadsExist and implements it's own version as it tends to perform very poorly when not operating on larger batches. - In this case ::DoesPayloadExist calls back to ::DoPayloadsExist to check each payload rather than implement as specific version. ### PackageTrailer - The trailer can now be queries to request how many payloads of a given type it contains #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18339847 in //UE5/Release-5.0/... via CL 18339852 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18339859 by paul chipchase in ue5-release-engine-test branch]
2021-12-01 11:13:31 -05:00
Progress.EnterProgressFrame(1.0f);
Submitting packages on projects with virtualization enabled is much faster when none of the payloads actually needs to be virtualized. #rb PJ.Kack #rnx #preflight 61a795773c29b3cf13cd8250 ### PackageSubmissionChecks - Under the old model submitting a large number of packages could be very slow as each package would check each payload that it owns and is currently stored locally one at a time. For the source control backend this created quite a large overhead even when all of the payloads were already virtualized. - Now we do a pass over the submitted files to find all valid packages that have package trailers with locally stored payloads, gather the payloads into a single list and then query that in one large batch. - Once we find which payloads are not in permanent storage (note that in the case where a project is using multiple permanent storage solutions, if a payload is missing in one backend it counts as not being in permanent storage) we then attempt to virtualized them. - Only after all of this is done will we create the truncated copy of each package and then append the updated trailer to each one. In theory doing it in this order this might slightly increase the change of submit failures that occur after virtualization that result in a package never being submitted and orphaned payloads being added to permanent storage, but this will always be a risk. - Added an assert to fire if we detect a trailer with some virtualized and some local payloads. This should be a supported feature but needs proper testing first before we can allow it. With out current project settings no project should actually encounter this scenario. - To make the code easier to follow we now early out of the entire check when errors are encountered. - Added logging at various stages in the process to help show the user that something is happening and make problems easier to identify in the future. - Notes -- There is a lot of handling of invalid FPayloads. This is because it is currently possible to add empty payloads to the trailer which is inefficient and wastes space. The trailer will be modified to reject empty payloads in a future update at which point a lot of this handling can be removed. -- This could've also been solved by not fully rehydrating a package on save by the end user, which will be added as a project setting in a future piece of work, but this approach will solve the edge case when the user does have a large amount of hydrated packages which contain payloads that are already virtualized so it was better to fix that now while we have good test cases for it. -- We still have scaling problems with large number of package being submitted that do have payloads that need to be virtualized, this will be fixed by extending IVirtualizationSystem::Push to also accept batches of payloads in future work. -- OnPrePackageSubmission could be broken up into smaller chunks to make the code easier to follow. This will be done after the batch payload submission work is done. ### VirtualizationSystem - EStorageType has been promoted to enum class. - Added a new enum FPayloadStatus to be used when querying if a payload exists in a backend storage system or not. - Add a new method ::DoPayloadsExist which allows the caller to query if one or more payloads exists in the given backend storage system. ### VirtualizationManager - Implemented ::DoPayloadsExist. First we get the results from each backend in the storage system (which return as true or false from each backend) then total how many backends found the payload in order to set the correct status. ### IVirtualizationBackend - ::DoesPayloadExist which queries the existence of a single payload has been added to the interface. Most backends already implemented this for private use and if so have had their implementation renamed to match this. - Also added ::DoPayloadsExist which takes a batch of FpayloadIdsto query. Some backends can deal with a batch of payload ids much more efficiently than one at a time, although the default implementation does call ::DoesPayloadExist for each requested payload. -- The default implementation prevents every backend from needing to implement the same for loop but does allow backends that can gain from batching to override it. ### VirtualizationSourceControlBackend - This backend does override ::DoPayloadsExist and implements it's own version as it tends to perform very poorly when not operating on larger batches. - In this case ::DoesPayloadExist calls back to ::DoPayloadsExist to check each payload rather than implement as specific version. ### PackageTrailer - The trailer can now be queries to request how many payloads of a given type it contains #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18339847 in //UE5/Release-5.0/... via CL 18339852 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18339859 by paul chipchase in ue5-release-engine-test branch]
2021-12-01 11:13:31 -05:00
// From the list of files to submit we need to find all of the valid packages that contain
// local payloads that need to be virtualized.
int64 TotalPackagesFound = 0;
int64 TotalOutOfDatePackages = 0;
int64 TotalPackageTrailersFound = 0;
Virtualizing payloads when submitting via the editor will now correctly push the payloads to cache storage backends as well as persistent backends #rb Per.Larsson #jira UE-160943, UE-151671 #rnx #preflight 62fa3334153b17e7462954b3 ### Problem - The virtualization process first checks to see if any of the local payloads are already stored in persistent storage and only try to submit those which aren't, which would mean if we are submitting a package with a local payload that is already in the persistent storage system but not in cached storage we have no way to cache it. - This logic also introduced problems where filtered out payloads were being virtualized which was fixed with the addition of ENABLE_FILTERING_HACK which was not a robust long term solution. - Checking if the payloads need to be pushed or not complicated the logic of the virtualization process and makes it harder to understand and make fixes too. ### Fix - Removed code for ENABLE_FILTERING_HACK and UE_PRECHECK_PAYLOAD_STATUS from the virtualization process - Removed some code loops from the virtualization process but we still end up with one loop finding all of the package trailers and a second loop setting up the FWorkspaceDomainPayloadProvider -- This second loop will be removed in a future work item to avoid trying to push duplicate payloads (unless I end up doing that work at the virtualization manager level instead) - We make one push for cached storage and another for persistent -- Cached storage push failure will only result in a warning, failing the persistent push will error out as before -- We need to reset the request states after the cache push. In a future work item we will likely allow a single push to specific both storage solutions, so I am leaving this code using the same set of requests for both pushes. #robomerge FNMain [CL 21386503 by paul chipchase in ue5-main branch]
2022-08-15 10:17:46 -04:00
int64 TotalPayloadsToVirtualize = 0;
for (const FString& AbsoluteFilePath : PackagePaths)
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous. #rb PJ.Kack #rnx #preflight 61a4b235405273b2c3daa7c7 ### Virtualization - The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted. - The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again. - This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc. - In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead. - In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm) - For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk. ### Current Issues - The system does not work with the editor domain and has minor issues with text based assets - The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional. ### PackageTrailer - The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer - FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use. - When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire. - the header file contains a breakdown of the new format. ### VirtualizedBulkData - FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file. - FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code. - Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads. - We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave. - Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object. - Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use. - ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs. - We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package. - We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled. - We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code. -- Once the FPackageTrailer feature is no longer optional we can consider changing this. ### SavePackage/SavePackage2 - Removed the functions that would create the FPayloadToc - Added new functions to help create the trailer. - Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer. - We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems. ### LinkerLoad - After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package -- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead. - By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file. ### LinkerSave - We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed. - We now gather data to be appended to the package trailer via TrailerBuilder. ### DumpPayloadToc - This command now dumps info based on the package trailer if one can be found. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
{
Submitting packages on projects with virtualization enabled is much faster when none of the payloads actually needs to be virtualized. #rb PJ.Kack #rnx #preflight 61a795773c29b3cf13cd8250 ### PackageSubmissionChecks - Under the old model submitting a large number of packages could be very slow as each package would check each payload that it owns and is currently stored locally one at a time. For the source control backend this created quite a large overhead even when all of the payloads were already virtualized. - Now we do a pass over the submitted files to find all valid packages that have package trailers with locally stored payloads, gather the payloads into a single list and then query that in one large batch. - Once we find which payloads are not in permanent storage (note that in the case where a project is using multiple permanent storage solutions, if a payload is missing in one backend it counts as not being in permanent storage) we then attempt to virtualized them. - Only after all of this is done will we create the truncated copy of each package and then append the updated trailer to each one. In theory doing it in this order this might slightly increase the change of submit failures that occur after virtualization that result in a package never being submitted and orphaned payloads being added to permanent storage, but this will always be a risk. - Added an assert to fire if we detect a trailer with some virtualized and some local payloads. This should be a supported feature but needs proper testing first before we can allow it. With out current project settings no project should actually encounter this scenario. - To make the code easier to follow we now early out of the entire check when errors are encountered. - Added logging at various stages in the process to help show the user that something is happening and make problems easier to identify in the future. - Notes -- There is a lot of handling of invalid FPayloads. This is because it is currently possible to add empty payloads to the trailer which is inefficient and wastes space. The trailer will be modified to reject empty payloads in a future update at which point a lot of this handling can be removed. -- This could've also been solved by not fully rehydrating a package on save by the end user, which will be added as a project setting in a future piece of work, but this approach will solve the edge case when the user does have a large amount of hydrated packages which contain payloads that are already virtualized so it was better to fix that now while we have good test cases for it. -- We still have scaling problems with large number of package being submitted that do have payloads that need to be virtualized, this will be fixed by extending IVirtualizationSystem::Push to also accept batches of payloads in future work. -- OnPrePackageSubmission could be broken up into smaller chunks to make the code easier to follow. This will be done after the batch payload submission work is done. ### VirtualizationSystem - EStorageType has been promoted to enum class. - Added a new enum FPayloadStatus to be used when querying if a payload exists in a backend storage system or not. - Add a new method ::DoPayloadsExist which allows the caller to query if one or more payloads exists in the given backend storage system. ### VirtualizationManager - Implemented ::DoPayloadsExist. First we get the results from each backend in the storage system (which return as true or false from each backend) then total how many backends found the payload in order to set the correct status. ### IVirtualizationBackend - ::DoesPayloadExist which queries the existence of a single payload has been added to the interface. Most backends already implemented this for private use and if so have had their implementation renamed to match this. - Also added ::DoPayloadsExist which takes a batch of FpayloadIdsto query. Some backends can deal with a batch of payload ids much more efficiently than one at a time, although the default implementation does call ::DoesPayloadExist for each requested payload. -- The default implementation prevents every backend from needing to implement the same for loop but does allow backends that can gain from batching to override it. ### VirtualizationSourceControlBackend - This backend does override ::DoPayloadsExist and implements it's own version as it tends to perform very poorly when not operating on larger batches. - In this case ::DoesPayloadExist calls back to ::DoPayloadsExist to check each payload rather than implement as specific version. ### PackageTrailer - The trailer can now be queries to request how many payloads of a given type it contains #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18339847 in //UE5/Release-5.0/... via CL 18339852 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18339859 by paul chipchase in ue5-release-engine-test branch]
2021-12-01 11:13:31 -05:00
FPackagePath PackagePath = FPackagePath::FromLocalPath(AbsoluteFilePath);
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous. #rb PJ.Kack #rnx #preflight 61a4b235405273b2c3daa7c7 ### Virtualization - The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted. - The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again. - This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc. - In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead. - In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm) - For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk. ### Current Issues - The system does not work with the editor domain and has minor issues with text based assets - The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional. ### PackageTrailer - The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer - FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use. - When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire. - the header file contains a breakdown of the new format. ### VirtualizedBulkData - FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file. - FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code. - Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads. - We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave. - Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object. - Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use. - ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs. - We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package. - We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled. - We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code. -- Once the FPackageTrailer feature is no longer optional we can consider changing this. ### SavePackage/SavePackage2 - Removed the functions that would create the FPayloadToc - Added new functions to help create the trailer. - Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer. - We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems. ### LinkerLoad - After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package -- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead. - By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file. ### LinkerSave - We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed. - We now gather data to be appended to the package trailer via TrailerBuilder. ### DumpPayloadToc - This command now dumps info based on the package trailer if one can be found. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
Clean up virtualized bulkdata serialization and allow the package trailer to contain payloads with different access types. #rb PJ.Kack #jira UE-139348 #rnx #preflight 61e6742a3778a195dea02199 ### PackageTrailer - We no longer store the access mode for all payloads in the trailer. Instead each payload entry in the lookup table will store the access mode for that specific payload. This allow us to store many payloads of different access types in the same trailer. - Note that this format change is currently versioned (going from 0 to 1) and is compatible with older version. -- To allow this, the << operator has been removed from FLookupTableEntry and replaced with a ::Serialize method that takes the version of the package trailer header as an input so that it can run it's own versioning checks. - When loading trailers with version 0 we apply the access mode to all non virtualized payload entries in the lookup table to reproduce the single access mode for the entire trailer. - A FPackageTrailerBuilder can now be created directly from an existing FPackageTrailer that will reference all of the trailers local payloads via the new static method ::CreateReferenceToTrailer - Removed FPackageTrailer::TrySave as it is no longer used when creating a trailer for the editor domain. - Renamed FPackageTrailer::LoadPayload to ::LoadLocalPayload to make it clear the type of payload that can be loaded. -- Someday the package trailer should be able to load payloads of any type. ### VirtualizedBulkData Serialization Changes - We now record if the bulkdata's payload is stored in a package trailer or not via the 'StoredInPackageTrailer' flag, which makes it easier to identify critical errors in serialization (the trailer is missing for example) -- Storing this info as a flag is now possible as the design for the trailer has changed and a fully virtualized package will still contain a trailer. - We now only write out OffsetInFile if the payload is not stored in a FPackageTrailer, if the bulkdata object does use a package trailer via the new flag then we take the offset from the trailer instead. - We also no longer write out a path when saving a payload as referenced to the editor domain, it is assumed that the trailer will know where the workspace domain data is (namely the owning package path) -- Due to the 'StoredInPackageTrailer' only being added now, this means that packages already saved with a package trailer will work with this code logic as those packages will have serialized the offsets to disk, but won't have the new flag and so will serialize the offsets when loading until re-saved. (worth noting that nobody should encounter this as the package trailer system is disabled by default) - The loading code is a lot easier to read now and a lot of the branches have been removed although we still have a few extra checks for older formats that might have been saved during development. - When loading the only real consideration we have is if the payload is in the trailer or not. - Future Work: -- If we are not writing to the trailer then we use the legacy serialization system which appends the bulkdata to the end of the package. This is used in two cases, 1) When the package trailer system is disabled 2) If the payload is updated in postload and we are writing out the package to the editor domain as the package trailer does not support writing out local payloads outside of the workspace domain at the moment. The trailer should handle this itself and not rely on the legacy serialization system so that we can remove it. -- Note that if a payload is stored by reference, we are not adding it explicitly to the payload trailer and we are relying on the trailer builder already containing the reference information when it is created in SavePackage. We do however run an assert to ensure that the reference does exist in the trailer but it might be worth revisiting this. ### VirtualizedBulkData Misc - The member 'PackageSegment' is now initialized in the constructor to EPackageSegment::Header. - Moved ::GetPackagePathFromOwner from being a method to a utility function as it accessed no members. - Validating the payload for saving during serialization has been moved from ::Serialize to it's own method ::TryPayloadValidationForSaving. ### PackageSubmissionChecks - Added an error message if we detect referenced payloads in a trailer we are trying to virtualize as this condition should never occur. If we encounter it, we give a user facing error message and fail the submit process. It might be worth demoting this to an assert instead as it is not an error the user is expected to see. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18637906 in //UE5/Release-5.0/... via CL 18637926 via CL 18637929 #ROBOMERGE-BOT: UE5 (Release-Engine-Test -> Main) (v899-18417669) [CL 18637932 by paul chipchase in ue5-main branch]
2022-01-18 05:41:44 -05:00
// TODO: How to handle text packages?
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous. #rb PJ.Kack #rnx #preflight 61a4b235405273b2c3daa7c7 ### Virtualization - The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted. - The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again. - This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc. - In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead. - In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm) - For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk. ### Current Issues - The system does not work with the editor domain and has minor issues with text based assets - The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional. ### PackageTrailer - The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer - FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use. - When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire. - the header file contains a breakdown of the new format. ### VirtualizedBulkData - FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file. - FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code. - Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads. - We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave. - Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object. - Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use. - ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs. - We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package. - We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled. - We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code. -- Once the FPackageTrailer feature is no longer optional we can consider changing this. ### SavePackage/SavePackage2 - Removed the functions that would create the FPayloadToc - Added new functions to help create the trailer. - Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer. - We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems. ### LinkerLoad - After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package -- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead. - By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file. ### LinkerSave - We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed. - We now gather data to be appended to the package trailer via TrailerBuilder. ### DumpPayloadToc - This command now dumps info based on the package trailer if one can be found. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
if (FPackageName::IsPackageExtension(PackagePath.GetHeaderExtension()) || FPackageName::IsTextPackageExtension(PackagePath.GetHeaderExtension()))
{
TotalPackagesFound++;
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous. #rb PJ.Kack #rnx #preflight 61a4b235405273b2c3daa7c7 ### Virtualization - The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted. - The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again. - This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc. - In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead. - In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm) - For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk. ### Current Issues - The system does not work with the editor domain and has minor issues with text based assets - The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional. ### PackageTrailer - The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer - FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use. - When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire. - the header file contains a breakdown of the new format. ### VirtualizedBulkData - FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file. - FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code. - Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads. - We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave. - Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object. - Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use. - ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs. - We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package. - We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled. - We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code. -- Once the FPackageTrailer feature is no longer optional we can consider changing this. ### SavePackage/SavePackage2 - Removed the functions that would create the FPayloadToc - Added new functions to help create the trailer. - Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer. - We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems. ### LinkerLoad - After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package -- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead. - By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file. ### LinkerSave - We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed. - We now gather data to be appended to the package trailer via TrailerBuilder. ### DumpPayloadToc - This command now dumps info based on the package trailer if one can be found. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
FPackageTrailer Trailer;
if (FPackageTrailer::TryLoadFromPackage(PackagePath, Trailer))
{
TotalPackageTrailersFound++;
Clean up virtualized bulkdata serialization and allow the package trailer to contain payloads with different access types. #rb PJ.Kack #jira UE-139348 #rnx #preflight 61e6742a3778a195dea02199 ### PackageTrailer - We no longer store the access mode for all payloads in the trailer. Instead each payload entry in the lookup table will store the access mode for that specific payload. This allow us to store many payloads of different access types in the same trailer. - Note that this format change is currently versioned (going from 0 to 1) and is compatible with older version. -- To allow this, the << operator has been removed from FLookupTableEntry and replaced with a ::Serialize method that takes the version of the package trailer header as an input so that it can run it's own versioning checks. - When loading trailers with version 0 we apply the access mode to all non virtualized payload entries in the lookup table to reproduce the single access mode for the entire trailer. - A FPackageTrailerBuilder can now be created directly from an existing FPackageTrailer that will reference all of the trailers local payloads via the new static method ::CreateReferenceToTrailer - Removed FPackageTrailer::TrySave as it is no longer used when creating a trailer for the editor domain. - Renamed FPackageTrailer::LoadPayload to ::LoadLocalPayload to make it clear the type of payload that can be loaded. -- Someday the package trailer should be able to load payloads of any type. ### VirtualizedBulkData Serialization Changes - We now record if the bulkdata's payload is stored in a package trailer or not via the 'StoredInPackageTrailer' flag, which makes it easier to identify critical errors in serialization (the trailer is missing for example) -- Storing this info as a flag is now possible as the design for the trailer has changed and a fully virtualized package will still contain a trailer. - We now only write out OffsetInFile if the payload is not stored in a FPackageTrailer, if the bulkdata object does use a package trailer via the new flag then we take the offset from the trailer instead. - We also no longer write out a path when saving a payload as referenced to the editor domain, it is assumed that the trailer will know where the workspace domain data is (namely the owning package path) -- Due to the 'StoredInPackageTrailer' only being added now, this means that packages already saved with a package trailer will work with this code logic as those packages will have serialized the offsets to disk, but won't have the new flag and so will serialize the offsets when loading until re-saved. (worth noting that nobody should encounter this as the package trailer system is disabled by default) - The loading code is a lot easier to read now and a lot of the branches have been removed although we still have a few extra checks for older formats that might have been saved during development. - When loading the only real consideration we have is if the payload is in the trailer or not. - Future Work: -- If we are not writing to the trailer then we use the legacy serialization system which appends the bulkdata to the end of the package. This is used in two cases, 1) When the package trailer system is disabled 2) If the payload is updated in postload and we are writing out the package to the editor domain as the package trailer does not support writing out local payloads outside of the workspace domain at the moment. The trailer should handle this itself and not rely on the legacy serialization system so that we can remove it. -- Note that if a payload is stored by reference, we are not adding it explicitly to the payload trailer and we are relying on the trailer builder already containing the reference information when it is created in SavePackage. We do however run an assert to ensure that the reference does exist in the trailer but it might be worth revisiting this. ### VirtualizedBulkData Misc - The member 'PackageSegment' is now initialized in the constructor to EPackageSegment::Header. - Moved ::GetPackagePathFromOwner from being a method to a utility function as it accessed no members. - Validating the payload for saving during serialization has been moved from ::Serialize to it's own method ::TryPayloadValidationForSaving. ### PackageSubmissionChecks - Added an error message if we detect referenced payloads in a trailer we are trying to virtualize as this condition should never occur. If we encounter it, we give a user facing error message and fail the submit process. It might be worth demoting this to an assert instead as it is not an error the user is expected to see. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18637906 in //UE5/Release-5.0/... via CL 18637926 via CL 18637929 #ROBOMERGE-BOT: UE5 (Release-Engine-Test -> Main) (v899-18417669) [CL 18637932 by paul chipchase in ue5-main branch]
2022-01-18 05:41:44 -05:00
// The following is not expected to ever happen, currently we give a user facing error but it generally means that the asset is broken somehow.
It is now possible to disable virtualization for all payloads owned by a specific asset type by adding the name of the asset to [Core.ContentVirtualization]DisabledAsset string array in the engine ini file. #rb Per.Larsson #rnx #jira UE-151377 #preflight 628364050039ea57a52d6989 ### Virtualization - [Core.ContentVirtualization] in the engine ini file now supports an array called 'DisabledAsset' which can be used to name asset types that should not virtualize their payloads. -- By default (in BaseEngine.ini) we have disabled the StaticMesh asset as we know it will crash if a payload is missing and the SoundWave asset as it still is pending testing. - This new way to disable virtualization is data driven. The older hard coded method has not been removed but will likely be reworked in a future submit. - Now when an editor bulkdata is adding it's payload to the package trailer builder during package save it will poll the virtualization system with a call to the new method ::IsDisabledForObject by passing in it's owner. -- If the owner is valid and was present in the 'DisabledAsset' array then the method will return true and the EPayloadFlags::DisableVirtualization flag will be applied. ### Package Trailer - The pre-existing functionality of enum EPayloadFilter has been moved to a new enum EPayloadStorageType as will only filter payloads based on their storage type. - EPayloadFilter has been modified to filter payloads based on functionality although at the moment the only thing it can filter for is to return payloads that can be virtualized, it is left for future expansion. - EPayloadFlags has been reduced to a uint16 with the remaining 2bytes being turned into a new member EPayloadFilterReason. - This new member allows us to record the exact reason why a payload is excluded from virtualization. If it is zero then the payload can virtualize, otherwise it will contain one or more reasons as to why it is being excluded. For this reason the enum is a bitfield. - Added overloads of ::GetPayloads and ::GetNumPayloads that take EPayloadFilter rather than a EPayloadStorageType - Added wrappers around all AccessMode types for FLookupTableEntry. - FPackageTrailerBuilder has been extended to take a EPayloadFilterReason so that the caller can already provide a reason why the payload cannot be virtualized. -- As a future peace of work this will probably be changed and we will ask the caller to pass in the owner UObject pointer instead and then we will process the filtering when building the package trailer to a) keep all of the filtering code in one place b) keep the filtering consistent ### PackageSubmissionChecks - The virtualization process in will now request the payloads that can be virtualized from the package trailer, which respects the new payload flag, rather than requesting all locally stored payloads. ### UObjects - There is no need for the SoundWave or MeshDescription classes to opt out of virtualization on construction. This will be done when the package is saved and is now data driven rather than being hardcoded. ### DumpPackagePayloadInfo - The command has been updated to also display the filter reasons applied to each payload [CL 20240971 by paul chipchase in ue5-main branch]
2022-05-17 07:54:28 -04:00
ensureMsgf(Trailer.GetNumPayloads(EPayloadStorageType::Referenced) == 0, TEXT("Trying to virtualize a package that already contains payload references which the workspace file should not ever contain!"));
if (Trailer.GetNumPayloads(EPayloadStorageType::Referenced) > 0)
Clean up virtualized bulkdata serialization and allow the package trailer to contain payloads with different access types. #rb PJ.Kack #jira UE-139348 #rnx #preflight 61e6742a3778a195dea02199 ### PackageTrailer - We no longer store the access mode for all payloads in the trailer. Instead each payload entry in the lookup table will store the access mode for that specific payload. This allow us to store many payloads of different access types in the same trailer. - Note that this format change is currently versioned (going from 0 to 1) and is compatible with older version. -- To allow this, the << operator has been removed from FLookupTableEntry and replaced with a ::Serialize method that takes the version of the package trailer header as an input so that it can run it's own versioning checks. - When loading trailers with version 0 we apply the access mode to all non virtualized payload entries in the lookup table to reproduce the single access mode for the entire trailer. - A FPackageTrailerBuilder can now be created directly from an existing FPackageTrailer that will reference all of the trailers local payloads via the new static method ::CreateReferenceToTrailer - Removed FPackageTrailer::TrySave as it is no longer used when creating a trailer for the editor domain. - Renamed FPackageTrailer::LoadPayload to ::LoadLocalPayload to make it clear the type of payload that can be loaded. -- Someday the package trailer should be able to load payloads of any type. ### VirtualizedBulkData Serialization Changes - We now record if the bulkdata's payload is stored in a package trailer or not via the 'StoredInPackageTrailer' flag, which makes it easier to identify critical errors in serialization (the trailer is missing for example) -- Storing this info as a flag is now possible as the design for the trailer has changed and a fully virtualized package will still contain a trailer. - We now only write out OffsetInFile if the payload is not stored in a FPackageTrailer, if the bulkdata object does use a package trailer via the new flag then we take the offset from the trailer instead. - We also no longer write out a path when saving a payload as referenced to the editor domain, it is assumed that the trailer will know where the workspace domain data is (namely the owning package path) -- Due to the 'StoredInPackageTrailer' only being added now, this means that packages already saved with a package trailer will work with this code logic as those packages will have serialized the offsets to disk, but won't have the new flag and so will serialize the offsets when loading until re-saved. (worth noting that nobody should encounter this as the package trailer system is disabled by default) - The loading code is a lot easier to read now and a lot of the branches have been removed although we still have a few extra checks for older formats that might have been saved during development. - When loading the only real consideration we have is if the payload is in the trailer or not. - Future Work: -- If we are not writing to the trailer then we use the legacy serialization system which appends the bulkdata to the end of the package. This is used in two cases, 1) When the package trailer system is disabled 2) If the payload is updated in postload and we are writing out the package to the editor domain as the package trailer does not support writing out local payloads outside of the workspace domain at the moment. The trailer should handle this itself and not rely on the legacy serialization system so that we can remove it. -- Note that if a payload is stored by reference, we are not adding it explicitly to the payload trailer and we are relying on the trailer builder already containing the reference information when it is created in SavePackage. We do however run an assert to ensure that the reference does exist in the trailer but it might be worth revisiting this. ### VirtualizedBulkData Misc - The member 'PackageSegment' is now initialized in the constructor to EPackageSegment::Header. - Moved ::GetPackagePathFromOwner from being a method to a utility function as it accessed no members. - Validating the payload for saving during serialization has been moved from ::Serialize to it's own method ::TryPayloadValidationForSaving. ### PackageSubmissionChecks - Added an error message if we detect referenced payloads in a trailer we are trying to virtualize as this condition should never occur. If we encounter it, we give a user facing error message and fail the submit process. It might be worth demoting this to an assert instead as it is not an error the user is expected to see. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18637906 in //UE5/Release-5.0/... via CL 18637926 via CL 18637929 #ROBOMERGE-BOT: UE5 (Release-Engine-Test -> Main) (v899-18417669) [CL 18637932 by paul chipchase in ue5-main branch]
2022-01-18 05:41:44 -05:00
{
Virtualizing payloads when submitting via the editor will now correctly push the payloads to cache storage backends as well as persistent backends #rb Per.Larsson #jira UE-160943, UE-151671 #rnx #preflight 62fa3334153b17e7462954b3 ### Problem - The virtualization process first checks to see if any of the local payloads are already stored in persistent storage and only try to submit those which aren't, which would mean if we are submitting a package with a local payload that is already in the persistent storage system but not in cached storage we have no way to cache it. - This logic also introduced problems where filtered out payloads were being virtualized which was fixed with the addition of ENABLE_FILTERING_HACK which was not a robust long term solution. - Checking if the payloads need to be pushed or not complicated the logic of the virtualization process and makes it harder to understand and make fixes too. ### Fix - Removed code for ENABLE_FILTERING_HACK and UE_PRECHECK_PAYLOAD_STATUS from the virtualization process - Removed some code loops from the virtualization process but we still end up with one loop finding all of the package trailers and a second loop setting up the FWorkspaceDomainPayloadProvider -- This second loop will be removed in a future work item to avoid trying to push duplicate payloads (unless I end up doing that work at the virtualization manager level instead) - We make one push for cached storage and another for persistent -- Cached storage push failure will only result in a warning, failing the persistent push will error out as before -- We need to reset the request states after the cache push. In a future work item we will likely allow a single push to specific both storage solutions, so I am leaving this code using the same set of requests for both pushes. #robomerge FNMain [CL 21386503 by paul chipchase in ue5-main branch]
2022-08-15 10:17:46 -04:00
FText Message = FText::Format(LOCTEXT("Virtualization_PkgHasReferences", "Cannot virtualize the package '{1}' as it has referenced payloads in the trailer"),
FText::FromString(PackagePath.GetDebugName()));
Add a new overload to IVirtualizationSystem::TryVirtualizePackages, which takes additional options (via a bitfield enum) and returns more info about the resulting process. The original version has been drepcated. #rb Per.Larsson #jira UE-169626 #rnx #preflight 639c4112012902cb8db43e13 - This allows us to provide the user with more ways to customize the virtualization and return more detailed info about it if the calling code wishes to log additional info. In both cases we can extend the options and the data returned without changing the API. - Previously if we virtualized a package that was not checked out in revision control we would warn the user and then skip updating the package file on disk. This means the payloads would be uploaded but the user would be left with no local changes. Since sometimes we know we don't need to check out any package (virtualizing the packages in a change list for example) we don't want to always incur the cost of polling reivision control to see which packages do need checking out. This is why we now allow the caller to request package files be checked out via the new options enum EVirtualizationOptions. -- If the EVirtualizationOptions::Checkout flag is provided we will poll the revision control status of all package files and then check out those which need it. -- We still check if packages can be modified and warn the user if they can't, as package files could be locked in other ways. - Added a new utility function to SourceControlUtilties to make it easier to check out packages. There is similar functionality elsewhere in the code base but the virtualization module is too low level to make use of it. - Updated existing code that calls ::TryVirtualizePackages and add cases of ''using namespace UE::Virtualization' where required to improve readability. - The UnrealVirtualizationTool now supports a new cmdline option "-checkout" that can be used when virtualizing packages. This will checkout any package that was actually virtualized so the result can be saved back out to the workspace domain. This means we no longer require the caller to have checked out the packages before running the tool. [CL 23536832 by paul chipchase in ue5-main branch]
2022-12-16 06:25:07 -05:00
OutResultInfo.AddError(MoveTemp(Message));
Clean up virtualized bulkdata serialization and allow the package trailer to contain payloads with different access types. #rb PJ.Kack #jira UE-139348 #rnx #preflight 61e6742a3778a195dea02199 ### PackageTrailer - We no longer store the access mode for all payloads in the trailer. Instead each payload entry in the lookup table will store the access mode for that specific payload. This allow us to store many payloads of different access types in the same trailer. - Note that this format change is currently versioned (going from 0 to 1) and is compatible with older version. -- To allow this, the << operator has been removed from FLookupTableEntry and replaced with a ::Serialize method that takes the version of the package trailer header as an input so that it can run it's own versioning checks. - When loading trailers with version 0 we apply the access mode to all non virtualized payload entries in the lookup table to reproduce the single access mode for the entire trailer. - A FPackageTrailerBuilder can now be created directly from an existing FPackageTrailer that will reference all of the trailers local payloads via the new static method ::CreateReferenceToTrailer - Removed FPackageTrailer::TrySave as it is no longer used when creating a trailer for the editor domain. - Renamed FPackageTrailer::LoadPayload to ::LoadLocalPayload to make it clear the type of payload that can be loaded. -- Someday the package trailer should be able to load payloads of any type. ### VirtualizedBulkData Serialization Changes - We now record if the bulkdata's payload is stored in a package trailer or not via the 'StoredInPackageTrailer' flag, which makes it easier to identify critical errors in serialization (the trailer is missing for example) -- Storing this info as a flag is now possible as the design for the trailer has changed and a fully virtualized package will still contain a trailer. - We now only write out OffsetInFile if the payload is not stored in a FPackageTrailer, if the bulkdata object does use a package trailer via the new flag then we take the offset from the trailer instead. - We also no longer write out a path when saving a payload as referenced to the editor domain, it is assumed that the trailer will know where the workspace domain data is (namely the owning package path) -- Due to the 'StoredInPackageTrailer' only being added now, this means that packages already saved with a package trailer will work with this code logic as those packages will have serialized the offsets to disk, but won't have the new flag and so will serialize the offsets when loading until re-saved. (worth noting that nobody should encounter this as the package trailer system is disabled by default) - The loading code is a lot easier to read now and a lot of the branches have been removed although we still have a few extra checks for older formats that might have been saved during development. - When loading the only real consideration we have is if the payload is in the trailer or not. - Future Work: -- If we are not writing to the trailer then we use the legacy serialization system which appends the bulkdata to the end of the package. This is used in two cases, 1) When the package trailer system is disabled 2) If the payload is updated in postload and we are writing out the package to the editor domain as the package trailer does not support writing out local payloads outside of the workspace domain at the moment. The trailer should handle this itself and not rely on the legacy serialization system so that we can remove it. -- Note that if a payload is stored by reference, we are not adding it explicitly to the payload trailer and we are relying on the trailer builder already containing the reference information when it is created in SavePackage. We do however run an assert to ensure that the reference does exist in the trailer but it might be worth revisiting this. ### VirtualizedBulkData Misc - The member 'PackageSegment' is now initialized in the constructor to EPackageSegment::Header. - Moved ::GetPackagePathFromOwner from being a method to a utility function as it accessed no members. - Validating the payload for saving during serialization has been moved from ::Serialize to it's own method ::TryPayloadValidationForSaving. ### PackageSubmissionChecks - Added an error message if we detect referenced payloads in a trailer we are trying to virtualize as this condition should never occur. If we encounter it, we give a user facing error message and fail the submit process. It might be worth demoting this to an assert instead as it is not an error the user is expected to see. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18637906 in //UE5/Release-5.0/... via CL 18637926 via CL 18637929 #ROBOMERGE-BOT: UE5 (Release-Engine-Test -> Main) (v899-18417669) [CL 18637932 by paul chipchase in ue5-main branch]
2022-01-18 05:41:44 -05:00
return;
}
Submitting packages on projects with virtualization enabled is much faster when none of the payloads actually needs to be virtualized. #rb PJ.Kack #rnx #preflight 61a795773c29b3cf13cd8250 ### PackageSubmissionChecks - Under the old model submitting a large number of packages could be very slow as each package would check each payload that it owns and is currently stored locally one at a time. For the source control backend this created quite a large overhead even when all of the payloads were already virtualized. - Now we do a pass over the submitted files to find all valid packages that have package trailers with locally stored payloads, gather the payloads into a single list and then query that in one large batch. - Once we find which payloads are not in permanent storage (note that in the case where a project is using multiple permanent storage solutions, if a payload is missing in one backend it counts as not being in permanent storage) we then attempt to virtualized them. - Only after all of this is done will we create the truncated copy of each package and then append the updated trailer to each one. In theory doing it in this order this might slightly increase the change of submit failures that occur after virtualization that result in a package never being submitted and orphaned payloads being added to permanent storage, but this will always be a risk. - Added an assert to fire if we detect a trailer with some virtualized and some local payloads. This should be a supported feature but needs proper testing first before we can allow it. With out current project settings no project should actually encounter this scenario. - To make the code easier to follow we now early out of the entire check when errors are encountered. - Added logging at various stages in the process to help show the user that something is happening and make problems easier to identify in the future. - Notes -- There is a lot of handling of invalid FPayloads. This is because it is currently possible to add empty payloads to the trailer which is inefficient and wastes space. The trailer will be modified to reject empty payloads in a future update at which point a lot of this handling can be removed. -- This could've also been solved by not fully rehydrating a package on save by the end user, which will be added as a project setting in a future piece of work, but this approach will solve the edge case when the user does have a large amount of hydrated packages which contain payloads that are already virtualized so it was better to fix that now while we have good test cases for it. -- We still have scaling problems with large number of package being submitted that do have payloads that need to be virtualized, this will be fixed by extending IVirtualizationSystem::Push to also accept batches of payloads in future work. -- OnPrePackageSubmission could be broken up into smaller chunks to make the code easier to follow. This will be done after the batch payload submission work is done. ### VirtualizationSystem - EStorageType has been promoted to enum class. - Added a new enum FPayloadStatus to be used when querying if a payload exists in a backend storage system or not. - Add a new method ::DoPayloadsExist which allows the caller to query if one or more payloads exists in the given backend storage system. ### VirtualizationManager - Implemented ::DoPayloadsExist. First we get the results from each backend in the storage system (which return as true or false from each backend) then total how many backends found the payload in order to set the correct status. ### IVirtualizationBackend - ::DoesPayloadExist which queries the existence of a single payload has been added to the interface. Most backends already implemented this for private use and if so have had their implementation renamed to match this. - Also added ::DoPayloadsExist which takes a batch of FpayloadIdsto query. Some backends can deal with a batch of payload ids much more efficiently than one at a time, although the default implementation does call ::DoesPayloadExist for each requested payload. -- The default implementation prevents every backend from needing to implement the same for loop but does allow backends that can gain from batching to override it. ### VirtualizationSourceControlBackend - This backend does override ::DoPayloadsExist and implements it's own version as it tends to perform very poorly when not operating on larger batches. - In this case ::DoesPayloadExist calls back to ::DoPayloadsExist to check each payload rather than implement as specific version. ### PackageTrailer - The trailer can now be queries to request how many payloads of a given type it contains #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18339847 in //UE5/Release-5.0/... via CL 18339852 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18339859 by paul chipchase in ue5-release-engine-test branch]
2021-12-01 11:13:31 -05:00
FPackageInfo PkgInfo;
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous. #rb PJ.Kack #rnx #preflight 61a4b235405273b2c3daa7c7 ### Virtualization - The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted. - The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again. - This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc. - In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead. - In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm) - For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk. ### Current Issues - The system does not work with the editor domain and has minor issues with text based assets - The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional. ### PackageTrailer - The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer - FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use. - When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire. - the header file contains a breakdown of the new format. ### VirtualizedBulkData - FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file. - FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code. - Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads. - We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave. - Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object. - Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use. - ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs. - We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package. - We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled. - We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code. -- Once the FPackageTrailer feature is no longer optional we can consider changing this. ### SavePackage/SavePackage2 - Removed the functions that would create the FPayloadToc - Added new functions to help create the trailer. - Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer. - We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems. ### LinkerLoad - After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package -- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead. - By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file. ### LinkerSave - We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed. - We now gather data to be appended to the package trailer via TrailerBuilder. ### DumpPayloadToc - This command now dumps info based on the package trailer if one can be found. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
Submitting packages on projects with virtualization enabled is much faster when none of the payloads actually needs to be virtualized. #rb PJ.Kack #rnx #preflight 61a795773c29b3cf13cd8250 ### PackageSubmissionChecks - Under the old model submitting a large number of packages could be very slow as each package would check each payload that it owns and is currently stored locally one at a time. For the source control backend this created quite a large overhead even when all of the payloads were already virtualized. - Now we do a pass over the submitted files to find all valid packages that have package trailers with locally stored payloads, gather the payloads into a single list and then query that in one large batch. - Once we find which payloads are not in permanent storage (note that in the case where a project is using multiple permanent storage solutions, if a payload is missing in one backend it counts as not being in permanent storage) we then attempt to virtualized them. - Only after all of this is done will we create the truncated copy of each package and then append the updated trailer to each one. In theory doing it in this order this might slightly increase the change of submit failures that occur after virtualization that result in a package never being submitted and orphaned payloads being added to permanent storage, but this will always be a risk. - Added an assert to fire if we detect a trailer with some virtualized and some local payloads. This should be a supported feature but needs proper testing first before we can allow it. With out current project settings no project should actually encounter this scenario. - To make the code easier to follow we now early out of the entire check when errors are encountered. - Added logging at various stages in the process to help show the user that something is happening and make problems easier to identify in the future. - Notes -- There is a lot of handling of invalid FPayloads. This is because it is currently possible to add empty payloads to the trailer which is inefficient and wastes space. The trailer will be modified to reject empty payloads in a future update at which point a lot of this handling can be removed. -- This could've also been solved by not fully rehydrating a package on save by the end user, which will be added as a project setting in a future piece of work, but this approach will solve the edge case when the user does have a large amount of hydrated packages which contain payloads that are already virtualized so it was better to fix that now while we have good test cases for it. -- We still have scaling problems with large number of package being submitted that do have payloads that need to be virtualized, this will be fixed by extending IVirtualizationSystem::Push to also accept batches of payloads in future work. -- OnPrePackageSubmission could be broken up into smaller chunks to make the code easier to follow. This will be done after the batch payload submission work is done. ### VirtualizationSystem - EStorageType has been promoted to enum class. - Added a new enum FPayloadStatus to be used when querying if a payload exists in a backend storage system or not. - Add a new method ::DoPayloadsExist which allows the caller to query if one or more payloads exists in the given backend storage system. ### VirtualizationManager - Implemented ::DoPayloadsExist. First we get the results from each backend in the storage system (which return as true or false from each backend) then total how many backends found the payload in order to set the correct status. ### IVirtualizationBackend - ::DoesPayloadExist which queries the existence of a single payload has been added to the interface. Most backends already implemented this for private use and if so have had their implementation renamed to match this. - Also added ::DoPayloadsExist which takes a batch of FpayloadIdsto query. Some backends can deal with a batch of payload ids much more efficiently than one at a time, although the default implementation does call ::DoesPayloadExist for each requested payload. -- The default implementation prevents every backend from needing to implement the same for loop but does allow backends that can gain from batching to override it. ### VirtualizationSourceControlBackend - This backend does override ::DoPayloadsExist and implements it's own version as it tends to perform very poorly when not operating on larger batches. - In this case ::DoesPayloadExist calls back to ::DoPayloadsExist to check each payload rather than implement as specific version. ### PackageTrailer - The trailer can now be queries to request how many payloads of a given type it contains #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18339847 in //UE5/Release-5.0/... via CL 18339852 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18339859 by paul chipchase in ue5-release-engine-test branch]
2021-12-01 11:13:31 -05:00
PkgInfo.Path = MoveTemp(PackagePath);
PkgInfo.Trailer = MoveTemp(Trailer);
It is now possible to disable virtualization for all payloads owned by a specific asset type by adding the name of the asset to [Core.ContentVirtualization]DisabledAsset string array in the engine ini file. #rb Per.Larsson #rnx #jira UE-151377 #preflight 628364050039ea57a52d6989 ### Virtualization - [Core.ContentVirtualization] in the engine ini file now supports an array called 'DisabledAsset' which can be used to name asset types that should not virtualize their payloads. -- By default (in BaseEngine.ini) we have disabled the StaticMesh asset as we know it will crash if a payload is missing and the SoundWave asset as it still is pending testing. - This new way to disable virtualization is data driven. The older hard coded method has not been removed but will likely be reworked in a future submit. - Now when an editor bulkdata is adding it's payload to the package trailer builder during package save it will poll the virtualization system with a call to the new method ::IsDisabledForObject by passing in it's owner. -- If the owner is valid and was present in the 'DisabledAsset' array then the method will return true and the EPayloadFlags::DisableVirtualization flag will be applied. ### Package Trailer - The pre-existing functionality of enum EPayloadFilter has been moved to a new enum EPayloadStorageType as will only filter payloads based on their storage type. - EPayloadFilter has been modified to filter payloads based on functionality although at the moment the only thing it can filter for is to return payloads that can be virtualized, it is left for future expansion. - EPayloadFlags has been reduced to a uint16 with the remaining 2bytes being turned into a new member EPayloadFilterReason. - This new member allows us to record the exact reason why a payload is excluded from virtualization. If it is zero then the payload can virtualize, otherwise it will contain one or more reasons as to why it is being excluded. For this reason the enum is a bitfield. - Added overloads of ::GetPayloads and ::GetNumPayloads that take EPayloadFilter rather than a EPayloadStorageType - Added wrappers around all AccessMode types for FLookupTableEntry. - FPackageTrailerBuilder has been extended to take a EPayloadFilterReason so that the caller can already provide a reason why the payload cannot be virtualized. -- As a future peace of work this will probably be changed and we will ask the caller to pass in the owner UObject pointer instead and then we will process the filtering when building the package trailer to a) keep all of the filtering code in one place b) keep the filtering consistent ### PackageSubmissionChecks - The virtualization process in will now request the payloads that can be virtualized from the package trailer, which respects the new payload flag, rather than requesting all locally stored payloads. ### UObjects - There is no need for the SoundWave or MeshDescription classes to opt out of virtualization on construction. This will be done when the package is saved and is now data driven rather than being hardcoded. ### DumpPackagePayloadInfo - The command has been updated to also display the filter reasons applied to each payload [CL 20240971 by paul chipchase in ue5-main branch]
2022-05-17 07:54:28 -04:00
PkgInfo.LocalPayloads = PkgInfo.Trailer.GetPayloads(EPayloadFilter::CanVirtualize);
Submitting packages on projects with virtualization enabled is much faster when none of the payloads actually needs to be virtualized. #rb PJ.Kack #rnx #preflight 61a795773c29b3cf13cd8250 ### PackageSubmissionChecks - Under the old model submitting a large number of packages could be very slow as each package would check each payload that it owns and is currently stored locally one at a time. For the source control backend this created quite a large overhead even when all of the payloads were already virtualized. - Now we do a pass over the submitted files to find all valid packages that have package trailers with locally stored payloads, gather the payloads into a single list and then query that in one large batch. - Once we find which payloads are not in permanent storage (note that in the case where a project is using multiple permanent storage solutions, if a payload is missing in one backend it counts as not being in permanent storage) we then attempt to virtualized them. - Only after all of this is done will we create the truncated copy of each package and then append the updated trailer to each one. In theory doing it in this order this might slightly increase the change of submit failures that occur after virtualization that result in a package never being submitted and orphaned payloads being added to permanent storage, but this will always be a risk. - Added an assert to fire if we detect a trailer with some virtualized and some local payloads. This should be a supported feature but needs proper testing first before we can allow it. With out current project settings no project should actually encounter this scenario. - To make the code easier to follow we now early out of the entire check when errors are encountered. - Added logging at various stages in the process to help show the user that something is happening and make problems easier to identify in the future. - Notes -- There is a lot of handling of invalid FPayloads. This is because it is currently possible to add empty payloads to the trailer which is inefficient and wastes space. The trailer will be modified to reject empty payloads in a future update at which point a lot of this handling can be removed. -- This could've also been solved by not fully rehydrating a package on save by the end user, which will be added as a project setting in a future piece of work, but this approach will solve the edge case when the user does have a large amount of hydrated packages which contain payloads that are already virtualized so it was better to fix that now while we have good test cases for it. -- We still have scaling problems with large number of package being submitted that do have payloads that need to be virtualized, this will be fixed by extending IVirtualizationSystem::Push to also accept batches of payloads in future work. -- OnPrePackageSubmission could be broken up into smaller chunks to make the code easier to follow. This will be done after the batch payload submission work is done. ### VirtualizationSystem - EStorageType has been promoted to enum class. - Added a new enum FPayloadStatus to be used when querying if a payload exists in a backend storage system or not. - Add a new method ::DoPayloadsExist which allows the caller to query if one or more payloads exists in the given backend storage system. ### VirtualizationManager - Implemented ::DoPayloadsExist. First we get the results from each backend in the storage system (which return as true or false from each backend) then total how many backends found the payload in order to set the correct status. ### IVirtualizationBackend - ::DoesPayloadExist which queries the existence of a single payload has been added to the interface. Most backends already implemented this for private use and if so have had their implementation renamed to match this. - Also added ::DoPayloadsExist which takes a batch of FpayloadIdsto query. Some backends can deal with a batch of payload ids much more efficiently than one at a time, although the default implementation does call ::DoesPayloadExist for each requested payload. -- The default implementation prevents every backend from needing to implement the same for loop but does allow backends that can gain from batching to override it. ### VirtualizationSourceControlBackend - This backend does override ::DoPayloadsExist and implements it's own version as it tends to perform very poorly when not operating on larger batches. - In this case ::DoesPayloadExist calls back to ::DoPayloadsExist to check each payload rather than implement as specific version. ### PackageTrailer - The trailer can now be queries to request how many payloads of a given type it contains #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18339847 in //UE5/Release-5.0/... via CL 18339852 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18339859 by paul chipchase in ue5-release-engine-test branch]
2021-12-01 11:13:31 -05:00
if (!PkgInfo.LocalPayloads.IsEmpty())
Virtualizing payloads when submitting via the editor will now correctly push the payloads to cache storage backends as well as persistent backends #rb Per.Larsson #jira UE-160943, UE-151671 #rnx #preflight 62fa3334153b17e7462954b3 ### Problem - The virtualization process first checks to see if any of the local payloads are already stored in persistent storage and only try to submit those which aren't, which would mean if we are submitting a package with a local payload that is already in the persistent storage system but not in cached storage we have no way to cache it. - This logic also introduced problems where filtered out payloads were being virtualized which was fixed with the addition of ENABLE_FILTERING_HACK which was not a robust long term solution. - Checking if the payloads need to be pushed or not complicated the logic of the virtualization process and makes it harder to understand and make fixes too. ### Fix - Removed code for ENABLE_FILTERING_HACK and UE_PRECHECK_PAYLOAD_STATUS from the virtualization process - Removed some code loops from the virtualization process but we still end up with one loop finding all of the package trailers and a second loop setting up the FWorkspaceDomainPayloadProvider -- This second loop will be removed in a future work item to avoid trying to push duplicate payloads (unless I end up doing that work at the virtualization manager level instead) - We make one push for cached storage and another for persistent -- Cached storage push failure will only result in a warning, failing the persistent push will error out as before -- We need to reset the request states after the cache push. In a future work item we will likely allow a single push to specific both storage solutions, so I am leaving this code using the same set of requests for both pushes. #robomerge FNMain [CL 21386503 by paul chipchase in ue5-main branch]
2022-08-15 10:17:46 -04:00
{
TotalPayloadsToVirtualize += PkgInfo.LocalPayloads.Num();
Temp fix to prevent payloads that should be filtered out from virtualizing from being virtualized if they are already in persistent storage, which was causing unexpected packages to become virtualized. #rb Per.Larsson #rnx #jira UE-156312 #preflight 62a99553943e7bb256fe174c ### Problem - Firstly this fix is a hack and not intended for shipping 5.1, but the real fix is already planned work, which involves moving the filtering from the push process to being applied to each payload when the package is saved, just as the asset filtering is done at the moment. -- Doing the proper fix will take a bit of time and testing so we need a quicker fix so that I can re-enable the virtualization process to unblock some workflows. - All new code is wrapped in a single preprocessor define 'ENABLE_FILTERING_HACK' making the code easy to identify and remove. - Submitting this hack will cause the priority of the real fix to be bumped in the backlog so that we can remove the hack asap. - The problem code comes from a pass we do on the payloads to check if they are already in persistent storage or not. If the payload is then we just change it to be virtualized and don't attempt to push. In addition to fixing the filtering there is also a work item to investigate if we really want to keep this logic or if we want to remove it anyway. This might be looked into before I look into the filtering fix as an alternative way to remove the hack. ### Fix - After we have queried the system to check which of the local payloads are already in persistent storage, BUT before we change them over to be virtualized we now run a new pass, checking which payloads should be filtered. -- This is a bespoke code path expected only to be used here, so in this case it is not in the IVirtualizationSystem API but requires us to cast to FVirtualizationManager to gain access. This should stop anyone else from using the hack before it is removed. - The filtering check mimics the filtering that would be run on a normal push operation. - Once done we can check for any payload that would be filtered, and if so, set the status to not found. This prevents the payload from being changed to be virtualized and means it will be included in the normal push operation where it will be correctly filtered. - None of this is the best way to do it, but it did seem like the easiest way to add the hack in isolation as no existing code was changed, so it should be simple to just 'delete' the hack when ready. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 20667712 via CL 20667730 via CL 20667733 #ROBOMERGE-BOT: UE5 (Release-Engine-Staging -> Main) (v955-20579017) [CL 20668639 by paul chipchase in ue5-main branch]
2022-06-15 08:56:57 -04:00
Virtualizing payloads when submitting via the editor will now correctly push the payloads to cache storage backends as well as persistent backends #rb Per.Larsson #jira UE-160943, UE-151671 #rnx #preflight 62fa3334153b17e7462954b3 ### Problem - The virtualization process first checks to see if any of the local payloads are already stored in persistent storage and only try to submit those which aren't, which would mean if we are submitting a package with a local payload that is already in the persistent storage system but not in cached storage we have no way to cache it. - This logic also introduced problems where filtered out payloads were being virtualized which was fixed with the addition of ENABLE_FILTERING_HACK which was not a robust long term solution. - Checking if the payloads need to be pushed or not complicated the logic of the virtualization process and makes it harder to understand and make fixes too. ### Fix - Removed code for ENABLE_FILTERING_HACK and UE_PRECHECK_PAYLOAD_STATUS from the virtualization process - Removed some code loops from the virtualization process but we still end up with one loop finding all of the package trailers and a second loop setting up the FWorkspaceDomainPayloadProvider -- This second loop will be removed in a future work item to avoid trying to push duplicate payloads (unless I end up doing that work at the virtualization manager level instead) - We make one push for cached storage and another for persistent -- Cached storage push failure will only result in a warning, failing the persistent push will error out as before -- We need to reset the request states after the cache push. In a future work item we will likely allow a single push to specific both storage solutions, so I am leaving this code using the same set of requests for both pushes. #robomerge FNMain [CL 21386503 by paul chipchase in ue5-main branch]
2022-08-15 10:17:46 -04:00
Packages.Emplace(MoveTemp(PkgInfo));
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous. #rb PJ.Kack #rnx #preflight 61a4b235405273b2c3daa7c7 ### Virtualization - The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted. - The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again. - This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc. - In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead. - In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm) - For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk. ### Current Issues - The system does not work with the editor domain and has minor issues with text based assets - The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional. ### PackageTrailer - The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer - FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use. - When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire. - the header file contains a breakdown of the new format. ### VirtualizedBulkData - FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file. - FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code. - Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads. - We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave. - Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object. - Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use. - ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs. - We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package. - We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled. - We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code. -- Once the FPackageTrailer feature is no longer optional we can consider changing this. ### SavePackage/SavePackage2 - Removed the functions that would create the FPayloadToc - Added new functions to help create the trailer. - Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer. - We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems. ### LinkerLoad - After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package -- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead. - By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file. ### LinkerSave - We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed. - We now gather data to be appended to the package trailer via TrailerBuilder. ### DumpPayloadToc - This command now dumps info based on the package trailer if one can be found. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
}
}
else if(Utils::FindTrailerFailedReason(PackagePath) == Utils::ETrailerFailedReason::OutOfDate)
{
TotalOutOfDatePackages++;
}
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous. #rb PJ.Kack #rnx #preflight 61a4b235405273b2c3daa7c7 ### Virtualization - The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted. - The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again. - This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc. - In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead. - In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm) - For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk. ### Current Issues - The system does not work with the editor domain and has minor issues with text based assets - The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional. ### PackageTrailer - The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer - FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use. - When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire. - the header file contains a breakdown of the new format. ### VirtualizedBulkData - FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file. - FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code. - Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads. - We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave. - Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object. - Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use. - ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs. - We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package. - We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled. - We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code. -- Once the FPackageTrailer feature is no longer optional we can consider changing this. ### SavePackage/SavePackage2 - Removed the functions that would create the FPayloadToc - Added new functions to help create the trailer. - Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer. - We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems. ### LinkerLoad - After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package -- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead. - By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file. ### LinkerSave - We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed. - We now gather data to be appended to the package trailer via TrailerBuilder. ### DumpPayloadToc - This command now dumps info based on the package trailer if one can be found. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
}
}
Add a rehydration command to the stand alone virtualization tool, making it easier to reverse the effects of asset virtualization. Unlike previous processes, this one does not require that we load the package and will just manipulate the data storaged in the package trailer. #rb Sebastian.Nordgren #rnx #jira UE-156436 #preflight 62c287f9a3568e30664eb94f ### VA Standalone Tool - We now plan to add much more functionality to the tool than just virtualizing and submitting changelists, so to make this easier I am moving the tool towards a design where it should be fairly easy to add new functionality. - Added FCommand, which is a base class for adding new functionality, simple derive from FCommand and hook it up at the appropriate locations. -- In the future it should be possible for new command types to automatically register themselves to be initiated from the command line. There should be no need to edit UnrealVirtualizationToolApp to add a new command but this will be done as an additional work item. -- At the moment FCommand comes with a number of utility methods to call that cover some common source control commands. -- The original functionality has not yet been moved to the command system and so the code is a little bit weird at the moment. Updating older code to the new system will be done as an additional work item. - FProject/FPlugin have been moved to their own code files. ### Rehydrate Command - The rehydrate command will take a number of packages, check them out of source control and then attempt to virtualize them. - At the moment the chekout logic is fairly basic, we just check out every package supplied, we don't check if the package is virtualized or not yet. This can be improved in additional work items. Ideally by the end of command the only packages that we have checked out should also be rehydrated. - At the moment the command can either take a path of a specific package, a path of a directory to find packages in, or a changelist containing packages that should be rehydrated. - A cleint spec (workspace) can optionally be provided, but if not supplied we will attempt to find a client spec for which to check out the packages. - Currently we will check out the packages to the default change list. ### Rehydrate process - Added the rehydration process in it's own code files in the virtualization module. Like the virtualization process this is exposed in a public header file and no via the Core interface which means it is very specific to our module/implementation. - The process expects that the caller will have checked out any required packages from source control. It will treat being unable to update a package file as an error. - Added PackageUtils.h/.cpp and moved some of the generic code from the virtualization process code there so that it can be shared by the rehydration process. ### Misc Moving away from the using things like FPackagePath as that requires that the correct mount points have been registered for a project and at the moment (with the flakiness of FConfig*) it seems that the best idea would be to prefer absolute file paths where possible. [CL 20982284 by paul chipchase in ue5-main branch]
2022-07-07 06:54:33 -04:00
UE_LOG(LogVirtualization, Display, TEXT("Found %" INT64_FMT " package(s), %" INT64_FMT " of which had payload trailers"), TotalPackagesFound, TotalPackageTrailersFound);
UE_CLOG(TotalOutOfDatePackages > 0, LogVirtualization, Warning, TEXT("Found %" INT64_FMT " package(s) that are out of date and need resaving"), TotalOutOfDatePackages);
Virtualizing payloads when submitting via the editor will now correctly push the payloads to cache storage backends as well as persistent backends #rb Per.Larsson #jira UE-160943, UE-151671 #rnx #preflight 62fa3334153b17e7462954b3 ### Problem - The virtualization process first checks to see if any of the local payloads are already stored in persistent storage and only try to submit those which aren't, which would mean if we are submitting a package with a local payload that is already in the persistent storage system but not in cached storage we have no way to cache it. - This logic also introduced problems where filtered out payloads were being virtualized which was fixed with the addition of ENABLE_FILTERING_HACK which was not a robust long term solution. - Checking if the payloads need to be pushed or not complicated the logic of the virtualization process and makes it harder to understand and make fixes too. ### Fix - Removed code for ENABLE_FILTERING_HACK and UE_PRECHECK_PAYLOAD_STATUS from the virtualization process - Removed some code loops from the virtualization process but we still end up with one loop finding all of the package trailers and a second loop setting up the FWorkspaceDomainPayloadProvider -- This second loop will be removed in a future work item to avoid trying to push duplicate payloads (unless I end up doing that work at the virtualization manager level instead) - We make one push for cached storage and another for persistent -- Cached storage push failure will only result in a warning, failing the persistent push will error out as before -- We need to reset the request states after the cache push. In a future work item we will likely allow a single push to specific both storage solutions, so I am leaving this code using the same set of requests for both pushes. #robomerge FNMain [CL 21386503 by paul chipchase in ue5-main branch]
2022-08-15 10:17:46 -04:00
// TODO: Currently not all of the filtering is done as package save time, so some of the local payloads may not get virtualized.
// When/if we move all filtering to package save we can change this log message to state that the local payloads *will* be virtualized.
UE_LOG(LogVirtualization, Display, TEXT("Found %" INT64_FMT " locally stored payload(s) in %d package(s) that maybe need to be virtualized"), TotalPayloadsToVirtualize, Packages.Num());
Submitting packages on projects with virtualization enabled is much faster when none of the payloads actually needs to be virtualized. #rb PJ.Kack #rnx #preflight 61a795773c29b3cf13cd8250 ### PackageSubmissionChecks - Under the old model submitting a large number of packages could be very slow as each package would check each payload that it owns and is currently stored locally one at a time. For the source control backend this created quite a large overhead even when all of the payloads were already virtualized. - Now we do a pass over the submitted files to find all valid packages that have package trailers with locally stored payloads, gather the payloads into a single list and then query that in one large batch. - Once we find which payloads are not in permanent storage (note that in the case where a project is using multiple permanent storage solutions, if a payload is missing in one backend it counts as not being in permanent storage) we then attempt to virtualized them. - Only after all of this is done will we create the truncated copy of each package and then append the updated trailer to each one. In theory doing it in this order this might slightly increase the change of submit failures that occur after virtualization that result in a package never being submitted and orphaned payloads being added to permanent storage, but this will always be a risk. - Added an assert to fire if we detect a trailer with some virtualized and some local payloads. This should be a supported feature but needs proper testing first before we can allow it. With out current project settings no project should actually encounter this scenario. - To make the code easier to follow we now early out of the entire check when errors are encountered. - Added logging at various stages in the process to help show the user that something is happening and make problems easier to identify in the future. - Notes -- There is a lot of handling of invalid FPayloads. This is because it is currently possible to add empty payloads to the trailer which is inefficient and wastes space. The trailer will be modified to reject empty payloads in a future update at which point a lot of this handling can be removed. -- This could've also been solved by not fully rehydrating a package on save by the end user, which will be added as a project setting in a future piece of work, but this approach will solve the edge case when the user does have a large amount of hydrated packages which contain payloads that are already virtualized so it was better to fix that now while we have good test cases for it. -- We still have scaling problems with large number of package being submitted that do have payloads that need to be virtualized, this will be fixed by extending IVirtualizationSystem::Push to also accept batches of payloads in future work. -- OnPrePackageSubmission could be broken up into smaller chunks to make the code easier to follow. This will be done after the batch payload submission work is done. ### VirtualizationSystem - EStorageType has been promoted to enum class. - Added a new enum FPayloadStatus to be used when querying if a payload exists in a backend storage system or not. - Add a new method ::DoPayloadsExist which allows the caller to query if one or more payloads exists in the given backend storage system. ### VirtualizationManager - Implemented ::DoPayloadsExist. First we get the results from each backend in the storage system (which return as true or false from each backend) then total how many backends found the payload in order to set the correct status. ### IVirtualizationBackend - ::DoesPayloadExist which queries the existence of a single payload has been added to the interface. Most backends already implemented this for private use and if so have had their implementation renamed to match this. - Also added ::DoPayloadsExist which takes a batch of FpayloadIdsto query. Some backends can deal with a batch of payload ids much more efficiently than one at a time, although the default implementation does call ::DoesPayloadExist for each requested payload. -- The default implementation prevents every backend from needing to implement the same for loop but does allow backends that can gain from batching to override it. ### VirtualizationSourceControlBackend - This backend does override ::DoPayloadsExist and implements it's own version as it tends to perform very poorly when not operating on larger batches. - In this case ::DoesPayloadExist calls back to ::DoPayloadsExist to check each payload rather than implement as specific version. ### PackageTrailer - The trailer can now be queries to request how many payloads of a given type it contains #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18339847 in //UE5/Release-5.0/... via CL 18339852 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18339859 by paul chipchase in ue5-release-engine-test branch]
2021-12-01 11:13:31 -05:00
Progress.EnterProgressFrame(1.0f);
Virtualizing payloads when submitting via the editor will now correctly push the payloads to cache storage backends as well as persistent backends #rb Per.Larsson #jira UE-160943, UE-151671 #rnx #preflight 62fa3334153b17e7462954b3 ### Problem - The virtualization process first checks to see if any of the local payloads are already stored in persistent storage and only try to submit those which aren't, which would mean if we are submitting a package with a local payload that is already in the persistent storage system but not in cached storage we have no way to cache it. - This logic also introduced problems where filtered out payloads were being virtualized which was fixed with the addition of ENABLE_FILTERING_HACK which was not a robust long term solution. - Checking if the payloads need to be pushed or not complicated the logic of the virtualization process and makes it harder to understand and make fixes too. ### Fix - Removed code for ENABLE_FILTERING_HACK and UE_PRECHECK_PAYLOAD_STATUS from the virtualization process - Removed some code loops from the virtualization process but we still end up with one loop finding all of the package trailers and a second loop setting up the FWorkspaceDomainPayloadProvider -- This second loop will be removed in a future work item to avoid trying to push duplicate payloads (unless I end up doing that work at the virtualization manager level instead) - We make one push for cached storage and another for persistent -- Cached storage push failure will only result in a warning, failing the persistent push will error out as before -- We need to reset the request states after the cache push. In a future work item we will likely allow a single push to specific both storage solutions, so I am leaving this code using the same set of requests for both pushes. #robomerge FNMain [CL 21386503 by paul chipchase in ue5-main branch]
2022-08-15 10:17:46 -04:00
// TODO Optimization: We might want to check for duplicate payloads and remove them at this point
When virtualizing payloads we now load the payloads from disk on demand rather than all at once. This allows us to avoid memory spikes when virtualizing large amounts of data. #rb Per.Larsson #rnx #jira UE-151000 #preflight 627271c83b2ed6f85363f53a - In the old code when submitting packages for virtualization we would load all local payloads into memory then pass them to the virtualization system to be pushed to persistent storage. This works fine for an average users submits but when resaving large projects we can have submits with thousands of packages containing gigabytes of payloads. - FPushRequest has been changed to accept either a payload directly (in the form of FCompressedBuffer) or accept a reference to a IPayloadProvider which will return the payload when asked. - Most of the existing backends make no use of the batched pushing feature and only push one payload at a time. The default implementation of this will be to ask each request one at a time for the payload and then pass it to the backend for a single push. If the IPayloadProvider is being used we will therefor only ever have one payload loaded at a time avoiding memory spikes. - The source control backend currently does implement the batched pushing feature. This backend writes each payload to disk before submitting them to perforce, so all we have to do is make sure that request one payload, write it to disk, then discard the payload to avoid memory spikes. - This change required that FPushRequest make it's members private and provide accessors to them, as the caller shouldn't need to care if the payload is loaded or comes via the provider. - NOTE that this implementation is less efficient if we have packages containing many different payloads as we will end up opening the package, parsing the trailer then load a payload many times over, where as the original code would load all payloads from the file in one go. In practice, given the overhead of the source control backend this didn't make a huge difference and is probably not worth the effort to optimize at this point. [CL 20040609 by paul chipchase in ue5-main branch]
2022-05-04 08:58:50 -04:00
// Build up the info in the payload provider and the final array of payload push requests
FWorkspaceDomainPayloadProvider PayloadProvider;
Packages submitted from the editor together will now virtualize their payloads in a single batch rather than one at a time. #rb PJ.Kack #jira UE-136126 #rnx #preflight ### VirtualizationSystem - Added a new overload for Push to VirtualizationSystem that takes an array of FPushRequest, which is a new structure representing a single payload request. - Filtering by package name is currently disabled, this is because the API has been forced into changing and passing the package name in via a FString rather than FPackagePath which means we would need to be more careful. This will be done in a future submit. - The backend interface has been extended to also have a batch version of PushData, by default this will attempt to submit each request one at a time so payloads don't have to try and implement a batched version if there is no need. - The context being passed with a payload when being pushed has been changed from FPackagePath to FString due to include order issues, as the FPackagePath lives in CoreUObject and the API for virtualization lives in Core. Additionally in the future the payloads might not be owned by a package (there is nothing specifically enforcing this) so the context being a string makes more sense. - NOTE: Due to the context change we currently no longer support the filtering feature, which allows for payloads belonging to packages under specific directories to be excluded from virtualization. This is something that will be solved in a future submit. ### SourceControlBackend - Now that we can submit multiple payloads in the same submit, the CL description has been changed slightly. We will now print a list of payload identifiers -> the package trying to submit that payload. This will only tell the users which package originally caused the payload to submit. If a user submits a new package at a later date that contains the same payload we will not be updating the description. ### PackageSubmissionChecks - Converted the submission process to use the new batch push operation in VirtualizationSystem. -- This means that we do a single push and then have to update the package trailers to convert the now pushed payloads from local to virtualized. - Added new define UE_PRECHECK_PAYLOAD_STATUS that makes it easy to toggle off the checks to see which payloads need to be submitted to the persistent backend. This is useful to test if it actually helps speed up the overall operations or if it is faster to just perform the batch push operations on all payloads and check the return values. -- The hope is that over time the submission processes will become fast enough that we can remove the precheck. - Fixed up logging to not always assume more than one package or payload. ### General Notes - Errors and logging is now a bit more vague as we often not just report that X payloads failed etc rather than specific payload identifiers. This probably doesn't affect the user too much since those identifiers as fairly meaningless to them anyway. - The source control submission could be further optimized by first checking the status of the files in thge depot and only then creating/switching workspace etc. - As currently written, we need to load all of the payloads into memory, then the backends will do what they need (in the case of source control this results in the payloads being written to disk then submitted) which could create quite a large memory spike when submitting a large number of packages. -- One solution would be to change the batch push API to take a "payload provider" interface and have the payloads requested as needed rather than passing in the FCompressedBuffer directly. This would let us immediately write the payload to disk for submission then discard it from memory, preventing larger spikes. Although it could cause overhead if there are multiple backends being submitted to. Internally we are unlikely to have more than one backend per storage solution so maybe we should just make it a config option? #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18403735 in //UE5/Release-5.0/... via CL 18403737 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v896-18170469) [CL 18403738 by paul chipchase in ue5-release-engine-test branch]
2021-12-08 02:19:42 -05:00
TArray<Virtualization::FPushRequest> PayloadsToSubmit;
PayloadsToSubmit.Reserve( IntCastChecked<int32>(TotalPayloadsToVirtualize) );
Submitting packages on projects with virtualization enabled is much faster when none of the payloads actually needs to be virtualized. #rb PJ.Kack #rnx #preflight 61a795773c29b3cf13cd8250 ### PackageSubmissionChecks - Under the old model submitting a large number of packages could be very slow as each package would check each payload that it owns and is currently stored locally one at a time. For the source control backend this created quite a large overhead even when all of the payloads were already virtualized. - Now we do a pass over the submitted files to find all valid packages that have package trailers with locally stored payloads, gather the payloads into a single list and then query that in one large batch. - Once we find which payloads are not in permanent storage (note that in the case where a project is using multiple permanent storage solutions, if a payload is missing in one backend it counts as not being in permanent storage) we then attempt to virtualized them. - Only after all of this is done will we create the truncated copy of each package and then append the updated trailer to each one. In theory doing it in this order this might slightly increase the change of submit failures that occur after virtualization that result in a package never being submitted and orphaned payloads being added to permanent storage, but this will always be a risk. - Added an assert to fire if we detect a trailer with some virtualized and some local payloads. This should be a supported feature but needs proper testing first before we can allow it. With out current project settings no project should actually encounter this scenario. - To make the code easier to follow we now early out of the entire check when errors are encountered. - Added logging at various stages in the process to help show the user that something is happening and make problems easier to identify in the future. - Notes -- There is a lot of handling of invalid FPayloads. This is because it is currently possible to add empty payloads to the trailer which is inefficient and wastes space. The trailer will be modified to reject empty payloads in a future update at which point a lot of this handling can be removed. -- This could've also been solved by not fully rehydrating a package on save by the end user, which will be added as a project setting in a future piece of work, but this approach will solve the edge case when the user does have a large amount of hydrated packages which contain payloads that are already virtualized so it was better to fix that now while we have good test cases for it. -- We still have scaling problems with large number of package being submitted that do have payloads that need to be virtualized, this will be fixed by extending IVirtualizationSystem::Push to also accept batches of payloads in future work. -- OnPrePackageSubmission could be broken up into smaller chunks to make the code easier to follow. This will be done after the batch payload submission work is done. ### VirtualizationSystem - EStorageType has been promoted to enum class. - Added a new enum FPayloadStatus to be used when querying if a payload exists in a backend storage system or not. - Add a new method ::DoPayloadsExist which allows the caller to query if one or more payloads exists in the given backend storage system. ### VirtualizationManager - Implemented ::DoPayloadsExist. First we get the results from each backend in the storage system (which return as true or false from each backend) then total how many backends found the payload in order to set the correct status. ### IVirtualizationBackend - ::DoesPayloadExist which queries the existence of a single payload has been added to the interface. Most backends already implemented this for private use and if so have had their implementation renamed to match this. - Also added ::DoPayloadsExist which takes a batch of FpayloadIdsto query. Some backends can deal with a batch of payload ids much more efficiently than one at a time, although the default implementation does call ::DoesPayloadExist for each requested payload. -- The default implementation prevents every backend from needing to implement the same for loop but does allow backends that can gain from batching to override it. ### VirtualizationSourceControlBackend - This backend does override ::DoPayloadsExist and implements it's own version as it tends to perform very poorly when not operating on larger batches. - In this case ::DoesPayloadExist calls back to ::DoPayloadsExist to check each payload rather than implement as specific version. ### PackageTrailer - The trailer can now be queries to request how many payloads of a given type it contains #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18339847 in //UE5/Release-5.0/... via CL 18339852 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18339859 by paul chipchase in ue5-release-engine-test branch]
2021-12-01 11:13:31 -05:00
for (FPackageInfo& PackageInfo : Packages)
{
Virtualizing payloads when submitting via the editor will now correctly push the payloads to cache storage backends as well as persistent backends #rb Per.Larsson #jira UE-160943, UE-151671 #rnx #preflight 62fa3334153b17e7462954b3 ### Problem - The virtualization process first checks to see if any of the local payloads are already stored in persistent storage and only try to submit those which aren't, which would mean if we are submitting a package with a local payload that is already in the persistent storage system but not in cached storage we have no way to cache it. - This logic also introduced problems where filtered out payloads were being virtualized which was fixed with the addition of ENABLE_FILTERING_HACK which was not a robust long term solution. - Checking if the payloads need to be pushed or not complicated the logic of the virtualization process and makes it harder to understand and make fixes too. ### Fix - Removed code for ENABLE_FILTERING_HACK and UE_PRECHECK_PAYLOAD_STATUS from the virtualization process - Removed some code loops from the virtualization process but we still end up with one loop finding all of the package trailers and a second loop setting up the FWorkspaceDomainPayloadProvider -- This second loop will be removed in a future work item to avoid trying to push duplicate payloads (unless I end up doing that work at the virtualization manager level instead) - We make one push for cached storage and another for persistent -- Cached storage push failure will only result in a warning, failing the persistent push will error out as before -- We need to reset the request states after the cache push. In a future work item we will likely allow a single push to specific both storage solutions, so I am leaving this code using the same set of requests for both pushes. #robomerge FNMain [CL 21386503 by paul chipchase in ue5-main branch]
2022-08-15 10:17:46 -04:00
check(!PackageInfo.LocalPayloads.IsEmpty());
Submitting packages on projects with virtualization enabled is much faster when none of the payloads actually needs to be virtualized. #rb PJ.Kack #rnx #preflight 61a795773c29b3cf13cd8250 ### PackageSubmissionChecks - Under the old model submitting a large number of packages could be very slow as each package would check each payload that it owns and is currently stored locally one at a time. For the source control backend this created quite a large overhead even when all of the payloads were already virtualized. - Now we do a pass over the submitted files to find all valid packages that have package trailers with locally stored payloads, gather the payloads into a single list and then query that in one large batch. - Once we find which payloads are not in permanent storage (note that in the case where a project is using multiple permanent storage solutions, if a payload is missing in one backend it counts as not being in permanent storage) we then attempt to virtualized them. - Only after all of this is done will we create the truncated copy of each package and then append the updated trailer to each one. In theory doing it in this order this might slightly increase the change of submit failures that occur after virtualization that result in a package never being submitted and orphaned payloads being added to permanent storage, but this will always be a risk. - Added an assert to fire if we detect a trailer with some virtualized and some local payloads. This should be a supported feature but needs proper testing first before we can allow it. With out current project settings no project should actually encounter this scenario. - To make the code easier to follow we now early out of the entire check when errors are encountered. - Added logging at various stages in the process to help show the user that something is happening and make problems easier to identify in the future. - Notes -- There is a lot of handling of invalid FPayloads. This is because it is currently possible to add empty payloads to the trailer which is inefficient and wastes space. The trailer will be modified to reject empty payloads in a future update at which point a lot of this handling can be removed. -- This could've also been solved by not fully rehydrating a package on save by the end user, which will be added as a project setting in a future piece of work, but this approach will solve the edge case when the user does have a large amount of hydrated packages which contain payloads that are already virtualized so it was better to fix that now while we have good test cases for it. -- We still have scaling problems with large number of package being submitted that do have payloads that need to be virtualized, this will be fixed by extending IVirtualizationSystem::Push to also accept batches of payloads in future work. -- OnPrePackageSubmission could be broken up into smaller chunks to make the code easier to follow. This will be done after the batch payload submission work is done. ### VirtualizationSystem - EStorageType has been promoted to enum class. - Added a new enum FPayloadStatus to be used when querying if a payload exists in a backend storage system or not. - Add a new method ::DoPayloadsExist which allows the caller to query if one or more payloads exists in the given backend storage system. ### VirtualizationManager - Implemented ::DoPayloadsExist. First we get the results from each backend in the storage system (which return as true or false from each backend) then total how many backends found the payload in order to set the correct status. ### IVirtualizationBackend - ::DoesPayloadExist which queries the existence of a single payload has been added to the interface. Most backends already implemented this for private use and if so have had their implementation renamed to match this. - Also added ::DoPayloadsExist which takes a batch of FpayloadIdsto query. Some backends can deal with a batch of payload ids much more efficiently than one at a time, although the default implementation does call ::DoesPayloadExist for each requested payload. -- The default implementation prevents every backend from needing to implement the same for loop but does allow backends that can gain from batching to override it. ### VirtualizationSourceControlBackend - This backend does override ::DoPayloadsExist and implements it's own version as it tends to perform very poorly when not operating on larger batches. - In this case ::DoesPayloadExist calls back to ::DoPayloadsExist to check each payload rather than implement as specific version. ### PackageTrailer - The trailer can now be queries to request how many payloads of a given type it contains #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18339847 in //UE5/Release-5.0/... via CL 18339852 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18339859 by paul chipchase in ue5-release-engine-test branch]
2021-12-01 11:13:31 -05:00
Packages submitted from the editor together will now virtualize their payloads in a single batch rather than one at a time. #rb PJ.Kack #jira UE-136126 #rnx #preflight ### VirtualizationSystem - Added a new overload for Push to VirtualizationSystem that takes an array of FPushRequest, which is a new structure representing a single payload request. - Filtering by package name is currently disabled, this is because the API has been forced into changing and passing the package name in via a FString rather than FPackagePath which means we would need to be more careful. This will be done in a future submit. - The backend interface has been extended to also have a batch version of PushData, by default this will attempt to submit each request one at a time so payloads don't have to try and implement a batched version if there is no need. - The context being passed with a payload when being pushed has been changed from FPackagePath to FString due to include order issues, as the FPackagePath lives in CoreUObject and the API for virtualization lives in Core. Additionally in the future the payloads might not be owned by a package (there is nothing specifically enforcing this) so the context being a string makes more sense. - NOTE: Due to the context change we currently no longer support the filtering feature, which allows for payloads belonging to packages under specific directories to be excluded from virtualization. This is something that will be solved in a future submit. ### SourceControlBackend - Now that we can submit multiple payloads in the same submit, the CL description has been changed slightly. We will now print a list of payload identifiers -> the package trying to submit that payload. This will only tell the users which package originally caused the payload to submit. If a user submits a new package at a later date that contains the same payload we will not be updating the description. ### PackageSubmissionChecks - Converted the submission process to use the new batch push operation in VirtualizationSystem. -- This means that we do a single push and then have to update the package trailers to convert the now pushed payloads from local to virtualized. - Added new define UE_PRECHECK_PAYLOAD_STATUS that makes it easy to toggle off the checks to see which payloads need to be submitted to the persistent backend. This is useful to test if it actually helps speed up the overall operations or if it is faster to just perform the batch push operations on all payloads and check the return values. -- The hope is that over time the submission processes will become fast enough that we can remove the precheck. - Fixed up logging to not always assume more than one package or payload. ### General Notes - Errors and logging is now a bit more vague as we often not just report that X payloads failed etc rather than specific payload identifiers. This probably doesn't affect the user too much since those identifiers as fairly meaningless to them anyway. - The source control submission could be further optimized by first checking the status of the files in thge depot and only then creating/switching workspace etc. - As currently written, we need to load all of the payloads into memory, then the backends will do what they need (in the case of source control this results in the payloads being written to disk then submitted) which could create quite a large memory spike when submitting a large number of packages. -- One solution would be to change the batch push API to take a "payload provider" interface and have the payloads requested as needed rather than passing in the FCompressedBuffer directly. This would let us immediately write the payload to disk for submission then discard it from memory, preventing larger spikes. Although it could cause overhead if there are multiple backends being submitted to. Internally we are unlikely to have more than one backend per storage solution so maybe we should just make it a config option? #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18403735 in //UE5/Release-5.0/... via CL 18403737 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v896-18170469) [CL 18403738 by paul chipchase in ue5-release-engine-test branch]
2021-12-08 02:19:42 -05:00
PackageInfo.PayloadIndex = PayloadsToSubmit.Num();
for (const FIoHash& PayloadId : PackageInfo.LocalPayloads)
Submitting packages on projects with virtualization enabled is much faster when none of the payloads actually needs to be virtualized. #rb PJ.Kack #rnx #preflight 61a795773c29b3cf13cd8250 ### PackageSubmissionChecks - Under the old model submitting a large number of packages could be very slow as each package would check each payload that it owns and is currently stored locally one at a time. For the source control backend this created quite a large overhead even when all of the payloads were already virtualized. - Now we do a pass over the submitted files to find all valid packages that have package trailers with locally stored payloads, gather the payloads into a single list and then query that in one large batch. - Once we find which payloads are not in permanent storage (note that in the case where a project is using multiple permanent storage solutions, if a payload is missing in one backend it counts as not being in permanent storage) we then attempt to virtualized them. - Only after all of this is done will we create the truncated copy of each package and then append the updated trailer to each one. In theory doing it in this order this might slightly increase the change of submit failures that occur after virtualization that result in a package never being submitted and orphaned payloads being added to permanent storage, but this will always be a risk. - Added an assert to fire if we detect a trailer with some virtualized and some local payloads. This should be a supported feature but needs proper testing first before we can allow it. With out current project settings no project should actually encounter this scenario. - To make the code easier to follow we now early out of the entire check when errors are encountered. - Added logging at various stages in the process to help show the user that something is happening and make problems easier to identify in the future. - Notes -- There is a lot of handling of invalid FPayloads. This is because it is currently possible to add empty payloads to the trailer which is inefficient and wastes space. The trailer will be modified to reject empty payloads in a future update at which point a lot of this handling can be removed. -- This could've also been solved by not fully rehydrating a package on save by the end user, which will be added as a project setting in a future piece of work, but this approach will solve the edge case when the user does have a large amount of hydrated packages which contain payloads that are already virtualized so it was better to fix that now while we have good test cases for it. -- We still have scaling problems with large number of package being submitted that do have payloads that need to be virtualized, this will be fixed by extending IVirtualizationSystem::Push to also accept batches of payloads in future work. -- OnPrePackageSubmission could be broken up into smaller chunks to make the code easier to follow. This will be done after the batch payload submission work is done. ### VirtualizationSystem - EStorageType has been promoted to enum class. - Added a new enum FPayloadStatus to be used when querying if a payload exists in a backend storage system or not. - Add a new method ::DoPayloadsExist which allows the caller to query if one or more payloads exists in the given backend storage system. ### VirtualizationManager - Implemented ::DoPayloadsExist. First we get the results from each backend in the storage system (which return as true or false from each backend) then total how many backends found the payload in order to set the correct status. ### IVirtualizationBackend - ::DoesPayloadExist which queries the existence of a single payload has been added to the interface. Most backends already implemented this for private use and if so have had their implementation renamed to match this. - Also added ::DoPayloadsExist which takes a batch of FpayloadIdsto query. Some backends can deal with a batch of payload ids much more efficiently than one at a time, although the default implementation does call ::DoesPayloadExist for each requested payload. -- The default implementation prevents every backend from needing to implement the same for loop but does allow backends that can gain from batching to override it. ### VirtualizationSourceControlBackend - This backend does override ::DoPayloadsExist and implements it's own version as it tends to perform very poorly when not operating on larger batches. - In this case ::DoesPayloadExist calls back to ::DoPayloadsExist to check each payload rather than implement as specific version. ### PackageTrailer - The trailer can now be queries to request how many payloads of a given type it contains #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18339847 in //UE5/Release-5.0/... via CL 18339852 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18339859 by paul chipchase in ue5-release-engine-test branch]
2021-12-01 11:13:31 -05:00
{
When virtualizing payloads we now load the payloads from disk on demand rather than all at once. This allows us to avoid memory spikes when virtualizing large amounts of data. #rb Per.Larsson #rnx #jira UE-151000 #preflight 627271c83b2ed6f85363f53a - In the old code when submitting packages for virtualization we would load all local payloads into memory then pass them to the virtualization system to be pushed to persistent storage. This works fine for an average users submits but when resaving large projects we can have submits with thousands of packages containing gigabytes of payloads. - FPushRequest has been changed to accept either a payload directly (in the form of FCompressedBuffer) or accept a reference to a IPayloadProvider which will return the payload when asked. - Most of the existing backends make no use of the batched pushing feature and only push one payload at a time. The default implementation of this will be to ask each request one at a time for the payload and then pass it to the backend for a single push. If the IPayloadProvider is being used we will therefor only ever have one payload loaded at a time avoiding memory spikes. - The source control backend currently does implement the batched pushing feature. This backend writes each payload to disk before submitting them to perforce, so all we have to do is make sure that request one payload, write it to disk, then discard the payload to avoid memory spikes. - This change required that FPushRequest make it's members private and provide accessors to them, as the caller shouldn't need to care if the payload is loaded or comes via the provider. - NOTE that this implementation is less efficient if we have packages containing many different payloads as we will end up opening the package, parsing the trailer then load a payload many times over, where as the original code would load all payloads from the file in one go. In practice, given the overhead of the source control backend this didn't make a huge difference and is probably not worth the effort to optimize at this point. [CL 20040609 by paul chipchase in ue5-main branch]
2022-05-04 08:58:50 -04:00
const uint64 SizeOnDisk = PackageInfo.Trailer.FindPayloadSizeOnDisk(PayloadId);
Submitting packages on projects with virtualization enabled is much faster when none of the payloads actually needs to be virtualized. #rb PJ.Kack #rnx #preflight 61a795773c29b3cf13cd8250 ### PackageSubmissionChecks - Under the old model submitting a large number of packages could be very slow as each package would check each payload that it owns and is currently stored locally one at a time. For the source control backend this created quite a large overhead even when all of the payloads were already virtualized. - Now we do a pass over the submitted files to find all valid packages that have package trailers with locally stored payloads, gather the payloads into a single list and then query that in one large batch. - Once we find which payloads are not in permanent storage (note that in the case where a project is using multiple permanent storage solutions, if a payload is missing in one backend it counts as not being in permanent storage) we then attempt to virtualized them. - Only after all of this is done will we create the truncated copy of each package and then append the updated trailer to each one. In theory doing it in this order this might slightly increase the change of submit failures that occur after virtualization that result in a package never being submitted and orphaned payloads being added to permanent storage, but this will always be a risk. - Added an assert to fire if we detect a trailer with some virtualized and some local payloads. This should be a supported feature but needs proper testing first before we can allow it. With out current project settings no project should actually encounter this scenario. - To make the code easier to follow we now early out of the entire check when errors are encountered. - Added logging at various stages in the process to help show the user that something is happening and make problems easier to identify in the future. - Notes -- There is a lot of handling of invalid FPayloads. This is because it is currently possible to add empty payloads to the trailer which is inefficient and wastes space. The trailer will be modified to reject empty payloads in a future update at which point a lot of this handling can be removed. -- This could've also been solved by not fully rehydrating a package on save by the end user, which will be added as a project setting in a future piece of work, but this approach will solve the edge case when the user does have a large amount of hydrated packages which contain payloads that are already virtualized so it was better to fix that now while we have good test cases for it. -- We still have scaling problems with large number of package being submitted that do have payloads that need to be virtualized, this will be fixed by extending IVirtualizationSystem::Push to also accept batches of payloads in future work. -- OnPrePackageSubmission could be broken up into smaller chunks to make the code easier to follow. This will be done after the batch payload submission work is done. ### VirtualizationSystem - EStorageType has been promoted to enum class. - Added a new enum FPayloadStatus to be used when querying if a payload exists in a backend storage system or not. - Add a new method ::DoPayloadsExist which allows the caller to query if one or more payloads exists in the given backend storage system. ### VirtualizationManager - Implemented ::DoPayloadsExist. First we get the results from each backend in the storage system (which return as true or false from each backend) then total how many backends found the payload in order to set the correct status. ### IVirtualizationBackend - ::DoesPayloadExist which queries the existence of a single payload has been added to the interface. Most backends already implemented this for private use and if so have had their implementation renamed to match this. - Also added ::DoPayloadsExist which takes a batch of FpayloadIdsto query. Some backends can deal with a batch of payload ids much more efficiently than one at a time, although the default implementation does call ::DoesPayloadExist for each requested payload. -- The default implementation prevents every backend from needing to implement the same for loop but does allow backends that can gain from batching to override it. ### VirtualizationSourceControlBackend - This backend does override ::DoPayloadsExist and implements it's own version as it tends to perform very poorly when not operating on larger batches. - In this case ::DoesPayloadExist calls back to ::DoPayloadsExist to check each payload rather than implement as specific version. ### PackageTrailer - The trailer can now be queries to request how many payloads of a given type it contains #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18339847 in //UE5/Release-5.0/... via CL 18339852 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18339859 by paul chipchase in ue5-release-engine-test branch]
2021-12-01 11:13:31 -05:00
When virtualizing payloads we now load the payloads from disk on demand rather than all at once. This allows us to avoid memory spikes when virtualizing large amounts of data. #rb Per.Larsson #rnx #jira UE-151000 #preflight 627271c83b2ed6f85363f53a - In the old code when submitting packages for virtualization we would load all local payloads into memory then pass them to the virtualization system to be pushed to persistent storage. This works fine for an average users submits but when resaving large projects we can have submits with thousands of packages containing gigabytes of payloads. - FPushRequest has been changed to accept either a payload directly (in the form of FCompressedBuffer) or accept a reference to a IPayloadProvider which will return the payload when asked. - Most of the existing backends make no use of the batched pushing feature and only push one payload at a time. The default implementation of this will be to ask each request one at a time for the payload and then pass it to the backend for a single push. If the IPayloadProvider is being used we will therefor only ever have one payload loaded at a time avoiding memory spikes. - The source control backend currently does implement the batched pushing feature. This backend writes each payload to disk before submitting them to perforce, so all we have to do is make sure that request one payload, write it to disk, then discard the payload to avoid memory spikes. - This change required that FPushRequest make it's members private and provide accessors to them, as the caller shouldn't need to care if the payload is loaded or comes via the provider. - NOTE that this implementation is less efficient if we have packages containing many different payloads as we will end up opening the package, parsing the trailer then load a payload many times over, where as the original code would load all payloads from the file in one go. In practice, given the overhead of the source control backend this didn't make a huge difference and is probably not worth the effort to optimize at this point. [CL 20040609 by paul chipchase in ue5-main branch]
2022-05-04 08:58:50 -04:00
PayloadProvider.RegisterPayload(PayloadId, SizeOnDisk, PackageInfo.Path.GetPackageName());
PayloadsToSubmit.Emplace(PayloadId, PayloadProvider, PackageInfo.Path.GetPackageName());
Packages submitted from the editor together will now virtualize their payloads in a single batch rather than one at a time. #rb PJ.Kack #jira UE-136126 #rnx #preflight ### VirtualizationSystem - Added a new overload for Push to VirtualizationSystem that takes an array of FPushRequest, which is a new structure representing a single payload request. - Filtering by package name is currently disabled, this is because the API has been forced into changing and passing the package name in via a FString rather than FPackagePath which means we would need to be more careful. This will be done in a future submit. - The backend interface has been extended to also have a batch version of PushData, by default this will attempt to submit each request one at a time so payloads don't have to try and implement a batched version if there is no need. - The context being passed with a payload when being pushed has been changed from FPackagePath to FString due to include order issues, as the FPackagePath lives in CoreUObject and the API for virtualization lives in Core. Additionally in the future the payloads might not be owned by a package (there is nothing specifically enforcing this) so the context being a string makes more sense. - NOTE: Due to the context change we currently no longer support the filtering feature, which allows for payloads belonging to packages under specific directories to be excluded from virtualization. This is something that will be solved in a future submit. ### SourceControlBackend - Now that we can submit multiple payloads in the same submit, the CL description has been changed slightly. We will now print a list of payload identifiers -> the package trying to submit that payload. This will only tell the users which package originally caused the payload to submit. If a user submits a new package at a later date that contains the same payload we will not be updating the description. ### PackageSubmissionChecks - Converted the submission process to use the new batch push operation in VirtualizationSystem. -- This means that we do a single push and then have to update the package trailers to convert the now pushed payloads from local to virtualized. - Added new define UE_PRECHECK_PAYLOAD_STATUS that makes it easy to toggle off the checks to see which payloads need to be submitted to the persistent backend. This is useful to test if it actually helps speed up the overall operations or if it is faster to just perform the batch push operations on all payloads and check the return values. -- The hope is that over time the submission processes will become fast enough that we can remove the precheck. - Fixed up logging to not always assume more than one package or payload. ### General Notes - Errors and logging is now a bit more vague as we often not just report that X payloads failed etc rather than specific payload identifiers. This probably doesn't affect the user too much since those identifiers as fairly meaningless to them anyway. - The source control submission could be further optimized by first checking the status of the files in thge depot and only then creating/switching workspace etc. - As currently written, we need to load all of the payloads into memory, then the backends will do what they need (in the case of source control this results in the payloads being written to disk then submitted) which could create quite a large memory spike when submitting a large number of packages. -- One solution would be to change the batch push API to take a "payload provider" interface and have the payloads requested as needed rather than passing in the FCompressedBuffer directly. This would let us immediately write the payload to disk for submission then discard it from memory, preventing larger spikes. Although it could cause overhead if there are multiple backends being submitted to. Internally we are unlikely to have more than one backend per storage solution so maybe we should just make it a config option? #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18403735 in //UE5/Release-5.0/... via CL 18403737 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v896-18170469) [CL 18403738 by paul chipchase in ue5-release-engine-test branch]
2021-12-08 02:19:42 -05:00
}
}
Virtualizing payloads when submitting via the editor will now correctly push the payloads to cache storage backends as well as persistent backends #rb Per.Larsson #jira UE-160943, UE-151671 #rnx #preflight 62fa3334153b17e7462954b3 ### Problem - The virtualization process first checks to see if any of the local payloads are already stored in persistent storage and only try to submit those which aren't, which would mean if we are submitting a package with a local payload that is already in the persistent storage system but not in cached storage we have no way to cache it. - This logic also introduced problems where filtered out payloads were being virtualized which was fixed with the addition of ENABLE_FILTERING_HACK which was not a robust long term solution. - Checking if the payloads need to be pushed or not complicated the logic of the virtualization process and makes it harder to understand and make fixes too. ### Fix - Removed code for ENABLE_FILTERING_HACK and UE_PRECHECK_PAYLOAD_STATUS from the virtualization process - Removed some code loops from the virtualization process but we still end up with one loop finding all of the package trailers and a second loop setting up the FWorkspaceDomainPayloadProvider -- This second loop will be removed in a future work item to avoid trying to push duplicate payloads (unless I end up doing that work at the virtualization manager level instead) - We make one push for cached storage and another for persistent -- Cached storage push failure will only result in a warning, failing the persistent push will error out as before -- We need to reset the request states after the cache push. In a future work item we will likely allow a single push to specific both storage solutions, so I am leaving this code using the same set of requests for both pushes. #robomerge FNMain [CL 21386503 by paul chipchase in ue5-main branch]
2022-08-15 10:17:46 -04:00
// TODO: We should be able to do both Cache and Persistent pushes in the same call
Packages submitted from the editor together will now virtualize their payloads in a single batch rather than one at a time. #rb PJ.Kack #jira UE-136126 #rnx #preflight ### VirtualizationSystem - Added a new overload for Push to VirtualizationSystem that takes an array of FPushRequest, which is a new structure representing a single payload request. - Filtering by package name is currently disabled, this is because the API has been forced into changing and passing the package name in via a FString rather than FPackagePath which means we would need to be more careful. This will be done in a future submit. - The backend interface has been extended to also have a batch version of PushData, by default this will attempt to submit each request one at a time so payloads don't have to try and implement a batched version if there is no need. - The context being passed with a payload when being pushed has been changed from FPackagePath to FString due to include order issues, as the FPackagePath lives in CoreUObject and the API for virtualization lives in Core. Additionally in the future the payloads might not be owned by a package (there is nothing specifically enforcing this) so the context being a string makes more sense. - NOTE: Due to the context change we currently no longer support the filtering feature, which allows for payloads belonging to packages under specific directories to be excluded from virtualization. This is something that will be solved in a future submit. ### SourceControlBackend - Now that we can submit multiple payloads in the same submit, the CL description has been changed slightly. We will now print a list of payload identifiers -> the package trying to submit that payload. This will only tell the users which package originally caused the payload to submit. If a user submits a new package at a later date that contains the same payload we will not be updating the description. ### PackageSubmissionChecks - Converted the submission process to use the new batch push operation in VirtualizationSystem. -- This means that we do a single push and then have to update the package trailers to convert the now pushed payloads from local to virtualized. - Added new define UE_PRECHECK_PAYLOAD_STATUS that makes it easy to toggle off the checks to see which payloads need to be submitted to the persistent backend. This is useful to test if it actually helps speed up the overall operations or if it is faster to just perform the batch push operations on all payloads and check the return values. -- The hope is that over time the submission processes will become fast enough that we can remove the precheck. - Fixed up logging to not always assume more than one package or payload. ### General Notes - Errors and logging is now a bit more vague as we often not just report that X payloads failed etc rather than specific payload identifiers. This probably doesn't affect the user too much since those identifiers as fairly meaningless to them anyway. - The source control submission could be further optimized by first checking the status of the files in thge depot and only then creating/switching workspace etc. - As currently written, we need to load all of the payloads into memory, then the backends will do what they need (in the case of source control this results in the payloads being written to disk then submitted) which could create quite a large memory spike when submitting a large number of packages. -- One solution would be to change the batch push API to take a "payload provider" interface and have the payloads requested as needed rather than passing in the FCompressedBuffer directly. This would let us immediately write the payload to disk for submission then discard it from memory, preventing larger spikes. Although it could cause overhead if there are multiple backends being submitted to. Internally we are unlikely to have more than one backend per storage solution so maybe we should just make it a config option? #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18403735 in //UE5/Release-5.0/... via CL 18403737 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v896-18170469) [CL 18403738 by paul chipchase in ue5-release-engine-test branch]
2021-12-08 02:19:42 -05:00
Virtualizing payloads when submitting via the editor will now correctly push the payloads to cache storage backends as well as persistent backends #rb Per.Larsson #jira UE-160943, UE-151671 #rnx #preflight 62fa3334153b17e7462954b3 ### Problem - The virtualization process first checks to see if any of the local payloads are already stored in persistent storage and only try to submit those which aren't, which would mean if we are submitting a package with a local payload that is already in the persistent storage system but not in cached storage we have no way to cache it. - This logic also introduced problems where filtered out payloads were being virtualized which was fixed with the addition of ENABLE_FILTERING_HACK which was not a robust long term solution. - Checking if the payloads need to be pushed or not complicated the logic of the virtualization process and makes it harder to understand and make fixes too. ### Fix - Removed code for ENABLE_FILTERING_HACK and UE_PRECHECK_PAYLOAD_STATUS from the virtualization process - Removed some code loops from the virtualization process but we still end up with one loop finding all of the package trailers and a second loop setting up the FWorkspaceDomainPayloadProvider -- This second loop will be removed in a future work item to avoid trying to push duplicate payloads (unless I end up doing that work at the virtualization manager level instead) - We make one push for cached storage and another for persistent -- Cached storage push failure will only result in a warning, failing the persistent push will error out as before -- We need to reset the request states after the cache push. In a future work item we will likely allow a single push to specific both storage solutions, so I am leaving this code using the same set of requests for both pushes. #robomerge FNMain [CL 21386503 by paul chipchase in ue5-main branch]
2022-08-15 10:17:46 -04:00
// Push payloads to cache storage
Virtualizing an asset will no longer display an error message if the project has no caching storage backends set up. #rb Per.Larsson #jira UE-171492 #rnx #preflight 6391f02d67018b14b581c0b0 - It is possible that a project might want to have persistent storage enabled but no caching storage, it is probably a bad idea but it should work. However before this fix doing so would cause an error to be logged during the virtualization of a package and fail the process. - During the virtualization process we now check if pushing is enabled for cache storage before trying to push there, if not then we log that the phase is being skipped. - Even though the above fixes the problem we would still get odd results when pushing payloads to cached storage without any backends enabled. -- We now check if the pushing process is disabled before processing requests, so we can fail faster. There is now also a proper error value for this that can be stored in each request (FPushRequest) where as previously they'd all still display "pending" -- We also check to see if the requested push type has any associated backend before processing requests and have a specific error value for this. Note that we consider - We also now force the visibility of the slow task progress bar, as otherwise we might not end up displaying the virtualization progress to the progress dialog (this is a work around to a long standing problem with the slow task system due to the time limit associated with updating the dialog via EnterProgressFrame, this needs to be fixed elsewhere) #ushell-cherrypick of 23445439 by paul.chipchase [CL 23461405 by paul chipchase in ue5-main branch]
2022-12-09 03:39:20 -05:00
Progress.EnterProgressFrame(1.0f);
if(System.IsPushingEnabled(EStorageType::Cache))
{
UE_LOG(LogVirtualization, Display, TEXT("Pushing payload(s) to EStorageType::Cache storage..."));
Virtualizing payloads when submitting via the editor will now correctly push the payloads to cache storage backends as well as persistent backends #rb Per.Larsson #jira UE-160943, UE-151671 #rnx #preflight 62fa3334153b17e7462954b3 ### Problem - The virtualization process first checks to see if any of the local payloads are already stored in persistent storage and only try to submit those which aren't, which would mean if we are submitting a package with a local payload that is already in the persistent storage system but not in cached storage we have no way to cache it. - This logic also introduced problems where filtered out payloads were being virtualized which was fixed with the addition of ENABLE_FILTERING_HACK which was not a robust long term solution. - Checking if the payloads need to be pushed or not complicated the logic of the virtualization process and makes it harder to understand and make fixes too. ### Fix - Removed code for ENABLE_FILTERING_HACK and UE_PRECHECK_PAYLOAD_STATUS from the virtualization process - Removed some code loops from the virtualization process but we still end up with one loop finding all of the package trailers and a second loop setting up the FWorkspaceDomainPayloadProvider -- This second loop will be removed in a future work item to avoid trying to push duplicate payloads (unless I end up doing that work at the virtualization manager level instead) - We make one push for cached storage and another for persistent -- Cached storage push failure will only result in a warning, failing the persistent push will error out as before -- We need to reset the request states after the cache push. In a future work item we will likely allow a single push to specific both storage solutions, so I am leaving this code using the same set of requests for both pushes. #robomerge FNMain [CL 21386503 by paul chipchase in ue5-main branch]
2022-08-15 10:17:46 -04:00
if (!System.PushData(PayloadsToSubmit, EStorageType::Cache))
{
// Caching is not critical to the process so we only warn if it fails
UE_LOG(LogVirtualization, Warning, TEXT("Failed to push to EStorageType::Cache storage"));
}
int64 TotalPayloadsCached = 0;
for (Virtualization::FPushRequest& Request : PayloadsToSubmit)
{
TotalPayloadsCached += Request.GetResult().WasPushed() ? 1 : 0;
Virtualizing payloads when submitting via the editor will now correctly push the payloads to cache storage backends as well as persistent backends #rb Per.Larsson #jira UE-160943, UE-151671 #rnx #preflight 62fa3334153b17e7462954b3 ### Problem - The virtualization process first checks to see if any of the local payloads are already stored in persistent storage and only try to submit those which aren't, which would mean if we are submitting a package with a local payload that is already in the persistent storage system but not in cached storage we have no way to cache it. - This logic also introduced problems where filtered out payloads were being virtualized which was fixed with the addition of ENABLE_FILTERING_HACK which was not a robust long term solution. - Checking if the payloads need to be pushed or not complicated the logic of the virtualization process and makes it harder to understand and make fixes too. ### Fix - Removed code for ENABLE_FILTERING_HACK and UE_PRECHECK_PAYLOAD_STATUS from the virtualization process - Removed some code loops from the virtualization process but we still end up with one loop finding all of the package trailers and a second loop setting up the FWorkspaceDomainPayloadProvider -- This second loop will be removed in a future work item to avoid trying to push duplicate payloads (unless I end up doing that work at the virtualization manager level instead) - We make one push for cached storage and another for persistent -- Cached storage push failure will only result in a warning, failing the persistent push will error out as before -- We need to reset the request states after the cache push. In a future work item we will likely allow a single push to specific both storage solutions, so I am leaving this code using the same set of requests for both pushes. #robomerge FNMain [CL 21386503 by paul chipchase in ue5-main branch]
2022-08-15 10:17:46 -04:00
// TODO: This really shouldn't be required, fix when we allow both pushes to be done in the same call
// Reset the status for the persistent storage push
Request.ResetResult();
Virtualizing payloads when submitting via the editor will now correctly push the payloads to cache storage backends as well as persistent backends #rb Per.Larsson #jira UE-160943, UE-151671 #rnx #preflight 62fa3334153b17e7462954b3 ### Problem - The virtualization process first checks to see if any of the local payloads are already stored in persistent storage and only try to submit those which aren't, which would mean if we are submitting a package with a local payload that is already in the persistent storage system but not in cached storage we have no way to cache it. - This logic also introduced problems where filtered out payloads were being virtualized which was fixed with the addition of ENABLE_FILTERING_HACK which was not a robust long term solution. - Checking if the payloads need to be pushed or not complicated the logic of the virtualization process and makes it harder to understand and make fixes too. ### Fix - Removed code for ENABLE_FILTERING_HACK and UE_PRECHECK_PAYLOAD_STATUS from the virtualization process - Removed some code loops from the virtualization process but we still end up with one loop finding all of the package trailers and a second loop setting up the FWorkspaceDomainPayloadProvider -- This second loop will be removed in a future work item to avoid trying to push duplicate payloads (unless I end up doing that work at the virtualization manager level instead) - We make one push for cached storage and another for persistent -- Cached storage push failure will only result in a warning, failing the persistent push will error out as before -- We need to reset the request states after the cache push. In a future work item we will likely allow a single push to specific both storage solutions, so I am leaving this code using the same set of requests for both pushes. #robomerge FNMain [CL 21386503 by paul chipchase in ue5-main branch]
2022-08-15 10:17:46 -04:00
}
UE_LOG(LogVirtualization, Display, TEXT("Pushed %" INT64_FMT " payload(s) to cached storage"), TotalPayloadsCached);
}
Virtualizing an asset will no longer display an error message if the project has no caching storage backends set up. #rb Per.Larsson #jira UE-171492 #rnx #preflight 6391f02d67018b14b581c0b0 - It is possible that a project might want to have persistent storage enabled but no caching storage, it is probably a bad idea but it should work. However before this fix doing so would cause an error to be logged during the virtualization of a package and fail the process. - During the virtualization process we now check if pushing is enabled for cache storage before trying to push there, if not then we log that the phase is being skipped. - Even though the above fixes the problem we would still get odd results when pushing payloads to cached storage without any backends enabled. -- We now check if the pushing process is disabled before processing requests, so we can fail faster. There is now also a proper error value for this that can be stored in each request (FPushRequest) where as previously they'd all still display "pending" -- We also check to see if the requested push type has any associated backend before processing requests and have a specific error value for this. Note that we consider - We also now force the visibility of the slow task progress bar, as otherwise we might not end up displaying the virtualization progress to the progress dialog (this is a work around to a long standing problem with the slow task system due to the time limit associated with updating the dialog via EnterProgressFrame, this needs to be fixed elsewhere) #ushell-cherrypick of 23445439 by paul.chipchase [CL 23461405 by paul chipchase in ue5-main branch]
2022-12-09 03:39:20 -05:00
else
{
UE_LOG(LogVirtualization, Display, TEXT("Pushing payload(s) to cached storage is disbled, skipping"));
}
Virtualizing payloads when submitting via the editor will now correctly push the payloads to cache storage backends as well as persistent backends #rb Per.Larsson #jira UE-160943, UE-151671 #rnx #preflight 62fa3334153b17e7462954b3 ### Problem - The virtualization process first checks to see if any of the local payloads are already stored in persistent storage and only try to submit those which aren't, which would mean if we are submitting a package with a local payload that is already in the persistent storage system but not in cached storage we have no way to cache it. - This logic also introduced problems where filtered out payloads were being virtualized which was fixed with the addition of ENABLE_FILTERING_HACK which was not a robust long term solution. - Checking if the payloads need to be pushed or not complicated the logic of the virtualization process and makes it harder to understand and make fixes too. ### Fix - Removed code for ENABLE_FILTERING_HACK and UE_PRECHECK_PAYLOAD_STATUS from the virtualization process - Removed some code loops from the virtualization process but we still end up with one loop finding all of the package trailers and a second loop setting up the FWorkspaceDomainPayloadProvider -- This second loop will be removed in a future work item to avoid trying to push duplicate payloads (unless I end up doing that work at the virtualization manager level instead) - We make one push for cached storage and another for persistent -- Cached storage push failure will only result in a warning, failing the persistent push will error out as before -- We need to reset the request states after the cache push. In a future work item we will likely allow a single push to specific both storage solutions, so I am leaving this code using the same set of requests for both pushes. #robomerge FNMain [CL 21386503 by paul chipchase in ue5-main branch]
2022-08-15 10:17:46 -04:00
// Push payloads to persistent storage
{
Progress.EnterProgressFrame(1.0f);
UE_LOG(LogVirtualization, Display, TEXT("Pushing payload(s) to EStorageType::Persistent storage..."));
Virtualizing payloads when submitting via the editor will now correctly push the payloads to cache storage backends as well as persistent backends #rb Per.Larsson #jira UE-160943, UE-151671 #rnx #preflight 62fa3334153b17e7462954b3 ### Problem - The virtualization process first checks to see if any of the local payloads are already stored in persistent storage and only try to submit those which aren't, which would mean if we are submitting a package with a local payload that is already in the persistent storage system but not in cached storage we have no way to cache it. - This logic also introduced problems where filtered out payloads were being virtualized which was fixed with the addition of ENABLE_FILTERING_HACK which was not a robust long term solution. - Checking if the payloads need to be pushed or not complicated the logic of the virtualization process and makes it harder to understand and make fixes too. ### Fix - Removed code for ENABLE_FILTERING_HACK and UE_PRECHECK_PAYLOAD_STATUS from the virtualization process - Removed some code loops from the virtualization process but we still end up with one loop finding all of the package trailers and a second loop setting up the FWorkspaceDomainPayloadProvider -- This second loop will be removed in a future work item to avoid trying to push duplicate payloads (unless I end up doing that work at the virtualization manager level instead) - We make one push for cached storage and another for persistent -- Cached storage push failure will only result in a warning, failing the persistent push will error out as before -- We need to reset the request states after the cache push. In a future work item we will likely allow a single push to specific both storage solutions, so I am leaving this code using the same set of requests for both pushes. #robomerge FNMain [CL 21386503 by paul chipchase in ue5-main branch]
2022-08-15 10:17:46 -04:00
if (!System.PushData(PayloadsToSubmit, EStorageType::Persistent))
{
FText Message = LOCTEXT("Virtualization_PushFailure", "Failed to push payloads");
Add a new overload to IVirtualizationSystem::TryVirtualizePackages, which takes additional options (via a bitfield enum) and returns more info about the resulting process. The original version has been drepcated. #rb Per.Larsson #jira UE-169626 #rnx #preflight 639c4112012902cb8db43e13 - This allows us to provide the user with more ways to customize the virtualization and return more detailed info about it if the calling code wishes to log additional info. In both cases we can extend the options and the data returned without changing the API. - Previously if we virtualized a package that was not checked out in revision control we would warn the user and then skip updating the package file on disk. This means the payloads would be uploaded but the user would be left with no local changes. Since sometimes we know we don't need to check out any package (virtualizing the packages in a change list for example) we don't want to always incur the cost of polling reivision control to see which packages do need checking out. This is why we now allow the caller to request package files be checked out via the new options enum EVirtualizationOptions. -- If the EVirtualizationOptions::Checkout flag is provided we will poll the revision control status of all package files and then check out those which need it. -- We still check if packages can be modified and warn the user if they can't, as package files could be locked in other ways. - Added a new utility function to SourceControlUtilties to make it easier to check out packages. There is similar functionality elsewhere in the code base but the virtualization module is too low level to make use of it. - Updated existing code that calls ::TryVirtualizePackages and add cases of ''using namespace UE::Virtualization' where required to improve readability. - The UnrealVirtualizationTool now supports a new cmdline option "-checkout" that can be used when virtualizing packages. This will checkout any package that was actually virtualized so the result can be saved back out to the workspace domain. This means we no longer require the caller to have checked out the packages before running the tool. [CL 23536832 by paul chipchase in ue5-main branch]
2022-12-16 06:25:07 -05:00
OutResultInfo.AddError(MoveTemp(Message));
Virtualizing payloads when submitting via the editor will now correctly push the payloads to cache storage backends as well as persistent backends #rb Per.Larsson #jira UE-160943, UE-151671 #rnx #preflight 62fa3334153b17e7462954b3 ### Problem - The virtualization process first checks to see if any of the local payloads are already stored in persistent storage and only try to submit those which aren't, which would mean if we are submitting a package with a local payload that is already in the persistent storage system but not in cached storage we have no way to cache it. - This logic also introduced problems where filtered out payloads were being virtualized which was fixed with the addition of ENABLE_FILTERING_HACK which was not a robust long term solution. - Checking if the payloads need to be pushed or not complicated the logic of the virtualization process and makes it harder to understand and make fixes too. ### Fix - Removed code for ENABLE_FILTERING_HACK and UE_PRECHECK_PAYLOAD_STATUS from the virtualization process - Removed some code loops from the virtualization process but we still end up with one loop finding all of the package trailers and a second loop setting up the FWorkspaceDomainPayloadProvider -- This second loop will be removed in a future work item to avoid trying to push duplicate payloads (unless I end up doing that work at the virtualization manager level instead) - We make one push for cached storage and another for persistent -- Cached storage push failure will only result in a warning, failing the persistent push will error out as before -- We need to reset the request states after the cache push. In a future work item we will likely allow a single push to specific both storage solutions, so I am leaving this code using the same set of requests for both pushes. #robomerge FNMain [CL 21386503 by paul chipchase in ue5-main branch]
2022-08-15 10:17:46 -04:00
return;
}
int64 TotalPayloadsVirtualized = 0;
for (const Virtualization::FPushRequest& Request : PayloadsToSubmit)
{
TotalPayloadsVirtualized += Request.GetResult().WasPushed() ? 1 : 0;
Virtualizing payloads when submitting via the editor will now correctly push the payloads to cache storage backends as well as persistent backends #rb Per.Larsson #jira UE-160943, UE-151671 #rnx #preflight 62fa3334153b17e7462954b3 ### Problem - The virtualization process first checks to see if any of the local payloads are already stored in persistent storage and only try to submit those which aren't, which would mean if we are submitting a package with a local payload that is already in the persistent storage system but not in cached storage we have no way to cache it. - This logic also introduced problems where filtered out payloads were being virtualized which was fixed with the addition of ENABLE_FILTERING_HACK which was not a robust long term solution. - Checking if the payloads need to be pushed or not complicated the logic of the virtualization process and makes it harder to understand and make fixes too. ### Fix - Removed code for ENABLE_FILTERING_HACK and UE_PRECHECK_PAYLOAD_STATUS from the virtualization process - Removed some code loops from the virtualization process but we still end up with one loop finding all of the package trailers and a second loop setting up the FWorkspaceDomainPayloadProvider -- This second loop will be removed in a future work item to avoid trying to push duplicate payloads (unless I end up doing that work at the virtualization manager level instead) - We make one push for cached storage and another for persistent -- Cached storage push failure will only result in a warning, failing the persistent push will error out as before -- We need to reset the request states after the cache push. In a future work item we will likely allow a single push to specific both storage solutions, so I am leaving this code using the same set of requests for both pushes. #robomerge FNMain [CL 21386503 by paul chipchase in ue5-main branch]
2022-08-15 10:17:46 -04:00
}
UE_LOG(LogVirtualization, Display, TEXT("Pushed %" INT64_FMT " payload(s) to EStorageType::Persistent storage"), TotalPayloadsVirtualized);
}
Packages submitted from the editor together will now virtualize their payloads in a single batch rather than one at a time. #rb PJ.Kack #jira UE-136126 #rnx #preflight ### VirtualizationSystem - Added a new overload for Push to VirtualizationSystem that takes an array of FPushRequest, which is a new structure representing a single payload request. - Filtering by package name is currently disabled, this is because the API has been forced into changing and passing the package name in via a FString rather than FPackagePath which means we would need to be more careful. This will be done in a future submit. - The backend interface has been extended to also have a batch version of PushData, by default this will attempt to submit each request one at a time so payloads don't have to try and implement a batched version if there is no need. - The context being passed with a payload when being pushed has been changed from FPackagePath to FString due to include order issues, as the FPackagePath lives in CoreUObject and the API for virtualization lives in Core. Additionally in the future the payloads might not be owned by a package (there is nothing specifically enforcing this) so the context being a string makes more sense. - NOTE: Due to the context change we currently no longer support the filtering feature, which allows for payloads belonging to packages under specific directories to be excluded from virtualization. This is something that will be solved in a future submit. ### SourceControlBackend - Now that we can submit multiple payloads in the same submit, the CL description has been changed slightly. We will now print a list of payload identifiers -> the package trying to submit that payload. This will only tell the users which package originally caused the payload to submit. If a user submits a new package at a later date that contains the same payload we will not be updating the description. ### PackageSubmissionChecks - Converted the submission process to use the new batch push operation in VirtualizationSystem. -- This means that we do a single push and then have to update the package trailers to convert the now pushed payloads from local to virtualized. - Added new define UE_PRECHECK_PAYLOAD_STATUS that makes it easy to toggle off the checks to see which payloads need to be submitted to the persistent backend. This is useful to test if it actually helps speed up the overall operations or if it is faster to just perform the batch push operations on all payloads and check the return values. -- The hope is that over time the submission processes will become fast enough that we can remove the precheck. - Fixed up logging to not always assume more than one package or payload. ### General Notes - Errors and logging is now a bit more vague as we often not just report that X payloads failed etc rather than specific payload identifiers. This probably doesn't affect the user too much since those identifiers as fairly meaningless to them anyway. - The source control submission could be further optimized by first checking the status of the files in thge depot and only then creating/switching workspace etc. - As currently written, we need to load all of the payloads into memory, then the backends will do what they need (in the case of source control this results in the payloads being written to disk then submitted) which could create quite a large memory spike when submitting a large number of packages. -- One solution would be to change the batch push API to take a "payload provider" interface and have the payloads requested as needed rather than passing in the FCompressedBuffer directly. This would let us immediately write the payload to disk for submission then discard it from memory, preventing larger spikes. Although it could cause overhead if there are multiple backends being submitted to. Internally we are unlikely to have more than one backend per storage solution so maybe we should just make it a config option? #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18403735 in //UE5/Release-5.0/... via CL 18403737 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v896-18170469) [CL 18403738 by paul chipchase in ue5-release-engine-test branch]
2021-12-08 02:19:42 -05:00
// Update the package info for the submitted payloads
for (FPackageInfo& PackageInfo : Packages)
{
for (int32 Index = 0; Index < PackageInfo.LocalPayloads.Num(); ++Index)
{
const Virtualization::FPushRequest& Request = PayloadsToSubmit[PackageInfo.PayloadIndex + Index];
When virtualizing payloads we now load the payloads from disk on demand rather than all at once. This allows us to avoid memory spikes when virtualizing large amounts of data. #rb Per.Larsson #rnx #jira UE-151000 #preflight 627271c83b2ed6f85363f53a - In the old code when submitting packages for virtualization we would load all local payloads into memory then pass them to the virtualization system to be pushed to persistent storage. This works fine for an average users submits but when resaving large projects we can have submits with thousands of packages containing gigabytes of payloads. - FPushRequest has been changed to accept either a payload directly (in the form of FCompressedBuffer) or accept a reference to a IPayloadProvider which will return the payload when asked. - Most of the existing backends make no use of the batched pushing feature and only push one payload at a time. The default implementation of this will be to ask each request one at a time for the payload and then pass it to the backend for a single push. If the IPayloadProvider is being used we will therefor only ever have one payload loaded at a time avoiding memory spikes. - The source control backend currently does implement the batched pushing feature. This backend writes each payload to disk before submitting them to perforce, so all we have to do is make sure that request one payload, write it to disk, then discard the payload to avoid memory spikes. - This change required that FPushRequest make it's members private and provide accessors to them, as the caller shouldn't need to care if the payload is loaded or comes via the provider. - NOTE that this implementation is less efficient if we have packages containing many different payloads as we will end up opening the package, parsing the trailer then load a payload many times over, where as the original code would load all payloads from the file in one go. In practice, given the overhead of the source control backend this didn't make a huge difference and is probably not worth the effort to optimize at this point. [CL 20040609 by paul chipchase in ue5-main branch]
2022-05-04 08:58:50 -04:00
check(Request.GetIdentifier() == PackageInfo.LocalPayloads[Index]);
Packages submitted from the editor together will now virtualize their payloads in a single batch rather than one at a time. #rb PJ.Kack #jira UE-136126 #rnx #preflight ### VirtualizationSystem - Added a new overload for Push to VirtualizationSystem that takes an array of FPushRequest, which is a new structure representing a single payload request. - Filtering by package name is currently disabled, this is because the API has been forced into changing and passing the package name in via a FString rather than FPackagePath which means we would need to be more careful. This will be done in a future submit. - The backend interface has been extended to also have a batch version of PushData, by default this will attempt to submit each request one at a time so payloads don't have to try and implement a batched version if there is no need. - The context being passed with a payload when being pushed has been changed from FPackagePath to FString due to include order issues, as the FPackagePath lives in CoreUObject and the API for virtualization lives in Core. Additionally in the future the payloads might not be owned by a package (there is nothing specifically enforcing this) so the context being a string makes more sense. - NOTE: Due to the context change we currently no longer support the filtering feature, which allows for payloads belonging to packages under specific directories to be excluded from virtualization. This is something that will be solved in a future submit. ### SourceControlBackend - Now that we can submit multiple payloads in the same submit, the CL description has been changed slightly. We will now print a list of payload identifiers -> the package trying to submit that payload. This will only tell the users which package originally caused the payload to submit. If a user submits a new package at a later date that contains the same payload we will not be updating the description. ### PackageSubmissionChecks - Converted the submission process to use the new batch push operation in VirtualizationSystem. -- This means that we do a single push and then have to update the package trailers to convert the now pushed payloads from local to virtualized. - Added new define UE_PRECHECK_PAYLOAD_STATUS that makes it easy to toggle off the checks to see which payloads need to be submitted to the persistent backend. This is useful to test if it actually helps speed up the overall operations or if it is faster to just perform the batch push operations on all payloads and check the return values. -- The hope is that over time the submission processes will become fast enough that we can remove the precheck. - Fixed up logging to not always assume more than one package or payload. ### General Notes - Errors and logging is now a bit more vague as we often not just report that X payloads failed etc rather than specific payload identifiers. This probably doesn't affect the user too much since those identifiers as fairly meaningless to them anyway. - The source control submission could be further optimized by first checking the status of the files in thge depot and only then creating/switching workspace etc. - As currently written, we need to load all of the payloads into memory, then the backends will do what they need (in the case of source control this results in the payloads being written to disk then submitted) which could create quite a large memory spike when submitting a large number of packages. -- One solution would be to change the batch push API to take a "payload provider" interface and have the payloads requested as needed rather than passing in the FCompressedBuffer directly. This would let us immediately write the payload to disk for submission then discard it from memory, preventing larger spikes. Although it could cause overhead if there are multiple backends being submitted to. Internally we are unlikely to have more than one backend per storage solution so maybe we should just make it a config option? #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18403735 in //UE5/Release-5.0/... via CL 18403737 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v896-18170469) [CL 18403738 by paul chipchase in ue5-release-engine-test branch]
2021-12-08 02:19:42 -05:00
if (Request.GetResult().IsVirtualized())
Submitting packages on projects with virtualization enabled is much faster when none of the payloads actually needs to be virtualized. #rb PJ.Kack #rnx #preflight 61a795773c29b3cf13cd8250 ### PackageSubmissionChecks - Under the old model submitting a large number of packages could be very slow as each package would check each payload that it owns and is currently stored locally one at a time. For the source control backend this created quite a large overhead even when all of the payloads were already virtualized. - Now we do a pass over the submitted files to find all valid packages that have package trailers with locally stored payloads, gather the payloads into a single list and then query that in one large batch. - Once we find which payloads are not in permanent storage (note that in the case where a project is using multiple permanent storage solutions, if a payload is missing in one backend it counts as not being in permanent storage) we then attempt to virtualized them. - Only after all of this is done will we create the truncated copy of each package and then append the updated trailer to each one. In theory doing it in this order this might slightly increase the change of submit failures that occur after virtualization that result in a package never being submitted and orphaned payloads being added to permanent storage, but this will always be a risk. - Added an assert to fire if we detect a trailer with some virtualized and some local payloads. This should be a supported feature but needs proper testing first before we can allow it. With out current project settings no project should actually encounter this scenario. - To make the code easier to follow we now early out of the entire check when errors are encountered. - Added logging at various stages in the process to help show the user that something is happening and make problems easier to identify in the future. - Notes -- There is a lot of handling of invalid FPayloads. This is because it is currently possible to add empty payloads to the trailer which is inefficient and wastes space. The trailer will be modified to reject empty payloads in a future update at which point a lot of this handling can be removed. -- This could've also been solved by not fully rehydrating a package on save by the end user, which will be added as a project setting in a future piece of work, but this approach will solve the edge case when the user does have a large amount of hydrated packages which contain payloads that are already virtualized so it was better to fix that now while we have good test cases for it. -- We still have scaling problems with large number of package being submitted that do have payloads that need to be virtualized, this will be fixed by extending IVirtualizationSystem::Push to also accept batches of payloads in future work. -- OnPrePackageSubmission could be broken up into smaller chunks to make the code easier to follow. This will be done after the batch payload submission work is done. ### VirtualizationSystem - EStorageType has been promoted to enum class. - Added a new enum FPayloadStatus to be used when querying if a payload exists in a backend storage system or not. - Add a new method ::DoPayloadsExist which allows the caller to query if one or more payloads exists in the given backend storage system. ### VirtualizationManager - Implemented ::DoPayloadsExist. First we get the results from each backend in the storage system (which return as true or false from each backend) then total how many backends found the payload in order to set the correct status. ### IVirtualizationBackend - ::DoesPayloadExist which queries the existence of a single payload has been added to the interface. Most backends already implemented this for private use and if so have had their implementation renamed to match this. - Also added ::DoPayloadsExist which takes a batch of FpayloadIdsto query. Some backends can deal with a batch of payload ids much more efficiently than one at a time, although the default implementation does call ::DoesPayloadExist for each requested payload. -- The default implementation prevents every backend from needing to implement the same for loop but does allow backends that can gain from batching to override it. ### VirtualizationSourceControlBackend - This backend does override ::DoPayloadsExist and implements it's own version as it tends to perform very poorly when not operating on larger batches. - In this case ::DoesPayloadExist calls back to ::DoPayloadsExist to check each payload rather than implement as specific version. ### PackageTrailer - The trailer can now be queries to request how many payloads of a given type it contains #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18339847 in //UE5/Release-5.0/... via CL 18339852 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18339859 by paul chipchase in ue5-release-engine-test branch]
2021-12-01 11:13:31 -05:00
{
When virtualizing payloads we now load the payloads from disk on demand rather than all at once. This allows us to avoid memory spikes when virtualizing large amounts of data. #rb Per.Larsson #rnx #jira UE-151000 #preflight 627271c83b2ed6f85363f53a - In the old code when submitting packages for virtualization we would load all local payloads into memory then pass them to the virtualization system to be pushed to persistent storage. This works fine for an average users submits but when resaving large projects we can have submits with thousands of packages containing gigabytes of payloads. - FPushRequest has been changed to accept either a payload directly (in the form of FCompressedBuffer) or accept a reference to a IPayloadProvider which will return the payload when asked. - Most of the existing backends make no use of the batched pushing feature and only push one payload at a time. The default implementation of this will be to ask each request one at a time for the payload and then pass it to the backend for a single push. If the IPayloadProvider is being used we will therefor only ever have one payload loaded at a time avoiding memory spikes. - The source control backend currently does implement the batched pushing feature. This backend writes each payload to disk before submitting them to perforce, so all we have to do is make sure that request one payload, write it to disk, then discard the payload to avoid memory spikes. - This change required that FPushRequest make it's members private and provide accessors to them, as the caller shouldn't need to care if the payload is loaded or comes via the provider. - NOTE that this implementation is less efficient if we have packages containing many different payloads as we will end up opening the package, parsing the trailer then load a payload many times over, where as the original code would load all payloads from the file in one go. In practice, given the overhead of the source control backend this didn't make a huge difference and is probably not worth the effort to optimize at this point. [CL 20040609 by paul chipchase in ue5-main branch]
2022-05-04 08:58:50 -04:00
if (PackageInfo.Trailer.UpdatePayloadAsVirtualized(Request.GetIdentifier()))
Packages submitted from the editor together will now virtualize their payloads in a single batch rather than one at a time. #rb PJ.Kack #jira UE-136126 #rnx #preflight ### VirtualizationSystem - Added a new overload for Push to VirtualizationSystem that takes an array of FPushRequest, which is a new structure representing a single payload request. - Filtering by package name is currently disabled, this is because the API has been forced into changing and passing the package name in via a FString rather than FPackagePath which means we would need to be more careful. This will be done in a future submit. - The backend interface has been extended to also have a batch version of PushData, by default this will attempt to submit each request one at a time so payloads don't have to try and implement a batched version if there is no need. - The context being passed with a payload when being pushed has been changed from FPackagePath to FString due to include order issues, as the FPackagePath lives in CoreUObject and the API for virtualization lives in Core. Additionally in the future the payloads might not be owned by a package (there is nothing specifically enforcing this) so the context being a string makes more sense. - NOTE: Due to the context change we currently no longer support the filtering feature, which allows for payloads belonging to packages under specific directories to be excluded from virtualization. This is something that will be solved in a future submit. ### SourceControlBackend - Now that we can submit multiple payloads in the same submit, the CL description has been changed slightly. We will now print a list of payload identifiers -> the package trying to submit that payload. This will only tell the users which package originally caused the payload to submit. If a user submits a new package at a later date that contains the same payload we will not be updating the description. ### PackageSubmissionChecks - Converted the submission process to use the new batch push operation in VirtualizationSystem. -- This means that we do a single push and then have to update the package trailers to convert the now pushed payloads from local to virtualized. - Added new define UE_PRECHECK_PAYLOAD_STATUS that makes it easy to toggle off the checks to see which payloads need to be submitted to the persistent backend. This is useful to test if it actually helps speed up the overall operations or if it is faster to just perform the batch push operations on all payloads and check the return values. -- The hope is that over time the submission processes will become fast enough that we can remove the precheck. - Fixed up logging to not always assume more than one package or payload. ### General Notes - Errors and logging is now a bit more vague as we often not just report that X payloads failed etc rather than specific payload identifiers. This probably doesn't affect the user too much since those identifiers as fairly meaningless to them anyway. - The source control submission could be further optimized by first checking the status of the files in thge depot and only then creating/switching workspace etc. - As currently written, we need to load all of the payloads into memory, then the backends will do what they need (in the case of source control this results in the payloads being written to disk then submitted) which could create quite a large memory spike when submitting a large number of packages. -- One solution would be to change the batch push API to take a "payload provider" interface and have the payloads requested as needed rather than passing in the FCompressedBuffer directly. This would let us immediately write the payload to disk for submission then discard it from memory, preventing larger spikes. Although it could cause overhead if there are multiple backends being submitted to. Internally we are unlikely to have more than one backend per storage solution so maybe we should just make it a config option? #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18403735 in //UE5/Release-5.0/... via CL 18403737 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v896-18170469) [CL 18403738 by paul chipchase in ue5-release-engine-test branch]
2021-12-08 02:19:42 -05:00
{
PackageInfo.bWasTrailerUpdated = true;
}
else
{
Packages submitted from the editor together will now virtualize their payloads in a single batch rather than one at a time. #rb PJ.Kack #jira UE-136126 #rnx #preflight ### VirtualizationSystem - Added a new overload for Push to VirtualizationSystem that takes an array of FPushRequest, which is a new structure representing a single payload request. - Filtering by package name is currently disabled, this is because the API has been forced into changing and passing the package name in via a FString rather than FPackagePath which means we would need to be more careful. This will be done in a future submit. - The backend interface has been extended to also have a batch version of PushData, by default this will attempt to submit each request one at a time so payloads don't have to try and implement a batched version if there is no need. - The context being passed with a payload when being pushed has been changed from FPackagePath to FString due to include order issues, as the FPackagePath lives in CoreUObject and the API for virtualization lives in Core. Additionally in the future the payloads might not be owned by a package (there is nothing specifically enforcing this) so the context being a string makes more sense. - NOTE: Due to the context change we currently no longer support the filtering feature, which allows for payloads belonging to packages under specific directories to be excluded from virtualization. This is something that will be solved in a future submit. ### SourceControlBackend - Now that we can submit multiple payloads in the same submit, the CL description has been changed slightly. We will now print a list of payload identifiers -> the package trying to submit that payload. This will only tell the users which package originally caused the payload to submit. If a user submits a new package at a later date that contains the same payload we will not be updating the description. ### PackageSubmissionChecks - Converted the submission process to use the new batch push operation in VirtualizationSystem. -- This means that we do a single push and then have to update the package trailers to convert the now pushed payloads from local to virtualized. - Added new define UE_PRECHECK_PAYLOAD_STATUS that makes it easy to toggle off the checks to see which payloads need to be submitted to the persistent backend. This is useful to test if it actually helps speed up the overall operations or if it is faster to just perform the batch push operations on all payloads and check the return values. -- The hope is that over time the submission processes will become fast enough that we can remove the precheck. - Fixed up logging to not always assume more than one package or payload. ### General Notes - Errors and logging is now a bit more vague as we often not just report that X payloads failed etc rather than specific payload identifiers. This probably doesn't affect the user too much since those identifiers as fairly meaningless to them anyway. - The source control submission could be further optimized by first checking the status of the files in thge depot and only then creating/switching workspace etc. - As currently written, we need to load all of the payloads into memory, then the backends will do what they need (in the case of source control this results in the payloads being written to disk then submitted) which could create quite a large memory spike when submitting a large number of packages. -- One solution would be to change the batch push API to take a "payload provider" interface and have the payloads requested as needed rather than passing in the FCompressedBuffer directly. This would let us immediately write the payload to disk for submission then discard it from memory, preventing larger spikes. Although it could cause overhead if there are multiple backends being submitted to. Internally we are unlikely to have more than one backend per storage solution so maybe we should just make it a config option? #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18403735 in //UE5/Release-5.0/... via CL 18403737 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v896-18170469) [CL 18403738 by paul chipchase in ue5-release-engine-test branch]
2021-12-08 02:19:42 -05:00
FText Message = FText::Format( LOCTEXT("Virtualization_UpdateStatusFailed", "Unable to update the status for the payload '{0}' in the package '{1}'"),
When virtualizing payloads we now load the payloads from disk on demand rather than all at once. This allows us to avoid memory spikes when virtualizing large amounts of data. #rb Per.Larsson #rnx #jira UE-151000 #preflight 627271c83b2ed6f85363f53a - In the old code when submitting packages for virtualization we would load all local payloads into memory then pass them to the virtualization system to be pushed to persistent storage. This works fine for an average users submits but when resaving large projects we can have submits with thousands of packages containing gigabytes of payloads. - FPushRequest has been changed to accept either a payload directly (in the form of FCompressedBuffer) or accept a reference to a IPayloadProvider which will return the payload when asked. - Most of the existing backends make no use of the batched pushing feature and only push one payload at a time. The default implementation of this will be to ask each request one at a time for the payload and then pass it to the backend for a single push. If the IPayloadProvider is being used we will therefor only ever have one payload loaded at a time avoiding memory spikes. - The source control backend currently does implement the batched pushing feature. This backend writes each payload to disk before submitting them to perforce, so all we have to do is make sure that request one payload, write it to disk, then discard the payload to avoid memory spikes. - This change required that FPushRequest make it's members private and provide accessors to them, as the caller shouldn't need to care if the payload is loaded or comes via the provider. - NOTE that this implementation is less efficient if we have packages containing many different payloads as we will end up opening the package, parsing the trailer then load a payload many times over, where as the original code would load all payloads from the file in one go. In practice, given the overhead of the source control backend this didn't make a huge difference and is probably not worth the effort to optimize at this point. [CL 20040609 by paul chipchase in ue5-main branch]
2022-05-04 08:58:50 -04:00
FText::FromString(LexToString(Request.GetIdentifier())),
Packages submitted from the editor together will now virtualize their payloads in a single batch rather than one at a time. #rb PJ.Kack #jira UE-136126 #rnx #preflight ### VirtualizationSystem - Added a new overload for Push to VirtualizationSystem that takes an array of FPushRequest, which is a new structure representing a single payload request. - Filtering by package name is currently disabled, this is because the API has been forced into changing and passing the package name in via a FString rather than FPackagePath which means we would need to be more careful. This will be done in a future submit. - The backend interface has been extended to also have a batch version of PushData, by default this will attempt to submit each request one at a time so payloads don't have to try and implement a batched version if there is no need. - The context being passed with a payload when being pushed has been changed from FPackagePath to FString due to include order issues, as the FPackagePath lives in CoreUObject and the API for virtualization lives in Core. Additionally in the future the payloads might not be owned by a package (there is nothing specifically enforcing this) so the context being a string makes more sense. - NOTE: Due to the context change we currently no longer support the filtering feature, which allows for payloads belonging to packages under specific directories to be excluded from virtualization. This is something that will be solved in a future submit. ### SourceControlBackend - Now that we can submit multiple payloads in the same submit, the CL description has been changed slightly. We will now print a list of payload identifiers -> the package trying to submit that payload. This will only tell the users which package originally caused the payload to submit. If a user submits a new package at a later date that contains the same payload we will not be updating the description. ### PackageSubmissionChecks - Converted the submission process to use the new batch push operation in VirtualizationSystem. -- This means that we do a single push and then have to update the package trailers to convert the now pushed payloads from local to virtualized. - Added new define UE_PRECHECK_PAYLOAD_STATUS that makes it easy to toggle off the checks to see which payloads need to be submitted to the persistent backend. This is useful to test if it actually helps speed up the overall operations or if it is faster to just perform the batch push operations on all payloads and check the return values. -- The hope is that over time the submission processes will become fast enough that we can remove the precheck. - Fixed up logging to not always assume more than one package or payload. ### General Notes - Errors and logging is now a bit more vague as we often not just report that X payloads failed etc rather than specific payload identifiers. This probably doesn't affect the user too much since those identifiers as fairly meaningless to them anyway. - The source control submission could be further optimized by first checking the status of the files in thge depot and only then creating/switching workspace etc. - As currently written, we need to load all of the payloads into memory, then the backends will do what they need (in the case of source control this results in the payloads being written to disk then submitted) which could create quite a large memory spike when submitting a large number of packages. -- One solution would be to change the batch push API to take a "payload provider" interface and have the payloads requested as needed rather than passing in the FCompressedBuffer directly. This would let us immediately write the payload to disk for submission then discard it from memory, preventing larger spikes. Although it could cause overhead if there are multiple backends being submitted to. Internally we are unlikely to have more than one backend per storage solution so maybe we should just make it a config option? #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18403735 in //UE5/Release-5.0/... via CL 18403737 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v896-18170469) [CL 18403738 by paul chipchase in ue5-release-engine-test branch]
2021-12-08 02:19:42 -05:00
FText::FromString(PackageInfo.Path.GetDebugName()));
Add a new overload to IVirtualizationSystem::TryVirtualizePackages, which takes additional options (via a bitfield enum) and returns more info about the resulting process. The original version has been drepcated. #rb Per.Larsson #jira UE-169626 #rnx #preflight 639c4112012902cb8db43e13 - This allows us to provide the user with more ways to customize the virtualization and return more detailed info about it if the calling code wishes to log additional info. In both cases we can extend the options and the data returned without changing the API. - Previously if we virtualized a package that was not checked out in revision control we would warn the user and then skip updating the package file on disk. This means the payloads would be uploaded but the user would be left with no local changes. Since sometimes we know we don't need to check out any package (virtualizing the packages in a change list for example) we don't want to always incur the cost of polling reivision control to see which packages do need checking out. This is why we now allow the caller to request package files be checked out via the new options enum EVirtualizationOptions. -- If the EVirtualizationOptions::Checkout flag is provided we will poll the revision control status of all package files and then check out those which need it. -- We still check if packages can be modified and warn the user if they can't, as package files could be locked in other ways. - Added a new utility function to SourceControlUtilties to make it easier to check out packages. There is similar functionality elsewhere in the code base but the virtualization module is too low level to make use of it. - Updated existing code that calls ::TryVirtualizePackages and add cases of ''using namespace UE::Virtualization' where required to improve readability. - The UnrealVirtualizationTool now supports a new cmdline option "-checkout" that can be used when virtualizing packages. This will checkout any package that was actually virtualized so the result can be saved back out to the workspace domain. This means we no longer require the caller to have checked out the packages before running the tool. [CL 23536832 by paul chipchase in ue5-main branch]
2022-12-16 06:25:07 -05:00
OutResultInfo.AddError(MoveTemp(Message));
Packages submitted from the editor together will now virtualize their payloads in a single batch rather than one at a time. #rb PJ.Kack #jira UE-136126 #rnx #preflight ### VirtualizationSystem - Added a new overload for Push to VirtualizationSystem that takes an array of FPushRequest, which is a new structure representing a single payload request. - Filtering by package name is currently disabled, this is because the API has been forced into changing and passing the package name in via a FString rather than FPackagePath which means we would need to be more careful. This will be done in a future submit. - The backend interface has been extended to also have a batch version of PushData, by default this will attempt to submit each request one at a time so payloads don't have to try and implement a batched version if there is no need. - The context being passed with a payload when being pushed has been changed from FPackagePath to FString due to include order issues, as the FPackagePath lives in CoreUObject and the API for virtualization lives in Core. Additionally in the future the payloads might not be owned by a package (there is nothing specifically enforcing this) so the context being a string makes more sense. - NOTE: Due to the context change we currently no longer support the filtering feature, which allows for payloads belonging to packages under specific directories to be excluded from virtualization. This is something that will be solved in a future submit. ### SourceControlBackend - Now that we can submit multiple payloads in the same submit, the CL description has been changed slightly. We will now print a list of payload identifiers -> the package trying to submit that payload. This will only tell the users which package originally caused the payload to submit. If a user submits a new package at a later date that contains the same payload we will not be updating the description. ### PackageSubmissionChecks - Converted the submission process to use the new batch push operation in VirtualizationSystem. -- This means that we do a single push and then have to update the package trailers to convert the now pushed payloads from local to virtualized. - Added new define UE_PRECHECK_PAYLOAD_STATUS that makes it easy to toggle off the checks to see which payloads need to be submitted to the persistent backend. This is useful to test if it actually helps speed up the overall operations or if it is faster to just perform the batch push operations on all payloads and check the return values. -- The hope is that over time the submission processes will become fast enough that we can remove the precheck. - Fixed up logging to not always assume more than one package or payload. ### General Notes - Errors and logging is now a bit more vague as we often not just report that X payloads failed etc rather than specific payload identifiers. This probably doesn't affect the user too much since those identifiers as fairly meaningless to them anyway. - The source control submission could be further optimized by first checking the status of the files in thge depot and only then creating/switching workspace etc. - As currently written, we need to load all of the payloads into memory, then the backends will do what they need (in the case of source control this results in the payloads being written to disk then submitted) which could create quite a large memory spike when submitting a large number of packages. -- One solution would be to change the batch push API to take a "payload provider" interface and have the payloads requested as needed rather than passing in the FCompressedBuffer directly. This would let us immediately write the payload to disk for submission then discard it from memory, preventing larger spikes. Although it could cause overhead if there are multiple backends being submitted to. Internally we are unlikely to have more than one backend per storage solution so maybe we should just make it a config option? #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18403735 in //UE5/Release-5.0/... via CL 18403737 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v896-18170469) [CL 18403738 by paul chipchase in ue5-release-engine-test branch]
2021-12-08 02:19:42 -05:00
return;
}
Submitting packages on projects with virtualization enabled is much faster when none of the payloads actually needs to be virtualized. #rb PJ.Kack #rnx #preflight 61a795773c29b3cf13cd8250 ### PackageSubmissionChecks - Under the old model submitting a large number of packages could be very slow as each package would check each payload that it owns and is currently stored locally one at a time. For the source control backend this created quite a large overhead even when all of the payloads were already virtualized. - Now we do a pass over the submitted files to find all valid packages that have package trailers with locally stored payloads, gather the payloads into a single list and then query that in one large batch. - Once we find which payloads are not in permanent storage (note that in the case where a project is using multiple permanent storage solutions, if a payload is missing in one backend it counts as not being in permanent storage) we then attempt to virtualized them. - Only after all of this is done will we create the truncated copy of each package and then append the updated trailer to each one. In theory doing it in this order this might slightly increase the change of submit failures that occur after virtualization that result in a package never being submitted and orphaned payloads being added to permanent storage, but this will always be a risk. - Added an assert to fire if we detect a trailer with some virtualized and some local payloads. This should be a supported feature but needs proper testing first before we can allow it. With out current project settings no project should actually encounter this scenario. - To make the code easier to follow we now early out of the entire check when errors are encountered. - Added logging at various stages in the process to help show the user that something is happening and make problems easier to identify in the future. - Notes -- There is a lot of handling of invalid FPayloads. This is because it is currently possible to add empty payloads to the trailer which is inefficient and wastes space. The trailer will be modified to reject empty payloads in a future update at which point a lot of this handling can be removed. -- This could've also been solved by not fully rehydrating a package on save by the end user, which will be added as a project setting in a future piece of work, but this approach will solve the edge case when the user does have a large amount of hydrated packages which contain payloads that are already virtualized so it was better to fix that now while we have good test cases for it. -- We still have scaling problems with large number of package being submitted that do have payloads that need to be virtualized, this will be fixed by extending IVirtualizationSystem::Push to also accept batches of payloads in future work. -- OnPrePackageSubmission could be broken up into smaller chunks to make the code easier to follow. This will be done after the batch payload submission work is done. ### VirtualizationSystem - EStorageType has been promoted to enum class. - Added a new enum FPayloadStatus to be used when querying if a payload exists in a backend storage system or not. - Add a new method ::DoPayloadsExist which allows the caller to query if one or more payloads exists in the given backend storage system. ### VirtualizationManager - Implemented ::DoPayloadsExist. First we get the results from each backend in the storage system (which return as true or false from each backend) then total how many backends found the payload in order to set the correct status. ### IVirtualizationBackend - ::DoesPayloadExist which queries the existence of a single payload has been added to the interface. Most backends already implemented this for private use and if so have had their implementation renamed to match this. - Also added ::DoPayloadsExist which takes a batch of FpayloadIdsto query. Some backends can deal with a batch of payload ids much more efficiently than one at a time, although the default implementation does call ::DoesPayloadExist for each requested payload. -- The default implementation prevents every backend from needing to implement the same for loop but does allow backends that can gain from batching to override it. ### VirtualizationSourceControlBackend - This backend does override ::DoPayloadsExist and implements it's own version as it tends to perform very poorly when not operating on larger batches. - In this case ::DoesPayloadExist calls back to ::DoPayloadsExist to check each payload rather than implement as specific version. ### PackageTrailer - The trailer can now be queries to request how many payloads of a given type it contains #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18339847 in //UE5/Release-5.0/... via CL 18339852 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18339859 by paul chipchase in ue5-release-engine-test branch]
2021-12-01 11:13:31 -05:00
}
}
}
// If the process has created new errors by this point we should early out before we actually start making changes on disk
if (NumErrors != OutResultInfo.GetNumErrors())
{
return;
}
Progress.EnterProgressFrame(1.0f);
struct FPackageReplacement
Submitting packages on projects with virtualization enabled is much faster when none of the payloads actually needs to be virtualized. #rb PJ.Kack #rnx #preflight 61a795773c29b3cf13cd8250 ### PackageSubmissionChecks - Under the old model submitting a large number of packages could be very slow as each package would check each payload that it owns and is currently stored locally one at a time. For the source control backend this created quite a large overhead even when all of the payloads were already virtualized. - Now we do a pass over the submitted files to find all valid packages that have package trailers with locally stored payloads, gather the payloads into a single list and then query that in one large batch. - Once we find which payloads are not in permanent storage (note that in the case where a project is using multiple permanent storage solutions, if a payload is missing in one backend it counts as not being in permanent storage) we then attempt to virtualized them. - Only after all of this is done will we create the truncated copy of each package and then append the updated trailer to each one. In theory doing it in this order this might slightly increase the change of submit failures that occur after virtualization that result in a package never being submitted and orphaned payloads being added to permanent storage, but this will always be a risk. - Added an assert to fire if we detect a trailer with some virtualized and some local payloads. This should be a supported feature but needs proper testing first before we can allow it. With out current project settings no project should actually encounter this scenario. - To make the code easier to follow we now early out of the entire check when errors are encountered. - Added logging at various stages in the process to help show the user that something is happening and make problems easier to identify in the future. - Notes -- There is a lot of handling of invalid FPayloads. This is because it is currently possible to add empty payloads to the trailer which is inefficient and wastes space. The trailer will be modified to reject empty payloads in a future update at which point a lot of this handling can be removed. -- This could've also been solved by not fully rehydrating a package on save by the end user, which will be added as a project setting in a future piece of work, but this approach will solve the edge case when the user does have a large amount of hydrated packages which contain payloads that are already virtualized so it was better to fix that now while we have good test cases for it. -- We still have scaling problems with large number of package being submitted that do have payloads that need to be virtualized, this will be fixed by extending IVirtualizationSystem::Push to also accept batches of payloads in future work. -- OnPrePackageSubmission could be broken up into smaller chunks to make the code easier to follow. This will be done after the batch payload submission work is done. ### VirtualizationSystem - EStorageType has been promoted to enum class. - Added a new enum FPayloadStatus to be used when querying if a payload exists in a backend storage system or not. - Add a new method ::DoPayloadsExist which allows the caller to query if one or more payloads exists in the given backend storage system. ### VirtualizationManager - Implemented ::DoPayloadsExist. First we get the results from each backend in the storage system (which return as true or false from each backend) then total how many backends found the payload in order to set the correct status. ### IVirtualizationBackend - ::DoesPayloadExist which queries the existence of a single payload has been added to the interface. Most backends already implemented this for private use and if so have had their implementation renamed to match this. - Also added ::DoPayloadsExist which takes a batch of FpayloadIdsto query. Some backends can deal with a batch of payload ids much more efficiently than one at a time, although the default implementation does call ::DoesPayloadExist for each requested payload. -- The default implementation prevents every backend from needing to implement the same for loop but does allow backends that can gain from batching to override it. ### VirtualizationSourceControlBackend - This backend does override ::DoPayloadsExist and implements it's own version as it tends to perform very poorly when not operating on larger batches. - In this case ::DoesPayloadExist calls back to ::DoPayloadsExist to check each payload rather than implement as specific version. ### PackageTrailer - The trailer can now be queries to request how many payloads of a given type it contains #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18339847 in //UE5/Release-5.0/... via CL 18339852 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18339859 by paul chipchase in ue5-release-engine-test branch]
2021-12-01 11:13:31 -05:00
{
FPackagePath Path;
FPackageTrailer Trailer;
};
TArray<FPackageReplacement> PackagesToReplace;
PackagesToReplace.Reserve(Packages.Num());
for (FPackageInfo& PackageInfo : Packages)
{
if (PackageInfo.bWasTrailerUpdated)
Submitting packages on projects with virtualization enabled is much faster when none of the payloads actually needs to be virtualized. #rb PJ.Kack #rnx #preflight 61a795773c29b3cf13cd8250 ### PackageSubmissionChecks - Under the old model submitting a large number of packages could be very slow as each package would check each payload that it owns and is currently stored locally one at a time. For the source control backend this created quite a large overhead even when all of the payloads were already virtualized. - Now we do a pass over the submitted files to find all valid packages that have package trailers with locally stored payloads, gather the payloads into a single list and then query that in one large batch. - Once we find which payloads are not in permanent storage (note that in the case where a project is using multiple permanent storage solutions, if a payload is missing in one backend it counts as not being in permanent storage) we then attempt to virtualized them. - Only after all of this is done will we create the truncated copy of each package and then append the updated trailer to each one. In theory doing it in this order this might slightly increase the change of submit failures that occur after virtualization that result in a package never being submitted and orphaned payloads being added to permanent storage, but this will always be a risk. - Added an assert to fire if we detect a trailer with some virtualized and some local payloads. This should be a supported feature but needs proper testing first before we can allow it. With out current project settings no project should actually encounter this scenario. - To make the code easier to follow we now early out of the entire check when errors are encountered. - Added logging at various stages in the process to help show the user that something is happening and make problems easier to identify in the future. - Notes -- There is a lot of handling of invalid FPayloads. This is because it is currently possible to add empty payloads to the trailer which is inefficient and wastes space. The trailer will be modified to reject empty payloads in a future update at which point a lot of this handling can be removed. -- This could've also been solved by not fully rehydrating a package on save by the end user, which will be added as a project setting in a future piece of work, but this approach will solve the edge case when the user does have a large amount of hydrated packages which contain payloads that are already virtualized so it was better to fix that now while we have good test cases for it. -- We still have scaling problems with large number of package being submitted that do have payloads that need to be virtualized, this will be fixed by extending IVirtualizationSystem::Push to also accept batches of payloads in future work. -- OnPrePackageSubmission could be broken up into smaller chunks to make the code easier to follow. This will be done after the batch payload submission work is done. ### VirtualizationSystem - EStorageType has been promoted to enum class. - Added a new enum FPayloadStatus to be used when querying if a payload exists in a backend storage system or not. - Add a new method ::DoPayloadsExist which allows the caller to query if one or more payloads exists in the given backend storage system. ### VirtualizationManager - Implemented ::DoPayloadsExist. First we get the results from each backend in the storage system (which return as true or false from each backend) then total how many backends found the payload in order to set the correct status. ### IVirtualizationBackend - ::DoesPayloadExist which queries the existence of a single payload has been added to the interface. Most backends already implemented this for private use and if so have had their implementation renamed to match this. - Also added ::DoPayloadsExist which takes a batch of FpayloadIdsto query. Some backends can deal with a batch of payload ids much more efficiently than one at a time, although the default implementation does call ::DoesPayloadExist for each requested payload. -- The default implementation prevents every backend from needing to implement the same for loop but does allow backends that can gain from batching to override it. ### VirtualizationSourceControlBackend - This backend does override ::DoPayloadsExist and implements it's own version as it tends to perform very poorly when not operating on larger batches. - In this case ::DoesPayloadExist calls back to ::DoPayloadsExist to check each payload rather than implement as specific version. ### PackageTrailer - The trailer can now be queries to request how many payloads of a given type it contains #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18339847 in //UE5/Release-5.0/... via CL 18339852 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18339859 by paul chipchase in ue5-release-engine-test branch]
2021-12-01 11:13:31 -05:00
{
PackagesToReplace.Add({MoveTemp(PackageInfo.Path), MoveTemp(PackageInfo.Trailer)});
Submitting packages on projects with virtualization enabled is much faster when none of the payloads actually needs to be virtualized. #rb PJ.Kack #rnx #preflight 61a795773c29b3cf13cd8250 ### PackageSubmissionChecks - Under the old model submitting a large number of packages could be very slow as each package would check each payload that it owns and is currently stored locally one at a time. For the source control backend this created quite a large overhead even when all of the payloads were already virtualized. - Now we do a pass over the submitted files to find all valid packages that have package trailers with locally stored payloads, gather the payloads into a single list and then query that in one large batch. - Once we find which payloads are not in permanent storage (note that in the case where a project is using multiple permanent storage solutions, if a payload is missing in one backend it counts as not being in permanent storage) we then attempt to virtualized them. - Only after all of this is done will we create the truncated copy of each package and then append the updated trailer to each one. In theory doing it in this order this might slightly increase the change of submit failures that occur after virtualization that result in a package never being submitted and orphaned payloads being added to permanent storage, but this will always be a risk. - Added an assert to fire if we detect a trailer with some virtualized and some local payloads. This should be a supported feature but needs proper testing first before we can allow it. With out current project settings no project should actually encounter this scenario. - To make the code easier to follow we now early out of the entire check when errors are encountered. - Added logging at various stages in the process to help show the user that something is happening and make problems easier to identify in the future. - Notes -- There is a lot of handling of invalid FPayloads. This is because it is currently possible to add empty payloads to the trailer which is inefficient and wastes space. The trailer will be modified to reject empty payloads in a future update at which point a lot of this handling can be removed. -- This could've also been solved by not fully rehydrating a package on save by the end user, which will be added as a project setting in a future piece of work, but this approach will solve the edge case when the user does have a large amount of hydrated packages which contain payloads that are already virtualized so it was better to fix that now while we have good test cases for it. -- We still have scaling problems with large number of package being submitted that do have payloads that need to be virtualized, this will be fixed by extending IVirtualizationSystem::Push to also accept batches of payloads in future work. -- OnPrePackageSubmission could be broken up into smaller chunks to make the code easier to follow. This will be done after the batch payload submission work is done. ### VirtualizationSystem - EStorageType has been promoted to enum class. - Added a new enum FPayloadStatus to be used when querying if a payload exists in a backend storage system or not. - Add a new method ::DoPayloadsExist which allows the caller to query if one or more payloads exists in the given backend storage system. ### VirtualizationManager - Implemented ::DoPayloadsExist. First we get the results from each backend in the storage system (which return as true or false from each backend) then total how many backends found the payload in order to set the correct status. ### IVirtualizationBackend - ::DoesPayloadExist which queries the existence of a single payload has been added to the interface. Most backends already implemented this for private use and if so have had their implementation renamed to match this. - Also added ::DoPayloadsExist which takes a batch of FpayloadIdsto query. Some backends can deal with a batch of payload ids much more efficiently than one at a time, although the default implementation does call ::DoesPayloadExist for each requested payload. -- The default implementation prevents every backend from needing to implement the same for loop but does allow backends that can gain from batching to override it. ### VirtualizationSourceControlBackend - This backend does override ::DoPayloadsExist and implements it's own version as it tends to perform very poorly when not operating on larger batches. - In this case ::DoesPayloadExist calls back to ::DoPayloadsExist to check each payload rather than implement as specific version. ### PackageTrailer - The trailer can now be queries to request how many payloads of a given type it contains #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18339847 in //UE5/Release-5.0/... via CL 18339852 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18339859 by paul chipchase in ue5-release-engine-test branch]
2021-12-01 11:13:31 -05:00
}
}
Packages.Empty(); // No longer used, the useful data have been moved to PackagesToReplace
if (PackagesToReplace.IsEmpty())
{
UE_LOG(LogVirtualization, Display, TEXT("No packages need to be updated on disk"));
return;
}
Submitting packages on projects with virtualization enabled is much faster when none of the payloads actually needs to be virtualized. #rb PJ.Kack #rnx #preflight 61a795773c29b3cf13cd8250 ### PackageSubmissionChecks - Under the old model submitting a large number of packages could be very slow as each package would check each payload that it owns and is currently stored locally one at a time. For the source control backend this created quite a large overhead even when all of the payloads were already virtualized. - Now we do a pass over the submitted files to find all valid packages that have package trailers with locally stored payloads, gather the payloads into a single list and then query that in one large batch. - Once we find which payloads are not in permanent storage (note that in the case where a project is using multiple permanent storage solutions, if a payload is missing in one backend it counts as not being in permanent storage) we then attempt to virtualized them. - Only after all of this is done will we create the truncated copy of each package and then append the updated trailer to each one. In theory doing it in this order this might slightly increase the change of submit failures that occur after virtualization that result in a package never being submitted and orphaned payloads being added to permanent storage, but this will always be a risk. - Added an assert to fire if we detect a trailer with some virtualized and some local payloads. This should be a supported feature but needs proper testing first before we can allow it. With out current project settings no project should actually encounter this scenario. - To make the code easier to follow we now early out of the entire check when errors are encountered. - Added logging at various stages in the process to help show the user that something is happening and make problems easier to identify in the future. - Notes -- There is a lot of handling of invalid FPayloads. This is because it is currently possible to add empty payloads to the trailer which is inefficient and wastes space. The trailer will be modified to reject empty payloads in a future update at which point a lot of this handling can be removed. -- This could've also been solved by not fully rehydrating a package on save by the end user, which will be added as a project setting in a future piece of work, but this approach will solve the edge case when the user does have a large amount of hydrated packages which contain payloads that are already virtualized so it was better to fix that now while we have good test cases for it. -- We still have scaling problems with large number of package being submitted that do have payloads that need to be virtualized, this will be fixed by extending IVirtualizationSystem::Push to also accept batches of payloads in future work. -- OnPrePackageSubmission could be broken up into smaller chunks to make the code easier to follow. This will be done after the batch payload submission work is done. ### VirtualizationSystem - EStorageType has been promoted to enum class. - Added a new enum FPayloadStatus to be used when querying if a payload exists in a backend storage system or not. - Add a new method ::DoPayloadsExist which allows the caller to query if one or more payloads exists in the given backend storage system. ### VirtualizationManager - Implemented ::DoPayloadsExist. First we get the results from each backend in the storage system (which return as true or false from each backend) then total how many backends found the payload in order to set the correct status. ### IVirtualizationBackend - ::DoesPayloadExist which queries the existence of a single payload has been added to the interface. Most backends already implemented this for private use and if so have had their implementation renamed to match this. - Also added ::DoPayloadsExist which takes a batch of FpayloadIdsto query. Some backends can deal with a batch of payload ids much more efficiently than one at a time, although the default implementation does call ::DoesPayloadExist for each requested payload. -- The default implementation prevents every backend from needing to implement the same for loop but does allow backends that can gain from batching to override it. ### VirtualizationSourceControlBackend - This backend does override ::DoPayloadsExist and implements it's own version as it tends to perform very poorly when not operating on larger batches. - In this case ::DoesPayloadExist calls back to ::DoPayloadsExist to check each payload rather than implement as specific version. ### PackageTrailer - The trailer can now be queries to request how many payloads of a given type it contains #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18339847 in //UE5/Release-5.0/... via CL 18339852 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18339859 by paul chipchase in ue5-release-engine-test branch]
2021-12-01 11:13:31 -05:00
Packages submitted from the editor together will now virtualize their payloads in a single batch rather than one at a time. #rb PJ.Kack #jira UE-136126 #rnx #preflight ### VirtualizationSystem - Added a new overload for Push to VirtualizationSystem that takes an array of FPushRequest, which is a new structure representing a single payload request. - Filtering by package name is currently disabled, this is because the API has been forced into changing and passing the package name in via a FString rather than FPackagePath which means we would need to be more careful. This will be done in a future submit. - The backend interface has been extended to also have a batch version of PushData, by default this will attempt to submit each request one at a time so payloads don't have to try and implement a batched version if there is no need. - The context being passed with a payload when being pushed has been changed from FPackagePath to FString due to include order issues, as the FPackagePath lives in CoreUObject and the API for virtualization lives in Core. Additionally in the future the payloads might not be owned by a package (there is nothing specifically enforcing this) so the context being a string makes more sense. - NOTE: Due to the context change we currently no longer support the filtering feature, which allows for payloads belonging to packages under specific directories to be excluded from virtualization. This is something that will be solved in a future submit. ### SourceControlBackend - Now that we can submit multiple payloads in the same submit, the CL description has been changed slightly. We will now print a list of payload identifiers -> the package trying to submit that payload. This will only tell the users which package originally caused the payload to submit. If a user submits a new package at a later date that contains the same payload we will not be updating the description. ### PackageSubmissionChecks - Converted the submission process to use the new batch push operation in VirtualizationSystem. -- This means that we do a single push and then have to update the package trailers to convert the now pushed payloads from local to virtualized. - Added new define UE_PRECHECK_PAYLOAD_STATUS that makes it easy to toggle off the checks to see which payloads need to be submitted to the persistent backend. This is useful to test if it actually helps speed up the overall operations or if it is faster to just perform the batch push operations on all payloads and check the return values. -- The hope is that over time the submission processes will become fast enough that we can remove the precheck. - Fixed up logging to not always assume more than one package or payload. ### General Notes - Errors and logging is now a bit more vague as we often not just report that X payloads failed etc rather than specific payload identifiers. This probably doesn't affect the user too much since those identifiers as fairly meaningless to them anyway. - The source control submission could be further optimized by first checking the status of the files in thge depot and only then creating/switching workspace etc. - As currently written, we need to load all of the payloads into memory, then the backends will do what they need (in the case of source control this results in the payloads being written to disk then submitted) which could create quite a large memory spike when submitting a large number of packages. -- One solution would be to change the batch push API to take a "payload provider" interface and have the payloads requested as needed rather than passing in the FCompressedBuffer directly. This would let us immediately write the payload to disk for submission then discard it from memory, preventing larger spikes. Although it could cause overhead if there are multiple backends being submitted to. Internally we are unlikely to have more than one backend per storage solution so maybe we should just make it a config option? #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18403735 in //UE5/Release-5.0/... via CL 18403737 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v896-18170469) [CL 18403738 by paul chipchase in ue5-release-engine-test branch]
2021-12-08 02:19:42 -05:00
UE_LOG(LogVirtualization, Display, TEXT("%d package(s) had their trailer container modified and need to be updated"), PackagesToReplace.Num());
Submitting packages on projects with virtualization enabled is much faster when none of the payloads actually needs to be virtualized. #rb PJ.Kack #rnx #preflight 61a795773c29b3cf13cd8250 ### PackageSubmissionChecks - Under the old model submitting a large number of packages could be very slow as each package would check each payload that it owns and is currently stored locally one at a time. For the source control backend this created quite a large overhead even when all of the payloads were already virtualized. - Now we do a pass over the submitted files to find all valid packages that have package trailers with locally stored payloads, gather the payloads into a single list and then query that in one large batch. - Once we find which payloads are not in permanent storage (note that in the case where a project is using multiple permanent storage solutions, if a payload is missing in one backend it counts as not being in permanent storage) we then attempt to virtualized them. - Only after all of this is done will we create the truncated copy of each package and then append the updated trailer to each one. In theory doing it in this order this might slightly increase the change of submit failures that occur after virtualization that result in a package never being submitted and orphaned payloads being added to permanent storage, but this will always be a risk. - Added an assert to fire if we detect a trailer with some virtualized and some local payloads. This should be a supported feature but needs proper testing first before we can allow it. With out current project settings no project should actually encounter this scenario. - To make the code easier to follow we now early out of the entire check when errors are encountered. - Added logging at various stages in the process to help show the user that something is happening and make problems easier to identify in the future. - Notes -- There is a lot of handling of invalid FPayloads. This is because it is currently possible to add empty payloads to the trailer which is inefficient and wastes space. The trailer will be modified to reject empty payloads in a future update at which point a lot of this handling can be removed. -- This could've also been solved by not fully rehydrating a package on save by the end user, which will be added as a project setting in a future piece of work, but this approach will solve the edge case when the user does have a large amount of hydrated packages which contain payloads that are already virtualized so it was better to fix that now while we have good test cases for it. -- We still have scaling problems with large number of package being submitted that do have payloads that need to be virtualized, this will be fixed by extending IVirtualizationSystem::Push to also accept batches of payloads in future work. -- OnPrePackageSubmission could be broken up into smaller chunks to make the code easier to follow. This will be done after the batch payload submission work is done. ### VirtualizationSystem - EStorageType has been promoted to enum class. - Added a new enum FPayloadStatus to be used when querying if a payload exists in a backend storage system or not. - Add a new method ::DoPayloadsExist which allows the caller to query if one or more payloads exists in the given backend storage system. ### VirtualizationManager - Implemented ::DoPayloadsExist. First we get the results from each backend in the storage system (which return as true or false from each backend) then total how many backends found the payload in order to set the correct status. ### IVirtualizationBackend - ::DoesPayloadExist which queries the existence of a single payload has been added to the interface. Most backends already implemented this for private use and if so have had their implementation renamed to match this. - Also added ::DoPayloadsExist which takes a batch of FpayloadIdsto query. Some backends can deal with a batch of payload ids much more efficiently than one at a time, although the default implementation does call ::DoesPayloadExist for each requested payload. -- The default implementation prevents every backend from needing to implement the same for loop but does allow backends that can gain from batching to override it. ### VirtualizationSourceControlBackend - This backend does override ::DoPayloadsExist and implements it's own version as it tends to perform very poorly when not operating on larger batches. - In this case ::DoesPayloadExist calls back to ::DoPayloadsExist to check each payload rather than implement as specific version. ### PackageTrailer - The trailer can now be queries to request how many payloads of a given type it contains #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18339847 in //UE5/Release-5.0/... via CL 18339852 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18339859 by paul chipchase in ue5-release-engine-test branch]
2021-12-01 11:13:31 -05:00
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous. #rb PJ.Kack #rnx #preflight 61a4b235405273b2c3daa7c7 ### Virtualization - The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted. - The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again. - This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc. - In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead. - In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm) - For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk. ### Current Issues - The system does not work with the editor domain and has minor issues with text based assets - The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional. ### PackageTrailer - The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer - FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use. - When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire. - the header file contains a breakdown of the new format. ### VirtualizedBulkData - FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file. - FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code. - Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads. - We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave. - Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object. - Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use. - ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs. - We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package. - We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled. - We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code. -- Once the FPackageTrailer feature is no longer optional we can consider changing this. ### SavePackage/SavePackage2 - Removed the functions that would create the FPayloadToc - Added new functions to help create the trailer. - Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer. - We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems. ### LinkerLoad - After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package -- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead. - By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file. ### LinkerSave - We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed. - We now gather data to be appended to the package trailer via TrailerBuilder. ### DumpPayloadToc - This command now dumps info based on the package trailer if one can be found. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
{
// We need to reset the loader of any loaded package that we want save over on disk so that
// the file lock is relinquished
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous. #rb PJ.Kack #rnx #preflight 61a4b235405273b2c3daa7c7 ### Virtualization - The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted. - The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again. - This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc. - In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead. - In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm) - For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk. ### Current Issues - The system does not work with the editor domain and has minor issues with text based assets - The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional. ### PackageTrailer - The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer - FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use. - When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire. - the header file contains a breakdown of the new format. ### VirtualizedBulkData - FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file. - FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code. - Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads. - We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave. - Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object. - Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use. - ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs. - We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package. - We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled. - We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code. -- Once the FPackageTrailer feature is no longer optional we can consider changing this. ### SavePackage/SavePackage2 - Removed the functions that would create the FPayloadToc - Added new functions to help create the trailer. - Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer. - We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems. ### LinkerLoad - After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package -- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead. - By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file. ### LinkerSave - We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed. - We now gather data to be appended to the package trailer via TrailerBuilder. ### DumpPayloadToc - This command now dumps info based on the package trailer if one can be found. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
TRACE_CPUPROFILER_EVENT_SCOPE(UE::Virtualization::VirtualizePackages::ResetLoaders);
UE_LOG(LogVirtualization, Display, TEXT("Detaching loaded packages from disk..."));
int32 NumPackagesReset = 0;
for (const FPackageReplacement& PackageInfo : PackagesToReplace)
{
UPackage* LoadedPackage = FindObjectFast<UPackage>(nullptr, PackageInfo.Path.GetPackageFName());
if (LoadedPackage != nullptr)
{
UE_LOG(LogVirtualization, Verbose, TEXT("Detaching '%s'"), *PackageInfo.Path.GetDebugName());
// TODO: Consider using the batch API
ResetLoadersForSave(LoadedPackage, *PackageInfo.Path.GetLocalFullPath());
NumPackagesReset++;
}
Add a new overload to IVirtualizationSystem::TryVirtualizePackages, which takes additional options (via a bitfield enum) and returns more info about the resulting process. The original version has been drepcated. #rb Per.Larsson #jira UE-169626 #rnx #preflight 639c4112012902cb8db43e13 - This allows us to provide the user with more ways to customize the virtualization and return more detailed info about it if the calling code wishes to log additional info. In both cases we can extend the options and the data returned without changing the API. - Previously if we virtualized a package that was not checked out in revision control we would warn the user and then skip updating the package file on disk. This means the payloads would be uploaded but the user would be left with no local changes. Since sometimes we know we don't need to check out any package (virtualizing the packages in a change list for example) we don't want to always incur the cost of polling reivision control to see which packages do need checking out. This is why we now allow the caller to request package files be checked out via the new options enum EVirtualizationOptions. -- If the EVirtualizationOptions::Checkout flag is provided we will poll the revision control status of all package files and then check out those which need it. -- We still check if packages can be modified and warn the user if they can't, as package files could be locked in other ways. - Added a new utility function to SourceControlUtilties to make it easier to check out packages. There is similar functionality elsewhere in the code base but the virtualization module is too low level to make use of it. - Updated existing code that calls ::TryVirtualizePackages and add cases of ''using namespace UE::Virtualization' where required to improve readability. - The UnrealVirtualizationTool now supports a new cmdline option "-checkout" that can be used when virtualizing packages. This will checkout any package that was actually virtualized so the result can be saved back out to the workspace domain. This means we no longer require the caller to have checked out the packages before running the tool. [CL 23536832 by paul chipchase in ue5-main branch]
2022-12-16 06:25:07 -05:00
}
UE_LOG(LogVirtualization, Display, TEXT("Reset the loaders of %d package(s)"), NumPackagesReset);
}
if (EnumHasAnyFlags(Options, EVirtualizationOptions::Checkout))
{
TRACE_CPUPROFILER_EVENT_SCOPE(UE::Virtualization::VirtualizePackages::Checkout);
UE_LOG(LogVirtualization, Display, TEXT("Checking out packages from revision control..."));
TArray<FString> FilesToCheckState;
FilesToCheckState.Reserve(PackagesToReplace.Num());
for (const FPackageReplacement& PackageInfo : PackagesToReplace)
Add a new overload to IVirtualizationSystem::TryVirtualizePackages, which takes additional options (via a bitfield enum) and returns more info about the resulting process. The original version has been drepcated. #rb Per.Larsson #jira UE-169626 #rnx #preflight 639c4112012902cb8db43e13 - This allows us to provide the user with more ways to customize the virtualization and return more detailed info about it if the calling code wishes to log additional info. In both cases we can extend the options and the data returned without changing the API. - Previously if we virtualized a package that was not checked out in revision control we would warn the user and then skip updating the package file on disk. This means the payloads would be uploaded but the user would be left with no local changes. Since sometimes we know we don't need to check out any package (virtualizing the packages in a change list for example) we don't want to always incur the cost of polling reivision control to see which packages do need checking out. This is why we now allow the caller to request package files be checked out via the new options enum EVirtualizationOptions. -- If the EVirtualizationOptions::Checkout flag is provided we will poll the revision control status of all package files and then check out those which need it. -- We still check if packages can be modified and warn the user if they can't, as package files could be locked in other ways. - Added a new utility function to SourceControlUtilties to make it easier to check out packages. There is similar functionality elsewhere in the code base but the virtualization module is too low level to make use of it. - Updated existing code that calls ::TryVirtualizePackages and add cases of ''using namespace UE::Virtualization' where required to improve readability. - The UnrealVirtualizationTool now supports a new cmdline option "-checkout" that can be used when virtualizing packages. This will checkout any package that was actually virtualized so the result can be saved back out to the workspace domain. This means we no longer require the caller to have checked out the packages before running the tool. [CL 23536832 by paul chipchase in ue5-main branch]
2022-12-16 06:25:07 -05:00
{
FilesToCheckState.Add(PackageInfo.Path.GetLocalFullPath());
Add a new overload to IVirtualizationSystem::TryVirtualizePackages, which takes additional options (via a bitfield enum) and returns more info about the resulting process. The original version has been drepcated. #rb Per.Larsson #jira UE-169626 #rnx #preflight 639c4112012902cb8db43e13 - This allows us to provide the user with more ways to customize the virtualization and return more detailed info about it if the calling code wishes to log additional info. In both cases we can extend the options and the data returned without changing the API. - Previously if we virtualized a package that was not checked out in revision control we would warn the user and then skip updating the package file on disk. This means the payloads would be uploaded but the user would be left with no local changes. Since sometimes we know we don't need to check out any package (virtualizing the packages in a change list for example) we don't want to always incur the cost of polling reivision control to see which packages do need checking out. This is why we now allow the caller to request package files be checked out via the new options enum EVirtualizationOptions. -- If the EVirtualizationOptions::Checkout flag is provided we will poll the revision control status of all package files and then check out those which need it. -- We still check if packages can be modified and warn the user if they can't, as package files could be locked in other ways. - Added a new utility function to SourceControlUtilties to make it easier to check out packages. There is similar functionality elsewhere in the code base but the virtualization module is too low level to make use of it. - Updated existing code that calls ::TryVirtualizePackages and add cases of ''using namespace UE::Virtualization' where required to improve readability. - The UnrealVirtualizationTool now supports a new cmdline option "-checkout" that can be used when virtualizing packages. This will checkout any package that was actually virtualized so the result can be saved back out to the workspace domain. This means we no longer require the caller to have checked out the packages before running the tool. [CL 23536832 by paul chipchase in ue5-main branch]
2022-12-16 06:25:07 -05:00
}
if (!TryCheckoutFiles(FilesToCheckState, OutResultInfo.Errors, &OutResultInfo.CheckedOutPackages))
{
return;
}
UE_LOG(LogVirtualization, Display, TEXT("Checked out %d package(s)"), OutResultInfo.CheckedOutPackages.Num());
}
{
TRACE_CPUPROFILER_EVENT_SCOPE(UE::Virtualization::VirtualizePackages::EnsureWritePermissions);
UE_CLOG(!PackagesToReplace.IsEmpty(), LogVirtualization, Display, TEXT("Checking packages for write access permission..."));
Add a new overload to IVirtualizationSystem::TryVirtualizePackages, which takes additional options (via a bitfield enum) and returns more info about the resulting process. The original version has been drepcated. #rb Per.Larsson #jira UE-169626 #rnx #preflight 639c4112012902cb8db43e13 - This allows us to provide the user with more ways to customize the virtualization and return more detailed info about it if the calling code wishes to log additional info. In both cases we can extend the options and the data returned without changing the API. - Previously if we virtualized a package that was not checked out in revision control we would warn the user and then skip updating the package file on disk. This means the payloads would be uploaded but the user would be left with no local changes. Since sometimes we know we don't need to check out any package (virtualizing the packages in a change list for example) we don't want to always incur the cost of polling reivision control to see which packages do need checking out. This is why we now allow the caller to request package files be checked out via the new options enum EVirtualizationOptions. -- If the EVirtualizationOptions::Checkout flag is provided we will poll the revision control status of all package files and then check out those which need it. -- We still check if packages can be modified and warn the user if they can't, as package files could be locked in other ways. - Added a new utility function to SourceControlUtilties to make it easier to check out packages. There is similar functionality elsewhere in the code base but the virtualization module is too low level to make use of it. - Updated existing code that calls ::TryVirtualizePackages and add cases of ''using namespace UE::Virtualization' where required to improve readability. - The UnrealVirtualizationTool now supports a new cmdline option "-checkout" that can be used when virtualizing packages. This will checkout any package that was actually virtualized so the result can be saved back out to the workspace domain. This means we no longer require the caller to have checked out the packages before running the tool. [CL 23536832 by paul chipchase in ue5-main branch]
2022-12-16 06:25:07 -05:00
// Now check to see if there are package files that cannot be edited because they are read only
int32 NumSkipped = 0;
Add a new overload to IVirtualizationSystem::TryVirtualizePackages, which takes additional options (via a bitfield enum) and returns more info about the resulting process. The original version has been drepcated. #rb Per.Larsson #jira UE-169626 #rnx #preflight 639c4112012902cb8db43e13 - This allows us to provide the user with more ways to customize the virtualization and return more detailed info about it if the calling code wishes to log additional info. In both cases we can extend the options and the data returned without changing the API. - Previously if we virtualized a package that was not checked out in revision control we would warn the user and then skip updating the package file on disk. This means the payloads would be uploaded but the user would be left with no local changes. Since sometimes we know we don't need to check out any package (virtualizing the packages in a change list for example) we don't want to always incur the cost of polling reivision control to see which packages do need checking out. This is why we now allow the caller to request package files be checked out via the new options enum EVirtualizationOptions. -- If the EVirtualizationOptions::Checkout flag is provided we will poll the revision control status of all package files and then check out those which need it. -- We still check if packages can be modified and warn the user if they can't, as package files could be locked in other ways. - Added a new utility function to SourceControlUtilties to make it easier to check out packages. There is similar functionality elsewhere in the code base but the virtualization module is too low level to make use of it. - Updated existing code that calls ::TryVirtualizePackages and add cases of ''using namespace UE::Virtualization' where required to improve readability. - The UnrealVirtualizationTool now supports a new cmdline option "-checkout" that can be used when virtualizing packages. This will checkout any package that was actually virtualized so the result can be saved back out to the workspace domain. This means we no longer require the caller to have checked out the packages before running the tool. [CL 23536832 by paul chipchase in ue5-main branch]
2022-12-16 06:25:07 -05:00
for (int32 Index = 0; Index < PackagesToReplace.Num(); ++Index)
{
const FPackageReplacement& PackageInfo = PackagesToReplace[Index];
if (!CanWriteToFile(PackageInfo.Path.GetLocalFullPath()))
{
// Technically the package could have local payloads that won't be virtualized due to filtering or min payload sizes and so the
// following warning is misleading. This will be solved if we move that evaluation to the point of saving a package.
// If not then we probably need to extend QueryPayloadStatuses to test filtering etc as well, then check for potential package
// modification after that.
// Long term, the stand alone tool should be able to request the UnrealEditor relinquish the lock on the package file so this becomes
// less of a problem.
FText Message = FText::Format(LOCTEXT("Virtualization_PkgLocked", "The package file '{0}' has local payloads but is locked for modification and cannot be virtualized, this package will be skipped!"),
FText::FromString(PackageInfo.Path.GetDebugName()));
UE_LOG(LogVirtualization, Warning, TEXT("%s"), *Message.ToString());
PackagesToReplace.RemoveAt(Index--);
NumSkipped++;
}
}
UE_CLOG(NumSkipped > 0, LogVirtualization, Warning, TEXT("Skipped %d package(s)"), NumSkipped);
UE_CLOG(NumSkipped == 0, LogVirtualization, Display, TEXT("All packages have write permission"));
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous. #rb PJ.Kack #rnx #preflight 61a4b235405273b2c3daa7c7 ### Virtualization - The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted. - The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again. - This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc. - In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead. - In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm) - For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk. ### Current Issues - The system does not work with the editor domain and has minor issues with text based assets - The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional. ### PackageTrailer - The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer - FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use. - When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire. - the header file contains a breakdown of the new format. ### VirtualizedBulkData - FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file. - FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code. - Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads. - We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave. - Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object. - Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use. - ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs. - We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package. - We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled. - We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code. -- Once the FPackageTrailer feature is no longer optional we can consider changing this. ### SavePackage/SavePackage2 - Removed the functions that would create the FPayloadToc - Added new functions to help create the trailer. - Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer. - We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems. ### LinkerLoad - After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package -- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead. - By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file. ### LinkerSave - We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed. - We now gather data to be appended to the package trailer via TrailerBuilder. ### DumpPayloadToc - This command now dumps info based on the package trailer if one can be found. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
}
{
// Since we had no errors we can now replace all of the packages that were virtualized data with the virtualized replacement file.
TRACE_CPUPROFILER_EVENT_SCOPE(UE::Virtualization::VirtualizePackages::ReplacePackages);
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous. #rb PJ.Kack #rnx #preflight 61a4b235405273b2c3daa7c7 ### Virtualization - The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted. - The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again. - This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc. - In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead. - In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm) - For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk. ### Current Issues - The system does not work with the editor domain and has minor issues with text based assets - The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional. ### PackageTrailer - The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer - FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use. - When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire. - the header file contains a breakdown of the new format. ### VirtualizedBulkData - FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file. - FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code. - Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads. - We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave. - Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object. - Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use. - ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs. - We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package. - We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled. - We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code. -- Once the FPackageTrailer feature is no longer optional we can consider changing this. ### SavePackage/SavePackage2 - Removed the functions that would create the FPayloadToc - Added new functions to help create the trailer. - Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer. - We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems. ### LinkerLoad - After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package -- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead. - By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file. ### LinkerSave - We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed. - We now gather data to be appended to the package trailer via TrailerBuilder. ### DumpPayloadToc - This command now dumps info based on the package trailer if one can be found. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
UE_LOG(LogVirtualization, Display, TEXT("Replacing old packages with the virtualized version..."));
int64 OriginalSizeTotal = 0;
int64 ReducedSizeTotal = 0;
for (const FPackageReplacement& PackageInfo : PackagesToReplace)
{
const FString OriginalPackagePath = PackageInfo.Path.GetLocalFullPath();
const FString NewPackagePath = DuplicatePackageWithUpdatedTrailer(OriginalPackagePath, PackageInfo.Trailer, OutResultInfo.Errors);
if (NewPackagePath.IsEmpty())
{
// Duplication failed so skip this package for now. The error will be in OutResultInfo.Errors
continue;
}
const int64 OriginalSize = IFileManager::Get().GetStatData(*OriginalPackagePath).FileSize;
const int64 ReducedSize = IFileManager::Get().GetStatData(*NewPackagePath).FileSize;
if (IFileManager::Get().Move(*OriginalPackagePath, *NewPackagePath))
{
OutResultInfo.VirtualizedPackages.Add(OriginalPackagePath);
OriginalSizeTotal += OriginalSize;
ReducedSizeTotal += ReducedSize;
UE_LOG(LogVirtualization, Verbose, TEXT("Reducing %s: %s -> %s"),
*PackageInfo.Path.GetDebugName(),
*FText::AsMemory(OriginalSize).ToString(),
*FText::AsMemory(ReducedSize).ToString());
}
else
{
FText Message = FText::Format(LOCTEXT("Virtualization_MoveFailed", "Unable to replace the package '{0}' with the virtualized version"),
FText::FromString(PackageInfo.Path.GetDebugName()));
OutResultInfo.AddError(MoveTemp(Message));
continue;
}
}
if (OriginalSizeTotal != ReducedSizeTotal)
{
UE_LOG(LogVirtualization, Display, TEXT("Total Reduction %s (%s -> %s)"),
*FText::AsMemory(OriginalSizeTotal - ReducedSizeTotal).ToString(),
*FText::AsMemory(OriginalSizeTotal).ToString(),
*FText::AsMemory(ReducedSizeTotal).ToString());
}
UE_LOG(LogVirtualization, Display, TEXT("Replaced %d package(s)"), OutResultInfo.VirtualizedPackages.Num());
}
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous. #rb PJ.Kack #rnx #preflight 61a4b235405273b2c3daa7c7 ### Virtualization - The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted. - The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again. - This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc. - In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead. - In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm) - For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk. ### Current Issues - The system does not work with the editor domain and has minor issues with text based assets - The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional. ### PackageTrailer - The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer - FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use. - When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire. - the header file contains a breakdown of the new format. ### VirtualizedBulkData - FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file. - FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code. - Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads. - We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave. - Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object. - Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use. - ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs. - We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package. - We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled. - We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code. -- Once the FPackageTrailer feature is no longer optional we can consider changing this. ### SavePackage/SavePackage2 - Removed the functions that would create the FPayloadToc - Added new functions to help create the trailer. - Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer. - We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems. ### LinkerLoad - After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package -- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead. - By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file. ### LinkerSave - We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed. - We now gather data to be appended to the package trailer via TrailerBuilder. ### DumpPayloadToc - This command now dumps info based on the package trailer if one can be found. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
}
} // namespace UE::Virtualization
#undef LOCTEXT_NAMESPACE
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous. #rb PJ.Kack #rnx #preflight 61a4b235405273b2c3daa7c7 ### Virtualization - The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted. - The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again. - This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc. - In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead. - In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm) - For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk. ### Current Issues - The system does not work with the editor domain and has minor issues with text based assets - The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional. ### PackageTrailer - The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer - FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use. - When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire. - the header file contains a breakdown of the new format. ### VirtualizedBulkData - FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file. - FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code. - Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads. - We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave. - Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object. - Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use. - ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs. - We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package. - We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled. - We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code. -- Once the FPackageTrailer feature is no longer optional we can consider changing this. ### SavePackage/SavePackage2 - Removed the functions that would create the FPayloadToc - Added new functions to help create the trailer. - Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer. - We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems. ### LinkerLoad - After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package -- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead. - By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file. ### LinkerSave - We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed. - We now gather data to be appended to the package trailer via TrailerBuilder. ### DumpPayloadToc - This command now dumps info based on the package trailer if one can be found. #ROBOMERGE-AUTHOR: paul.chipchase #ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905 #ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469) [CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00