2021-10-25 20:05:28 -04:00
// Copyright Epic Games, Inc. All Rights Reserved.
2022-07-18 08:40:11 -04:00
# include "PackageVirtualizationProcess.h"
2021-10-25 20:05:28 -04:00
# include "Containers/UnrealString.h"
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous.
#rb PJ.Kack
#rnx
#preflight 61a4b235405273b2c3daa7c7
### Virtualization
- The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted.
- The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again.
- This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc.
- In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead.
- In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm)
- For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk.
### Current Issues
- The system does not work with the editor domain and has minor issues with text based assets
- The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional.
### PackageTrailer
- The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer
- FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use.
- When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire.
- the header file contains a breakdown of the new format.
### VirtualizedBulkData
- FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file.
- FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code.
- Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads.
- We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave.
- Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object.
- Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use.
- ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs.
- We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package.
- We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled.
- We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code.
-- Once the FPackageTrailer feature is no longer optional we can consider changing this.
### SavePackage/SavePackage2
- Removed the functions that would create the FPayloadToc
- Added new functions to help create the trailer.
- Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer.
- We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems.
### LinkerLoad
- After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package
-- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead.
- By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file.
### LinkerSave
- We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed.
- We now gather data to be appended to the package trailer via TrailerBuilder.
### DumpPayloadToc
- This command now dumps info based on the package trailer if one can be found.
#ROBOMERGE-AUTHOR: paul.chipchase
#ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905
#ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469)
[CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
# include "HAL/FileManager.h"
# include "HAL/PlatformFileManager.h"
2021-11-30 02:37:48 -05:00
# include "HAL/PlatformTime.h"
2021-11-07 23:43:01 -05:00
# include "Internationalization/Internationalization.h"
2021-10-25 20:05:28 -04:00
# include "Misc/PackageName.h"
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous.
#rb PJ.Kack
#rnx
#preflight 61a4b235405273b2c3daa7c7
### Virtualization
- The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted.
- The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again.
- This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc.
- In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead.
- In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm)
- For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk.
### Current Issues
- The system does not work with the editor domain and has minor issues with text based assets
- The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional.
### PackageTrailer
- The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer
- FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use.
- When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire.
- the header file contains a breakdown of the new format.
### VirtualizedBulkData
- FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file.
- FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code.
- Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads.
- We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave.
- Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object.
- Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use.
- ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs.
- We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package.
- We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled.
- We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code.
-- Once the FPackageTrailer feature is no longer optional we can consider changing this.
### SavePackage/SavePackage2
- Removed the functions that would create the FPayloadToc
- Added new functions to help create the trailer.
- Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer.
- We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems.
### LinkerLoad
- After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package
-- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead.
- By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file.
### LinkerSave
- We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed.
- We now gather data to be appended to the package trailer via TrailerBuilder.
### DumpPayloadToc
- This command now dumps info based on the package trailer if one can be found.
#ROBOMERGE-AUTHOR: paul.chipchase
#ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905
#ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469)
[CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
# include "Misc/Paths.h"
2022-02-16 07:51:39 -05:00
# include "Misc/ScopedSlowTask.h"
Add a rehydration command to the stand alone virtualization tool, making it easier to reverse the effects of asset virtualization. Unlike previous processes, this one does not require that we load the package and will just manipulate the data storaged in the package trailer.
#rb Sebastian.Nordgren
#rnx
#jira UE-156436
#preflight 62c287f9a3568e30664eb94f
### VA Standalone Tool
- We now plan to add much more functionality to the tool than just virtualizing and submitting changelists, so to make this easier I am moving the tool towards a design where it should be fairly easy to add new functionality.
- Added FCommand, which is a base class for adding new functionality, simple derive from FCommand and hook it up at the appropriate locations.
-- In the future it should be possible for new command types to automatically register themselves to be initiated from the command line. There should be no need to edit UnrealVirtualizationToolApp to add a new command but this will be done as an additional work item.
-- At the moment FCommand comes with a number of utility methods to call that cover some common source control commands.
-- The original functionality has not yet been moved to the command system and so the code is a little bit weird at the moment. Updating older code to the new system will be done as an additional work item.
- FProject/FPlugin have been moved to their own code files.
### Rehydrate Command
- The rehydrate command will take a number of packages, check them out of source control and then attempt to virtualize them.
- At the moment the chekout logic is fairly basic, we just check out every package supplied, we don't check if the package is virtualized or not yet. This can be improved in additional work items. Ideally by the end of command the only packages that we have checked out should also be rehydrated.
- At the moment the command can either take a path of a specific package, a path of a directory to find packages in, or a changelist containing packages that should be rehydrated.
- A cleint spec (workspace) can optionally be provided, but if not supplied we will attempt to find a client spec for which to check out the packages.
- Currently we will check out the packages to the default change list.
### Rehydrate process
- Added the rehydration process in it's own code files in the virtualization module. Like the virtualization process this is exposed in a public header file and no via the Core interface which means it is very specific to our module/implementation.
- The process expects that the caller will have checked out any required packages from source control. It will treat being unable to update a package file as an error.
- Added PackageUtils.h/.cpp and moved some of the generic code from the virtualization process code there so that it can be shared by the rehydration process.
### Misc
Moving away from the using things like FPackagePath as that requires that the correct mount points have been registered for a project and at the moment (with the flakiness of FConfig*) it seems that the best idea would be to prefer absolute file paths where possible.
[CL 20982284 by paul chipchase in ue5-main branch]
2022-07-07 06:54:33 -04:00
# include "PackageUtils.h"
2022-01-21 09:18:53 -05:00
# include "Serialization/EditorBulkData.h"
2021-11-30 02:37:48 -05:00
# include "UObject/Linker.h"
# include "UObject/Package.h"
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous.
#rb PJ.Kack
#rnx
#preflight 61a4b235405273b2c3daa7c7
### Virtualization
- The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted.
- The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again.
- This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc.
- In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead.
- In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm)
- For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk.
### Current Issues
- The system does not work with the editor domain and has minor issues with text based assets
- The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional.
### PackageTrailer
- The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer
- FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use.
- When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire.
- the header file contains a breakdown of the new format.
### VirtualizedBulkData
- FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file.
- FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code.
- Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads.
- We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave.
- Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object.
- Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use.
- ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs.
- We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package.
- We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled.
- We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code.
-- Once the FPackageTrailer feature is no longer optional we can consider changing this.
### SavePackage/SavePackage2
- Removed the functions that would create the FPayloadToc
- Added new functions to help create the trailer.
- Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer.
- We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems.
### LinkerLoad
- After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package
-- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead.
- By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file.
### LinkerSave
- We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed.
- We now gather data to be appended to the package trailer via TrailerBuilder.
### DumpPayloadToc
- This command now dumps info based on the package trailer if one can be found.
#ROBOMERGE-AUTHOR: paul.chipchase
#ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905
#ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469)
[CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
# include "UObject/PackageResourceManager.h"
# include "UObject/PackageTrailer.h"
2021-11-30 02:37:48 -05:00
# include "UObject/UObjectGlobals.h"
2021-10-25 20:05:28 -04:00
# include "Virtualization/VirtualizationSystem.h"
Temp fix to prevent payloads that should be filtered out from virtualizing from being virtualized if they are already in persistent storage, which was causing unexpected packages to become virtualized.
#rb Per.Larsson
#rnx
#jira UE-156312
#preflight 62a99553943e7bb256fe174c
### Problem
- Firstly this fix is a hack and not intended for shipping 5.1, but the real fix is already planned work, which involves moving the filtering from the push process to being applied to each payload when the package is saved, just as the asset filtering is done at the moment.
-- Doing the proper fix will take a bit of time and testing so we need a quicker fix so that I can re-enable the virtualization process to unblock some workflows.
- All new code is wrapped in a single preprocessor define 'ENABLE_FILTERING_HACK' making the code easy to identify and remove.
- Submitting this hack will cause the priority of the real fix to be bumped in the backlog so that we can remove the hack asap.
- The problem code comes from a pass we do on the payloads to check if they are already in persistent storage or not. If the payload is then we just change it to be virtualized and don't attempt to push. In addition to fixing the filtering there is also a work item to investigate if we really want to keep this logic or if we want to remove it anyway. This might be looked into before I look into the filtering fix as an alternative way to remove the hack.
### Fix
- After we have queried the system to check which of the local payloads are already in persistent storage, BUT before we change them over to be virtualized we now run a new pass, checking which payloads should be filtered.
-- This is a bespoke code path expected only to be used here, so in this case it is not in the IVirtualizationSystem API but requires us to cast to FVirtualizationManager to gain access. This should stop anyone else from using the hack before it is removed.
- The filtering check mimics the filtering that would be run on a normal push operation.
- Once done we can check for any payload that would be filtered, and if so, set the status to not found. This prevents the payload from being changed to be virtualized and means it will be included in the normal push operation where it will be correctly filtered.
- None of this is the best way to do it, but it did seem like the easiest way to add the hack in isolation as no existing code was changed, so it should be simple to just 'delete' the hack when ready.
#ROBOMERGE-AUTHOR: paul.chipchase
#ROBOMERGE-SOURCE: CL 20667712 via CL 20667730 via CL 20667733
#ROBOMERGE-BOT: UE5 (Release-Engine-Staging -> Main) (v955-20579017)
[CL 20668639 by paul chipchase in ue5-main branch]
2022-06-15 08:56:57 -04:00
# include "VirtualizationManager.h"
2021-10-25 20:05:28 -04:00
Add a rehydration command to the stand alone virtualization tool, making it easier to reverse the effects of asset virtualization. Unlike previous processes, this one does not require that we load the package and will just manipulate the data storaged in the package trailer.
#rb Sebastian.Nordgren
#rnx
#jira UE-156436
#preflight 62c287f9a3568e30664eb94f
### VA Standalone Tool
- We now plan to add much more functionality to the tool than just virtualizing and submitting changelists, so to make this easier I am moving the tool towards a design where it should be fairly easy to add new functionality.
- Added FCommand, which is a base class for adding new functionality, simple derive from FCommand and hook it up at the appropriate locations.
-- In the future it should be possible for new command types to automatically register themselves to be initiated from the command line. There should be no need to edit UnrealVirtualizationToolApp to add a new command but this will be done as an additional work item.
-- At the moment FCommand comes with a number of utility methods to call that cover some common source control commands.
-- The original functionality has not yet been moved to the command system and so the code is a little bit weird at the moment. Updating older code to the new system will be done as an additional work item.
- FProject/FPlugin have been moved to their own code files.
### Rehydrate Command
- The rehydrate command will take a number of packages, check them out of source control and then attempt to virtualize them.
- At the moment the chekout logic is fairly basic, we just check out every package supplied, we don't check if the package is virtualized or not yet. This can be improved in additional work items. Ideally by the end of command the only packages that we have checked out should also be rehydrated.
- At the moment the command can either take a path of a specific package, a path of a directory to find packages in, or a changelist containing packages that should be rehydrated.
- A cleint spec (workspace) can optionally be provided, but if not supplied we will attempt to find a client spec for which to check out the packages.
- Currently we will check out the packages to the default change list.
### Rehydrate process
- Added the rehydration process in it's own code files in the virtualization module. Like the virtualization process this is exposed in a public header file and no via the Core interface which means it is very specific to our module/implementation.
- The process expects that the caller will have checked out any required packages from source control. It will treat being unable to update a package file as an error.
- Added PackageUtils.h/.cpp and moved some of the generic code from the virtualization process code there so that it can be shared by the rehydration process.
### Misc
Moving away from the using things like FPackagePath as that requires that the correct mount points have been registered for a project and at the moment (with the flakiness of FConfig*) it seems that the best idea would be to prefer absolute file paths where possible.
[CL 20982284 by paul chipchase in ue5-main branch]
2022-07-07 06:54:33 -04:00
# define LOCTEXT_NAMESPACE "Virtualization"
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous.
#rb PJ.Kack
#rnx
#preflight 61a4b235405273b2c3daa7c7
### Virtualization
- The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted.
- The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again.
- This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc.
- In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead.
- In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm)
- For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk.
### Current Issues
- The system does not work with the editor domain and has minor issues with text based assets
- The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional.
### PackageTrailer
- The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer
- FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use.
- When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire.
- the header file contains a breakdown of the new format.
### VirtualizedBulkData
- FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file.
- FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code.
- Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads.
- We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave.
- Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object.
- Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use.
- ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs.
- We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package.
- We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled.
- We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code.
-- Once the FPackageTrailer feature is no longer optional we can consider changing this.
### SavePackage/SavePackage2
- Removed the functions that would create the FPayloadToc
- Added new functions to help create the trailer.
- Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer.
- We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems.
### LinkerLoad
- After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package
-- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead.
- By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file.
### LinkerSave
- We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed.
- We now gather data to be appended to the package trailer via TrailerBuilder.
### DumpPayloadToc
- This command now dumps info based on the package trailer if one can be found.
#ROBOMERGE-AUTHOR: paul.chipchase
#ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905
#ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469)
[CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
2021-10-25 20:05:28 -04:00
namespace UE : : Virtualization
{
2022-05-04 08:58:50 -04:00
/**
* Implementation of the IPayloadProvider interface so that payloads can be requested on demand
* when they are being virtualized .
*
* This implementation is not optimized . If a package holds many payloads that are all virtualized
* we will end up loading the same trailer over and over , as well as opening the same package file
* for read many times .
*
* So far this has shown to be a rounding error compared to the actual cost of virtualization
* and so implementing any level of caching has been left as a future task .
2022-08-15 10:17:46 -04:00
*
* TODO : Implement a MRU cache for payloads to prevent loading the same payload off disk many
* times for different backends if it will not cause a huge memory spike .
2022-05-04 08:58:50 -04:00
*/
class FWorkspaceDomainPayloadProvider final : public IPayloadProvider
{
public :
FWorkspaceDomainPayloadProvider ( ) = default ;
virtual ~ FWorkspaceDomainPayloadProvider ( ) = default ;
/** Register the payload with it's trailer and package name so that we can access it later as needed */
void RegisterPayload ( const FIoHash & PayloadId , uint64 SizeOnDisk , const FString & PackageName )
{
if ( ! PayloadId . IsZero ( ) )
{
PayloadLookupTable . Emplace ( PayloadId , FPayloadData ( SizeOnDisk , PackageName ) ) ;
}
}
private :
virtual FCompressedBuffer RequestPayload ( const FIoHash & Identifier ) override
{
if ( Identifier . IsZero ( ) )
{
return FCompressedBuffer ( ) ;
}
const FPayloadData * Data = PayloadLookupTable . Find ( Identifier ) ;
if ( Data = = nullptr )
{
Temp fix to prevent payloads that should be filtered out from virtualizing from being virtualized if they are already in persistent storage, which was causing unexpected packages to become virtualized.
#rb Per.Larsson
#rnx
#jira UE-156312
#preflight 62a99553943e7bb256fe174c
### Problem
- Firstly this fix is a hack and not intended for shipping 5.1, but the real fix is already planned work, which involves moving the filtering from the push process to being applied to each payload when the package is saved, just as the asset filtering is done at the moment.
-- Doing the proper fix will take a bit of time and testing so we need a quicker fix so that I can re-enable the virtualization process to unblock some workflows.
- All new code is wrapped in a single preprocessor define 'ENABLE_FILTERING_HACK' making the code easy to identify and remove.
- Submitting this hack will cause the priority of the real fix to be bumped in the backlog so that we can remove the hack asap.
- The problem code comes from a pass we do on the payloads to check if they are already in persistent storage or not. If the payload is then we just change it to be virtualized and don't attempt to push. In addition to fixing the filtering there is also a work item to investigate if we really want to keep this logic or if we want to remove it anyway. This might be looked into before I look into the filtering fix as an alternative way to remove the hack.
### Fix
- After we have queried the system to check which of the local payloads are already in persistent storage, BUT before we change them over to be virtualized we now run a new pass, checking which payloads should be filtered.
-- This is a bespoke code path expected only to be used here, so in this case it is not in the IVirtualizationSystem API but requires us to cast to FVirtualizationManager to gain access. This should stop anyone else from using the hack before it is removed.
- The filtering check mimics the filtering that would be run on a normal push operation.
- Once done we can check for any payload that would be filtered, and if so, set the status to not found. This prevents the payload from being changed to be virtualized and means it will be included in the normal push operation where it will be correctly filtered.
- None of this is the best way to do it, but it did seem like the easiest way to add the hack in isolation as no existing code was changed, so it should be simple to just 'delete' the hack when ready.
#ROBOMERGE-AUTHOR: paul.chipchase
#ROBOMERGE-SOURCE: CL 20667712 via CL 20667730 via CL 20667733
#ROBOMERGE-BOT: UE5 (Release-Engine-Staging -> Main) (v955-20579017)
[CL 20668639 by paul chipchase in ue5-main branch]
2022-06-15 08:56:57 -04:00
UE_LOG ( LogVirtualization , Error , TEXT ( " FWorkspaceDomainPayloadProvider was unable to find a payload with the identifier '%s' " ) ,
2022-05-04 08:58:50 -04:00
* LexToString ( Identifier ) ) ;
return FCompressedBuffer ( ) ;
}
TUniquePtr < FArchive > PackageAr = IPackageResourceManager : : Get ( ) . OpenReadExternalResource ( EPackageExternalResource : : WorkspaceDomainFile , * Data - > PackageName ) ;
if ( ! PackageAr . IsValid ( ) )
{
Temp fix to prevent payloads that should be filtered out from virtualizing from being virtualized if they are already in persistent storage, which was causing unexpected packages to become virtualized.
#rb Per.Larsson
#rnx
#jira UE-156312
#preflight 62a99553943e7bb256fe174c
### Problem
- Firstly this fix is a hack and not intended for shipping 5.1, but the real fix is already planned work, which involves moving the filtering from the push process to being applied to each payload when the package is saved, just as the asset filtering is done at the moment.
-- Doing the proper fix will take a bit of time and testing so we need a quicker fix so that I can re-enable the virtualization process to unblock some workflows.
- All new code is wrapped in a single preprocessor define 'ENABLE_FILTERING_HACK' making the code easy to identify and remove.
- Submitting this hack will cause the priority of the real fix to be bumped in the backlog so that we can remove the hack asap.
- The problem code comes from a pass we do on the payloads to check if they are already in persistent storage or not. If the payload is then we just change it to be virtualized and don't attempt to push. In addition to fixing the filtering there is also a work item to investigate if we really want to keep this logic or if we want to remove it anyway. This might be looked into before I look into the filtering fix as an alternative way to remove the hack.
### Fix
- After we have queried the system to check which of the local payloads are already in persistent storage, BUT before we change them over to be virtualized we now run a new pass, checking which payloads should be filtered.
-- This is a bespoke code path expected only to be used here, so in this case it is not in the IVirtualizationSystem API but requires us to cast to FVirtualizationManager to gain access. This should stop anyone else from using the hack before it is removed.
- The filtering check mimics the filtering that would be run on a normal push operation.
- Once done we can check for any payload that would be filtered, and if so, set the status to not found. This prevents the payload from being changed to be virtualized and means it will be included in the normal push operation where it will be correctly filtered.
- None of this is the best way to do it, but it did seem like the easiest way to add the hack in isolation as no existing code was changed, so it should be simple to just 'delete' the hack when ready.
#ROBOMERGE-AUTHOR: paul.chipchase
#ROBOMERGE-SOURCE: CL 20667712 via CL 20667730 via CL 20667733
#ROBOMERGE-BOT: UE5 (Release-Engine-Staging -> Main) (v955-20579017)
[CL 20668639 by paul chipchase in ue5-main branch]
2022-06-15 08:56:57 -04:00
UE_LOG ( LogVirtualization , Error , TEXT ( " FWorkspaceDomainPayloadProvider was unable to open the package '%s' for reading " ) ,
2022-05-04 08:58:50 -04:00
* Data - > PackageName ) ;
return FCompressedBuffer ( ) ;
}
PackageAr - > Seek ( PackageAr - > TotalSize ( ) ) ;
FPackageTrailer Trailer ;
if ( ! Trailer . TryLoadBackwards ( * PackageAr ) )
{
Temp fix to prevent payloads that should be filtered out from virtualizing from being virtualized if they are already in persistent storage, which was causing unexpected packages to become virtualized.
#rb Per.Larsson
#rnx
#jira UE-156312
#preflight 62a99553943e7bb256fe174c
### Problem
- Firstly this fix is a hack and not intended for shipping 5.1, but the real fix is already planned work, which involves moving the filtering from the push process to being applied to each payload when the package is saved, just as the asset filtering is done at the moment.
-- Doing the proper fix will take a bit of time and testing so we need a quicker fix so that I can re-enable the virtualization process to unblock some workflows.
- All new code is wrapped in a single preprocessor define 'ENABLE_FILTERING_HACK' making the code easy to identify and remove.
- Submitting this hack will cause the priority of the real fix to be bumped in the backlog so that we can remove the hack asap.
- The problem code comes from a pass we do on the payloads to check if they are already in persistent storage or not. If the payload is then we just change it to be virtualized and don't attempt to push. In addition to fixing the filtering there is also a work item to investigate if we really want to keep this logic or if we want to remove it anyway. This might be looked into before I look into the filtering fix as an alternative way to remove the hack.
### Fix
- After we have queried the system to check which of the local payloads are already in persistent storage, BUT before we change them over to be virtualized we now run a new pass, checking which payloads should be filtered.
-- This is a bespoke code path expected only to be used here, so in this case it is not in the IVirtualizationSystem API but requires us to cast to FVirtualizationManager to gain access. This should stop anyone else from using the hack before it is removed.
- The filtering check mimics the filtering that would be run on a normal push operation.
- Once done we can check for any payload that would be filtered, and if so, set the status to not found. This prevents the payload from being changed to be virtualized and means it will be included in the normal push operation where it will be correctly filtered.
- None of this is the best way to do it, but it did seem like the easiest way to add the hack in isolation as no existing code was changed, so it should be simple to just 'delete' the hack when ready.
#ROBOMERGE-AUTHOR: paul.chipchase
#ROBOMERGE-SOURCE: CL 20667712 via CL 20667730 via CL 20667733
#ROBOMERGE-BOT: UE5 (Release-Engine-Staging -> Main) (v955-20579017)
[CL 20668639 by paul chipchase in ue5-main branch]
2022-06-15 08:56:57 -04:00
UE_LOG ( LogVirtualization , Error , TEXT ( " FWorkspaceDomainPayloadProvider failed to load the package trailer from the package '%s' " ) ,
2022-05-04 08:58:50 -04:00
* Data - > PackageName ) ;
return FCompressedBuffer ( ) ;
}
FCompressedBuffer Payload = Trailer . LoadLocalPayload ( Identifier , * PackageAr ) ;
if ( ! Payload )
{
Temp fix to prevent payloads that should be filtered out from virtualizing from being virtualized if they are already in persistent storage, which was causing unexpected packages to become virtualized.
#rb Per.Larsson
#rnx
#jira UE-156312
#preflight 62a99553943e7bb256fe174c
### Problem
- Firstly this fix is a hack and not intended for shipping 5.1, but the real fix is already planned work, which involves moving the filtering from the push process to being applied to each payload when the package is saved, just as the asset filtering is done at the moment.
-- Doing the proper fix will take a bit of time and testing so we need a quicker fix so that I can re-enable the virtualization process to unblock some workflows.
- All new code is wrapped in a single preprocessor define 'ENABLE_FILTERING_HACK' making the code easy to identify and remove.
- Submitting this hack will cause the priority of the real fix to be bumped in the backlog so that we can remove the hack asap.
- The problem code comes from a pass we do on the payloads to check if they are already in persistent storage or not. If the payload is then we just change it to be virtualized and don't attempt to push. In addition to fixing the filtering there is also a work item to investigate if we really want to keep this logic or if we want to remove it anyway. This might be looked into before I look into the filtering fix as an alternative way to remove the hack.
### Fix
- After we have queried the system to check which of the local payloads are already in persistent storage, BUT before we change them over to be virtualized we now run a new pass, checking which payloads should be filtered.
-- This is a bespoke code path expected only to be used here, so in this case it is not in the IVirtualizationSystem API but requires us to cast to FVirtualizationManager to gain access. This should stop anyone else from using the hack before it is removed.
- The filtering check mimics the filtering that would be run on a normal push operation.
- Once done we can check for any payload that would be filtered, and if so, set the status to not found. This prevents the payload from being changed to be virtualized and means it will be included in the normal push operation where it will be correctly filtered.
- None of this is the best way to do it, but it did seem like the easiest way to add the hack in isolation as no existing code was changed, so it should be simple to just 'delete' the hack when ready.
#ROBOMERGE-AUTHOR: paul.chipchase
#ROBOMERGE-SOURCE: CL 20667712 via CL 20667730 via CL 20667733
#ROBOMERGE-BOT: UE5 (Release-Engine-Staging -> Main) (v955-20579017)
[CL 20668639 by paul chipchase in ue5-main branch]
2022-06-15 08:56:57 -04:00
UE_LOG ( LogVirtualization , Error , TEXT ( " FWorkspaceDomainPayloadProvider was uanble to load the payload '%s' from the package '%s' " ) ,
2022-05-04 08:58:50 -04:00
* LexToString ( Identifier ) ,
* Data - > PackageName ) ;
return FCompressedBuffer ( ) ;
}
if ( Identifier ! = FIoHash ( Payload . GetRawHash ( ) ) )
{
Temp fix to prevent payloads that should be filtered out from virtualizing from being virtualized if they are already in persistent storage, which was causing unexpected packages to become virtualized.
#rb Per.Larsson
#rnx
#jira UE-156312
#preflight 62a99553943e7bb256fe174c
### Problem
- Firstly this fix is a hack and not intended for shipping 5.1, but the real fix is already planned work, which involves moving the filtering from the push process to being applied to each payload when the package is saved, just as the asset filtering is done at the moment.
-- Doing the proper fix will take a bit of time and testing so we need a quicker fix so that I can re-enable the virtualization process to unblock some workflows.
- All new code is wrapped in a single preprocessor define 'ENABLE_FILTERING_HACK' making the code easy to identify and remove.
- Submitting this hack will cause the priority of the real fix to be bumped in the backlog so that we can remove the hack asap.
- The problem code comes from a pass we do on the payloads to check if they are already in persistent storage or not. If the payload is then we just change it to be virtualized and don't attempt to push. In addition to fixing the filtering there is also a work item to investigate if we really want to keep this logic or if we want to remove it anyway. This might be looked into before I look into the filtering fix as an alternative way to remove the hack.
### Fix
- After we have queried the system to check which of the local payloads are already in persistent storage, BUT before we change them over to be virtualized we now run a new pass, checking which payloads should be filtered.
-- This is a bespoke code path expected only to be used here, so in this case it is not in the IVirtualizationSystem API but requires us to cast to FVirtualizationManager to gain access. This should stop anyone else from using the hack before it is removed.
- The filtering check mimics the filtering that would be run on a normal push operation.
- Once done we can check for any payload that would be filtered, and if so, set the status to not found. This prevents the payload from being changed to be virtualized and means it will be included in the normal push operation where it will be correctly filtered.
- None of this is the best way to do it, but it did seem like the easiest way to add the hack in isolation as no existing code was changed, so it should be simple to just 'delete' the hack when ready.
#ROBOMERGE-AUTHOR: paul.chipchase
#ROBOMERGE-SOURCE: CL 20667712 via CL 20667730 via CL 20667733
#ROBOMERGE-BOT: UE5 (Release-Engine-Staging -> Main) (v955-20579017)
[CL 20668639 by paul chipchase in ue5-main branch]
2022-06-15 08:56:57 -04:00
UE_LOG ( LogVirtualization , Error , TEXT ( " FWorkspaceDomainPayloadProvider loaded an incorrect payload from the package '%s'. Expected '%s' Loaded '%s' " ) ,
2022-05-04 08:58:50 -04:00
* Data - > PackageName ,
* LexToString ( Identifier ) ,
* LexToString ( Payload . GetRawHash ( ) ) ) ;
return FCompressedBuffer ( ) ;
}
return Payload ;
}
virtual uint64 GetPayloadSize ( const FIoHash & Identifier ) override
{
if ( Identifier . IsZero ( ) )
{
return 0 ;
}
const FPayloadData * Data = PayloadLookupTable . Find ( Identifier ) ;
if ( Data ! = nullptr )
{
return Data - > SizeOnDisk ;
}
else
{
Temp fix to prevent payloads that should be filtered out from virtualizing from being virtualized if they are already in persistent storage, which was causing unexpected packages to become virtualized.
#rb Per.Larsson
#rnx
#jira UE-156312
#preflight 62a99553943e7bb256fe174c
### Problem
- Firstly this fix is a hack and not intended for shipping 5.1, but the real fix is already planned work, which involves moving the filtering from the push process to being applied to each payload when the package is saved, just as the asset filtering is done at the moment.
-- Doing the proper fix will take a bit of time and testing so we need a quicker fix so that I can re-enable the virtualization process to unblock some workflows.
- All new code is wrapped in a single preprocessor define 'ENABLE_FILTERING_HACK' making the code easy to identify and remove.
- Submitting this hack will cause the priority of the real fix to be bumped in the backlog so that we can remove the hack asap.
- The problem code comes from a pass we do on the payloads to check if they are already in persistent storage or not. If the payload is then we just change it to be virtualized and don't attempt to push. In addition to fixing the filtering there is also a work item to investigate if we really want to keep this logic or if we want to remove it anyway. This might be looked into before I look into the filtering fix as an alternative way to remove the hack.
### Fix
- After we have queried the system to check which of the local payloads are already in persistent storage, BUT before we change them over to be virtualized we now run a new pass, checking which payloads should be filtered.
-- This is a bespoke code path expected only to be used here, so in this case it is not in the IVirtualizationSystem API but requires us to cast to FVirtualizationManager to gain access. This should stop anyone else from using the hack before it is removed.
- The filtering check mimics the filtering that would be run on a normal push operation.
- Once done we can check for any payload that would be filtered, and if so, set the status to not found. This prevents the payload from being changed to be virtualized and means it will be included in the normal push operation where it will be correctly filtered.
- None of this is the best way to do it, but it did seem like the easiest way to add the hack in isolation as no existing code was changed, so it should be simple to just 'delete' the hack when ready.
#ROBOMERGE-AUTHOR: paul.chipchase
#ROBOMERGE-SOURCE: CL 20667712 via CL 20667730 via CL 20667733
#ROBOMERGE-BOT: UE5 (Release-Engine-Staging -> Main) (v955-20579017)
[CL 20668639 by paul chipchase in ue5-main branch]
2022-06-15 08:56:57 -04:00
UE_LOG ( LogVirtualization , Error , TEXT ( " FWorkspaceDomainPayloadProvider was unable to find a payload with the identifier '%s' " ) ,
2022-05-04 08:58:50 -04:00
* LexToString ( Identifier ) ) ;
return 0 ;
}
}
/* This structure holds additional info about the payload that we might need later */
struct FPayloadData
{
FPayloadData ( uint64 InSizeOnDisk , const FString & InPackageName )
: SizeOnDisk ( InSizeOnDisk )
, PackageName ( InPackageName )
{
}
uint64 SizeOnDisk ;
FString PackageName ;
} ;
TMap < FIoHash , FPayloadData > PayloadLookupTable ;
} ;
2022-07-22 03:58:27 -04:00
void VirtualizePackages ( const TArray < FString > & FilesToSubmit , TArray < FText > & OutErrors )
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous.
#rb PJ.Kack
#rnx
#preflight 61a4b235405273b2c3daa7c7
### Virtualization
- The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted.
- The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again.
- This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc.
- In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead.
- In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm)
- For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk.
### Current Issues
- The system does not work with the editor domain and has minor issues with text based assets
- The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional.
### PackageTrailer
- The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer
- FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use.
- When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire.
- the header file contains a breakdown of the new format.
### VirtualizedBulkData
- FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file.
- FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code.
- Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads.
- We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave.
- Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object.
- Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use.
- ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs.
- We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package.
- We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled.
- We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code.
-- Once the FPackageTrailer feature is no longer optional we can consider changing this.
### SavePackage/SavePackage2
- Removed the functions that would create the FPayloadToc
- Added new functions to help create the trailer.
- Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer.
- We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems.
### LinkerLoad
- After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package
-- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead.
- By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file.
### LinkerSave
- We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed.
- We now gather data to be appended to the package trailer via TrailerBuilder.
### DumpPayloadToc
- This command now dumps info based on the package trailer if one can be found.
#ROBOMERGE-AUTHOR: paul.chipchase
#ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905
#ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469)
[CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
{
2022-03-23 11:14:47 -04:00
TRACE_CPUPROFILER_EVENT_SCOPE ( UE : : Virtualization : : VirtualizePackages ) ;
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous.
#rb PJ.Kack
#rnx
#preflight 61a4b235405273b2c3daa7c7
### Virtualization
- The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted.
- The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again.
- This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc.
- In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead.
- In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm)
- For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk.
### Current Issues
- The system does not work with the editor domain and has minor issues with text based assets
- The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional.
### PackageTrailer
- The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer
- FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use.
- When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire.
- the header file contains a breakdown of the new format.
### VirtualizedBulkData
- FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file.
- FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code.
- Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads.
- We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave.
- Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object.
- Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use.
- ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs.
- We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package.
- We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled.
- We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code.
-- Once the FPackageTrailer feature is no longer optional we can consider changing this.
### SavePackage/SavePackage2
- Removed the functions that would create the FPayloadToc
- Added new functions to help create the trailer.
- Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer.
- We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems.
### LinkerLoad
- After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package
-- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead.
- By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file.
### LinkerSave
- We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed.
- We now gather data to be appended to the package trailer via TrailerBuilder.
### DumpPayloadToc
- This command now dumps info based on the package trailer if one can be found.
#ROBOMERGE-AUTHOR: paul.chipchase
#ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905
#ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469)
[CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
IVirtualizationSystem & System = IVirtualizationSystem : : Get ( ) ;
2021-11-30 02:37:48 -05:00
const double StartTime = FPlatformTime : : Seconds ( ) ;
2022-02-16 07:51:39 -05:00
FScopedSlowTask Progress ( 5.0f , LOCTEXT ( " Virtualization_Task " , " Virtualizing Assets... " ) ) ;
Progress . MakeDialog ( ) ;
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous.
#rb PJ.Kack
#rnx
#preflight 61a4b235405273b2c3daa7c7
### Virtualization
- The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted.
- The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again.
- This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc.
- In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead.
- In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm)
- For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk.
### Current Issues
- The system does not work with the editor domain and has minor issues with text based assets
- The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional.
### PackageTrailer
- The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer
- FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use.
- When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire.
- the header file contains a breakdown of the new format.
### VirtualizedBulkData
- FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file.
- FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code.
- Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads.
- We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave.
- Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object.
- Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use.
- ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs.
- We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package.
- We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled.
- We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code.
-- Once the FPackageTrailer feature is no longer optional we can consider changing this.
### SavePackage/SavePackage2
- Removed the functions that would create the FPayloadToc
- Added new functions to help create the trailer.
- Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer.
- We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems.
### LinkerLoad
- After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package
-- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead.
- By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file.
### LinkerSave
- We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed.
- We now gather data to be appended to the package trailer via TrailerBuilder.
### DumpPayloadToc
- This command now dumps info based on the package trailer if one can be found.
#ROBOMERGE-AUTHOR: paul.chipchase
#ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905
#ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469)
[CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
// Other systems may have added errors to this array, we need to check so later we can determine if this function added any additional errors.
2022-03-23 11:14:47 -04:00
const int32 NumErrors = OutErrors . Num ( ) ;
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous.
#rb PJ.Kack
#rnx
#preflight 61a4b235405273b2c3daa7c7
### Virtualization
- The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted.
- The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again.
- This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc.
- In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead.
- In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm)
- For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk.
### Current Issues
- The system does not work with the editor domain and has minor issues with text based assets
- The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional.
### PackageTrailer
- The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer
- FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use.
- When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire.
- the header file contains a breakdown of the new format.
### VirtualizedBulkData
- FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file.
- FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code.
- Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads.
- We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave.
- Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object.
- Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use.
- ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs.
- We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package.
- We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled.
- We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code.
-- Once the FPackageTrailer feature is no longer optional we can consider changing this.
### SavePackage/SavePackage2
- Removed the functions that would create the FPayloadToc
- Added new functions to help create the trailer.
- Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer.
- We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems.
### LinkerLoad
- After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package
-- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead.
- By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file.
### LinkerSave
- We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed.
- We now gather data to be appended to the package trailer via TrailerBuilder.
### DumpPayloadToc
- This command now dumps info based on the package trailer if one can be found.
#ROBOMERGE-AUTHOR: paul.chipchase
#ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905
#ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469)
[CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
2021-12-01 11:13:31 -05:00
struct FPackageInfo
{
FPackagePath Path ;
FPackageTrailer Trailer ;
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous.
#rb PJ.Kack
#rnx
#preflight 61a4b235405273b2c3daa7c7
### Virtualization
- The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted.
- The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again.
- This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc.
- In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead.
- In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm)
- For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk.
### Current Issues
- The system does not work with the editor domain and has minor issues with text based assets
- The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional.
### PackageTrailer
- The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer
- FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use.
- When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire.
- the header file contains a breakdown of the new format.
### VirtualizedBulkData
- FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file.
- FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code.
- Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads.
- We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave.
- Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object.
- Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use.
- ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs.
- We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package.
- We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled.
- We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code.
-- Once the FPackageTrailer feature is no longer optional we can consider changing this.
### SavePackage/SavePackage2
- Removed the functions that would create the FPayloadToc
- Added new functions to help create the trailer.
- Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer.
- We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems.
### LinkerLoad
- After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package
-- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead.
- By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file.
### LinkerSave
- We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed.
- We now gather data to be appended to the package trailer via TrailerBuilder.
### DumpPayloadToc
- This command now dumps info based on the package trailer if one can be found.
#ROBOMERGE-AUTHOR: paul.chipchase
#ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905
#ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469)
[CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
2022-02-02 02:21:24 -05:00
TArray < FIoHash > LocalPayloads ;
2022-08-15 10:17:46 -04:00
/** Index where the FPushRequest for this package can be found */
2021-12-01 11:13:31 -05:00
int32 PayloadIndex = INDEX_NONE ;
bool bWasTrailerUpdated = false ;
} ;
2021-12-08 02:19:42 -05:00
UE_LOG ( LogVirtualization , Display , TEXT ( " Considering %d file(s) for virtualization " ) , FilesToSubmit . Num ( ) ) ;
2021-12-01 11:13:31 -05:00
TArray < FPackageInfo > Packages ;
Packages . Reserve ( FilesToSubmit . Num ( ) ) ;
2022-02-16 07:51:39 -05:00
Progress . EnterProgressFrame ( 1.0f ) ;
2021-12-01 11:13:31 -05:00
// From the list of files to submit we need to find all of the valid packages that contain
// local payloads that need to be virtualized.
2022-06-27 08:56:51 -04:00
int64 TotalPackagesFound = 0 ;
int64 TotalPackageTrailersFound = 0 ;
2022-08-15 10:17:46 -04:00
int64 TotalPayloadsToVirtualize = 0 ;
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous.
#rb PJ.Kack
#rnx
#preflight 61a4b235405273b2c3daa7c7
### Virtualization
- The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted.
- The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again.
- This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc.
- In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead.
- In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm)
- For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk.
### Current Issues
- The system does not work with the editor domain and has minor issues with text based assets
- The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional.
### PackageTrailer
- The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer
- FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use.
- When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire.
- the header file contains a breakdown of the new format.
### VirtualizedBulkData
- FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file.
- FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code.
- Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads.
- We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave.
- Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object.
- Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use.
- ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs.
- We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package.
- We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled.
- We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code.
-- Once the FPackageTrailer feature is no longer optional we can consider changing this.
### SavePackage/SavePackage2
- Removed the functions that would create the FPayloadToc
- Added new functions to help create the trailer.
- Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer.
- We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems.
### LinkerLoad
- After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package
-- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead.
- By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file.
### LinkerSave
- We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed.
- We now gather data to be appended to the package trailer via TrailerBuilder.
### DumpPayloadToc
- This command now dumps info based on the package trailer if one can be found.
#ROBOMERGE-AUTHOR: paul.chipchase
#ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905
#ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469)
[CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
for ( const FString & AbsoluteFilePath : FilesToSubmit )
{
2021-12-01 11:13:31 -05:00
FPackagePath PackagePath = FPackagePath : : FromLocalPath ( AbsoluteFilePath ) ;
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous.
#rb PJ.Kack
#rnx
#preflight 61a4b235405273b2c3daa7c7
### Virtualization
- The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted.
- The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again.
- This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc.
- In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead.
- In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm)
- For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk.
### Current Issues
- The system does not work with the editor domain and has minor issues with text based assets
- The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional.
### PackageTrailer
- The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer
- FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use.
- When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire.
- the header file contains a breakdown of the new format.
### VirtualizedBulkData
- FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file.
- FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code.
- Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads.
- We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave.
- Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object.
- Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use.
- ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs.
- We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package.
- We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled.
- We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code.
-- Once the FPackageTrailer feature is no longer optional we can consider changing this.
### SavePackage/SavePackage2
- Removed the functions that would create the FPayloadToc
- Added new functions to help create the trailer.
- Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer.
- We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems.
### LinkerLoad
- After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package
-- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead.
- By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file.
### LinkerSave
- We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed.
- We now gather data to be appended to the package trailer via TrailerBuilder.
### DumpPayloadToc
- This command now dumps info based on the package trailer if one can be found.
#ROBOMERGE-AUTHOR: paul.chipchase
#ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905
#ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469)
[CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
2022-01-18 05:41:44 -05:00
// TODO: How to handle text packages?
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous.
#rb PJ.Kack
#rnx
#preflight 61a4b235405273b2c3daa7c7
### Virtualization
- The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted.
- The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again.
- This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc.
- In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead.
- In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm)
- For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk.
### Current Issues
- The system does not work with the editor domain and has minor issues with text based assets
- The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional.
### PackageTrailer
- The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer
- FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use.
- When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire.
- the header file contains a breakdown of the new format.
### VirtualizedBulkData
- FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file.
- FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code.
- Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads.
- We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave.
- Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object.
- Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use.
- ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs.
- We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package.
- We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled.
- We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code.
-- Once the FPackageTrailer feature is no longer optional we can consider changing this.
### SavePackage/SavePackage2
- Removed the functions that would create the FPayloadToc
- Added new functions to help create the trailer.
- Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer.
- We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems.
### LinkerLoad
- After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package
-- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead.
- By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file.
### LinkerSave
- We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed.
- We now gather data to be appended to the package trailer via TrailerBuilder.
### DumpPayloadToc
- This command now dumps info based on the package trailer if one can be found.
#ROBOMERGE-AUTHOR: paul.chipchase
#ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905
#ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469)
[CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
if ( FPackageName : : IsPackageExtension ( PackagePath . GetHeaderExtension ( ) ) | | FPackageName : : IsTextPackageExtension ( PackagePath . GetHeaderExtension ( ) ) )
{
2022-06-27 08:56:51 -04:00
TotalPackagesFound + + ;
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous.
#rb PJ.Kack
#rnx
#preflight 61a4b235405273b2c3daa7c7
### Virtualization
- The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted.
- The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again.
- This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc.
- In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead.
- In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm)
- For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk.
### Current Issues
- The system does not work with the editor domain and has minor issues with text based assets
- The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional.
### PackageTrailer
- The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer
- FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use.
- When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire.
- the header file contains a breakdown of the new format.
### VirtualizedBulkData
- FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file.
- FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code.
- Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads.
- We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave.
- Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object.
- Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use.
- ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs.
- We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package.
- We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled.
- We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code.
-- Once the FPackageTrailer feature is no longer optional we can consider changing this.
### SavePackage/SavePackage2
- Removed the functions that would create the FPayloadToc
- Added new functions to help create the trailer.
- Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer.
- We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems.
### LinkerLoad
- After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package
-- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead.
- By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file.
### LinkerSave
- We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed.
- We now gather data to be appended to the package trailer via TrailerBuilder.
### DumpPayloadToc
- This command now dumps info based on the package trailer if one can be found.
#ROBOMERGE-AUTHOR: paul.chipchase
#ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905
#ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469)
[CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
FPackageTrailer Trailer ;
if ( FPackageTrailer : : TryLoadFromPackage ( PackagePath , Trailer ) )
{
2022-06-27 08:56:51 -04:00
TotalPackageTrailersFound + + ;
2022-01-18 05:41:44 -05:00
// The following is not expected to ever happen, currently we give a user facing error but it generally means that the asset is broken somehow.
2022-05-17 07:54:28 -04:00
ensureMsgf ( Trailer . GetNumPayloads ( EPayloadStorageType : : Referenced ) = = 0 , TEXT ( " Trying to virtualize a package that already contains payload references which the workspace file should not ever contain! " ) ) ;
if ( Trailer . GetNumPayloads ( EPayloadStorageType : : Referenced ) > 0 )
2022-01-18 05:41:44 -05:00
{
2022-08-15 10:17:46 -04:00
FText Message = FText : : Format ( LOCTEXT ( " Virtualization_PkgHasReferences " , " Cannot virtualize the package '{1}' as it has referenced payloads in the trailer " ) ,
FText : : FromString ( PackagePath . GetDebugName ( ) ) ) ;
2022-03-23 11:14:47 -04:00
OutErrors . Add ( Message ) ;
2022-01-18 05:41:44 -05:00
return ;
}
2021-12-01 11:13:31 -05:00
FPackageInfo PkgInfo ;
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous.
#rb PJ.Kack
#rnx
#preflight 61a4b235405273b2c3daa7c7
### Virtualization
- The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted.
- The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again.
- This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc.
- In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead.
- In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm)
- For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk.
### Current Issues
- The system does not work with the editor domain and has minor issues with text based assets
- The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional.
### PackageTrailer
- The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer
- FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use.
- When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire.
- the header file contains a breakdown of the new format.
### VirtualizedBulkData
- FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file.
- FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code.
- Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads.
- We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave.
- Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object.
- Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use.
- ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs.
- We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package.
- We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled.
- We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code.
-- Once the FPackageTrailer feature is no longer optional we can consider changing this.
### SavePackage/SavePackage2
- Removed the functions that would create the FPayloadToc
- Added new functions to help create the trailer.
- Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer.
- We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems.
### LinkerLoad
- After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package
-- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead.
- By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file.
### LinkerSave
- We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed.
- We now gather data to be appended to the package trailer via TrailerBuilder.
### DumpPayloadToc
- This command now dumps info based on the package trailer if one can be found.
#ROBOMERGE-AUTHOR: paul.chipchase
#ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905
#ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469)
[CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
2021-12-01 11:13:31 -05:00
PkgInfo . Path = MoveTemp ( PackagePath ) ;
PkgInfo . Trailer = MoveTemp ( Trailer ) ;
2022-05-17 07:54:28 -04:00
PkgInfo . LocalPayloads = PkgInfo . Trailer . GetPayloads ( EPayloadFilter : : CanVirtualize ) ;
2021-12-01 11:13:31 -05:00
if ( ! PkgInfo . LocalPayloads . IsEmpty ( ) )
2022-08-15 10:17:46 -04:00
{
TotalPayloadsToVirtualize + = PkgInfo . LocalPayloads . Num ( ) ;
Temp fix to prevent payloads that should be filtered out from virtualizing from being virtualized if they are already in persistent storage, which was causing unexpected packages to become virtualized.
#rb Per.Larsson
#rnx
#jira UE-156312
#preflight 62a99553943e7bb256fe174c
### Problem
- Firstly this fix is a hack and not intended for shipping 5.1, but the real fix is already planned work, which involves moving the filtering from the push process to being applied to each payload when the package is saved, just as the asset filtering is done at the moment.
-- Doing the proper fix will take a bit of time and testing so we need a quicker fix so that I can re-enable the virtualization process to unblock some workflows.
- All new code is wrapped in a single preprocessor define 'ENABLE_FILTERING_HACK' making the code easy to identify and remove.
- Submitting this hack will cause the priority of the real fix to be bumped in the backlog so that we can remove the hack asap.
- The problem code comes from a pass we do on the payloads to check if they are already in persistent storage or not. If the payload is then we just change it to be virtualized and don't attempt to push. In addition to fixing the filtering there is also a work item to investigate if we really want to keep this logic or if we want to remove it anyway. This might be looked into before I look into the filtering fix as an alternative way to remove the hack.
### Fix
- After we have queried the system to check which of the local payloads are already in persistent storage, BUT before we change them over to be virtualized we now run a new pass, checking which payloads should be filtered.
-- This is a bespoke code path expected only to be used here, so in this case it is not in the IVirtualizationSystem API but requires us to cast to FVirtualizationManager to gain access. This should stop anyone else from using the hack before it is removed.
- The filtering check mimics the filtering that would be run on a normal push operation.
- Once done we can check for any payload that would be filtered, and if so, set the status to not found. This prevents the payload from being changed to be virtualized and means it will be included in the normal push operation where it will be correctly filtered.
- None of this is the best way to do it, but it did seem like the easiest way to add the hack in isolation as no existing code was changed, so it should be simple to just 'delete' the hack when ready.
#ROBOMERGE-AUTHOR: paul.chipchase
#ROBOMERGE-SOURCE: CL 20667712 via CL 20667730 via CL 20667733
#ROBOMERGE-BOT: UE5 (Release-Engine-Staging -> Main) (v955-20579017)
[CL 20668639 by paul chipchase in ue5-main branch]
2022-06-15 08:56:57 -04:00
2022-08-15 10:17:46 -04:00
Packages . Emplace ( MoveTemp ( PkgInfo ) ) ;
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous.
#rb PJ.Kack
#rnx
#preflight 61a4b235405273b2c3daa7c7
### Virtualization
- The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted.
- The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again.
- This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc.
- In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead.
- In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm)
- For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk.
### Current Issues
- The system does not work with the editor domain and has minor issues with text based assets
- The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional.
### PackageTrailer
- The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer
- FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use.
- When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire.
- the header file contains a breakdown of the new format.
### VirtualizedBulkData
- FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file.
- FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code.
- Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads.
- We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave.
- Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object.
- Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use.
- ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs.
- We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package.
- We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled.
- We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code.
-- Once the FPackageTrailer feature is no longer optional we can consider changing this.
### SavePackage/SavePackage2
- Removed the functions that would create the FPayloadToc
- Added new functions to help create the trailer.
- Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer.
- We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems.
### LinkerLoad
- After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package
-- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead.
- By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file.
### LinkerSave
- We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed.
- We now gather data to be appended to the package trailer via TrailerBuilder.
### DumpPayloadToc
- This command now dumps info based on the package trailer if one can be found.
#ROBOMERGE-AUTHOR: paul.chipchase
#ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905
#ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469)
[CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
}
}
}
}
Add a rehydration command to the stand alone virtualization tool, making it easier to reverse the effects of asset virtualization. Unlike previous processes, this one does not require that we load the package and will just manipulate the data storaged in the package trailer.
#rb Sebastian.Nordgren
#rnx
#jira UE-156436
#preflight 62c287f9a3568e30664eb94f
### VA Standalone Tool
- We now plan to add much more functionality to the tool than just virtualizing and submitting changelists, so to make this easier I am moving the tool towards a design where it should be fairly easy to add new functionality.
- Added FCommand, which is a base class for adding new functionality, simple derive from FCommand and hook it up at the appropriate locations.
-- In the future it should be possible for new command types to automatically register themselves to be initiated from the command line. There should be no need to edit UnrealVirtualizationToolApp to add a new command but this will be done as an additional work item.
-- At the moment FCommand comes with a number of utility methods to call that cover some common source control commands.
-- The original functionality has not yet been moved to the command system and so the code is a little bit weird at the moment. Updating older code to the new system will be done as an additional work item.
- FProject/FPlugin have been moved to their own code files.
### Rehydrate Command
- The rehydrate command will take a number of packages, check them out of source control and then attempt to virtualize them.
- At the moment the chekout logic is fairly basic, we just check out every package supplied, we don't check if the package is virtualized or not yet. This can be improved in additional work items. Ideally by the end of command the only packages that we have checked out should also be rehydrated.
- At the moment the command can either take a path of a specific package, a path of a directory to find packages in, or a changelist containing packages that should be rehydrated.
- A cleint spec (workspace) can optionally be provided, but if not supplied we will attempt to find a client spec for which to check out the packages.
- Currently we will check out the packages to the default change list.
### Rehydrate process
- Added the rehydration process in it's own code files in the virtualization module. Like the virtualization process this is exposed in a public header file and no via the Core interface which means it is very specific to our module/implementation.
- The process expects that the caller will have checked out any required packages from source control. It will treat being unable to update a package file as an error.
- Added PackageUtils.h/.cpp and moved some of the generic code from the virtualization process code there so that it can be shared by the rehydration process.
### Misc
Moving away from the using things like FPackagePath as that requires that the correct mount points have been registered for a project and at the moment (with the flakiness of FConfig*) it seems that the best idea would be to prefer absolute file paths where possible.
[CL 20982284 by paul chipchase in ue5-main branch]
2022-07-07 06:54:33 -04:00
UE_LOG ( LogVirtualization , Display , TEXT ( " Found % " INT64_FMT " package(s), % " INT64_FMT " of which had payload trailers " ) , TotalPackagesFound , TotalPackageTrailersFound ) ;
2022-08-15 10:17:46 -04:00
// TODO: Currently not all of the filtering is done as package save time, so some of the local payloads may not get virtualized.
// When/if we move all filtering to package save we can change this log message to state that the local payloads *will* be virtualized.
UE_LOG ( LogVirtualization , Display , TEXT ( " Found % " INT64_FMT " locally stored payload(s) in %d package(s) that maybe need to be virtualized " ) , TotalPayloadsToVirtualize , Packages . Num ( ) ) ;
2021-12-01 11:13:31 -05:00
2022-02-16 07:51:39 -05:00
Progress . EnterProgressFrame ( 1.0f ) ;
2022-08-15 10:17:46 -04:00
// TODO Optimization: We might want to check for duplicate payloads and remove them at this point
2022-02-16 07:51:39 -05:00
2022-05-04 08:58:50 -04:00
// Build up the info in the payload provider and the final array of payload push requests
FWorkspaceDomainPayloadProvider PayloadProvider ;
2021-12-08 02:19:42 -05:00
TArray < Virtualization : : FPushRequest > PayloadsToSubmit ;
PayloadsToSubmit . Reserve ( TotalPayloadsToVirtualize ) ;
2021-12-01 11:13:31 -05:00
for ( FPackageInfo & PackageInfo : Packages )
{
2022-08-15 10:17:46 -04:00
check ( ! PackageInfo . LocalPayloads . IsEmpty ( ) ) ;
2021-12-01 11:13:31 -05:00
2021-12-08 02:19:42 -05:00
PackageInfo . PayloadIndex = PayloadsToSubmit . Num ( ) ;
2022-02-02 02:21:24 -05:00
for ( const FIoHash & PayloadId : PackageInfo . LocalPayloads )
2021-12-01 11:13:31 -05:00
{
2022-05-04 08:58:50 -04:00
const uint64 SizeOnDisk = PackageInfo . Trailer . FindPayloadSizeOnDisk ( PayloadId ) ;
2021-12-01 11:13:31 -05:00
2022-05-04 08:58:50 -04:00
PayloadProvider . RegisterPayload ( PayloadId , SizeOnDisk , PackageInfo . Path . GetPackageName ( ) ) ;
PayloadsToSubmit . Emplace ( PayloadId , PayloadProvider , PackageInfo . Path . GetPackageName ( ) ) ;
2021-12-08 02:19:42 -05:00
}
}
2022-08-15 10:17:46 -04:00
// TODO: We should be able to do both Cache and Persistent pushes in the same call
2021-12-08 02:19:42 -05:00
2022-08-15 10:17:46 -04:00
// Push payloads to cache storage
2022-01-28 07:55:19 -05:00
{
2022-08-15 10:17:46 -04:00
Progress . EnterProgressFrame ( 1.0f ) ;
if ( ! System . PushData ( PayloadsToSubmit , EStorageType : : Cache ) )
{
// Caching is not critical to the process so we only warn if it fails
UE_LOG ( LogVirtualization , Warning , TEXT ( " Failed to push to EStorageType::Cache storage " ) ) ;
}
int64 TotalPayloadsCached = 0 ;
for ( Virtualization : : FPushRequest & Request : PayloadsToSubmit )
{
2022-09-30 15:47:04 -04:00
TotalPayloadsCached + = Request . GetResult ( ) . WasPushed ( ) ? 1 : 0 ;
2022-08-15 10:17:46 -04:00
// TODO: This really shouldn't be required, fix when we allow both pushes to be done in the same call
// Reset the status for the persistent storage push
2022-09-30 15:47:04 -04:00
Request . ResetResult ( ) ;
2022-08-15 10:17:46 -04:00
}
UE_LOG ( LogVirtualization , Display , TEXT ( " Pushed % " INT64_FMT " payload(s) to cached storage " ) , TotalPayloadsCached ) ;
}
// Push payloads to persistent storage
{
Progress . EnterProgressFrame ( 1.0f ) ;
if ( ! System . PushData ( PayloadsToSubmit , EStorageType : : Persistent ) )
{
FText Message = LOCTEXT ( " Virtualization_PushFailure " , " Failed to push payloads " ) ;
OutErrors . Add ( Message ) ;
return ;
}
int64 TotalPayloadsVirtualized = 0 ;
for ( const Virtualization : : FPushRequest & Request : PayloadsToSubmit )
{
2022-09-30 15:47:04 -04:00
TotalPayloadsVirtualized + = Request . GetResult ( ) . WasPushed ( ) ? 1 : 0 ;
2022-08-15 10:17:46 -04:00
}
UE_LOG ( LogVirtualization , Display , TEXT ( " Pushed % " INT64_FMT " payload(s) to EStorageType::Persistent storage " ) , TotalPayloadsVirtualized ) ;
2022-01-28 07:55:19 -05:00
}
2021-12-08 02:19:42 -05:00
// Update the package info for the submitted payloads
for ( FPackageInfo & PackageInfo : Packages )
{
for ( int32 Index = 0 ; Index < PackageInfo . LocalPayloads . Num ( ) ; + + Index )
{
const Virtualization : : FPushRequest & Request = PayloadsToSubmit [ PackageInfo . PayloadIndex + Index ] ;
2022-05-04 08:58:50 -04:00
check ( Request . GetIdentifier ( ) = = PackageInfo . LocalPayloads [ Index ] ) ;
2021-12-08 02:19:42 -05:00
2022-09-30 15:47:04 -04:00
if ( Request . GetResult ( ) . IsVirtualized ( ) )
2021-12-01 11:13:31 -05:00
{
2022-05-04 08:58:50 -04:00
if ( PackageInfo . Trailer . UpdatePayloadAsVirtualized ( Request . GetIdentifier ( ) ) )
2021-12-08 02:19:42 -05:00
{
PackageInfo . bWasTrailerUpdated = true ;
}
else
2022-02-24 19:58:44 -05:00
{
2021-12-08 02:19:42 -05:00
FText Message = FText : : Format ( LOCTEXT ( " Virtualization_UpdateStatusFailed " , " Unable to update the status for the payload '{0}' in the package '{1}' " ) ,
2022-05-04 08:58:50 -04:00
FText : : FromString ( LexToString ( Request . GetIdentifier ( ) ) ) ,
2021-12-08 02:19:42 -05:00
FText : : FromString ( PackageInfo . Path . GetDebugName ( ) ) ) ;
2022-03-23 11:14:47 -04:00
OutErrors . Add ( Message ) ;
2021-12-08 02:19:42 -05:00
return ;
}
2021-12-01 11:13:31 -05:00
}
}
}
2022-02-16 07:51:39 -05:00
Progress . EnterProgressFrame ( 1.0f ) ;
2021-12-01 11:13:31 -05:00
TArray < TPair < FPackagePath , FString > > PackagesToReplace ;
// Any package with an updated trailer needs to be copied and an updated trailer appended
for ( FPackageInfo & PackageInfo : Packages )
{
if ( ! PackageInfo . bWasTrailerUpdated )
{
continue ;
}
const FPackagePath & PackagePath = PackageInfo . Path ; // No need to validate path, we checked this earlier
Add a rehydration command to the stand alone virtualization tool, making it easier to reverse the effects of asset virtualization. Unlike previous processes, this one does not require that we load the package and will just manipulate the data storaged in the package trailer.
#rb Sebastian.Nordgren
#rnx
#jira UE-156436
#preflight 62c287f9a3568e30664eb94f
### VA Standalone Tool
- We now plan to add much more functionality to the tool than just virtualizing and submitting changelists, so to make this easier I am moving the tool towards a design where it should be fairly easy to add new functionality.
- Added FCommand, which is a base class for adding new functionality, simple derive from FCommand and hook it up at the appropriate locations.
-- In the future it should be possible for new command types to automatically register themselves to be initiated from the command line. There should be no need to edit UnrealVirtualizationToolApp to add a new command but this will be done as an additional work item.
-- At the moment FCommand comes with a number of utility methods to call that cover some common source control commands.
-- The original functionality has not yet been moved to the command system and so the code is a little bit weird at the moment. Updating older code to the new system will be done as an additional work item.
- FProject/FPlugin have been moved to their own code files.
### Rehydrate Command
- The rehydrate command will take a number of packages, check them out of source control and then attempt to virtualize them.
- At the moment the chekout logic is fairly basic, we just check out every package supplied, we don't check if the package is virtualized or not yet. This can be improved in additional work items. Ideally by the end of command the only packages that we have checked out should also be rehydrated.
- At the moment the command can either take a path of a specific package, a path of a directory to find packages in, or a changelist containing packages that should be rehydrated.
- A cleint spec (workspace) can optionally be provided, but if not supplied we will attempt to find a client spec for which to check out the packages.
- Currently we will check out the packages to the default change list.
### Rehydrate process
- Added the rehydration process in it's own code files in the virtualization module. Like the virtualization process this is exposed in a public header file and no via the Core interface which means it is very specific to our module/implementation.
- The process expects that the caller will have checked out any required packages from source control. It will treat being unable to update a package file as an error.
- Added PackageUtils.h/.cpp and moved some of the generic code from the virtualization process code there so that it can be shared by the rehydration process.
### Misc
Moving away from the using things like FPackagePath as that requires that the correct mount points have been registered for a project and at the moment (with the flakiness of FConfig*) it seems that the best idea would be to prefer absolute file paths where possible.
[CL 20982284 by paul chipchase in ue5-main branch]
2022-07-07 06:54:33 -04:00
FString NewPackagePath = DuplicatePackageWithUpdatedTrailer ( PackagePath . GetLocalFullPath ( ) , PackageInfo . Trailer , OutErrors ) ;
2021-12-01 11:13:31 -05:00
Add a rehydration command to the stand alone virtualization tool, making it easier to reverse the effects of asset virtualization. Unlike previous processes, this one does not require that we load the package and will just manipulate the data storaged in the package trailer.
#rb Sebastian.Nordgren
#rnx
#jira UE-156436
#preflight 62c287f9a3568e30664eb94f
### VA Standalone Tool
- We now plan to add much more functionality to the tool than just virtualizing and submitting changelists, so to make this easier I am moving the tool towards a design where it should be fairly easy to add new functionality.
- Added FCommand, which is a base class for adding new functionality, simple derive from FCommand and hook it up at the appropriate locations.
-- In the future it should be possible for new command types to automatically register themselves to be initiated from the command line. There should be no need to edit UnrealVirtualizationToolApp to add a new command but this will be done as an additional work item.
-- At the moment FCommand comes with a number of utility methods to call that cover some common source control commands.
-- The original functionality has not yet been moved to the command system and so the code is a little bit weird at the moment. Updating older code to the new system will be done as an additional work item.
- FProject/FPlugin have been moved to their own code files.
### Rehydrate Command
- The rehydrate command will take a number of packages, check them out of source control and then attempt to virtualize them.
- At the moment the chekout logic is fairly basic, we just check out every package supplied, we don't check if the package is virtualized or not yet. This can be improved in additional work items. Ideally by the end of command the only packages that we have checked out should also be rehydrated.
- At the moment the command can either take a path of a specific package, a path of a directory to find packages in, or a changelist containing packages that should be rehydrated.
- A cleint spec (workspace) can optionally be provided, but if not supplied we will attempt to find a client spec for which to check out the packages.
- Currently we will check out the packages to the default change list.
### Rehydrate process
- Added the rehydration process in it's own code files in the virtualization module. Like the virtualization process this is exposed in a public header file and no via the Core interface which means it is very specific to our module/implementation.
- The process expects that the caller will have checked out any required packages from source control. It will treat being unable to update a package file as an error.
- Added PackageUtils.h/.cpp and moved some of the generic code from the virtualization process code there so that it can be shared by the rehydration process.
### Misc
Moving away from the using things like FPackagePath as that requires that the correct mount points have been registered for a project and at the moment (with the flakiness of FConfig*) it seems that the best idea would be to prefer absolute file paths where possible.
[CL 20982284 by paul chipchase in ue5-main branch]
2022-07-07 06:54:33 -04:00
if ( ! NewPackagePath . IsEmpty ( ) )
2021-12-01 11:13:31 -05:00
{
Add a rehydration command to the stand alone virtualization tool, making it easier to reverse the effects of asset virtualization. Unlike previous processes, this one does not require that we load the package and will just manipulate the data storaged in the package trailer.
#rb Sebastian.Nordgren
#rnx
#jira UE-156436
#preflight 62c287f9a3568e30664eb94f
### VA Standalone Tool
- We now plan to add much more functionality to the tool than just virtualizing and submitting changelists, so to make this easier I am moving the tool towards a design where it should be fairly easy to add new functionality.
- Added FCommand, which is a base class for adding new functionality, simple derive from FCommand and hook it up at the appropriate locations.
-- In the future it should be possible for new command types to automatically register themselves to be initiated from the command line. There should be no need to edit UnrealVirtualizationToolApp to add a new command but this will be done as an additional work item.
-- At the moment FCommand comes with a number of utility methods to call that cover some common source control commands.
-- The original functionality has not yet been moved to the command system and so the code is a little bit weird at the moment. Updating older code to the new system will be done as an additional work item.
- FProject/FPlugin have been moved to their own code files.
### Rehydrate Command
- The rehydrate command will take a number of packages, check them out of source control and then attempt to virtualize them.
- At the moment the chekout logic is fairly basic, we just check out every package supplied, we don't check if the package is virtualized or not yet. This can be improved in additional work items. Ideally by the end of command the only packages that we have checked out should also be rehydrated.
- At the moment the command can either take a path of a specific package, a path of a directory to find packages in, or a changelist containing packages that should be rehydrated.
- A cleint spec (workspace) can optionally be provided, but if not supplied we will attempt to find a client spec for which to check out the packages.
- Currently we will check out the packages to the default change list.
### Rehydrate process
- Added the rehydration process in it's own code files in the virtualization module. Like the virtualization process this is exposed in a public header file and no via the Core interface which means it is very specific to our module/implementation.
- The process expects that the caller will have checked out any required packages from source control. It will treat being unable to update a package file as an error.
- Added PackageUtils.h/.cpp and moved some of the generic code from the virtualization process code there so that it can be shared by the rehydration process.
### Misc
Moving away from the using things like FPackagePath as that requires that the correct mount points have been registered for a project and at the moment (with the flakiness of FConfig*) it seems that the best idea would be to prefer absolute file paths where possible.
[CL 20982284 by paul chipchase in ue5-main branch]
2022-07-07 06:54:33 -04:00
// Now that we have successfully created a new version of the package with an updated trailer
// we need to mark that it should replace the original package.
PackagesToReplace . Emplace ( PackagePath , MoveTemp ( NewPackagePath ) ) ;
}
else
2021-12-01 11:13:31 -05:00
{
return ;
}
}
2021-12-08 02:19:42 -05:00
UE_LOG ( LogVirtualization , Display , TEXT ( " %d package(s) had their trailer container modified and need to be updated " ) , PackagesToReplace . Num ( ) ) ;
2021-12-01 11:13:31 -05:00
2022-03-23 11:14:47 -04:00
if ( NumErrors = = OutErrors . Num ( ) )
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous.
#rb PJ.Kack
#rnx
#preflight 61a4b235405273b2c3daa7c7
### Virtualization
- The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted.
- The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again.
- This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc.
- In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead.
- In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm)
- For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk.
### Current Issues
- The system does not work with the editor domain and has minor issues with text based assets
- The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional.
### PackageTrailer
- The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer
- FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use.
- When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire.
- the header file contains a breakdown of the new format.
### VirtualizedBulkData
- FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file.
- FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code.
- Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads.
- We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave.
- Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object.
- Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use.
- ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs.
- We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package.
- We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled.
- We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code.
-- Once the FPackageTrailer feature is no longer optional we can consider changing this.
### SavePackage/SavePackage2
- Removed the functions that would create the FPayloadToc
- Added new functions to help create the trailer.
- Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer.
- We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems.
### LinkerLoad
- After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package
-- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead.
- By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file.
### LinkerSave
- We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed.
- We now gather data to be appended to the package trailer via TrailerBuilder.
### DumpPayloadToc
- This command now dumps info based on the package trailer if one can be found.
#ROBOMERGE-AUTHOR: paul.chipchase
#ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905
#ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469)
[CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
{
// TODO: Consider using the SavePackage model (move the original, then replace, so we can restore all of the original packages if needed)
// having said that, once a package is in PackagesToReplace it should still be safe to submit so maybe we don't need this level of protection?
2021-11-30 02:37:48 -05:00
// We need to reset the loader of any package that we want to re-save over
2022-05-16 06:10:12 -04:00
for ( int32 Index = 0 ; Index < PackagesToReplace . Num ( ) ; + + Index )
2021-11-30 02:37:48 -05:00
{
2022-05-16 06:10:12 -04:00
const TPair < FPackagePath , FString > & Pair = PackagesToReplace [ Index ] ;
UPackage * Package = FindObjectFast < UPackage > ( nullptr , Pair . Key . GetPackageFName ( ) ) ;
2021-11-30 02:37:48 -05:00
if ( Package ! = nullptr )
{
2022-05-16 06:10:12 -04:00
UE_LOG ( LogVirtualization , Verbose , TEXT ( " Detaching '%s' from disk so that it can be virtualized " ) , * Pair . Key . GetDebugName ( ) ) ;
ResetLoadersForSave ( Package , * Pair . Key . GetLocalFullPath ( ) ) ;
}
if ( ! CanWriteToFile ( Pair . Key . GetLocalFullPath ( ) ) )
{
// Technically the package could have local payloads that won't be virtualized due to filtering or min payload sizes and so the
// following warning is misleading. This will be solved if we move that evaluation to the point of saving a package.
// If not then we probably need to extend QueryPayloadStatuses to test filtering etc as well, then check for potential package
// modification after that.
// Long term, the stand alone tool should be able to request the UnrealEditor relinquish the lock on the package file so this becomes
// less of a problem.
FText Message = FText : : Format ( LOCTEXT ( " Virtualization_PkgLocked " , " The package file '{0}' has local payloads but is locked for modification and cannot be virtualized, this package will be skipped! " ) ,
FText : : FromString ( Pair . Key . GetDebugName ( ) ) ) ;
UE_LOG ( LogVirtualization , Warning , TEXT ( " %s " ) , * Message . ToString ( ) ) ;
PackagesToReplace . RemoveAt ( Index - - ) ;
2021-11-30 02:37:48 -05:00
}
}
// Since we had no errors we can now replace all of the packages that were virtualized data with the virtualized replacement file.
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous.
#rb PJ.Kack
#rnx
#preflight 61a4b235405273b2c3daa7c7
### Virtualization
- The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted.
- The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again.
- This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc.
- In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead.
- In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm)
- For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk.
### Current Issues
- The system does not work with the editor domain and has minor issues with text based assets
- The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional.
### PackageTrailer
- The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer
- FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use.
- When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire.
- the header file contains a breakdown of the new format.
### VirtualizedBulkData
- FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file.
- FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code.
- Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads.
- We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave.
- Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object.
- Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use.
- ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs.
- We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package.
- We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled.
- We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code.
-- Once the FPackageTrailer feature is no longer optional we can consider changing this.
### SavePackage/SavePackage2
- Removed the functions that would create the FPayloadToc
- Added new functions to help create the trailer.
- Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer.
- We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems.
### LinkerLoad
- After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package
-- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead.
- By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file.
### LinkerSave
- We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed.
- We now gather data to be appended to the package trailer via TrailerBuilder.
### DumpPayloadToc
- This command now dumps info based on the package trailer if one can be found.
#ROBOMERGE-AUTHOR: paul.chipchase
#ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905
#ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469)
[CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
for ( const TPair < FPackagePath , FString > & Iterator : PackagesToReplace )
{
const FString OriginalPackagePath = Iterator . Key . GetLocalFullPath ( ) ;
const FString & NewPackagePath = Iterator . Value ;
if ( ! IFileManager : : Get ( ) . Move ( * OriginalPackagePath , * NewPackagePath ) )
{
FText Message = FText : : Format ( LOCTEXT ( " Virtualization_MoveFailed " , " Unable to replace the package '{0}' with the virtualized version " ) ,
FText : : FromString ( Iterator . Key . GetDebugName ( ) ) ) ;
2022-03-23 11:14:47 -04:00
OutErrors . Add ( Message ) ;
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous.
#rb PJ.Kack
#rnx
#preflight 61a4b235405273b2c3daa7c7
### Virtualization
- The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted.
- The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again.
- This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc.
- In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead.
- In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm)
- For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk.
### Current Issues
- The system does not work with the editor domain and has minor issues with text based assets
- The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional.
### PackageTrailer
- The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer
- FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use.
- When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire.
- the header file contains a breakdown of the new format.
### VirtualizedBulkData
- FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file.
- FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code.
- Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads.
- We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave.
- Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object.
- Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use.
- ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs.
- We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package.
- We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled.
- We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code.
-- Once the FPackageTrailer feature is no longer optional we can consider changing this.
### SavePackage/SavePackage2
- Removed the functions that would create the FPayloadToc
- Added new functions to help create the trailer.
- Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer.
- We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems.
### LinkerLoad
- After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package
-- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead.
- By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file.
### LinkerSave
- We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed.
- We now gather data to be appended to the package trailer via TrailerBuilder.
### DumpPayloadToc
- This command now dumps info based on the package trailer if one can be found.
#ROBOMERGE-AUTHOR: paul.chipchase
#ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905
#ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469)
[CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
continue ;
}
}
}
2021-12-08 02:19:42 -05:00
const double TimeInSeconds = FPlatformTime : : Seconds ( ) - StartTime ;
2021-11-30 02:37:48 -05:00
UE_LOG ( LogVirtualization , Verbose , TEXT ( " Virtualization pre submit check took %.3f(s) " ) , TimeInSeconds ) ;
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous.
#rb PJ.Kack
#rnx
#preflight 61a4b235405273b2c3daa7c7
### Virtualization
- The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted.
- The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again.
- This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc.
- In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead.
- In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm)
- For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk.
### Current Issues
- The system does not work with the editor domain and has minor issues with text based assets
- The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional.
### PackageTrailer
- The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer
- FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use.
- When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire.
- the header file contains a breakdown of the new format.
### VirtualizedBulkData
- FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file.
- FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code.
- Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads.
- We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave.
- Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object.
- Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use.
- ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs.
- We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package.
- We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled.
- We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code.
-- Once the FPackageTrailer feature is no longer optional we can consider changing this.
### SavePackage/SavePackage2
- Removed the functions that would create the FPayloadToc
- Added new functions to help create the trailer.
- Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer.
- We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems.
### LinkerLoad
- After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package
-- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead.
- By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file.
### LinkerSave
- We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed.
- We now gather data to be appended to the package trailer via TrailerBuilder.
### DumpPayloadToc
- This command now dumps info based on the package trailer if one can be found.
#ROBOMERGE-AUTHOR: paul.chipchase
#ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905
#ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469)
[CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
}
2021-11-30 02:37:48 -05:00
2021-10-25 20:05:28 -04:00
} // namespace UE::Virtualization
# undef LOCTEXT_NAMESPACE
Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous.
#rb PJ.Kack
#rnx
#preflight 61a4b235405273b2c3daa7c7
### Virtualization
- The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted.
- The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again.
- This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc.
- In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead.
- In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm)
- For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk.
### Current Issues
- The system does not work with the editor domain and has minor issues with text based assets
- The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional.
### PackageTrailer
- The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer
- FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use.
- When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire.
- the header file contains a breakdown of the new format.
### VirtualizedBulkData
- FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file.
- FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code.
- Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads.
- We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave.
- Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object.
- Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use.
- ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs.
- We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package.
- We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled.
- We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code.
-- Once the FPackageTrailer feature is no longer optional we can consider changing this.
### SavePackage/SavePackage2
- Removed the functions that would create the FPayloadToc
- Added new functions to help create the trailer.
- Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer.
- We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems.
### LinkerLoad
- After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package
-- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead.
- By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file.
### LinkerSave
- We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed.
- We now gather data to be appended to the package trailer via TrailerBuilder.
### DumpPayloadToc
- This command now dumps info based on the package trailer if one can be found.
#ROBOMERGE-AUTHOR: paul.chipchase
#ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905
#ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469)
[CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00