Commit Graph

25 Commits

Author SHA1 Message Date
paul chipchase
ccaea76256 Rename FPayloadStatus to EPayloadStatus to comply with the coding standards.
#rb Per.Larsson
#rnx
#preflight 62a9bff813004691f9830b8d

- I think when I was first adding it, the type was originally a struct but was changed to an enum before it was submitted, however I failed to update the name.

#ROBOMERGE-AUTHOR: paul.chipchase
#ROBOMERGE-SOURCE: CL 20668490 via CL 20668521 via CL 20668532
#ROBOMERGE-BOT: UE5 (Release-Engine-Staging -> Main) (v955-20579017)

[CL 20669764 by paul chipchase in ue5-main branch]
2022-06-15 10:37:52 -04:00
paul chipchase
c05158e633 Temp fix to prevent payloads that should be filtered out from virtualizing from being virtualized if they are already in persistent storage, which was causing unexpected packages to become virtualized.
#rb Per.Larsson
#rnx
#jira UE-156312
#preflight 62a99553943e7bb256fe174c

### Problem
- Firstly this fix is a hack and not intended for shipping 5.1, but the real fix is already planned work, which involves moving the filtering from the push process to being applied to each payload when the package is saved, just as the asset filtering is done at the moment.
-- Doing the proper fix will take a bit of time and testing so we need a quicker fix so that I can re-enable the virtualization process to unblock some workflows.
- All new code is wrapped in a single preprocessor define 'ENABLE_FILTERING_HACK' making the code easy to identify and remove.
- Submitting this hack will cause the priority of the real fix to be bumped in the backlog so that we can remove the hack asap.
- The problem code comes from a pass we do on the payloads to check if they are already in persistent storage or not. If the payload is then we just change it to be virtualized and don't attempt to push. In addition to fixing the filtering there is also a work item to investigate if we really want to keep this logic or if we want to remove it anyway. This might be looked into before I look into the filtering fix as an alternative way to remove the hack.

### Fix
- After we have queried the system to check which of the local payloads are already in persistent storage, BUT before we change them over to be virtualized we now run a new pass, checking which payloads should be filtered.
-- This is a bespoke code path expected only to be used here, so in this case it is not in the IVirtualizationSystem API but requires us to cast to FVirtualizationManager to gain access. This should stop anyone else from using the hack before it is removed.
- The filtering check mimics the filtering that would be run on a normal push operation.
- Once done we can check for any payload that would be filtered, and if so, set the status to not found. This prevents the payload from being changed to be virtualized and means it will be included in the normal push operation where it will be correctly filtered.
- None of this is the best way to do it, but it did seem like the easiest way to add the hack in isolation as no existing code was changed, so it should be simple to just 'delete' the hack when ready.

#ROBOMERGE-AUTHOR: paul.chipchase
#ROBOMERGE-SOURCE: CL 20667712 via CL 20667730 via CL 20667733
#ROBOMERGE-BOT: UE5 (Release-Engine-Staging -> Main) (v955-20579017)

[CL 20668639 by paul chipchase in ue5-main branch]
2022-06-15 08:56:57 -04:00
paul chipchase
f85fad88ff It is now possible to disable virtualization for all payloads owned by a specific asset type by adding the name of the asset to [Core.ContentVirtualization]DisabledAsset string array in the engine ini file.
#rb Per.Larsson
#rnx
#jira UE-151377
#preflight 628364050039ea57a52d6989

### Virtualization
- [Core.ContentVirtualization] in the engine ini file now supports an array called 'DisabledAsset' which can be used to name asset types that should not virtualize their payloads.
-- By default (in BaseEngine.ini) we have disabled the StaticMesh asset as we know it will crash if a payload is missing and the SoundWave asset as it still is pending testing.
- This new way to disable virtualization is data driven. The older hard coded method has not been removed but will likely be reworked in a future submit.
- Now when an editor bulkdata is adding it's payload to the package trailer builder during package save it will poll the virtualization system with a call to the new method ::IsDisabledForObject by passing in it's owner.
-- If the owner is valid and was present in the 'DisabledAsset' array then the method will return true and the EPayloadFlags::DisableVirtualization flag will be applied.

### Package Trailer
- The pre-existing functionality of enum EPayloadFilter has been moved to a new enum EPayloadStorageType as will only filter payloads based on their storage type.
- EPayloadFilter has been modified to filter payloads based on functionality although at the moment the only thing it can filter for is to return payloads that can be virtualized, it is left for future expansion.
- EPayloadFlags has been reduced to a uint16 with the remaining 2bytes being turned into a new member EPayloadFilterReason.
- This new member allows us to record the exact reason why a payload is excluded from virtualization. If it is zero then the payload can virtualize, otherwise it will contain one or more reasons as to why it is being excluded. For this reason the enum is a bitfield.
- Added overloads of ::GetPayloads and ::GetNumPayloads that take EPayloadFilter rather than a EPayloadStorageType
- Added wrappers around all AccessMode types for FLookupTableEntry.
- FPackageTrailerBuilder has been extended to take a EPayloadFilterReason so that the caller can already provide a reason why the payload cannot be virtualized.
-- As a future peace of work this will probably be changed and we will ask the caller to pass in the owner UObject pointer instead and then we will process the filtering when building the package trailer to a) keep all of the filtering code in one place b) keep the filtering consistent

### PackageSubmissionChecks
- The virtualization process in  will now request the payloads that can be virtualized from the package trailer, which respects the new payload flag, rather than requesting all locally stored payloads.

### UObjects
- There is no need for the SoundWave or MeshDescription classes to opt out of virtualization on construction. This will be done when the package is saved and is now data driven rather than being hardcoded.

### DumpPackagePayloadInfo
- The command has been updated to also display the filter reasons applied to each payload

[CL 20240971 by paul chipchase in ue5-main branch]
2022-05-17 07:54:28 -04:00
paul chipchase
2f9d959940 Fix a bug where the virtualization process was failing to virtualize loaded packages when submitted from the editor
#rb trivial
#rnx
#preflight 62820fb0046b81bf93921c6b

- When adding the check to see if a package file could be written to, I tried to do this as early as possible to avoid virtualizing payloads for packages that can't be modified.
- This didn't take into account submits via the editor, when the package is loaded which means the editor will have the package file locked for edit.
- For now I have gone with the simple fix, we check if we can write to the package once we have called ResetLoader on the package. This means that we will virtualize the payloads then potentially skip removing them from the package file if another process has a file lock.
- This approach will minimize impact to the users for now but we should revisit this in the future to try and reduce the error scope further.

[CL 20221653 by paul chipchase in ue5-main branch]
2022-05-16 06:10:12 -04:00
paul chipchase
d3a5842177 We now warn users when trying to virtualize packages that are locked and cannot be modified rather than error
#rb trivial
#jira UE-143675
#rnx
#preflight 6274d9b7732d07cf93f55621

### Problem
- When using the stand alone virtualization tool, to virtualize packages directly from p4v it is possible that the editor will have locked the package files. This means we cannot modify it and remove the payload making the virtualization of the package file impossible.
- At some point in the future we will add the ability for the tool to communicate with the editor to ask that the lock be relinquished OR there is the possibility that we will be moving away from the editor permantly locking package files.
- As this problem will be eventually be solved we'd rather this problem not present to the end users as an error as this will cause them to view the tool as annoying and fiddly to use.
- So for now we will treat this problem as a warning and allow the end user to continue and submit the package non-virtualized.

### Fix
- At the moment once we detect a package has local payloads that might need virtualization we check to see if the file could be written to.
- This might raise some false positives if the local payloads are filtered out or otherwise not suppose to be virtualized.
- At the moment this sort of filtering is done when pushing the payloads meaning we could end up pushing payloads then being unable to modify the package.
- A future work item will be added to clean this up at some point in the future (depending on when we will start work to remove the problem entirely)

[CL 20073356 by paul chipchase in ue5-main branch]
2022-05-06 04:28:01 -04:00
paul chipchase
2e03df01a5 When virtualizing payloads we now load the payloads from disk on demand rather than all at once. This allows us to avoid memory spikes when virtualizing large amounts of data.
#rb Per.Larsson
#rnx
#jira UE-151000
#preflight 627271c83b2ed6f85363f53a

- In the old code when submitting packages for virtualization we would load all local payloads into memory then pass them to the virtualization system to be pushed to persistent storage. This works fine for an average users submits but when resaving large projects we can have submits with thousands of packages containing gigabytes of payloads.
- FPushRequest has been changed to accept either a payload directly (in the form of FCompressedBuffer) or accept a reference to a IPayloadProvider which will return the payload when asked.
- Most of the existing backends make no use of the batched pushing feature and only push one payload at a time. The default implementation of this will be to ask each request one at a time for the payload and then pass it to the backend for a single push. If the IPayloadProvider is being used we will therefor only ever have one payload loaded at a time avoiding memory spikes.
- The source control backend currently does implement the batched pushing feature. This backend writes each payload to disk before submitting them to perforce, so all we have to do is make sure that request one payload, write it to disk, then discard the payload to avoid memory spikes.
- This change required that FPushRequest make it's members private and provide accessors to them, as the caller shouldn't need to care if the payload is loaded or comes via the provider.
- NOTE that this implementation is less efficient if we have packages containing many different payloads as we will end up opening the package, parsing the trailer then load a payload many times over, where as the original code would load all payloads from the file in one go. In practice, given the overhead of the source control backend this didn't make a huge difference and is probably not worth the effort to optimize at this point.

[CL 20040609 by paul chipchase in ue5-main branch]
2022-05-04 08:58:50 -04:00
paul chipchase
b2be464942 Remove the check for FPackageTrailer::IsEnabled when virtualizing payloads. We don't care if the system is currently disabled, only what the package file itself contains.
#rb trivial
#rnx
#preflight 623c5912c3399da9533ffd30

- The package trailer will be on by default soon and the option to remove it disabled, so this check is a bit pointless anyway.

[CL 19493442 by paul chipchase in ue5-main branch]
2022-03-24 08:17:01 -04:00
paul chipchase
97fbde0880 Expose the package submission checks we run when submitting changelists from within the editor so that other systems can specifically invoke it in addition to running it when ever a source control module callback is invoked.
#rb PJ.Kack
#rnx
#preflight 623ae40e7b69b01ec15da75c
#robomerge FNNC

- We need to expose the code that checks a set of packages for non virtualized payloads and virtualizes them so that the stand alone submission tool can access it.
- Currently I am adding this to the IVirtualizationSystem interface to avoid adding additional interface, but we might want to consider moving this some day as it is not a great fit with the rest of the interface.

[CL 19479757 by paul chipchase in ue5-main branch]
2022-03-23 11:14:47 -04:00
paul chipchase
519cb8476c Fix the MinPayloadLength config option for asset virtualization.
#rb PJ.Kack
#jira UE-142748
#rnx
#preflight 620e06d53e74a28b341e72b6
#lockdown aurel.cordonnier

- Turns out that the option did work fine, it just needed to be tested properly, so removed the assert that was guarding against mixed package trailers when submitting virtualized assets.
- Improved verbose logging when a payload is filtered based on payload size.

#ROBOMERGE-AUTHOR: paul.chipchase
#ROBOMERGE-SOURCE: CL 19073541 in //UE5/Release-5.0/... via CL 19090494
#ROBOMERGE-BOT: UE5 (Release-Engine-Staging -> Main) (v921-19075845)

[CL 19134901 by paul chipchase in ue5-main branch]
2022-02-24 19:58:44 -05:00
paul chipchase
74ada40df7 Add a progress bar when packages are being virtualized to give feedback to the user.
#rb Sebastian.Nordgren
#jira UE-133472
#rnx
#preflight 620cef8f492761fc5cb53b05

- Because all of the work is blocking on the main thread adding text to ::EnterProgressFrame to better track progress can be misleading so for now we only display "Virtualizing Assets..."

[CL 19012877 by paul chipchase in ue5-main branch]
2022-02-16 07:51:39 -05:00
paul chipchase
20aae9774f Reworked IVirtualizationSystem::DoPayloadsExist to be clearer about how it works.
#rb Per.Larsson
#rnx
#preflight 61fabeb8ad2ae6c3b763a2b9

- Renamed to ::QueryPayloadStatuses since the output data is in the form of an array of FPayloadStatus
- The method now returns EQueryResult rather than bool, which has 'Success' as value 0, all other values indicate an error.
-- This can be expanded in the future if we start to provide more info from backends as to what actually failed.
- FPayloadStatus has been cleaned up so that 'Partial' is not 'FoundPartial' to give a better indication that the payload was found in at least one storage backend, but not all of them.

[CL 18861036 by paul chipchase in ue5-main branch]
2022-02-04 02:41:37 -05:00
paul chipchase
05d81ee7f3 EditorBulkData and the virtualization system now use FIoHash directly and FPayloadId is removed
#rb Per.Larsson, Devin.Doucette
#jira UE-133497
#rnx
#preflight 61f835491c5ac5523462810a

- A lot of files have been touched but most of this is changing FPayload::IsValid to !FIoHash::IsZero, the main changes are in EditorBulkData/EditorBulkDataTests
- The only difference between FPayloadId and FIoHash is that the former considered the hash of an invalid or empty buffer to be 'invalid' too where as FIoHash would return a valid hash
-- To keep behaviour the same, we only hash payloads in EditorBulkData using a utility method ::HashBuffer which will not hash empty or invalid payloads and return a default FIoHash instead.
-- Unit tests have been extended to prove that the behaviour has not changed.

#ROBOMERGE-AUTHOR: paul.chipchase
#ROBOMERGE-SOURCE: CL 18806362 in //UE5/Release-5.0/... via CL 18808527 via CL 18821790
#ROBOMERGE-BOT: UE5 (Release-Engine-Test -> Main) (v908-18788545)

[CL 18822154 by paul chipchase in ue5-main branch]
2022-02-02 02:21:24 -05:00
paul chipchase
ba4450750f Log how many payloads were actually uploaded to persistent backend storage when virtualizing packages.
#rb trivial
#rnx
#preflight 61f3e1f774510448a67a5e76

#ROBOMERGE-AUTHOR: paul.chipchase
#ROBOMERGE-SOURCE: CL 18770126 in //UE5/Release-5.0/... via CL 18770139 via CL 18770163
#ROBOMERGE-BOT: UE5 (Release-Engine-Test -> Main) (v903-18687472)

[CL 18770165 by paul chipchase in ue5-main branch]
2022-01-28 07:55:19 -05:00
paul chipchase
eef8bafac9 Fix the virtualization config option 'EnablePushToBackend' to no longer produce submission errors when disabled.
#rb PJ.Kack
#rnx
#jira UE-140366
#preflight 61f2a6f752396dbfeeede332

- Add a way to poll the virtualization system to see if pushing to a backend storage type is possible or not.
- If we cannot push to persistent backend storage when submitting packages to source control we just log in verbose mode and return rather than giving an error. Submitting a non-virtualized package is no longer the problem it once was.

#ROBOMERGE-AUTHOR: paul.chipchase
#ROBOMERGE-SOURCE: CL 18769119 in //UE5/Release-5.0/... via CL 18769125 via CL 18769151
#ROBOMERGE-BOT: UE5 (Release-Engine-Test -> Main) (v903-18687472)

[CL 18769154 by paul chipchase in ue5-main branch]
2022-01-28 03:48:55 -05:00
paul chipchase
0d4d469bbd Add LexToString to FPayloadId and change existing use of ToString to use it.
#rb Per.Larsson
#rnx
#jira UE-133497
#preflight 61f259a3706ac5ea0cf2ee3e

#ROBOMERGE-AUTHOR: paul.chipchase
#ROBOMERGE-SOURCE: CL 18751491 in //UE5/Release-5.0/... via CL 18751500 via CL 18751554
#ROBOMERGE-BOT: UE5 (Release-Engine-Test -> Main) (v903-18687472)

[CL 18751559 by paul chipchase in ue5-main branch]
2022-01-27 04:20:46 -05:00
paul chipchase
77c12cf814 Virtualized bulkdata is now know as EditorBulkData
#rb PJ.Kack
#rnx
#jira UE-139708
#preflight 61eaaedfc92021e535b6d1e0

- The goal is to show a clear separation between the bulkdata object and the virtualization system.
- NOTE that all of these classes were added during 5.0 development so we are just renaming and moving them, no deprecation paths.
- Renamed FVirtualizedUntypedBulkData to FEditorBulkData
-- Also removed the derived templated types, they were never actually used
- Renamed FVirtualizedBulkDataReader to FEditorBulkDataReader
- Renamed FVirtualizedBulkDataWriter to FEditorBulkDataWriter
- Moved the renamed classes to the UE::Serialization namespace from UE::Virtualization
- Renamed the files of the renamed classes where required.
- Replaced use of LogVirtualization with LogSerialization.
- Renamed defines prefixed with VBD_* to UE_ and make sure they are undefed by the end of the cpp
- Edited unit tests to show up under "System.CoreUObject.Serialization.EditorBulkData.*"

#ROBOMERGE-AUTHOR: paul.chipchase
#ROBOMERGE-SOURCE: CL 18688778 in //UE5/Release-5.0/... via CL 18688787 via CL 18688810
#ROBOMERGE-BOT: UE5 (Release-Engine-Test -> Main) (v903-18687472)

[CL 18688823 by paul chipchase in ue5-main branch]
2022-01-21 09:18:53 -05:00
paul chipchase
73afc77555 Clean up virtualized bulkdata serialization and allow the package trailer to contain payloads with different access types.
#rb PJ.Kack
#jira UE-139348
#rnx
#preflight 61e6742a3778a195dea02199

### PackageTrailer
- We no longer store the access mode for all payloads in the trailer. Instead each payload entry in the lookup table will store the access mode for that specific payload. This allow us to store many payloads of different access types in the same trailer.
- Note that this format change is currently versioned (going from 0 to 1) and is compatible with older version.
-- To allow this, the << operator has been removed from FLookupTableEntry and replaced with a ::Serialize method that takes the version of the package trailer header as an input so that it can run it's own versioning checks.
- When loading trailers with version 0 we apply the access mode to all non virtualized payload entries in the lookup table to reproduce the single access mode for the entire trailer.
- A FPackageTrailerBuilder can now be created directly from an existing FPackageTrailer that will reference all of the trailers local payloads via the new static method ::CreateReferenceToTrailer
- Removed FPackageTrailer::TrySave as it is no longer used when creating a trailer for the editor domain.
- Renamed FPackageTrailer::LoadPayload to ::LoadLocalPayload to make it clear the type of payload that can be loaded.
-- Someday the package trailer should be able to load payloads of any type.

### VirtualizedBulkData Serialization Changes
- We now record if the bulkdata's payload is stored in a package trailer or not via the 'StoredInPackageTrailer' flag, which makes it easier to identify critical errors in serialization (the trailer is missing for example)
-- Storing this info as a flag is now possible as the design for the trailer has changed and a fully virtualized package will still contain a trailer.
- We now only write out OffsetInFile if the payload is not stored in a FPackageTrailer, if the bulkdata object does use a package trailer via the new flag then we take the offset from the trailer instead.
- We also no longer write out a path when saving a payload as referenced to the editor domain, it is assumed that the trailer will know where the workspace domain data is (namely the owning package path)
-- Due to the 'StoredInPackageTrailer' only being added now, this means that packages already saved with a package trailer will work with this code logic as those packages will have serialized the offsets to disk, but won't have the new flag and so will serialize the offsets when loading until re-saved. (worth noting that nobody should encounter this as the package trailer system is disabled by default)
- The loading code is a lot easier to read now and a lot of the branches have been removed although we still have a few extra checks for older formats that might have been saved during development.
- When loading the only real consideration we have is if the payload is in the trailer or not.
- Future Work:
-- If we are not writing to the trailer then we use the legacy serialization system which appends the bulkdata to the end of the package. This is used in two cases, 1) When the package trailer system is disabled 2) If the payload is updated in postload and we are writing out the package to the editor domain as the package trailer does not support writing out local payloads outside of the workspace domain at the moment. The trailer should handle this itself and not rely on the legacy serialization system so that we can remove it.
-- Note that if a payload is stored by reference, we are not adding it explicitly to the payload trailer and we are relying on the trailer builder already containing the reference information when it is created in SavePackage. We do however run an assert to ensure that the reference does exist in the trailer but it might be worth revisiting this.

### VirtualizedBulkData Misc
- The member 'PackageSegment' is now initialized in the constructor to EPackageSegment::Header.
- Moved ::GetPackagePathFromOwner from being a method to a utility function as it accessed no members.
- Validating the payload for saving during serialization has been moved from ::Serialize to it's own method ::TryPayloadValidationForSaving.

### PackageSubmissionChecks
- Added an error message if we detect referenced payloads in a trailer we are trying to virtualize as this condition should never occur. If we encounter it, we give a user facing error message and fail the submit process. It might be worth demoting this to an assert instead as it is not an error the user is expected to see.

#ROBOMERGE-AUTHOR: paul.chipchase
#ROBOMERGE-SOURCE: CL 18637906 in //UE5/Release-5.0/... via CL 18637926 via CL 18637929
#ROBOMERGE-BOT: UE5 (Release-Engine-Test -> Main) (v899-18417669)

[CL 18637932 by paul chipchase in ue5-main branch]
2022-01-18 05:41:44 -05:00
paul chipchase
9dd885f6bb A small clean up on how the package trailer is generated.
#rb PJ.Kack
#rnx
#preflight 61b79447f807c623046e91a5

### FPackageTrailer
- FPackageTrailerBuilder::Create now renamed to ::CreateFromTrailer to make it a bit clearer.
- Added 'AccessMode' to the file format documentation.
- FPackageTrailerBuilder no longer tracks infomation about the trailer once it is written to disk. Instead when writing to disk we will create a valid FPackageTrailer as part of the serialization. This will be passed to callbacks and can be queried by virtualized bulkdata.
-- Since the builder no longer tracks data after creating the trailer, we can remove checks from methods like ::AddPayload.
-- Removed FPackageTrailerBuilder::FindPayloadOffset and added a version to FPackageTrailer instead.
- Changed a number of methods to take FPayloadId by reference rather than by value.

### VirtualizedBulkData
- Replaced areas of code responsible for updating previously written out values with the utility function ::UpdateArchiveData to help with code readability.

### Misc
- FLinkerSave no longer creates a FPackageTrailerBuilder by default, instead one should be added to the FLinkerSave if we want to build a trailer for the package being saved.
- FLinkerLoad now considers 0 to be an invalid value for PayloadTocOffset in addition to INDEX_NONE, so we now check if the value is > than 0. This is because when FPackageFileSummary objects are created, all members are set to zero and we know that a FPackageTrailer cannot be 0 and the rest of the package still exist so we can safely consider it invalid.
- SavePackage/SavePackage2 now create the FPackageTrailerBuilder when the linker is created if the package trailer feature is enabled.
-- We now always call BuildAndWriteTrailer and then only skip the creation of the trailer if the builder is missing or conditions are not met.
-- If we do not write out the trailer, we make sure to set PayloadTocOffset in the FPackageFileSummary to INDEX_NONE as it is clearer than leaving it at the default 0.

#ROBOMERGE-AUTHOR: paul.chipchase
#ROBOMERGE-SOURCE: CL 18453324 in //UE5/Release-5.0/... via CL 18453328
#ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v898-18417669)

[CL 18453333 by paul chipchase in ue5-release-engine-test branch]
2021-12-14 06:30:37 -05:00
paul chipchase
21530c814b Packages submitted from the editor together will now virtualize their payloads in a single batch rather than one at a time.
#rb PJ.Kack
#jira UE-136126
#rnx
#preflight

### VirtualizationSystem
- Added a new overload for Push to VirtualizationSystem that takes an array of FPushRequest, which is a new structure representing a single payload request.
- Filtering by package name is currently disabled, this is because the API has been forced into changing and passing the package name in via a FString rather than FPackagePath which means we would need to be more careful. This will be done in a future submit.
- The backend interface has been extended to also have a batch version of PushData, by default this will attempt to submit each request one at a time so payloads don't have to try and implement a batched version if there is no need.
- The context being passed with a payload when being pushed has been changed from FPackagePath to FString due to include order issues, as the FPackagePath lives in CoreUObject and the API for virtualization lives in Core. Additionally in the future the payloads might not be owned by a package (there is nothing specifically enforcing this) so the context being a string makes more sense.
- NOTE: Due to the context change we currently no longer support the filtering feature, which allows for payloads belonging to packages under specific directories to be excluded from virtualization. This is something that will be solved in a future submit.

### SourceControlBackend

- Now that we can submit multiple payloads in the same submit, the CL description has been changed slightly. We will now print a list of payload identifiers -> the package trying to submit that payload. This will only tell the users which package originally caused the payload to submit. If a user submits a new package at a later date that contains the same payload we will not be updating the description.

### PackageSubmissionChecks
- Converted the submission process to use the new batch push operation in VirtualizationSystem.
-- This means that we do a single push and then have to update the package trailers to convert the now pushed payloads from local to virtualized.
- Added new define UE_PRECHECK_PAYLOAD_STATUS that makes it easy to toggle off the checks to see which payloads need to be submitted to the persistent backend.  This is useful to test if it actually helps speed up the overall operations or if it is faster to just perform the batch push operations on all payloads and check the return values.
-- The hope is that over time the submission processes will become fast enough that we can remove the precheck.
- Fixed up logging to not always assume more than one package or payload.

### General Notes
- Errors and logging is now a bit more vague as we often not just report that X payloads failed etc rather than specific payload identifiers. This probably doesn't affect the user too much since those identifiers as fairly meaningless to them anyway.
- The source control submission could be further optimized by first checking the status of the files in thge depot and only then creating/switching workspace etc.
- As currently written, we need to load all of the payloads into memory, then the backends will do what they need (in the case of source control this results in the payloads being written to disk then submitted) which could create quite a large memory spike when submitting a large number of packages.
-- One solution would be to change the batch push API to take a "payload provider" interface and have the payloads requested as needed rather than passing in the FCompressedBuffer directly. This would let us immediately write the payload to disk for submission then discard it from memory, preventing larger spikes. Although it could cause overhead if there are multiple backends being submitted to. Internally we are unlikely to have more than one backend per storage solution so maybe we should just make it a config option?

#ROBOMERGE-AUTHOR: paul.chipchase
#ROBOMERGE-SOURCE: CL 18403735 in //UE5/Release-5.0/... via CL 18403737
#ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v896-18170469)

[CL 18403738 by paul chipchase in ue5-release-engine-test branch]
2021-12-08 02:19:42 -05:00
paul chipchase
624a015519 Prevent invalid FPayloadIds (empty payloads) from being added to the package trailer.
#rb PJ.Kack
#jira UE-136374
#rnx
#preflight 61a8ad08ad6629a51eb1f8d8

- Having invalid payloads in the trailer's lookup table and writing empty payload data to disk was a waste of time and meant we had to add more error handling code elsewhere.
- Now when a VBD attempts to add one to the trailer it will be rejected, although the provided callback will still be invoked so that the VBD object will still mark itself as being serialized to disk and update it's flags etc.
- Asserts have been added when writing the FPackageTrailer to disk or creating a new FPackageTrailerBuilder from an existing FPackageTrailer if invalid FPayloadIds are somehow found. This is not an expected condition and the asserts are there to guard against future code changes that might break the contract.
- Now when virtualizing the package trailer during submission checks we can assume that all FPayloadIds reference actual data.

#ROBOMERGE-AUTHOR: paul.chipchase
#ROBOMERGE-SOURCE: CL 18351024 in //UE5/Release-5.0/... via CL 18351037
#ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469)

[CL 18351039 by paul chipchase in ue5-release-engine-test branch]
2021-12-02 06:58:09 -05:00
paul chipchase
e38a952142 Submitting packages on projects with virtualization enabled is much faster when none of the payloads actually needs to be virtualized.
#rb PJ.Kack
#rnx
#preflight 61a795773c29b3cf13cd8250

### PackageSubmissionChecks
- Under the old model submitting a large number of packages could be very slow as each package would check each payload that it owns and is currently stored locally one at a time. For the source control backend this created quite a large overhead even when all of the payloads were already virtualized.
- Now we do a pass over the submitted files to find all valid packages that have package trailers with locally stored payloads, gather the payloads into a single list and then query that in one large batch.
- Once we find which payloads are not in permanent storage (note that in the case where a project is using multiple permanent storage solutions, if a payload is missing in one backend it counts as not being in permanent storage) we then attempt to virtualized them.
- Only after all of this is done will we create the truncated copy of each package and then append the updated trailer to each one. In theory doing it in this order this might slightly increase the change of submit failures that occur after virtualization that result in a package never being submitted and orphaned payloads being added to permanent storage, but this will always be a risk.
- Added an assert to fire if we detect a trailer with some virtualized and some local payloads. This should be a supported feature but needs proper testing first before we can allow it. With out current project settings no project should actually encounter this scenario.
- To make the code easier to follow we now early out of the entire check when errors are encountered.
- Added logging at various stages in the process to help show the user that something is happening and make problems easier to identify in the future.
- Notes
--  There is a lot of handling of invalid FPayloads. This is because it is currently possible to add empty payloads to the trailer which is inefficient and wastes space. The trailer will be modified to reject empty payloads in a future update at which point a lot of this handling can be removed.
-- This could've also been solved by not fully rehydrating a package on save by the end user, which will be added as a project setting in a future piece of work, but this approach will solve the edge case when the user does have a large amount of hydrated packages which contain payloads that are already virtualized so it was better to fix that now while we have good test cases for it.
-- We still have scaling problems with large number of package being submitted that do have payloads that need to be virtualized, this will be fixed by extending IVirtualizationSystem::Push to also accept batches of payloads in future work.
-- OnPrePackageSubmission could be broken up into smaller chunks to make the code easier to follow. This will be done after the batch payload submission work is done.

### VirtualizationSystem
- EStorageType has been promoted to enum class.
- Added a new enum FPayloadStatus to be used when querying if a payload exists in a backend storage system or not.
- Add a new method ::DoPayloadsExist which allows the caller to query if one or more payloads exists in the given backend storage system.

### VirtualizationManager
- Implemented ::DoPayloadsExist. First we get the results from each backend in the storage system (which return as true or false from each backend) then total how many backends found the payload in order to set the correct status.

### IVirtualizationBackend
- ::DoesPayloadExist which queries the existence of a single payload has been added to the interface. Most backends already implemented this for private use and if so have had their implementation renamed to match this.
- Also added ::DoPayloadsExist which takes a batch of FpayloadIdsto query. Some backends can deal with a batch of payload ids much more efficiently than one at a time, although the default implementation does call ::DoesPayloadExist for each requested payload.
-- The default implementation prevents every backend from needing to implement the same for loop but does allow backends that can gain from batching to override it.

### VirtualizationSourceControlBackend
- This backend does override ::DoPayloadsExist and implements it's own version as it tends to perform very poorly when not operating on larger batches.
- In this case ::DoesPayloadExist calls back to ::DoPayloadsExist to check each payload rather than implement as specific version.

### PackageTrailer
- The trailer can now be queries to request how many payloads of a given type it contains

#ROBOMERGE-AUTHOR: paul.chipchase
#ROBOMERGE-SOURCE: CL 18339847 in //UE5/Release-5.0/... via CL 18339852
#ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469)

[CL 18339859 by paul chipchase in ue5-release-engine-test branch]
2021-12-01 11:13:31 -05:00
paul chipchase
583560b898 Fix a problem that prevented some packages from being correctly virtualized on submit and clean up some of the surrounding code.
#rb PJ.Kack
#rnx
#preflight 61a5cb34801b3619784062d0

- If packages were edited but not submitted and the editor shutdown then restarted there is a possibility that the edited packages will be loaded by the editor, meaning that a FLinkerLoader will have a file lock on the package file. So now we call ::ResetLoadersForSave on each package file that needs to be replaced to remove the file lock.
- When verbose logging is enabled we will now print how long the prepackagesubmission check took as a development aid.
- When packages have passed the prepackagesubmission process we now just append #virtualized to the CL description to mark the assets as having been through the process. The previous marker (a guid) was some what meaningless at the moment.
- Removed the legacy submit code, it is no longer needed for reference

#ROBOMERGE-AUTHOR: paul.chipchase
#ROBOMERGE-SOURCE: CL 18322298 in //UE5/Release-5.0/... via CL 18322317
#ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469)

[CL 18322321 by paul chipchase in ue5-release-engine-test branch]
2021-11-30 02:37:48 -05:00
paul chipchase
ddff3a39f5 Reworked the virtualization system so that we only virtualize a package when it is submitted to source control, which makes the overall system much less dangerous.
#rb PJ.Kack
#rnx
#preflight 61a4b235405273b2c3daa7c7

### Virtualization
- The idea is to move from our current set up, where the virtualization happens when a package is saved and the payloads are saved to a local virtualization storage system with the final push to persistent storage occurring when the package is submitted to a system where we save payloads into the package file, but move them to persistent virtualization storage when the package file is submitted.
- The main advantage is that if someone submits a package file to source control without virtualizing it, we don't have to worry that others might not be able to load the package, the worst case scenario is that data sizes get bigger until the package is virtualized again.
- This is only the first pass, in the future we can do further optimizations, like not storing payloads locally if we know that they are already in the persistent storage system etc.
- In order to keep the virtualization process simple we want to be able to do it without needing to re-save the package, so to this end we will storage the payloads at the end of the package file as before, but instead of storing them inside of the package file format we will use a new container, the FPackageTrailer that is appended to the package file instead.
- In theory this will make future sidecar work easier as the trailer can just be moved to the sidecar file as as long as we know where to find the trailer things will just work (tm)
- For now VBD will continue to serialize it's offset/size to disk and use those values to read the payload directly when not virtualizing. Ideally we'd use the trailer but this will help reduce the risk.

### Current Issues
- The system does not work with the editor domain and has minor issues with text based assets
- The trailer system is disabled by default via the config system [Core]UsePackageTrailer=False and can then be enabled on select test projects until fully functional.

### PackageTrailer
- The trailer is split into two parts FPackageTrailerBuilder and FPackageTrailer
- FPackageTrailerBuilder is used when building the trailer during package save and FPackageTrailer is the structure we load and actually use.
- When saving the trailer we try to avoid using containers and such so that we can try to keep the specification for the file format very clear so that licensees can build their own tools for the virtualization process if they so desire.
- the header file contains a breakdown of the new format.

### VirtualizedBulkData
- FPayloadToc is now only used by the sidecar experimental feature. In a future update the sidecar will be changed to basically be the package trailer but stored in it's own file.
- FindPayloadsInPackageFile has been moved to PackageTrailer.h as it has a closer association with that code.
- Added a new method, ::LoadFromPackageTrailer, to load payloads from the package trailer without needing to know where the payload is on disk. Using this is opt in with the cvar "Serialization.LoadFromTrailer" and is only provided for debugging purposes. Although once this system has matured it will likely become the preferred way to access payloads.
- We no longer allow virtualization of bulkdata payloads on save, but I am not sure if we might want to allow this as an option in the future. So for now the code branch is disabled by the global constant bAllowVirtualizationOnSave.
- Added utility functions ::GetLinkerLoadFromOwner and ::GetTrailerFromOwner for easy access to the LinkerLoad/Trailer from the owning object.
- Added a utility ::UpdateArchiveData for seeking back to a known position in an archive, writing over it, then seeking back, which is fairly common in the package saving code paths. This might be worth moving to Archive.h for general use.
- ::LoadFromPackageFile has been refactored to prefer returning error values early over nested ifs.
- We no longer serialize the EFlags::IsVirtualized flag to disk. Instead it is applied when the package is loaded if we detect that a payload is virtualized and then removed before saved to disk. This allows the virtualization process to occur without resaving the package.
- We currently support the old serialize to disk path which is enabled when the FPackageTrailer system is disabled and the newer system when it is enabled.
- We do however continue to serialize the offset in the package file to disk when saved. Strictly speaking this is not needed as we can look this up from the package trailer when the package is loaded but continuing to serialize it out to disk helps keep the old and newer data format paths working together without adding too much special case code.
-- Once the FPackageTrailer feature is no longer optional we can consider changing this.

### SavePackage/SavePackage2
- Removed the functions that would create the FPayloadToc
- Added new functions to help create the trailer.
- Moved the end of package tag out of the if to it's own scope (still won't run if a package writer exists), the reasoning is to make it clearer where the package format ends and where we can start writing the payload trailer.
- We have checks to make sure that the trailer does not try to write to disk for both text based assets and if the editor domain is enabled. Support for these features will be added later, but since the package trailer system is disabled by default this shouldn't cause problems.

### LinkerLoad
- After we parse the FPackageFileSummary we now also parse the header of the package trailer. In the workspace domain this will incur additional seek costs (which may or not actually seek and invalidate the internal file cache depending on how large the package's exports are) as the trailer is found after the package
-- In the future, the package trailer header can be stored right after the FPackageFileSummary in the editor domain which will remove this overhead.
- By parsing the header of the trailer at this point, virtualized bulkdata objects in the package can use the look up info to determine if their payloads are virtualized or not, and if not then where they reside inside the package file.

### LinkerSave
- We no longer collect a list of all virtualized bulkdata in a package while saving as we no longer generate the FPayloadToc so BulkDataInPackage has been removed.
- We now gather data to be appended to the package trailer via TrailerBuilder.

### DumpPayloadToc
- This command now dumps info based on the package trailer if one can be found.

#ROBOMERGE-AUTHOR: paul.chipchase
#ROBOMERGE-SOURCE: CL 18307893 in //UE5/Release-5.0/... via CL 18307905
#ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469)

[CL 18307913 by paul chipchase in ue5-release-engine-test branch]
2021-11-29 07:02:24 -05:00
aurel cordonnier
fc542f6cfd Merge from Release-Engine-Staging @ 18081189 to Release-Engine-Test
This represents UE4/Main @18073326, Release-5.0 @18081140 and Dev-PerfTest @18045971

[CL 18081471 by aurel cordonnier in ue5-release-engine-test branch]
2021-11-07 23:43:01 -05:00
aurel cordonnier
a6e741e007 Merge from Release-Engine-Staging @ 17915896 to Release-Engine-Test
This represents UE4/Main @17911760, Release-5.0 @17915875 and Dev-PerfTest @17914035

[CL 17918595 by aurel cordonnier in ue5-release-engine-test branch]
2021-10-25 20:05:28 -04:00