Vulkan 1.4: Faster app loads, less stutter and less Memory Usage | by Shahbaz Youssefi | Android Developers | Dec, 2024


Host Image Copy is a game changer for Android

Vulkan 1.4 was released recently, and with it comes a significant feature for Android: Host Image Copy, based on VK_EXT_host_image_copy.

We have previously written about this extension in a Khronos blog post that explains the technical details of using it. The extension is particularly useful for Android games, as we’ll see in this post.

In short, Host Image Copy is a Vulkan feature that allows the application to transfer image data using the CPU instead of the GPU. This feature is particularly useful on UMA devices (such as typical Android devices), but may place restrictions on images. In particular, most drivers disable framebuffer compression for host-copyable images that are otherwise renderable. Read on to learn where this feature really shines.

To put things in context, Host Image Copy is one way to asynchronously transfer image data. The other is using a dedicated transfer queue (with VK_QUEUE_TRANSFER_BIT, and without VK_QUEUE_GRAPHICS_BIT). In Vulkan 1.4, at least one of the two is required. You can expect that the vast majority of Android devices shipping with Vulkan 1.4 will implement Host Image Copy, and implement it optimally for compressed formats. That is, Vulkan requires optimalDeviceAccess to be true for these formats.

As it happens, texture data constitutes the largest amount of image data in typical games, and textures use compressed formats!

First, let’s see how Host Image Copy differs from doing data copies on the GPU, such as with vkCmdCopyBufferToImage2.

Without Host Image Copy, the path from texture data loaded from disk to an image goes through a Vulkan buffer:

  • A Vulkan buffer is allocated, taking up about as much memory as the Vulkan image does.
  • The texture data is copied (in the style of memcpy) to the buffer after the CPU maps it.
  • vkCmdCopyBufferToImage2 is recorded in the command buffer that is later submitted.
  • The texture data is copied to the image by the GPU.
  • The buffer memory is freed a few frames later, once the application knows the GPU copy is done.

In the above, the texture data is copied twice, and for a few frames the amount of memory allocated for the texture data is twice the size of the image. There are two further things to note here:

  • The copy on the CPU is as fast as it can get, because it is effectively memcpy.
  • The copy on the GPU efficiently reorders the data to match the physical layout of the image (a.k.a. layout swizzling), but it happens on the graphics queue (assuming no dedicated transfer queues), interfering with rendering in the same frame.

With Host Image Copy instead, the copy is done simply by calling vkCopyMemoryToImage. In this case, the CPU does both the copy and the layout swizzling. This copy is slower than each of the copies above, because the CPU is not as efficient at reordering the data, but:

  • The copy, even if slower, is only done once
  • The copy doesn’t interfere with ongoing GPU work
  • There is no extra memory allocated for texture data
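To make the memory claim concrete, here is a back-of-the-envelope sketch in C. It assumes a block-compressed format with 4x4 texel blocks of 16 bytes each (ASTC 4x4 and BC7 both fit that description) and a single mip level. The function names are made up for illustration, and nothing here calls Vulkan.

```c
#include <stdint.h>

/* Bytes needed for one mip level of a block-compressed texture.
 * block_w and block_h are the block dimensions in texels, and
 * block_bytes is the size of one compressed block. */
static uint64_t compressed_level_size(uint32_t width, uint32_t height,
                                      uint32_t block_w, uint32_t block_h,
                                      uint32_t block_bytes)
{
    /* Partial blocks at the right and bottom edges still take a full block. */
    uint64_t blocks_x = ((uint64_t)width + block_w - 1) / block_w;
    uint64_t blocks_y = ((uint64_t)height + block_h - 1) / block_h;
    return blocks_x * blocks_y * block_bytes;
}

/* Staging path: the buffer and the image are alive at the same time. */
static uint64_t peak_bytes_staging(uint64_t image_bytes)
{
    return image_bytes * 2;
}

/* Host Image Copy: only the image itself is allocated. */
static uint64_t peak_bytes_host_copy(uint64_t image_bytes)
{
    return image_bytes;
}
```

For a 2048x2048 texture, compressed_level_size(2048, 2048, 4, 4, 16) comes to 4 MiB, so the staging path peaks at about 8 MiB while the host copy path stays at about 4 MiB. Real allocations may be somewhat larger because of driver alignment and padding.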

FYI, the reason this extension has less utility on NUMA devices, such as devices with dedicated GPUs (and dedicated memory), is that the CPU may not have access to the entire GPU memory, or access may be too slow. This may limit the amount of memory that can be used for host-copyable textures, or make the copy prohibitively expensive. The identicalMemoryTypeRequirements property indicates whether Host Image Copy limits access to GPU memory or not.

With the above properties in mind, the following presents two scenarios where Host Image Copy can significantly improve a game.

Removing stutter during texture data streaming while simultaneously halving memory usage sounds too good to be true, but that’s exactly the kind of thing Host Image Copy enables.

To set the scene: imagine an open-world game where you are nearing a new area and lots of new textures need to be loaded from persistent storage. You’re cruising at 60 FPS; it would be a shame if that dropped to 20 FPS or the game crashed with Out of Memory.

Avoiding such stutters with Host Image Copy is very simple.

The application can use a CPU thread to stream texture data directly into new images using Host Image Copy. The GPU would continue to render frames of consistent complexity as before, maintaining FPS, and the memory increase is as minimal as it can get. Don’t forget to memory map the texture data file instead of reading it into a CPU buffer first, for even more efficiency!
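As a rough illustration of the memory-mapping tip, the sketch below maps a file read-only with POSIX mmap, which is available on Android. The helper names and the idea of one texture per file are assumptions made for the example. In a real loader, a pointer into the mapping would be what gets handed to the host copy as its source (the pHostPointer member of the copy region).

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Maps a whole file read-only and returns its address, or NULL on failure.
 * On success *size_out receives the file size in bytes. */
static const void *map_texture_file(const char *path, size_t *size_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    void *data = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (data == MAP_FAILED)
        return NULL;

    *size_out = (size_t)st.st_size;
    return data;
}

/* Releases a mapping returned by map_texture_file. */
static void unmap_texture_file(const void *data, size_t size)
{
    munmap((void *)data, size);
}
```

Because the pages are faulted in on demand while the copy reads them, there is no intermediate CPU-side buffer holding a second copy of the texture data.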

Can we apply the same strategy when the game is being loaded in the first place? Sure, use multiple CPU threads to copy texture data directly into images. Given that the CPU copy is slower due to layout swizzling, load times may not really be any faster, but at least the memory usage is halved!

But Host Image Copy has a secret way of making this much faster, as fast as memcpy! Basically, the CPU copy becomes just as efficient as the CPU copy in the GPU transfer scenario, the GPU copy is gone, and the GPU buffer is gone; it’s all goodness and no downsides. The key is VK_HOST_IMAGE_COPY_MEMCPY.

This flag is trivial: it simply tells the CPU not to do layout swizzling. The texture data being copied to the image is assumed to be pre-swizzled, and the copy is simply a memcpy. But since the layout swizzling of images on various devices is not public information, how is this useful?

The answer is in image-to-memory copies with the same flag, that is, readback of swizzled image data without undoing the layout swizzling. Many high-fidelity AAA Android games download large packages of texture data on the first run of the game. Take the following algorithm:

  • Download texture data
  • Use a temporary Vulkan image and call vkCopyMemoryToImage -> the CPU does layout swizzling
  • Read back the image contents with vkCopyImageToMemory with the VK_HOST_IMAGE_COPY_MEMCPY flag -> the returned data is pre-swizzled for this particular device/driver
  • Store only the pre-swizzled data to persistent storage, not the original texture data, to minimize storage footprint

The next time the game runs, it can simply use vkCopyMemoryToImage with VK_HOST_IMAGE_COPY_MEMCPY to copy the pre-swizzled data into the images as fast as a simple read of the file contents would be. This also happens to optimize the streaming scenario above!
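The round trip can be easier to follow with a toy model. The sketch below does not call Vulkan at all: it stands in a made-up 4x4 tiled layout for the driver's private swizzle, and two hypothetical functions for the memory-to-image and image-to-memory copies, each with a flag that plays the role of VK_HOST_IMAGE_COPY_MEMCPY. It assumes one byte per texel and dimensions that are multiples of 4.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { TILE = 4 }; /* hypothetical tile edge; real layouts are driver-private */

/* Offset of texel (x, y) in the toy tiled layout: 4x4 tiles stored one
 * after another. width must be a multiple of TILE. */
static size_t tiled_offset(size_t x, size_t y, size_t width)
{
    size_t tiles_per_row = width / TILE;
    size_t tile_index = (y / TILE) * tiles_per_row + (x / TILE);
    return tile_index * TILE * TILE + (y % TILE) * TILE + (x % TILE);
}

/* Stand-in for a memory-to-image copy. With raw_memcpy set, the source is
 * assumed to be already tiled and is copied as-is. */
static void toy_memory_to_image(uint8_t *image, const uint8_t *src,
                                size_t width, size_t height, int raw_memcpy)
{
    if (raw_memcpy) {
        memcpy(image, src, width * height);
        return;
    }
    for (size_t y = 0; y < height; ++y)
        for (size_t x = 0; x < width; ++x)
            image[tiled_offset(x, y, width)] = src[y * width + x];
}

/* Stand-in for an image-to-memory copy (readback). With raw_memcpy set,
 * the tiled bytes are returned without undoing the tiling. */
static void toy_image_to_memory(uint8_t *dst, const uint8_t *image,
                                size_t width, size_t height, int raw_memcpy)
{
    if (raw_memcpy) {
        memcpy(dst, image, width * height);
        return;
    }
    for (size_t y = 0; y < height; ++y)
        for (size_t x = 0; x < width; ++x)
            dst[y * width + x] = image[tiled_offset(x, y, width)];
}
```

On the first run, the game would call toy_memory_to_image without the flag, then toy_image_to_memory with it, and store the result. On later runs, toy_memory_to_image with the flag reproduces the same image contents using nothing but a memcpy, which is the whole point of the trick.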

The only gotcha is that driver updates might change the layout swizzling of images. The game needs to check that optimalTilingLayoutUUID is unchanged since the pre-swizzled texture data was cached, and redo the above if it ever changes. Fortunately, changes to the layout swizzle are rare. In practice, the game is unlikely to ever need to redownload or reprocess its texture data.
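One way to implement that check is to store the UUID next to the cached data and compare it at startup. The sketch below is a minimal version under stated assumptions: the TextureCacheHeader struct and its field names are hypothetical, and the current UUID is assumed to have already been read from the optimalTilingLayoutUUID field of the device's Host Image Copy properties. The size of 16 matches VK_UUID_SIZE.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LAYOUT_UUID_SIZE 16 /* same value as VK_UUID_SIZE */

/* Hypothetical header the game writes in front of its cached,
 * pre-swizzled texture data. */
typedef struct TextureCacheHeader {
    uint8_t layout_uuid[LAYOUT_UUID_SIZE]; /* UUID at the time of caching */
    uint64_t payload_bytes;                /* size of the data that follows */
} TextureCacheHeader;

/* Returns true when the cached data was produced with the layout the
 * driver currently reports, so it can be uploaded with the MEMCPY flag. */
static bool texture_cache_is_valid(const TextureCacheHeader *header,
                                   const uint8_t current_uuid[LAYOUT_UUID_SIZE])
{
    return memcmp(header->layout_uuid, current_uuid, LAYOUT_UUID_SIZE) == 0;
}
```

If the function returns false, the game would discard the cache and rerun the swizzle-and-readback steps from the original texture data, which means re-downloading it if only the pre-swizzled copy was kept.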

The Host Image Copy feature, conditionally required by Vulkan 1.4 and unconditionally required by Android 16 for new devices, is a game changer for games on Android. In this post we looked at a few easy but significant wins using this functionality, but there are others, notably asynchronous image memory defragmentation. Surely, your ingenuity will lead to other optimizations made possible by this feature.

Be sure to check out the post on the Khronos blog for more technical details around the usage of this functionality. This functionality is starting to become prevalent on Android phones. Don’t miss out!
