Common media processing operations with Jetpack Media3 Transformer

March 6, 2025

Posted by Nevin Mital – Developer Relations Engineer, and Kristina Simakova – Engineering Manager

Android users have demonstrated an increasing desire to create, personalize, and share video content online, whether to preserve their memories or to make people laugh. As such, media editing is a cornerstone of many engaging Android apps, and historically developers have often relied on external libraries to handle operations such as Trimming and Resizing. While these solutions are powerful, integrating and managing external library dependencies can introduce complexity and lead to challenges with managing performance and quality.

The Jetpack Media3 Transformer APIs offer a native Android solution that streamlines media editing with fast performance, extensive customizability, and broad device compatibility. In this blog post, we’ll walk through some of the most common editing operations with Transformer and discuss its performance.

Getting set up with Transformer

To get started with Transformer, check out our Getting Started documentation for details on how to add the dependency to your project and a basic understanding of the workflow when using Transformer. In a nutshell, you’ll:

Create one or many MediaItem instances from your video file(s), then

Apply item-specific edits to them by building an EditedMediaItem for each MediaItem,

Create a Transformer instance configured with settings applicable to the whole exported video,

and finally start the export to save your applied edits to a file.

Aside: You can also use a CompositionPlayer to preview your edits before exporting, but this is out of scope for this blog post, as this API is still a work in progress. Please stay tuned for a future post!

Here’s what this looks like in code:

import androidx.media3.common.MediaItem
import androidx.media3.transformer.EditedMediaItem
import androidx.media3.transformer.Transformer

val mediaItem = MediaItem.Builder().setUri(mediaItemUri).build()
val editedMediaItem = EditedMediaItem.Builder(mediaItem).build()
val transformer = 
  Transformer.Builder(context)
    .addListener(/* Add a Transformer.Listener instance here for completion events */)
    .build()
transformer.start(editedMediaItem, outputFilePath)
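
The listener placeholder in the snippet above could be filled in along the following lines. This is a minimal sketch, assuming a recent Media3 release in which Transformer.Listener exposes onCompleted and onError callbacks that receive a Composition. The TAG constant, the exportListener name, and the log messages are illustrative and not part of the original post.

```kotlin
import android.util.Log
import androidx.media3.transformer.Composition
import androidx.media3.transformer.ExportException
import androidx.media3.transformer.ExportResult
import androidx.media3.transformer.Transformer

private const val TAG = "TransformerExport" // hypothetical log tag

// Reports the outcome of an export; pass it to Transformer.Builder.addListener().
val exportListener =
  object : Transformer.Listener {
    override fun onCompleted(composition: Composition, exportResult: ExportResult) {
      // ExportResult describes the written file, including its media duration.
      Log.i(TAG, "Export finished: ${exportResult.durationMs} ms of media written")
    }

    override fun onError(
      composition: Composition,
      exportResult: ExportResult,
      exportException: ExportException
    ) {
      Log.e(TAG, "Export failed", exportException)
    }
  }
```

With a listener like this in place, the builder call would read .addListener(exportListener). The callbacks are a natural place to update UI or clean up a partially written output file.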

Transcoding, Trimming, Muting, and Resizing with the Transformer API

Let’s now take a look at four of the most common single-asset media editing operations, starting with Transcoding.

Transcoding is the process of re-encoding an input file into a specified output format. For this example, we’ll request the output to have video in HEVC (H265) and audio in AAC. Starting with the code above, here are the lines that change:

import androidx.media3.common.MimeTypes

val transformer = 
  Transformer.Builder(context)
    .addListener(...)
    .setVideoMimeType(MimeTypes.VIDEO_H265)
    .setAudioMimeType(MimeTypes.AUDIO_AAC)
    .build()

Many of you may already be familiar with FFmpeg, a popular open-source library for processing media files, so we’ll also include FFmpeg commands for each example to serve as a helpful reference. Here’s how you can perform the same transcoding with FFmpeg:

$ ffmpeg -i $inputVideoPath -c:v libx265 -c:a aac $outputFilePath

The next operation we’ll try is Trimming.

Specifically, we’ll set Transformer up to trim the input video from the 3 second mark to the 8 second mark, resulting in a 5 second output video. Starting again from the code in the “Getting set up” section above, here are the lines that change:

// Configure the trim operation by adding a ClippingConfiguration to
// the media item
val clippingConfiguration =
  MediaItem.ClippingConfiguration.Builder()
    .setStartPositionMs(3000)
    .setEndPositionMs(8000)
    .build()
val mediaItem =
  MediaItem.Builder()
    .setUri(mediaItemUri)
    .setClippingConfiguration(clippingConfiguration)
    .build()

// Transformer also has a trim optimization feature we can enable.
// This will prioritize Transmuxing over Transcoding where possible.
// See more about Transmuxing further down in this post.
val transformer = 
  Transformer.Builder(context)
    .addListener(...)
    .experimentalSetTrimOptimizationEnabled(true)
    .build()

With FFmpeg:

$ ffmpeg -ss 00:00:03 -i $inputVideoPath -t 00:00:05 $outputFilePath

Next, we can mute the audio in the exported video file.

val editedMediaItem = 
  EditedMediaItem.Builder(mediaItem)
    .setRemoveAudio(true)
    .build()

The corresponding FFmpeg command:

$ ffmpeg -i $inputVideoPath -c copy -an $outputFilePath

And for our final example, we’ll try resizing the input video by scaling it down to half its original height and width.

import androidx.media3.effect.ScaleAndRotateTransformation
import androidx.media3.transformer.Effects

val scaleEffect = 
  ScaleAndRotateTransformation.Builder()
    .setScale(0.5f, 0.5f)
    .build()
val editedMediaItem =
  EditedMediaItem.Builder(mediaItem)
    .setEffects(
      Effects(
        /* audioProcessors= */ emptyList(),
        /* videoEffects= */ listOf(scaleEffect)
      )
    )
    .build()

An FFmpeg command could look like this:

$ ffmpeg -i $inputVideoPath -filter:v "scale=w=trunc(iw/4)*2:h=trunc(ih/4)*2" $outputFilePath

Of course, you can also combine these operations to apply multiple edits on the same video, but hopefully these examples serve to demonstrate that the Transformer APIs make configuring these edits simple.
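
As an illustration of combining edits, the four operations above can be applied in a single export. The following is a sketch that only reuses calls already shown in this post. The function name exportCombinedEdits and its parameters are assumptions made for the example, and it has not been benchmarked.

```kotlin
import android.content.Context
import android.net.Uri
import androidx.media3.common.MediaItem
import androidx.media3.common.MimeTypes
import androidx.media3.effect.ScaleAndRotateTransformation
import androidx.media3.transformer.EditedMediaItem
import androidx.media3.transformer.Effects
import androidx.media3.transformer.Transformer

// Trims to 00:03-00:08, removes audio, halves the resolution, and exports as H265.
// The function name and parameters are illustrative.
fun exportCombinedEdits(
  context: Context,
  inputUri: Uri,
  outputFilePath: String,
  listener: Transformer.Listener
) {
  // Trim: clip the media item to the 3 s - 8 s window.
  val mediaItem =
    MediaItem.Builder()
      .setUri(inputUri)
      .setClippingConfiguration(
        MediaItem.ClippingConfiguration.Builder()
          .setStartPositionMs(3000)
          .setEndPositionMs(8000)
          .build()
      )
      .build()

  // Mute and resize: item-specific edits live on the EditedMediaItem.
  val scaleEffect = ScaleAndRotateTransformation.Builder().setScale(0.5f, 0.5f).build()
  val editedMediaItem =
    EditedMediaItem.Builder(mediaItem)
      .setRemoveAudio(true)
      .setEffects(Effects(emptyList(), listOf(scaleEffect)))
      .build()

  // Transcode: the output video format is a Transformer-wide setting.
  Transformer.Builder(context)
    .addListener(listener)
    .setVideoMimeType(MimeTypes.VIDEO_H265)
    .build()
    .start(editedMediaItem, outputFilePath)
}
```

Because this export applies a video effect and requests a specific output format, it will most likely go through transcoding rather than transmuxing.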

Transformer API Performance results

Here are some benchmarking measurements for each of the 4 operations taken with the Stopwatch API, running on a Pixel 9 Pro XL device:

(Note that performance for operations like these can depend on a variety of reasons, such as the current load the device is under, so the numbers below should be taken as rough estimates.)

Input video format: 10s 720p H264 video with AAC audio

Transcoding to H265 video and AAC audio: ~1300ms

Trimming video to 00:03-00:08: ~2300ms

Muting audio: ~200ms

Resizing video to half height and width: ~1200ms

Input video format: 25s 360p VP8 video with Vorbis audio

Transcoding to H265 video and AAC audio: ~3400ms

Trimming video to 00:03-00:08: ~1700ms

Muting audio: ~1600ms

Resizing video to half height and width: ~4800ms

Input video format: 4s 8k H265 video with AAC audio

Transcoding to H265 video and AAC audio: ~2300ms

Trimming video to 00:03-00:08: ~1800ms

Muting audio: ~2000ms

Resizing video to half height and width: ~3700ms
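
Because an export runs asynchronously, timing one means recording the moment the export is started and reading the elapsed time in the completion callback. The following is a minimal sketch of such a timer in plain Kotlin. ExportStopwatch is a hypothetical helper, not the Stopwatch API used for the measurements above, and the injectable clock exists only so the logic can be exercised without a device.

```kotlin
// Hypothetical helper: measures wall-clock time between two calls.
class ExportStopwatch(private val nanoClock: () -> Long = { System.nanoTime() }) {
  private var startNanos: Long? = null

  // Call immediately before starting the export.
  fun start() {
    startNanos = nanoClock()
  }

  // Call from the listener's completion callback.
  fun elapsedMs(): Long {
    val started = checkNotNull(startNanos) { "start() was not called" }
    return (nanoClock() - started) / 1_000_000
  }
}

fun main() {
  // Fake clock for demonstration: first reading at 0 ns, second at 1.3 s.
  val readings = ArrayDeque(listOf(0L, 1_300_000_000L))
  val stopwatch = ExportStopwatch { readings.removeFirst() }
  stopwatch.start()
  println("Export took ~${stopwatch.elapsedMs()}ms") // prints "Export took ~1300ms"
}
```

On a device, the default clock would be used, with start() called just before the export begins and elapsedMs() read when the export completes.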

One technique Transformer uses to speed up editing operations is by prioritizing transmuxing for basic video edits where possible. Transmuxing refers to the process of repackaging video streams without re-encoding, which ensures high-quality output and significantly faster processing times.

When not possible, Transformer falls back to transcoding, a process that involves first decoding video samples into raw data, then re-encoding them for storage in a new container. Here are some of these differences:

Transmuxing

Transformer’s preferred approach when possible – a quick transformation that preserves elementary streams.
Only applicable to basic operations, such as rotating, trimming, or container conversion.
No quality loss or bitrate change.

Transcoding

Transformer’s fallback approach in cases when Transmuxing isn’t possible – involves decoding and re-encoding elementary streams.
More extensive modifications to the input video are possible.
Loss in quality due to re-encoding, but can achieve a desired bitrate target.

We are continuously implementing further optimizations, such as the recently introduced experimentalSetTrimOptimizationEnabled setting that we used in the Trimming example above.

A trim is usually performed by re-encoding all the samples in the file, but since encoded media samples are stored chronologically in their container, we can improve efficiency by only re-encoding the group of pictures (GOP) between the start point of the trim and the first keyframe at/after the start point, then stream-copying the rest.

Since we only decode and encode a fixed portion of any file, the encoding latency is roughly constant, regardless of what the input video duration is. For long videos, this improved latency is dramatic. The optimization relies on being able to stitch part of the input file with newly-encoded output, which means that the encoder’s output format and the input format must be compatible.

If the optimization fails, Transformer automatically falls back to normal export.
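
To make the idea concrete, the split between the re-encoded portion and the stream-copied portion can be modeled in a few lines of plain Kotlin. This is an illustrative sketch only and not Transformer’s internal implementation. TrimPlan and planTrim are hypothetical names, timestamps are in microseconds, and each range is half-open (written here as an inclusive LongRange that ends one microsecond earlier).

```kotlin
// Illustrative model only: describes which time ranges would be re-encoded
// versus stream-copied. It is not Transformer's internal implementation.
data class TrimPlan(val reencodeUs: LongRange?, val copyUs: LongRange?)

fun planTrim(keyframesUs: List<Long>, startUs: Long, endUs: Long): TrimPlan {
  require(startUs < endUs) { "Trim start must precede trim end" }
  // First keyframe at or after the trim start, if it falls inside the trim window.
  // Without one, the whole window has to be re-encoded.
  val firstKeyframe = keyframesUs.sorted().firstOrNull { it >= startUs && it < endUs }
    ?: return TrimPlan(reencodeUs = startUs until endUs, copyUs = null)
  // Nothing to re-encode when the trim starts exactly on a keyframe.
  val reencode = if (firstKeyframe > startUs) startUs until firstKeyframe else null
  return TrimPlan(reencodeUs = reencode, copyUs = firstKeyframe until endUs)
}

fun main() {
  // Keyframes every 2 s; trim from 3 s to 8 s.
  val keyframes = listOf(0L, 2_000_000L, 4_000_000L, 6_000_000L, 8_000_000L)
  val plan = planTrim(keyframes, startUs = 3_000_000L, endUs = 8_000_000L)
  println(plan.reencodeUs) // prints 3000000..3999999
  println(plan.copyUs)     // prints 4000000..7999999
}
```

In this example only the first second of the five second output is re-encoded. The re-encoded span is bounded by the keyframe interval rather than by the length of the trim, which is why the latency stays roughly constant as videos get longer.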

What’s next?

As part of Media3, Transformer is a native solution with low integration complexity, is tested on and ensures compatibility with a wide variety of devices, and is customizable to fit your specific needs.

To dive deeper, you can explore Media3 Transformer documentation, run our sample apps, or learn how to complement your media editing pipeline with Jetpack Media3. We’ve already seen app developers benefit greatly from adopting Transformer, so we encourage you to try them out yourself to streamline your media editing workflows and enhance your app’s performance!
