How to use RenderScript to convert YUV_420_888 YUV Image to Bitmap
- Note to readers
- RenderScript
- YUV_420_888 to Bitmap in Android using RenderScript
- Performance verdict
- Just FYI, RenderScript is being deprecated in Android S (Android 12)
- References
- Attributions
RenderScript turns out to be one of the best APIs for running computationally-intensive code on the CPU or GPU (and that too without having to make use of the NDK or GPU-specific APIs). We can use the existing intrinsics or create our own kernels that describe the computation, and the framework takes care of scheduling & execution. In this article I explain how to use the ScriptIntrinsicYuvToRGB intrinsic that is available in the Android APIs to convert an android.media.Image in YUV_420_888 format to a Bitmap.
Note to readers
- I have a habit of explaining concepts before showing the code; if you just want to see the money (the code, no talk), jump to the Java Code section.
- Despite lacking a good sense of humour, I tend to try to write funny.
- For this article, I expect readers to be familiar with Android, Java, JNI, and images in YUV and Bitmap formats.
- This article also serves as an example of how to use RenderScript in Android and its performance benefits, even if this is not the exact use case you are targeting.
RenderScript
You can read much more about RenderScript in Android’s documentation. In short:
- RenderScript is a framework for running computationally intensive tasks at high performance on Android.
- Primarily oriented for data-parallel computation (very well suited for image processing).
- Runtime parallelizes the work across available processors such as multi-core CPUs & GPUs.
- Developers can focus on the algorithm rather than its scheduling.
- The kernel language is derived from C99.
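To make that shape concrete, here is a minimal sketch of my own (not taken from this article's code) using another stock intrinsic, ScriptIntrinsicBlur, end to end: create a context, wrap the data in Allocations, run the kernel, copy the result back. The BlurExample class name is just for illustration.
import android.content.Context;
import android.graphics.Bitmap;
import android.renderscript.Allocation;
import android.renderscript.Element;
import android.renderscript.RenderScript;
import android.renderscript.ScriptIntrinsicBlur;
class BlurExample {
    // Blur a Bitmap with the built-in ScriptIntrinsicBlur; the runtime decides
    // where the kernel actually runs (CPU cores or GPU).
    // Assumes the input Bitmap is ARGB_8888 (maps to Element.U8_4).
    static Bitmap blur(Context context, Bitmap input, float radius) {
        RenderScript rs = RenderScript.create(context);
        Allocation in = Allocation.createFromBitmap(rs, input);
        Allocation out = Allocation.createTyped(rs, in.getType());
        ScriptIntrinsicBlur blur = ScriptIntrinsicBlur.create(rs, Element.U8_4(rs));
        blur.setRadius(radius); // valid radius is in (0, 25]
        blur.setInput(in);
        blur.forEach(out);
        Bitmap output = Bitmap.createBitmap(
            input.getWidth(), input.getHeight(), Bitmap.Config.ARGB_8888);
        out.copyTo(output);
        rs.destroy();
        return output;
    }
}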
YUV_420_888 to Bitmap in Android using RenderScript
If you are here, I assume you are fairly aware of the YUV_420_888 format & Bitmap. If you are not, and you want to learn more or find other ways to convert YUV_420_888 to Bitmap in Android, you can check out my other article: How to use YUV (YUV_420_888) Image in Android.
ScriptIntrinsicYuvToRGB
ScriptIntrinsicYuvToRGB is an intrinsic for converting an Android YUV buffer to RGB. The input allocation is expected to be in NV21 format and the output will be RGBA with the alpha channel set to 255.
In NV21 format we have a full-resolution Y channel and subsampled U & V channels; that is, each U & V pixel caters to 4 pixels in the Y channel. The NV21 format is also single-plane, with the Y channel data first, followed by interleaved V & U pixels. For a 4x4 image it would look like YYYYYYYYYYYYYYYYVUVUVUVU.
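To make the packing concrete, here is a tiny sketch of my own (purely illustrative, not any Android API) of the index math for an NV21 buffer: for a pixel (x, y) the Y byte sits in the full-resolution plane, and the V/U pair shared by its 2x2 block sits in the interleaved plane that starts right after it.
class Nv21Layout {
    // Index of the Y byte for pixel (x, y) in a width x height NV21 buffer.
    static int yIndex(int x, int y, int width) {
        return y * width + x;
    }
    // Index of the V byte for pixel (x, y): the interleaved VU plane starts at
    // width * height, each VU row covers two image rows, one VU pair per 2x2 block.
    static int vIndex(int x, int y, int width, int height) {
        return width * height + (y / 2) * width + (x / 2) * 2;
    }
    // The U byte immediately follows its V byte in NV21.
    static int uIndex(int x, int y, int width, int height) {
        return vIndex(x, y, width, height) + 1;
    }
}
For the 4x4 example above, pixel (0, 0) maps to Y index 0, V index 16 and U index 17, which matches the YYYYYYYYYYYYYYYYVUVUVUVU string.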
BTW, if you are wondering what an intrinsic means:
intrinsic: Normally, “intrinsics” refers to functions that are built-in – i.e. most standard library functions that the compiler can/will generate inline instead of calling an actual function in the library.
(source: What are intrinsics? — StackOverflow)
How to use this intrinsic
So basically, let’s say we want to use the Java APIs available. We:
- Initialise a RenderScript context.
- Create input and output Allocations.
  - An Allocation is a RenderScript object that provides storage for a fixed amount of data, which needs to be provided by the caller.
  - Allocations are of a given Element type. In this case we want one allocation for the input (YUV) and another for the output (Bitmap).
  - For the Bitmap we can directly use the Element of type RGBA_8888.
  - For the Image we don’t have a direct element type, so we need to convert the data into a flat NV21 byte[].
- Copy the input data into the input allocation.
- Execute the kernel (and copy the output allocation into the output Bitmap).
Java Code
Since the allocations are expensive, I’d design them to be reusable.
Note: In this example I am not taking care of thread-safety, making the code asynchronous, or deciding whether we should do resource allocation in the constructor. Leaving all of those tasks to you :)
So we can start with a skeleton of our YuvConvertor class:
class YuvConvertor {
private final Allocation in, out;
private final ScriptIntrinsicYuvToRGB script;
public YuvConvertor(Context context, int width, int height) {
// Setup
}
public Bitmap toBitmap(Image image) {
if (image.getFormat() != ImageFormat.YUV_420_888) {
throw new IllegalArgumentException("Only supports YUV_420_888.");
}
// Convert ..
}
}
Constructor
class YuvConvertor {
private final Allocation in, out;
private final ScriptIntrinsicYuvToRGB script;
public YuvConvertor(Context context, int width, int height) {
RenderScript rs = RenderScript.create(context);
this.script = ScriptIntrinsicYuvToRGB.create(
rs, Element.U8_4(rs));
// An NV21 YUV image of dimension 4 x 4 has the following packing:
// YYYYYYYYYYYYYYYYVUVUVUVU
// with each sample (of any channel) taking 8 bits.
int yuvByteArrayLength = (int) (width * height * 1.5f);
Type.Builder yuvType = new Type.Builder(rs, Element.U8(rs))
.setX(yuvByteArrayLength);
this.in = Allocation.createTyped(
rs, yuvType.create(), Allocation.USAGE_SCRIPT);
Type.Builder rgbaType = new Type.Builder(rs, Element.RGBA_8888(rs))
.setX(width)
.setY(height);
this.out = Allocation.createTyped(
rs, rgbaType.create(), Allocation.USAGE_SCRIPT);
}
}
Note: There is a certain overhead associated with the construction of this object (driven by initialization cost). Be mindful of when to initialize it. On a mid-tier Android device I observed it to take between 0 ms and 14 ms; the cost could be higher on a low-end device.
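If that cost matters on your hot path, one option (a minimal sketch, my own suggestion; the holder class name is hypothetical) is to create the converter once, off the critical path, and reuse it for every frame of the same resolution:
import android.content.Context;
final class YuvConvertorHolder {
    private static volatile YuvConvertor instance;
    private YuvConvertorHolder() {}
    // Lazily create a single converter; assumes all frames share one resolution.
    static YuvConvertor get(Context context, int width, int height) {
        if (instance == null) {
            synchronized (YuvConvertorHolder.class) {
                if (instance == null) {
                    instance = new YuvConvertor(context, width, height);
                }
            }
        }
        return instance;
    }
}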
toBitmap(Image image) method
public Bitmap toBitmap(Image image) {
if (image.getFormat() != ImageFormat.YUV_420_888) {
throw new IllegalArgumentException("Only supports YUV_420_888.");
}
byte[] yuvByteArray = toNv21(image);
in.copyFrom(yuvByteArray);
script.setInput(in);
script.forEach(out);
// Allocate memory for the bitmap to return. If you have a reusable Bitmap,
// I recommend using that.
Bitmap bitmap = Bitmap.createBitmap(
image.getWidth(), image.getHeight(), Config.ARGB_8888);
out.copyTo(bitmap);
return bitmap;
}
private byte[] toNv21(Image image) {
// TODO: Implement this.
}
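As mentioned in the comment inside toBitmap(..), reusing a Bitmap avoids a per-frame allocation. A possible variant (a sketch of my own, not part of the linked gist) that writes into a caller-provided ARGB_8888 Bitmap of matching dimensions could look like this:
public Bitmap toBitmap(Image image, Bitmap reuse) {
    if (image.getFormat() != ImageFormat.YUV_420_888) {
        throw new IllegalArgumentException("Only supports YUV_420_888.");
    }
    in.copyFrom(toNv21(image));
    script.setInput(in);
    script.forEach(out);
    // reuse must be ARGB_8888 and match the output allocation's width and height.
    out.copyTo(reuse);
    return reuse;
}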
Since RenderScript doesn’t inherently support a YUV_420_888 Image, we need to convert the Image to a byte[]. This is going to slow everything down, unless of course we can write a fast, highly parallelized way to do it (which should be doable). I have found two ways to do the conversion: one is fast, the other is not. It is also important to note that neither of these approaches leverages the parallelizability of this copy.
Let’s start with our Java approach, which is the slower one.
toNv21(Image image) Java Approach
private byte[] toNv21(Image image) {
int width = image.getWidth();
int height = image.getHeight();
// Order of U/V channel guaranteed, read more:
// https://developer.android.com/reference/android/graphics/ImageFormat#YUV_420_888
Plane yPlane = image.getPlanes()[0];
Plane uPlane = image.getPlanes()[1];
Plane vPlane = image.getPlanes()[2];
ByteBuffer yBuffer = yPlane.getBuffer();
ByteBuffer uBuffer = uPlane.getBuffer();
ByteBuffer vBuffer = vPlane.getBuffer();
// Full size Y channel and quarter size U+V channels.
int numPixels = (int) (width * height * 1.5f);
byte[] nv21 = new byte[numPixels];
int idY = 0;
int idUV = width * height;
int uvWidth = width / 2;
int uvHeight = height / 2;
// Copy the Y & UV channels.
// NV21 expects YYYY...VUVU... packing.
// The U/V planes are guaranteed to have the same row stride and pixel stride.
int uvRowStride = uPlane.getRowStride();
int uvPixelStride = uPlane.getPixelStride();
int yRowStride = yPlane.getRowStride();
int yPixelStride = yPlane.getPixelStride();
for(int y = 0; y < height; ++y) {
int yOffset = y * yRowStride;
int uvOffset = y * uvRowStride;
for (int x = 0; x < width; ++x) {
nv21[idY++] = yBuffer.get(yOffset + x * yPixelStride);
if (y < uvHeight && x < uvWidth) {
int bufferIndex = uvOffset + (x * uvPixelStride);
// V channel.
nv21[idUV++] = vBuffer.get(bufferIndex);
// U channel.
nv21[idUV++] = uBuffer.get(bufferIndex);
}
}
}
return nv21;
}
I have split the performance numbers into two parts:
- Full toBitmap(..) method.
- toBitmap(..) without toNv21(..), to show the cost difference between the intrinsic and the buffer copy.
# | Average | Max | Min |
---|---|---|---|
Full toBitmap(..) | 195.10 ms | 233 ms | 178 ms |
Without toNv21(..) | 32.00 ms | 40 ms | 27 ms |
Table: The numbers are computed on a Pixel 4a for an 8MP (3264x2448) image
Do you see just how sad that Java copy was? It took ~83% of the total computation time. BTW, this total time is still faster than the pure Java approach, but slower than the full native approach mentioned in How to use YUV (YUV_420_888) Image in Android.
Let’s do better with a native toNv21(..) method.
toNv21(Image image) Java Native (JNI) Approach
In this case we basically port the Java copy code to native and plug it into the Android code with JNI. Here I am assuming you know how to set up JNI.
Java code:
class YuvConvertor {
// .. Other stuff
static {
System.loadLibrary("yuv2rgb-lib");
}
// In practice I don't recommend a nullable byte[]; consider using Optional.
@Nullable
public static byte[] toNv21(Image image) {
byte[] nv21 = new byte[(int) (image.getWidth() * image.getHeight() * 1.5f)];
if (!yuv420toNv21(
image.getWidth(),
image.getHeight(),
image.getPlanes()[0].getBuffer(), // Y buffer
image.getPlanes()[1].getBuffer(), // U buffer
image.getPlanes()[2].getBuffer(), // V buffer
image.getPlanes()[0].getPixelStride(), // Y pixel stride
image.getPlanes()[1].getPixelStride(), // U/V pixel stride
image.getPlanes()[0].getRowStride(), // Y row stride
image.getPlanes()[1].getRowStride(), // U/V row stride
nv21)) {
return null;
}
return nv21;
}
public static native boolean yuv420toNv21(
int imageWidth,
int imageHeight,
ByteBuffer yByteBuffer,
ByteBuffer uByteBuffer,
ByteBuffer vByteBuffer,
int yPixelStride,
int uvPixelStride,
int yRowStride,
int uvRowStride,
byte[] nv21Output);
}
JNI code:
// yuv2rgb-lib.cc JNI code.
namespace {
void yuv420toNv21(int image_width, int image_height, const int8_t* y_buffer,
const int8_t* u_buffer, const int8_t* v_buffer, int y_pixel_stride,
int uv_pixel_stride, int y_row_stride, int uv_row_stride,
int8_t *nv21) {
// Copy the Y channel row by row. Note: this assumes y_pixel_stride == 1,
// which holds for the Y plane of YUV_420_888 images in practice.
for(int y = 0; y < image_height; ++y) {
int destOffset = image_width * y;
int yOffset = y * y_row_stride;
memcpy(nv21 + destOffset, y_buffer + yOffset, image_width);
}
if (v_buffer - u_buffer == sizeof(int8_t)) {
// The U and V planes share one interleaved buffer (pixel stride 2) with U
// first, i.e. UVUVUV packaging.
// TODO: If the buffer were instead packed as VUVUVU (NV21's own order) we
// could replace the interleaved copy below with a single memcpy. In Android
// Camera2 I have mostly come across the UVUVUV packaging though.
}
// Copy UV Channel.
int idUV = image_width * image_height;
int uv_width = image_width / 2;
int uv_height = image_height / 2;
for(int y = 0; y < uv_height; ++y) {
int uvOffset = y * uv_row_stride;
for (int x = 0; x < uv_width; ++x) {
int bufferIndex = uvOffset + (x * uv_pixel_stride);
// V channel.
nv21[idUV++] = v_buffer[bufferIndex];
// U channel.
nv21[idUV++] = u_buffer[bufferIndex];
}
}
}
} // namespace
extern "C" {
jboolean Java_com_example_androidcv_camera_processing_YuvConvertor_yuv420toNv21(
JNIEnv *env, jclass clazz,
jint image_width, jint image_height, jobject y_byte_buffer,
jobject u_byte_buffer, jobject v_byte_buffer, jint y_pixel_stride,
jint uv_pixel_stride, jint y_row_stride, jint uv_row_stride,
jbyteArray nv21_array) {
auto y_buffer = static_cast<jbyte*>(env->GetDirectBufferAddress(y_byte_buffer));
auto u_buffer = static_cast<jbyte*>(env->GetDirectBufferAddress(u_byte_buffer));
auto v_buffer = static_cast<jbyte*>(env->GetDirectBufferAddress(v_byte_buffer));
jbyte* nv21 = env->GetByteArrayElements(nv21_array, nullptr);
if (nv21 == nullptr || y_buffer == nullptr || u_buffer == nullptr
|| v_buffer == nullptr) {
// Log this.
return false;
}
yuv420toNv21(image_width, image_height, y_buffer, u_buffer, v_buffer,
y_pixel_stride, uv_pixel_stride, y_row_stride, uv_row_stride,
nv21);
env->ReleaseByteArrayElements(nv21_array, nv21, 0);
return true;
}
} // extern "C"
Again, I have split the performance numbers into two parts:
- Full toBitmap(..) method.
- toBitmap(..) without toNv21(..), to show the cost difference.
# | Average | Max | Min |
---|---|---|---|
Full toBitmap(..) | 31.5 ms | 62 ms | 41 ms |
Without toNv21(..) | 24.6 ms | 30 ms | 23 ms |
Table: The numbers are computed on a Pixel 4a for an 8MP (3264x2448) image
I hope we can make do with these numbers! This is faster than the Java or Native implementation we had in How to use YUV (YUV_420_888) Image in Android.
Even faster?
I am fairly sure this can be made much faster by:
- Either a custom RenderScript kernel that can read the three Y, U & V planes directly,
- Or a faster NEON-based implementation of the native copy.
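There is also the parallelizability note from earlier: neither toNv21(..) implementation spreads the copy across cores. Below is a rough sketch of my own (unbenchmarked; the class and method names are hypothetical, and it assumes API 24+ for parallel streams) of a thread-parallel Java copy that interleaves rows on the common fork-join pool:
import android.media.Image;
import java.nio.ByteBuffer;
import java.util.stream.IntStream;
final class ParallelNv21 {
    // Copy one plane's ByteBuffer into a plain byte[] so worker threads never
    // touch a (non-thread-safe) ByteBuffer.
    private static byte[] planeToBytes(Image.Plane plane) {
        ByteBuffer buffer = plane.getBuffer();
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        return bytes;
    }
    static byte[] toNv21Parallel(Image image) {
        int width = image.getWidth();
        int height = image.getHeight();
        byte[] yBytes = planeToBytes(image.getPlanes()[0]);
        byte[] uBytes = planeToBytes(image.getPlanes()[1]);
        byte[] vBytes = planeToBytes(image.getPlanes()[2]);
        int yRowStride = image.getPlanes()[0].getRowStride();
        int yPixelStride = image.getPlanes()[0].getPixelStride();
        int uvRowStride = image.getPlanes()[1].getRowStride();
        int uvPixelStride = image.getPlanes()[1].getPixelStride();
        byte[] nv21 = new byte[(int) (width * height * 1.5f)];
        // Each row writes to a disjoint region of nv21, so rows can be
        // processed independently across the common fork-join pool.
        IntStream.range(0, height).parallel().forEach(y -> {
            int yOffset = y * yRowStride;
            int dstY = y * width;
            for (int x = 0; x < width; ++x) {
                nv21[dstY + x] = yBytes[yOffset + x * yPixelStride];
            }
            if (y < height / 2) {
                int uvOffset = y * uvRowStride;
                int dstUV = width * height + y * width;
                for (int x = 0; x < width / 2; ++x) {
                    int srcIndex = uvOffset + x * uvPixelStride;
                    nv21[dstUV + 2 * x] = vBytes[srcIndex];     // V first (NV21)
                    nv21[dstUV + 2 * x + 1] = uBytes[srcIndex]; // then U
                }
            }
        });
        return nv21;
    }
}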
Link to full code
Ok fine, take what you came for!
YUV_420_888 Image to Bitmap using ScriptIntrinsicYuvToRGB — GitHub Gist
Performance verdict
The approach with a native YUV 420 -> NV21 byte[] copy is faster than the pure Java or direct native implementations. See How to use YUV (YUV_420_888) Image in Android for more numbers.
# | Average | Max | Min | Comparison |
---|---|---|---|---|
Pure Java approach | 353 ms | 362 ms | 334 ms | 11.2x slower |
Standard Native approach | 89.5 ms | 91 ms | 88 ms | 2.8x slower |
ScriptIntrinsicYuvToRGB approach | 31.5 ms | 62 ms | 41 ms | NA |
Table: The numbers are computed on a Pixel 4a for an 8MP (3264x2448) image. There are potentially faster native approaches not included here.
But a GPU implementation using Vulkan, or NEON extensions, could be faster. I plan to do that full investigation soon (and I'll update this article).
Just FYI, RenderScript is being deprecated in Android S (Android 12)
What, why? I just learned about it :(
And why the hell did you just explain all of this stuff?
Relax, read on first!
With Android S (Android 12), the team announced the deprecation of RenderScript, giving the following reasons in the blog post:
- RenderScript was introduced to allow developers to run computationally intensive code on the CPU/GPU without making use of the NDK or GPU-specific APIs; the underlying hardware was abstracted away by RenderScript.
- With Android’s evolution, the NDK and GPU libraries using OpenGL have improved significantly. They give low-level access to GPU hardware buffers, and RenderScript no longer seems the optimal way to accomplish the most performance-critical tasks.
- Your current RenderScript code will continue to work on existing devices and will still compile for Android, but on future Android devices the internal implementation may be CPU-only.
- The Android team has written a toolkit for migrating the core RenderScript intrinsics to highly optimized C++ implementations: android/renderscript-intrinsics-replacement-toolkit.
I’ll try out the intrinsics example by the Android team and write about it!
If you know of a faster way to get this done with RenderScript or with other APIs in Android, please let me know in the comments section.
References
- RenderScript — Android documentation
- C99 language — Wikipedia
- How to use YUV (YUV_420_888) Image in Android
- What are intrinsics? — StackOverflow
- RenderScript deprecation — Google Blog
- RenderScript migration — Android documentation
Attributions
Want to read more similar content?
I like to write articles on topics less covered on the internet. They revolve around writing fast algorithms and image processing, as well as general software engineering.
I publish many of them on Medium.
If you are already on Medium, please join 4200+ other members and subscribe to my articles to get updates as I publish.
If you are not on Medium: Medium has millions of amazing articles from 100K+ authors. To get access to those, please join using my referral link. This will give you access to all the benefits of Medium, and Medium will pay me a piece to support my writing!
Thanks!