ROSIntegrationVision Fix UE 4.27 Compatibility

The original convertDepth function had two issues:

1. _mm_div_epi16 doesn't exist - there is no SSE/AVX integer division intrinsic.
2. Logic was incorrect - dividing float16-encoded bits as integers corrupts the values.

This PR uses UE4's built-in FFloat16 class for portable float16→float32 conversion, then scales by 0.01 to convert cm→m.

With this change:

- No compilation errors
- Produces correct depth values
- No longer requires F16C CPU support or manual UE4 recompilation
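To illustrate why the order of operations matters, here is a minimal sketch in standard C++ so that it builds outside the engine. It is not the PR's code: `halfBitsToFloat` is a hypothetical stand-in for UE4's FFloat16 conversion, `convertDepthPortable` and `divideBitsThenDecode` are illustrative names, and the divisor 100 in the incorrect variant is an assumption chosen to mirror the cm→m scale.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>

// Decode one IEEE 754 binary16 bit pattern into a float.
// Hypothetical stand-in for UE4's FFloat16, used only for this sketch.
inline float halfBitsToFloat(std::uint16_t bits) {
    const bool negative = (bits & 0x8000u) != 0;
    const int exponent = (bits >> 10) & 0x1F;
    const int mantissa = bits & 0x3FF;
    float magnitude;
    if (exponent == 0) {
        // Zero or subnormal: mantissa * 2^-24.
        magnitude = std::ldexp(static_cast<float>(mantissa), -24);
    } else if (exponent == 0x1F) {
        // Infinity (mantissa == 0) or NaN.
        magnitude = (mantissa == 0) ? std::numeric_limits<float>::infinity()
                                    : std::numeric_limits<float>::quiet_NaN();
    } else {
        // Normal: (1024 + mantissa) * 2^(exponent - 25).
        magnitude = std::ldexp(static_cast<float>(mantissa | 0x400), exponent - 25);
    }
    return negative ? -magnitude : magnitude;
}

// Correct order: decode each half-float to a float first, then scale cm -> m.
inline void convertDepthPortable(const std::uint16_t* in, float* out, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        out[i] = halfBitsToFloat(in[i]) * 0.01f;
    }
}

// Incorrect order, for contrast: integer-divide the raw bits, then decode.
// The divisor 100 is an assumption mirroring the cm -> m scale.
inline float divideBitsThenDecode(std::uint16_t bits) {
    return halfBitsToFloat(static_cast<std::uint16_t>(bits / 100));
}
```

For example, the bit pattern 0x5640 encodes 100.0 (cm). Decoding and then scaling gives about 1.0 (m). Dividing the raw bits first gives 22080 / 100 = 220, which decodes to a subnormal of roughly 1.3e-5, nowhere near the intended depth.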

Dec 08 '25 18:12 khalidbourr

Hi @khalidbourr, thanks for the PR! I can't validate this functionality on my machine right now, but I was wondering whether it has a negative impact on compute time. Have you tested how fast it is in comparison? Will it slow down the overall image capturing compared to the old version?

Dec 08 '25 22:12 Sanic

Dear @Sanic, I haven't evaluated it from that perspective yet. However, I did encounter build errors in Unreal Engine 4.27 on Linux, and this is the error message I received:

FStaticMeshLODResources &LODModel = StaticMesh->RenderData->LODResources[PaintingMeshLODIndex];
^
/home/vampiro/UnrealEngine-4.27/Engine/Source/Runtime/Engine/Classes/Engine/StaticMesh.h:519:2: note: 'RenderData' has been explicitly marked deprecated here
UE_DEPRECATED(4.27, "Please do not access this member directly; use UStaticMesh::GetRenderData() or UStaticMesh::SetRenderData().")
^
/home/vampiro/UnrealEngine-4.27/Engine/Source/Runtime/Core/Public/Misc/CoreMiscDefines.h:234:43: note: expanded from macro 'UE_DEPRECATED'
#define UE_DEPRECATED(Version, Message) [[deprecated(Message " Please update your code to the new API before upgrading to the next release, otherwise your project will no longer compile.")]]
^
In file included from /home/vampiro/Documents/Unreal Projects/AI4FOREST/Plugins/ROSIntegrationVision/Intermediate/Build/Linux/B4D820EA/UE4Editor/Development/ROSIntegrationVision/Module.ROSIntegrationVision.cpp:6:
/home/vampiro/Documents/Unreal Projects/AI4FOREST/Plugins/ROSIntegrationVision/Source/ROSIntegrationVision/Private/VisionComponent.cpp:754:4: error: use of undeclared identifier '_mm_div_epi16'; did you mean '_mm_min_epi16'?
_mm_div_epi16(
^~~~~~~~~~~~~
_mm_min_epi16
/home/vampiro/UnrealEngine-4.27/Engine/Extras/ThirdPartyNotUE/SDKs/HostLinux/Linux_x64/v19_clang-11.0.1-centos7/x86_64-unknown-linux-gnu/lib/clang/11.0.1/include/emmintrin.h:2412:1: note: '_mm_min_epi16' declared here
_mm_min_epi16(__m128i __a, __m128i __b)

Dec 09 '25 00:12 khalidbourr

Alright. Can you see in your Log what the typical tick rate / delay is? There should be some debug outputs telling you how long generating and sending one Sensor image tuple took.

Dec 09 '25 14:12 Sanic

Once I do that I'll inform you.

Dec 10 '25 13:12 khalidbourr

Screenshot from 2025-12-11 02-22-43

I tested the VisionComponent tick timing on Linux (Ubuntu, Intel i7 7th Gen, GTX 1050, UE 4.27, ROS Melodic). Initially, F16C was not enabled: objdump -d libUE4Editor-ROSIntegrationVision.so | grep -i vcvtph2ps produced no output. I enabled F16C by adding -mf16c to LinuxToolChain.cs and modified convertDepth() to use the hardware intrinsic _mm_cvtph_ps() instead of the software FFloat16::GetFloat() loop. After rebuilding, objdump shows vcvtph2ps instructions, which confirms that the F16C path is compiled in.

However, performance remains at ~1000 ms per tick (~1 FPS). Interestingly, the first ticks before ROS publishing starts are fast (~50 ms), but once publishing starts the rate drops to ~1 FPS. I think the bottleneck may be in ReadPixels, ROS network I/O, or thread synchronization rather than in the depth conversion itself. I will check again; for now, my change does fix the build issue.
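One way to narrow down which stage dominates the tick is to time each stage separately. The following is a minimal sketch in standard C++, not code from the plugin: `measureMilliseconds` is a hypothetical helper, and which calls to wrap (depth conversion, pixel readback, publishing) is left to whoever instruments the component.

```cpp
#include <chrono>
#include <utility>

// Run `work` once and return the elapsed wall-clock time in milliseconds.
// Hypothetical helper for this sketch; not part of ROSIntegrationVision or UE4.
template <typename F>
double measureMilliseconds(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(work)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Wrapping the conversion call in a lambda, for instance `measureMilliseconds([&] { /* depth conversion here */ })`, and doing the same for the readback and publish steps would show whether the conversion contributes meaningfully to the ~1000 ms tick.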

Dec 11 '25 01:12 khalidbourr

This is the current implementation of convertDepth() that I am using. It is not pushed yet!

#include <immintrin.h>  // _mm_cvtph_ps requires F16C (build with -mf16c)

void UVisionComponent::convertDepth(const uint16_t *in, __m128 *out) const
{
    // Four depth values per iteration; a remainder of Width * Height modulo 4 is not processed.
    const size_t size = (Width * Height) / 4;
    const __m128 scale = _mm_set1_ps(0.01f);  // cm -> m

    for (size_t i = 0; i < size; ++i, in += 4, ++out)
    {
        // Load 4 half-floats (64 bits) and convert them to 4 floats in one F16C instruction.
        const __m128i half4 = _mm_loadl_epi64(reinterpret_cast<const __m128i *>(in));
        const __m128 depth = _mm_cvtph_ps(half4);
        *out = _mm_mul_ps(depth, scale);
    }
}

Dec 11 '25 01:12 khalidbourr