Fix UE 4.27 Compatibility
The original convertDepth function had two issues:
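- _mm_div_epi16 is not available on Linux. There is no SSE/AVX integer division instruction; _mm_div_epi16 is an Intel SVML library intrinsic, which the Clang toolchain bundled with UE 4.27 does not provide, so the build fails.
- The logic was incorrect: dividing float16-encoded bits as integers corrupts the values.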
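A worked example shows why integer division on the raw bits is wrong. It assumes the old code divided by 100 to go from cm to m (the divisor is not shown in this thread). It takes a depth of 100 cm stored as float16:

```latex
\begin{aligned}
100.0 &= 1.5625 \times 2^{6} \;\Rightarrow\; \text{float16 bits } \mathtt{0x5640} = 22080 \\
\lfloor 22080 / 100 \rfloor &= 220 = \mathtt{0x00DC} \\
\mathtt{0x00DC} \text{ read as float16 (exponent field } 0\text{, subnormal)} &= 220 \times 2^{-24} \approx 1.31 \times 10^{-5} \\
\text{expected: } 1.0 &= \mathtt{0x3C00}
\end{aligned}
```

The quotient is a valid bit pattern, but it encodes a tiny subnormal number instead of 1.0 m. This is why the half-floats must be decoded to float32 before scaling.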
This PR uses UE4's built-in FFloat16 class for portable float16→float32 conversion, then scales by 0.01 to convert cm→m.
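FFloat16 is UE4-specific. The following is an engine-independent sketch of what a software float16→float32 conversion followed by the cm→m scale does. HalfToFloat and ConvertDepthScalar are hypothetical names for illustration, not the PR's code or UE4's implementation of FFloat16::GetFloat():

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for FFloat16::GetFloat(): decodes an IEEE 754 binary16 value.
float HalfToFloat(uint16_t h)
{
    const uint32_t sign = (h >> 15) & 0x1;
    const uint32_t exponent = (h >> 10) & 0x1F;
    const uint32_t mantissa = h & 0x3FF;

    float magnitude;
    if (exponent == 0)          // zero or subnormal: mantissa * 2^-24
        magnitude = std::ldexp(static_cast<float>(mantissa), -24);
    else if (exponent == 0x1F)  // infinity (mantissa == 0) or NaN
        magnitude = mantissa ? NAN : INFINITY;
    else                        // normal: (1024 + mantissa) * 2^(exponent - 25)
        magnitude = std::ldexp(static_cast<float>(mantissa | 0x400),
                               static_cast<int>(exponent) - 25);
    return sign ? -magnitude : magnitude;
}

// Hypothetical scalar loop: decode each half-float depth (cm), then scale to meters.
void ConvertDepthScalar(const uint16_t* in, float* out, std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i)
        out[i] = HalfToFloat(in[i]) * 0.01f;
}
```

This path runs on any CPU, at the cost of per-pixel branching. That trade-off is why the F16C question comes up later in this thread.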
With this change:
- No compilation errors
- Correct depth values
- No longer requires F16C CPU support or a manual UE4 recompilation
Hi @khalidbourr, thanks for the PR! I can't validate this functionality on my machine right now, but I was wondering whether it has a negative impact on compute time. Have you compared how fast it is? Will it slow down the overall image capturing compared to the old version?
Dear @Sanic, I haven't evaluated it from that perspective yet. However, I did encounter build errors in Unreal Engine 4.27 on Linux; this is the error message I received:
```
FStaticMeshLODResources &LODModel = StaticMesh->RenderData->LODResources[PaintingMeshLODIndex];
^
/home/vampiro/UnrealEngine-4.27/Engine/Source/Runtime/Engine/Classes/Engine/StaticMesh.h:519:2: note: 'RenderData' has been explicitly marked deprecated here
UE_DEPRECATED(4.27, "Please do not access this member directly; use UStaticMesh::GetRenderData() or UStaticMesh::SetRenderData().")
^
/home/vampiro/UnrealEngine-4.27/Engine/Source/Runtime/Core/Public/Misc/CoreMiscDefines.h:234:43: note: expanded from macro 'UE_DEPRECATED'
#define UE_DEPRECATED(Version, Message) [[deprecated(Message " Please update your code to the new API before upgrading to the next release, otherwise your project will no longer compile.")]]
^
In file included from /home/vampiro/Documents/Unreal Projects/AI4FOREST/Plugins/ROSIntegrationVision/Intermediate/Build/Linux/B4D820EA/UE4Editor/Development/ROSIntegrationVision/Module.ROSIntegrationVision.cpp:6:
/home/vampiro/Documents/Unreal Projects/AI4FOREST/Plugins/ROSIntegrationVision/Source/ROSIntegrationVision/Private/VisionComponent.cpp:754:4: error: use of undeclared identifier '_mm_div_epi16'; did you mean '_mm_min_epi16'?
_mm_div_epi16(
^~~~~~~~~~~~~
_mm_min_epi16
/home/vampiro/UnrealEngine-4.27/Engine/Extras/ThirdPartyNotUE/SDKs/HostLinux/Linux_x64/v19_clang-11.0.1-centos7/x86_64-unknown-linux-gnu/lib/clang/11.0.1/include/emmintrin.h:2412:1: note: '_mm_min_epi16' declared here
_mm_min_epi16(__m128i __a, __m128i __b)
```
Alright. Can you check your log for the typical tick rate / delay? There should be debug output showing how long it took to generate and send one sensor image tuple.
I'll let you know once I've checked.
I tested the VisionComponent tick timing on Linux (Ubuntu, Intel i7 7th Gen, GTX 1050, UE 4.27, ROS Melodic).

Initially, F16C was not enabled: objdump -d libUE4Editor-ROSIntegrationVision.so | grep -i vcvtph2ps produced no output. I enabled F16C by adding -mf16c to LinuxToolChain.cs, and I also changed convertDepth() to use the hardware intrinsic _mm_cvtph_ps() instead of the software FFloat16::GetFloat() loop. After rebuilding, objdump shows vcvtph2ps instructions, confirming that the F16C code is compiled in.

However, performance remains at ~1000 ms per tick (~1 FPS). Interestingly, the first ticks before ROS publishing starts are fast (~50 ms), but once publishing starts, the rate drops to ~1 FPS. I think the bottleneck may be in ReadPixels, ROS network I/O, or thread synchronization rather than in the depth conversion itself. I will investigate further; for now, my modification fixes the build issue.
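One way to narrow down the bottleneck is to time each stage of a tick separately instead of the whole tick. The sketch below uses std::chrono, not UE's own timing utilities. StageTimer and the stage names are hypothetical; inside the engine, the per-stage numbers would typically be written out with UE_LOG.

```cpp
#include <chrono>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: records how long each named stage of one tick takes (milliseconds).
class StageTimer
{
public:
    template <typename Fn>
    void Measure(const std::string& name, Fn&& fn)
    {
        const auto start = std::chrono::steady_clock::now();  // monotonic clock
        std::forward<Fn>(fn)();
        const auto end = std::chrono::steady_clock::now();
        stages_.emplace_back(name, std::chrono::duration<double, std::milli>(end - start).count());
    }

    double TotalMs() const
    {
        double total = 0.0;
        for (const auto& stage : stages_)
            total += stage.second;
        return total;
    }

    void Print() const
    {
        for (const auto& stage : stages_)
            std::printf("%-14s %9.3f ms\n", stage.first.c_str(), stage.second);
    }

    const std::vector<std::pair<std::string, double>>& Stages() const { return stages_; }

private:
    std::vector<std::pair<std::string, double>> stages_;
};
```

In a tick, each step would be wrapped separately, for example timer.Measure("ReadPixels", ...), timer.Measure("convertDepth", ...), timer.Measure("Publish", ...). The printed breakdown then shows whether the ~1000 ms comes from the conversion or from readback and publishing.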
This is the current implementation of convertDepth() I am using (not pushed yet):
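```cpp
#include <immintrin.h> // _mm_loadl_epi64, _mm_cvtph_ps (F16C: requires -mf16c)

// Converts Width * Height half-float depths (cm) to floats (m), 4 pixels per iteration;
// pixels beyond the last multiple of 4 are not converted.
void UVisionComponent::convertDepth(const uint16_t *in, __m128 *out) const
{
	const size_t size = (Width * Height) / 4;
	const __m128 scale = _mm_set1_ps(0.01f); // cm -> m

	for (size_t i = 0; i < size; ++i, in += 4, ++out)
	{
		// F16C hardware conversion: 4 half-floats to 4 floats in one vcvtph2ps instruction
		const __m128i half4 = _mm_loadl_epi64(reinterpret_cast<const __m128i *>(in)); // loads 64 bits = 4 x uint16
		const __m128 depth = _mm_cvtph_ps(half4);
		*out = _mm_mul_ps(depth, scale);
	}
}
```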